Nucleic acid constructs for site-specific homologous recombination
FIELD OF THE INVENTION
[0001] This invention relates generally to nucleic acid constructs and methods for targeting the integration of an exogenous nucleic acid molecule to a specific location within the genome of a host cell. More particularly, the present invention relates to targeting constructs and methods which permit the expression of a selectable marker gene when the exogenous nucleic acid molecule is integrated into the genome via site-specific homologous recombination but which inhibit or suppress the expression of the selectable marker gene when the exogenous nucleic acid molecule is integrated into the host cell genome via non-homologous recombination or random insertion.
BACKGROUND OF THE INVENTION
[0002] Foreign or heterologous nucleic acid molecules are typically introduced into the genomes of organisms through either random genomic insertion or site-specific homologous recombination. Random integration involves the insertion of a linearised nucleic acid fragment into the genome of the host cell at locations that are, for the most part, non-site-specific. These insertions tend to exist as multimers or concatemers and most often do not result in the disruption and inactivation of a particular locus. However, the possibility exists that endogenous loci can be disrupted by the random insertion event, thus often making analysis of the effect of the exogenous gene on the cell or organism derived from the transformed cell difficult.
[0003] Site-specific homologous recombination allows for the targeting of particular regions of the host genome for single copy integration of an exogenous nucleic acid molecule. The exogenous nucleic acid molecule is inserted by means of homologous recombination occurring between nucleic acid sequences in a 'targeting' nucleic acid construct and the corresponding homologous sequence in the genome. However, while this type of recombination occurs at a high frequency naturally in yeast and other fungal organisms, in higher eukaryotic organisms it is a rare event. For example, in mammalian cells, the frequency of homologous versus non-homologous (random integration) recombination is reported to range from 1/100 to 1/5000 (for example, see Capecchi, 1989, Science, 244:1288-1292; Morrow and Kucherlapati, 1993, Curr. Op. Biotech., 4:577-582), which requires the screening of numerous cells for a successful homologous recombination event.
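As a rough illustration of the screening burden implied by these reported frequencies, the following sketch estimates how many independent clones would need to be screened to have a given chance of recovering at least one homologous recombinant. It assumes that each clone is an independent trial with a fixed probability of being correctly targeted, which is a simplification. The function name and the 95% confidence level are illustrative choices and are not taken from the cited reports.

```python
import math

def clones_to_screen(frequency, confidence=0.95):
    """Smallest n such that P(at least one targeted clone among n) >= confidence.

    Assumes independent clones, each correctly targeted with probability
    `frequency` (a simplifying assumption made for illustration only).
    """
    if not 0 < frequency < 1:
        raise ValueError("frequency must be strictly between 0 and 1")
    if not 0 < confidence < 1:
        raise ValueError("confidence must be strictly between 0 and 1")
    # Solve 1 - (1 - frequency)**n >= confidence for n.
    return math.ceil(math.log(1 - confidence) / math.log(1 - frequency))

# The two ends of the range reported for mammalian cells.
for f in (1 / 100, 1 / 5000):
    print(f"frequency {f:.4f}: screen about {clones_to_screen(f)} clones")
```

Under these assumptions the number of clones to be screened grows roughly in inverse proportion to the targeting frequency, which is why enrichment for correctly targeted events is desirable.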
[0004] Accordingly, it is highly desirable to develop a gene targeting construct that provides for enrichment of correctly targeted events.
SUMMARY OF THE INVENTION
[0005] Thus, in one aspect, the present invention provides targeting constructs for site-specific homologous recombination at a target site in the genome of a host cell that is capable of undergoing homologous recombination. These constructs generally comprise a targeting cassette and a modulator gene that is positioned external (e.g., upstream and/or downstream) of the targeting cassette. The targeting cassette comprises (i) a marker gene that produces a marker gene expression product in the absence of the modulator gene at an 'unmodulated' level or functional activity that confers an identifying characteristic on cells containing the marker gene, and (ii) two flanking portions that are sufficiently homologous with regions of the target site to permit homologous recombination between the targeting cassette and the target site. The modulator gene modulates the level or functional activity of the marker gene expression product, whereby the modulation suppresses, attenuates or alters the identifying characteristic. In certain embodiments, the modulator gene produces an expression product that has at least one activity selected from: a transcript-degrading activity that degrades a transcript product of the marker gene; a transcript-interacting activity that inhibits translation of a transcript product of the marker gene; a polypeptide-interacting activity that inhibits the functional activity of a polypeptide product of the marker gene; a transcript enhancing activity that enhances the transcription of the marker gene; and a nucleic acid-excising activity that mediates the excision of at least a portion of the marker gene. The identifying characteristic is suitably but not exclusively selected from antibiotic resistance, antibiotic sensitivity, cell death, enzymatic activity and light emission or absorbance.
[0006] In operation, site-specific homologous recombination will result in the insertion at the target site of the targeting cassette but not of the modulator gene, thereby permitting the production of the marker gene expression product at the unmodulated level which confers the identifying characteristic on transformed cells that have successfully undergone site-specific homologous recombination. By contrast, random integration of the construct into the genome will result in the insertion of the modulator gene in addition to the targeting cassette, thereby modulating the level or functional activity of the marker gene expression product to suppress, attenuate or alter the identifying characteristic. Thus, it is the utilisation of the combination of both the marker gene and the modulator gene that allows for the selection or enrichment of cells that have successfully undergone site-specific homologous recombination.
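The selection logic described above can be summarised in a small, idealised model. The sketch below assumes a positive marker gene whose expression is fully suppressed whenever the modulator gene is co-integrated, and it assumes that random integration always carries the modulator gene into the genome. Both are simplifications, and the class and function names are hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Cell:
    has_cassette: bool   # targeting cassette (including the marker gene) integrated
    has_modulator: bool  # modulator gene, external to the cassette, integrated

def integrate(event):
    """Map an integration event to the construct elements retained by the cell."""
    if event == "homologous":
        # Only sequence between the flanking portions is inserted at the target site.
        return Cell(has_cassette=True, has_modulator=False)
    if event == "random":
        # Idealised: the whole construct, modulator gene included, is inserted.
        return Cell(has_cassette=True, has_modulator=True)
    if event == "none":
        return Cell(has_cassette=False, has_modulator=False)
    raise ValueError(f"unknown integration event: {event}")

def shows_identifying_characteristic(cell):
    """The marker is at its unmodulated level only when the modulator is absent."""
    return cell.has_cassette and not cell.has_modulator

events = ["homologous", "random", "none"]
selected = [e for e in events if shows_identifying_characteristic(integrate(e))]
print(selected)
```

In this model only the homologous event survives selection for the identifying characteristic, which mirrors the enrichment principle set out in this paragraph.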
[0007] In some embodiments, the flanking portions of the targeting cassette are homologous to regions flanking at least a portion of an endogenous gene and are, therefore, suitable for producing knock-out organisms having a partial or complete loss of function in at least one allele of the endogenous gene.
[0008] In other embodiments, the targeting cassette further comprises a foreign or heterologous nucleotide sequence of interest (e.g., a transgene or regulatory element) for insertion at the target site. These constructs are suitable for producing transgenic or knock-in organisms containing at least one copy of the nucleotide sequence of interest in the genome of the organism.
[0009] In certain embodiments, the marker gene is a positive marker gene, which permits detection and isolation of cells containing that gene via production of a positive marker that facilitates the differentiation of these cells from cells which contain either no positive marker or which contain a modulator gene that inhibits or otherwise reduces the level or functional activity of the positive marker. In these embodiments, the level or functional activity of the positive marker is generally inhibited or otherwise reduced by the modulator gene. In illustrative embodiments of this type, the modulator gene comprises a nucleic acid sequence encoding a site-specific recombinase protein and the targeting cassette comprises target sites located within or adjacent to the positive marker gene that are specifically recognised by the recombinase protein. In these embodiments, the target sites may flank an excisable sequence selected from at least a portion of a transcribable sequence of the positive marker gene or of a regulatory element (e.g., a promoter or cis-acting sequence) operably connected to the transcribable sequence, whereby production of the recombinase protein in the host cell causes excision of the excisable sequence and modulation of the level or functional activity of the positive marker. Typically, the excision of the excisable sequence results in a reduction in the level or functional activity of the positive marker.
[0010] In other illustrative embodiments, the modulator gene comprises a nucleic acid sequence encoding an antigen-binding molecule that is interactive with a positive marker protein.
[0011] In still other illustrative embodiments, the modulator gene comprises a nucleic acid sequence encoding a ribozyme or antisense nucleic acid that is specific to a transcript product of the positive marker gene.
[0012] In specific embodiments, the modulator gene comprises a nucleic acid sequence encoding a RNA molecule that directly or indirectly attenuates or otherwise disrupts the expression of the positive marker gene by post-transcriptional gene silencing (PTGS). In illustrative embodiments of this type, the modulator gene comprises a nucleic acid sequence, which when expressed in the host cell produces a RNA molecule that comprises a targeting region having sequence identity with a nucleotide sequence of the positive marker gene and that attenuates or otherwise disrupts the expression of the positive marker gene. For example, the targeting region can have sequence identity with the sense strand of the positive marker gene. Alternatively, the targeting region can have sequence identity with the antisense strand of the positive marker gene. Optionally, the RNA molecule is unpolyadenylated.
[0013] In other illustrative embodiments, the RNA molecule further comprises one or more other targeting regions (e.g., from about 1 to about 10 other targeting regions) each of which has sequence identity with a nucleotide sequence of the positive marker gene. In other embodiments, a second modulator gene is provided external of the targeting cassette, which comprises another nucleic acid sequence from which a further RNA molecule is producible, comprising at least one targeting region having sequence identity with a nucleotide sequence of the positive marker gene.
[0014] In still other illustrative embodiments, the RNA molecule further comprises a reverse complement of the targeting region. Typically, in these embodiments, the RNA molecule further comprises a spacer sequence that spaces the targeting region from the reverse complement. In other embodiments, the modulator gene further comprises a nucleic acid sequence from which a further RNA molecule is producible, comprising the reverse complement of the targeting region. Alternatively, in embodiments in which the modulator gene is a duplex, the duplex to be transcribed may be flanked by two promoters, one controlling the transcription of one of the strands, and the other that of the complementary strand. Transcription of both strands produces a pair of RNA molecules, one RNA molecule comprising a targeting region having sequence identity with a nucleotide sequence of the positive marker gene and the other RNA molecule comprising a region that is complementary to and that hybridises with the targeting region, especially under physiological conditions.
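By way of illustration only, the arrangement of a targeting region, a spacer sequence and the reverse complement of the targeting region in a single RNA molecule can be sketched as follows. The sequences shown are hypothetical placeholders and do not correspond to any marker gene or construct disclosed herein. The function names are likewise assumptions made for the purpose of the sketch.

```python
# Watson-Crick pairing for RNA bases.
RNA_COMPLEMENT = str.maketrans("ACGU", "UGCA")

def reverse_complement(rna):
    """Return the reverse complement of an RNA sequence written 5' to 3'."""
    rna = rna.upper()
    if set(rna) - set("ACGU"):
        raise ValueError("sequence must contain only A, C, G and U")
    return rna.translate(RNA_COMPLEMENT)[::-1]

def hairpin_rna(targeting_region, spacer):
    """Concatenate targeting region, spacer and reverse complement (5' to 3')."""
    return targeting_region.upper() + spacer.upper() + reverse_complement(targeting_region)

targeting_region = "AUGGCUAGC"  # hypothetical stretch identical to a marker transcript
spacer = "UUCAAGAGA"            # hypothetical spacer forming the loop
print(hairpin_rna(targeting_region, spacer))
```

Because the two arms are reverse complements of one another, the molecule can fold back on itself so that the targeting region hybridises with its reverse complement, with the spacer forming the intervening loop.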
[0015] In other illustrative embodiments, the RNA molecule further comprises two complementary RNA regions which are unrelated to any endogenous RNA in the host cell and which are in proximity to the targeting region. For example, the RNA molecule can further comprise two complementary RNA regions which are encoded by any nucleic acid sequence in the nucleome of the host cell provided that the sequence does not have sequence identity with the nucleotide sequence of the positive marker gene, wherein the regions are in proximity to the targeting region. In these embodiments, one of the complementary RNA regions can be located upstream of the targeting region and the other downstream of the targeting region. Alternatively, both the complementary regions can be located either upstream or downstream of the targeting region. In another example, the complementary regions are located within the targeting region.
[0016] In certain embodiments, the marker gene comprises a nucleic acid sequence that encodes a negative selectable marker whose level or functional activity is generally increased by the presence in the host cell of the modulator gene, which causes the host cell to lose viability in the presence or absence of a selection agent. In illustrative embodiments of this type, the modulator gene comprises a nucleic acid sequence encoding a site-specific recombinase protein and the negative selectable marker gene comprises a removable intervening sequence, which suppresses or prevents transcription of the transcribable sequence of the negative selectable marker gene or production of a functional selectable marker, or which only permits that transcription or production to levels that do not cause loss of cell viability. In one illustrative example of this type, the intervening sequence is interposed between a promoter and the transcribable sequence of the negative selectable marker gene, which suppresses transcription of the transcribable sequence from the promoter. The removable intervening sequence typically comprises a transcriptional terminator that inhibits or otherwise suppresses transcription of downstream sequences. Expression of a site-specific recombinase protein excises the removable intervening sequence to thereby render the transcribable sequence in operable linkage with the promoter and to permit transcription of the transcribable sequence. In another illustrative example, the marker gene is in a split or divided form comprising an upstream portion and a downstream portion and a removable intervening sequence as broadly described above therebetween. The upstream portion is operably connected to a promoter. However, the removable intervening sequence inhibits or otherwise suppresses transcription of the downstream portion, thereby preventing expression of a functional negative selectable marker. Expression of a site-specific recombinase protein removes the removable intervening sequence to thereby render a transcribable marker gene sequence, which permits the expression of a functional negative selectable marker. Typically, the excision of the removable intervening sequence results in an increase or elevation in the level or functional activity of the negative selectable marker. Generally, when a negative selectable marker gene is employed, the targeting cassette further comprises a positive marker gene (e.g., a positive selectable marker), which permits detection, selection and/or isolation of cells containing that gene.
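The switch described above, in which a recombinase removes an intervening terminator so that the promoter becomes operably linked to the marker sequence, can be sketched as a toy model operating on an ordered list of construct elements. The element names are hypothetical. The model further assumes that the intervening sequence is flanked by two directly repeated target sites and that a single site remains after excision, neither of which is asserted here to be a feature of any particular construct of the invention.

```python
def excise(elements, site="target_site"):
    """Remove the segment between two directly repeated target sites.

    One copy of the site is left behind. If the construct does not contain
    exactly two sites, it is returned unchanged (toy-model behaviour).
    """
    positions = [i for i, element in enumerate(elements) if element == site]
    if len(positions) != 2:
        return list(elements)
    first, second = positions
    return elements[:first] + elements[second:]

def marker_transcribed(elements):
    """True if the promoter precedes the marker with no terminator in between."""
    promoter = elements.index("promoter")
    marker = elements.index("marker")
    return promoter < marker and "terminator" not in elements[promoter:marker]

# Unmodulated state: the terminator blocks transcription of the negative marker.
silent = ["promoter", "target_site", "terminator", "target_site", "marker"]
# Modulated state: recombinase expressed from the modulator gene excises the terminator.
active = excise(silent)

print(marker_transcribed(silent), marker_transcribed(active))
```

In this sketch the marker is silent in a correctly targeted cell, which lacks the modulator gene, and becomes active only where the recombinase is present, as would be expected after random integration of the whole construct.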
[0017] In still other embodiments, the modulator gene comprises a nucleic acid sequence encoding a transcriptional inducer and the targeting cassette comprises a promoter that is operably connected to a sequence encoding the negative selectable marker and to a binding site for the transcriptional inducer, whereby production of the transcriptional inducer in the host cell causes an increase or elevation in the level of the negative selectable marker. In illustrative embodiments of this type, the transcriptional inducer comprises (a) at least one transcriptional activation domain, and (b) at least one DNA-binding domain that binds to, or otherwise interacts with, a promoter which is operably connected to the negative selectable marker gene and with which the DNA-binding domain(s) interact(s) to activate transcription of the negative selectable marker gene. In operation, transcription of the modulator gene results in the production of the transcription inducer which, in turn, interacts via its DNA-binding domain(s) with the negative selectable marker gene promoter and via its transcriptional activation domain with transcriptional machinery of the host cell to activate transcription of the negative selectable marker gene, which results in an increase or elevation in the level or functional activity of the negative selectable marker.
[0018] In embodiments in which the negative selectable marker gene is a toxicity gene, an antidote gene can be optionally provided, either as part of the targeting construct or in a separate construct, whereby an expression product of the antidote gene suppresses the toxic effects of the toxin produced from the toxicity gene. These embodiments are particularly useful when the suppression of expression of the toxicity gene, in an unmodulated state, is partial or incomplete (i.e., leaky), which permits the expression of the negative selectable marker gene to a level that leads to the death of the host cell or to loss of cell function. In these instances, the antidote gene is typically placed under the control of a promoter with a transcriptional activity in the host cell that results in the expression of the antidote gene to a level which prevents the death of the host cell or the loss of cell function in the absence of the modulator gene expression product but which does not prevent the death of the host cell or the loss of cell function in the presence of a modulator gene expression product.
[0019] In certain embodiments, the marker gene is flanked by target sites that are specifically recognised by a site-specific recombinase protein that is not encoded by the modulator gene. These embodiments are useful for selectively excising the marker gene from the genome of host cells having a correctly targeted event (i.e., containing the targeting cassette but not the modulator gene).
[0020] In another aspect, the invention provides methods for identifying a genetically modified cell that has undergone homologous recombination with a targeting construct as broadly described above. These methods generally comprise introducing the targeting construct into host cells and identifying or selecting a cell in which the marker gene expression product is produced at the unmodulated level or functional activity that confers the identifying characteristic on that cell. Such a method can further include a step of characterising the genomic DNA of the cell for the insertion of the targeting cassette but not of the modulator gene into the host cell genome. In some embodiments, the targeting construct is introduced into the host cells by contacting the host cells with the construct under conditions suitable for transformation of the cells with the construct. In some embodiments, the methods further comprise propagating or maintaining the host cells in which the targeting construct has been introduced under conditions selective for the presence of the marker gene expression product that confers the identifying characteristic to select or enrich for those cells which have successfully undergone site-specific homologous recombination.
[0021] In yet another aspect, the invention contemplates methods for producing a genetically modified host cell. These methods generally comprise introducing into host cells a targeting construct as broadly described so as to yield genetically modified host cells and identifying or selecting a genetically modified host cell in which the marker gene expression product is produced at the unmodulated level or functional activity that confers the identifying characteristic on that cell.
[0022] In still another aspect, the invention provides genetically modified host cells resulting from the methods as broadly described above.
[0023] In a further aspect, the present invention provides methods for producing a genetically modified organism. These methods generally comprise generating a genetically modified organism from a genetically modified cell resulting from a method as broadly described above.
[0024] Accordingly, the present invention extends to the production of genetically modified organisms including genetically modified animals and plants. In some embodiments, the host cells employed for generating genetically modified cells are mammalian cells. Suitably, the mammalian cells are embryonic stem cells, illustrative examples of which include embryonic stem cells from a mammal within the order Rodentia, e.g., mouse embryonic stem cells. In certain embodiments of this type, the methods comprise injecting an embryonic stem cell having the identifying characteristic or derivatives of such cell into the blastocyst or other early developmental stage of a mammal, especially a non-human mammal. Suitably, the method further comprises introducing the injected blastocyst into a pseudo-pregnant mammal and permitting the pseudo-pregnant mammal to deliver progeny containing at least one homologously recombined targeting cassette or portion thereof. In some embodiments of this type, the methods further include the step of breeding a genetically modified mammal so generated and producing progeny of that mammal. For example, mammals having the same genetic modification as the embryonic stem cell from which they were derived can be inbred to produce mammals that are homozygous for the genetic modification. Alternatively or additionally, genetically modified mammals containing different genetic modifications can be interbred to produce mammals containing two or more different genetic modifications. Suitably, any of these transgenic mammals can be crossbred with any other genetically modified, wild-type or mutant mammals of the same species in order to obtain mammals containing the genetic modification as well as the desired genetic characteristics of the other mammals used in the crossbreeding strategy. When the genetically modified mammal is a mouse, crossbreeding strategies may include crossbreeding the genetically modified mouse with another mouse including, but not restricted to, a nude mouse, a SCID mouse, an inbred strain of mouse such as BALB/c, a mouse designed to mimic a specific human disease or a mouse with a useful reporter construct.
In a related aspect, the invention provides a genetically modified mammal or mammalian cell containing in its genome a targeting cassette as broadly defined above or a derivative of the targeting cassette but not the modulator gene as broadly defined above.
[0025] In some embodiments, the host cells employed for generating genetically modified cells are plant cells. Suitably, the methods comprise introducing into regenerable plant cells the targeting construct as broadly described above so as to yield transformed plant cells having the identifying characteristic. Desirably, these methods further comprise selecting stable genetic transformants from transformed plant cells. These methods typically comprise introducing into regenerable plant cells the targeting construct as broadly described above so as to yield transformed plant cells and identifying or selecting a transformed plant cell line from the transformed plant cells. In some embodiments, the regenerable cells are selected from regenerable dicotyledonous plant cells and regenerable monocotyledonous plant cells (e.g., regenerable graminaceous monocotyledonous plant cells and regenerable non-graminaceous monocotyledonous plant cells). Suitably, the methods further comprise producing a differentiated genetically modified plant by identifying or selecting a population of transformed cells and regenerating a differentiated genetically modified plant from the population. In some embodiments, the genetic modification introduced with the targeting construct renders the differentiated genetically modified plant identifiable over the corresponding non-genetically modified plant. In a related aspect, the invention provides a genetically modified plant or plant cell containing in its genome a targeting cassette as broadly defined above or a derivative of the targeting cassette but not the modulator gene as broadly defined above. The genetic modification introduced with the targeting construct as broadly described above is transmitted through a complete cycle of the differentiated genetically modified plant to its progeny so that it is contained within the genome of cells of the progeny plants. Thus, the invention also provides seed, other plant parts, tissue, and progeny plants derived from the differentiated genetically modified plant.
BRIEF DESCRIPTION OF THE DRAWINGS
[0026] Figure 1 is a schematic representation of the vector pLOz referenced in Example 1.
[0027] Figure 2 is a schematic representation of the vector pLOzIII referenced in Example 1.
[0028] Figure 3 is a schematic representation of the pCR2.1-TOPO_mU6 PCR fragment referenced in Example 1.
[0029] Figure 4 is a schematic representation of the vector 232_pBSK_5'mU6 referenced in Example 1.
[0030] Figure 5 is a schematic representation showing the vector 232_pBSK_5'mU6_siRNA referenced in Example 1.
[0031] Figure 6 is a schematic representation showing the vector 232_pLOz_5'RNAi referenced in Example 1.
[0032] Figure 7 is a schematic representation showing the vector 232_pLOzIII_5'RNAi referenced in Example 1.
[0033] Figure 8 is a schematic representation showing the vector 232_pBSK_3'mU6 referenced in Example 1.
[0034] Figure 9 is a schematic representation showing the vector 232_pBSK_3'RNAi referenced in Example 1.
[0035] Figure 10 is a schematic representation showing the vector 232_pLOz_RNAi referenced in Example 1.
[0036] Figure 11 is a schematic representation showing the vector 232_pLOzIII_RNAi referenced in Example 1.
[0037] Figure 12 is a schematic representation showing the vector 232_pPCR_Amp_hom referenced in Example 1.
[0038] Figure 13 is a schematic representation showing the vector 232_pLiN referenced in Example 1.
[0039] Figure 14 is a schematic representation showing the vector 232_pFLiN referenced in Example 1.
[0040] Figure 15 is a schematic representation showing the vector 232_pBSK_5'RNAi#2 referenced in Example 2.
[0041] Figure 16 is a schematic representation showing the vector 232_pLiN_II referenced in Example 2.
[0042] Figure 17 is a schematic representation showing the vector 232_pBSK_3'RNAi#2 referenced in Example 2.
[0043] Figure 18 is a schematic representation showing the vector 232_pLiN_III referenced in Example 2.
[0044] Figure 19 is a schematic representation showing the vector 232_pBSK_5'RNAi#3 referenced in Example 2.
[0045] Figure 20 is a schematic representation showing the vector 232_pLOz_5'RNAi#3 referenced in Example 2.
[0046] Figure 21 is a schematic representation showing the vector 232_pBSK_3'RNAi#3 referenced in Example 2.
[0047] Figure 22 is a schematic representation showing the vector 232_pLiN_IV referenced in Example 2.
[0048] Figure 23 is a schematic showing how the inclusion of the mU6-siRNA cassettes in the construct prepared according to Examples 1 and 2 aids in enriching for correctly targeted clones.
[0049] Figure 24 is a schematic representation showing the vector pABU referenced in Example 3.
[0050] Figure 25 is a schematic showing how the inclusion of the SV40-Cre-ERT cassette in a targeting vector of the invention aids in enriching for correctly targeted clones.
[0051] Figure 26 is a schematic representation showing one embodiment of a switch expressible negative selectable marker targeting construct.
[0052] Figure 27 is a schematic representation showing another embodiment of a switch expressible negative selectable marker targeting construct.
[0053] Figure 28 is a schematic representation showing one embodiment of a trans-activator-controlled thymidine kinase gene targeting construct.
[0054] Figure 29 is a schematic representation showing one embodiment of a Cre-mediated neo excisable targeting construct.
DETAILED DESCRIPTION OF THE INVENTION
1. Definitions
[0055] Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by those of ordinary skill in the art to which the invention belongs. Although any methods and materials similar or equivalent to those described herein can be used in the practice or testing of the present invention, preferred methods and materials are described. For the purposes of the present invention, the following terms are defined below.
[0056] The articles "a" and "an" are used herein to refer to one or to more than one (i.e. to at least one) of the grammatical object of the article. By way of example, "an element" means one element or more than one element.
[0057] "Amplification product" refers to a nucleic acid product generated by nucleic acid amplification techniques.
[0058] "Antigen-binding molecule" means a molecule that has binding affinity for a target antigen. It will be understood that this term extends to immunoglobulins, immunoglobulin fragments and non-immunoglobulin derived protein frameworks that exhibit antigen-binding activity.
[0059] The term "biologically active fragment", as applied to fragments of a reference or full-length polynucleotide or polypeptide sequence, refers to a fragment that has at least about 0.1, 0.5, 1, 2, 5, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30, 35, 40, 45, 50, 55, 60, 65, 70, 75, 80, 85, 90, 95, 96, 97, 98, 99% of the activity of a reference sequence. Included within the scope of the present invention are biologically active fragments of at least about 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 40, 50, 60, 70, 80, 90, 100, 120, 140, 160, 180, 200, 250, 300, 400, 500, 600, 700, 800, 900, 1000, 1500, 2000 nucleotides or residues in length, which comprise an activity of a reference polynucleotide or polypeptide.
[0060] "Cells," "host cells," "transformed host cells," "regenerable host cells" and the like are terms that refer not only to the particular subject cell but also to the progeny or potential progeny of such a cell. Because certain modifications may occur in succeeding generations due to either mutation or environmental influences, such progeny may not, in fact, be identical to the parent cell, but are still included within the scope of the term as used herein.
[0061] The terms "chimeric construct," "chimeric gene," "chimeric nucleic acid" and the like are used herein to refer to a gene or nucleic acid sequence or segment comprising at least two nucleic acid sequences or segments from species which do not combine those sequences or segments under natural conditions, or which sequences or segments are positioned or linked in a manner which does not normally occur in the native genome or nucleome of the untransformed host. Thus, a "chimeric gene" refers to any gene that is not a native gene, comprising regulatory and coding or non-coding sequences that are not found together in nature. In this light, a chimeric gene may comprise regulatory sequences and coding or non-coding sequences that are derived from different sources, or regulatory sequences and coding or non-coding sequences derived from the same source, but arranged in a manner different than that found in nature.
[0062] By "coding sequence" is meant any nucleic acid sequence that contributes to the code for the polypeptide product of a gene.
[0063] Throughout this specification, unless the context requires otherwise, the words "comprise," "comprises" and "comprising" will be understood to imply the inclusion of a stated step or element or group of steps or elements but not the exclusion of any other step or element or group of steps or elements.
[0064] "Constitutive promoter" refers to a promoter that directs expression of an operably linked transcribable sequence in many or all tissues of an organism.
[0065] The terms "complementary" and "complementarity" refer to polynucleotides (i.e., a sequence of nucleotides) related by the base-pairing rules. For example, the sequence "A-G-T" is complementary to the sequence "T-C-A." Complementarity may be "partial," in which only some of the nucleic acids' bases are matched according to the base pairing rules. Or, there may be "complete" or "total" complementarity between the nucleic acids. The degree of complementarity between nucleic acid strands has significant effects on the efficiency and strength of hybridisation between nucleic acid strands.
[0066] By "corresponds to" or "corresponding to" is meant a polynucleotide (a) having a nucleotide sequence that is substantially identical or complementary to all or a portion of a reference polynucleotide sequence or (b) encoding an amino acid sequence identical to an amino acid sequence in a peptide or protein. This phrase also includes within its scope a peptide or polypeptide having an amino acid sequence that is substantially identical to a sequence of amino acids in a reference peptide or protein.
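The distinction between "partial" and "complete" complementarity in the definition of "complementary" given above can be illustrated with a minimal sketch. It compares two DNA sequences of equal length position by position, in the same manner as the "A-G-T" and "T-C-A" example. The function name and the use of a simple fraction as a measure of the degree of complementarity are assumptions made for illustration.

```python
# Base-pairing rules for DNA.
PAIRS = {"A": "T", "T": "A", "G": "C", "C": "G"}

def complementarity(seq_a, seq_b):
    """Fraction of aligned positions at which the two sequences base-pair."""
    if not seq_a or len(seq_a) != len(seq_b):
        raise ValueError("sequences must be non-empty and of equal length")
    matched = sum(PAIRS.get(a) == b for a, b in zip(seq_a.upper(), seq_b.upper()))
    return matched / len(seq_a)

print(complementarity("AGT", "TCA"))  # every position pairs: complete complementarity
print(complementarity("AGT", "TCG"))  # one mismatch: partial complementarity
```

A value of 1.0 corresponds to "complete" or "total" complementarity, while any lower value above zero corresponds to "partial" complementarity in the sense used herein.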
[0067] The term "endogenous" refers to a gene or nucleic acid sequence or segment that is normally found in a host organism.
[0068] The term "endogenous genomic nucleic acid sequence" is defined herein as a nucleotide sequence that is normally present within the genome of a cell. As disclosed herein, endogenous genomic nucleic acid sequences are capable of undergoing site-specific homologous recombination with sequences of a targeting construct of the invention and, therefore, can be utilised as a target for modification by the disclosed targeting constructs. Sequences included within this definition can represent any coding or non-coding regions of specific genes present within the cellular genome. Such genes include transcribable nucleic acid sequences as defined herein. Endogenous genomic nucleic acid sequences can also represent regulatory elements such as promoters, enhancers or repressor elements. The organisation of the endogenous genomic target nucleic acid sequence is generally similar to specific sequences present within the targeting construct. That is, it contains sequences which are substantially homologous to sequences present within the targeting construct that allow for site-specific homologous recombination to occur.
[0069] The term "expression" with respect to a gene sequence refers to transcription of the gene and, as appropriate, translation of the resulting mRNA transcript to a protein. Thus, as will be clear from the context, expression of a coding sequence results from transcription and translation of the coding sequence. Conversely, expression of a non-coding sequence results from the transcription of the non-coding sequence.
[0070] The terms "flanked by," "flanking" and the like as they apply to relationships between two or more nucleotide sequences in the targeting constructs of the invention do not require one of these nucleotide sequences to be located directly adjacent to another nucleotide sequence. For example, three reference nucleotide sequences (A, B and C) may be flanked by recombination target site sequences, or recombination target sites sequences may be flanking those reference sequences, even though reference sequence B is not directly adjacent to these sites. Accordingly, the term "flanked by" is equivalent to being "in between" the recombination sites and the term "flanking" is equivalent to the recombination sites being upstream or downstream of a reference sequence.
[0071] The term "foreign polynucleotide" or "exogenous polynucleotide" or "heterologous polynucleotide" refers to any nucleic acid (e.g. a gene sequence or regulatory sequence) which is introduced into the genome of an organism by experimental manipulations and may include gene sequences found in that organism so long as the introduced gene contains some modification (e.g. a point mutation, the presence of a selectable marker gene, the presence of a loxP site, etc.) relative to the naturally-occurring gene.
[0072] As used herein, the terms "function" or "functional activity" refer to a biological, enzymatic, or therapeutic function.
[0073] The term "gene" as used herein refers to any and all discrete coding regions of a host genome, or regions that code for a functional RNA only (e.g., tRNA, rRNA, regulatory RNAs such as ribozymes, PTGS-associated RNAs etc) as well as associated non-coding regions and optionally regulatory regions. In certain embodiments, the term "gene" includes within its scope the open reading frame encoding specific polypeptides, introns, and adjacent 5' and 3' non-coding nucleotide sequences involved in the regulation of expression. In this regard, the gene may further comprise control signals such as promoters, enhancers, termination and/or polyadenylation signals that are naturally associated with a given gene, or heterologous control signals. The gene sequences may be cDNA or genomic DNA or a fragment thereof. The gene may be introduced into an appropriate vector for extrachromosomal maintenance or for integration into the host.
[0074] The terms "growing" or "regeneration" as used herein mean growing a whole, differentiated plant from a plant cell, a group of plant cells, a plant part (including seeds), or a plant piece (e.g., from a protoplast, callus, or tissue part).
[0075] The term "host" refers to any organism, or cell thereof, whether eukaryotic or prokaryotic into which a recombinant construct can be stably or transiently introduced in order to reduce gene expression.
[0076] "Hybridisation" is used herein to denote the pairing of complementary nucleotide sequences to produce a DNA-DNA hybrid or a DNA-RNA hybrid. Complementary base sequences are those sequences that are related by the base-pairing rules. In DNA, A pairs with T and C pairs with G. In RNA, U pairs with A and C pairs with G. In this regard, the terms "match" and "mismatch" as used herein refer to the hybridisation potential of paired nucleotides in complementary nucleic acid strands. Matched nucleotides hybridise efficiently, such as the classical A-T and G-C base pairs mentioned above. Mismatches are other combinations of nucleotides that do not hybridise efficiently.
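By way of illustration only, the base-pairing rules and the notions of "match", "mismatch" and partial versus complete complementarity can be sketched as follows. The function names are hypothetical and do not form part of the specification. The sketch assumes that the two strands are of equal length and that the second strand is already written in the orientation in which it pairs with the first, as in the "A-G-T" / "T-C-A" example.

```python
# Base-pairing rules as stated in the text: A-T and G-C in DNA, U-A and G-C in RNA.
PAIRS = {("A", "T"), ("T", "A"), ("G", "C"), ("C", "G"), ("A", "U"), ("U", "A")}


def count_matches(strand_a: str, strand_b: str) -> tuple:
    """Return (matches, mismatches) for two strands compared position by position.

    Assumption: strand_b is written in the orientation in which it pairs
    with strand_a, so position i of one strand faces position i of the other.
    """
    if not strand_a or len(strand_a) != len(strand_b):
        raise ValueError("strands must be non-empty and of equal length")
    matches = sum((a, b) in PAIRS for a, b in zip(strand_a.upper(), strand_b.upper()))
    return matches, len(strand_a) - matches


def complementarity(strand_a: str, strand_b: str) -> str:
    """Classify complementarity as 'complete', 'partial' or 'none'."""
    matches, mismatches = count_matches(strand_a, strand_b)
    if mismatches == 0:
        return "complete"
    return "partial" if matches > 0 else "none"


if __name__ == "__main__":
    print(count_matches("AGT", "TCA"))      # all three positions pair
    print(complementarity("AGT", "TCG"))    # last position (T facing G) is a mismatch
```

The sketch treats every position as either matched or mismatched; it does not model the effect of mismatches on hybridisation efficiency.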
[0077] As used herein, the terms "interaction", "interactive" and the like shall be taken to refer to a physical association between two or more molecules, wherein the association may involve the formation of an induced magnetic field or paramagnetic field, covalent bond formation such as a disulfide bridge formation between polypeptide molecules, an ionic interaction such as occurs in an ionic lattice, a hydrogen bond or alternatively, a van der Waals interaction such as a dipole-dipole interaction, dipole-induced-dipole interaction, induced-dipole-induced-dipole interaction or a repulsive interaction or any combination of the above forces of attraction.
[0078] By "isolated" is meant material that is substantially or essentially free from components that normally accompany it in its native state. For example, an "isolated polynucleotide", as used herein, refers to a polynucleotide, which has been purified from the sequences which flank it in a naturally occurring state, e.g., a DNA fragment which has been removed from the sequences which are normally adjacent to the fragment.
[0079] The term "knock-in" generally refers to a heterologous or foreign gene or part thereof that has been inserted into a genome through homologous recombination. The knock-in gene or gene part may be a mutant form of a gene or gene part that replaces the endogenous, wild-type gene or gene part. Such mutations include insertions of heterologous sequences, deletions, point mutations, frameshift mutations and any other mutations that may prevent, disrupt or alter normal gene expression. Thus, a "knock-in" animal, as used herein, refers to a genetically modified animal in which a specific gene or part thereof is replaced by a foreign gene or DNA sequence. A "conditional knock-in" refers to a heterologous or foreign gene or part thereof that has been inserted into a genome through homologous recombination and that is expressed at a designated developmental stage or under particular environmental conditions. A "conditional knock-in vector" is a vector including a heterologous or foreign gene or part thereof that can be inserted into a genome through homologous recombination and that can be expressed at a designated developmental stage or under particular environmental conditions.
[0080] By "knock-out" is meant the inactivation or loss-of-function of a gene, which decreases, abrogates or otherwise inhibits the level or functional activity of an expression product of that gene. A "knock-out" animal refers to a genetically modified animal in which a gene is inactivated or loses function. A "conditional knock-out" refers to a gene that is inactivated or loses function under specific conditions, such as a gene that is inactivated or loses function in a tissue-specific or a temporal-specific pattern. A "conditional knock-out vector" is a vector including a gene that can be inactivated or whose function can be lost under specific conditions.
[0081] The term "loss-of-function" is art-recognised and, with respect to a gene or gene product, refers to mutations in a gene which ultimately decrease or otherwise inhibit the level or functional activity of an expression product of that gene. For example, a loss-of-function mutation to a gene of interest may be a point mutation, deletion or insertion of sequences in the coding sequence, intron sequence or 5' or 3' flanking sequences of the gene so as to, for example, (i) alter (e.g., decrease) the level of gene expression, (ii) alter exon-splicing patterns, (iii) alter the activity of the encoded protein, or (iv) alter (decrease) the stability of the encoded protein. For example, the term "loss-of-function," as it relates to a marker gene inhibited by an expression product of a modulator gene of the invention, refers to a diminishment or abrogation in the functional activity of the marker gene expression product when compared to its functional activity in the absence of the modulator gene expression product.
[0082] The term "mammal" is used herein in its broadest sense and includes rodents, primates, ovines, bovines, ruminants, lagomorphs, porcines, caprines, equines, canines, and felines. Preferred non-human mammals are selected from the order Rodentia that includes murines (e.g. rats and mice), most preferably mice.
[0083] By "marker gene" is meant a gene that imparts a distinct phenotype to cells expressing the marker gene and thus allows such transformed cells to be distinguished from cells that do not have the marker. A selectable marker gene confers a trait for which one can 'select' based on resistance to a selective agent (e.g., an herbicide, antibiotic, radiation, heat, or other treatment damaging to untransformed cells). A screenable marker gene (or reporter gene) confers a trait that one can identify through observation or testing, i.e., by 'screening' (e.g. β-glucuronidase, luciferase, green fluorescent protein or other activity not present in untransformed cells).
[0084] The term "negative selection" refers to the act of selecting against cells through the implementation of methodologies which allow for the killing of those cells.
[0085] The term "non-coding sequence" refers to any nucleic acid sequence that does not contribute to the code for the polypeptide product of a gene.
[0086] The term "5' non-coding region" is used herein in its broadest context to include all nucleotide sequences which are derived from the upstream region of an expressible gene, other than those sequences which encode amino acid residues which comprise the polypeptide product of the gene, wherein 5' non-coding region confers or activates or otherwise facilitates, at least in part, expression of the gene.
[0087] By "nucleome" is meant the total nucleic acid complement and includes the genome, extrachromosomal nucleic acid molecules and all RNA molecules such as mRNA, heterogenous nuclear RNA (hnRNA), small nuclear RNA (snRNA), small nucleolar RNA (snoRNA), small cytoplasmic RNA (scRNA), ribosomal RNA (rRNA), translational control RNA (tcRNA), transfer RNA (tRNA), eRNA, messenger-RNA-interfering complementary RNA (micRNA) or interference RNA (iRNA), chloroplast or plastid RNA (cpRNA) and mitochondrial RNA (mtRNA).
[0088] By "obtained from" is meant that a sample such as, for example, a nucleic acid extract is isolated from, or derived from, a particular source of the host. For example, the nucleic acid extract may be obtained from tissue isolated directly from the host.
[0089] The term "oligonucleotide" as used herein refers to a polymer composed of a multiplicity of nucleotide units (deoxyribonucleotides or ribonucleotides, or related structural variants or synthetic analogues thereof) linked via phosphodiester bonds (or related structural variants or synthetic analogues thereof). Thus, while the term "oligonucleotide" typically refers to a nucleotide polymer in which the nucleotides and linkages between them are naturally occurring, it will be understood that the term also includes within its scope various analogues including, but not restricted to, peptide nucleic acids (PNAs), phosphoramidates, phosphorothioates, methyl phosphonates, 2-O-methyl ribonucleic acids, and the like. The exact size of the molecule may vary depending on the particular application. An oligonucleotide is typically rather short in length, generally from about 10 to 30 nucleotides, but the term can refer to molecules of any length, although the term "polynucleotide" or "nucleic acid" is typically used for large oligonucleotides.
[0090] The terms "open reading frame" and "ORF" refer to the amino acid sequence encoded between translation initiation and termination codons of a coding sequence. The terms "initiation codon" and "termination codon" refer to a unit of three adjacent nucleotides ('codon') in a coding sequence that specifies initiation and chain termination, respectively, of protein synthesis (mRNA translation).
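As a non-limiting sketch, the relationship between an initiation codon, a termination codon and the open reading frame they delimit can be illustrated in code. The function name is hypothetical. The sketch assumes a DNA coding strand, ATG as the initiation codon and TAA, TAG and TGA as termination codons (the standard genetic code), none of which is mandated by the definition above. It returns the nucleotide span from initiation codon to termination codon inclusive; the ORF as defined above is the amino acid sequence encoded by that span.

```python
START_CODON = "ATG"                    # assumed initiation codon
STOP_CODONS = {"TAA", "TAG", "TGA"}    # assumed termination codons (standard code)


def find_first_orf(seq: str):
    """Return the first nucleotide span running from an initiation codon
    to an in-frame termination codon (inclusive), or None if there is none."""
    seq = seq.upper()
    start = seq.find(START_CODON)
    while start != -1:
        # Walk codon by codon in the reading frame set by this initiation codon.
        for i in range(start, len(seq) - 2, 3):
            if seq[i:i + 3] in STOP_CODONS:
                return seq[start:i + 3]
        # No in-frame termination codon: try the next candidate initiation codon.
        start = seq.find(START_CODON, start + 1)
    return None


if __name__ == "__main__":
    print(find_first_orf("CCATGAAATTTTAGGG"))
```

Only the given strand is scanned; a fuller search would also examine the reverse complement.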
[0091] The terms "operably connected," "operably linked," "in operable linkage," "in operable connection" and the like are used herein to refer to the placement of a transcribable sequence under the regulatory control of a promoter, which controls the transcription and optionally translation of the sequence. In the construction of heterologous promoter/transcribable sequence combinations, it is generally desirable to position the genetic sequence or promoter at a distance from the gene transcription start site that is approximately the same as the distance between that genetic sequence or promoter and the gene it controls in its natural setting; i.e. the gene from which the genetic sequence or promoter is derived. As is known in the art, some variation in this distance can be accommodated without loss of function. Similarly, the desirable positioning of a regulatory sequence element with respect to a heterologous gene to be placed under its control is defined by the positioning of the element in its natural setting; i.e. the genes from which it is derived.
[0092] As used herein, "plant" and "differentiated plant" refer to a whole plant or plant part containing differentiated plant cell types, tissues and/or organ systems. Plantlets and seeds are also included within the meaning of the foregoing terms. Plants included in the invention are any plants amenable to transformation techniques, including angiosperms, gymnosperms, monocotyledons and dicotyledons.
[0093] The term "plant cell" as used herein refers to any plant cell or cell line including protoplasts, gamete-producing cells, and cells which regenerate into whole plants. Plant cells also include cells in plants as well as protoplasts in culture.
[0094] By "plant tissue" is meant differentiated and undifferentiated tissue derived from roots, shoots, pollen, seeds, tumour tissue, such as crown galls, and various forms of aggregations of plant cells in culture, such as embryos and calluses.
[0095] The term "polynucleotide" or "nucleic acid" as used herein designates mRNA, RNA, cRNA, cDNA or DNA. The term typically refers to oligonucleotides greater than 30 nucleotides in length.
[0096] The terms "polynucleotide variant" and "variant" and the like refer to polynucleotides displaying substantial sequence identity with a reference polynucleotide sequence or polynucleotides that hybridise with a reference sequence under stringent conditions that are defined hereinafter. These terms also encompass polynucleotides that are distinguished from a reference polynucleotide by the addition, deletion or substitution of at least one nucleotide. Accordingly, the terms "polynucleotide variant" and "variant" include polynucleotides in which one or more nucleotides have been added or deleted, or replaced with different nucleotides. In this regard, it is well understood in the art that certain alterations inclusive of mutations, additions, deletions and substitutions can be made to a reference polynucleotide whereby the altered polynucleotide retains the biological function or activity of the reference polynucleotide. The terms "polynucleotide variant" and "variant" also include naturally occurring allelic variants.
[0097] "Polypeptide," "peptide" and "protein" are used interchangeably herein to refer to a polymer of amino acid residues and to variants and synthetic analogues of the same. Thus, these terms apply to amino acid polymers in which one or more amino acid residues is a synthetic non-naturally occurring amino acid, such as a chemical analogue of a corresponding naturally occurring amino acid, as well as to naturally-occurring amino acid polymers.
[0098] The term "polypeptide variant" refers to polypeptides that are distinguished from a reference polypeptide by the addition, deletion or substitution of at least one amino acid. In certain embodiments, a polypeptide variant is distinguished from a reference polypeptide by one or more substitutions, which may be conservative or non-conservative. In certain embodiments, the polypeptide variant comprises conservative substitutions and, in this regard, it is well understood in the art that some amino acids may be changed to others with broadly similar properties without changing the nature of the activity of the polypeptide. Polypeptide variants also encompass polypeptides in which one or more amino acids have been added or deleted, or replaced with different amino acid residues.
[0099] By "primer" is meant an oligonucleotide which, when paired with a strand of DNA, is capable of initiating the synthesis of a primer extension product in the presence of a suitable polymerising agent. The primer is typically single-stranded for maximum efficiency in amplification but may alternatively be double-stranded. A primer must be sufficiently long to prime the synthesis of extension products in the presence of the polymerisation agent. The length of the primer depends on many factors, including application, temperature to be employed, template reaction conditions, other reagents, and source of primers. For example, depending on the complexity of the target sequence, the oligonucleotide primer typically contains 15 to 35 or more nucleotides, although it may contain fewer nucleotides. Primers can be large polynucleotides, such as from about 200 nucleotides to several kilobases or more. Primers may be selected to be "substantially complementary" to the sequence on the template to which it is designed to hybridise and serve as a site for the initiation of synthesis. By "substantially complementary", it is meant that the primer is sufficiently complementary to hybridise with a target nucleotide sequence. Suitably, the primer contains no mismatches with the template to which it is designed to hybridise but this is not essential. For example, non-complementary nucleotides may be attached to the 5' end of the primer, with the remainder of the primer sequence being complementary to the template. Alternatively, non-complementary nucleotides or a stretch of non-complementary nucleotides can be interspersed into a primer, provided that the primer sequence has sufficient complementarity with the sequence of the template to hybridise therewith and thereby form a template for synthesis of the extension product of the primer.
[0100] By "promoter" is meant a region of DNA, which controls at least in part the initiation and level of transcription. Reference herein to a "promoter" is to be taken in its broadest context and includes the transcriptional regulatory sequences of a classical genomic gene, including a TATA box and CCAAT box sequences, as well as additional regulatory elements (i.e., activating sequences, enhancers and silencers) that alter gene expression in response to developmental and/or environmental stimuli, or in a tissue-specific or cell-type-specific manner. A promoter is usually, but not necessarily, positioned upstream or 5', of a transcribable sequence (e.g., a coding sequence or a sequence encoding a functional RNA), the expression of which it regulates. Furthermore, the regulatory elements comprising a promoter are usually positioned within 2 kb of the start site of transcription of the gene. Promoters according to the invention may contain additional specific regulatory elements, located more distal to the start site to further enhance expression in a cell, and/or to alter the timing or inducibility of expression of a structural gene to which it is operably connected. The term "promoter" also includes within its scope inducible, repressible and constitutive promoters as well as minimal promoters. Minimal promoters typically refer to minimal expression control elements that are capable of initiating transcription of a selected DNA sequence to which they are operably linked. In some examples, a minimal promoter is not capable of initiating transcription in the absence of additional regulatory elements (e.g., enhancers or other cis-acting regulatory elements) above basal levels. A minimal promoter frequently consists of a TATA box or TATA-like box. Numerous minimal promoter sequences are known in the literature. For example, minimal promoters may be selected from a wide variety of known sequences, including promoter regions from fos, CMV, SV40 and IL-2, among many others. Illustrative examples are provided which use a minimal CMV promoter or a minimal IL2 gene promoter (-72 to +45 with respect to the start site; Siebenlist, 1986).
[0101] By "recombinase target site" (RTS) is meant a nucleic acid sequence which is recognised by a recombinase for the excision of the intervening sequence. It is to be understood that two RTSs are required for excision. Thus, when a Cre recombinase is used, each RTS comprises a loxP site; when loxP sites are used, the corresponding recombinase is the Cre recombinase. That is, the recombinase must correspond to or recognise the RTSs. When the FLP recombinase is used, each RTS comprises a FLP recombination target site (FRT); when FRT sites are used, the corresponding recombinase is the FLP recombinase.
[0102] The term "recombinant polynucleotide" as used herein refers to a polynucleotide formed in vitro by the manipulation of nucleic acid into a form not normally found in nature. For example, the recombinant polynucleotide may be in the form of an expression vector. Generally, such expression vectors include transcriptional and translational regulatory nucleic acid operably linked to the nucleotide sequence.
[0103] By "recombinant polypeptide" is meant a polypeptide made using recombinant techniques, i.e., through the expression of a recombinant polynucleotide.
[0104] The term "regulatable promoter" refers to promoters that direct gene expression not constitutively, but in a temporally- and/or spatially-regulated manner, and include both tissue-specific and inducible promoters. It includes natural and synthetic sequences as well as sequences which may be a combination of synthetic and natural sequences. Different promoters may direct the expression of a gene in different tissues or cell types, or at different stages of development, or in response to different environmental conditions. New promoters of various types useful in host cells are constantly being discovered. Since in most cases the exact boundaries of regulatory sequences have not been completely defined, nucleic acid fragments of different lengths may have identical promoter activity.
[0105] "Regulatory sequences" or "regulatory elements" refer to nucleotide sequences located upstream (5' non-coding sequences), within, or downstream (3' non-coding sequences) of a coding sequence, and which influence the transcription, RNA processing or stability, or translation of the associated coding sequence. Regulatory sequences include enhancers, promoters, translation leader sequences, introns, and polyadenylation signal sequences. They include natural and synthetic sequences as well as sequences which may be a combination of synthetic and natural sequences.
[0106] Terms used to describe sequence relationships between two or more polynucleotides or polypeptides include "reference sequence," "comparison window," "sequence identity," "percentage of sequence identity" and "substantial identity". A "reference sequence" is at least 12 but frequently 15 to 18 and often at least 25 monomer units, inclusive of nucleotides and amino acid residues, in length. Because two polynucleotides may each comprise (1) a sequence (i.e., only a portion of the complete polynucleotide sequence) that is similar between the two polynucleotides, and (2) a sequence that is divergent between the two polynucleotides, sequence comparisons between two (or more) polynucleotides are typically performed by comparing sequences of the two polynucleotides over a "comparison window" to identify and compare local regions of sequence similarity. A "comparison window" refers to a conceptual segment of at least 50 contiguous positions, usually about 50 to about 100, more usually about 100 to about 150 in which a sequence is compared to a reference sequence of the same number of contiguous positions after the two sequences are optimally aligned. The comparison window may comprise additions or deletions (i.e., gaps) of about 20% or less as compared to the reference sequence (which does not comprise additions or deletions) for optimal alignment of the two sequences. Optimal alignment of sequences for aligning a comparison window may be conducted by computerised implementations of algorithms (GAP, BESTFIT, FASTA, and TFASTA in the Wisconsin Genetics Software Package Release 7.0, Genetics Computer Group, 575 Science Drive Madison, WI, USA) or by inspection and the best alignment (i.e., resulting in the highest percentage homology over the comparison window) generated by any of the various methods selected. Reference also may be made to the BLAST family of programs as for example disclosed by Altschul et al., 1997, Nucl. Acids Res. 25:3389. A detailed discussion of sequence analysis can be found in Unit 19.3 of Ausubel et al., "Current Protocols in Molecular Biology", John Wiley & Sons Inc, 1994-1998, Chapter 15.
[0107] The terms "sequence identity" and "identity" are used interchangeably herein to refer to the extent that sequences are identical on a nucleotide-by-nucleotide basis or an amino acid-by-amino acid basis over a window of comparison. Thus, a "percentage of sequence identity" is calculated by
comparing two optimally aligned sequences over the window of comparison, determining the number of positions at which the identical nucleic acid base (e.g., A, T, C, G, I) or the identical amino acid residue (e.g., Ala, Pro, Ser, Thr, Gly, Val, Leu, Ile, Phe, Tyr, Trp, Lys, Arg, His, Asp, Glu, Asn, Gln, Cys and Met) occurs in both sequences to yield the number of matched positions, dividing the number of matched positions by the total number of positions in the window of comparison (i.e., the window size), and multiplying the result by 100 to yield the percentage of sequence identity. For the purposes of the present invention, "sequence identity" will be understood to mean the "match percentage" calculated by the DNASIS computer program (Version 2.5 for Windows; available from Hitachi Software Engineering Co., Ltd., South San Francisco, California, USA) using standard defaults as used in the reference manual accompanying the software.
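The arithmetic of the "percentage of sequence identity" described above (matched positions divided by window size, multiplied by 100) may be sketched as follows. This is an illustration only and is not the DNASIS "match percentage" calculation. The function name is hypothetical, and the sketch assumes that the two sequences have already been optimally aligned to the same length, with "-" marking a gap position.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percentage of positions in the comparison window at which both
    aligned sequences carry the identical base or residue.

    Assumptions: the inputs are pre-aligned, of equal length, and use '-'
    for gaps. A gap never counts as a matched position, but it does count
    towards the window size.
    """
    if not aligned_a or len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must be non-empty and of equal length")
    matched = sum(1 for a, b in zip(aligned_a, aligned_b) if a == b and a != "-")
    window_size = len(aligned_a)
    return 100.0 * matched / window_size


if __name__ == "__main__":
    # 9 of 10 positions are identical
    print(percent_identity("ACGTACGTAC", "ACGTTCGTAC"))
    # 4 of 5 positions are identical; the gap position is not a match
    print(percent_identity("ACG-T", "ACGTT"))
```

Whether gap positions count towards the window size is a design choice made here for simplicity; alignment programs differ on this point.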
[0108] The term "site-specific homologous recombination" refers to strand exchange crossover events between nucleic acid sequences substantially similar in nucleotide composition. These crossover events can take place between sequences contained in the targeting construct of the invention and endogenous genomic nucleic acid sequences. In addition, it is possible that more than one site-specific homologous recombination event can occur, which would result in a replacement event in which nucleic acid sequences contained within the targeting construct have replaced specific sequences present within the endogenous genomic sequences.
[0109] "Stringency" as used herein, refers to the temperature and ionic strength conditions, and presence or absence of certain organic solvents, during hybridisation. The higher the stringency, the higher will be the degree of complementarity between immobilised nucleotide sequences and the labelled polynucleotide sequence.
[0110] "Stringent conditions" refers to temperature and ionic conditions under which only nucleotide sequences having a high frequency of complementary bases will hybridise. The stringency required is nucleotide sequence dependent and depends upon the various components present during hybridisation. Generally, stringent conditions are selected to be about 10 to 20 °C lower than the thermal melting point (Tm) for the specific sequence at a defined ionic strength and pH. The Tm is the temperature (under defined ionic strength and pH) at which 50% of a target sequence hybridises to a complementary probe.
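The selection of a hybridisation temperature about 10 to 20 °C below the Tm can be illustrated with a short sketch. The specification does not prescribe how the Tm is determined; the example below assumes, purely for illustration, the common rule of thumb for short oligonucleotides under which Tm is approximately 2 °C for each A or T plus 4 °C for each G or C. Both function names are hypothetical.

```python
def rule_of_thumb_tm(oligo: str) -> float:
    """Approximate Tm in degrees Celsius for a short DNA oligonucleotide.

    Assumption (not taken from the specification): Tm is estimated as
    2 * (A + T) + 4 * (G + C), a rule of thumb for short oligonucleotides.
    """
    s = oligo.upper()
    at = s.count("A") + s.count("T")
    gc = s.count("G") + s.count("C")
    return 2.0 * at + 4.0 * gc


def stringent_window(tm_celsius: float) -> tuple:
    """Return (lower, upper) temperatures lying 20 and 10 degrees Celsius below Tm."""
    return tm_celsius - 20.0, tm_celsius - 10.0


if __name__ == "__main__":
    tm = rule_of_thumb_tm("ATGCATGCATGCATGC")   # 8 A/T and 8 G/C bases
    print(tm, stringent_window(tm))
```

In practice the Tm also depends on ionic strength, pH and the presence of organic solvents, as noted above, so an estimate of this kind is a starting point at best.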
[0111] The term "substantially non-homologous" or "substantially not homologous" refers to segments of the targeting construct, which do not contain nucleotide sequences similar enough to target genomic sequences to allow for the process of site-specific homologous recombination to occur. Dissimilar sequences of this capacity fail to undergo site-specific homologous recombination with target genomic sequences due to the mismatch of base pair composition between the two sequences.
[0112] The term "transcribable nucleic acid sequence" or "transcribed nucleic acid sequence" excludes the non-transcribed regulatory sequence that drives transcription. Depending on the aspect of the invention, the transcribable sequence may be derived in whole or in part from any source known to the art, including a plant, a fungus, an animal, a bacterial genome or episome, eukaryotic, nuclear or plasmid DNA, cDNA, viral DNA or chemically synthesised DNA. A transcribable sequence may contain one or more modifications in either the coding or the untranslated regions, which could affect the biological activity or the chemical structure of the expression product, the rate of expression or the manner of expression control. Such modifications include, but are not limited to, insertions, deletions and substitutions of one or more nucleotides. The transcribable sequence may contain an uninterrupted coding sequence or it may include one or more introns, bound by the appropriate splice junctions. The transcribable sequence may also encode a fusion protein. In other embodiments, the transcribable sequence comprises non-coding regions only.
[0113] The term "transformation" means alteration of the genotype of a host by the introduction of an expression system according to the invention.
[0114] The term "transgene" is used herein to describe genetic material that has been or is about to be artificially introduced into the nucleome, especially the genome, of a host and that is transmitted to the progeny of the host. The transgene is used to transform a host cell, meaning that a permanent or transient genetic change, especially a permanent genetic change, is induced in a host cell following incorporation of one or more nucleic acid components of the expression system as defined herein.
[0115] As used herein, the term "transgenic" or "transformed" with respect to a host cell, host part, host tissue or host means a host cell, host part, host tissue or host which comprises a targeting cassette or derivative thereof but not the modulator gene of the invention, which has been introduced into the nucleome, especially the genome, of a host cell, host part, host tissue or host.
[0116] By "vector" is meant a nucleic acid molecule, suitably a DNA molecule derived, for example, from a plasmid, bacteriophage, or plant virus, into which a nucleic acid sequence may be inserted or cloned. A vector typically contains one or more unique restriction sites and may be capable of autonomous replication in a defined host cell including a target cell or tissue or a progenitor cell or tissue thereof, or be integrable with the genome of the defined host such that the cloned sequence is reproducible. Accordingly, the vector may be an autonomously replicating vector, i.e., a vector that exists as an extrachromosomal entity, the replication of which is independent of chromosomal replication, e.g., a linear or closed circular plasmid, an extrachromosomal element, a minichromosome, or an artificial chromosome. The vector may contain any means for assuring self-replication. Alternatively, the vector may be one which, when introduced into the host cell, is integrated into the genome and replicated together with the chromosome(s) into which it has been integrated. A vector system may comprise a single vector or plasmid, two or more vectors or plasmids, which together contain the total DNA to be introduced into the genome of the host cell, or a transposon. The choice of the vector will typically depend on the compatibility of the vector with the host cell into which the vector is to be introduced. The vector may also include a selection marker such as an antibiotic resistance gene that can be used for selection of suitable transformants. Examples of such resistance genes are well known to those of skill in the art.
[0117] The terms "wild type," "native" or "non-transgenic" refer to an untransformed host, host cell, host part or host tissue, i.e., one where the nucleome, especially the genome, has not been altered by the presence of one or more nucleic acid components of an expression system as defined herein.
2. Targeting constructs
[0118] The present invention is based in part on the development of targeting constructs useful for introducing modifications into endogenous genomic target nucleic acid sequences via site-specific homologous recombination. Accordingly, targeting constructs including targeting vectors are provided, as are methods of using such constructs to genetically modify cells. Also provided are cells that are genetically modified using a targeting construct of the invention, and transgenic organisms generated from such cells, for example, embryonic stem (ES) cells or plant cells that are genetically modified in a site-specific manner in one or more endogenous genomic nucleic acid sequences using a targeting construct of the invention, and the progeny of such organisms.
[0119] The targeting constructs of the invention take advantage of a marker gene and a modulator gene to select or enrich for cells that have successfully undergone site-specific homologous recombination at a selected target site in the genome of a host cell. The marker gene is contained within a targeting cassette comprising two flanking portions that are sufficiently homologous with regions of the target site to permit homologous recombination between the targeting cassette and the target site. The marker gene produces a marker gene expression product in the absence of the modulator gene at an 'unmodulated' level or functional activity that confers an identifying characteristic (e.g., antibiotic resistance, antibiotic sensitivity, cell death, enzymatic activity and light emission or absorbance) on cells containing the marker gene. Suitably, the marker and/or modulator genes are generally non-homologous to cellular endogenous genomic sequences and are therefore incapable of undergoing site-specific homologous recombination in the host cell. The modulator gene is usually positioned externally or outside of the targeting cassette and modulates the level or functional activity of the marker gene expression product to suppress, attenuate or alter the identifying characteristic. Thus, random integration of the targeting construct into the cellular genome will typically result in insertion of the entire construct including the modulator gene, which thereby modulates the level or functional activity of the marker gene expression product to suppress, attenuate or alter the identifying characteristic. By contrast, site-specific homologous recombination will result in the insertion at the target site of the targeting cassette - typically through strand exchange between the flanking portions of the targeting cassette with endogenous target sequences at the target site - but not of the modulator gene, thereby permitting the production of the marker gene expression product at
the unmodulated level. This production confers the identifying characteristic on transformed cells that have successfully undergone site-specific homologous recombination.
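The selection logic described above can be sketched as a small truth table. This is only an illustrative model, not part of the constructs themselves: the class name `IntegrationEvent`, the function `marker_phenotype` and the three phenotype labels are hypothetical names chosen for the example, and the model assumes the simplest case in which random integration carries the whole construct and homologous recombination carries only the targeting cassette.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IntegrationEvent:
    """Which parts of the targeting construct ended up in the host genome."""
    has_marker: bool      # marker gene, located inside the targeting cassette
    has_modulator: bool   # modulator gene, located outside the targeting cassette


def marker_phenotype(event: IntegrationEvent) -> str:
    """Return the expected state of the marker gene expression product."""
    if not event.has_marker:
        return "no marker"
    if event.has_modulator:
        # The modulator suppresses, attenuates or alters the marker product.
        return "modulated"
    return "unmodulated"


# Assumed outcomes of the three cases discussed in the text.
HOMOLOGOUS = IntegrationEvent(has_marker=True, has_modulator=False)
RANDOM = IntegrationEvent(has_marker=True, has_modulator=True)
UNTRANSFORMED = IntegrationEvent(has_marker=False, has_modulator=False)

for name, event in [("homologous", HOMOLOGOUS),
                    ("random", RANDOM),
                    ("untransformed", UNTRANSFORMED)]:
    print(f"{name}: {marker_phenotype(event)}")
```

Only the homologous case yields the unmodulated marker level, which is the property the selection or screening step relies on.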
[0120] The flanking portions of the targeting cassette, which are substantially homologous to endogenous genomic target sequences of the host cell, are crucial parameters that must be correctly addressed for successful gene targeting. In general, one region of homology can be as small as 25 bp (Ayares et al. 1986, Genetics 83:5199), although it is recommended that significantly larger regions of homology be utilised. Up to a certain length, an increase in the amount of homology provided in the targeting cassette increases targeting efficiency (Zhang et al, 1994, Mol. Cell Biol. 14:2404). Generally, the flanking portions have between about 50 nts and 50,000 nts, usually between about 100 and 15,000 nts, more usually between about 1000 and 10,000 nts and still more usually between about 3,000 and 10,000 nts, which are homologous with the target site. The flanking portions may comprise any sequence that is homologous with the target site in the genome of the host cell. Furthermore, the flanking portions may comprise non-coding or coding nucleic acid sequences.
[0121] Desirably, the flanking portions display significantly high sequence identity or homology to cellular endogenous target genomic sequences. High homology allows for efficient base pairing during the crossover and strand exchange process of site-specific homologous recombination. Any mismatch base pairing between the flanking portions and cellular genomic sequences disfavours the recombination reaction. It is desirable, for example, that the flanking portions are 100% homologous (i.e., isogenic) to cellular endogenous genomic sequences, less desirable that they are 80% homologous and even less desirable that they are 50% homologous. When using non-isogenic flanking portions, these portions should be at least about 1,500 nts, 2,000 nts, 2,500 nts, 3,000 nts or more in length. The marker and modulator gene sequences are substantially non-homologous to cellular endogenous genomic sequences and therefore do not undergo site-specific recombination with these sequences.
[0122] The marker gene typically comprises a promoter that is operable in the host cell and that is in operable linkage with a nucleic acid sequence that encodes a marker (e.g., transcript or protein), whereby the marker confers a phenotype on a cell in which it is expressed to facilitate the identification and/or selection of cells which contain and express the marker gene. Illustrative markers of this type include signal-producing proteins, epitopes, fluorescent or enzymatic markers, or inhibitors of cellular function. For instance, selectable markers can be selected from marker enzymes such as β-galactosidase or β-lactamase, reporter or signal-producing proteins such as luciferase or GFP, ribozymes, RNA interference (RNAi) molecules, conditional transcriptional regulators such as a Tet repressor, or measurement proteins such as proteins that signal cell state, e.g., a protein that signals intracellular membrane voltage. In certain instances, the markers are "secretable markers" whose secretion can be detected as a means of identifying or selecting for transformed cells. Examples include secretable antigens that can be identified by antigen-binding molecules (e.g., antibodies), or secretable enzymes that can be detected by their catalytic activity. Secretable proteins include, but are not restricted to, proteins that are inserted or trapped in the cell wall (e.g., proteins that include a leader sequence such as that found in the expression unit of extensin or tobacco PR-S); small, diffusible proteins detectable, e.g. by ELISA; and small active enzymes detectable in extracellular solution (e.g., α-amylase, β-lactamase, phosphinothricin acetyltransferase).
2.1 Positive marker gene embodiments
[0123] In certain embodiments, the marker is a positive marker that permits detection and isolation of cells containing the positive marker gene via production of the positive marker, which allows for the differentiation of these cells from cells which contain no positive marker or which contain a modulator gene that inhibits or otherwise reduces the level or functional activity of the positive marker. In these embodiments, the level or functional activity of the positive marker is generally inhibited or otherwise reduced by the modulator gene.
[0124] In some embodiments, the positive marker is an antigen (e.g., protein-containing epitopes) which is generally selected from proteins and glycoproteins or portions thereof that are not normally detected in the host cell by immunohistological techniques. For example, the antigen can be CD4 (a protein normally expressed in the immune system) and be expressed and detected in non-immune cells (e.g., ES cells or plant cells).
[0125] In other embodiments, the positive marker is a positive selectable marker that confers resistance or tolerance to a selection agent. Illustrative examples of this type (and their selection agents) include, but are not restricted to, kanamycin kinase, neomycin phosphotransferase and aminoglycoside phosphotransferase (kanamycin, paromomycin, G418 and the like), puromycin N-acetyl transferase and puromycin resistance protein (puromycin), hygromycin phosphotransferase (hygromycin), bleomycin resistance protein (bleomycin), phleomycin binding protein (phleomycin), blasticidin deaminase (blasticidin), β-lactamase (ampicillin), tetracycline resistance protein (tetracycline), guanine phosphoribosyltransferase (xanthine), glutamine synthetase and the acetyl transferase gene from Streptomyces viridochromogenes described in EP-A 275 957 (phosphinothricin), hypoxanthine guanine phosphoribosyl transferase (hypoxanthine), chloramphenicol acetyltransferase (chloramphenicol), glutathione-S-transferase (glutathione), histidinol dehydrogenase (histidinol), 5-enolshikimate-3-phosphate synthase (EPSPS) (N-phosphonomethylglycine), barstar (bialaphos), a nitrilase such as Bxn from Klebsiella ozaenae (bromoxynil), dihydrofolate reductase (methotrexate), mutant acetolactate synthase (ALS) as described in EP-A-154 204 (imidazolinone, sulfonylurea or other ALS-inhibiting chemicals), mutated anthranilate synthase (5-methyl tryptophan), and dalapon dehalogenase gene (2,2-dichloropropionic acid) and their biologically active fragments, variants and derivatives.
[0126] In still other embodiments, the positive marker is a screenable marker. Desirable screenable markers include, but are not limited to, β-glucuronidase (GUS) enzyme for which various chromogenic substrates are known; horseradish peroxidase for which various chromogenic substrates are known; β-galactosidase for which chromogenic substrates are known; human placental alkaline phosphatase and alkaline phosphatase for which various chromogenic substrates are known; aequorin which may be employed in calcium-sensitive bioluminescence detection; β-lactamase which encodes an enzyme for which various chromogenic substrates are known (e.g., PADAC, a chromogenic cephalosporin); an R-locus gene product that regulates the production of anthocyanin pigments (red colour) in plant tissues (Dellaporta et al, 1988, in Chromosome Structure and Function, pp. 263-282); α-amylase (Ikuta et al, 1990, Biotech., 8:241); tyrosinase (Katz et al, 1983, J. Gen. Microbiol, 129:2703) which oxidises tyrosine to dopa and dopaquinone which in turn condenses to form the easily detectable compound melanin; or a xylose transporter (Zukowsky et al, 1983, Proc. Natl. Acad. Sci. USA 80:1101), which encodes a catechol dioxygenase that can convert chromogenic catechols. Alternatively, the screenable marker may be selected from fluorescent proteins such as green fluorescent protein (GFP), including particular mutant or engineered forms of GFP such as BFP, CFP and YFP (Aurora Biosciences) (see, e.g., Tsien et al, U.S. Pat. No. 6,124,128), enhanced GFP (EGFP) and DsRed (Clontech), blue, cyan, green, yellow or red fluorescent proteins (Clontech, Feng et al, 2000, Neuron, 28:41-51), rapidly degrading GFP-fusion proteins, (see, e.g., Li et al, U.S. Pat. No. 6,130,313), and fluorescent proteins homologous to GFP, some of which have spectral characteristics different from GFP and emit at yellow and red wavelengths (Matz et al, 1999, Nat. Biotechnol. 17(10): 969-973).
[0127] Accordingly, where the marker is a positive marker, the modulator gene generally inhibits the level or functional activity of the positive marker. In this configuration, the modulator gene is used to confer partial and desirably complete loss of function of the positive marker gene in the host cell. Such inhibition includes substantial down-regulation of expression of the positive marker gene to basal levels as well as partial down-regulation of expression of the positive marker gene to below "normal" levels. Suitably, the expression of the modulator gene reduces the level or functional activity of an expression product of the positive marker gene by at least about 50, 55, 60, 65, 70, 75, 80, 85, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99% or more when compared to the level or functional activity of the expression product in the absence of modulator gene expression. In these embodiments, the modulator gene typically comprises a promoter that is operable in the host cell and that is in operable linkage with a nucleic acid sequence encoding an expression product that is suitably selected from: a site-specific recombinase protein, a ribozyme or antisense molecule, an antigen-binding molecule and a nucleic acid molecule conferring post-transcriptional gene silencing (PTGS).
[0128] Thus, in some embodiments, the modulator gene comprises a nucleic acid sequence that encodes a site-specific recombinase protein. The recombinase typically recognises target sites located within or adjacent to the positive marker gene, whereby excision of a nucleotide sequence between the target sites by the recombinase results in a reduction in the level or functional activity of the positive marker gene expression product. Illustrative site-specific recombinases include, but are not limited to, Cre, FLP-wild type (wt), FLP-L or FLPe. Recombination may be effected by any art-known method, e.g., the method of Doetschman et al. (1987, Nature 330:576-578); the method of Thomas et al. (1986, Cell 44:419-428); the Cre-loxP recombination system (Sternberg and Hamilton, 1981, J. Mol. Biol. 150:467-486; Lakso et al, 1992, Proc. Natl. Acad. Sci. USA 89:6232-6236); the FLP recombinase system of Saccharomyces cerevisiae (O'Gorman et al, 1991, Science 251:1351-1355; Lyznik et al, 1996, Nucleic Acids Res. 24(19):3784-3789); the Cre-loxP-tetracycline control switch (Gossen and Bujard, 1992, Proc. Natl. Acad. Sci. USA 89:5547-51); and ligand-regulated recombinase system (Kellendonk et al, 1999, J. Mol. Biol. 285:175-82). Desirably, the recombinase is highly active, e.g., the Cre-loxP or the FLPe system, and has enhanced thermostability (Rodríguez et al, 2000, Nature Genetics 25:139-40). In specific embodiments, the modulator gene encodes a Cre recombinase or an FLP recombinase and at least a portion of the positive marker gene (including its regulatory sequences) is flanked by either loxP target sites, which are specifically recognised by the Cre recombinase, or FRT target sites, which are specifically recognised by the FLP recombinase. Illustrative examples of loxP target site sequences include
5'-ATAACTTCGTATAGCATACATTATACGAAGTTAT-3' [SEQ ID NO:1] and 5'-TAACTTCGTATA-3' [SEQ ID NO:2]. An illustrative example of an FRT target site sequence is 5'-GAAGTTCCTATAC-3' [SEQ ID NO:3].
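By way of illustration only, the following Python sketch inspects the loxP sequence of SEQ ID NO:1 and models recombinase-mediated excision as a plain string operation. The function names `reverse_complement` and `excise_between_sites`, and the placeholder sequences in the usage lines, are hypothetical. The model assumes two identical, directly repeated target sites and ignores site orientation, recombination efficiency and all other biology.

```python
def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    complement = {"A": "T", "T": "A", "G": "C", "C": "G"}
    return "".join(complement[base] for base in reversed(seq.upper()))


LOXP = "ATAACTTCGTATAGCATACATTATACGAAGTTAT"   # SEQ ID NO:1
LOXP_ARM = "TAACTTCGTATA"                      # SEQ ID NO:2


def excise_between_sites(seq: str, site: str = LOXP) -> str:
    """Remove the segment between the first two copies of `site`.

    One copy of the site is left behind. If fewer than two copies are
    present, the sequence is returned unchanged.
    """
    first = seq.find(site)
    if first == -1:
        return seq
    second = seq.find(site, first + len(site))
    if second == -1:
        return seq
    return seq[:first] + seq[second:]


# The two 13-nt ends of SEQ ID NO:1 are inverted repeats of each other,
# and SEQ ID NO:2 lies within the first of them.
print(reverse_complement(LOXP[:13]) == LOXP[-13:])
print(LOXP_ARM in LOXP[:13])

# Hypothetical marker segment flanked by two loxP sites.
floxed = "AAAA" + LOXP + "GGGGCCCC" + LOXP + "TTTT"
print(excise_between_sites(floxed) == "AAAA" + LOXP + "TTTT")
```

In the arrangement described above, the excised segment would correspond to at least a portion of the positive marker gene, so that excision reduces the level or functional activity of the marker.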
[0129] Several other recombination systems are also suitable for use in the present invention. These include, for example, the Gin recombinase of phage Mu (Crisona et al, 1994, J. Mol. Biol. 243(3):437-457), the Pin recombinase of E. coli (see, e.g., Kutsukake et al, 1985, Gene 34(2-3):343-350), the PinB, PinD and PinF from Shigella (Tominaga et al, 1991, J. Bacteriol. 173(13):4079-4087), the R/RS system of the pSR1 plasmid (Araki et al, 1992, J. Mol. Biol. 225(1):25-37) and the cin, hin and β-recombinases. Other recombination systems relevant to this invention described herein are those from Kluyveromyces species, phages, and integrating viruses (e.g., the SSV1-encoded integrase).
[0130] In certain embodiments, the recombinase system can be linked to a second inducible or repressible transcriptional regulation system. For example, a cell-specific Cre-loxP mediated recombination system (Gossen and Bujard, 1992, Proc. Natl. Acad. Sci. USA 89:5547-51) can be linked to a cell-specific tetracycline-dependent time switch (see, e.g., Ewald et al, 1996, Science 273:1384-1386; Furth et al., 1994, Proc. Natl. Acad. Sci. U.S.A. 91:9302-9306; St-Onge et al, 1996, Nucleic Acids Res. 24(19): 3875-3877). In an illustrative example, an altered cre gene with enhanced expression in mammalian cells is used (Gorski and Jones, 1999, Nucleic Acids Res. 27(9): 2059-2061).
[0131] In another illustrative example, the ligand-regulated recombinase system of Kellendonk et al. (1999, J. Mol. Biol. 285: 175-182) can be used. In this system, the ligand-binding domain (LBD) of a receptor is fused to the Cre recombinase to increase specificity of the recombinase. In this way, the activity of the recombinase is controlled by the presence in the cell of the ligand for the nuclear receptor. The LBD suitably comprises a derivative of part or all of a nuclear receptor, where the part includes the ligand-binding portion of a nuclear receptor. The nuclear receptor may be endogenous to the host or may be derived from another species. The nuclear receptor or derivative thereof may be selected from the groups comprising steroid-hormone dependent receptors, which include estrogen, androgen, adrenal glucocorticoid, aldosterone and progesterone receptors; nuclear hormone receptors, which include vitamin D, retinoid and thyroid hormone receptors; and orphan nuclear receptors, which include peroxisome proliferator activated receptors and lipid receptors such as, but not limited to, COUP-TFI/II and SF-1. Suitably, the ligand-binding portion of the nuclear receptor is a portion or derivative of a steroid-hormone dependent receptor and is desirably a derivative of the estrogen receptor LBD. Advantageously, the estrogen receptor LBD derivative exhibits reduced or absent affinity for endogenous estrogen and estrogen-related hormones, with reference to a normal, reference range of binding affinity. In certain embodiments of this type, the LBD of the estrogen receptor derivative exhibits affinity for non-endogenous estrogen hormone analogues such as tamoxifen and analogues thereof. The ligand-binding domain may be fused to the N- or C-terminus of the recombinase protein. In specific embodiments, the estrogen-receptor binding domain is fused to the C-terminus of the Cre recombinase protein.
[0132] In other embodiments, the modulator gene comprises a nucleic acid sequence encoding an antisense RNA molecule that directly blocks the translation of mRNA transcribed from the positive marker gene by binding to the mRNA and preventing protein translation. When employed, antisense RNAs should be at least about 10-20 nucleotides or greater in length, and be at least about 75% complementary to their target genes or gene transcripts such that expression of the targeted homologous sequence is precluded.
[0133] In still other embodiments, the modulator gene comprises a nucleic acid sequence encoding a ribozyme that functions to inhibit the translation of the mRNA of the positive marker gene. Ribozymes are enzymatic RNA molecules capable of catalysing the specific cleavage of RNA. The mechanism of ribozyme action involves sequence specific hybridisation of the ribozyme molecule to complementary target RNA, followed by an endonucleolytic cleavage. Within the scope of the invention are engineered hammerhead motif ribozyme molecules that specifically and efficiently catalyse endonucleolytic cleavage of target gene RNA sequences. Specific ribozyme cleavage sites within any potential RNA target are initially identified by scanning the target molecule for ribozyme cleavage sites which include the following sequences, GUA, GUU and GUC. Once identified, short RNA sequences of between 15 and 20 ribonucleotides corresponding to the region of the positive marker gene containing the cleavage site may be evaluated for predicted structural features such as secondary structure that may render the oligonucleotide sequence unsuitable. When employed, ribozymes may be selected from the group consisting of hammerhead ribozymes, axehead ribozymes, newt satellite ribozymes, Tetrahymena ribozymes and RNase P, and are designed according to methods known in the art based on the sequence of the target gene (for instance, see U.S. Pat. No. 5,741,679). The suitability of candidate targets may also be evaluated by testing their accessibility to hybridisation with complementary oligonucleotides, using ribonuclease protection assays.
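The initial scanning step described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the function names `to_rna`, `find_cleavage_sites` and `candidate_window` are hypothetical, the window is simply centred on the triplet as far as the sequence ends allow, and no secondary-structure prediction or accessibility testing is attempted.

```python
import re
from typing import List, Tuple


def to_rna(seq: str) -> str:
    """Upper-case a sequence and write it in the RNA alphabet (T becomes U)."""
    return seq.upper().replace("T", "U")


def find_cleavage_sites(target: str) -> List[Tuple[int, str]]:
    """Return (0-based position, triplet) for every GUA, GUU or GUC in the target."""
    rna = to_rna(target)
    return [(m.start(), m.group(1)) for m in re.finditer(r"(?=(GU[AUC]))", rna)]


def candidate_window(target: str, site_pos: int, length: int = 17) -> str:
    """Return a window of `length` nt (15 to 20) containing the triplet at `site_pos`."""
    if not 15 <= length <= 20:
        raise ValueError("window length must be between 15 and 20 nucleotides")
    rna = to_rna(target)
    half = (length - 3) // 2
    start = max(0, site_pos - half)
    end = min(len(rna), start + length)
    start = max(0, end - length)   # shift left if the window ran off the 3' end
    return rna[start:end]


# Hypothetical target sequence used only to exercise the functions.
example = "A" * 10 + "GUC" + "A" * 17
for pos, triplet in find_cleavage_sites(example):
    print(pos, triplet, candidate_window(example, pos))
```

Each returned window is a candidate region that would still need to be evaluated for structural features and accessibility as described in the paragraph above.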
[0134] In other embodiments, the modulator gene comprises a nucleic acid sequence encoding an antigen-binding molecule that is interactive with a protein product of the positive marker gene. For example, the antigen-binding molecules may comprise whole polyclonal antibodies. Such antibodies may be prepared by injecting a positive marker protein into a production species, which may include mice or rabbits, to obtain polyclonal antisera. Methods of producing polyclonal antibodies are well known to those skilled in the art. Exemplary protocols which may be used are described for example in Coligan et al, "Current Protocols In Immunology", (John Wiley & Sons, Inc, 1991), and Ausubel et al, (1994-1998, supra), in particular Section III of Chapter 11. In lieu of the polyclonal antisera obtained in the production species, monoclonal antibodies may be produced using the standard method as described, for example, by Köhler and Milstein (1975, Nature 256, 495-497), or by more recent modifications thereof as described, for example, in Coligan et al, (1991, supra) by immortalising spleen or other antibody-producing cells derived from a production species which has been inoculated with target molecule of the invention. The invention also contemplates as antigen-binding molecules Fv, Fab, Fab' and F(ab')2 immunoglobulin fragments. Alternatively, the antigen-binding molecule may be in the form of a synthetic stabilised Fv fragment, a single variable region domain (also known as a dAb), a "minibody" and the like as known in the art.
[0135] In still other embodiments, the modulator gene comprises a nucleic acid sequence encoding a RNA molecule that directly or indirectly attenuates or otherwise disrupts the expression of the positive marker gene by post-transcriptional gene silencing (PTGS). In these embodiments, the PTGS conferred by the RNA molecules is sometimes referred to as RNA interference (RNAi). RNAi refers to interference with or destruction of the product of a target gene by introducing a single stranded or double stranded RNA (dsRNA) that is homologous to a transcript of the positive marker gene. Absolute homology is not required for RNAi, with a lower threshold being described at about 85% homology for a dsRNA of about 200 base pairs (Plasterk and Ketting, 2000, Current Opinion in Genetics and Dev. 10: 562-67). Therefore, depending on the length of the dsRNA, the RNAi-encoding nucleic acids can vary in the level of homology they contain toward the positive marker gene transcript, e.g., with dsRNAs of 100 to 200 base pairs having at least about 85% homology with the positive marker gene, and longer dsRNAs, i.e., 300 to 100 base pairs, having at least about 75% homology to the positive marker gene. RNA-encoding constructs that express a single RNA transcript designed to anneal to a separately expressed RNA, or single constructs expressing separate transcripts from convergent promoters, are suitably at least about 100 nucleotides in length. RNA-encoding constructs that express a single RNA designed to form a dsRNA via internal folding are desirably at least about 200 nucleotides in length.
[0136] Thus, in the above embodiments, expression of the nucleic acid sequence in the host cell produces a RNA molecule that comprises a targeting region having sequence identity with a nucleotide sequence of the positive marker gene and that attenuates or otherwise disrupts the expression of that gene. In certain embodiments, the targeting sequence displays at least 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99% identity to a nucleotide sequence of the positive marker gene. In other embodiments, the targeting sequence hybridises to a nucleotide sequence of the positive marker gene under at least low stringency conditions, more suitably under at least medium stringency conditions and even more suitably under high stringency conditions. Reference herein to low stringency conditions includes and encompasses from at least about 1% v/v to at least about 15% v/v formamide and from at least about 1 M to at least about 2 M salt for hybridisation at 42° C, and at least about 1 M to at least about 2 M salt for washing at 42° C. Low stringency conditions also may include 1% bovine serum albumin (BSA), 1 mM EDTA, 0.5 M NaHPO4 (pH 7.2), 7% SDS for hybridisation at 65° C, and (i) 2 x SSC, 0.1% SDS; or (ii) 0.5% BSA, 1 mM EDTA, 40 mM NaHPO4 (pH 7.2), 5% SDS for washing at room temperature. Medium stringency conditions include and encompass from at least about 16% v/v to at least about 30% v/v formamide and from at least about 0.5 M to at least about 0.9 M salt for hybridisation at 42° C, and at least about 0.5 M to at least about 0.9 M salt for washing at 42° C.
Medium stringency conditions also may include 1% BSA, 1 mM EDTA, 0.5 M NaHPO4 (pH 7.2), 7% SDS for hybridisation at 65° C, and (i) 2 x SSC, 0.1% SDS; or (ii) 0.5% BSA, 1 mM EDTA, 40 mM NaHPO4 (pH 7.2), 5% SDS for washing at 42° C. High stringency conditions include and encompass from at least about 31% v/v to at least about 50% v/v formamide and from at least about 0.01 M to at least about 0.15 M salt for hybridisation at 42° C, and at least about 0.01 M to at least about 0.15 M salt for washing at 42° C. High stringency conditions also may include 1% BSA, 1 mM EDTA, 0.5 M NaHPO4 (pH 7.2), 7% SDS for hybridisation at 65° C, and (i) 0.2 x SSC, 0.1% SDS; or (ii) 0.5% BSA, 1 mM EDTA, 40 mM NaHPO4 (pH 7.2), 1% SDS for washing at a temperature in excess of 65° C. Desirably, the targeting sequence hybridises to a nucleotide sequence of the positive marker gene under physiological conditions. Other stringent conditions are well known in the art. A skilled artisan will recognise that various factors can be manipulated to optimise the specificity of the hybridisation. Optimisation of the stringency of the final washes can serve to ensure a high degree of hybridisation. For detailed examples, see Ausubel et al, supra at pages 2.10.1 to 2.10.16 and Sambrook et al. ("Molecular Cloning. A Laboratory Manual", Cold Spring Harbor Press, 1989) at sections 1.101 to 1.104.
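For convenience, the formamide and salt ranges recited above for hybridisation at 42° C can be collected into a simple lookup, sketched below in Python. The names `STRINGENCY_RANGES` and `classify_stringency` are hypothetical, the ranges are treated as closed intervals with the word "about" ignored, and the alternative BSA/SDS-based conditions are not modelled.

```python
from typing import Optional

# (formamide % v/v range, salt molarity range) for hybridisation at 42 degrees C,
# as recited in the preceding paragraph.
STRINGENCY_RANGES = {
    "low": ((1.0, 15.0), (1.0, 2.0)),
    "medium": ((16.0, 30.0), (0.5, 0.9)),
    "high": ((31.0, 50.0), (0.01, 0.15)),
}


def classify_stringency(formamide_pct: float, salt_molar: float) -> Optional[str]:
    """Return 'low', 'medium' or 'high', or None if the pair fits no recited range."""
    for label, ((f_lo, f_hi), (s_lo, s_hi)) in STRINGENCY_RANGES.items():
        if f_lo <= formamide_pct <= f_hi and s_lo <= salt_molar <= s_hi:
            return label
    return None


print(classify_stringency(10, 1.5))
print(classify_stringency(20, 0.7))
print(classify_stringency(40, 0.1))
```

A combination that falls outside all three recited ranges, such as 40% formamide with 1.5 M salt, returns None in this sketch and is not assigned to any class.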
[0137] Suitably, the targeting region has sequence identity with the sense strand or antisense strand of the positive marker gene. In certain embodiments, the RNA molecule is unpolyadenylated, which can lead to efficient reduction in expression of the marker gene, as described for example by Waterhouse et al in U.S. Patent No. 6,423,885.
[0138] Typically, the length of the targeting region may vary from about 10 nucleotides (nt) up to a length equalling the length (in nucleotides) of the positive marker gene. Generally, the length of the targeting region is at least 10, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25 nt, usually at least about 50 nt, more usually at least about 100 nt, especially at least about 150 nt, more especially at least about 200 nt, even more especially at least about 500 nt. It is expected that there is no upper limit to the total length of the targeting region, other than the total length of the positive marker gene. However, for practical reasons (such as, e.g., stability of the targeting constructs described herein) it is expected that the length of the targeting region should not exceed 5000 nt, particularly should not exceed 2500 nt and could be limited to about 1000 nt.
[0139] The RNA molecule may further comprise one or more other targeting regions (e.g., from about 1 to about 10, or from about 1 to about 4, or from about 1 to about 2 other targeting regions) each of which has sequence identity with a nucleotide sequence of the positive marker gene. Generally, the targeting regions are identical or share at least 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99% sequence identity with each other.
[0140] The RNA molecule may further comprise a reverse complement of the targeting region.
Typically, in these embodiments, the RNA molecule further comprises a spacer sequence that spaces the targeting region from the reverse complement. The spacer sequence may comprise a sequence of nucleotides of at least about 100-500 nucleotides in length, or alternatively at least about 50-100 nucleotides in length and in a further alternative at least about 10-50 nucleotides in length. Typically, the spacer sequence is a non-coding sequence, which in some instances is an intron. In embodiments in which the spacer sequence is a non-intron spacer sequence, transcription of the nucleic acid sequence will produce a RNA molecule that forms a hairpin or stem-loop structure in which the stem is formed by hybridisation of the targeting region to the reverse complement and the loop is formed by the non-intron spacer sequence connecting these 'inverted repeats'. Alternatively, in embodiments in which the spacer sequence is an intron spacer sequence, the presence of intron exon splice junction sequences on either side of the intron sequence facilitates the removal of what would otherwise form a loop structure and the resulting RNA will form a double-stranded RNA (dsRNA) molecule, with optional overhanging 3' sequences at one or both ends. Such a dsRNA transcript is referred to herein as a "perfect hairpin". The RNA molecules may comprise a single hairpin or multiple hairpins including "bulges" of single-stranded RNA occurring adjacent to regions of double-stranded RNA sequences.
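The arrangement of targeting region, spacer and reverse complement can be sketched in Python as below. This is an illustrative string model only: the function names `reverse_complement_rna`, `hairpin_transcript` and `spliced_transcript` are hypothetical, the short sequences in the usage lines are placeholders far below the lengths contemplated above, and splicing of an intron spacer is modelled simply as removal of the spacer.

```python
def reverse_complement_rna(seq: str) -> str:
    """Return the reverse complement of an RNA sequence (A, C, G, U only)."""
    complement = {"A": "U", "U": "A", "G": "C", "C": "G"}
    return "".join(complement[base] for base in reversed(seq.upper()))


def hairpin_transcript(targeting_region: str, spacer: str) -> str:
    """Targeting region, then spacer (the loop), then the reverse complement."""
    targeting_region = targeting_region.upper()
    return targeting_region + spacer.upper() + reverse_complement_rna(targeting_region)


def spliced_transcript(targeting_region: str) -> str:
    """Transcript after an intron spacer has been removed: stem sequences only."""
    targeting_region = targeting_region.upper()
    return targeting_region + reverse_complement_rna(targeting_region)


# Placeholder sequences, chosen only to show the layout.
stem = "AUGGCU"
loop = "UUCG"
print(hairpin_transcript(stem, loop))   # AUGGCUUUCGAGCCAU
print(spliced_transcript(stem))         # AUGGCUAGCCAU
```

In both cases the first and last segments are reverse complements of one another, which is what allows the stem of the hairpin to form by hybridisation.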
[0141] Alternatively, a dsRNA molecule as described above can be conveniently obtained using an additional polynucleotide from which a further RNA molecule is producible, comprising the reverse complement of the targeting region. In this embodiment, the reverse complement of the targeting region hybridises to the targeting region of the RNA molecule transcribed from the modulator gene.
[0142] In another example, a dsRNA molecule as described above is prepared using a modulator gene that comprises a duplex, wherein one strand of the duplex shares sequence identity with a nucleotide sequence of the positive marker gene and the other shares sequence identity with the complement of that nucleotide sequence. In this embodiment, the duplex is flanked by two promoters, one controlling the transcription of one of the strands, and the other controlling the transcription of the complementary strand. Transcription of both strands produces a pair of RNA molecules, each comprising a region that is complementary to a region of the other, thereby producing a dsRNA molecule that inhibits the expression of the marker gene.
[0143] In another example, PTGS of the positive marker gene is achieved using the strategy of Glassman et al described in U.S. Patent Application Publication No 2003/0036197. In this strategy, suitable nucleic acid sequences and their reverse complement can be used to alter the expression of any homologous, endogenous target RNA (i.e., comprising a transcript of the marker gene) which is in proximity to the suitable nucleic acid sequence and its reverse complement. The suitable nucleic acid sequence and its reverse complement can be either unrelated to any endogenous RNA in the host or can be encoded by any nucleic acid sequence in the genome of the host provided that nucleic acid sequence does not encode any target mRNA or any sequence that is substantially similar to the target RNA. Thus, in some embodiments of the present invention, the RNA molecule further comprises two complementary RNA regions which are unrelated to any endogenous RNA in the host cell and which are in proximity to the targeting region. In other embodiments, the RNA molecule further comprises two complementary RNA regions which are encoded by any nucleic acid sequence in the nucleome of the host cell provided that the sequence does not have sequence identity with the nucleotide sequence of the positive marker gene, wherein the regions are in proximity to the targeting region. In the above embodiments, one of the complementary RNA regions can be located upstream of the targeting region and the other downstream of the targeting region. Alternatively, both the complementary regions can be located either upstream or downstream of the targeting region or can be located within the targeting region itself.
2.2 Negative marker gene embodiments
[0144] In other embodiments, the marker gene comprises a nucleic acid sequence that encodes a negative selectable marker whose level or functional activity is generally increased, enhanced or otherwise elevated by the modulator gene. Such elevation includes substantial up-regulation of expression of the negative selectable marker gene to above "basal" levels. Suitably, the expression of the modulator gene increases the level or functional activity of the negative selectable marker by at least about 50, 60, 70, 80, 90, 100, 200, 300, 400, 500, 600, 700, 800, 900, 1000, 1500, 2000, 3000, 4000, 5000 or 10000% or more when compared to the level or functional activity of the expression product in the absence of modulator gene expression. Negative selectable markers include any particular nucleic acid sequence (e.g., DNA or RNA), protein, peptide or amino acid sequence which, when introduced into cells or within the proximity of cells, causes the cells to lose viability, e.g., when exposed to a certain agent or when the gene encoding the negative selectable marker is derepressed (i.e., the negative selectable marker becomes lethal to the cell under certain conditions). Examples of negative selectable markers (and their agents of lethality where applicable) include herpes simplex virus thymidine kinase (gancyclovir, acyclovir or fialuridine), hypoxanthine guanine phosphoribosyl transferase (6-thioguanine or 6-thioxanthine), guanine phosphoribosyltransferase (6-thioguanine), cytosine deaminase (5-fluorocytosine) and toxicity gene expression products, which generally do not require agents of lethality to cause cell death. Illustrative toxicity gene expression products include diphtheria toxin, ricin toxin, ribonucleases such as barnase or RNase T1, ribosome inhibiting proteins such as dianthin, pokeweed antiviral protein (PAP) as well as hypersensitive response-elicitor polypeptides which cause localised necrosis or cell death in plants and occasionally the deposition of callose, the physical thickening of cell walls by lignification, and the synthesis of various antibiotic small molecules and proteins. Suitably, any hypersensitive response elicitor polypeptide can be used, illustrative examples of which include, but are not limited to, P50, MLO, HRP polypeptides such as HRPN, PopA1, ParA1 and other hypersensitive response elicitor polypeptides from Erwinia, Pseudomonas, Phytophthora, and Xanthomonas species.
[0145] Generally, the negative selectable marker gene is conditionally expressed; i.e., it is expressed when the modulator gene is incorporated into the genome via random integration. In one embodiment, the negative selectable marker gene is expressed conditionally by operably linking at least the sequence encoding the negative selectable marker to an inducible transcriptional regulation system. Transactivators produced from the modulator gene in these inducible transcriptional regulation systems are designed to interact specifically with sequences engineered into regulatory elements operably connected to the negative selectable marker gene to induce transcription of that gene when the modulator gene is incorporated into the genome. Thus, in these embodiments, the modulator gene typically comprises a nucleic acid sequence encoding a transcriptional inducer and the targeting cassette comprises a promoter that is operably connected to a sequence encoding the negative selectable marker and to a binding site for the transcriptional inducer, whereby production of the transcriptional inducer in the host cell causes an increase or elevation in the level of the negative selectable marker.
[0146] Suitably, the transcriptional inducer comprises (a) at least one transcriptional activation domain, and (b) at least one DNA-binding domain that binds to, or otherwise interacts with, a promoter which is operably connected to the sequence encoding the negative selectable marker, to activate transcription of the negative selectable marker gene. In operation, transcription of the modulator gene results in the production of the transcriptional inducer which, in turn, interacts via its DNA-binding domain(s) with the negative selectable marker gene promoter and via its transcriptional activation domain with transcriptional machinery of the host cell to activate transcription of the negative selectable marker gene, which results in an increase or elevation in the level or functional activity of the negative selectable marker.
[0147] The transcriptional activation domain is suitably but not exclusively selected from the acidic transactivation domain (TAD) of HSV1-VP16 (e.g., amino acids 406 to 488, Triezenberg et al, 1988, Genes & Development 2:718-729; Triezenberg, 1995, Current Opinions in Genetics and Development 5:190-196; or amino acids 413 to 490, Regier et al, 1993, Proc Natl Acad Sci U S A. 90(3):883-887; or amino acids 411 to 487; or amino acids 453-499; or amino acids 413 to 454; or amino acids 410 to 452, Walker et al, 1993, Mol Cell Biol. 13(9):5233-5244; amino acids 411 to 455, Nettelbeck et al, 1998, Gene Ther. 5(12):1656-1664), the activation domain of Oct-2 (e.g., amino acids 438 to 479, Tanaka et al, 1994, Mol Cell Biol. 14(9):6046-6055; or amino acids 3 to 154, Das et al, 1995, Nature. 374(6523):657-660), the activation domain of SP1 (e.g., amino acids 340 to 485, Courey and Tjian, 1988, Cell. 55(5):887-898), the activation domain of NFY (e.g., amino acids 1 to 233, Li et al, 1992, J Biol Chem. 267(13):8984-8990; van Huijsduijnen et al, 1990, EMBO J. 9(10):3119-3127; Sinha et al, 1995, Proc Natl Acad Sci USA. 92(5):1624-1628; Coustry et al. 1995, J Biol Chem. 270(1):468-475), the activation domain of ITF2 (e.g., amino acids 2 to 452, Seipel et al, 1992, EMBO J. 11(13):4961-4968), the activation domain of c-Myc (e.g., amino acids 1 to 262, Eilers et al 1991, EMBO J. 10(1):133-141), the activation domain of CTF (e.g., amino acids 399 to 499, Mermod et al, 1989, Cell 58(4):741-753; Das and Herr, 1993, J Biol Chem 268(33):25026-25032) or the activation domain of P65 (e.g., amino acids 286-550).
[0148] Desirably, the DNA-binding domain is selected from the DNA-binding domain of the Gal4 protein (e.g., amino acids 1 to 147, Chasman and Kornberg, 1990, Mol Cell Biol. 10(6):2916-2923), the DNA-binding domain of the LexA protein (e.g., amino acids 1 to 81, Kim et al, 1992, Science 10;255(5041):203-206; or amino acids 2-202; or the whole LexA protein, e.g., amino acids 1 to 202, Brent and Ptashne, 1985, Cell 43(3 Pt 2):729-736), the DNA-binding domain of the lac repressor (LacI) protein (e.g., Brown et al, 1987, Cell 49(5):603-612; Fuerst et al, 1989, Proc Natl Acad Sci U S A. 86(8):2549-2553), the DNA-binding domain of the tetracycline repressor (TetR) protein (e.g., Gossen et al, 1992, Proc Natl Acad Sci USA. 89(12):5547-5551; Dingermann et al, 1992, EMBO J. 11(4):1487-1492) or the DNA-binding domain of the ZFHD1 protein (e.g., Pomerantz et al, 1995, Science 267(5194):93-96). It is generally advantageous to add a nuclear localisation signal (NLS) to the 3' end of the DNA-binding domain.
[0149] The negative selectable marker gene promoter suitably comprises a cis-acting sequence with which the transcriptional inducer interacts. The cis-acting sequence comprises a binding sequence for the transcriptional inducer and particularly for its DNA-binding domain. The binding sequence, therefore, depends on the choice of the DNA-binding domain of the transcription factor used for the expression system, and includes, but is not limited to: (A) a binding sequence for the Gal4 protein such as but not limited to: nucleotide sequence: 5'-CGGACAACTGTTGACCG-3' [SEQ ID NO:4] as for example described by Chasman and Kornberg (1990, supra); or nucleotide sequence: 5'-CGGAGGACTGTCCTCCG-3' [SEQ ID NO:5]; or nucleotide sequence: 5'-CGGAGTACTGTCCTCCG-3' [SEQ ID NO:6] as for example disclosed by Giniger et al. (1988, Proc Natl Acad Sci U S A. 85(2):382-386); (B) a binding sequence for the LexA protein such as but not limited to: nucleotide sequence: 5'-TACTGTATGTACATACAGTA-3' [SEQ ID NO:7]; or the LexA operator as for example disclosed by Brent and Ptashne (1984, Nature 312(5995):612-615); (C) a lac operator such as but not limited to nucleotide sequence: 5'-GAATTGTGAGGCTCACAATTC-3' [SEQ ID NO:8], to which the LacI repressor protein binds, as for example described by Fuerst et al. (1989, supra) and Simons et al. (1984, Proc Natl Acad Sci USA. 81(6):1624-1628); (D) a tetracycline operator (tet O) such as but not limited to nucleotide sequence: 5'-TCGAGTTTACCACTCCCTATCAGTGATAGAGAAAAGTGAAAG-3' [SEQ ID NO:9] to which the tetracycline repressor (TetR) protein binds; (E) a binding sequence for the ZFHD1 protein such as but not limited to: nucleotide sequence: 5'-TAATGATGGGCG-3' [SEQ ID NO:10] as for example described by Pomerantz et al. (1995, supra); (F) a binding sequence for the c-Myc protein such as but not limited to: 5'-GGAAGCAGACCAGCTGGTCTGCTTCC-3' [SEQ ID NO:11].
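By way of illustration only, the pairing of a DNA-binding domain with its binding sequence can be sketched as a lookup table and a simple scan of a candidate promoter on both strands. The table below holds a subset of the binding sequences recited above (SEQ ID NOs: 5, 9 and 10); the dictionary keys, the function names and the example promoters are hypothetical, and exact string matching is an assumption that ignores the sequence tolerance of real DNA-binding proteins.

```python
# Illustrative sketch only: exact-match scan for inducer binding sites in a promoter.
COMPLEMENT = str.maketrans("ACGT", "TGCA")

# Subset of the binding sequences recited in the text (keys are hypothetical labels).
BINDING_SITES = {
    "Gal4": ["CGGAGGACTGTCCTCCG"],                            # SEQ ID NO:5
    "TetR": ["TCGAGTTTACCACTCCCTATCAGTGATAGAGAAAAGTGAAAG"],   # SEQ ID NO:9
    "ZFHD1": ["TAATGATGGGCG"],                                # SEQ ID NO:10
}


def reverse_complement(dna):
    """Return the opposite strand, read 5'->3'."""
    return dna.translate(COMPLEMENT)[::-1]


def find_sites(promoter, inducer):
    """Return sorted (position, strand, site) tuples for every exact match."""
    promoter = promoter.upper()
    hits = set()
    for site in BINDING_SITES[inducer]:
        for strand, query in (("+", site), ("-", reverse_complement(site))):
            start = promoter.find(query)
            while start != -1:
                hits.add((start, strand, site))
                start = promoter.find(query, start + 1)
    return sorted(hits)


# Hypothetical promoters carrying one ZFHD1 site on the top or bottom strand.
forward_promoter = "GGCC" + "TAATGATGGGCG" + "AATT" + "TATAAA"
reverse_promoter = "AA" + reverse_complement("TAATGATGGGCG")
```

A promoter lacking any site for the chosen inducer returns an empty list, which corresponds to the situation in which the transcriptional inducer has nothing to bind.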
[0150] In other embodiments, conditional expression of the negative selectable marker gene is regulated by using a recombinase system that is used to turn on the expression of that gene. In illustrative embodiments of this type, the recombinase system comprises a removable intervening sequence interposed between a promoter and the negative selectable marker gene, which intervening sequence suppresses or otherwise disrupts the transcription of the marker gene from the promoter. Suitably, the removable intervening sequence comprises a transcriptional terminator that inhibits or otherwise suppresses transcription of downstream sequences. Desirably, the removable intervening sequence comprises target sites that are specifically recognised by a site specific recombinase protein encoded by the modulator gene, to remove the removable intervening sequence when the modulator gene is present in the genome and to thereby render the negative selectable marker gene in operable linkage with the promoter and to permit transcription of that marker gene. Alternatively, the recombinase system comprises a split or divided transgene including an upstream portion and a downstream portion of the negative selectable marker gene and a removable intervening sequence as broadly described above, which is interposed between the upstream and downstream portions. The upstream portion is operably connected to a promoter but the removable intervening sequence inhibits or otherwise suppresses transcription of the downstream portion, thereby preventing expression of a functional negative selectable marker. Expression in the host cell of a site specific recombinase protein by the modulator gene removes the removable intervening sequence to thereby render a transcribable negative selectable marker gene which permits the expression of a functional negative selectable marker.
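The recombinase switch described above can be sketched as follows, assuming the cassette is modelled as an ordered list of named elements rather than as real sequence. The element names (`"promoter"`, `"loxP"`, `"terminator"`, `"negative_marker"`) and both functions are hypothetical. The sketch assumes that excision between two directly repeated target sites removes the intervening sequence and leaves a single site behind, as is typical of Cre-lox and FLP-FRT systems.

```python
# Illustrative sketch only: a cassette is an ordered list of element names.

def excise(cassette, site="loxP"):
    """Remove everything between the first two target sites, leaving one site."""
    positions = [i for i, element in enumerate(cassette) if element == site]
    if len(positions) < 2:
        return list(cassette)  # nothing to excise
    first, second = positions[0], positions[1]
    return cassette[: first + 1] + cassette[second + 1 :]


def marker_is_transcribed(cassette, marker="negative_marker"):
    """True if a promoter lies upstream of the marker with no terminator between."""
    if "promoter" not in cassette or marker not in cassette:
        return False
    p, m = cassette.index("promoter"), cassette.index(marker)
    return p < m and "terminator" not in cassette[p:m]


# Promoter and marker separated by a removable, site-flanked terminator.
silent_cassette = ["promoter", "loxP", "terminator", "loxP", "negative_marker"]

# Recombinase expressed from a (randomly integrated) modulator gene acts on the cassette.
active_cassette = excise(silent_cassette)
```

Before excision the marker is silent; after excision the promoter and marker are in operable linkage, which is the condition under which the negative selectable marker would be expressed.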
[0151] In some embodiments utilising a toxicity gene as the negative selectable marker gene, the regulation of expression of the toxicity gene may be 'leaky' or incomplete, leading to the death of the host cell or the loss of cell function. In these instances, an antidote gene is employed, either in the same or different targeting construct, which is operably connected to a promoter, whereby an expression product of the antidote gene suppresses the toxic effects of an expression product of the toxicity gene. Illustrative examples of antidote genes include, but are not limited to, antisense, ribozyme or RNA inhibitory molecules that inhibit or otherwise reduce transcription or translation of the toxicity gene, antigen-binding molecules that are interactive with the toxin produced by the toxicity gene or other toxin inhibitors (e.g., barstar gene product, which inhibits the effects of the barnase gene). In such examples, the antidote gene is typically placed under the control of a promoter with a transcriptional activity in the host cell that results in the expression of the antidote gene to a level which prevents the death of the host cell or the loss of cell function in the absence of the modulator gene expression product but which does not prevent the death of the host cell or the loss of cell function in the presence of a modulator gene expression product. Usually, the antidote gene is placed in operable connection with a weak promoter. Generally, by "weak promoter" is intended a promoter that drives expression of a transcribable sequence at a low level. By low level is intended at levels of about 1/1000 transcripts to about 1/100,000 transcripts to about 1/500,000 transcripts. Where a promoter is expressed at unacceptably high levels, portions of the promoter sequence can be deleted or modified to decrease expression levels. Non-limiting examples of weak promoters include promoters for such genes as dUTPase, gi and gE; and promoters such as SV40 early gene promoter and Rous sarcoma virus LTR promoter. Non-limiting examples of weak promoters for use in plants include the core promoter of the Rsyn7 (WO 99/43838), the core 35S CaMV promoter, and the like. Other constitutive promoters include, for example, U.S. Pat. Nos. 5,608,149; 5,608,144; 5,604,121; 5,569,597; 5,466,785; 5,399,680; 5,268,463; 5,608,142 and 6,177,611.
2.3 Nucleic acid sequences of interest
[0152] In some embodiments, the nucleotide sequence of interest is an endogenous polynucleotide that is found naturally in the genome of the host. In other embodiments, the nucleotide sequence of interest is a recombinant or artificial nucleic acid that has been or is about to be introduced into the genome of the host. Typically, the nucleotide sequence of interest is selected from (1) genes that are both transcribed into mRNA and translated into polypeptides as well as (2) genes that are only transcribed into RNA (e.g., functional RNA molecules such as rRNA, tRNA, RNAi, ribozymes and antisense RNA).
[0153] In some embodiments, the nucleotide sequence of interest encodes a polypeptide for commercial manufacture, where the polypeptide is extracted or purified from the host, host cell or host part. Such polypeptides include, but are not limited to, polypeptides involved in the biosynthesis of antibiotics or secondary metabolites, immunogenic molecules for use in vaccines, cytokines and hormones.
[0154] In other embodiments, the nucleotide sequence of interest encodes a product conferring a beneficial property to the host or other advantageous characteristic including, but not limited to, herbicide resistance or tolerance (e.g., glyphosate resistance or glufosinate resistance), stress tolerance (e.g., salt tolerance), sterility, improved food content or increased yields (e.g., a product affecting starch biosynthesis or modification such as starch branching enzymes, starch synthases, ADP-glucose pyrophosphorylase, products involved in fatty acid biosynthesis such as desaturases or hydroxylases and products altering sucrose metabolism such as invertases, sucrose isomerases or sucrose synthases) as well as disease resistance or tolerance (e.g., resistance to bacterial, viral, nematode, helminth, insect, protozoan or viral pathogens, resistance to cancers or tumours, resistance to autoimmune diseases, illustrative examples of which include: an antigen of tumour, self, bacterial, viral, nematode, helminth, insect, protozoan or viral origin; a product conferring insect resistance such as crystal toxin protein of Bacillus thuringiensis; a product conferring viral resistance such as a viral coat or capsid protein; a product conferring fungal resistance such as chitinase, β-1,3-glucanase or phytoalexins).
2.4 Promoters
[0155] The targeting constructs of the present invention comprise promoters that modulate expression of nucleotide sequences encoding the marker and modulator, respectively, and optionally the expression of a foreign or heterologous nucleotide sequence of interest. Promoters contemplated by the present invention may be native to the host organism or may be derived from an alternative source, where the promoter is functional in the host organism. The selection of a particular promoter depends on the cell type used to express the target gene. Some eukaryotic promoters have a broad host range while others are functional in a limited subset of cell types. Illustrative examples of promoter sequences that function in eukaryotic cells, including mammalian cells, include but are not limited to promoters from the simian virus (e.g., SV40), papilloma virus, adenovirus, human immunodeficiency virus (HIV), Rous sarcoma virus, avian sarcoma virus, polyoma, cytomegalovirus, a viral long terminal repeat (LTR), such as the LTR of the Moloney murine leukemia virus, the early and late promoters of SV40 and the thymidine kinase promoter of herpes simplex virus, the promoters for 3-phosphoglycerate kinase or other glycolytic enzyme genes, the promoters of acid phosphatase genes, e.g., Pho5, as well as the promoters of the hypoxanthine phosphoribosyl transferase (HPRT), adenosine deaminase, pyruvate kinase and β-actin genes. Other illustrative examples of promoters that are functional in prokaryotic or eukaryotic systems include the promoters of the lac system, the trp system, the TAC or TRC system, T7 promoter whose expression is directed by T7 RNA polymerase, the major operator and promoter regions of phage λ, the control regions for fd coat protein, the promoters of the yeast α-mating factors, the polyhedrin promoter of the baculovirus system and other sequences known to control the expression of genes of prokaryotic or eukaryotic cells or their viruses, and various combinations thereof.
[0156] In certain embodiments, the targeting constructs are useful for genetically modifying plant genomes and will therefore comprise promoters that are operable in plant cells. Numerous promoters that are active in plant cells have been described in the literature, illustrative examples of which include the nopaline synthase (NOS) promoter, the octopine synthase (OCS) promoter (which is carried on tumour-inducing plasmids of Agrobacterium tumefaciens), the caulimovirus promoters such as the cauliflower mosaic virus (CaMV) 19S promoter and the CaMV 35S promoter, the figwort mosaic virus 35S-promoter, the light-inducible promoter from the small subunit of ribulose-1,5-bisphosphate carboxylase (ssRUBISCO), the Adh promoter, the sucrose synthase promoter, the R gene complex promoter, the GST-II-27 gene promoter and the chlorophyll a/b binding protein gene promoter, etc.
[0157] For the purpose of expression in source tissues of the plant, such as the leaf, seed, root or stem, it is sometimes desirable that the promoters driving expression of a particular gene have relatively high expression in these specific tissues. For this purpose, one may choose from a number of promoters for genes with tissue- or cell-specific or enhanced expression. Examples of such promoters include the chloroplast glutamine synthetase GS2 promoter from pea, the chloroplast fructose-1,6-bisphosphatase (FBPase) promoter from wheat, the nuclear photosynthetic ST-LS1 promoter from potato, the serine/threonine kinase (PAL) promoter and the glucoamylase (CHS) promoter from Arabidopsis thaliana. Also reported to be active in photosynthetically active tissues are the ribulose-1,5-bisphosphate carboxylase (RbcS) promoter from eastern larch (Larix laricina), the promoter for the cab gene, cab6, from pine, the promoter for the Cab-1 gene from wheat, the promoter for the CAB-1 gene from spinach, the promoter for the cab1R gene from rice, the pyruvate, orthophosphate dikinase (PPDK) promoter from corn, the promoter for the tobacco Lhcb1*2 gene, the Arabidopsis thaliana SUC2 sucrose-H+ symporter and the promoter for the thylakoid membrane proteins from spinach (psaD, psaF, psaE, PC, FNR, atpC, atpD, cab, rbcS). Other promoters for the chlorophyll a/b-binding proteins may also be utilised in the invention, such as the promoters for LhcB gene and PsbP gene from white mustard.
[0158] For the purpose of expression in sink tissues of the plant, such as the tuber of the potato plant, the fruit of tomato, or the seed of corn, wheat, rice and barley, it is desirable that the promoters driving expression of the gene of interest have relatively high expression in these specific tissues. A number of promoters for genes with tuber-specific or tuber-enhanced expression are known, including the class I patatin promoter, the promoter for the potato tuber ADPGPP genes, both the large and small subunits, the sucrose synthase promoter, the promoter for the major tuber proteins including the 22 kd protein complexes and protease inhibitors, the promoter for the granule-bound starch synthase gene (GBSS) and other class I and II patatins promoters.
[0159] Other promoters can also be used to express a selected gene in specific tissues, such as seeds or fruits. Examples of such promoters include the 5' regulatory regions from such genes as napin, phaseolin, soybean trypsin inhibitor, ACP, stearoyl-ACP desaturase, soybean α' subunit of β-conglycinin (soy 7s), and oleosin. Further examples include the promoter for β-conglycinin. Also included are the zeins, which are a group of storage proteins found in corn endosperm. Genomic clones for zein genes have been isolated and the promoters from these clones, including the 15 kD, 16 kD, 19 kD, 22 kD, 27 kD and genes, could also be used. Other promoters known to function, for example, in corn include the promoters for the following genes: waxy, Brittle, Shrunken 2, Branching enzymes I and II, starch synthases, debranching enzymes, oleosins, glutelins and sucrose synthases. Examples of promoters suitable for expression in wheat include those promoters for the ADPglucose pyrosynthase (ADPGPP) subunits, the granule bound and other starch synthase, the branching and debranching enzymes, the embryogenesis-abundant proteins, the gliadins and the glutenins. Examples of such promoters in rice include those promoters for the ADPGPP subunits, the granule bound and other starch synthase, the branching enzymes, the debranching enzymes, sucrose synthases and the glutelins. Examples of such promoters for barley include those for the ADPGPP subunits, the granule bound and other starch synthase, the branching enzymes, the debranching enzymes, sucrose synthases, the hordeins, the embryo globulins and the aleurone specific proteins.
[0160] Root specific promoters may also be used. An example of such a promoter is the promoter for the acid chitinase gene. Expression in root tissue could also be accomplished using the root specific subdomains of the CaMV 35S promoter that have been identified.
2.5 Ancillary regulatory elements and design features
[0161] In certain embodiments, the targeting constructs of the present invention comprise a 3' non-translated sequence, which is operably linked to one or more of the marker gene, modulator gene and optional foreign or heterologous nucleotide sequence of interest (which are individually or collectively referred to herein as "targeting system polynucleotides") and which functions in the selected host cells to terminate transcription and/or to cause addition of a polyadenylated nucleotide sequence to the 3' end of an RNA sequence transcribed from the targeting system polynucleotide(s). Thus, a 3' non-translated sequence refers to that portion of a gene comprising a nucleic acid segment that contains a transcriptional termination signal and/or a polyadenylation signal and any other regulatory signals (e.g., translational termination signals) capable of effecting mRNA processing or gene expression. The polyadenylation signal is characterised by causing the addition of polyadenylic acid tracts to the 3' end of the mRNA precursor. Polyadenylation signals are commonly recognised by the presence of homology to the canonical form 5'-AATAAA-3' although variations are not uncommon. The 3' non-translated regulatory sequence desirably includes from about 50 to 1,000 nts and contains transcriptional and translational termination sequences that are operable in the host cell.
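As a minimal sketch of how a candidate 3' non-translated sequence might be screened against the two criteria above, the following code reports the positions of the canonical 5'-AATAAA-3' motif and checks the 50 to 1,000 nt length guideline. The function names and the example sequence are hypothetical, and only the canonical motif is matched, although variant signals are not uncommon.

```python
import re

# Illustrative sketch only: canonical polyadenylation signal, exact match.
POLYA_SIGNAL = "AATAAA"


def find_polya_signals(three_prime_sequence):
    """Return 0-based start positions of every (possibly overlapping) AATAAA."""
    sequence = three_prime_sequence.upper()
    return [m.start() for m in re.finditer("(?=" + POLYA_SIGNAL + ")", sequence)]


def length_within_guideline(three_prime_sequence, minimum=50, maximum=1000):
    """True if the sequence length falls in the roughly 50 to 1,000 nt range."""
    return minimum <= len(three_prime_sequence) <= maximum


# Hypothetical short fragment used only to exercise the functions.
example_fragment = "GCTAGCAATAAAGGCTTTTT"
```

For the 20 nt example fragment, the motif is found at position 6 while the length check fails, which shows that the two criteria are evaluated independently.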
[0162] Transcription of the targeting system polynucleotide above the level produced by a selected promoter can be conveniently enhanced using enhancers, which are cis-acting elements of DNA, usually about from 10 to 300 nts that act on a promoter to increase its transcription. Enhancers useful for constructing the chimeric constructs of the invention include, but are not limited to, a cytomegalovirus early promoter enhancer, the polyoma enhancer on the late side of the replication origin, and adenovirus enhancers. Examples of transcriptional enhancers for use in plants include, but are not restricted to, elements from the CaMV 35S promoter and octopine synthase genes as for example described by Last et al. (U.S. Patent No. 5,290,924). It is proposed that the use of an enhancer element such as the ocs element, and particularly multiple copies of the element, will act to increase the level of transcription from adjacent promoters when applied in the context of plant transformation. As transcribed but untranslated leader sequences can influence gene expression, one can also employ a particular leader sequence to enhance expression of a targeting system polynucleotide. Suitable leader sequences include those that comprise sequences selected to direct optimum expression of the targeting system polynucleotide. For example, such leader sequences include a consensus sequence which can increase or maintain mRNA stability and prevent inappropriate initiation of translation as for example described by Joshi (1987, Nucl Acid Res., 15:6643). However, other leader sequences, e.g., the leader sequence of RTBV, have a high degree of secondary structure that is expected to decrease mRNA stability and/or decrease translation of the mRNA. 
Thus, leader sequences (i) that do not have a high degree of secondary structure, (ii) that have a high degree of secondary structure where the secondary structure does not inhibit mRNA stability and/or decrease translation, or (iii) that are derived from genes that are highly expressed in plants, will be most desirable. Regulatory elements such as the sucrose synthase intron as, for example, described by Vasil et al. (1989, Plant Physiol, 91:5175), the Adh intron I as, for example, described by Callis et al. (1987, Genes Develop., H), or the TMV omega element as, for example, described by Gallie et al. (1989, The Plant Cell, 1:301) can also be included where desired. Other such regulatory elements useful in the practice of the invention are known to those of skill in the art.
[0163] These enhancer elements are well known to persons skilled in the art, and can include the ATG initiation codon and adjacent sequences. The initiation codon must be in phase with the reading frame of the coding sequence relating to the foreign or endogenous DNA sequence to ensure translation of the entire sequence. The translation control signals and initiation codons can be of a variety of origins, both natural and synthetic. Translational initiation regions may be provided from the source of the transcriptional initiation region, or from the foreign or endogenous DNA sequence. The sequence can also be derived from the source of the promoter selected to drive transcription, and can be specifically modified so as to increase translation of the mRNA.
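The requirement that the initiation codon be in phase with the coding sequence can be sketched as a simple arithmetic check, assuming the construct is given as a DNA string together with the positions of the ATG and of the first codon of the coding sequence. The function name, its parameters and the example sequences are hypothetical; the check also rejects any in-frame stop codon lying between the ATG and the coding sequence, which is an added assumption about what "translation of the entire sequence" requires.

```python
# Illustrative sketch only: reading-frame check for an upstream initiation codon.
STOP_CODONS = {"TAA", "TAG", "TGA"}


def initiation_codon_in_phase(sequence, atg_index, coding_start):
    """True if the ATG at atg_index reads in frame into coding_start without a stop."""
    seq = sequence.upper()
    if seq[atg_index:atg_index + 3] != "ATG":
        raise ValueError("no ATG at the given index")
    if coding_start < atg_index or (coding_start - atg_index) % 3 != 0:
        return False
    upstream_codons = [seq[i:i + 3] for i in range(atg_index, coding_start, 3)]
    return not any(codon in STOP_CODONS for codon in upstream_codons)


# Hypothetical constructs: leader + ATG + linker + coding sequence.
in_phase_construct = "CC" + "ATG" + "GCT" + "AAAGGTTAA"     # coding starts at 8
out_of_phase_construct = "CC" + "ATG" + "GC" + "AAAGGTTAA"  # coding starts at 7
```

A one-nucleotide deletion in the linker, as in the second construct, shifts the coding sequence out of phase and the check returns `False`.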
[0164] Additionally, targeting sequences may be employed to target a protein product of a targeting system polynucleotide (e.g., marker gene product or expression product of the foreign or exogenous nucleotide sequence of interest) to an intracellular compartment within cells or to the extracellular environment. For example, a nucleic acid sequence encoding a transit or signal peptide sequence may be operably linked to a sequence encoding a desired protein such that, when translated, the transit or signal peptide can transport the protein to a particular intracellular or extracellular destination, respectively, and can then be post-translationally removed. Transit peptides act by facilitating the transport of proteins through intracellular membranes, e.g., periplasm, vacuole, vesicle, plastid and mitochondrial membranes, whereas signal peptides direct proteins through the extracellular membrane. For example, the transit or signal peptide can direct a desired protein to a particular organelle such as a plastid (e.g., a chloroplast), rather than to the cytoplasm. Thus, the targeting construct can further comprise a plastid transit peptide encoding nucleic acid sequence operably linked between a promoter and the targeting system polynucleotide. For example, reference may be made to Heijne et al. (1989, Eur. J. Biochem., 180:535) and Keegstra et al. (1989, Ann. Rev. Plant Physiol. Plant Mol. Biol, 40:471).
[0165] A targeting construct can be introduced into a vector, such as a plasmid. Plasmid vectors include additional nucleic acid sequences that provide for easy selection, amplification, and transformation of the expression cassette in prokaryotic and eukaryotic cells, e.g., pUC-derived vectors, pSK-derived vectors, pGEM-derived vectors, pSP-derived vectors, or pBS-derived vectors. Additional nucleic acid sequences include origins of replication to provide for autonomous replication of the vector, selectable marker genes, desirably encoding antibiotic or herbicide resistance, unique multiple cloning sites providing for multiple sites to insert nucleic acid sequences or genes encoded in the chimeric construct and sequences that enhance transformation of prokaryotic and eukaryotic cells.
[0166] For autonomous replication, the vector may further comprise an origin of replication enabling the vector to replicate autonomously in the host cell in question. Examples of bacterial origins of replication are the origins of replication of plasmids pBR322, pUC19, pACYC177, and pACYC184 permitting replication in E. coli, and pUB110, pE194, pTA1060, and pAMβ1 permitting replication in Bacillus. The origin of replication may be one having a mutation to make its function temperature-sensitive in a Bacillus cell (see, e.g., Ehrlich, 1978, Proc. Natl. Acad. Sci. USA 75:1433).
[0167] The length of the targeting vector required for successful site-specific homologous recombination is a critical parameter that is often dependent upon the particular gene targeted for creating a genetically modified target sequence. Vector length is dependent upon several factors. In most cases the entire vector length will be a minimum of 1 kb and usually will not exceed a maximum of 5000 kb, although vector length is also dependent upon the technology utilised to construct the vector. It is possible, for example, to construct a targeting vector with a cosmid, BAC, or YAC as the provider of the two regions of substantial homology thus generating a significantly large vector (Ananvoranich et al. 1997, BioTechniques 23: 812; Cocchia et al., 2000, Nucleic Acids Res., 28:E81). Factors influencing the length of those homology regions are discussed, for example, in Valenzuela et al. (2003, Nature Biotechnology 21(6): 652-659). Vector length also includes plasmid backbone sequences such as those encoding the origin of replication and bacterial drug resistance products such as ampicillin if these are not removed prior to transformation of cells with the vector.
[0168] In certain cases it can be advantageous to remove foreign or exogenous sequences which have been incorporated into the genome of cells upon site-specific homologous recombination between a targeting construct of the invention and cellular endogenous genomic target sequences. This is due to the potential negative effects expression of these sequences can have on cellular or organismal viability and survival. Alternatively, regulatory elements introduced into the genome of the host cell can adversely affect the expression of endogenous loci juxtaposed to these elements. The removal of sequences from the host genome is possible by a number of methodologies. Recombinase systems (e.g., Cre-lox and FLP-FRT) as mentioned, for example, above can be successfully applied for the removal of specific sequences introduced into cellular endogenous genomic sequence via the targeting constructs of the invention. For example, sequences encoding a positive selectable marker and corresponding regulatory elements can be flanked with lox P or FRT target recombination sites in the targeting construct prior to cellular transformation. After introduction of these sequences into the genome of the host cell a transient or stable expression of Cre or FLP recombinase will allow for removal, respectively, of one lox P or FRT site and all sequences positioned between the lox P or FRT sites. Many examples of the application of Cre-lox and FLP-FRT technologies for sequence removal exist. For example Kaartinen et al. (2001, Genesis 31:126) have demonstrated removal of a neomycin phosphotransferase cassette flanked by lox P sites through the transient expression of Cre via adenoviral infection of 16-cell-stage morulae. In addition, Xu et al. 
(2001, Genesis 30:1) have successfully removed a lox P flanked neomycin phosphotransferase cassette through both a cross with mice expressing Cre under the control of the EIIa promoter as well as pronuclear injection of cells containing the cassette with a Cre-expressing plasmid. Thus, if the targeting construct is configured to replace or correct cellular exonic sequences that are defective, such as for human gene therapy, the transcribed sequences and corresponding regulatory elements of the marker can be removed after completion of site-specific homologous recombination between the targeting construct and the host genome.
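By way of illustration only, the recombinase-mediated removal described above can be modelled at the sequence level. The following Python sketch assumes two directly repeated lox P sites flanking a marker; the flanking arms and marker sequence are hypothetical placeholders, and the function name is introduced here for illustration. It shows only the outcome stated above, namely loss of one lox P site together with all intervening sequence, and does not model inverted sites or recombinase kinetics.

```python
# Canonical 34 bp lox P site: 13 bp repeat, 8 bp spacer, 13 bp inverted repeat.
LOXP = "ATAACTTCGTATAGCATACATTATACGAAGTTAT"


def excise_floxed(seq: str, site: str = LOXP) -> str:
    """Return seq with one site and everything between two direct repeats removed.

    If fewer than two sites are found, the sequence is returned unchanged.
    """
    first = seq.find(site)
    if first == -1:
        return seq
    second = seq.find(site, first + len(site))
    if second == -1:
        return seq
    # Keep the sequence up to and including the first site,
    # then resume immediately after the second site.
    return seq[: first + len(site)] + seq[second + len(site):]


# Hypothetical toy locus (placeholder sequences, not taken from the specification).
left_arm = "GGGCCC"
marker = "ATGAAAACCC"
right_arm = "TTTAAA"

targeted_locus = left_arm + LOXP + marker + LOXP + right_arm
after_cre = excise_floxed(targeted_locus)
print(after_cre == left_arm + LOXP + right_arm)
```

The same function could be applied to FRT-flanked sequences by passing the FRT site as the `site` argument.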
[0169] In addition, it is preferable that the targeting vector be linearised prior to its introduction into cells for the purposes of genetically modifying cellular endogenous genomic sequences as linear vectors exhibit significantly higher targeting frequencies than those that are circular (Thomas et al, 1986, Cell 44:49). It is, however, possible to successfully utilise targeting vectors for these purposes without linearisation.
2.6 Construct types
[0170] The targeting constructs of the invention are organised such that the marker gene is operatively positioned between two flanking portions of the targeting cassette, which are sufficiently homologous with regions of the target site in the cellular genome to permit homologous recombination between the targeting cassette and the target site. For example, the target site may comprise an endogenous gene (e.g., comprising an exonic or coding sequence or a sequence encoding a functional RNA) and in certain embodiments, the marker gene is positioned by the flanking portions of the targeting cassette to disrupt or replace at least a portion of the endogenous gene thereby rendering the endogenous gene inactive and thus non-functional. In these embodiments, one of the flanking portions may be substantially homologous to at least a portion of the 5' untranslated sequence of the endogenous gene, and the other substantially homologous to at least a portion of the 3' untranslated sequence of the endogenous gene. Generally, such a non-conditional knock-out approach is used when targeting a small gene. Site-specific homologous recombination between the targeting construct and the target site subsequently results in replacement of at least a portion of the endogenous gene with the marker gene. In these instances, the targeting construct is used to produce knockout organisms having a partial or complete loss of function in at least one allele of the endogenous gene.
[0171] In other embodiments, the targeting cassette further comprises a foreign or exogenous nucleotide sequence of interest (e.g., a foreign gene or regulatory element, or portion thereof) between the flanking portions of the cassette. In certain examples of this type, the nucleotide sequence of interest is positioned by the flanking portions of the targeting cassette to replace at least a portion of the endogenous gene with the nucleotide sequence of interest to produce an altered or modified endogenous gene or to replace the endogenous gene with the nucleotide sequence of interest or to introduce novel regulatory elements in operable connection with the endogenous gene. In other examples, the nucleotide sequence of interest is positioned by the flanking portions of the targeting cassette to replace a region of the genome (e.g., an intergenic sequence) that does not include gene sequences such as exons or coding sequences, introns, untranslated regions of exons or regulatory element regions such as promoters. In this scenario, cells can be selected that have undergone site-specific homologous recombination at a locus without inactivating that particular locus. In other examples, the nucleotide sequence of interest is positioned by the flanking portions of the targeting cassette for introduction within an intron or non-coding region of the genome such that the introduction does not disrupt regulatory, exonic or coding sequences. In these examples, one of the flanking portions may be substantially homologous to an exon and portion of an intron of an endogenous gene, and the other substantially homologous to a portion of an intron and a downstream exon. Site-specific homologous recombination between the targeting construct and cellular endogenous genomic target sequences subsequently results in the positioning of the nucleotide sequence of interest within the intron and thus not disrupting critical exonic coding sequences. A requirement of this scenario is that the nucleotide sequence of interest must be under the control of regulatory elements present within the targeting cassette. In still other examples, the nucleotide sequence of interest lacks an upstream promoter in the targeting cassette and is positioned by the flanking portions of the targeting cassette for insertion into a region of the genome that is downstream of endogenous cellular regulatory elements. In these examples, one of the flanking portions may be substantially homologous to a promoter and portion of a 5' untranslated region and the other substantially homologous to an intron and downstream exon. In this scenario, the targeting construct is designed to drive transcription of the nucleotide sequence of interest under the control of regulatory elements endogenous to the particular gene targeted by the targeting construct. Homologous recombination between the targeting construct and the target site provides regulatory elements specific for the targeted gene which subsequently drive the transcription of the nucleotide sequence of interest. 
The nucleotide sequence of interest will most often not be transcribed unless site-specific homologous recombination occurs, thereby providing endogenous cellular regulatory elements sufficient to drive transcription of these sequences. Additionally, it will be readily apparent to those of skill in the art that a targeting construct can be engineered to express more than one nucleotide sequence of interest or transgene, which can be the same (for example to increase the effective gene dosage) or different to achieve complementary effects. Each transgene can be under control of the same promoter (for example, through the use of internal ribosomal entry site (IRES) elements) or different promoters. IRES elements function as initiators of the efficient translation of reading frames. In particular, an IRES allows for the translation of two different genes on a single transcript and greatly facilitates the selection of cells expressing the transgenes at uniformly high levels. IRES elements are known in the art, illustrative examples of which include those IRES elements from poliovirus Type I, the 5'UTR of encephalomyocarditis virus (EMCV), of "Theiler's murine encephalomyelitis virus" (TMEV), of "foot and mouth disease virus" (FMDV), of "bovine enterovirus" (BEV), of "coxsackie B virus" (CBV), or of "human rhinovirus" (HRV), or the "human immunoglobulin heavy chain binding protein" (BiP) 5'UTR, the Drosophila antennapediae 5'UTR or the Drosophila ultrabithorax 5'UTR, or genetic hybrids or fragments from the above-listed sequences. See also, e.g., Kim et al, 1992, Molecular and Cellular Biology 12(8): 3636-3643; McBratney et al, 1993, Current Opinion in Cell Biology 5: 961-965; Oh and Sarnow, 1993, Current Opinion in Genetics and Development 3: 295-300; and Ramesh et al, 1996, Nucleic Acids Research 24:2697-2700. In the above instances, the targeting constructs are suitable for producing transgenic or knock-in organisms containing at least one copy of the nucleotide sequence of interest in the genome of the organism.
3. Host cells
[0172] It will be understood that any cell type that is capable of undergoing site-specific homologous recombination can be manipulated by the present targeting constructs and methodology for the purposes of genetically modifying the genome of that cell type. Cells capable of undergoing site-specific homologous recombination can be derived from a variety of organisms and species including, but not limited to, human, murine, ovine, porcine, bovine, simian, canine and feline. In general, any eukaryotic cell capable of undergoing site-specific homologous recombination can be targeted successfully for the generation of a genetically modified sequence within the cellular genome by the targeting constructs and methods of the present invention. Illustrative examples of eukaryotic hosts include, but are not limited to, fungi such as yeast and filamentous fungi, including species of Aspergillus, Trichoderma, and Neurospora; animal hosts including vertebrate animals illustrative examples of which include fish (e.g., salmon, trout, tilapia, tuna, carp, flounder, halibut, swordfish, cod and zebrafish), birds (e.g., chickens, ducks, quail, pheasants and turkeys, and other jungle fowl or game birds) and mammals (e.g., dogs, cats, horses, cows, buffalo, deer, sheep, rabbits, rodents such as mice, rats, hamsters and guinea pigs, goats, pigs, primates, marine mammals including dolphins and whales, as well as cell lines, such as human or other mammalian cell lines of any tissue or stem cell type (e.g., COS, NIH 3T3, CHO, BHK, 293, or HeLa cells), and stem cells, including pluripotent and non-pluripotent and embryonic stem cells, and non-human zygotes), as well as invertebrate animals illustrative examples of which include nematodes (representative genera of which include those that infect animals such as but not limited to Ancylostoma, Ascaridia, Ascaris, Bunostomum, Caenorhabditis, Capillaria, Chabertia, Cooperia, Dictyocaulus, Haemonchus, Heterakis, Nematodirus, Oesophagostomum, Ostertagia, Oxyuris, Parascaris, Strongylus, Toxascaris, Trichuris, Trichostrongylus, Trichonema, Toxocara, Uncinaria, and those that infect plants such as but not limited to Bursaphelenchus, Criconemella, Diiylenchus, Ditylenchus, Globodera, Helicotylenchus, Heterodera, Longidorus, Meloidogyne, Nacobbus, Paratylenchus, Pratylenchus, Radopholus, Rotylenchus, Tylenchus, and Xiphinema) and other worms, drosophila, and other insects (such as from the families Apidae, Curculionidae, Scarabaeidae, Tephritidae, Tortricidae, amongst others, representative orders of which include Coleoptera, Diptera, Lepidoptera, and Homoptera).
[0173] In certain embodiments, the host is a plant which is suitably selected from monocotyledons, dicotyledons and gymnosperms. The plant may be an ornamental plant or crop plant.
Illustrative examples of ornamental plants include, but are not limited to, Malus spp, Crataegus spp, Rosa spp., Betula spp, Sorbus spp, Olea spp, Nerium spp, Salix spp, Populus spp. Illustrative examples of crop plants include plant species which are cultivated in order to produce a harvestable product such
as, but not limited to, Abelmoschus esculentus (okra), Acacia spp., Agave fourcroydes (henequen), Agave sisalana (sisal), Albizia spp., Allium fistulosum (bunching onion), Allium sativum (garlic), Allium spp. (onions), Alpinia galanga (greater galanga), Amaranthus caudatus, Amaranthus spp., Anacardium spp. (cashew), Ananas comosus (pineapple), Anethum graveolens (dill), Annona cherimola (cherimoya), Apios americana (American potatobean), Arachis hypogaea (peanut), Arctium spp. (burdock), Artemisia spp. (wormwood), Aspalathus linearis (redbush tea), Athertonia diversifolia, Atriplex nummularia (old man saltbush), Averrhoa carambola (starfruit), Azadirachta indica (neem), Backhousia spp., Bambusa spp. (bamboo), Beta vulgaris (sugar beet), Boehmeria nivea (ramie), bok choy, Boronia megastigma (sweet boronia), Brassica carinata (Abyssinian mustard), Brassica juncea (Indian mustard), Brassica napus (rapeseed), Brassica oleracea (cabbage, broccoli), Brassica oleracea var alboglabra (gai lum), Brassica parachinensis (choi sum), Brassica pekinensis (Wong bok or Chinese cabbage), Brassica spp., Burcella obovata, Cajanus cajan (pigeon pea), Camellia sinensis (tea), Cannabis sativa (non-drug hemp), Capsicum spp., Carica spp. (papaya), Carthamus tinctorius (safflower), Carum carvi (caraway), Cassinia spp., Castanospermum australe (blackbean), Casuarina cunninghamiana (beefwood), Ceratonia siliqua (carob), Chamaemelum nobile (chamomile), Chamelaucium spp. (Geraldton wax), Chenopodium quinoa (quinoa), Chrysanthemum (Tanacetum) cinerariifolium (pyrethrum), Cicer arietinum (chickpea), Cichorium intybus (chicory), Clematis spp., Clianthus formosus (Sturt's desert pea), Cocos nucifera (coconut), Coffea spp. (coffee), Colocasia esculenta (taro), Coriandrum sativum (coriander), Crambe abyssinica (crambe), Crocus sativus (saffron), Cucurbita foetidissima (buffalo gourd), Cucurbita spp. (gourd), Cyamopsis tetragonoloba (guar), Cymbopogon spp. 
(lemongrass), Cytisus proliferus (tagasaste), Daucus carota (carrot), Desmanthus spp., Dioscorea esculenta (Asiatic yam), Dioscorea spp. (yams), Diospyros spp. (persimmon), Doronicum sp., Echinacea spp., Eleocharis dulcis (water chestnut), Eleusine coracana (finger millet), Emanthus arundinaceus, Eragrostis tef (tef), Erianthus arundinaceus, Eriobotrya japonica (loquat), Eucalyptus spp., Eucalyptus spp. (gil mallee), Euclea spp., Eugenia malaccensis (jumba), Euphorbia spp., Euphoria longana (longan), Eutrema wasabi (wasabi), Fagopyrum esculentum (buckwheat), Festuca arundinacea (tall fescue), Ficus spp. (fig), Flacourtia inermis, Flindersia grayliana (Queensland maple), Foeniculum olearia, Foeniculum vulgare (fennel), Garcinia mangostana (mangosteen), Glycine latifolia, Glycine max (soybean), Glycine max (vegetable soybean), Glycyrrhiza glabra (licorice), Gossypium spp. (cottons), Grevillea spp., Grindelia spp., Guizotia abyssinica (niger), Harpagophyllum sp., Helianthus annuus (high oleic sunflowers), Helianthus annuus (monosun sunflowers), Helianthus tuberosus (Jerusalem artichoke), Hibiscus cannabinus (kenaf), Hordeum bulbosum, Hordeum spp. (waxy barley), Hordeum vulgare (barley), Hordeum vulgare subsp. spontaneum, Humulus lupulus (hops), Hydrastis canadensis (golden seal), Hymenachne spp., Hyssopus officinalis (hyssop), Indigofera spp., Inga edulis (ice cream bean), Inocarpus tugiter, Ipomoea batatas (sweet potato), Ipomoea sp. (kang kong), Lablab purpureus (white lablab), Lactuca spp. (lettuce), Lathyrus spp. (vetch), Lavandula spp. (lavender), Lens spp. (lentil),
Lesquerella spp. (bladderpod), Leucaena spp., Lilium spp., Limnanthes spp. (meadowfoam), Linum usitatissimum (flax), Linum usitatissimum (linseed), Linum usitatissimum (Linola.TM.), Litchi chinensis (lychee), Lotus corniculatus (birdsfoot trefoil), Lotus pedunculatus, Lotus sp., Luffa spp., Lunaria annua (honesty), Lupinus mutabilis (pearl lupin), Lupinus spp. (lupin), Macadamia spp., Mangifera indica (mango), Manihot esculenta (cassava), Medicago spp. (lucerne), Medicago spp., Melaleuca spp. (tea tree), Melaleuca uncinata (broombush), Mentha tasmannia, Mentha spicata (spearmint), Mentha X piperita (peppermint), Momordica charantia (bitter melon), Musa spp. (banana), Myrciaria cauliflora (jaboticaba), Myrothamnus flabellifolia, Nephelium lappaceum (rambutan), Nerine spp., Ocimum basilicum (basil), Oenanthe javanica (water dropwort), Oenothera biennis (evening primrose), Olea europaea (olive), Olearia sp., Origanum spp. (marjoram, oregano), Oryza spp. (rice), Oxalis tuberosa (oca), Ozothamnus spp. (rice flower), Pachyrrhizus ahipa (yam bean), Panax spp. (ginseng), Panicum miliaceum (common millet), Papaver spp. (poppy), Parthenium argentatum (guayule), Passiflora sp., Paulownia tomentosa (princess tree), Pelargonium graveolens (rose geranium), Pelargonium sp., Pennisetum americanum (bulrush or pearl millet), Persoonia spp., Petroselinum crispum (parsley), Phacelia tanacetifolia (tansy), Phalaris canariensis (canary grass), Phalaris sp., Phaseolus coccineus (scarlet runner bean), Phaseolus lunatus (lima bean), Phaseolus spp., Phaseolus vulgaris (culinary bean), Phaseolus vulgaris (navy bean), Phaseolus vulgaris (red kidney bean), Pisum sativum (field pea), Plantago ovata (psyllium), Polygonum minus, Polygonum odoratum, Prunus mume (Japanese apricot), Psidium guajava (guava), Psophocarpus tetragonolobus (winged bean), Pyrus spp. (nashi), Raphanus sativus (long white radish or Daikon), Rhagodia spp. 
(saltbush), Ribes nigrum (black currant), Ricinus communis (castor bean), Rosmarinus officinalis (rosemary), Rungia klossii (rungia), Saccharum officinarum (sugar cane), Salvia officinalis (sage), Salvia sclarea (clary sage), Salvia sp., Sandersonia sp., Santalum acuminatum (sweet quandong), Santalum spp. (sandalwood), Sclerocarya caffra (marula), Scutellaria galericulata (scullcap), Secale cereale (rye), Sesamum indicum (sesame), Setaria italica (foxtail millet), Simmondsia spp. (jojoba), Solanum spp., Sorghum almum (sorghum), Stachys betonica (wood betony), Stenanthemum scortechenii, Strychnos cocculoides (monkey orange), Stylosanthes spp. (stylo), Syzygium spp., Tasmannia lanceolata (mountain pepper), Terminalia karnbachii, Theobroma cacao (cocoa), Thymus vulgaris (thyme), Toona australis (red cedar), Trifolium spp. (clovers), Trifolium alexandrinum (berseem clover), Trifolium resupinatum (persian clover), Triticum spp., Triticum tauschii, Tylosema esculentum (morama bean), Valeriana sp. (valerian), Vernonia spp., Vetiveria zizanioides (vetiver grass), Vicia benghalensis (purple vetch), Vicia faba (faba bean), Vicia narbonensis (narbon bean), Vicia sativa, Vicia spp., Vigna aconitifolia (mothbean), Vigna angularis (adzuki bean), Vigna mungo (black gram), Vigna radiata (mung bean), Vigna spp., Vigna unguiculata (cowpea), Vitis spp. (grapes), Voandzeia subterranea (bambarra groundnut), Triticosecale (triticale), Zea mays (bicolour sweetcorn), Zea mays (maize), Zea mays (sweet corn), Zea mays subsp. mexicana (teosinte), Zieria spp., Zingiber officinale (ginger), Zizania spp. (wild rice), Ziziphus jujuba (common jujube). Desirable crops for the practice of the present invention include Nicotiana tabacum (tobacco) and horticultural crops such as, for example, Ananas comosus (pineapple), Saccharum spp (sugar cane), Musa spp (banana), Lycopersicon esculentum (tomato) and Solanum tuberosum (potato).
4. Introduction of targeting construct into hosts
[0174] The targeting constructs of the invention are introduced into a host by any suitable means including "transduction" and "transfection", which are art recognised as meaning the introduction of a nucleic acid, e.g., an expression vector, into a recipient cell by nucleic acid-mediated gene transfer. "Transformation", however, refers to a process in which a host's genotype is changed as a result of the cellular uptake of exogenous DNA or RNA, and, for example, the transformed cell comprises the expression system of the invention. There are many methods for introducing targeting constructs into cells. Typically, the method employed will depend on the choice of host cell. Technology for introduction of targeting constructs into host cells is well known to those of skill in the art. Four general classes of methods for delivering nucleic acid molecules into cells have been described: (1) chemical methods such as calcium phosphate precipitation, polyethylene glycol (PEG)-mediated precipitation and lipofection; (2) physical methods such as microinjection, electroporation, acceleration methods and vacuum infiltration; (3) vector based methods such as bacterial and viral vector-mediated transformation; and (4) receptor-mediated methods. Transformation techniques that fall within these and other classes are well known to workers in the art, and new techniques are continually becoming known. The particular choice of a transformation technology will be determined by its efficiency to transform certain host species as well as the experience and preference of the person practising the invention with a particular methodology of choice. It will be apparent to the skilled person that the particular choice of a transformation system to introduce a targeting construct into cells is not essential to or a limitation of the invention, provided it achieves an acceptable level of nucleic acid transfer. 
[0175] Thus, the targeting constructs are introduced into tissues or host cells by any number of routes, including viral infection, microinjection, electroporation, or fusion of vesicles. Jet injection may also be used for intra-muscular administration (as described for example by Furth et al, Anal Biochem 205:365-368 (1992)). The targeting constructs may be coated onto microprojectiles, and delivered into a host cell or into tissue by a particle bombardment device, or "gene gun" (see, for example, Tang et al, Nature 356:152-154 (1992)). Alternatively, the targeting constructs can be fed directly to, or injected into, the host organism, or they may be introduced into the cell (i.e., intracellularly) or introduced extracellularly into a cavity, interstitial space, into the circulation of an organism, introduced orally, etc. Methods for oral introduction include direct mixing of the targeting constructs with food of the organism. In certain embodiments, a hydrodynamic nucleic acid administration protocol is employed (e.g., see Chang et al, 2001, J. Virol. 75:3469-3473; Liu et al, 1999, Gene Ther. 6:1258-1266; Wolff et al, 1990, Science 247:1465-1468; Zhang et al, 1999, Hum. Gene Ther. 10:1735-1737; and Zhang et al, 1999, Gene Ther. 7:1344-1349).
[0176] Certain embodiments of the present invention are concerned with introducing the expression system of the invention into plant cells. Guidance in the practical implementation of transformation systems for plant improvement is provided, for example, by Birch (1997, Annu. Rev. Plant Physiol Plant Molec. Biol. 48: 297-326). Thus, in these embodiments, recipient plant cells are employed that are susceptible to transformation and subsequent regeneration into stably transformed, fertile plants. For monocot transformation for example, immature embryos, meristematic tissue, gametic tissue, embryogenic suspension cultures or embryogenic callus tissue can be employed as a source of recipient cells which is useful in the practice of the invention. For dicot transformation, organ and tissue cultures can be employed as a source of recipient cells. Thus, tissues, e.g., leaves, seed and roots, of dicots can provide a source of recipient cells useful in the practice of the invention. Cultured susceptible recipient cells are suitably grown on solid supports. Nutrients are provided to the cultures in the form of media and the environmental conditions for the cultures are controlled. Media and environmental conditions which support the growth of regenerable plant cultures are well known to the art.
[0177] In principle both dicotyledonous and monocotyledonous plants that are amenable to transformation can be modified by introducing a targeting construct into a recipient cell and growing a new plant that harbours the targeting construct of the invention. Illustrative transformation methods include Agrobacterium-mediated transfer, Cauliflower mosaic virus (CaMV)-mediated transfer, electroporation, microprojectile bombardment, microinjection, calcium phosphate precipitation or polyethylene glycol precipitation, pollen-mediated transfer or combination thereof. Transformation techniques that fall within these and other classes are well known to workers in the art, and the particular choice of a transformation technology will be determined by its efficiency to transform the selected host species.
[0178] The methods used to regenerate transformed cells into differentiated plants are not critical to this invention, and any method suitable for a target plant can be employed. Normally, a plant cell is regenerated to obtain a whole plant following a transformation process. Regeneration from protoplasts varies from species to species of plants, but generally a suspension of protoplasts is first made. In certain species, embryo formation can then be induced from the protoplast suspension, to the stage of ripening and germination as natural embryos. The culture media will generally contain various amino acids and hormones necessary for growth and regeneration. Examples of hormones utilised include auxins and cytokinins. It is sometimes advantageous to add glutamic acid and proline to the medium, especially for such species as corn and alfalfa. Efficient regeneration will depend on the medium, on the genotype, and on the history of the culture. If these variables are controlled, regeneration is reproducible. Regeneration also occurs from plant callus, explants, organs or parts. Transformation
can be performed in the context of organ or plant part regeneration as, for example, described in Methods in Enzymology, Vol. 118 and Klee et al. (1987, Annual Review of Plant Physiology, 38:467). Utilising the leaf disk-transformation-regeneration method of Horsch et al. (1985, Science, 227: 1229), disks are cultured on selective media, followed by shoot formation in about 2-4 weeks. Shoots that develop are excised from calli and transplanted to appropriate root-inducing selective medium. Rooted plantlets are transplanted to soil as soon as possible after roots appear. The plantlets can be repotted as required, until maturity is reached.
[0179] In vegetatively propagated crops, the mature transgenic plants are propagated by the taking of cuttings or by tissue culture techniques to produce multiple identical plants. Selection of desirable transgenotes is made and new varieties are obtained and propagated vegetatively for commercial use.
[0180] In seed propagated crops, the mature transgenic plants can be self-crossed to produce a homozygous inbred plant. The inbred plant produces seed containing the newly introduced foreign gene(s). These seeds can be grown to produce plants that would produce the selected phenotype, e.g., early flowering.
[0181] Genetically modified plants derived from plant cells genetically modified through utilisation of the targeting constructs of the invention include, but are not limited to, a transgenic T0 or R0 plant, i.e., the first plant regenerated from transformed plant cells, a genetically modified T1 or R1 plant, i.e., the first generation progeny plant, and progeny plants of further generations derived therefrom which comprise the targeting cassette or derivative thereof in their genomes.
[0182] Parts obtained from the regenerated plant, such as flowers, seeds, leaves, branches, fruit, and the like are included in the invention, provided that these parts comprise cells that have been transformed as described. Progeny and variants, and mutants of the regenerated plants are also included within the scope of the invention, provided that these parts comprise the targeting cassette or derivative thereof in their cellular genomic sequences.
[0183] It will be appreciated that the literature describes numerous techniques for regenerating specific plant types and more are continually becoming known. Those of ordinary skill in the art can refer to the literature for details and select suitable techniques without undue experimentation.
[0184] To confirm the presence of any of the targeting system polynucleotides in the regenerating plants, a variety of assays may be performed. Such assays include, for example, "molecular biological" assays well known to those of skill in the art, such as Southern and Northern blotting and PCR. A protein expressed by the heterologous DNA may be analysed by high performance liquid chromatography or ELISA (e.g., nptII) as is well known in the art.
[0185] When the creation of a genetically modified animal containing a modification produced through use of the targeting constructs of the present invention is desired, advantageous cell types for this purpose are embryonic stem cells. These cells are generally derived from the inner cell mass of preimplantation embryos and propagated in tissue culture for genetic manipulation. Upon genetically modifying the endogenous genomic sequences of the embryonic stem cells through the application of the targeting constructs, the cells are introduced into blastocysts via microinjection techniques and the blastocysts implanted into pseudopregnant female hosts (Hogan et al. (editor) (1994), Manipulating the Mouse Embryo, A laboratory manual, Cold Spring Harbor Laboratory Press, New York). Alternatively, morula aggregation methods can be implemented for the creation of embryos containing genetically modified stem cells (Kong et al. 2000, Lab Anim. 29:25). Embryos which survive through postnatal stages often exhibit a chimeric cellular content in which a certain percentage of cells are derived from blastocyst origin and a certain percentage of cells are derived from those mutated by the targeting construct. Chimeric animals can subsequently be bred to heterozygosity and homozygosity for the allele genetically modified by the targeting construct. Where mating is used to produce genetically modified or transgenic progeny, the transgenic animal may be back-crossed to a parental line, otherwise inbred or cross-bred with animals possessing other desirable genetic characteristics. The progeny may be evaluated for the loss of function of at least one allele targeted by a targeting construct of the invention or for presence of a nucleotide sequence of interest using any suitable screening method. 
Screening may be accomplished by Southern or northern analysis using a probe that is complementary to at least a portion of the endogenous gene or the nucleotide sequence of interest (and/or a region flanking that sequence) or by PCR using primers complementary to portions of the nucleotide sequence of interest (and/or a region flanking that sequence). Western blot analysis using an antibody against a protein encoded by the nucleotide sequence of interest may be employed as an alternative or additional method for screening. Alternative or additional methods for evaluating the presence of a nucleotide sequence of interest include, without limitation, suitable biochemical assays such as enzyme and/or immunological assays, histological stains for particular markers or enzyme activities and the like.
[0186] The present invention also contemplates cells that have undergone site-specific homologous recombination using the targeting constructs and methods described herein. In addition, the presently described invention includes transgenic non-human animals which have been derived from cells which have undergone site-specific homologous recombination utilising targeting constructs and methods described herein. Also included are transgenic plants which have been derived from cells which have undergone site-specific homologous recombination utilising targeting constructs and methods described herein. Plants have previously been demonstrated to undergo site-specific homologous recombination as well as gene targeting via positive-negative selection and are therefore amenable to the targeting constructs and methods described herein.
[0187] In order that the invention may be readily understood and put into practical effect, particular preferred embodiments will now be described by way of the following non-limiting example.
EXAMPLES
EXAMPLE 1
Construction of PTGS-Neo targeting construct
[0188] The aim of this targeting construct is to downregulate expression of the neomycin phosphotransferase gene to basal levels in embryonic stem cells containing random integration events, as opposed to homologous recombination of targeting constructs. This will be achieved using RNA interference (RNAi). The mouse U6 promoter will be incorporated into the construct to express specific short interfering RNA hairpins directed against the neomycin mRNA transcripts. This U6 promoter-RNAi-neo cassette will be positioned outside the targeting cassette, which is designed for incorporation into the mouse genome via site-specific homologous recombination. ES cells containing random integrations (which incorporate the mouse U6 promoter and siRNA template at a random genomic location) will lose resistance to G418, since the siRNAs directed against neomycin mRNA transcripts will be expressed and will degrade or knock down these transcripts. This construct will therefore enrich for homologous recombination events and reduce the number of colonies to be screened by Southern blot hybridisation.
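The enrichment described above can be illustrated with a simple calculation. The following Python sketch is a simplified model under stated assumptions and forms no part of the experimental protocol: it assumes that every homologous recombinant survives G418 selection and that a randomly integrated clone survives only when knockdown of the neomycin transcript fails. The function name, the parameter names and the numerical values in the usage example are hypothetical.

```python
def targeted_fraction(hr_frequency: float, knockdown_efficiency: float) -> float:
    """Fraction of G418-resistant colonies expected to be homologous recombinants.

    hr_frequency: fraction of integration events that are homologous (0..1).
    knockdown_efficiency: assumed fraction of random integrants that lose
        G418 resistance because the siRNA cassette is co-integrated (0..1).
    """
    if not 0.0 < hr_frequency <= 1.0:
        raise ValueError("hr_frequency must be in (0, 1]")
    if not 0.0 <= knockdown_efficiency <= 1.0:
        raise ValueError("knockdown_efficiency must be in [0, 1]")
    # Random integrants that escape knockdown remain G418 resistant.
    random_survivors = (1.0 - hr_frequency) * (1.0 - knockdown_efficiency)
    return hr_frequency / (hr_frequency + random_survivors)


# Hypothetical inputs: 1 homologous event per 1000 integrations,
# 90% of random integrants silenced.
baseline = targeted_fraction(0.001, 0.0)
enriched = targeted_fraction(0.001, 0.9)
print(f"colonies per targeted clone without knockdown: {1 / baseline:.0f}")
print(f"colonies per targeted clone with knockdown:    {1 / enriched:.0f}")
```

Under this model, the number of colonies to be screened per targeted clone falls roughly in proportion to the fraction of random integrants that escape knockdown.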
OVERVIEW
[1] Vector backbone
[0189] The combined mouse U6 promoter and siRNA templates (RNAi fragments) will be cloned into the KpnI site at the 5' end and the SacII site at the 3' end of the cloning sites in the pLOz and pOzIII vectors, as shown in Figures 1 and 2, respectively. These vectors were made using the pBluescript backbone, and cloning the PGK neo cassette into the SalI/XbaI site. Oligonucleotides comprising LoxP and FRT sites were cloned into the KpnI/SalI and XbaI/NotI sites.
[2] Construction fragments
A) U6 PROMOTER [0190] Mouse U6 promoter fragments (5' and 3') will be generated by PCR using the subcloned mouse U6 RNA gene as a template.
B) SIRNA [0191] Cloned only after the mU6 promoter. The 3' end of the mU6 promoter has been designed to allow the siRNA hairpin template to be cloned into a BbsI site, thereby allowing the coding sequence of the RNA hairpin to start at (+1) with the preferred G nucleotide. The siRNA hairpin design is as follows:
[0192] BbsI overhang - sense siRNA template - 9 bp loop - antisense siRNA template - terminator - restriction cloning sites
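By way of illustration only, the hairpin layout of paragraph [0192] can be sketched in Python. The function and constant names below are hypothetical, and the rule of dropping the leading AA of a 21-nucleotide target, together with the KpnI/NotI tail, is an assumption inferred from the 5' oligonucleotides listed in this example; the sketch is not presented as the procedure actually used to design them.

```python
# Minimal sketch (assumptions labelled): build a 5' hairpin siRNA template
# oligonucleotide pair from a 21-nt target of the form AA + 19 nt.
COMPLEMENT = str.maketrans("ACGT", "TGCA")

LOOP = "TTCAAGAGA"      # 9 bp loop, as it appears in the listed oligonucleotides
TERMINATOR = "TTTTT"    # RNA pol III terminator
KPNI = "GGTACC"         # KpnI site placed after the terminator


def revcomp(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def hairpin_oligos(target):
    """Return (top, bottom) oligonucleotides, both written 5' to 3'."""
    sense = target[2:]  # assumption: drop the leading AA so the 19-mer starts at +1
    # TTT plus the first base of the sense strand gives the 4-nt BbsI-compatible overhang
    top = "TTT" + sense + LOOP + revcomp(sense) + TERMINATOR + KPNI + "GC"
    # The bottom strand pairs with everything after the 4-nt overhang and
    # carries a GGCC 5' overhang compatible with a NotI-cut vector
    bottom = "GGCC" + revcomp(top[4:])
    return top, bottom


top, bottom = hairpin_oligos("AAGACCGACCTGTCCGGTGCC")  # siRNA target #2
print(top)
print(bottom)
```

Under these assumptions, siRNA target #2 yields the sequences listed for P232_03 and P232_04.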
c) AMP HOMOLOGY FRAGMENT
[0193] Contains new restriction sites for linearisation of the targeting vector. Cloned at the vector ScaI and PsiI sites.
5'mU6 PROMOTER
[0194] The mouse U6 RNA gene has already been successfully subcloned into the TOPO vector. The mouse U6 RNA gene was amplified from C57BL/6 genomic DNA with PCR primers: PmU6_01 5'-GCTCCACCCACATTGTCTAATCAC-3' [SEQ ID NO:12], PmU6_02 5'-GAACCAACTCCTTGTCCTCTTACG-3' [SEQ ID NO:13]. The PCR product was cloned into the pCR2.1-TOPO vector (Invitrogen) and named pCR2.1-TOPO_mU6. pCR2.1-TOPO_mU6 (see Figure 3) will be used as template for the following PCR:
PCR primers
[0195] P232_01 [SEQ ID NO:14]
[0196] 5' CAAACACTCGAGAGATCCGACGCCGCCATCTCTA (5' to 3': spacer, XhoI site, mU6 promoter homology)
[0197] P232_02 [SEQ ID NO:15]
[0198] 5' CAAACAGGATCCGAAGACCACAAACAAGGCTTTTCTCCAAGG (5' to 3': spacer, BamHI site, BbsI site, mU6 promoter homology)
[0199] Predicted size of PCR product: 349 bp
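The primer annotations above can be checked with a short Python sketch. It locates the annotated recognition sequences in each primer and derives the 4-nucleotide overhang that BbsI would leave on the promoter fragment. The helper names are hypothetical; the cut positions used for BbsI (2 nucleotides downstream of GAAGAC on the same strand and 6 nucleotides on the opposite strand) are the generally documented ones for this type IIS enzyme and are not taken from this specification.

```python
# Minimal sketch: find restriction sites in the PCR primers and compute the
# overhang left by BbsI on the amplified mU6 promoter fragment.
SITES = {"XhoI": "CTCGAG", "BamHI": "GGATCC", "BbsI": "GAAGAC"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")

P232_01 = "CAAACACTCGAGAGATCCGACGCCGCCATCTCTA"
P232_02 = "CAAACAGGATCCGAAGACCACAAACAAGGCTTTTCTCCAAGG"


def revcomp(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def find_sites(primer):
    """Map each enzyme whose site occurs in the primer to its 0-based start index."""
    return {name: primer.find(site) for name, site in SITES.items() if site in primer}


def bbsi_overhang(seq):
    """Return the 4-nt 5' overhang left downstream of the first BbsI site."""
    end_of_site = seq.index("GAAGAC") + 6
    # same-strand cut after 2 nt, opposite-strand cut after 6 nt
    return seq[end_of_site + 2:end_of_site + 6]


print(find_sites(P232_01))
print(find_sites(P232_02))
print(bbsi_overhang(P232_02), revcomp(bbsi_overhang(P232_02)))
```

The overhang on the promoter fragment is complementary to the TTTG overhang carried by the hairpin template oligonucleotides, which is consistent with the hairpin transcript starting at (+1) with a G.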
Cloning and PCR screen
[0200] The 5'mU6 fragment will be cloned into the XhoI and BamHI sites of pBluescript_II_SK (2.96 kb) to create the vector 232_pBSK_5'mU6 (3.25 kb) (Figure 4). Positive clones can be detected by PCR using P232_05 and M13rev, product 653 bp. Negative control = pBSK with same primers (product 373 bp).
siRNA TEMPLATE
Oligonucleotides
[0201] P232_03 [SEQ ID NO:16] 5' TTTGACCGACCTGTCCGGTGCCTTCAAGAGAGGCACCGGACAGGTCGGTCTTTTTGGTACCGC (5' to 3': BbsI overhang, siRNA template (sense), 9 bp loop, siRNA template (anti-sense), RNA pol III terminator, KpnI, NotI)
[0202] P232_04 [SEQ ID NO:17] 5' GGCCGCGGTACCAAAAAGACCGACCTGTCCGGTGCCTCTCTTGAAGGCACCGGACAGGTCGGT (5' to 3': NotI overhang, KpnI, RNA pol III terminator, complementary siRNA (anti-sense), 9 bp loop, complementary siRNA (sense))
Annealing [0203] The oligonucleotides are to be annealed to form the hairpin siRNA template:
5' TTTGACCGACCTGTCCGGTGCCTTCAAGAGAGGCACCGGACAGGTCGGTCTTTTTGGTACCGC 3'
3'     TGGCTGGACAGGCCACGGAAGTTCTCTCCGTGGCCTGTCCAGCCAGAAAAACCATGGCGCCGG 5'
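The annealing step can be checked in silico. The following Python sketch, offered only as an illustration, tests whether two oligonucleotides form a duplex and reports the single-stranded overhangs that result. The function name, the returned dictionary keys and the minimum duplex length of 20 bp are assumptions made for this sketch.

```python
# Minimal sketch: confirm that two oligonucleotides anneal and report overhangs.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def annealed_overhangs(top, bottom, min_core=20):
    """Return the overhangs of the annealed duplex, or None if no duplex forms.

    Both oligonucleotides are given 5' to 3'. min_core is a hypothetical
    minimum duplex length used only to avoid trivial matches.
    """
    rc = revcomp(bottom)  # bottom strand read in the direction of the top strand
    for offset in range(len(top)):
        core_len = min(len(top) - offset, len(rc))
        if core_len >= min_core and top[offset:offset + core_len] == rc[:core_len]:
            return {
                "top_5prime": top[:offset],
                "top_3prime": top[offset + core_len:],
                "bottom_5prime": revcomp(rc[core_len:]),
            }
    return None


P232_03 = "TTTGACCGACCTGTCCGGTGCCTTCAAGAGAGGCACCGGACAGGTCGGTCTTTTTGGTACCGC"
P232_04 = "GGCCGCGGTACCAAAAAGACCGACCTGTCCGGTGCCTCTCTTGAAGGCACCGGACAGGTCGGT"
print(annealed_overhangs(P232_03, P232_04))
```

For this pair the sketch reports a TTTG overhang on the top strand and a GGCC overhang on the bottom strand, matching the BbsI and NotI ends named in the oligonucleotide annotations.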
Cloning and PCR screen
[0204] The annealed oligonucleotides will be cloned into the BbsI and NotI sites of the 232_pBSK_5'mU6 vector to create 232_pBSK_5'mU6_siRNA (Figure 5). In doing so the BbsI site will be destroyed. Note that the oligonucleotides can only be cloned after the 5'mU6 promoter, since this fragment introduces the BbsI site.
[0205] For PCR screening, the M13 fwd and rev primers can be used. It is advisable to use the 232_pBSK_5'mU6 plasmid with these primers as a negative control (expected band size 492 bp) for size comparison. If the siRNA template has been successfully cloned then the PCR screen will give a product of 540 bp.
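The size comparison described in paragraph [0205] can be expressed as a simple rule. The Python sketch below is illustrative only; the function name and the 15 bp tolerance are hypothetical, chosen so that the two expected bands (492 bp and 540 bp) fall into non-overlapping windows, and would need to be adjusted to the resolution of the gel actually used.

```python
# Minimal sketch: classify a colony-PCR band by comparison with the expected
# positive (insert present) and negative (parental vector) product sizes.
def classify_colony(band_bp, positive_bp=540, negative_bp=492, tolerance_bp=15):
    """Return a label for an observed band size in base pairs.

    tolerance_bp is a hypothetical allowance for sizing error.
    """
    if abs(band_bp - positive_bp) <= tolerance_bp:
        return "insert present"
    if abs(band_bp - negative_bp) <= tolerance_bp:
        return "parental vector"
    return "unexpected"


for observed in (538, 495, 650):
    print(observed, classify_colony(observed))
```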
SUBCLONING 5'RNAi (KpnI SITE)
[0206] The 5'RNAi fragment (402 bp) will be cloned into the KpnI site of pLOz and pOzIII (see Figures 1 and 2, respectively) to yield the vectors 232_pLOz_5'RNAi and 232_pLOzIII_5'RNAi (see Figures 6 and 7, respectively).
3'mU6 PROMOTER
[0207] The same plasmid template as used for PCR amplification of the 5'mU6 promoter is used for PCR amplification of the 3'mU6 promoter. This PCR uses the same reverse primer as for the 5'mU6 promoter.
PCR primers
[0208] P232_06 [SEQ ID NO:18] 5' CAAACACTCGAGCCGCGGCCGCGATCCGACGCCGCCATCTCTA (5' to 3': spacer, XhoI, SacII, NotI, mU6 promoter homology)
[0209] P232_02 [SEQ ID NO:15] 5' CAAACAGGATCCGAAGACCACAAACAAGGCTTTTCTCCAAGG (5' to 3': spacer, BamHI, BbsI, mU6 promoter homology)
[0210] Predicted size of PCR product: 355 bp
Cloning and PCR screen
[0211] The 3'mU6 fragment will be cloned into the XhoI and BamHI sites of pBluescript_II_SK (2.96 kb) to create the vector 232_pBSK_3'mU6 (3.25 kb) (Figure 9). Positive clones can be detected by PCR using P232_05 and M13rev, product 662 bp. Negative control = pBSK with same primers (373 bp).
3'siRNA TEMPLATE
Oligonucleotides
[0212] P232_08 [SEQ ID NO:19]
5' TTTGCGAAACATCGCATCGAGCTTCAAGAGAGCTCGATGCGATGTTTCGCTTTTTCCGCGGAGCT (5' to 3': BbsI overhang, siRNA template (sense), 9 bp loop, siRNA template (anti-sense), RNA pol III terminator, SacII, SacI overhang)
[0213] P232_09 [SEQ ID NO:20]
5' CCGCGGAAAAAGCGAAACATCGCATCGAGCTCTCTTGAAGCTCGATGCGATGTTTCG (5' to 3': SacII, RNA pol III terminator, complementary siRNA (anti-sense), 9 bp loop, complementary siRNA (sense))
Annealing [0214] The oligonucleotides are to be annealed to form the hairpin siRNA template:
5' TTTGCGAAACATCGCATCGAGCTTCAAGAGAGCTCGATGCGATGTTTCGCTTTTTCCGCGGAGCT 3'
3'     GCTTTGTAGCGTAGCTCGAAGTTCTCTCGAGCTACGCTACAAAGCGAAAAAGGCGCC 5'
Cloning and PCR screen [0215] The annealed oligonucleotides will be cloned into the BbsI and SacI sites of the 232_pBSK_3'mU6 vector to create 232_pBSK_3'RNAi (Figure 10). In doing so the BbsI site will be destroyed.
[0216] For PCR screening, the M13 fwd and rev primers can be used. It is advisable to use the 232_pBSK_5'mU6 plasmid with these primers as a negative control (expected band size 492 bp) for size comparison. If the siRNA template has been successfully cloned then the PCR screen will give a product of 530 bp.
SUBCLONING 3'RNAi (SacII SITE)
[0217] Note that the 5' and 3' mU6_siRNA fragments can be cloned in either order into pLOz and pOzIII. The 3'mU6_siRNA fragment (377 bp) will be cloned into the SacII sites of pLOz and pOzIII to yield 232_pLOz_RNAi and 232_pLOzIII_RNAi, respectively (see Figures 10 and 11).
AMPR HOMOLOGY FRAGMENT [0218] Template is 232_pLOz or 232_pOzIII.
PCR primers
[0219] P232_13 [SEQ ID NO:21] 5' CAAACATTATAACGCGTCGCGAGTTTAAACAGGGATTTTGCCGATTTCG (5' to 3': spacer, PsiI, MluI, NruI, PmeI, Ampr homology)
[0220] P232_14 [SEQ ID NO:22] 5' TCACGCTCGTCGTTTGGTATG (Ampr homology)
Predicted PCR product: 785 bp
Cloning and PCR screen [0221] The Ampr homology fragment will be subcloned into the pPCR vector to produce the vector 232_pPCR_Amp_hom. Positive clones can be detected by PCR using P232_13/14 if required (785 bp product).
CLONING AMP HOMOLOGY FRAGMENT
[0222] The Amp homology fragment will be cloned into the SacI and PsiI sites of 232_pLOz_RNAi and 232_pOzIII_RNAi to create the vectors pLiN and pFLiN, respectively (Figures 13 and 14). The purpose is to add extra restriction sites for linearisation. This is the final cloning step.
EXAMPLE 2 Construction & testing 3 different Neo siRNA's for best gene knockdown result
[1] siRNA Neo TARGETS
[0223] siRNAs #1 and #2 will be used in construction of 232_pLiN and 232_pFLiN, with one at each of the 5' and 3' ends. The aim of these new constructs is to determine which of the 3 siRNAs works best at knocking down neomycin resistance by using the same siRNA at both the 5' and 3' ends.
Sequences of siRNAs
[0224] siRNA target #1 : 5'-AAGCGAAACATCGCATCGAGC [SEQ ID NO:23]
[0225] siRNA target #2 : 5'-AAGACCGACCTGTCCGGTGCC [SEQ ID NO:24]
[0226] siRNA target #3 : 5'-AAGAGCTTGGCGGCGAATGGG [SEQ ID NO:25]
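The three targets share a common form that can be verified programmatically. The Python sketch below is illustrative; the helper names are hypothetical, and the AA + 19-nucleotide reading of each target is an assumption inferred from the way the sense strand appears in the hairpin oligonucleotides. It checks that each 19-mer begins with G, in keeping with the stated preference for a G at (+1), and provides a GC-fraction helper as a general design aid.

```python
# Minimal sketch: sanity-check the three neo siRNA targets.
TARGETS = {
    1: "AAGCGAAACATCGCATCGAGC",
    2: "AAGACCGACCTGTCCGGTGCC",
    3: "AAGAGCTTGGCGGCGAATGGG",
}


def sense_19mer(target):
    """Return the 19-nt sense sequence of a 21-nt target of the form AA + N19."""
    if len(target) != 21 or not target.startswith("AA"):
        raise ValueError("expected a 21-nt target beginning with AA")
    return target[2:]


def gc_fraction(seq):
    """Return the fraction of G and C bases in a sequence."""
    return sum(base in "GC" for base in seq) / len(seq)


for number, target in TARGETS.items():
    sense = sense_19mer(target)
    print(number, sense, sense.startswith("G"), round(gc_fraction(sense), 2))
```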
[2] NEW VECTORS
[0227] a) 232_pLiN_II and 232_pFLiN_II. These vectors will have siRNA Neo target #1 at the 5' and 3' ends.
[0228] b) 232_pLiN_III and 232_pFLiN_III. These vectors will have siRNA Neo target #2 at the 5' and 3' ends.
[0229] c) 232_pLiN_IV and 232_pFLiN_IV. These vectors will have siRNA Neo target #3 at the 5' and 3' ends.
1. 232_pLiN_II & 232_pFLiN_II
A) OLIGONUCLEOTIDES [0230] P232_16 [SEQ ID NO:26]
5' TTTGCGAAACATCGCATCGAGCTTCAAGAGAGCTCGATGCGATGTTTCGCTTTTTGGTACCGC (5' to 3': BbsI overhang, siRNA template #1 (sense), 9 bp loop, siRNA template (anti-sense), terminator, KpnI, NotI)
[0231] P232_17 [SEQ ID NO:27] 5' GGCCGCGGTACCAAAAAGCGAAACATCGCATCGAGCTCTCTTGAAGCTCGATGCGATGTTTCG (5' to 3': NotI overhang, KpnI, terminator, complementary siRNA (anti-sense), 9 bp loop, complementary siRNA (sense))
B) ANNEALING
[0232] The oligonucleotides are to be annealed to form the hairpin siRNA #1 template:
5' TTTGCGAAACATCGCATCGAGCTTCAAGAGAGCTCGATGCGATGTTTCGCTTTTTGGTACCGC 3'
3'     GCTTTGTAGCGTAGCTCGAAGTTCTCTCGAGCTACGCTACAAAGCGAAAAACCATGGCGCCGG 5'
C) CLONING AND PCR SCREEN
[0233] The annealed oligonucleotides will be cloned into the BbsI and NotI sites of the 232_pBSK_5'mU6 vector to create 232_pBSK_5'RNAi#2 (see Figure 15). In doing so the BbsI site will be destroyed. For PCR screening, the M13 fwd and rev primers can be used. It is advisable to use the 232_pBSK_5'mU6 plasmid with these primers as a negative control (expected band size 492 bp) for size comparison. If the 5'siRNA#1 oligonucleotide has been successfully cloned then the PCR screen will give a product of 544 bp.
D) SUBCLONING 5'siRNA#2 (KpnI SITE) [0234] The 5'siRNA#2 fragment (393 bp) will be cloned into the KpnI site of 232_pLOz_3'RNAi and pOzIII_3'RNAi (from the original design) to create the vectors 232_pLiN_II (Figure 16) and 232_pFLiN_II (not shown), respectively.
2. pLiN_III & pFLiN_III
A) OLIGONUCLEOTIDES [0235] P232_18 [SEQ ID NO:28]
5' TTTGACCGACCTGTCCGGTGCCTTCAAGAGAGGCACCGGACAGGTCGGTCTTTTTCCGCGGAGCT (5' to 3': BbsI overhang, siRNA template #2 (sense), 9 bp loop, siRNA template #2 (anti-sense), RNA pol III terminator, SacII, SacI overhang)
[0236] P232_19 [SEQ ID NO:29]
5' CCGCGGAAAAAGACCGACCTGTCCGGTGCCTCTCTTGAAGGCACCGGACAGGTCGGT (5' to 3': SacII, RNA pol III terminator, complementary siRNA (anti-sense), 9 bp loop, complementary siRNA (sense))
B) ANNEALING [0237] The oligonucleotides are to be annealed to form the hairpin siRNA #2 template:
5' TTTGACCGACCTGTCCGGTGCCTTCAAGAGAGGCACCGGACAGGTCGGTCTTTTTCCGCGGAGCT 3'
3'     TGGCTGGACAGGCCACGGAAGTTCTCTCCGTGGCCTGTCCAGCCAGAAAAAGGCGCC 5'
C) CLONING AND PCR SCREEN [0238] The annealed oligonucleotides will be cloned into the BbsI and SacI sites of the 232_pBSK_3'mU6 vector to create 232_pBSK_3'RNAi#2 (Figure 17). In doing so the BbsI site will be destroyed. For PCR screening, the M13 fwd and rev primers can be used. It is advisable to use the 232_pBSK_5'mU6 plasmid with these primers as a negative control (expected band size 492 bp) for size comparison. If the 3'siRNA#2 oligonucleotide has been successfully cloned then the PCR screen will give a product of 530 bp.
D) SUBCLONING 3'siRNA#2 (SacII SITE)
[0239] The 3'siRNA#2 fragment (377 bp) will be cloned into the SacII site of 232_pLOz_5'RNAi and 232_pOzIII_5'RNAi (from the original design) to create the vectors 232_pLiN_III (Figure 18) and 232_pFLiN_III (not shown), respectively.
3. pLiN_IV & pFLiN_IV
A) 5' SIRNA #3 OLIGONUCLEOTIDES
[0240] P232_20 [SEQ ID NO:30]
5' TTTGAGCTTGGCGGCGAATGGGTTCAAGAGACCCATTCGCCGCCAAGCTCTTTTTGGTACCGC (5' to 3': BbsI overhang, siRNA template #3 (sense), 9 bp loop, siRNA template #3 (anti-sense), RNA pol III terminator, KpnI, NotI)
[0241] P232_21 [SEQ ID NO:31]
5' GGCCGCGGTACCAAAAAGAGCTTGGCGGCGAATGGGTCTCTTGAACCCATTCGCCGCCAAGCT (5' to 3': NotI overhang, KpnI, RNA pol III terminator, complementary siRNA (anti-sense), 9 bp loop, complementary siRNA (sense))
B) ANNEALING
[0242] The oligonucleotides are to be annealed to form the hairpin 5' siRNA #3 template:
5' TTTGAGCTTGGCGGCGAATGGGTTCAAGAGACCCATTCGCCGCCAAGCTCTTTTTGGTACCGC 3'
3'     TCGAACCGCCGCTTACCCAAGTTCTCTGGGTAAGCGGCGGTTCGAGAAAAACCATGGCGCCGG 5'
C) CLONING AND PCR SCREEN [0243] The annealed oligonucleotides will be cloned into the BbsI and NotI sites of the 232_pBSK_5'mU6 vector to create 232_pBSK_5'RNAi#3 (Figure 19). In doing so the BbsI site will be destroyed. For PCR screening, the M13 fwd and rev primers can be used. It is advisable to use the 232_pBSK_5'mU6 plasmid with these primers as a negative control (expected band size 492 bp) for size comparison. If the 5'siRNA#3 oligonucleotide has been successfully cloned then the PCR screen will give a product of 540 bp.
D) SUBCLONING 5' siRNA#3 (KpnI SITE)
[0244] The 5'siRNA#3 fragment (389 bp) will be cloned into the KpnI site of 232_pLOz and 232_pOzIII. This cloning step will create the vectors 232_pLOz_5'RNAi#3 (Figure 20) and pOz_III_5'RNAi#3 (not shown), respectively.
E) 3' SIRNA #3 OLIGONUCLEOTIDES
[0245] P232_22 [SEQ ID NO:32]
5' TTTGAGCTTGGCGGCGAATGGGTTCAAGAGACCCATTCGCCGCCAAGCTCTTTTTCCGCGGAGCT (5' to 3': BbsI overhang, siRNA template #3 (sense), 9 bp loop, siRNA template #3 (anti-sense), RNA pol III terminator, SacII, SacI overhang)
[0246] P232_23 [SEQ ID NO:33]
5' CCGCGGAAAAAGAGCTTGGCGGCGAATGGGTCTCTTGAACCCATTCGCCGCCAAGCT (5' to 3': SacII, RNA pol III terminator, complementary siRNA #3 (anti-sense), 9 bp loop, complementary siRNA #3 (sense))
F) ANNEALING
[0247] The oligonucleotides will be annealed to form the hairpin 3' siRNA#3 template:
5' TTTGAGCTTGGCGGCGAATGGGTTCAAGAGACCCATTCGCCGCCAAGCTCTTTTTCCGCGGAGCT 3'
3'     TCGAACCGCCGCTTACCCAAGTTCTCTGGGTAAGCGGCGGTTCGAGAAAAAGGCGCC 5'
G) CLONING AND PCR SCREEN
[0248] The annealed oligonucleotides will be cloned into the BbsI and SacI sites of the 232_pBSK_3'mU6 vector to create 232_pBSK_3'RNAi#3 (Figure 21). In doing so the BbsI site will be destroyed. For PCR screening, the M13 fwd and rev primers can be used. It is advisable to use the 232_pBSK_5'mU6 plasmid with these primers as a negative control (expected band size 492 bp) for size comparison. If the 3'siRNA#3 oligonucleotide has been successfully cloned then the PCR screen will give a product of 530 bp.
H) SUBCLONING 3'siRNA#3 (SacII SITE)
[0249] The 3'siRNA#3 fragment (377 bp) will be cloned into the SacII site of 232_pLOz_5'RNAi#3 and 232_pOzIII_5'RNAi#3 to create the vectors 232_pLiN_IV (Figure 22) and 232_pFLiN_IV (not shown), respectively.
[0250] The schematic shown in Figure 23 shows how the inclusion of the mU6-siRNA cassettes in the construct prepared according to Examples 1 and 2 aids in enriching for correctly targeted clones. In the targeted locus (3), homologous recombination excludes the mU6-siRNA cassette, so the siRNA is not expressed and the neo mRNA transcripts are not knocked down in these cells, which therefore survive selection. By contrast, for random integrations of the targeting vector (4), the mU6-siRNA cassette will also integrate into the mouse genome and siRNAs will be expressed. The presence of siRNAs will cause degradation of the neo mRNA transcripts, and cause these ES cells to die under G418 selection.
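The enrichment described above can be put in numerical terms. If a fraction f of integration events are homologous and the siRNA knockdown eliminates a fraction k of randomly integrated clones under G418, the proportion of surviving colonies that are correctly targeted is f / (f + (1 - f)(1 - k)). The Python sketch below evaluates this expression; the function name is hypothetical and the values of k are purely illustrative, since no knockdown efficiency is reported in this specification. The value f = 1/1000 lies within the 1/100 to 1/5000 range cited in the Background.

```python
# Minimal sketch: expected proportion of correctly targeted clones among
# G418-resistant survivors, for assumed (hypothetical) knockdown efficiencies.
def targeted_fraction(hr_frequency, knockdown_kill_fraction):
    """Return the fraction of surviving colonies that are correctly targeted."""
    targeted = hr_frequency
    random_survivors = (1.0 - hr_frequency) * (1.0 - knockdown_kill_fraction)
    total = targeted + random_survivors
    if total == 0:
        raise ValueError("no surviving colonies under these assumptions")
    return targeted / total


for kill in (0.0, 0.9, 0.99):  # illustrative values only
    print(kill, targeted_fraction(1 / 1000, kill))
```

With no knockdown the targeted proportion equals the underlying recombination frequency; as k approaches 1, nearly every surviving colony is a targeted clone, which is the basis for the reduced screening burden.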
EXAMPLE 3 Construction of Cre-mediated neo excisable targeting construct
[0251] A cassette comprising a nucleotide sequence encoding a Cre recombinase-estrogen receptor ligand binding domain fusion protein operably linked to an SV40 promoter (the SV40-Cre-ERT cassette) will be cloned into the SacII site of the 232_pOzIII vector to yield pABU (see Figure 24). This cassette is tamoxifen inducible. In random integration events it will be incorporated into the genome and will therefore cause deletion of the region between the two LoxP sites, i.e., PGK-neo. As a consequence these cells will die under G418 selection, reducing the number of ES colonies to be screened.
[0252] The schematic depicted in Figure 25 shows how the inclusion of the SV40-Cre-ERT cassette in a targeting vector of the invention aids in enriching for correctly targeted clones. In the targeted locus (3), homologous recombination excludes the SV40-Cre-ERT cassette, and so Cre is not expressed in these cells, thereby permitting the cells to survive selection. By contrast, for random integrations of the targeting vector (4), the tamoxifen inducible Cre cassette will also integrate into the mouse genome and will be expressed. The presence of Cre will excise the floxed region containing the PGK-neo cassette, thereby removing the cassette from the genome and causing the ES cells to die under G418 selection.
EXAMPLE 4 Construction of switch expressible negative selectable marker targeting construct I
[0253] A cassette comprising a nucleotide sequence encoding Cre recombinase operably linked to a promoter (e.g., SV40 promoter) will be cloned upstream and downstream of the homology arms of a targeting cassette of interest. The targeting cassette comprises a neo gene expression cassette for positive selection and a negative selectable marker gene (e.g., a thymidine kinase gene) that is downstream of a promoter (e.g., 5mUB promoter) and a transcriptional terminator that is flanked by LoxP sites and that blocks expression of the negative selectable marker gene from the promoter. Thus, in random integration events, the Cre recombinase cassettes will be incorporated and will cause deletion of the region between the two LoxP sites i.e., the transcriptional terminator, and expression of the negative selectable marker gene, thereby causing cell death. By contrast, homologous recombination will exclude the Cre recombinase cassettes, thereby allowing blockage of expression of the negative selectable marker gene and permitting survival of the cells under G418 selection. A schematic illustrating this construct is shown in Figure 26.
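The switch logic of this example can be modelled in a few lines. The following Python sketch represents the negative selection cassette as an ordered list of elements, removes the segment between the two LoxP sites when Cre is present, and reports whether the cell would survive. The element labels and function names are hypothetical, and the model assumes that the neo cassette is expressed in both cases and that the selection regime includes the agent that makes thymidine kinase (TK) expression lethal.

```python
# Minimal sketch: Cre-mediated removal of a floxed transcriptional terminator
# and its consequence for negative selection.
def cre_excise(elements):
    """Delete the segment between the first two loxP sites, leaving one loxP."""
    first = elements.index("loxP")
    second = elements.index("loxP", first + 1)
    return elements[:first + 1] + elements[second + 1:]


def tk_expressed(elements):
    """TK is transcribed only if no terminator lies between promoter and TK."""
    promoter = elements.index("promoter")
    tk = elements.index("TK")
    return "terminator" not in elements[promoter:tk]


def cell_survives(integration):
    """Return True if a cell with the given integration type survives selection."""
    cassette = ["promoter", "loxP", "terminator", "loxP", "TK"]
    if integration == "random":
        # the flanking Cre cassettes co-integrate, so Cre is expressed
        cassette = cre_excise(cassette)
    return not tk_expressed(cassette)


for kind in ("homologous", "random"):
    print(kind, cell_survives(kind))
```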
EXAMPLE 5
Construction of switch expressible negative selectable marker targeting construct II
[0254] A cassette comprising a nucleotide sequence encoding Cre recombinase operably linked to a promoter (e.g., SV40 promoter) will be cloned upstream and downstream of the homology arms of a targeting cassette of interest. The targeting cassette comprises a neo gene expression cassette for positive selection and a negative selectable marker gene (e.g., a thymidine kinase gene) comprising a downstream portion, an upstream portion that is operably connected to a promoter (e.g., 5mUB promoter) and that is incapable of conferring cell death in the absence of the downstream portion, and a transcriptional terminator interposed between the upstream and downstream portions, which is flanked by LoxP sites and which blocks expression of the downstream portion. Thus, in random integration events, the Cre recombinase cassettes will be incorporated and will cause deletion of the region between the two LoxP sites, i.e., the transcriptional terminator, and expression of a complete negative selectable marker gene, thereby causing cell death. By contrast, homologous recombination will exclude the Cre recombinase cassettes, thereby allowing blockage of expression of the downstream portion and permitting survival of the cells under G418 selection.
EXAMPLE 6 Construction of a trans-activator-controlled thymidine kinase gene targeting construct
[0255] A transactivation cassette comprising a promoter (e.g., SV40 promoter) operably connected to a nucleotide sequence that encodes a trans-activator protein will be cloned upstream and downstream of the homology arms of a targeting cassette of interest that comprises a neo gene expression cassette for positive selection and a thymidine kinase gene expression cassette for negative selection. The trans-activator protein comprises a transcriptional activation domain (e.g., the acidic transactivation domain (TAD) of HSV1-VP16) and a DNA-binding domain (e.g., the DNA-binding domain of the Gal4 protein). The promoter driving expression of the thymidine kinase gene contains a binding site (e.g., 5'-CGGACAACTGTTGACCG-3' [SEQ ID NO:4]) for the DNA-binding domain of the trans-activator protein. Thus, the transactivation cassettes of the resulting construct will be incorporated in random integration events, thereby causing activation of expression of the thymidine kinase gene and cell death. By contrast, homologous recombination will exclude the transactivation cassettes, thereby preventing expression of the thymidine kinase gene and permitting survival of the cells under G418 selection.
EXAMPLE 7 Construction of Cre-mediated neo excisable targeting construct
[0256] A tamoxifen inducible Cre-ERT2 cassette is operably linked to a eukaryotic promoter (e.g., PGK) and positioned externally of the homology arms, as shown in Figure 29. In random integration events of the targeting vector, this cassette will also be incorporated into the genome of the host cell. Inducible expression of Cre will cause excision of the positive selection cassette, thereby causing cell death. It is important to note that there can be leaky expression of the Cre recombinase in bacteria, which is undesirable for the construction of the targeting vector since the neo cassette must remain intact within the vector at all times. Therefore the neo cassette must contain, in addition to a eukaryotic promoter (e.g., PGK) for neomycin phosphotransferase expression in ES cells, a bacterial promoter (e.g., EM7) for kanamycin resistance in bacteria. The kanamycin resistance will aid in selection of complete targeting vectors as opposed to those in which the floxed positive selection cassette has been excised. Nonetheless, it is important to sequence the complete vector to verify the integrity of the Cre cassette and loxP sites, since mutations in either of these can also be selected for in the bacteria.
[0257] The disclosure of every patent, patent application, and publication cited herein is hereby incorporated herein by reference in its entirety.
[0258] The citation of any reference herein should not be construed as an admission that such reference is available as "Prior Art" to the instant application.
[0259] Throughout the specification the aim has been to describe the preferred embodiments of the invention without limiting the invention to any one embodiment or specific collection of features. Those of skill in the art will therefore appreciate that, in light of the instant disclosure, various modifications and changes can be made in the particular embodiments exemplified without departing from the scope of the present invention. All such modifications and changes are intended to be included within the scope of the appended claims.