
CA3032535A1 - Method to amplify dna sequences from degraded sources - Google Patents


Info

Publication number
CA3032535A1
Authority
CA
Canada
Prior art keywords
primers
dna
amplicons
sequence
primer
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Abandoned
Application number
CA3032535A
Other languages
French (fr)
Inventor
Sean PROSSER
Paul Hebert
Jeremy DEWAARD
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
University of Guelph
Original Assignee
University of Guelph
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by University of Guelph
Publication of CA3032535A1


Classifications

    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q1/00 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
    • C12Q1/68 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
    • C12Q1/6869 Methods for sequencing
    • C12Q1/6844 Nucleic acid amplification reactions
    • C12Q1/6848 Nucleic acid amplification reactions characterised by the means for preventing contamination or increasing the specificity or sensitivity of an amplification reaction

Landscapes

  • Chemical & Material Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Organic Chemistry (AREA)
  • Proteomics, Peptides & Aminoacids (AREA)
  • Engineering & Computer Science (AREA)
  • Zoology (AREA)
  • Wood Science & Technology (AREA)
  • Health & Medical Sciences (AREA)
  • Biophysics (AREA)
  • Chemical Kinetics & Catalysis (AREA)
  • Immunology (AREA)
  • Microbiology (AREA)
  • Molecular Biology (AREA)
  • Analytical Chemistry (AREA)
  • Physics & Mathematics (AREA)
  • Biotechnology (AREA)
  • Biochemistry (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Genetics & Genomics (AREA)
  • Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)

Abstract

A two stage nested multiplex PCR method is described to amplify DNA sequences from degraded specimens. The method of the invention is for recovery of full length DNA from specimens, whereby the specimen contains degraded DNA. Such specimens may be type specimens and the target DNA may be a DNA barcode for recognizing known species and for discovery of species yet to be named.

Description

Method to Amplify DNA Sequences from Degraded Sources

Field of the Invention

The invention relates to a method to amplify DNA sequences from degraded sources using a combination approach involving NGS (next generation sequencing). More specifically, the method of the invention is a two-stage multiplex PCR (polymerase chain reaction) and NGS approach for recovery of full length DNA from specimens, whereby the specimen contains degraded DNA. Such specimens may be type specimens and the target DNA may be a DNA barcode for recognizing known species and for discovery of species yet to be named. The invention further relates to kits and systems for carrying out such method.
Background of the Invention

Type specimens have high scientific importance because they provide the only certain connection between the application of a Linnean name and a physical specimen.
Many individuals may have been identified as a particular species, but their linkage to the taxon concept is inferential. Because type specimens are often more than a century old and have experienced conditions unfavorable for DNA preservation, success in sequence recovery has been uncertain.
The immense repositories of identified specimens in the world's natural history museums provide the opportunity to construct a DNA barcode reference library that can subsequently be used to identify newly collected specimens [1,2]. However, the scientific value of this library would be greatly enhanced if each species were represented by sequences from its type material, particularly the holotype. Without such information, there are many cases in which the correct application of taxon names is uncertain. For example, the analysis of type(s) is critical when the study of modern specimens suggests synonymy (e.g. [3]) or when it indicates that a long-known species is actually a complex of two or more morphologically similar taxa (e.g. [4]). The recovery of a barcode sequence from type material is also essential when it represents the only known record(s) for a taxon, a situation that is surprisingly common [5].
Early studies have recovered sequence information from museum specimens, including beetles [6,7], flies [8,9,10], true bugs [11], and moths [12,13]. Some of these investigations analyzed specimens that were relatively young (<50 years), while others extracted DNA from whole specimens. However, Hausmann et al [12] and Rougerie et al [14] recovered barcode sequences from a single leg of type specimens more than 100 years old with a protocol that required six PCRs and twelve sequencing reactions (see details in [15]).
Strutzenberger et al [16] reduced costs by processing specimens in batches of 95, but the basic protocol was unchanged, requiring substantial template DNA and careful inspection of data to ensure that contamination among wells had not produced chimeric sequences. As well, the failure of any single reaction led to an incomplete sequence for the barcode region.
Prior studies have often encountered difficulty in recovering sequence information from old museum specimens because of DNA degradation [17,18]. While protocols have improved, there are still important constraints [12,13,15,16]. Past studies have generally employed several PCR reactions to generate a set of short amplicons that were Sanger sequenced and assembled into a barcode record. When many amplification reactions are required, as in cases where difficulties in primer binding are encountered, template can be depleted before sequence is recovered. There is no easy solution because DNA extracts are small (<50 µL) and concentrations are low (typically <0.5 pg/µL) so dilution is rarely feasible [4,14]. As a consequence, sequence recovery from many type specimens is not currently possible.
Next-generation sequencers (NGS) are increasingly used for studies on both freshly collected and museum specimens [e.g. 19]. Work on fresh specimens has shown that the barcode region can be recovered from hundreds of individuals at a time by using multiplex identifier (MID) tags to associate the sequence records from each specimen [20,21]. However, there are still issues with preferential amplification of certain fragments and with inefficient amplification that prevents the full target sequence from being sequenced. Taken together with the challenges of sequencing very small specimens that contain heavily degraded DNA, it is desirable to provide a method that overcomes at least one disadvantage of known sequencing protocols.
Summary of the Invention

The present invention provides methods, systems and kits that are useful for recovering sequences from degraded DNA present in a sample. When maintained within optimal archival conditions, DNA is highly stable and predicted to be viable for several millennia. Within the ambient environment however, or when exposed to particular stressors such as extreme heat, desiccation, irradiation, or known mutagenic compounds, genomic DNA breaks down rapidly and severely. For various applications and settings, this can prohibit genetic analyses when the quantity and quality of remaining DNA falls below the sensitivity thresholds of current analytical equipment and procedures. Specimens held in biomedical or natural history collections degrade rapidly over time, particularly when stored in compounds such as formalin, paraffin, or low concentration ethanol; forensic cases and environmental samples involving trace quantities (i.e. 'eDNA') can be inhibited by ultraviolet exposure or diluted beyond detection; and processed and manufactured animal and timber products may endure severe temperatures and desiccation, rendering the DNA (and source organism) imperceptible.
The present invention has been made to solve at least one foregoing problem of the prior art and therefore an aspect of the present invention is to provide a method for amplifying and characterizing DNA sequences from small sample amounts containing degraded DNA in an efficient, rapid and economical manner.
In aspects the method comprises a two stage nested multiplex PCR/NGS approach that is effective for amplification of a desired DNA sequence from a small sample of degraded DNA resulting in efficient, unbiased co-amplification of fragments spanning a desired gene region, in aspects a barcode region of a gene.
The present method has the advantage of requiring very little template DNA and providing protection against the failure of any particular amplification reaction, owing to the novel initial two-step multiplex PCR approach. The method uses relatively few primers; however, the primers are allowed to pair in any combination as opposed to being restricted to specific pairs, all while avoiding common pitfalls (e.g. overlap amplification, primer incorporation and primer dimer sequencing). This is accomplished without the use of special enzymes beyond standard polymerases.
In one aspect, the invention relates to target-specific primers and compositions comprising such primers useful for the selective amplification of one or more target sequences associated with a barcode region in degraded DNA.
Primers are selected with respect to the target sequence to be amplified and the condition of the degraded DNA, that is, primers of about 150bp or more can be utilized to target degraded DNA. However it is understood by one of skill in the art that primers can be designed shorter in order to be able to target shorter segments of degraded DNA where there is limited sequence. The method of the invention can be used to detect and amplify highly degraded DNA
in specimens where even no DNA could be detected by other methods.
The present method may recover full-length barcodes from type specimens with heavily degraded DNA by employing a two-step multiplex PCR to generate short amplicons covering the barcode region and then using NGS for their characterization, i.e. sequencing. In this manner the entire barcode region of a desired gene from a small specimen containing degraded DNA can be characterized.
The method of the invention is scalable and widely applicable, that is, it has taxonomic breadth. The method encompasses amplification of DNA over a wide variety of diverse animal groups. It has been scaled to work on 96 samples simultaneously with good success rates, and may be scaled further to several hundred samples simultaneously.
The method of the invention can be used with various conditions of DNA
degradation (e.g. samples decades to centuries old, formalin-fixed, fluid-preserved, or processed) and still lead to successful DNA amplification and in aspects, barcode recovery.
As the method is quick and cost-effective to sequence degraded DNA from very limited sources, this method has good potential in a variety of areas for example to researchers, food safety officials, forensic investigators, wildlife enforcement officers, biomedical technicians and so forth.
The effectiveness of the present method has been validated by recovering sequences from century-old specimens of Lepidoptera, including those where Sanger analysis completely failed. Importantly, in aspects, this two stage multiplex PCR/NGS
method escapes problems that often confront Sanger analysis, such as uncertain primer binding, amplification bias, and/or the need for large amounts of template DNA.
According to an aspect of the invention there is provided a method comprising a two-step multiplex PCR followed by NGS to sequence degraded DNA.
According to an aspect of the invention the method comprises two stages, one stage involving a two-step multiplex PCR and the other stage comprising NGS to recover/characterize the sequence of a barcode region in the sample comprising degraded DNA. In aspects the barcode region is of the cytochrome c oxidase I gene.
According to another aspect of the invention there is provided a method comprising multiplex nested PCR to form a plurality of amplicons from a degraded DNA
source. NGS is then utilized to recover the sequence from the plurality of degenerate amplicons generated by the two stage multiplex nested PCR, in aspects to characterize a barcode region of a gene.
According to another aspect of the invention there is provided a method comprising multiplex nested PCR, the method comprising performing a two-stage nested PCR
on a sample containing degraded DNA. The present invention is based, in part, on the novel use of two stages of specific hybridization between a homologous region in a probe and the complementary sequence in a nucleic acid template of the degraded DNA, each of which is followed by extension of the probe by DNA synthesis. The second stage utilizes the products of the first stage as a template.
In aspects, the method of the invention substantially reduces the formation of spurious reaction products in multiplex amplification reactions of large numbers of specific degraded nucleic acid sequences.
In aspects the present invention provides novel compositions useful in substantially reducing the formation of spurious reaction products in two part multiplex amplification reactions of large numbers of specific nucleic acid sequences from degraded DNA.
According to another aspect of the invention there is provided a multiplex PCR
assay mixture for amplification of a target degraded DNA, the mixture comprising a combination of a plurality of primer sets wherein a number of the primer sets are nested. In aspects, a number of the primer sets are 10bp and adapter-tailed primers. In further aspects, the primers (forward and reverse) include degeneracy at sites important for primer binding, i.e. 3' terminus for forward primer and 5' terminus for reverse primer, such that 12 forward and reverse primers provide a composition comprising 2010 primers.
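To illustrate how degeneracy multiplies the number of distinct oligonucleotides in a primer composition, the following minimal sketch expands a degenerate primer into every concrete sequence it encodes, using the standard IUPAC ambiguity codes. The example primer is hypothetical and is not taken from the primers defined in Table 4.

```python
from itertools import product

# Standard IUPAC nucleotide ambiguity codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT",
    "K": "GT", "M": "AC", "B": "CGT", "D": "AGT",
    "H": "ACT", "V": "ACG", "N": "ACGT",
}


def degeneracy(primer: str) -> int:
    """Number of concrete sequences encoded by a degenerate primer."""
    count = 1
    for base in primer.upper():
        count *= len(IUPAC[base])
    return count


def expand_degenerate(primer: str) -> list[str]:
    """List every concrete sequence encoded by a degenerate primer."""
    choices = [IUPAC[base] for base in primer.upper()]
    return ["".join(combo) for combo in product(*choices)]


# Hypothetical primer for illustration only (R = A/G, Y = C/T).
example = "ACGTRY"
print(degeneracy(example), expand_degenerate(example))
```

Summing `degeneracy` over all primers in a mixture gives the total number of distinct primer sequences that the composition contains.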
According to an aspect of the invention there is provided a two stage method for obtaining a full length barcode sequence from specimens with degraded DNA, the method comprising two step multiplex nested PCR utilizing primers that span the entire barcode sequence that can pair in any combination to generate a plurality of amplicons while avoiding overlap amplification, primer incorporation and/or primer dimer sequencing;
and NGS for sequencing the plurality of amplicons generated by the two step multiplex nested PCR and providing the barcode sequence.
In aspects the two step multiplex nested PCR utilizes primers that target non-adjacent fragments of the target sequence in each of the steps. Furthermore, the primers in the first step are designed such that undesired elongation is blocked (in one aspect are non-tailed) and are selected further to be paired with the next downstream reverse primers. The primers in the second step are adapter-tailed primers and may further incorporate a MID tag.
According to an aspect of the invention there is provided a method to generate redundant amplicons for a target DNA sequence of degenerated DNA, the method comprising:
(a) performing a first multiplex nested PCR using a plurality of primers that hybridize to portions of the target DNA sequence while blocking undesired elongation to form a plurality of amplicons, wherein forward primers are selected with all downstream reverse primers to produce amplicon redundancy;
(b) using the amplicon products of (a) as a template, performing a second multiplex nested PCR comprising a plurality of adapter-tailed primers with optional MID tags that hybridize to the amplicon products of (a);
(c) repeating step (a) and then (b); and
(d) pooling the products from (c).
In aspects the method then further comprises performing next generation sequencing on the pooled products from (d). The pooled products from (d) are optionally cleaned to remove any undesired genomic DNA, primer dimers and/or residual primers.
In aspects, blocking of undesired elongation in the first step of multiplex PCR can be achieved through various mechanisms, such as the use of non-complementary tails on the PCR1 primers or the use of any type of agent that blocks elongation from the 5' end of the primers, i.e. chemical conjugation.
According to another aspect of the present invention is a method for amplifying a barcode region from the cytochrome c oxidase 1 gene (COI) from a small specimen of degraded DNA using multiplex PCR, the method comprising:
- extracting the degraded DNA to provide a linear template;
- performing first multiplex nested PCR1 using a plurality of forward primers and downstream reverse primers that include degeneracy and hybridize to regions of said barcode region and simultaneously blocking undesired elongation such that a plurality of amplicons is created;
- performing a second multiplex PCR2 using the multiple amplicons generated from the first PCR1 reaction as a template using adapter-tailed primers that hybridize to portions of said amplicons;
- pooling all amplicon products; and
- performing next generation sequencing on the pooled amplicon products to determine the barcode sequence.
The multiplex PCR described herein is desirably performed under suitable conditions for hybridization.
According to an aspect of the invention is a method for detection and identification of a barcode region of the COI gene in a small specimen containing degraded DNA to identify the taxonomic classification of said specimen, the method comprising:
- extracting linear degraded DNA from said specimen;
- performing two step multiplex nested PCR on said linear degraded DNA using primers that hybridize to said barcode region to create a plurality of redundant amplicons spanning the barcode region of the COI gene;
- performing next generation sequencing on said redundant amplicons to provide a sequence of the barcode region of the COI gene; and
- classifying said specimen.
According to another aspect of the invention is a kit for performing multiplex nested PCR on a small specimen comprising degraded DNA in order to determine the barcode region of the COI gene and thus classify the specimen, the kit comprising; primers specific for said barcode region of said COI gene, suitable buffers, reaction nucleotides, enzymes, optional stabilizers and instructions for use. In aspects, kits can be designed for any specimen type depending on the target gene of interest for amplification and sequencing.
According to another aspect, there is provided a method for amplifying degraded DNA, the method comprising:
amplifying the degraded DNA in a PCR1 reaction in at least two separate reaction vessels using pairs of nested forward and reverse primers, wherein the two reaction vessels comprise different combinations of the forward and reverse primers to produce a plurality of redundant amplicons; and
amplifying the redundant amplicons in a PCR2 reaction using one reaction vessel per forward primer, wherein each forward primer is mixed with a different combination of reverse primers.
In an aspect, the forward and reverse primers in the PCR1 reaction comprise block elongation moieties to block elongation from the 5' end of the primers.
In an aspect, the block elongation moieties comprise non-complementary tails.
In an aspect, the method comprises from about 2 to about 10 forward primers and from about 2 to about 10 reverse primers.
In an aspect, the method comprises 6 forward primers (F1, F2, F3, F4, F5, and F6) and 6 reverse primers (R1, R2, R3, R4, R5, and R6).
In an aspect, for PCR1, F1, F3, and F5 are paired with R1, R2, R3, R4, R5, and R6 and F2, F4, and F6 are paired with R1, R2, R3, R4, and R5.
In an aspect, for PCR2, F1 is paired with R1, R2, and R3; F2 is paired with R2, R3, and R4; F3 is paired with R3, R4, and R5; F4 is paired with R5 and R6; and F6 is paired with R6.
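The six-primer pairing scheme above can be tabulated as plain data. The sketch below encodes only the pairings stated in the text and enumerates the forward/reverse combinations present in each reaction. The vessel labels "PCR1a" and "PCR1b" are borrowed here as convenient names, and the helper functions are hypothetical. The presence of a primer combination in a vessel does not imply that every combination yields a product.

```python
from itertools import product

REVERSE_ALL = ["R1", "R2", "R3", "R4", "R5", "R6"]

# PCR1: two reactions, each mixing several forward primers with a reverse set,
# exactly as listed in the text.
PCR1 = {
    "PCR1a": (["F1", "F3", "F5"], REVERSE_ALL),
    "PCR1b": (["F2", "F4", "F6"], REVERSE_ALL[:5]),  # R1 to R5
}

# PCR2: one reaction per forward primer. The text states no pairing for F5,
# so none is encoded here.
PCR2 = {
    "F1": ["R1", "R2", "R3"],
    "F2": ["R2", "R3", "R4"],
    "F3": ["R3", "R4", "R5"],
    "F4": ["R5", "R6"],
    "F6": ["R6"],
}


def pcr1_combinations(reaction: str) -> list[tuple[str, str]]:
    """All forward/reverse primer combinations present in one PCR1 vessel."""
    forward, reverse = PCR1[reaction]
    return list(product(forward, reverse))


def pcr2_combinations() -> list[tuple[str, str]]:
    """All forward/reverse primer combinations across the PCR2 vessels."""
    return [(fwd, rev) for fwd, reverses in PCR2.items() for rev in reverses]


for name in PCR1:
    print(name, len(pcr1_combinations(name)))
print("PCR2", len(pcr2_combinations()))
```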
In an aspect, the primers for PCR2 comprise adapter tailed primers for sequencing.
In an aspect, the primers are degenerate.
According to an aspect, there is provided a method for sequencing degraded DNA, the method comprising amplifying redundant amplicons such that each region of the target DNA
sequence is covered by multiple amplicons, wherein the generation of specific amplicons is determined automatically by a combination of primer-template matching and the pattern of DNA degradation in the target sequence.
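One way to check that redundant amplicons cover each region of a target more than once is to compute per-base coverage from amplicon coordinates. The sketch below is illustrative only: the amplicon intervals are hypothetical, coordinates are assumed to be 0-based with an exclusive end, and the 658 bp target length corresponds to the full-length barcode mentioned in the figure descriptions.

```python
def per_base_coverage(amplicons, target_length):
    """Count how many amplicons cover each position of the target."""
    coverage = [0] * target_length
    for start, end in amplicons:  # 0-based start, exclusive end
        for pos in range(max(start, 0), min(end, target_length)):
            coverage[pos] += 1
    return coverage


def uncovered_regions(coverage):
    """Return (start, end) intervals, end exclusive, with zero coverage."""
    gaps, gap_start = [], None
    for pos, depth in enumerate(coverage):
        if depth == 0 and gap_start is None:
            gap_start = pos
        elif depth > 0 and gap_start is not None:
            gaps.append((gap_start, pos))
            gap_start = None
    if gap_start is not None:
        gaps.append((gap_start, len(coverage)))
    return gaps


# Hypothetical amplicon coordinates on a 658 bp target.
amplicons = [(0, 150), (100, 300), (250, 450), (400, 600)]
coverage = per_base_coverage(amplicons, 658)
print(uncovered_regions(coverage))
```

With these made-up intervals, positions covered by two amplicons survive the failure of either one, while the reported gap marks a region that no amplicon reaches.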
In accordance with an aspect, there is provided a method of amplifying a barcode region of a degraded DNA sample, the method comprising:
performing at least a PCR1a reaction and a PCR1b reaction utilizing a plurality of forward and reverse primers, respectively yielding a PCR1a complement of amplicons and a PCR1b complement of amplicons, wherein the plurality of forward primers comprise primers F1, F2, ..., Fn, in order from upstream to downstream of the target sequence, wherein n is a whole number;
wherein the plurality of reverse primers comprise primers R1, R2, ..., Rm, in order from upstream to downstream of the target sequence, wherein m is a whole number;
wherein the plurality of reverse primers are downstream of F1 and the plurality of forward primers are upstream of Rn;
wherein the PCR1a reaction comprises each odd-numbered forward primer starting with F1 and further comprises all or substantially all of the reverse primers;
and wherein the PCR1b reaction comprises each even-numbered forward primer starting with F2 and further comprises all or substantially all of the reverse primers that are upstream of F2.
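The odd/even division of forward primers between the PCR1a and PCR1b reactions can be sketched for any n as follows. This is a minimal illustration that models only the forward-primer split. The reverse-primer sets ("all or substantially all") are left out, and the function name is hypothetical.

```python
def split_forward_primers(n: int) -> tuple[list[str], list[str]]:
    """Divide forward primers F1..Fn into odd- and even-numbered groups."""
    forward = [f"F{i}" for i in range(1, n + 1)]
    pcr1a = forward[0::2]  # F1, F3, F5, ...
    pcr1b = forward[1::2]  # F2, F4, F6, ...
    return pcr1a, pcr1b


pcr1a, pcr1b = split_forward_primers(6)
print("PCR1a forward primers:", pcr1a)
print("PCR1b forward primers:", pcr1b)
```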
In an aspect, the forward and reverse primers comprise block elongation moieties to block elongation from the 5' end of the primers and reduce non-target amplification.
In an aspect, the block elongation moieties comprise non-complementary tails.
In an aspect, the method further comprises performing a plurality of PCR2 reactions, PCR2_1, PCR2_2, ..., PCR2_n, to amplify the PCR1a and PCR1b complements of amplicons, wherein each PCR2 reaction uses a different forward primer and a different set of one or more downstream reverse primers; and wherein the PCR1a complement of amplicons are amplified using odd-numbered forward primers and wherein the PCR1b complement of amplicons are amplified using even-numbered forward primers.
In an aspect, the method further comprises pooling the resulting amplicons.
In an aspect, the primers for PCR2 are adapter-tailed for sequence analysis.
In an aspect, the primers for PCR2 are MID-tagged to associate amplicons with specific specimens, such that multiple specimens can be sequenced simultaneously.
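As an illustration of how MID tags associate amplicons with specimens, the sketch below bins reads by a leading tag. It rests on several assumptions not stated in the text: the tag sits at the very start of each read, all tags have the same length, matching is exact, and the tag sequences and specimen names are hypothetical.

```python
from collections import defaultdict


def demultiplex(reads, mid_to_specimen):
    """Group reads by their leading MID tag, stripping the tag from each read.

    Assumes equal-length tags at the start of each read and exact matching.
    Reads with an unknown tag are collected under "unassigned".
    """
    tag_length = len(next(iter(mid_to_specimen)))
    binned = defaultdict(list)
    for read in reads:
        tag, insert = read[:tag_length], read[tag_length:]
        binned[mid_to_specimen.get(tag, "unassigned")].append(insert)
    return dict(binned)


# Hypothetical MID tags and reads for illustration only.
mids = {"ACGTAC": "specimen_A", "TGCATG": "specimen_B"}
reads = ["ACGTACGGGTTT", "TGCATGCCCAAA", "ACGTACGGGTTA", "GGGGGGTTTTTT"]
print(demultiplex(reads, mids))
```

Production pipelines typically also tolerate sequencing errors in the tag, which this sketch does not attempt.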
In an aspect, n is from 2-10, such as 6.
In an aspect, m is from 2-10, such as 6.
In an aspect, the forward and reverse primers are as defined in Table 4.
In an aspect, the template is not depleted through use of the method.
In accordance with an aspect, there is provided a method of amplifying degraded DNA
according to the scheme shown in Figures 2a and 2b herein.
In an aspect, the method is for taxonomic classification of unknown specimens.
In an aspect, the primers are degenerate.
In an aspect, the method is for analyzing a plurality of specimens simultaneously.
In an aspect, the method is for amplification of a sample comprising small amounts of degraded DNA, such as at least about 0.1 ng of degraded DNA, such as at least about 0.5 ng, about 1 ng, about 10 ng, about 100 ng, about 500 ng, or from about 2 µg to about 5 µg of degraded DNA.
The practice of the present subject matter may employ, unless otherwise indicated, conventional techniques and descriptions of organic chemistry, molecular biology (including recombinant techniques), cell biology, and biochemistry, which are within the skill of the art.
Such conventional techniques include, but are not limited to, preparation of synthetic polynucleotides, polymerization techniques, chemical and physical analysis of polymer particles, preparation of nucleic acid libraries, nucleic acid sequencing and analysis, and the like. Specific illustrations of suitable techniques can be used by reference to the examples provided herein. Other equivalent conventional procedures can also be used.
Such conventional techniques and descriptions can be found in standard laboratory manuals such as Genome Analysis: A Laboratory Manual Series (Vols. I-IV), PCR Primer: A Laboratory Manual, and Molecular Cloning: A Laboratory Manual (all from Cold Spring Harbor Laboratory Press), Hermanson, Bioconjugate Techniques, Second Edition (Academic Press, 2008);
Merkus, Particle Size Measurements (Springer, 2009); Rubinstein and Colby, Polymer Physics (Oxford University Press, 2003); and the like.
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as is commonly understood by one of ordinary skill in the art to which these inventions belong. All patents, patent applications, published applications, and other publications referred to herein, both supra and infra, are incorporated by reference in their entirety. If a definition and/or description is set forth herein that is contrary to or otherwise inconsistent with any definition set forth in the patents, patent applications, published applications, and other publications that are herein incorporated by reference, the definition and/or description set forth herein prevails over the definition that is incorporated by reference.
As used herein, the terms "comprises," "comprising," "includes," "including,"
"has,"
"having" or any other variation thereof, are intended to cover a non-exclusive inclusion. For example, a process, method, article, or apparatus that comprises a list of features is not necessarily limited only to those features but may include other features not expressly listed or inherent to such process, method, article, or apparatus. Further, unless expressly stated to the contrary, "or" refers to an inclusive-or and not to an exclusive-or. For example, a condition A
or B is satisfied by any one of the following: A is true (or present) and B is false (or not present), A is false (or not present) and B is true (or present), and both A and B are true (or present).
Brief Description of the Drawings

The above and other aspects, features and other advantages of the present invention will be more clearly understood from the following detailed description taken in conjunction with the accompanying drawings, in which:
Figure 1 schematically depicts the two stage multiplex nested PCR/NGS methodology of the present invention.
Figure 2 schematically depicts primer positions for the first and second rounds of PCR (a) and all possible final amplicons (b). The initial round of PCR (PCR1) includes two separate reactions (a, above the broken line) using 10bp tailed primers and genomic DNA as template (shown in parentheses below reaction names). The second round of PCR (PCR2) includes six separate reactions (a, below the broken line) using adapter-tailed primers and the products from the first PCR reactions as template (shown in parentheses below reaction names). The second PCR can generate up to 15 amplicons spanning the entire COI barcode region (b). To assign each amplicon to a particular type specimen, each forward PCR2 primer is tailed with MID tags unique to that specimen. For increased multiplexing, each reverse PCR2 primer can also be tailed with a MID tag allowing a large number of possible combinations (e.g. adding 96 unique MID tags to the forward primers and 4 unique MID tags to the reverse primers allows 384 specimens to be multiplexed and individually tracked).
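The dual-tagging arithmetic in the Figure 2 description (96 forward tags combined with 4 reverse tags giving 384 trackable specimens) can be checked with a short sketch. The tag and specimen names below are hypothetical placeholders, not actual MID sequences.

```python
from itertools import product

# Placeholder tag names; real MID tags would be short nucleotide sequences.
forward_tags = [f"F_MID{i:02d}" for i in range(1, 97)]  # 96 forward tags
reverse_tags = [f"R_MID{i}" for i in range(1, 5)]       # 4 reverse tags

# Each (forward, reverse) tag combination identifies exactly one specimen.
combinations = list(product(forward_tags, reverse_tags))
assignment = {
    f"specimen_{k:03d}": pair for k, pair in enumerate(combinations, start=1)
}

print(len(assignment))  # number of specimens that can be tracked
```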
Figure 3 shows the recovery of sequences from ten type specimens in each of three DNA categories. (a) Number of reads; (b) Per base coverage; (c) Number of base pairs (bp) recovered via NGS. HQ: high quality; MQ: medium quality; LQ: low quality. Mean (horizontal black line), standard deviation (edges of box), min and max (whiskers, *) are shown. The horizontal broken line in (c) represents a full-length (658bp) barcode.
Figure 4 shows a neighbor-joining tree showing 100% concordance between sequences generated from type specimens using NGS and Sanger sequencing. For each species, BOLD
Process IDs are shown for both the Sanger and NGS-generated (outlined in red) sequences.
Figure 5 shows a neighbor-joining tree of barcodes generated from century-old type specimens and contemporary congeneric taxa (where available). Barcodes from the 26 century-old specimens (outlined in red) were generated via NGS. Four cases involve confirmed or suspected synonymy: Celerna amplimargo and C. lerne; Aeolochroma caesia and A. saturataria; Sarcinodes subvirgata and S. holzi; Pingasa furvifrons and P. nobilis.
Figure 6 schematically shows the alignments of sequence records derived from two type specimens of Geometridae, one with high quality DNA (a) and one with low quality DNA
(b). The alignments show only a single representative of each distinct sequence. In many cases, there were hundreds or thousands of a particular sequence. High quality reads have high coverage across the entire 658bp barcode region and originate from a single source, indicated by a single nucleotide (color) at each position in the contig. Low quality reads do not span the entire barcode region (i.e. they have regions lacking coverage) and often originate from multiple sources, indicated by multiple nucleotides (colors) at certain positions in the contig.
Figure 7 shows that there is no negative impact on sequence recovery when NGS
throughput is increased by analyzing 95 samples simultaneously. "10-plex"
refers to amplifying and sequencing 10 samples in a single process, while "95-plex"
refers to amplifying and sequencing 95 samples in a single process. In addition to decreasing processing time, costs are cut almost 10-fold by moving from a 10- to 95-plex system. A similar move to a 384-plex system is currently being developed and would further cut costs significantly.
Figure 8 shows the effects of designing primers to target a specific taxonomic group.
In this example, primers designed to target animal DNA in general are compared to the same primers designed to target vertebrate DNA specifically. Both primer sets were used to amplify the same mammalian DNA, and the results clearly show a significant performance improvement by the vertebrate primers compared to the general primers. By making similar
primer modifications, the NGS method can theoretically be applied to any genetic sequence in any type of organism.
Figure 9 shows PCR success rates for general- and vertebrate-specific primers.
Both primer sets target the same gene region. To directly compare the primer sets, each set was used to amplify the same DNA from 95 fresh and 95 degraded vertebrate samples. In both cases, the vertebrate-specific primers outperformed the general primers.
Detailed Description of the Invention
The following description of various exemplary embodiments is exemplary and explanatory only and is not to be construed as limiting or restrictive in any way. Other embodiments, features, objects, and advantages of the present teachings will be apparent from the description and accompanying drawings, and from the claims.
As used herein, "amplify", "amplifying" or "amplification reaction" and their derivatives, refer generally to any action or process whereby at least a portion of a nucleic acid molecule (referred to as a template nucleic acid molecule) is replicated or copied into at least one additional nucleic acid molecule. The additional nucleic acid molecule optionally includes sequence that is substantially identical or substantially complementary to at least some portion of the template nucleic acid molecule. The template nucleic acid molecule can be single-stranded or double-stranded and the additional nucleic acid molecule can independently be single-stranded or double-stranded. In some embodiments, amplification includes a template-dependent in vitro enzyme-catalyzed reaction for the production of at least one copy of at least some portion of the nucleic acid molecule or the production of at least one copy of a nucleic acid sequence that is complementary to at least some portion of the nucleic acid molecule.
Amplification optionally includes linear or exponential replication of a nucleic acid molecule.
In some embodiments, such amplification is performed using isothermal conditions; in other embodiments, such amplification can include thermocycling. In some embodiments, the amplification is a multiplex amplification that includes the simultaneous amplification of a plurality of target sequences in a single amplification reaction. At least some of the target sequences can be situated on the same nucleic acid molecule or on different target nucleic acid molecules included in the single amplification reaction. In some embodiments, "amplification"
includes amplification of at least some portion of DNA- and RNA-based nucleic acids alone, or in combination. The amplification reaction can include single or double-stranded nucleic acid substrates and can further include any of the amplification processes known to one of
ordinary skill in the art. In some embodiments, the amplification reaction includes polymerase chain reaction (PCR).
As used herein, "amplification conditions" and its derivatives, generally refers to conditions suitable for amplifying one or more nucleic acid sequences. Such amplification can be linear or exponential. In some embodiments, the amplification conditions can include isothermal conditions or alternatively can include thermocycling conditions, or a combination of isothermal and thermocycling conditions. In some embodiments, the conditions suitable for amplifying one or more nucleic acid sequences include polymerase chain reaction (PCR) conditions. Typically, the amplification conditions refer to a reaction mixture that is sufficient to amplify nucleic acids such as one or more target sequences, or to amplify an amplified target sequence ligated to one or more adapters, e.g., an adapter-ligated amplified target sequence.
Generally, the amplification conditions include a catalyst for amplification or for nucleic acid synthesis, for example a polymerase; a primer that possesses some degree of complementarity to the nucleic acid to be amplified; and nucleotides, such as deoxyribonucleotide triphosphates (dNTPs) to promote extension of the primer once hybridized to the nucleic acid. The amplification conditions can require hybridization or annealing of a primer to a nucleic acid, extension of the primer and a denaturing step in which the extended primer is separated from the nucleic acid sequence undergoing amplification. Typically, but not necessarily, amplification conditions can include thermocycling; in some embodiments, amplification conditions include a plurality of cycles where the steps of annealing, extending and separating are repeated. Typically, the amplification conditions include cations such as Mg++ or Mn++ (e.g., MgCl2, etc.) and can also include various modifiers of ionic strength.
As used herein, "target sequence" or "target sequence of interest" and its derivatives, refers generally to any single or double-stranded nucleic acid sequence that can be amplified or synthesized according to the disclosure, including any nucleic acid sequence suspected or expected to be present in a sample. In some embodiments, the target sequence is present in double-stranded form and includes at least a portion of the particular nucleotide sequence to be amplified or synthesized, or its complement, prior to the addition of target-specific primers or appended adapters. Target sequences can include the nucleic acids to which primers useful in the amplification or synthesis reaction can hybridize prior to extension by a polymerase. In some embodiments, the term refers to a nucleic acid sequence whose sequence identity, ordering or location of nucleotides is determined by one or more of the methods of the disclosure.
As defined herein, "sample" is used in its broadest sense and includes any specimen, culture and the like that is suspected of including a target. In some embodiments, the sample comprises DNA, RNA, PNA, LNA, chimeric, hybrid, or multiplex-forms of nucleic acids. The sample can include any biological, animal, avian, clinical, surgical, agricultural, atmospheric or aquatic-based specimen containing one or more nucleic acids. The term also includes any isolated nucleic acid sample such as genomic DNA, fresh-frozen or formalin-fixed paraffin-embedded nucleic acid specimen.
As used herein, "degraded DNA" is used in its broadest sense to include DNA
that is "falling apart" or broken down into smaller pieces. Degraded DNA may be reflective of: using very old DNA samples; using DNA extracted from formalin-fixed paraffin embedded samples;
freezing and thawing DNA samples repeatedly; leaving DNA samples at room temperature; or exposing DNA samples to heat or physical shearing.
As used herein, the term "primer" and its derivatives refer generally to any polynucleotide that can hybridize to a target sequence of interest. In some embodiments, the primer can also serve to prime nucleic acid synthesis. Typically, the primer functions as a substrate onto which nucleotides can be polymerized by a polymerase; in some embodiments, however, the primer can become incorporated into the synthesized nucleic acid strand and provide a site to which another primer can hybridize to prime synthesis of a new strand that is complementary to the synthesized nucleic acid molecule. The primer may be comprised of any combination of nucleotides or analogs thereof, which may be optionally linked to form a linear polymer of any suitable length. In some embodiments, the primer is a single-stranded oligonucleotide or polynucleotide. (For purposes of this disclosure, the terms "polynucleotide"
and "oligonucleotide" are used interchangeably herein and do not necessarily indicate any difference in length between the two). In some embodiments, the primer is single-stranded but it can also be double-stranded. The primer optionally occurs naturally, as in a purified restriction digest, or can be produced synthetically. In some embodiments, the primer acts as a point of initiation for amplification or synthesis when exposed to amplification or synthesis conditions; such amplification or synthesis can occur in a template-dependent fashion and optionally results in formation of a primer extension product that is complementary to at least a portion of the target sequence. Exemplary amplification or synthesis conditions can include contacting the primer with a polynucleotide template (e.g., a template including a target sequence), nucleotides and an inducing agent such as a polymerase at a suitable temperature and pH to induce polymerization of nucleotides onto an end of the target-specific primer. If
double-stranded, the primer can optionally be treated to separate its strands before being used to prepare primer extension products. In some embodiments, the primer is an oligodeoxyribonucleotide or an oligoribonucleotide. In some embodiments, the primer can include one or more nucleotide analogs. The exact length and/or composition, including sequence, of the target-specific primer can influence many properties, including melting temperature (Tm), GC content, formation of secondary structures, repeat nucleotide motifs, length of predicted primer extension products, extent of coverage across a nucleic acid molecule of interest, number of primers present in a single amplification or synthesis reaction, presence of nucleotide analogs or modified nucleotides within the primers, and the like. In some embodiments, a primer can be paired with a compatible primer within an amplification or synthesis reaction to form a primer pair consisting of a forward primer and a reverse primer.
In some embodiments, the forward primer of the primer pair includes a sequence that is substantially complementary to at least a portion of a strand of a nucleic acid molecule, and the reverse primer of the primer pair includes a sequence that is substantially identical to at least a portion of the strand. In some embodiments, the forward primer and the reverse primer are capable of hybridizing to opposite strands of a nucleic acid duplex. Optionally, the forward primer primes synthesis of a first nucleic acid strand, and the reverse primer primes synthesis of a second nucleic acid strand, wherein the first and second strands are substantially complementary to each other, or can hybridize to form a double-stranded nucleic acid molecule. In some embodiments, one end of an amplification or synthesis product is defined by the forward primer and the other end of the amplification or synthesis product is defined by the reverse primer. In some embodiments, where the amplification or synthesis of lengthy primer extension products is required, such as amplifying an exon, coding region, or gene, several primer pairs can be created that span the desired length to enable sufficient amplification of the region. In some embodiments, a primer can include one or more cleavable groups. In some embodiments, primer lengths are in the range of about 10 to about 60 nucleotides, about 12 to about 50 nucleotides and about 15 to about 40 nucleotides in length.
Typically, a primer is capable of hybridizing to a corresponding target sequence and undergoing primer extension when exposed to amplification conditions in the presence of dNTPs and a polymerase. In some instances, the particular nucleotide sequence or a portion of the primer is known at the outset of the amplification reaction or can be determined by one or more of the methods disclosed herein. In some embodiments, the primer includes one or more cleavable groups at one or more locations within the primer.
As used herein, "target-specific primer" and its derivatives, refers generally to a single-stranded or double-stranded polynucleotide, typically an oligonucleotide, that includes at least one sequence that is at least 50% complementary, typically at least 75%
complementary or at least 85% complementary, more typically at least 90% complementary, more typically at least 95% complementary, more typically at least 98% or at least 99% complementary, or identical, to at least a portion of a nucleic acid molecule that includes a target sequence. In such instances, the target-specific primer and target sequence are described as "corresponding" to each other.
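By way of non-limiting illustration only, the percent-complementarity thresholds recited above can be evaluated computationally. The following Python sketch is not part of the disclosed method; it assumes an ungapped, position-by-position comparison between a primer and an equal-length binding site, both written 5' to 3', and the function names `reverse_complement` and `percent_complementarity` as well as the example sequence are hypothetical.

```python
# Watson-Crick partners for the four standard DNA bases.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence written 5'->3'."""
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))


def percent_complementarity(primer: str, site: str) -> float:
    """Percentage of primer positions that base-pair with an equal-length site.

    Assumption: no gaps or bulges are modelled. The primer anneals
    antiparallel to the site, so it is compared against the site's
    reverse complement, which is the perfectly matching primer.
    """
    if len(primer) != len(site):
        raise ValueError("primer and site must have the same length")
    perfect = reverse_complement(site)
    matches = sum(p == q for p, q in zip(primer.upper(), perfect))
    return 100.0 * matches / len(primer)


# Hypothetical 20-nt binding site on the template strand.
site = "ATGCGTACGTTAGCCTAGGA"
perfect_primer = reverse_complement(site)
one_mismatch = "A" + perfect_primer[1:]  # alter the first base of the primer

print(percent_complementarity(perfect_primer, site))
print(percent_complementarity(one_mismatch, site))
```

Under these assumptions, a 20-nucleotide primer with a single mismatch meets the "at least 95% complementary" threshold, whereas a second mismatch would lower it to the 90% tier.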
In some embodiments, the target-specific primer is capable of hybridizing to at least a portion of its corresponding target sequence (or to a complement of the target sequence); such hybridization can optionally be performed under standard hybridization conditions or under stringent hybridization conditions. In some embodiments, the target-specific primer is not capable of hybridizing to the target sequence, or to its complement, but is capable of hybridizing to a portion of a nucleic acid strand including the target sequence, or to its complement. In some embodiments, the target-specific primer includes at least one sequence that is at least 75% complementary, typically at least 85% complementary, more typically at least 90% complementary, more typically at least 95% complementary, more typically at least 98% complementary, or more typically at least 99% complementary, to at least a portion of the target sequence itself; in other embodiments, the target-specific primer includes at least one sequence that is at least 75% complementary, typically at least 85%
complementary, more typically at least 90% complementary, more typically at least 95%
complementary, more typically at least 98% complementary, or more typically at least 99%
complementary, to at least a portion of the nucleic acid molecule other than the target sequence.
In some embodiments, the target-specific primer is substantially non-complementary to other target sequences present in the sample; optionally, the target-specific primer is substantially non-complementary to other nucleic acid molecules present in the sample. In some embodiments, nucleic acid molecules present in the sample that do not include or correspond to a target sequence (or to a complement of the target sequence) are referred to as "non-specific"
sequences or "non-specific nucleic acids". In some embodiments, the target-specific primer is designed to include a nucleotide sequence that is substantially complementary to at least a portion of its corresponding target sequence. In some embodiments, a target-specific primer is at least 95% complementary, or at least 99% complementary, or identical, across its entire length to at least a portion of a nucleic acid molecule that includes its corresponding target sequence. In some embodiments, a target-specific primer can be at least 90%, at least 95%
complementary, at least 98% complementary or at least 99% complementary, or identical, across its entire length to at least a portion of its corresponding target sequence. In some embodiments, a forward target-specific primer and a reverse target-specific primer define a target-specific primer pair that can be used to amplify the target sequence via template-dependent primer extension. Typically, each primer of a target-specific primer pair includes at least one sequence that is substantially complementary to at least a portion of a nucleic acid molecule including a corresponding target sequence but that is less than 50%
complementary to at least one other target sequence in the sample. In some embodiments, amplification can be performed using multiple target-specific primer pairs in a single amplification reaction, wherein each primer pair includes a forward target-specific primer and a reverse target-specific primer, each including at least one sequence that is substantially complementary or substantially identical to a corresponding target sequence in the sample, and each primer pair having a different corresponding target sequence. In some embodiments, the target-specific primer can be substantially non-complementary at its 3' end or its 5' end to any other target-specific primer present in an amplification reaction. In some embodiments, the target-specific primer can include minimal cross-hybridization to other target-specific primers in the amplification reaction. In some embodiments, target-specific primers include minimal cross-hybridization to non-specific sequences in the amplification reaction mixture. In some embodiments, the target-specific primers include minimal self-complementarity. In some embodiments, the target-specific primers can include one or more cleavable groups located at the 3' end. In some embodiments, the target-specific primers can include one or more cleavable groups located near or about a central nucleotide of the target-specific primer. In some embodiments, one or more target-specific primers include only non-cleavable nucleotides at the 5' end of the target-specific primer. In some embodiments, a target-specific primer includes minimal nucleotide sequence overlap at the 3' end or the 5' end of the primer as compared to one or more different target-specific primers, optionally in the same amplification reaction. In some embodiments 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 or more, target-specific primers in a single reaction mixture include one or more of the above embodiments.
In some embodiments, substantially all of the plurality of target-specific primers in a single reaction mixture include one or more of the above embodiments.
As used herein, "polymerase" and its derivatives, generally refers to any enzyme that can catalyze the polymerization of nucleotides (including analogs thereof) into a nucleic acid strand. Typically but not necessarily, such nucleotide polymerization can occur in a template-dependent fashion. Such polymerases can include without limitation naturally occurring polymerases and any subunits and truncations thereof, mutant polymerases, variant polymerases, recombinant, fusion or otherwise engineered polymerases, chemically modified polymerases, synthetic molecules or assemblies, and any analogs, derivatives or fragments thereof that retain the ability to catalyze such polymerization. Optionally, the polymerase can be a mutant polymerase comprising one or more mutations involving the replacement of one or more amino acids with other amino acids, the insertion or deletion of one or more amino acids from the polymerase, or the linkage of parts of two or more polymerases.
Typically, the polymerase comprises one or more active sites at which nucleotide binding and/or catalysis of nucleotide polymerization can occur. Some exemplary polymerases include without limitation DNA polymerases and RNA polymerases. The term "polymerase" and its variants, as used herein, also refers to fusion proteins comprising at least two portions linked to each other, where the first portion comprises a peptide that can catalyze the polymerization of nucleotides into a nucleic acid strand and is linked to a second portion that comprises a second polypeptide.
In some embodiments, the second polypeptide can include a reporter enzyme or a processivity-enhancing domain. Optionally, the polymerase can possess 5' exonuclease activity or terminal transferase activity. In some embodiments, the polymerase can be optionally reactivated, for example through the use of heat, chemicals or re-addition of new amounts of polymerase into a reaction mixture. In some embodiments, the polymerase can include a hot-start polymerase or an aptamer based polymerase that optionally can be reactivated.
As used herein, the term "nucleotide" and its variants comprises any compound, including without limitation any naturally occurring nucleotide or analog thereof, which can bind selectively to, or can be polymerized by, a polymerase. Typically, but not necessarily, selective binding of the nucleotide to the polymerase is followed by polymerization of the nucleotide into a nucleic acid strand by the polymerase; occasionally however the nucleotide may dissociate from the polymerase without becoming incorporated into the nucleic acid strand, an event referred to herein as a "non-productive" event. Such nucleotides include not only naturally occurring nucleotides but also any analogs, regardless of their structure, that can bind selectively to, or can be polymerized by, a polymerase.
The term "extension" and its variants, as used herein, when used in reference to a given primer, comprises any in vivo or in vitro enzymatic activity characteristic of a given polymerase that relates to polymerization of one or more nucleotides onto an end of an existing nucleic acid molecule. Typically but not necessarily such primer extension occurs in a template-dependent fashion; during template-dependent extension, the order and selection of bases is driven by established base pairing rules, which can include Watson-Crick type base pairing rules or alternatively (and especially in the case of extension reactions involving nucleotide analogs) by some other type of base pairing paradigm. In one non-limiting example, extension occurs via polymerization of nucleotides on the 3'-OH end of the nucleic acid molecule by the polymerase.
As used herein, "multiplex identifier tag (MID)" or "DNA tagging sequence" and its derivatives, refers generally to a unique short (6-14 nucleotide) nucleic acid sequence within an adapter that can act as a "key" to distinguish or separate a plurality of amplified target sequences in a sample. For the purposes of this disclosure, a DNA barcode or DNA tagging sequence can be incorporated into the nucleotide sequence of an adapter.
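By way of non-limiting illustration only, the use of a MID as a "key" to separate pooled reads by specimen can be sketched in Python as follows. This sketch is not the disclosed implementation: it assumes that the MID occupies the first bases of each read, that tags are matched exactly, and that all tags have the same length. The tag sequences, specimen names, and the function name `demultiplex` are hypothetical.

```python
from collections import defaultdict


def demultiplex(reads, mid_to_sample):
    """Group reads by the MID tag found at the start of each read.

    Assumptions: exact matching only (no sequencing-error tolerance),
    and the MID is trimmed from the read once it has been recognised.
    Reads with no recognised MID are collected under "unassigned".
    """
    by_sample = defaultdict(list)
    for read in reads:
        for mid, sample in mid_to_sample.items():
            if read.startswith(mid):
                by_sample[sample].append(read[len(mid):])
                break
        else:  # no MID matched this read
            by_sample["unassigned"].append(read)
    return dict(by_sample)


# Hypothetical 10-nt MIDs (within the 6-14 nucleotide range given above).
mids = {"ACGAGTGCGT": "specimen_A", "ACGCTCGACA": "specimen_B"}
reads = [
    "ACGAGTGCGTTTGATTTTTTGG",
    "ACGCTCGACAGGAACAGGATGA",
    "TTTTTTTTTTGGGG",
]
for sample, seqs in demultiplex(reads, mids).items():
    print(sample, seqs)
```

A production pipeline would typically also tolerate a limited number of mismatches in the tag; that refinement is omitted here for brevity.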
As used herein, a "barcode" is a short DNA sequence from a uniform locality on the genome used for identifying species.
As defined herein "multiplex amplification" refers to selective and non-random amplification of two or more target sequences within a sample using at least one target-specific primer. In some embodiments, multiplex amplification is performed such that some or all of the target sequences are amplified within a single reaction vessel. The "plexy" or "plex" of a given multiplex amplification refers generally to the number of different target-specific sequences that are amplified during that single multiplex amplification. In some embodiments, the plexy can be about 12-plex, 24-plex, 48-plex, 96-plex, 192-plex, 384-plex, 768-plex, 1536-plex, 3072-plex, 6144-plex or higher.
As used herein, "nested PCR" means that two pairs of PCR primers were used for a single locus. The first pair amplifies the locus as seen in any PCR
experiment. The second pair of primers (nested primers) bind within the first PCR product and produce a second PCR
product that may be shorter than the first one.
As used herein "Next Generation Sequencing (NGS)" refers to various types of massive parallel sequencing techniques. NGS extends the process of sequencing by sequencing millions of fragments in parallel fashion. NGS basically incorporates library preparation, cluster generation, sequencing and data analysis. Several different types of NGS
platforms are commercially available.
DNA barcoding is a new system of species identification and discovery using a short section of DNA from a standardized region of the genome [1]. This DNA sequence is then used to identify different species in a manner analogous to a supermarket scanner using black stripes
of the UPC barcode to identify purchases. It would be very beneficial to be able to barcode any type of sample from any source, no matter how old or how it has been treated.
In particular, it is beneficial to be able to barcode specimens whereby the DNA may be degraded to certain extents.
The method of the invention (schematically shown in Figure 1) incorporates a novel two stage multiplex nested PCR approach to first amplify very small amounts of degraded DNA to produce a plurality of amplicons that cover the entire region of the gene or sequence of interest. The amplicons are then subject to sequence characterization by NGS methods. The method of the invention uses NGS to characterize/recover sequences of the pool of amplicons produced by the multiplex PCR from specimens with varying DNA qualities (i.e.
different levels of degradation). In combination these two provide for a novel method of amplification and characterization of DNA sequences from degraded sources.
The present method has use in one aspect for sequencing essentially the entire barcode of a specimen that may have varying degrees of DNA degradation, inclusive of specimens with almost no intact DNA. The present method can be used with as little as 2 µg specimen sample size or more containing various degrees of degraded DNA. This provides utility with respect to identifying species and confirming species but also for a variety of other applications including biomedicine, forensics and environmental DNA (eDNA) monitoring where assembly of longer sequences from trace amounts of fragmented DNA is necessary. The present method can also be used with respect to foods where artificial sequences may be inserted therein. In general, the method provides for recovery of barcode sequences (or any desired sequence) and may promote development of portable devices for DNA barcoding.
A mitochondrial gene barcode is used to enable the identification of most animal species. For plants, mitochondrial genes do not differ sufficiently to distinguish among closely related species. The gene region being used for almost all animal groups is a 658 base-pair region in the mitochondrial cytochrome c oxidase 1 gene ("COI") (the Folmer region). It is highly effective in identifying a range of animal groups, including birds, butterflies, fish and flies. The COI barcode is not effective for identifying plants because it evolves too slowly, but the two gene regions in the chloroplast, matK and rbcL, are approved as the barcode regions for land plants. For fungi, the internal transcribed spacer (ITS) region may be used. Other barcode regions are being identified and it will be understood that the methods described herein are applicable to any barcode region, whether currently known or identified in the future.
The method of the invention has been demonstrated herein to recover the barcode region for COI from small amounts of template DNA, initially from a small number of Lepidoptera and subsequently from samples spanning several major insect orders, as well as arachnids, marine invertebrates, and land- and aquatic-based vertebrates.
However, it would be understood by one of skill in the art that the present method is very universal and in aspects can also be scaled for plants or other organisms, as well as for other gene regions. The method can be provided as a system in separate kits for invertebrates, mammals, fish and birds as non-limiting examples.
The present two-stage multiplex PCR/NGS approach allows all fragments to be amplified in a single multiplex PCR, because the multiplex nature of the PCR
reactions allows for high primer redundancy. As a result, each DNA
extract processed with the present approach is exposed to amplification by approximately 2010 primers versus the approximately 20 primers used in an analogous Sanger analysis. The multiplex PCR is performed such that initially generated amplicons using a plurality of primers act as a template for a subsequent round of multiplex PCR with different primer characteristics.
This is also in contrast to the traditional Sanger approach which utilizes multiple PCR
reactions for each fragment.
In the method a first step of multiplex PCR (PCR1) is performed using nested degenerate primers designed to hybridize to the extracted target DNA template.
To avoid preferential amplification of certain fragments and amplification bias, a second round of multiplex PCR (PCR2) is performed targeting non-adjacent fragments of the DNA
template using the first round PCR (PCR1) products (amplicons) as a template. The same reaction is basically repeated and then further in a nested approach (using nested PCR).
In the first stage PCR1 10bp-tailed primers are used while in PCR2 adapter-tailed primers that are also tailed with a multiplex identifier tag (MID) are used.
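By way of non-limiting illustration only, the relationship between a single degenerate primer and the pool of distinct oligonucleotides it represents can be sketched in Python using the standard IUPAC nucleotide ambiguity codes. The example sequence is hypothetical and is not one of the primers of the present method; the function name `expand_degenerate` is likewise an assumption made for illustration.

```python
from itertools import product

# Standard IUPAC nucleotide ambiguity codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def expand_degenerate(primer: str) -> list:
    """List every non-degenerate sequence encoded by a degenerate primer."""
    choices = (IUPAC[base] for base in primer.upper())
    return ["".join(combo) for combo in product(*choices)]


def degeneracy(primer: str) -> int:
    """Number of distinct sequences in the pool, without enumerating them."""
    count = 1
    for base in primer.upper():
        count *= len(IUPAC[base])
    return count


# Hypothetical 4-nt fragment with two degenerate positions (R and Y).
print(expand_degenerate("ACRY"))
print(degeneracy("ACRY"))
```

Because the pool size is the product of the number of alternatives at each position, a handful of degenerate positions per primer, multiplied across the primers of a multiplex reaction, is one way in which a large effective number of primer sequences can arise.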
To produce more template options for the second multiplex PCR2 without increasing bias, each of the two first step PCR1 reactions contain selected forward primers for all downstream reverse primers to allow the same region of DNA to be covered by multiple amplicons, thus producing more redundancy. Thus each primer pair in the multiplex second stage PCR2 is provided with multiple template amplicon options where only one needs to work to get full coverage of the target sequence. To further neutralize amplification bias, reactions are split into six reactions, one for each forward primer that is paired with the next 1-3 downstream reverse primers. Taken together, this cumulatively prevents overlap amplification,
reduces amplification bias and results in redundant amplification so that if one particular primer set is not effective or fails then another is likely to cover for it.
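By way of non-limiting illustration only, the redundancy obtained by pairing each forward primer with up to three downstream reverse primers can be sketched in Python. The sketch rests on assumptions that are not stated in the description: the target is modelled as six consecutive fragments numbered 1 to 6, forward primer i sits at the start of fragment i, reverse primer j sits at the end of fragment j, and forward primer i is paired with reverse primers i, i+1 and i+2 where these exist. The function names are hypothetical.

```python
def plan_reactions(n_fragments: int = 6, max_reverse: int = 3) -> dict:
    """Map each forward primer index to the reverse primers paired with it.

    Assumption: one reaction per forward primer, each containing the next
    1 to `max_reverse` reverse primers located downstream of it.
    """
    plan = {}
    for fwd in range(1, n_fragments + 1):
        last = min(fwd + max_reverse - 1, n_fragments)
        plan[fwd] = list(range(fwd, last + 1))
    return plan


def templates_per_fragment(plan: dict) -> dict:
    """Count the amplicons (forward, reverse) that span each fragment."""
    n_fragments = len(plan)
    return {
        frag: sum(
            1
            for fwd, reverses in plan.items()
            for rev in reverses
            if fwd <= frag <= rev
        )
        for frag in range(1, n_fragments + 1)
    }


plan = plan_reactions()
print(plan)
print(templates_per_fragment(plan))
```

Under these assumptions every fragment is spanned by at least three first-stage amplicons, so the failure of any single primer pair need not leave a gap in the second-stage template pool.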
To further avoid primer incorporation into the middle of sequence reads, certain reverse primers from PCR1 are omitted so that overlap amplicons cannot form; however, this reduces the number of amplicons available as templates for PCR2, which leads to a loss of amplification redundancy. Thus unwanted elongation by the polymerase is blocked in PCR1.
This was effected by the use of non-complementary tails on the PCR1 primers; however, any agent that blocks elongation (i.e. chemical conjugation) from the 5' end of the primers can be used.
For NGS, the primers used in the PCR2 are tailed with adapter sequences and with multiplex identifier (MID) tags to distinguish sequence reads from each specimen. The superior performance of NGS in sequence recovery is likely due, in part, to the developed multiplex nature of the PCR reactions which allowed high primer redundancy. As a result, each DNA
extract processed with the current protocol was exposed to amplification by 2010 primers versus the approximately 20 primers used in an analogous Sanger analysis. The high diversity of primers undoubtedly meant that there was a greater chance of achieving the primer-template homology necessary for successful amplification. The higher success of the NGS
protocol (compared to Sanger sequencing) is likely also a consequence of the greater sensitivity of these sequencing platforms. This difference was evidenced by the fact that, in the initial experiment, 16 of 20 specimens which failed to generate a 164bp sequence via Sanger analysis generated sequence reads for the same region with NGS. Subsequent experiments comparing Sanger sequencing to the NGS method showed a 5-20 fold increase in the number of recovered barcode sequences using the NGS method (Table 1). Furthermore, while increased sample age has a strong negative effect on barcode recovery via Sanger analysis, the NGS method recovers long barcodes regardless of age (Figure 6). The results show that it was possible to recover a full-length COI barcode with NGS from specimens that failed with Sanger analysis.
Table 1 - Direct comparison of Sanger and NGS method on various taxonomic groups. This table compares the results of analyzing the same DNA using the best available Sanger sequencing method and the NGS method. The results of each experiment (experiment numbers correspond to those in Table 1) show that the NGS method yields significantly more and longer barcode sequences than the Sanger method, and in two cases is the only method that could produce barcode sequences.
Exp.  Taxonomic group      Samples  Recovered seqs  Min seq length (bp)  Max seq length (bp)  Mean seq length (bp)  Mode seq length (bp)  Success rate
                                    Sanger / NGS    Sanger / NGS         Sanger / NGS         Sanger / NGS          Sanger / NGS          Sanger / NGS
3     Lepidoptera          95       10 / 74         84 / 39              407 / 658            158 / 342             164 / 279             11% / 78%
7     Coleoptera           846      110 / 568       86 / 35              658 / 658            193 / 334             164 / 658             13% / 67%
8     Arachnids            94       7 / 89          164 / 95             480 / 658            299 / 495             307 / 658             7% / 95%
9     Arachnids            190      0 / 164         0 / 52               0 / 658              0 / 426               0 / 658               0% / 86%
      Reptiles/amphibians  95       1 / 21          166 / 55             166 / 371            166 / 150             166 / None            1% / 22%
13    Mammals              95       0 / 23          0 / 56               0 / 658              0 / 239               0 / None              0% / 24%
Shokralla et al. [20,21] used NGS to recover full-length barcodes from freshly collected specimens of Lepidoptera with a single primer pair. However, the present novel method demonstrates that NGS can regularly recover complete or near-complete barcodes from century-old specimens with heavily degraded DNA. Moreover, because it requires little template DNA, much of each DNA extract remains for future analysis. Although analytical costs were approximately $10 CAD a specimen, a 4-fold increase in the number of specimens processed in each run is feasible with a move to an NGS platform generating more reads, resulting in an estimated cost of less than $3 per sample.
While initially only applied to 10 samples simultaneously, the NGS method can be applied to 96 samples simultaneously without decreasing sequence recovery (Figure 8).
Additionally, subsequent experiments have demonstrated the method to be successful for over 400 different families of animals, covering several different phyla (Table 2).
Samples fixed in formalin and preserved in ethanol were also successfully analyzed, from all major groups examined to date: spiders, freshwater insects, molluscs, crustaceans, reptiles, and mammals (Table 2). DNA barcodes have also been successfully recovered from forensic specimens and samples of heavily processed materials confiscated by wildlife enforcement officers (data not shown).

Table 2 - This table lists each experiment used to develop, optimize, and enhance the NGS method. The purpose of each experiment is included, as well as information on the samples employed for each experiment. The type of sequencing used for each experiment and overall success rates are listed. The degree of DNA damage was estimated based on the ease with which barcodes could be amplified using Sanger sequencing methods.
Exp | Sample | Purpose | No. Samples | No. Families | Preservation Method | Cause of DNA Damage | Degree of DNA Damage | Sequencing Method | Success Rate
1 | Lepidoptera (types) | Initial test | 30 | 1 | Dry | Age | Low, medium, high | Sanger, NGS | 100%
2 | Lepidoptera (types) | Test high throughput | 95 | 8 | Dry | Age | High | NGS | 100%
3 | Lepidoptera | Compare performance to Sanger | 94 | 20 | Dry | Unknown | High | Sanger, NGS | 78%
4 | Mixed Arthropods | Primer test on other taxa | 376 | 371 | Dry | None | None | Sanger, NGS | 99%
5 | Freshwater invertebrates | Primer test on other taxa | 94 | 85 | Fluid | None | None | Sanger, NGS | 96%
6 | Vertebrates | Primer test on other taxa | 95 | 93 | Fluid | None | None | Sanger, NGS | 98%
7 | Coleoptera | Large scale primer test on problematic group | 846 | 70 | Dry | Age, unknown | High | Sanger, NGS | 67%
8 | Arachnids | Test ethanol preserved specimens | 94 | 14 | Fluid | Age | High | Sanger, NGS | 95%
9 | Arachnids | Test formalin fixed specimens | 190 | 20 | Fluid | Formalin fixation | High | NGS | 86%
10 | Reptiles/amphibians | Test formalin fixed specimens | 95 | 12 | Fluid | Formalin fixation | High | Sanger, NGS | 22%
11 | EPT's, Diptera | Test formalin fixed specimens | | unknown | Fluid | Formalin fixation | High | NGS | 47%
12 | Molluscs, crustaceans | Test formalin fixed specimens | 95 | 66 | Fluid | Formalin fixation | High | NGS | 19%
13 | Mammals | Test formalin fixed specimens | 95 | 15 | Fluid | Formalin fixation | High | Sanger, NGS | 24%
Furthermore, new primer sets can be developed, a task facilitated by the well-parameterized barcode reference library for the animal kingdom, and subsequent experiments have demonstrated that a primer set designed for vertebrates provides increased barcode amplification in comparison with the standard primers outlined here (Figure 9). Indeed, the method of the invention can be used to amplify and sequence any desired degraded DNA, as primers can be tailored for any given sequence. Past research has employed NGS
to sequence genomes, but this study has demonstrated its value in probing sequence diversity in single gene regions when combined with two-step multiplex PCR as described herein. A large-scale program to sequence type specimens would represent a major advance in stabilizing and validating the application of scientific names. As well, because many type specimens derive from developing nations, it would represent an important step in the repatriation of knowledge that will aid these nations in managing their biodiversity by enabling DNA-powered identification systems, a major advance in settings where the scientific workforce is small and biodiversity is high.
In a specific aspect, the method described herein involves the following protocol:
1) PCR1
a. Two separate reactions, one with primers F1, F3, F5 + R1-R6 (PCR1a), the other with F2, F4, F6 + R2-R6 (PCR1b). Separate reactions are necessary to prevent non-target amplification. Primers are tailed with a short non-complementary sequence to prevent another form of non-target amplification.
2) PCR2
a. Six separate reactions, one for each forward primer plus the next three downstream reverse primers (or the next two or one, if only two or one downstream reverse primers exist). This reaction uses PCR1 product as template (PCR1a product for the PCR2 F1, F3, F5 reactions, and PCR1b product for the PCR2 F2, F4, F6 reactions). PCR2 forward and reverse primers contain MID tags to associate amplicons with individual specimens, so that multiple specimens can be sequenced simultaneously.
3) PCR purification
a. Following PCR2, all six reactions (or all 6 plates of reactions in the case of the 95-plex version) are pooled. This is possible because the MID tags allow the resulting sequence reads to be re-associated with their original sample. An aliquot of the pooled reactions is purified for sequencing.
4) Sequencing
a. Purified products are quantified and diluted to the sequencing platform manufacturer's recommendation. The diluted product is then sequenced following the manufacturer's instructions.
5) Data Analysis
a. The resulting sequence reads are de-multiplexed via the MID tags and split into separate datasets, one for each specimen (typically 95 datasets).
b. Each dataset is processed through a bioinformatics pipeline that trims off primer, MID, and adapter sequences and filters out low quality reads. The filtered reads, which ideally overlap with one another, are then arranged into a contiguous sequence, which ideally will be a full-length barcode. Formation of the contig can involve alignment of reads to a reference, but can in theory also be de novo (i.e. no reference sequence involved).
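The de-multiplexing in step 5a can be illustrated with a minimal sketch. It assumes that adapter and key sequences have already been removed, that each read starts with its 10bp MID tag, and that tags are matched exactly; the MID sequences, specimen names and reads below are hypothetical and are not taken from the protocol.

```python
from collections import defaultdict


def demultiplex(reads, mid_to_specimen, mid_length=10):
    """Split reads into per-specimen lists using the MID tag at the read start.

    Returns (per_specimen, unassigned). The MID is trimmed from assigned reads.
    """
    per_specimen = defaultdict(list)
    unassigned = []
    for read in reads:
        mid, insert = read[:mid_length], read[mid_length:]
        specimen = mid_to_specimen.get(mid)
        if specimen is None:
            unassigned.append(read)  # no exact MID match: set the read aside
        else:
            per_specimen[specimen].append(insert)
    return dict(per_specimen), unassigned


# Hypothetical MID tags and toy reads, for illustration only.
MIDS = {"AAAAACCCCC": "specimen_01", "GGGGGTTTTT": "specimen_02"}
READS = [
    "AAAAACCCCC" + "ATTCAACCAATCATAAAG",
    "GGGGGTTTTT" + "TTATAATTGGAGGATTTG",
    "AAAAACCCCC" + "TATTTGTATGAGCAGTAG",
    "ACGTACGTAC" + "ATTCAACCAATCATAAAG",  # MID not in the lookup table
]
datasets, leftovers = demultiplex(READS, MIDS)
```

A production pipeline would typically tolerate sequencing errors in the tag; exact matching is used here only to keep the sketch short.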
Examples
Materials and Methods
The following pertains specifically to the initial experiment, the purpose of which was to develop and optimize the NGS method. Subsequent experiments contained minor modifications, such as the use of additional MID tags in the primer sequences to increase throughput, or the use of different taxa and associated primers, but the overall design and principle of the protocol remained the same.
Type Specimens
Tissue samples were obtained from 1820 specimens (mostly primary types but some were equally important non-types) of Geometridae (Lepidoptera) from the Natural History Museum (London) as part of a project to develop a strongly validated taxonomic system to support species inventories and studies of host plant use in Papua New Guinea [22,23].
Genitalic dissections of these specimens generated residual tissue that was held frozen until its use in the present study.
DNA Extraction
All tissue samples were processed in an isolated 'clean' laboratory at the Canadian Centre for DNA Barcoding (CCDB; www.ccdb.ca) with dedicated reagents, supplies and protective clothing. Each sample was incubated overnight in lysis buffer, following a modified protocol of Knolke et al. [24], before DNA was extracted using a silica membrane-based method in either single columns or 96-well plate format [25]. To maximize the concentration of extracted DNA, elution from each silica membrane was performed with 30 µL of pre-warmed (to 56 °C) 10 mM Tris-HCl.
Sanger Sequencing
Since DNA quality varies greatly, even among specimens of similar age [2,8], each DNA extract was initially assessed by Sanger analysis. This involved an attempt to amplify both 164bp (C_microLepF1_t1 + C_TypeR1) and 94bp (C_TypeF1 + C_TypeR1) regions of the COI barcode [2,9]. PCR amplification and cycle sequencing employed standard CCDB protocols [2,25,26] with amplicons bidirectionally sequenced on an ABI 3730XL (Applied Biosystems). All traces were edited using CodonCode v. 4.2.7 (CodonCode Corporation) and the resulting 164bp and 94bp sequences were validated by comparison with sequences from conspecific individuals or, when they were unavailable, by Neighbor-Joining (NJ) analysis to ensure that each sequence branched as expected. These tests for sequence recovery permitted the assignment of DNA from each specimen to one of three categories: 1) High Quality (HQ): those that generated a 164bp sequence; 2) Medium Quality (MQ): those that generated only a 94bp sequence; and 3) Low Quality (LQ): those that failed to generate any sequence. The present study examined ten specimens from each category with the goal of developing an NGS protocol effective across varying levels of DNA degradation. The specimens (Table 3) selected for analysis included a single representative from each of 30 genera in the family Geometridae, all more than a century old (mean age = 111 years). Sequences, electropherograms and primer details for the specimens are on BOLD (dx.doi.org/10.5883/DS-NGSTYPES) and GenBank (see Table 3 for accession numbers).
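The three-way triage described above can be written as a small decision rule. This is only a sketch of the categorization logic as stated in the text; the function name and boolean inputs are illustrative assumptions.

```python
def dna_quality(recovered_164bp: bool, recovered_94bp: bool) -> str:
    """Assign a DNA extract to HQ, MQ or LQ from its Sanger results.

    HQ: the 164bp amplicon yielded a sequence.
    MQ: only the 94bp amplicon yielded a sequence.
    LQ: neither amplicon yielded a sequence.
    """
    if recovered_164bp:
        return "HQ"
    if recovered_94bp:
        return "MQ"
    return "LQ"


# Example extracts (hypothetical outcomes).
categories = [
    dna_quality(True, True),
    dna_quality(False, True),
    dna_quality(False, False),
]
```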

Table 3. Type specimens analyzed, including sequencing results and accession numbers
Process ID (Sanger / NGS) | Age (Yrs) | Identification | Status | Sanger Group | No. NGS Reads | Min. Cov. | Max. Cov. | Avg. Cov. | Sequence Recovered by NGS (bp) | Sanger GenBank Acc. | NGS Contig GenBank Acc. | Read Archive Acc.
/ PNGTY1837- | 104 | Myrioblephara mixticolor | Syntype | HQ | 143804 | 7 | 115751 | 29924 | 658 | pending | pending | SRR1867808
PNGTY1827- | 112 | Cassephyra plenimargo | Holotype | HQ | 213007 | 72 | 146189 | 42992 | 658 | pending | pending | SRR1867811
PNGTY1843- | 109 | Psilalcis auropurpurea | Syntype | HQ | 106286 | 0 | 74012 | 20477 | 448 | pending | pending | SRR1867812
PNGTY1839- | 110 | Paralcidia marginata | Syntype | HQ | 221885 | 5 | 168474 | 44541 | 657 | pending | pending | SRR1867813
PNGTY1823- | 110 | Atmoceras plumosa | Syntype | HQ | 143340 | 30 | 76376 | 28215 | 658 | pending | pending | SRR1867814
14 / | 110 | Tripteridia viridisecta | Syntype | HQ | 188107 | 1 | 101191 | 37855 | 570 | pending | pending | SRR1867815
14 / PNGTY1834- | 111 | Gymnoscelis ochriplaga | Holotype | HQ | 186897 | 1 | 103399 | 38169 | 657 | pending | pending | SRR1867816
14 / | 110 | Axinoptera fiata | Holotype | HQ | 166116 | 0 | 83389 | 31838 | 474 | pending | pending | SRR1867817
14 / | 112 | Calluga semirasata | Holotype | HQ | 215946 | 1 | 154302 | 43408 | 658 | pending | pending | SRR1867818
 | 123 | Eois semirubra | Holotype | HQ | 232024 | 106 | 143908 | 46803 | 658 | pending | pending | SRR1867819
/ PNGTY1831- | 112 | Collix ghosha dichobathra | N/A | MQ | 11665 | 0 | 11082 | 2142 | 459 | pending | pending | SRR1945335
/ PNGTY1838- | 111 | Papuarisme brunneata | Holotype | MQ | 6479 | 0 | 5747 | 1165 | 569 | pending | pending | SRR1945382
/ PNGTY1835- | 109 | Hyposidra apiciftdva | N/A | MQ | 62208 | 0 | 45441 | 12301 | 570 | pending | pending | SRR1945383
/ PNGTY1836- | 101 | Milionia knowlei | Syntype | MQ | 44190 | 1 | 43301 | 9153 | 658 | pending | pending | SRR1945384
PNGTY587-13 / PNGTY1846-15 | 118 | Ctimene basistraga obsoleta | Syntype | MQ | 546 | 0 | 105 | 31 | 323 | pending | pending | SRR1946575
PNGTY639-13 / PNGTY1842-15 | 102 | Psendensemia bursadoides dignitosa | Syntype | MQ | 134542 | 37 | 82031 | 27382 | 658 | pending | pending | SRR1945385
PNGTY917-13 / PNGTY1840- | 108 | Pingasa nobilis furvifrons | Holotype | MQ | 46793 | 6 | 24276 | 9516 | 658 | pending | pending | SRR1945386
/ PNGTY1821- | 121 | Aeolochroma caesia | Holotype | MQ | 68837 | 3 | 43442 | 14002 | 658 | pending | pending | SRR1945387
/ PNGTY1844- | 106 | Sarcinodes subvirgata | Holotype | MQ | 99655 | 86 | 42405 | 20379 | 658 | pending | pending | SRR1945388
/ PNGTY1828- | 105 | Celerena lerne amplimargo | Holotype | MQ | 113363 | 1 | 42333 | 21657 | 569 | pending | pending | SRR1945389
/ PNGTY1832- | 104 | Dyscheralcis retroflexa | Syntype | LQ | 2681 | 0 | 1278 | 424 | 514 | N/A | pending | SRR1867935
 | 110 | Alcis irrufata | Holotype | LQ | 49944 | 2 | 37401 | 8881 | 657 | N/A | pending | SRR1867936
14 / PNGTY1830- | | Cleora repetita suffusa | Syntype | LQ | 7632 | 1 | 5708 | 731 | 454 | N/A | pending | SRR1867937
PNGTY008-12* / N/A | 109 | Spectrobasis differens | Syntype | LQ | 1468 | 0 | 1157 | 280 | 237 | N/A | N/A | SRR1867938
PNGTY073-12* / N/A | 104 | Desmoclysha umpuncta | Holotype | LQ | 320 | 0 | 116 | 40 | 357 | | |
PNGTY102-12* / N/A | 120 | Sterrhochaeta minuta | Syntype | LQ | 3081 | 0 | 215 | 91 | 324 | | |
PNGTY120-12* / N/A | 118 | Propithex alternata | Holotype | LQ | 14863 | 0 | 1554 | 163 | 323 | | |
/ PNGTY1825- | 105 | Bursadopsis plenifascia | Syntype | LQ | 133263 | 4 | 71444 | 20742 | 634 | N/A | pending | SRR1867942
14 / PNGTY1829- | 117 | Chloroclystis rufofasciata | Syntype | LQ | 4411 | 0 | 1685 | 647 | 419 | N/A | pending | SRR1867943
 | 112 | Polyacme straminea brunneata | Holotype | LQ | 141402 | 23 | 71197 | 28117 | 658 | N/A | pending | SRR1867944
The four Process IDs marked with an asterisk (*) represent specimens where NGS analysis generated sequence reads from multiple species. HQ: high quality; MQ: medium quality; LQ: low quality; N/A: not applicable.
Next-generation Sequencing
DNA degradation often limits PCR amplicons to <200bp in specimens that are more than 50 years old [18], precluding efforts to recover the entire barcode region with one or two primer sets. As a consequence, primer sets were designed to amplify fragments ranging in length from 120bp to 148bp with enough overlap to permit recovery of the 658bp barcode region. These primers needed to be tailed with adapter sequences for analysis on an Ion Torrent PGM (Life Technologies) and with multiplex identifier (MID) tags to distinguish sequence reads from each specimen. Ten sets of MID-tagged primers, each consisting of six forward and six reverse primers, were employed to analyze ten type specimens per NGS run (Table 4).
Table 4. Primers used in the first (PCR1) and second (PCR2) reactions to allow the analysis of 10 specimens in an Ion Torrent PGM run.
PCR | Code | Primer Name | Sequence (5'-3') | MID | Adapter
PCR1 | F1 | LepF1-Sanger-ion1 | ATTCAACCAATCATAAAGATATTGG | None | None
PCR1 | F2 | AncientLepF2-Sanger-ion1 | ATTRRWRATGATCAARTWTATAAT | None | None
PCR1 | F3 | AncientLepF3-Sanger-ion1 | TTATAATTGGDGGRTTTGGWAATTG | None | None
PCR1 | F4 | AncientLepF4-Sanger-ion1 | AGWAGWATWRTWRAWAVWGG | None | None
PCR1 | F5 | AncientLepF5-Sanger-ion1 | ATTTTTWSWCTWCATWTDGCWGG | None | None
PCR1 | F6 | AncientLepF6-Sanger-ion1 | TATTTGTWTGAKCWRTWKKWATTAC | None | None
PCR1 | R1 | AncientLepR1-Sanger-ion1 | WGGTATWACTATRAARAAAATTAT | None | None
PCR1 | R2 | AncientLepR2-Sanger-ion2 | TCARAAWCTWATRTTRTTTADWCG | None | None
PCR1 | R3 | AncientLepR3-Sanger-ion3 | ARDGGDGGRTAWACWGTTCAWCC | None | None
PCR1 | R4 | AncientLepR4-Sanger-ion4 | GTWGWAATRAARTTDATWGCWCC | None | None
PCR1 | R5 | AncientLepR5-Sanger-ion5 | GTTARWARTATDGTRATDGCWCC | None | None
PCR1 | R6 | LepR1-Sanger-ion6 | TAAACTTCTGGATGTCCAAAAAATCA | None | None
PCR2 | F1 | LepF1-ion1 | CCATCTCATCCCTGCGTGTCTCCGACTCAG ATTCAACCAATCATAAAGATATTGG | IonXpress 1 | A
PCR2 | F2 | AncientLepF2-ion1 | CCATCTCATCCCTGCGTGTCTCCGACTCAG ATTRRWRATGATCAARTWTATAAT | IonXpress 1 | A
PCR2 | F3 | AncientLepF3-ion1 | CCATCTCATCCCTGCGTGTCTCCGACTCAG TTATAATTGGDGGRTTTGGWAATTG | IonXpress 1 | A
PCR2 | F4 | AncientLepF4-ion1 | CCATCTCATCCCTGCGTGTCTCCGACTCAG AGWAGWATWRTWRAWAVWGG | IonXpress 1 | A
PCR2 | F5 | AncientLepF5-ion1 | CCATCTCATCCCTGCGTGTCTCCGACTCAG ATTTTTWSWCTWCATWTDGCWGG | IonXpress 1 | A
PCR2 | F6 | AncientLepF6-ion1 | CCATCTCATCCCTGCGTGTCTCCGACTCAG TATTTGTWTGAKCWRTWKKWATTAC | IonXpress 1 | A
PCR2 | F1 | LepF1-ion2 | CCATCTCATCCCTGCGTGTCTCCGACTCAG ATTCAACCAATCATAAAGATATTGG | IonXpress 2 | A
PCR2 | F2 | AncientLepF2-ion2 | CCATCTCATCCCTGCGTGTCTCCGACTCAG ATTRRWRATGATCAARTWTATAAT | IonXpress 2 | A
PCR2 | F3 | AncientLepF3-ion2 | CCATCTCATCCCTGCGTGTCTCCGACTCAG TTATAATTGGDGGRTTTGGWAATTG | IonXpress 2 | A
PCR2 | F4 | AncientLepF4-ion2 | CCATCTCATCCCTGCGTGTCTCCGACTCAG AGWAGWATWRTWRAWAVWGG | IonXpress 2 | A
PCR2 | F5 | AncientLepF5-ion2 | CCATCTCATCCCTGCGTGTCTCCGACTCAG ATTTTTWSWCTWCATWTDGCWGG | IonXpress 2 | A
PCR2 | F6 | AncientLepF6-ion2 | CCATCTCATCCCTGCGTGTCTCCGACTCAG TATTTGTWTGAKCWRTWKKWATTAC | IonXpress 2 | A
PCR2 | F1 | LepF1-ion3 | CCATCTCATCCCTGCGTGTCTCCGACTCAG ATTCAACCAATCATAAAGATATTGG | IonXpress 3 | A
PCR2 | F2 | AncientLepF2-ion3 | CCATCTCATCCCTGCGTGTCTCCGACTCAG ATTRRWRATGATCAARTWTATAAT | IonXpress 3 | A
PCR2 | F3 | AncientLepF3-ion3 | CCATCTCATCCCTGCGTGTCTCCGACTCAG TTATAATTGGDGGRTTTGGWAATTG | IonXpress 3 | A
PCR2 | F4 | AncientLepF4-ion3 | CCATCTCATCCCTGCGTGTCTCCGACTCAG AGWAGWATWRTWRAWAVWGG | IonXpress 3 | A
PCR2 | F5 | AncientLepF5-ion3 | CCATCTCATCCCTGCGTGTCTCCGACTCAG ATTTTTWSWCTWCATWTDGCWGG | IonXpress 3 | A
PCR2 | F6 | AncientLepF6-ion3 | CCATCTCATCCCTGCGTGTCTCCGACTCAG TATTTGTWTGAKCWRTWKKWATTAC | IonXpress 3 | A
PCR2 | F1 | LepF1-ion4 | CCATCTCATCCCTGCGTGTCTCCGACTCAG ATTCAACCAATCATAAAGATATTGG | IonXpress 4 | A
PCR2 | F2 | AncientLepF2-ion4 | CCATCTCATCCCTGCGTGTCTCCGACTCAG ATTRRWRATGATCAARTWTATAAT | IonXpress 4 | A
PCR2 | F3 | AncientLepF3-ion4 | CCATCTCATCCCTGCGTGTCTCCGACTCAG TTATAATTGGDGGRTTTGGWAATTG | IonXpress 4 | A
PCR2 | F4 | AncientLepF4-ion4 | CCATCTCATCCCTGCGTGTCTCCGACTCAG AGWAGWATWRTWRAWAVWGG | IonXpress 4 | A
PCR2 | F5 | AncientLepF5-ion4 | CCATCTCATCCCTGCGTGTCTCCGACTCAG ATTTTTWSWCTWCATWTDGCWGG | IonXpress 4 | A
PCR2 | F6 | AncientLepF6-ion4 | CCATCTCATCCCTGCGTGTCTCCGACTCAG TATTTGTWTGAKCWRTWKKWATTAC | IonXpress 4 | A
PCR2 | F1 | LepF1-ion5 | CCATCTCATCCCTGCGTGTCTCCGACTCAG ATTCAACCAATCATAAAGATATTGG | IonXpress 5 | A
PCR2 | F2 | AncientLepF2-ion5 | CCATCTCATCCCTGCGTGTCTCCGACTCAG ATTRRWRATGATCAARTWTATAAT | IonXpress 5 | A
PCR2 | F3 | AncientLepF3-ion5 | CCATCTCATCCCTGCGTGTCTCCGACTCAG TTATAATTGGDGGRTTTGGWAATTG | IonXpress 5 | A
PCR2 | F4 | AncientLepF4-ion5 | CCATCTCATCCCTGCGTGTCTCCGACTCAG AGWAGWATWRTWRAWAVWGG | IonXpress 5 | A
PCR2 | F5 | AncientLepF5-ion5 | CCATCTCATCCCTGCGTGTCTCCGACTCAG ATTTTTWSWCTWCATWTDGCWGG | IonXpress 5 | A
PCR2 | F6 | AncientLepF6-ion5 | CCATCTCATCCCTGCGTGTCTCCGACTCAG TATTTGTWTGAKCWRTWKKWATTAC | IonXpress 5 | A
PCR2 | F1 | LepF1-ion6 | CCATCTCATCCCTGCGTGTCTCCGACTCAG ATTCAACCAATCATAAAGATATTGG | IonXpress 6 | A
PCR2 | F2 | AncientLepF2-ion6 | CCATCTCATCCCTGCGTGTCTCCGACTCAG ATTRRWRATGATCAARTWTATAAT | IonXpress 6 | A
PCR2 | F3 | AncientLepF3-ion6 | CCATCTCATCCCTGCGTGTCTCCGACTCAG TTATAATTGGDGGRTTTGGWAATTG | IonXpress 6 | A
PCR2 | F4 | AncientLepF4-ion6 | CCATCTCATCCCTGCGTGTCTCCGACTCAG AGWAGWATWRTWRAWAVWGG | IonXpress 6 | A
PCR2 | F5 | AncientLepF5-ion6 | CCATCTCATCCCTGCGTGTCTCCGACTCAG ATTTTTWSWCTWCATWTDGCWGG | IonXpress 6 | A
PCR2 | F6 | AncientLepF6-ion6 | CCATCTCATCCCTGCGTGTCTCCGACTCAG TATTTGTWTGAKCWRTWKKWATTAC | IonXpress 6 | A
PCR2 | F1 | LepF1-ion7 | CCATCTCATCCCTGCGTGTCTCCGACTCAG ATTCAACCAATCATAAAGATATTGG | IonXpress 7 | A
PCR2 | F2 | AncientLepF2-ion7 | CCATCTCATCCCTGCGTGTCTCCGACTCAG ATTRRWRATGATCAARTWTATAAT | IonXpress 7 | A
PCR2 | F3 | AncientLepF3-ion7 | CCATCTCATCCCTGCGTGTCTCCGACTCAG TTATAATTGGDGGRTTTGGWAATTG | IonXpress 7 | A
PCR2 | F4 | AncientLepF4-ion7 | CCATCTCATCCCTGCGTGTCTCCGACTCAG AGWAGWATWRTWRAWAVWGG | IonXpress 7 | A
PCR2 | F5 | AncientLepF5-ion7 | CCATCTCATCCCTGCGTGTCTCCGACTCAG ATTTTTWSWCTWCATWTDGCWGG | IonXpress 7 | A
PCR2 | F6 | AncientLepF6-ion7 | CCATCTCATCCCTGCGTGTCTCCGACTCAG TATTTGTWTGAKCWRTWKKWATTAC | IonXpress 7 | A
PCR2 | F1 | LepF1-ion8 | CCATCTCATCCCTGCGTGTCTCCGACTCAG ATTCAACCAATCATAAAGATATTGG | IonXpress 8 | A
PCR2 | F2 | AncientLepF2-ion8 | CCATCTCATCCCTGCGTGTCTCCGACTCAG ATTRRWRATGATCAARTWTATAAT | IonXpress 8 | A
PCR2 | F3 | AncientLepF3-ion8 | CCATCTCATCCCTGCGTGTCTCCGACTCAG TTATAATTGGDGGRTTTGGWAATTG | IonXpress 8 | A
PCR2 | F4 | AncientLepF4-ion8 | CCATCTCATCCCTGCGTGTCTCCGACTCAG AGWAGWATWRTWRAWAVWGG | IonXpress 8 | A
PCR2 | F5 | AncientLepF5-ion8 | CCATCTCATCCCTGCGTGTCTCCGACTCAG ATTTTTWSWCTWCATWTDGCWGG | IonXpress 8 | A
PCR2 | F6 | AncientLepF6-ion8 | CCATCTCATCCCTGCGTGTCTCCGACTCAG TATTTGTWTGAKCWRTWKKWATTAC | IonXpress 8 | A
PCR2 | F1 | LepF1-ion9 | CCATCTCATCCCTGCGTGTCTCCGACTCAG ATTCAACCAATCATAAAGATATTGG | IonXpress 9 | A
PCR2 | F2 | AncientLepF2-ion9 | CCATCTCATCCCTGCGTGTCTCCGACTCAG ATTRRWRATGATCAARTWTATAAT | IonXpress 9 | A
PCR2 | F3 | AncientLepF3-ion9 | CCATCTCATCCCTGCGTGTCTCCGACTCAG TTATAATTGGDGGRTTTGGWAATTG | IonXpress 9 | A
PCR2 | F4 | AncientLepF4-ion9 | CCATCTCATCCCTGCGTGTCTCCGACTCAG AGWAGWATWRTWRAWAVWGG | IonXpress 9 | A
PCR2 | F5 | AncientLepF5-ion9 | CCATCTCATCCCTGCGTGTCTCCGACTCAG ATTTTTWSWCTWCATWTDGCWGG | IonXpress 9 | A
PCR2 | F6 | AncientLepF6-ion9 | CCATCTCATCCCTGCGTGTCTCCGACTCAG TATTTGTWTGAKCWRTWKKWATTAC | IonXpress 9 | A
PCR2 | F1 | LepF1-ion10 | CCATCTCATCCCTGCGTGTCTCCGACTCAG ATTCAACCAATCATAAAGATATTGG | IonXpress 10 | A
PCR2 | F2 | AncientLepF2-ion10 | CCATCTCATCCCTGCGTGTCTCCGACTCAG ATTRRWRATGATCAARTWTATAAT | IonXpress 10 | A
PCR2 | F3 | AncientLepF3-ion10 | CCATCTCATCCCTGCGTGTCTCCGACTCAG TTATAATTGGDGGRTTTGGWAATTG | IonXpress 10 | A
PCR2 | F4 | AncientLepF4-ion10 | CCATCTCATCCCTGCGTGTCTCCGACTCAG AGWAGWATWRTWRAWAVWGG | IonXpress 10 | A
PCR2 | F5 | AncientLepF5-ion10 | CCATCTCATCCCTGCGTGTCTCCGACTCAG ATTTTTWSWCTWCATWTDGCWGG | IonXpress 10 | A
PCR2 | F6 | AncientLepF6-ion10 | CCATCTCATCCCTGCGTGTCTCCGACTCAG TATTTGTWTGAKCWRTWKKWATTAC | IonXpress 10 | A
PCR2 | R1 | AncientLepR1-ion1-trP1 | CCTCTCTATGGGCAGTCGGTGAT WGGTATWACTATRAARAAAATTAT | IonXpress 1 | trP1
PCR2 | R2 | AncientLepR2-ion2-trP1 | CCTCTCTATGGGCAGTCGGTGAT TCARAAWCTWATRTTRTTTADWCG | IonXpress 2 | trP1
PCR2 | R3 | AncientLepR3-ion3-trP1 | CCTCTCTATGGGCAGTCGGTGAT ARDGGDGGRTAWACWGTTCAWCC | IonXpress 3 | trP1
PCR2 | R4 | AncientLepR4-ion4-trP1 | CCTCTCTATGGGCAGTCGGTGAT GTWGWAATRAARTTDATWGCWCC | IonXpress 4 | trP1
PCR2 | R5 | AncientLepR5-ion5-trP1 | CCTCTCTATGGGCAGTCGGTGAT GTTARWARTATDGTRATDGCWCC | IonXpress 5 | trP1
PCR2 | R6 | LepR1-ion6-trP1 | CCTCTCTATGGGCAGTCGGTGAT TAAACTTCTGGATGTCCAAAAAATCA | IonXpress 6 | trP1
The "Code" column refers to primer labels in Fig. 2. The COI binding region within each primer sequence is shown in black, while the 10bp tail (PCR1) or MID tag (PCR2) is shown in blue.
The "key sequence" (required for Ion Torrent sequencing) is shown in green and the sequencing adapters are shown in red. The 10bp tails on the PCR1 primers are technically IonXpress MID

tags, but they serve only to block short amplicons from acting as primers during PCR1. They were chosen over random decamer tails to maximize primer-template matching in PCR2. The same forward and reverse PCR1 primers are used for all ten samples in the first round of PCR. In the second round of PCR, samples are discriminated by using ten different sets of MID-tagged forward PCR2 primers (the same set of PCR2 reverse primers is used for all ten samples).
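How a PCR2 forward primer is put together from the parts named in the legend can be sketched as a simple concatenation. The 30-nucleotide adapter-plus-key prefix and the LepF1 binding region are copied from Table 4; the 5'-3' order (adapter, key, MID, COI-binding region) and the MID sequence used below are assumptions made for illustration, not values taken from the table.

```python
# Adapter plus key sequence as listed for the PCR2 forward primers in Table 4.
ADAPTER_AND_KEY = "CCATCTCATCCCTGCGTGTCTCCGACTCAG"
IUPAC_CODES = set("ACGTRYSWKMBDHVN")


def build_pcr2_forward(mid, binding_region):
    """Concatenate adapter/key, a 10bp MID tag and the COI-binding region.

    The part order is an assumption of this sketch.
    """
    if len(mid) != 10 or set(mid) - set("ACGT"):
        raise ValueError("MID must be 10 unambiguous bases")
    if set(binding_region) - IUPAC_CODES:
        raise ValueError("binding region contains non-IUPAC characters")
    return ADAPTER_AND_KEY + mid + binding_region


HYPOTHETICAL_MID = "ACGTACGTAC"          # placeholder, not a real IonXpress tag
LEPF1 = "ATTCAACCAATCATAAAGATATTGG"      # F1 binding region from Table 4
primer = build_pcr2_forward(HYPOTHETICAL_MID, LEPF1)
tail_length = len(primer) - len(LEPF1)   # length of the non-binding 5' tail
```

Under these assumptions the forward tail is 40 nucleotides long.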
Optimization of NGS Protocols Optimization studies tested the impact of varied primer combinations, number of PCR
cycles, differential concentrations of primers and nesting of PCRs. Efforts to multiplex all six forward and reverse primers in a single reaction were unsuccessful because the small regions of overlap were preferentially amplified over the six target fragments. Splitting the PCR into two reactions, each targeting non-adjacent fragments (e.g. PCR1 = F1+R1, F3+R3, F5+R5; PCR2 = F2+R2, F4+R4, F6+R6), solved this issue, but revealed another problem: the dominance of certain amplicons. This problem was overcome by mixing the forward primers with the full complement of reverse primers (e.g. PCR1 = F1+F3+F5 + six reverse primers; PCR2 = F2+F4+F6 + five reverse primers). This allowed each forward primer to potentially pair with several downstream reverse primers, creating redundancy that improved sequence recovery while reducing the dominance of any particular amplicon. For example, depending upon the quality of the template DNA, the barcode segment amplified by primers F4+R4 could be amplified by any of the twelve combinations of F1, F2, F3 or F4 paired with R4, R5 or R6. This redundancy aided the recovery of full-length barcodes from specimens with varied degrees of DNA degradation or with particular primer mismatches (as evidenced by the lack of a certain product in the final sequence array).
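The count of twelve combinations in the example above can be checked by enumeration. The sketch below assumes only that a segment is spanned by any forward primer at or upstream of it paired with any reverse primer at or downstream of it; it ignores which first-round reaction each forward primer sits in and any practical limit on amplicon length.

```python
def spanning_pairs(segment, n_primers=6):
    """List (forward, reverse) primer pairs whose amplicon covers a segment.

    Segment k is the fragment bounded by Fk and Rk, numbered 1..n_primers.
    """
    forwards = [f"F{i}" for i in range(1, segment + 1)]
    reverses = [f"R{j}" for j in range(segment, n_primers + 1)]
    return [(f, r) for f in forwards for r in reverses]


PAIRS_FOR_SEGMENT_4 = spanning_pairs(4)
```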
When DNA quality is poor, primer binding becomes increasingly important to "kick start"
amplification [26]. Perfect primer binding is impossible when diverse taxa are analyzed, but the prospects for recovery of desired amplicons can be improved by raising the number of PCR cycles and by increasing the primer degeneracy. Both tactics were employed in the present NGS protocol.
Two rounds of PCR were employed, with 60 cycles in the first and 40 cycles in the second.
All forward and reverse primers included degeneracy at the sites most important for primer binding (3' terminus for F, 5' terminus for R). Considering this degeneracy, the 12 forward and reverse primers were actually a cocktail of 2010 primers. Other factors were found to have important impacts on final outcomes. For example, initial tests revealed that primers with the 33bp-40bp adapter/MID tails required for NGS were less effective in generating product than the same primers without tails, a difference that was particularly strong for LQ extracts. This difference was probably due to interference with primer binding caused by the formation of secondary structures in the primers with tails. Although primers without tails produced the highest amplification success, their use allowed short, non-target amplicons to act as primers generating chimeric amplicons which combined sequence information from primers and the specimen.
To overcome this problem, 10bp tails lacking complementarity to any region in the target genomes were added to the 5' terminus of all primers. Their presence inhibited polymerase elongation when short amplicons or primer dimers attempted to act as primers, preventing the formation of chimeric amplicons while avoiding the secondary structure issues inherent with longer tails. Although the first round of PCR was effective in generating amplicons, a second round of PCR was used to introduce the adapter-tailed primers for sequence analysis. It likely had the additional benefit of reducing amplification bias because it involved six separate reactions, one for each forward primer, dampening amplification bias by limiting primer competition.
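The effect of primer degeneracy on the size of the primer cocktail can be illustrated by counting the distinct sequences that a degenerate primer represents. The sketch uses the standard IUPAC nucleotide ambiguity codes and the F2 and F1 binding regions from Table 4; it illustrates the counting only and does not attempt to reproduce the reported total of 2010.

```python
import itertools
import math

# Standard IUPAC nucleotide ambiguity codes.
IUPAC_BASES = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def degeneracy(primer):
    """Number of distinct sequences represented by a degenerate primer."""
    return math.prod(len(IUPAC_BASES[base]) for base in primer)


def expand(primer):
    """Enumerate every non-degenerate sequence encoded by the primer."""
    choices = (IUPAC_BASES[base] for base in primer)
    return ["".join(combo) for combo in itertools.product(*choices)]


F1 = "ATTCAACCAATCATAAAGATATTGG"   # no degenerate positions
F2 = "ATTRRWRATGATCAARTWTATAAT"    # six two-fold degenerate positions
f1_variants = degeneracy(F1)
f2_variants = degeneracy(F2)
```

Each two-fold degenerate position doubles the number of variants, which is why a dozen degenerate primers can amount to a cocktail of thousands of sequences.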
Final NGS Protocol These experimental studies led to the development of a two-stage, nested, multiplex PCR
protocol which produced sequence records spanning the barcode region. The first round of PCR
included two reactions for each specimen (PCR 1.1 and PCR 1.2 in Fig. 2a), each consuming 2 µL of genomic DNA as template. Each reaction included three forward primers (F1+F3+F5 or F2+F4+F6) with six and five reverse primers respectively, allowing each forward primer to generate from 1-6 amplicons, depending on the quality of DNA and its binding position in relation to the reverse primers. Detailed reaction components (final volume = 12.5 µL) are provided in Table 5. Thermocycling consisted of 94 °C for 2 minutes, 60 cycles of {94 °C for 40 seconds, 48 °C for 40 seconds, 72 °C for 30 seconds}, and a final extension of 72 °C for 5 minutes.
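As a worked example of the cycling program, the programmed hold time can be summed directly. The sketch counts hold times only and ignores ramping between temperatures, which depends on the instrument; the 40-cycle figure for the second round is taken from the optimization section, and the assumption that its hold times match the first round is made here for illustration.

```python
def programmed_seconds(initial_s, cycles, step_seconds, final_s):
    """Total programmed hold time of a PCR profile, excluding ramp time."""
    return initial_s + cycles * sum(step_seconds) + final_s


STEPS = (40, 40, 30)  # denaturation, annealing, extension holds in seconds

pcr1_seconds = programmed_seconds(120, 60, STEPS, 300)  # first round, 60 cycles
pcr2_seconds = programmed_seconds(120, 40, STEPS, 300)  # second round, 40 cycles
pcr1_minutes = pcr1_seconds / 60
```

The first round therefore holds for 7020 seconds, or 117 minutes, before ramp time is added.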

Table 5. Components of PCR reactions in the NGS protocol.
Component | PCR 1.1 | PCR 1.2 | PCR 2.1-2.4 | PCR 2.5 | PCR 2.6
10% Trehalose | 5.125 µL | 5.25 µL | 5.75 µL | 5.875 µL | 6.0 µL
H2O | 0.13 µL | 0.13 µL | 0.13 µL | 0.13 µL | 0.13 µL
5X Buffer | 2.5 µL | 2.5 µL | 2.5 µL | 2.5 µL | 2.5 µL
25 mM MgCl2 | 1.25 µL | 1.25 µL | 1.25 µL | 1.25 µL | 1.25 µL
µM primers | 0.125 µL each | 0.125 µL each | 0.125 µL each | 0.125 µL each | 0.125 µL each
10 µM dNTP | 0.0625 µL | 0.0625 µL | 0.0625 µL | 0.0625 µL | 0.0625 µL
Taq (5 U/µL) | 0.06 µL | 0.06 µL | 0.06 µL | 0.06 µL | 0.06 µL
Template | 2 µL | 2 µL | 2 µL | 2 µL | 2 µL
TOTAL | 12.5 µL | 12.5 µL | 12.5 µL | 12.5 µL | 12.5 µL
Reactions differ only in the number of primers and the amount of trehalose. Trehalose sourced from Fluka Analytical; Hyclone ultra-pure water from Thermo Fisher Scientific; Buffer, MgCl2, and Taq polymerase from KAPA Biosystems; primers from Integrated DNA Technologies.
Figure 2 shows the primer positions for the first and second rounds of PCR (a) and all possible final amplicons (b). The initial round of PCR includes two separate reactions (a - above broken line) using 10bp tailed primers and genomic DNA as template (shown in parentheses below reaction names). The second round of PCR includes six separate reactions (a - below broken line) using adapter-tailed primers and the products from the first PCR reactions as template (shown in parentheses below reaction names). The second PCR can generate up to 15 amplicons spanning the entire COI barcode region (b). To assign each amplicon to a particular type specimen, each forward PCR2 primer is tailed with MID tags unique to that specimen. To assign each amplicon to a particular reaction (i.e. 2.1, 2.2, 2.3, etc.), each reverse PCR2 primer is tailed with a MID tag unique for each reaction in the second round of PCR.
The second round of PCR used product from the first PCR reactions as template and included six reactions per specimen (PCR 2.1-2.6 in Fig. 2a), each coupling a single forward primer with one to three reverse primers and using 2 µL of the appropriate primary PCR product as template. It boosted amplicon yields while also adding the required sequencing adapters. Each secondary PCR generated 1-3 amplicons which collectively spanned the COI barcode region (Fig. 2b). The first four PCRs (2.1-2.4 in Fig. 2a) contained forward primers F1-F4, each combined with the three immediately downstream reverse primers (e.g. F1+R1+R2+R3). The fifth PCR (2.5 in Fig. 2a) combined F5 with R5 and R6, while the sixth PCR (2.6 in Fig. 2a) combined F6 with R6. All of these reactions employed primers with adapter tails and MID tags to enable NGS to discriminate fragments and/or individuals in post processing. Detailed reaction components (final volume = 12.5 µL) are provided in Table 5. Thermocycling consisted of 94 °C for 2 minutes, 40 cycles of {94 °C for 40 seconds, 48 °C for 40 seconds, 72 °C for 30 seconds}, and a final extension of 72 °C for 5 minutes.
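The composition of the six second-round reactions follows a simple rule, which the sketch below encodes: each forward primer is paired with up to three reverse primers starting at its own position. The function and variable names are illustrative; summing the reverse primers per reaction reproduces the maximum of 15 amplicons mentioned for Figure 2.

```python
def pcr2_reactions(n_primers=6, max_reverse=3):
    """Map each second-round reaction to its forward and reverse primers."""
    reactions = {}
    for i in range(1, n_primers + 1):
        last = min(i + max_reverse - 1, n_primers)  # cap at the final reverse primer
        reverse = [f"R{j}" for j in range(i, last + 1)]
        reactions[f"2.{i}"] = (f"F{i}", reverse)
    return reactions


REACTIONS = pcr2_reactions()
# One amplicon is possible per forward/reverse pairing within a reaction.
TOTAL_AMPLICONS = sum(len(reverse) for _, reverse in REACTIONS.values())
```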
The secondary PCR products from each specimen (six reactions) were pooled and a double size selection protocol (PCRClean DX kit, Aline Biosciences) was employed to remove genomic DNA, primer dimers and residual primers. The first cleanup step was designed to remove any high molecular weight genomic DNA (>800bp) that might reflect recent contamination (e.g. human DNA from researchers working with the specimens). Briefly, the PCR product and magnetic beads were incubated in a 2:1 ratio (volume PCR product: volume beads) for 8 minutes at room temperature followed by 2 minutes on a magnet. The pellet of beads was discarded, while the supernatant was retained for the second cleanup step, which was designed to bind fragments ranging from 250bp to 700bp (i.e. the PCR products) onto beads, while lower molecular weight DNA (primer dimers, residual primers) remained in solution. This step was carried out by mixing enough beads and sterile water to generate a 5:4 ratio (PCR product: beads) and incubating for 8 minutes followed by 2 minutes on a magnet. The supernatant was discarded and the pellet of beads was washed three times with 80% ethanol before the PCR products were eluted from the beads with 36 µL of sterile water. Following cleanup, the concentration of each purified PCR product was measured on a Qubit 2.0 fluorometer using the Qubit dsDNA HS Assay Kit (Life Technologies) and all 10 samples were normalized to 1 ng/µL and mixed in equal proportions. From this mixture, the final sequencing template library was created by making a 1/300 dilution. An Ion PGM Template OT2 400 kit (Life Technologies) was used for template preparation and sequencing was carried out on an Ion Torrent PGM following the manufacturer's instructions. Sequencing was performed on a 316 chip using an Ion PGM Sequencing 400 Kit (Life Technologies).
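The normalization and dilution steps reduce to the usual C1V1 = C2V2 arithmetic. The sketch below is a minimal illustration; the measured concentration and the volumes used in the example are hypothetical and are not values from the study.

```python
def water_to_add(measured_ng_per_ul, sample_ul, target_ng_per_ul=1.0):
    """Volume of water (µL) that dilutes a sample to the target concentration."""
    if measured_ng_per_ul < target_ng_per_ul:
        raise ValueError("sample is already below the target concentration")
    final_volume = sample_ul * measured_ng_per_ul / target_ng_per_ul
    return final_volume - sample_ul


def dilution_volumes(fold, final_ul):
    """Stock and diluent volumes (µL) for a 1/fold dilution."""
    stock = final_ul / fold
    return stock, final_ul - stock


# Hypothetical example: 5 µL of a 4 ng/µL product normalized to 1 ng/µL,
# then a 1/300 dilution of the pooled library made up to 300 µL.
water_ul = water_to_add(4.0, 5.0)
stock_ul, diluent_ul = dilution_volumes(300, 300.0)
```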
Data Analysis Raw data from each Ion Torrent PGM run were uploaded to the Galaxy platform for analysis (https://usegalaxy.org/) [27]. Several filters were applied to remove low quality, short, and non-target reads before an alignment was constructed to assemble the full barcode contig.
Representative examples of the sequence reads recovered from HQ and LQ
extracts are shown in Figure 6. The resultant FASTA file was then exported to permit comparisons with Sanger-generated sequences in BOLD. The authenticity of each NGS-generated sequence was subsequently validated by querying the sequence against the BOLD
Identification Engine (www.boldsystems.org) to check for contamination or non-target amplification.
Further validation was performed via Neighbor-Joining (NJ) analysis that included the NGS-generated sequences as well as sequences from recently collected specimens of the same species or close relatives. The compiled reads from each run were deposited in the Sequence Read Archive (SRA; http://www.ncbi.nlm.nih.gov/sra) under study accession SRP055961 (see Table 3 for individual sample accession numbers), while the barcode contig for each specimen was deposited in the BOLD dataset (dx.doi.org/10.5883/DS-NGSTYPES) and in GenBank (see Table 3 for accession numbers).
Results
Because the NGS protocol allowed the simultaneous processing of ten specimens, just three runs were required to analyze the 30 specimens. The average number of sequence reads per specimen showed five-fold variation (182K, 59K, 36K), while the average depth of coverage per base showed six-fold variation (36K, 12K, 6K) across the three DNA categories (Figs. 3a and 3b).
The number of reads per specimen averaged 90K, resulting in an average coverage depth of 18K per base. Sequences were recovered from every specimen with reads averaging 610bp, 578bp, and 458bp for the HQ, MQ and LQ extracts respectively (Fig. 3c). Barcode compliant sequences (>487bp) were recovered from 8 HQ, 8 MQ, and 4 LQ specimens (Table 3), while sequence records >400bp were recovered from 25 of 30 specimens (83%). In fact, more than 200bp of sequence data was recovered from all 30 specimens (Table 3). The recovery of sequences from ten type specimens in each of the three DNA categories is shown in Figure 3.
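The summary figures quoted above can be checked with a few lines of arithmetic. The sketch below uses only the numbers stated in the text; the variable and function names are illustrative. Note that the unweighted mean of the three category read counts comes to about 92K, which is consistent with the rounded figure of 90K reported above.

```python
# Category averages as quoted in the text, in thousands (K)
reads_k = [182, 59, 36]  # average reads per specimen across the three DNA categories
depth_k = [36, 12, 6]    # average depth of coverage per base across the same categories


def fold_variation(values):
    """Ratio of the largest to the smallest value."""
    return max(values) / min(values)


def mean(values):
    return sum(values) / len(values)


if __name__ == "__main__":
    print(round(fold_variation(reads_k), 1))  # 5.1, i.e. roughly five-fold
    print(fold_variation(depth_k))            # 6.0, i.e. six-fold
    print(round(mean(reads_k), 1))            # 92.3
    print(mean(depth_k))                      # 18.0
    print(round(100 * 25 / 30))               # 83 (% of specimens with >400bp)
    print(8 + 8 + 4)                          # 20 barcode compliant specimens
```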
The sequences generated by NGS from the HQ and MQ specimens were perfectly matched in their zones of overlap to the shorter sequences generated by Sanger analysis (Fig. 4).
The protocol does involve 100 cycles of PCR amplification, but there was no evidence of artifacts when the NGS sequences were compared to their Sanger counterparts (Fig. 4).
Further confirmation of their validity was provided by the fact that they grouped with sequences from closely allied taxa (Fig. 5). It was more difficult to verify the sequences obtained via NGS from the LQ specimens because they had no Sanger counterparts for comparison. In six cases, the NGS
sequences clearly derived from a single species, but reads from the other four specimens appeared to originate from two or more species. Obvious contaminants (e.g. fungi, bacteria) were easily removed during post processing, but some sequences in these four records appeared to derive from closely allied species or pseudogenes. In principle, the contaminants and authentic sequences could be discriminated if reference sequences were available from modern specimens of these species, but they were not. Because the four specimens showing these admixtures generated the fewest sequence reads and the lowest depth of coverage, it is likely that their DNA
was heavily degraded (Table 3). Once contemporary sequences for these species become available, it should be possible to recognize the authentic sequences.
To summarize, the current method works on a plurality of samples simultaneously, with high success rates for good quality degraded DNA and a slight drop for lower quality degraded DNA. The method still works for samples that may contain almost no intact DNA.
Even the lowest quality degraded DNA was amplified and characterized using the method of the invention, which recovered >500bp sequences from samples that failed using traditional Sanger approaches. The method may be used universally on any type of degraded DNA sample for many applications, including environmental, forensic and food industry applications (cooked foods contain degraded DNA), and generally in any application where DNA is degraded due to age, environment, processing and so forth. The method can be customized for invertebrates, mammals, fish, birds and so forth. In one aspect, the method effectively amplifies and characterizes entire barcode regions for use in biological classification. This will be helpful for classification of old specimens such as, for example, those found in museums [2-5,28], as demonstrated in two recent studies [29,30].
The invention can be provided as a system in a kit containing the desired primers, buffers, enzymes, instructions for use and so forth. A kit may be customized for a particular type of specimen that comprises degraded DNA.
It is to be noted that the term "a" or "an" entity refers to one or more of that entity. For example, "a characteristic" refers to one or more characteristics or at least one characteristic. As such, the terms "a" (or "an"), "one or more" and "at least one" are used interchangeably herein. It is also to be noted that the terms "comprising", "including", and "having"
have been used interchangeably.
Ranges: throughout this disclosure, various aspects described herein can be presented in a range format. It should be understood that the description in range format is merely for convenience and brevity and should not be construed as an inflexible limitation on the scope described herein. Accordingly, the description of a range should be considered to have specifically disclosed all the possible subranges as well as individual numerical values within that range. For example, description of a range such as from 1 to 6 should be considered to have specifically disclosed subranges such as from 1 to 3, from 1 to 4, from 1 to 5, from 2 to 4, from 2 to 6, from 3 to 6 etc., as well as individual numbers within that range, for example, 1, 2, 2.7, 3, 4, 5, 5.3, and 6. This applies regardless of the breadth of the range.
It will be understood that any aspects described as "comprising" certain components may also "consist of" or "consist essentially of," wherein "consisting of" has a closed-ended or restrictive meaning and "consisting essentially of" means including the components specified but excluding other components except for materials present as impurities, unavoidable materials present as a result of processes used to provide the components, and components added for a purpose other than achieving the technical effect described herein. For example, a composition defined using the phrase "consisting essentially of" encompasses any known pharmaceutically acceptable additive, excipient, diluent, carrier, and the like. Typically, a composition consisting essentially of a set of components will comprise less than 5% by weight, typically less than 3% by weight, more typically less than 1% by weight of non-specified components.
It will be understood that any component defined herein as being included may be explicitly excluded from the claimed invention by way of proviso or negative limitation.
Many patent applications, patents, and publications are referred to herein to assist in understanding the aspects described. Each of these references is incorporated herein by reference in its entirety.
The foregoing examples and detailed description are offered by way of illustration and not by way of limitation. All publications and patent applications cited in this specification are herein incorporated by reference as if each individual publication or patent application were specifically and individually indicated to be incorporated by reference.
Although the foregoing invention has been described in some detail by way of illustration and example for purposes of clarity of understanding, it will be readily apparent to those of ordinary skill in the art in light of the teachings of this invention that certain changes and modifications may be made thereto without departing from the scope of the appended claims.

References
1. Hebert PDN, Cywinska A, Ball S, deWaard JR. Biological identifications through DNA barcodes. Proc R Soc Lond B Biol Sci. 2003; 270: 313-321.
2. Hebert PDN, deWaard JR, Zakharov EV, Prosser SWJ, Sones JE, McKeown JTA, et al. A DNA
`Barcode Blitz': Rapid digitization and sequencing of a natural history collection. PLoS ONE.
2013; 8: e68535. doi:10.1371/journal.pone.0068535.
3. Mutanen M, Kekkonen M, Prosser SW, Hebert PDN, Kaila L. One species in eight: DNA barcodes from type specimens resolve a taxonomic quagmire. Mol Ecol Resour. 2014; doi:10.1111/1755-0998.12361.
4. Hausmann A, Hebert PDN, Mitchell A, Rougerie A, Sommerer M, Edwards T.
Revision of the Australian Oenochroma vinaria Guenee, 1858 species-complex (Lepidoptera:
Geometridae, Oenochrominae): DNA barcoding reveals cryptic diversity and assesses status of type specimen without dissection. Zootaxa. 2009a; 2239: 1-21.
5. Kirchman JJ, Witt CC, McGuire JA, Graves GR. DNA from a 100-year-old holotype confirms the validity of a potentially extinct hummingbird species. Biol Lett. 2010; 6:
112-115.
6. Gilbert MTP, Moore W, Melchior L, Worobey M. DNA extraction from dry museum beetles without conferring external morphological damage. PLoS ONE. 2007; 2: e272. doi:10.1371/journal.pone.0000272.
7. Thomsen PF, Elias S, Gilbert MTP, Haile J, Munch K, Kuzmina S, et al. Non-destructive sampling of ancient insect DNA. PLoS ONE. 2009; 4: e5048.
doi:10.1371/journal.pone.0005048.
8. Dean MD, Ballard JWO. Factors affecting mitochondrial DNA quality from museum preserved Drosophila simulans. Entomol Exp Appl. 2001; 98: 279-283.

9. Hernandez-Triana LM, Prosser SW, Rodriguez-Perez MA, Chaverri LG, Hebert PDN, Gregory TR. Recovery of DNA barcodes from blackfly museum specimens (Diptera: Simuliidae) using primer sets that target a variety of sequence lengths. Mol Ecol Resour. 2013; 14: 508-518. doi:10.1111/1755-0998.12208.
10. Van Houdt JKJ, Breman FC, Virgilio M, De Meyer M. Recovering full DNA
barcodes from natural history collections of Tephritid fruitflies (Tephritidae, Diptera) using mini barcodes. Mol Ecol Resour. 2010; 10: 459-465.
11. Bluemel JK, King RA, Virant-Doberlet M, Symondson WOC. Primers for identification of type and other archived specimens of Aphrodes leafhoppers (Hemiptera, Cicadellidae). Mol Ecol Resour. 2011; 11: 770-774.
12. Hausmann A, Sommerer M, Rougerie R, Hebert P. Hypobapta tachyhalotaria n. sp. from Tasmania - an example of a new species revealed by DNA barcoding (Lepidoptera, Geometridae). Spixiana. 2009b; 32: 161-166.
13. Lees DC, Lack HW, Rougerie R, Hernandez-Lopez A, Raus T, Avtzis ND, et al.
Tracking origins of invasive herbivores using herbaria and archival DNA: the case of the horse-chestnut leafminer. Front Ecol Environ. 2011; 9: 322-328.
14. Rougerie R, Naumann S, Nassig WA. Morphology and molecules reveal unexpected cryptic diversity in the enigmatic genus Sinobirma Bryk, 1944 (Lepidoptera:
Saturniidae). PLoS ONE.
2012; 7: e43920. doi:10.1371/journal.pone.0043920.
15. Lees DC, Rougerie R, Zeller-Lukashort C, Kristensen NP. DNA mini-barcodes in taxonomic assignment: a morphologically unique new homoneurous moth clade from the Indian Himalayas described in Micropterix (Lepidoptera, Micropterigidae). Zool Scr. 2010; 39:
642-661.

16. Strutzenberger P, Brehm G, Fiedler K. DNA barcode sequencing from old type specimens as a tool in taxonomy: A case study in the diverse genus Eois (Lepidoptera:
Geometridae). PLoS
ONE. 2012; 7: e49710.
17. Zimmermann J, Hajibabaei M, Blackburn DC, Hanken J, Cantin E, Posfai J, et al. DNA damage in preserved specimens and tissue samples: a molecular assessment. Front Zool.
2008; 5: 18.
18. Allentoft ME, Collins M, Harker D, Haile J, Oskam CL, Hale ML et al. The half-life of DNA
in bone: measuring decay kinetics in 158 dated fossils. Proc Biol Sci. 2012;
279: 4724-4733.
19. Rowe KC, Singhal S, Macmanes MD, Ayroles JF, Morelli TL, Rubidge EM, et al. Museum genomics: low-cost and high-accuracy genetic data from historical specimens.
Mol Ecol Resour.
2011; 11: 1082-1092. doi:10.1111/j.1755-0998.2011.03052.x.
20. Shokralla S, Gibson JF, Nikbakht H, Janzen DH, Hallwachs W, Hajibabaei M.
Next-generation DNA barcoding: using next-generation sequencing to enhance and accelerate DNA
barcode capture from single specimens. Mol Ecol Resour. 2014; 14: 892-901.
21. Shokralla S, Porter T, Gibson J, Dobosz R, Janzen DH, et al. Massively parallel multiplex DNA sequencing for specimen identification using an Illumina MiSeq platform. Sci Rep. 2015; 5: 9687.
22. Holloway JD, Miller SE, Pollock DM, Helgen L, Darrow K. GONGED
(Geometridae of New Guinea Electronic Database): a progress report on development of an online facility of images.
Spixiana. 2009; 32: 122-123.
23. Miller SE. DNA barcode enabled ecological research on Geometridae in Papua New Guinea.
Spixiana. 2014; 37: 245-246.
24. Knolke S, Erlacher S, Hausmann A, Miller MA, Segerer AH. A procedure for combined genitalia dissection and DNA extraction in Lepidoptera. Insect Syst Evol.
2005; 35: 401-409.

25. Ivanova NV, deWaard JR, Hebert PDN. An inexpensive, automation-friendly protocol for recovering high-quality DNA. Mol Ecol Notes. 2006; 6: 998-1002.
26. deWaard JR, Ivanova NV, Hajibabaei M, Hebert PDN. Assembling DNA barcodes.
Methods Mol Biol. 2008; 410: 275-293.
27. Blankenberg D, Von Kuster G, Coraor N, Ananda G, Lazarus R, Mangan M et al. Galaxy: a web-based genome analysis tool for experimentalists. Curr Protoc Mol Biol.
2010; Chapter 19:
Unit 19.10: 11-21.
28. Miller SE, Hausmann A, Hallwachs W, Janzen DH. Advancing taxonomy and bioinventories with DNA barcodes. Phil Trans R Soc B. 2016; 371: 20150339. doi:10.1098/rstb.2015.0339.
29. Spiedel W, Hausmann A, Muller GC, Kravchenko V, Mooser J, Witt TJ, et al. Taxonomy 2.0: Sequencing of old type specimens supports the description of two new species of the Lasiocampa decolorata group from Morocco (Lepidoptera: Lasiocampidae). Zootaxa. 2015; 3999: 401-412.
30. Hausmann A, Miller SE, Holloway JD, deWaard JR, Pollock D, Prosser SWJ, Hebert PDN.
Calibrating the taxonomy of a megadiverse insect family: 3000 DNA barcodes from geometrid type. Genome. 2016; 0,0. doi:10.1139/gen-2015-0197.

Claims (40)

Claims:
1. A two stage method for obtaining a full length barcode sequence from specimens with degraded DNA, the method comprising two step multiplex nested PCR utilizing primers that hybridize to portions of the barcode sequence that can pair in any combination to generate a plurality of amplicons that span the entire barcode sequence while avoiding overlap amplification, primer incorporation and/or primer dimer sequencing; and NGS
for sequencing the plurality of amplicons generated by the two step multiplex nested PCR and providing the barcode sequence.
2. The method of claim 1, wherein said two step multiplex nested PCR co-amplifies fragments covering the barcode region.
3. The method of claim 2, wherein said two step multiplex nested PCR comprises:
a first multiplex PCR1 wherein said primers hybridize to the barcode sequence while simultaneously blocking undesired elongation to form a plurality of amplicons; and
a second multiplex PCR2 utilizing the amplicons from PCR1 as a template and a plurality of primers that are adapter-tailed,
wherein in PCR1 forward primers are selected for all downstream reverse primers.
4. The method of claim 3, wherein said primers of PCR1 are tailed with short, non-complementary sequences.
5. The method of any one of claims 1 to 4, wherein said specimen contains at least 0.1 ng of degraded DNA, such as at least 0.5 ng, 1 ng, 10 ng, 100 ng, 500 ng, or from 2µg-5µg of degraded DNA.
6. The method of any one of claims 1 to 5, wherein said barcode sequence is the 658 base-pair region in the mitochondrial cytochrome c oxidase 1 gene (COI).
7. The method of any one of claims 1 to 5, wherein said barcode sequence is matK or rbcL for identifying plants.
8. A method to generate a plurality of redundant amplicons for a target degenerated DNA
sequence, the method comprising:
(a) performing a first multiplex nested PCR using a plurality of primers that hybridize to portions of the target DNA sequence while blocking undesired elongation to form a plurality of amplicons, wherein forward primers are selected with all downstream reverse primers to produce amplicon redundancy;
(b) using the amplicon products of (a) as a template, performing a second multiplex nested PCR comprising a plurality of adapter-tailed primers with optional MID tags that hybridize to the amplicon products of (a), and
(c) pooling the products from (b).
9. The method of claim 8, further comprising removing undesired genomic DNA, primer dimers and/or residual primers.
10. The method of claim 8 or 9, further comprising performing next generation sequencing to the pooled products from (c).
11. A method for amplifying and characterizing a barcode region from the cytochrome c oxidase 1 gene (COI) in a small specimen of degraded DNA using multiplex PCR, the method comprising:
- extracting the degraded DNA to provide a linear template;
- performing first multiplex nested PCR1 using a plurality of forward primers and downstream reverse primers that hybridize to regions of said barcode region and simultaneously blocking undesired elongation such that a plurality of amplicons is created;
- performing a second multiplex PCR2 using the multiple amplicons generated from the first PCR1 reaction as a template using adapter-tailed primers with optional multiplex identifier tags (MID) that hybridize to portions of said amplicons to generate a more degenerate larger pool of amplicons;
- pooling all amplicon products, said amplicon products spanning and overlapping the entire length of said barcode region; and
- performing next generation sequencing on the pooled amplicon products to determine the barcode sequence.
12. The method of claim 11 further comprising comparing said determined barcode sequence to a bank of characterized sequences to determine the species of said specimen.
13. A method for detection and identification of a barcode region of the COI gene in a small specimen containing degraded DNA to identify the phylogeny of said specimen, the method comprising:
- extracting linear degraded DNA from said specimen;
- performing two step multiplex nested PCR on said linear degraded DNA using primers that hybridize to said barcode region to create a plurality of redundant amplicons spanning the barcode region of the COI gene;
- performing next generation sequencing on said redundant amplicons to provide a sequence of the barcode region of the COI gene; and
- classifying said specimen.
14. A kit for performing two step multiplex nested PCR on a small specimen comprising degraded DNA in order to determine the barcode region of the COI gene and thus classify the specimen, the kit comprising: primers specific for said barcode sequence, buffers, optional stabilizers, enzymes and instructions for use.
15. A method for amplifying degraded DNA, the method comprising:
amplifying the degraded DNA in a PCR1 reaction in at least two separate reaction vessels using pairs of nested forward and reverse primers, wherein the two reaction vessels comprise different combinations of the forward and reverse primers to produce a plurality of redundant amplicons; and
amplifying the redundant amplicons in a PCR2 reaction using one reaction vessel per forward primer, wherein each forward primer is mixed with a different combination of reverse primers.
16. The method of claim 15, wherein the forward and reverse primers in the PCR1 reaction comprise block elongation moieties to block elongation from the 5' end of the primers.
17. The method of claim 16, wherein the block elongation moieties comprise non-complementary tails.
18. The method of any one of claims 15 to 17, comprising from about 2 to about 10 forward primers and from about 2 to about 10 reverse primers.
19. The method of claim 18, comprising 6 forward primers (F1, F2, F3, F4, F5, and F6) and 6 reverse primers (R1, R2, R3, R4, R5, and R6).
20. The method of claim 19, wherein for PCR1, F1, F3, and F5 are paired with R1, R2, R3, R4, R5, and R6 and F2, F4, and F6 are paired with R1, R2, R3, R4, and R5.
21. The method of claim 20, wherein for PCR2, F1 is paired with R1, R2, and R3; F2 is paired with R2, R3, and R4; F3 is paired with R3, R4, and R5; F4 is paired with R5 and R6;
and F6 is paired with R6.
22. The method of any one of claims 15 to 21, wherein the primers for PCR2 comprise adapter tailed primers for sequencing.
23. The method of any one of claims 15 to 22, wherein the primers are degenerate.
24. A method for sequencing degraded DNA, the method comprising amplifying redundant amplicons such that each region of the target DNA sequence is covered by multiple amplicons, wherein the generation of specific amplicons is determined automatically by a combination of primer-template matching and the pattern of DNA degradation in the target sequence.
25. A method of amplifying a barcode region of a degraded DNA sample, the method comprising:
performing at least a PCR1a reaction and a PCR1b reaction utilizing a plurality of forward and reverse primers, respectively yielding a PCR1a complement of amplicons and a PCR1b complement of amplicons,
wherein the plurality of forward primers comprise primers F1, F2, ..., Fn, in order from upstream to downstream of the target sequence, wherein n is a whole number;
wherein the plurality of reverse primers comprise primers R1, R2, ..., Rm, in order from upstream to downstream of the target sequence, wherein m is a whole number;
wherein the plurality of reverse primers are downstream of F1 and the plurality of forward primers are upstream of Rn;
wherein the PCR1a reaction comprises each odd-numbered forward primer starting with F1 and further comprises all or substantially all of the reverse primers;
and wherein the PCR1b reaction comprises each even-numbered forward primer starting with F2 and further comprises all or substantially all of the reverse primers that are upstream of F2.
26. The method of claim 25, wherein the forward and reverse primers comprise block elongation moieties to block elongation from the 5' end of the primers and reduce non-target amplification.
27. The method of claim 26, wherein the block elongation moieties comprise non-complementary tails.
28. The method of any one of claims 25 to 27, further comprising performing a plurality of PCR2 reactions, PCR2_1, PCR2_2, ..., PCR2_n, to amplify the PCR1a and PCR1b complements of amplicons, wherein each PCR2 reaction uses a different forward primer and a different set of one or more downstream reverse primers; and wherein the PCR1a complement of amplicons are amplified using odd-numbered forward primers and wherein the PCR1b complement of amplicons are amplified using even-numbered forward primers.
29. The method of claim 28, further comprising pooling the resulting amplicons.
30. The method of claim 28 or 29, wherein the primers for PCR2 are adapter-tailed for sequence analysis.
31. The method of any one of claims 28 to 30, wherein the primers for PCR2 are MID-tagged to associate amplicons with specific specimens, such that multiple specimens can be sequenced simultaneously.
32. The method of any one of claims 25 to 31, wherein n is from 2-10, such as 6.
33. The method of any one of claims 25 to 32, wherein m is from 2-10, such as 6.
34. The method of any one of claims 25 to 33, wherein the forward and reverse primers are as defined in Table 4.
35. The method of any one of claims 1 to 34, wherein the template is not depleted through use of the method.
36. A method of amplifying degraded DNA according to the scheme shown in Figures 2a and 2b herein.
37. The method of any one of claims 1 to 36, for taxonomic classification of unknown specimens.
38. The method of any one of claims 1 to 37, wherein the primers are degenerate.
39. The method of any one of claims 1 to 38, for analyzing a plurality of specimens simultaneously.
40. The method of any one of claims 1 to 39, wherein the method is for amplification of a sample comprising small amounts of degraded DNA, such as at least about 0.1 ng of degraded DNA, such as at least about 0.5 ng, about 1 ng, about 10 ng, about 100 ng, about 500 ng, or from about 2µg to about 5µg of degraded DNA.
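For illustration only, the primer groupings recited in claims 20 and 21 can be written out as data and the resulting primer pairs enumerated. This sketch is not part of the claims. The labels PCR1a and PCR1b are borrowed from claim 25 and applied to the two groups of claim 20 as an assumption; the function name is hypothetical; and the PCR2 table reproduces claim 21 exactly as worded in this text, which lists no pairing for F5.

```python
# PCR1 groupings as recited in claim 20: (forward primers, reverse primers)
PCR1 = {
    "PCR1a": (["F1", "F3", "F5"], ["R1", "R2", "R3", "R4", "R5", "R6"]),
    "PCR1b": (["F2", "F4", "F6"], ["R1", "R2", "R3", "R4", "R5"]),
}

# PCR2 pairings as recited in claim 21 (one reaction vessel per forward primer).
# F5 is not listed in the claim text and is therefore left out here.
PCR2 = {
    "F1": ["R1", "R2", "R3"],
    "F2": ["R2", "R3", "R4"],
    "F3": ["R3", "R4", "R5"],
    "F4": ["R5", "R6"],
    "F6": ["R6"],
}


def primer_pairs(forward, reverse):
    """All forward/reverse combinations available in one reaction vessel."""
    return [(f, r) for f in forward for r in reverse]


if __name__ == "__main__":
    for name, (forward, reverse) in PCR1.items():
        print(name, len(primer_pairs(forward, reverse)), "primer combinations")
    for forward, reverse in PCR2.items():
        print(forward, primer_pairs([forward], reverse))
```

Listing the combinations this way only counts which primers share a vessel; which amplicons actually form depends on primer positions and on the state of the template.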
CA3032535A 2015-08-18 2016-08-18 Method to amplify dna sequences from degraded sources Abandoned CA3032535A1 (en)

Applications Claiming Priority (3)

Application Number Priority Date Filing Date Title
US201562206487P 2015-08-18 2015-08-18
US62/206,487 2015-08-18
PCT/CA2016/050970 WO2017027975A1 (en) 2015-08-18 2016-08-18 Method to amplify dna sequences from degraded sources

Publications (1)

Publication Number Publication Date
CA3032535A1 true CA3032535A1 (en) 2017-02-23

Family

ID=58050529

Family Applications (1)

Application Number Title Priority Date Filing Date
CA3032535A Abandoned CA3032535A1 (en) 2015-08-18 2016-08-18 Method to amplify dna sequences from degraded sources

Country Status (2)

Country Link
CA (1) CA3032535A1 (en)
WO (1) WO2017027975A1 (en)

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2018223188A1 (en) * 2017-06-06 2018-12-13 Murdoch Childrens Research Institute Assay
CN108192897A (en) * 2018-02-11 2018-06-22 云南省烟草农业科学研究院 One grows tobacco rbcl genetic fragments and its application
CN109979536B (en) * 2019-03-07 2022-12-23 青岛市疾病预防控制中心(青岛市预防医学研究院) Species identification method based on DNA bar code
CN110331210B (en) * 2019-04-29 2022-08-02 华南农业大学 Mini-Barcoding primer for acquiring DNA bar code of collected beetle specimen and application thereof
CN113140256A (en) * 2020-01-20 2021-07-20 深圳华大智造科技有限公司 Substance DNA tracing method
CN113186338B (en) * 2020-09-14 2022-07-26 中国科学院植物研究所 Universal primer for identifying angiosperm plant species and application thereof

Also Published As

Publication number Publication date
WO2017027975A1 (en) 2017-02-23

Legal Events

Date Code Title Description
FZDE Discontinued

Effective date: 20221108
