CA3032535A1 - Method to amplify dna sequences from degraded sources - Google Patents
- Publication number
- CA3032535A1
- Authority
- CA
- Canada
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Abandoned
Classifications
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12Q—MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
- C12Q1/00—Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
- C12Q1/68—Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
- C12Q1/6869—Methods for sequencing
- C12Q1/6844—Nucleic acid amplification reactions
- C12Q1/6848—Nucleic acid amplification reactions characterised by the means for preventing contamination or increasing the specificity or sensitivity of an amplification reaction
Abstract
A two-stage nested multiplex PCR method is described to amplify DNA sequences from degraded specimens. The method of the invention is for recovery of full-length DNA from specimens that contain degraded DNA. Such specimens may be type specimens, and the target DNA may be a DNA barcode for recognizing known species and for discovering species yet to be named.
Description
Method to Amplify DNA Sequences from Degraded Sources
Field of the Invention
The invention relates to a method to amplify DNA sequences from degraded sources using a combined approach involving NGS (next-generation sequencing). More specifically, the method of the invention is a two-stage multiplex PCR (polymerase chain reaction) and NGS approach for recovering full-length DNA from specimens that contain degraded DNA. Such specimens may be type specimens, and the target DNA may be a DNA barcode for recognizing known species and for discovering species yet to be named. The invention further relates to kits and systems for carrying out such a method.
Background of the Invention
Type specimens have high scientific importance because they provide the only certain connection between the application of a Linnean name and a physical specimen.
Many individuals may have been identified as a particular species, but their linkage to the taxon concept is inferential. Because type specimens are often more than a century old and have experienced conditions unfavorable for DNA preservation, success in sequence recovery has been uncertain.
The immense repositories of identified specimens in the world's natural history museums provide the opportunity to construct a DNA barcode reference library that can subsequently be used to identify newly collected specimens [1,2]. However, the scientific value of this library would be greatly enhanced if each species were represented by sequences from its type material, particularly the holotype. Without such information, there are many cases in which the correct application of taxon names is uncertain. For example, the analysis of type(s) is critical when the study of modern specimens suggests synonymy (e.g. [3]) or when it indicates that a long-known species is actually a complex of two or more morphologically similar taxa (e.g. [4]). The recovery of a barcode sequence from type material is also essential when it represents the only known record(s) for a taxon, a situation that is surprisingly common [5].
Early studies have recovered sequence information from museum specimens, including beetles [6,7], flies [8,9,10], true bugs [11], and moths [12,13]. Some of these investigations analyzed specimens that were relatively young (<50 years), while others extracted DNA from whole specimens. However, Hausmann et al. [12] and Rougerie et al. [14] recovered barcode
sequences from a single leg of type specimens more than 100 years old with a protocol that required six PCRs and twelve sequencing reactions (see details in [15]).
Strutzenberger et al. [16] reduced costs by processing specimens in batches of 95, but the basic protocol was unchanged, requiring substantial template DNA and careful inspection of data to ensure that contamination among wells had not produced chimeric sequences. In addition, the failure of any single reaction led to an incomplete sequence for the barcode region.
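The cost and fragility of that protocol can be made concrete with simple arithmetic. The Python sketch below assumes, hypothetically, that every specimen in a batch receives the six PCRs and twelve sequencing reactions of the earlier protocol, and that each reaction succeeds independently with a fixed probability. The function names and the 0.9 success rate are illustrative assumptions, not values taken from the cited studies.

```python
from __future__ import annotations


def batch_reaction_counts(specimens: int, pcrs_per_specimen: int,
                          sequencing_per_specimen: int) -> tuple[int, int]:
    """Total PCRs and sequencing reactions for a batch (hypothetical helper)."""
    return specimens * pcrs_per_specimen, specimens * sequencing_per_specimen


def probability_complete(p_success: float, n_reactions: int) -> float:
    """Chance that all n reactions succeed, assuming independent failures.

    If any single reaction fails, the barcode is incomplete, so a complete
    record requires every reaction to succeed.
    """
    if not 0.0 <= p_success <= 1.0:
        raise ValueError("p_success must be a probability")
    return p_success ** n_reactions


pcrs, seqs = batch_reaction_counts(95, 6, 12)
print(pcrs, seqs)                                # 570 1140
print(round(probability_complete(0.9, 6), 3))    # 0.531
```

Under these assumptions, even a 90% per-reaction success rate leaves roughly half of the specimens without a complete barcode. This illustrates why protection against the failure of any single reaction matters.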
Prior studies have often encountered difficulty in recovering sequence information from old museum specimens because of DNA degradation [17,18]. While protocols have improved, important constraints remain [12,13,15,16]. Past studies have generally employed several PCRs to generate a set of short amplicons that were Sanger sequenced and assembled into a barcode record. When many amplification reactions are required, as when primers bind poorly, the template can be depleted before the sequence is recovered. There is no easy solution because DNA extracts are small (<50 µL) and concentrations are low (typically <0.5 pg/µL), so dilution is rarely feasible [4,14]. As a consequence, sequence recovery from many type specimens is not currently possible.
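The template constraint follows directly from the figures quoted. An extract of under 50 µL at under 0.5 pg/µL holds at most about 25 pg of DNA. The Python sketch below computes that ceiling and how many reactions the extract can supply. The figure of 5 µL of template per reaction is a hypothetical assumption chosen only for illustration.

```python
def template_budget_pg(extract_volume_ul: float, concentration_pg_per_ul: float) -> float:
    """Upper bound on the total DNA mass (pg) in an extract."""
    return extract_volume_ul * concentration_pg_per_ul


def max_reactions(extract_volume_ul: float, template_ul_per_reaction: float) -> int:
    """Number of whole reactions the extract can supply before it is exhausted."""
    if template_ul_per_reaction <= 0:
        raise ValueError("template volume per reaction must be positive")
    return int(extract_volume_ul // template_ul_per_reaction)


print(template_budget_pg(50, 0.5))   # 25.0
print(max_reactions(50, 5))          # 10 (assumed 5 uL of template per reaction)
```

Each additional reaction draws on the same fixed budget, including one needed because a primer failed to bind.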
Next-generation sequencers (NGS) are increasingly used for studies on both freshly collected and museum specimens [e.g. 19]. Work on fresh specimens has shown that the barcode region can be recovered from hundreds of individuals at a time by using multiplex identifier (MID) tags to associate the sequence records with each specimen [20,21]. However, problems remain with preferential amplification of certain fragments and with inefficient amplification that prevents sequencing of the full target region. These problems are compounded by the challenge of sequencing very small specimens that contain heavily degraded DNA. It is therefore desirable to provide a method that overcomes at least one disadvantage of known sequencing protocols.
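Using MID tags to associate reads with specimens amounts to sorting each read by a short, specimen-specific sequence. The Python sketch below assumes equal-length tags at the 5' end of each read and exact matching. The tag sequences and specimen names are hypothetical, and production pipelines typically also tolerate sequencing errors in the tag.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def demultiplex(reads: Iterable[str], mid_to_specimen: Dict[str, str]) -> Dict[str, List[str]]:
    """Bin reads by an exact-match MID tag at the 5' end and strip the tag."""
    tag_lengths = {len(tag) for tag in mid_to_specimen}
    if len(tag_lengths) != 1:
        raise ValueError("this sketch assumes at least one tag, all of equal length")
    k = tag_lengths.pop()
    bins: Dict[str, List[str]] = defaultdict(list)
    for read in reads:
        specimen = mid_to_specimen.get(read[:k])
        if specimen is None:
            bins["unassigned"].append(read)   # no known tag: keep the whole read
        else:
            bins[specimen].append(read[k:])   # drop the tag, keep the insert
    return dict(bins)


mids = {"ACGT": "specimen_A", "TGCA": "specimen_B"}   # hypothetical tags
print(demultiplex(["ACGTGGCC", "TGCAAATT", "GGGGTTTT"], mids))
# {'specimen_A': ['GGCC'], 'specimen_B': ['AATT'], 'unassigned': ['GGGGTTTT']}
```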
Summary of the Invention
The present invention provides methods, systems and kits that are useful for recovering sequences from degraded DNA present in a sample. When maintained under optimal archival conditions, DNA is highly stable and is predicted to remain viable for several millennia. In the ambient environment, however, or when exposed to particular stressors such as extreme heat, desiccation, irradiation, or known mutagenic compounds, genomic DNA breaks down rapidly and severely. In various applications and settings, this can prohibit genetic analyses when the quantity and quality of the remaining DNA fall below the sensitivity thresholds of current
Strutzenberger et al [16] reduced costs by processing specimens in batches of 95, but the basic protocol was unchanged, requiring substantial template DNA and careful inspection of data to ensure that contamination among wells had not produced chimeric sequences. As well, the failure of any single reaction led to an incomplete sequence for the barcode region.
Prior studies have often encountered difficulty in recovering sequence information from old museum specimens because of DNA degradation [17,18]. While protocols have improved, there are still important constraints [12,13,15,16]. Past studies have generally employed several PCR reactions to generate a set of short amplicons that were Sanger sequenced and assembled into a barcode record. When many amplification reactions are required, as in cases where difficulties in primer binding are encountered, template can be depleted before sequence is recovered. There is no easy solution because DNA extracts are small (<50 [tL) and concentrations are low (typically <0.5 pg/[tL) so dilution is rarely feasible [4,14]. As a consequence, sequence recovery from many type specimens is not currently possible.
Next-generation sequencers (NGS) are increasingly used for studies on both freshly collected and museum specimens [e.g. 191. Work on fresh specimens has shown that the barcode region can be recovered from hundreds of individuals at a time by using multiplex identifier (MID) tags to associate the sequence records from each specimen [20,21]. However, there are still issues with preferential amplification of certain fragments and inefficient amplification that leads to the inability to sequence the full target sequence. Taken together with the challenges of sequencing very small specimen size that contains heavily degraded DNA, it is desirable to provide a method that overcomes at least one disadvantage of known sequencing protocols.
Summary of the Invention
The present invention provides methods, systems and kits that are useful for recovering sequences from degraded DNA present in a sample. When maintained under optimal archival conditions, DNA is highly stable and predicted to remain viable for several millennia. In the ambient environment, however, or when exposed to particular stressors such as extreme heat, desiccation, irradiation, or known mutagenic compounds, genomic DNA breaks down rapidly and severely. For various applications and settings, this can prohibit genetic analyses when the quantity and quality of the remaining DNA fall below the sensitivity thresholds of current analytical equipment and procedures. Specimens held in biomedical or natural history collections degrade rapidly over time, particularly when stored in compounds such as formalin, paraffin, or low-concentration ethanol; forensic cases and environmental samples involving trace quantities (i.e. 'eDNA') can be compromised by ultraviolet exposure or diluted beyond detection; and processed and manufactured animal and timber products may endure extreme temperatures and desiccation, rendering the DNA (and source organism) imperceptible.
The present invention has been made to solve at least one of the foregoing problems of the prior art; therefore, an aspect of the present invention is to provide a method for amplifying and characterizing DNA sequences from small samples containing degraded DNA in an efficient, rapid and economical manner.
In aspects the method comprises a two stage nested multiplex PCR/NGS approach that is effective for amplification of a desired DNA sequence from a small sample of degraded DNA resulting in efficient, unbiased co-amplification of fragments spanning a desired gene region, in aspects a barcode region of a gene.
The present method has the advantage of requiring very little template DNA and of providing protection against the failure of any particular amplification reaction, owing to its novel initial two-step multiplex PCR approach. The method uses relatively few primers; however, the primers are allowed to pair in any combination rather than being restricted to specific pairs, all while avoiding common pitfalls (e.g. overlap amplification, primer incorporation and primer-dimer sequencing). This is accomplished without the use of special enzymes beyond standard polymerases.
In one aspect, the invention relates to target-specific primers and compositions comprising such primers useful for the selective amplification of one or more target sequences associated with a barcode region in degraded DNA.
Primers are selected with respect to the target sequence to be amplified and the condition of the degraded DNA; that is, primers defining amplicons of about 150 bp or more can be utilized to target degraded DNA. However, it is understood by one of skill in the art that primers can be designed to target shorter segments of degraded DNA where there is limited sequence. The method of the invention can be used to detect and amplify highly degraded DNA in specimens where no DNA could be detected by other methods.
The present method may recover full-length barcodes from type specimens with heavily degraded DNA by employing a two-step multiplex PCR to generate short amplicons covering the barcode region and then using NGS for their characterization, i.e. sequencing. In this
manner the entire barcode region of a desired gene from a small specimen containing degraded DNA can be characterized.
The method of the invention is scalable and widely applicable, with broad taxonomic breadth: it encompasses amplification of DNA across a wide variety of diverse animal groups. It has been scaled to work on 96 samples simultaneously with good success rates, and may be scaled further to several hundred samples simultaneously.
The method of the invention can be used with various conditions of DNA degradation (e.g. samples decades to centuries old, formalin-fixed, fluid-preserved, or processed) and still lead to successful DNA amplification and, in aspects, barcode recovery.
Because the method sequences degraded DNA from very limited sources quickly and cost-effectively, it has good potential in a variety of areas, for example for researchers, food safety officials, forensic investigators, wildlife enforcement officers, biomedical technicians and so forth.
The effectiveness of the present method has been validated by recovering sequences from century-old specimens of Lepidoptera, including those for which Sanger analysis completely failed. Importantly, in aspects, this two-stage multiplex PCR/NGS method escapes problems that often confront Sanger analysis, such as uncertain primer binding, amplification bias, and/or the need for large amounts of template DNA.
According to an aspect of the invention there is provided a method comprising a two-step multiplex PCR followed by NGS to sequence degraded DNA.
According to an aspect of the invention the method comprises two stages, one stage involving a two-step multiplex PCR and the other stage comprising NGS to recover/characterize the sequence of a barcode region in the sample comprising degraded DNA. In aspects the barcode region is of the cytochrome c oxidase I gene.
According to another aspect of the invention there is provided a method comprising multiplex nested PCR to form a plurality of amplicons from a degraded DNA source. NGS is then utilized to recover the sequence from the plurality of degenerate amplicons generated by the two-stage multiplex nested PCR, in aspects to characterize a barcode region of a gene.
According to another aspect of the invention there is provided a method comprising multiplex nested PCR, the method comprising performing a two-stage nested PCR on a sample containing degraded DNA. The present invention is based, in part, on the novel use of two stages of specific hybridization between a homologous region in a probe and the complementary sequence in a nucleic acid template of the degraded DNA, each of which is
followed by extension of the probe by DNA synthesis. The second stage utilizes the products of the first stage as a template.
In aspects, the method of the invention substantially reduces the formation of spurious reaction products in multiplex amplification reactions of large numbers of specific degraded nucleic acid sequences.
In aspects the present invention provides novel compositions useful in substantially reducing the formation of spurious reaction products in two-part multiplex amplification reactions of large numbers of specific nucleic acid sequences from degraded DNA.
According to another aspect of the invention there is provided a multiplex PCR assay mixture for amplification of a target degraded DNA, the mixture comprising a combination of a plurality of primer sets wherein a number of the primer sets are nested. In aspects, a number of the primer sets are 10bp and adapter-tailed primers. In further aspects, the primers (forward and reverse) include degeneracy at sites important for primer binding, i.e. the 3' terminus for forward primers and the 5' terminus for reverse primers, such that 12 forward and reverse primers provide a composition comprising 2010 primers.
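The primer count quoted above follows from simple arithmetic: a primer carrying IUPAC ambiguity codes is a mixture of every concrete oligonucleotide it encodes, so its degeneracy is the product of the number of bases allowed at each position. A minimal Python sketch of that calculation follows. The primer strings in the usage example are hypothetical placeholders, not the primers of Table 4, so this does not reproduce the figure above, and the pool total assumes no two primers encode the same oligo.

```python
# Degeneracy (number of concrete bases) of each IUPAC nucleotide code.
IUPAC_DEGENERACY = {
    "A": 1, "C": 1, "G": 1, "T": 1,
    "R": 2, "Y": 2, "S": 2, "W": 2, "K": 2, "M": 2,
    "B": 3, "D": 3, "H": 3, "V": 3,
    "N": 4,
}


def primer_degeneracy(primer: str) -> int:
    """Number of distinct oligonucleotides encoded by one degenerate primer."""
    count = 1
    for base in primer.upper():
        count *= IUPAC_DEGENERACY[base]  # KeyError flags a non-IUPAC character
    return count


def pool_size(primers: list[str]) -> int:
    """Distinct oligos in a primer mix, assuming no two primers share a sequence."""
    return sum(primer_degeneracy(p) for p in primers)


# Hypothetical primers, for illustration only.
print(primer_degeneracy("ACGRYN"))
print(pool_size(["ACGRYN", "TTTTTS"]))
```

For example, `ACGRYN` has two 2-fold sites and one 4-fold site, so it stands for 2 x 2 x 4 = 16 oligonucleotides.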
According to an aspect of the invention there is provided a two-stage method for obtaining a full-length barcode sequence from specimens with degraded DNA, the method comprising two-step multiplex nested PCR utilizing primers that span the entire barcode sequence and that can pair in any combination to generate a plurality of amplicons while avoiding overlap amplification, primer incorporation and/or primer-dimer sequencing; and NGS for sequencing the plurality of amplicons generated by the two-step multiplex nested PCR and providing the barcode sequence.
In aspects the two step multiplex nested PCR utilizes primers that target non-adjacent fragments of the target sequence in each of the steps. Furthermore, the primers in the first step are designed such that undesired elongation is blocked (in one aspect are non-tailed) and are selected further to be paired with the next downstream reverse primers. The primers in the second step are adapter-tailed primers and may further incorporate a MID tag.
According to an aspect of the invention there is provided a method to generate redundant amplicons for a target DNA sequence of degraded DNA, the method comprising:
(a) performing a first multiplex nested PCR using a plurality of primers that hybridize to portions of the target DNA sequence while blocking undesired elongation to form a plurality of amplicons, wherein each forward primer is paired with all downstream reverse primers to produce amplicon redundancy;
(b) using the amplicon products of (a) as a template, performing a second multiplex nested PCR comprising a plurality of adapter-tailed primers, optionally carrying MID tags, that hybridize to the amplicon products of (a);
(c) repeating step (a) and then step (b); and
(d) pooling the products from (c).
In aspects the method then further comprises performing next generation sequencing on the pooled products from (d). The pooled products from (d) are optionally cleaned to remove any undesired genomic DNA, primer dimers and/or residual primers.
In aspects, blocking of undesired elongation in the first step of multiplex PCR can be achieved through various mechanisms, such as the use of non-complementary tails on the PCR1 primers or the use of any type of agent that blocks elongation from the 5' end of the primers, e.g. by chemical conjugation.
According to another aspect of the present invention, there is provided a method for amplifying a barcode region of the cytochrome c oxidase I (COI) gene from a small specimen of degraded DNA using multiplex PCR, the method comprising:
- extracting the degraded DNA to provide a linear template;
- performing a first multiplex nested PCR (PCR1) using a plurality of forward primers and downstream reverse primers that include degeneracy and hybridize to regions of said barcode region, while simultaneously blocking undesired elongation, such that a plurality of amplicons is created;
- performing a second multiplex PCR (PCR2) using the amplicons generated in the PCR1 reaction as a template and adapter-tailed primers that hybridize to portions of said amplicons;
- pooling all amplicon products; and
- performing next generation sequencing on the pooled amplicon products to determine the barcode sequence.
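The final step above, determining the barcode from many overlapping amplicon reads, can be sketched as a per-position majority vote. This sketch assumes each read has already been placed at a start offset on the barcode (for example by alignment to a reference barcode); the function name `consensus` and the majority-vote rule are illustrative assumptions, not the analysis pipeline stated in this document.

```python
from collections import Counter


def consensus(amplicons: list[tuple[int, str]], length: int) -> str:
    """Majority-vote consensus over a barcode of `length` positions.

    `amplicons` holds (start_offset, sequence) pairs; offsets are assumed to be
    known already (e.g. from alignment to a reference barcode). Positions that
    no amplicon covers are reported as 'N'. Ties go to the base seen first.
    """
    columns = [Counter() for _ in range(length)]
    for start, seq in amplicons:
        for i, base in enumerate(seq):
            pos = start + i
            if 0 <= pos < length:  # ignore bases that overhang the barcode
                columns[pos][base] += 1
    return "".join(col.most_common(1)[0][0] if col else "N" for col in columns)


# Hypothetical reads: three overlap at positions 2-5, one disagrees at position 4.
reads = [(0, "ACGT"), (2, "GTTA"), (2, "GTTA"), (2, "GTCA")]
print(consensus(reads, 8))
```

Because the method produces redundant amplicons, most positions are covered several times, so a single erroneous read (here the `C` at position 4) is outvoted rather than copied into the barcode.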
The multiplex PCR described herein is desirably performed under suitable conditions for hybridization.
According to an aspect of the invention, there is provided a method for detection and identification of a barcode region of the COI gene in a small specimen containing degraded DNA to identify the taxonomic classification of said specimen, the method comprising:
- extracting linear degraded DNA from said specimen;
- performing two-step multiplex nested PCR on said linear degraded DNA using primers that hybridize to said barcode region to create a plurality of redundant amplicons spanning the barcode region of the COI gene;
- performing next generation sequencing on said redundant amplicons to provide a sequence of the barcode region of the COI gene; and
- classifying said specimen.
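The classification step is not specified in detail here. One common approach in DNA barcoding is to assign a query to the closest reference barcode by uncorrected p-distance (the proportion of differing sites). A minimal sketch under that assumption follows; the reference labels and sequences are hypothetical, and sequences are assumed to be pre-aligned to equal length.

```python
def p_distance(a: str, b: str) -> float:
    """Uncorrected p-distance between two aligned sequences, skipping 'N' sites."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = differences = 0
    for x, y in zip(a.upper(), b.upper()):
        if x == "N" or y == "N":
            continue  # ambiguous or uncovered position
        compared += 1
        differences += x != y
    if compared == 0:
        raise ValueError("no comparable sites")
    return differences / compared


def nearest_reference(query: str, references: dict[str, str]) -> tuple[str, float]:
    """Label and p-distance of the reference closest to the query."""
    return min(
        ((label, p_distance(query, seq)) for label, seq in references.items()),
        key=lambda item: item[1],
    )


# Hypothetical reference library and query barcode.
refs = {"taxon_A": "ACGTACGT", "taxon_B": "ACGTTTTT"}
print(nearest_reference("ACGTACGN", refs))
```

Skipping `N` sites lets a barcode with a small unresolved stretch still be compared, at the cost of fewer informative positions.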
According to another aspect of the invention, there is provided a kit for performing multiplex nested PCR on a small specimen comprising degraded DNA in order to determine the barcode region of the COI gene and thus classify the specimen, the kit comprising: primers specific for said barcode region of said COI gene, suitable buffers, reaction nucleotides, enzymes, optional stabilizers and instructions for use. In aspects, kits can be designed for any specimen type depending on the target gene of interest for amplification and sequencing.
According to another aspect, there is provided a method for amplifying degraded DNA, the method comprising:
amplifying the degraded DNA in a PCR1 reaction in at least two separate reaction vessels using pairs of nested forward and reverse primers, wherein the two reaction vessels comprise different combinations of the forward and reverse primers to produce a plurality of redundant amplicons; and amplifying the redundant amplicons in a PCR2 reaction using one reaction vessel per forward primer, wherein each forward primer is mixed with a different combination of reverse primers.
In an aspect, the forward and reverse primers in the PCR1 reaction comprise block elongation moieties to block elongation from the 5' end of the primers.
In an aspect, the block elongation moieties comprise non-complementary tails.
In an aspect, the method comprises from about 2 to about 10 forward primers and from about 2 to about 10 reverse primers.
In an aspect, the method comprises 6 forward primers (F1, F2, F3, F4, F5, and F6) and 6 reverse primers (R1, R2, R3, R4, R5, and R6).
In an aspect, for PCR1, F1, F3, and F5 are paired with R1, R2, R3, R4, R5, and R6, and F2, F4, and F6 are paired with R1, R2, R3, R4, and R5.
In an aspect, for PCR2, F1 is paired with R1, R2, and R3; F2 is paired with R2, R3, and R4; F3 is paired with R3, R4, and R5; F4 is paired with R5 and R6; and F6 is paired with R6.
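The six-primer allocation in the two preceding aspects can be written down as data. In a multiplex vessel every forward primer can meet every reverse primer, so the sketch below enumerates all combinations per vessel; whether a given combination actually yields product depends on primer positions and template integrity. The dictionaries simply transcribe the text as given (F5 does not appear in the PCR2 listing), and the helper name `candidate_pairs` is hypothetical.

```python
# Transcription of the PCR1 allocation stated above: two multiplex vessels.
PCR1 = {
    "PCR1a": {"forward": ["F1", "F3", "F5"],
              "reverse": ["R1", "R2", "R3", "R4", "R5", "R6"]},
    "PCR1b": {"forward": ["F2", "F4", "F6"],
              "reverse": ["R1", "R2", "R3", "R4", "R5"]},
}

# Transcription of the PCR2 allocation stated above: one vessel per forward primer.
# F5 is absent because the PCR2 listing does not mention it.
PCR2 = {
    "F1": ["R1", "R2", "R3"],
    "F2": ["R2", "R3", "R4"],
    "F3": ["R3", "R4", "R5"],
    "F4": ["R5", "R6"],
    "F6": ["R6"],
}


def candidate_pairs(forward: list[str], reverse: list[str]) -> list[tuple[str, str]]:
    """Every forward/reverse combination that can meet in one multiplex vessel."""
    return [(f, r) for f in forward for r in reverse]


for vessel, primers in PCR1.items():
    print(vessel, len(candidate_pairs(primers["forward"], primers["reverse"])))
for fwd, revs in PCR2.items():
    print("PCR2", fwd, candidate_pairs([fwd], revs))
```

With this transcription, PCR1a holds 18 candidate pairs, PCR1b holds 15, and the PCR2 vessels hold 12 in total. Splitting adjacent forward primers between PCR1a and PCR1b is one way of keeping neighbouring primers out of the same tube.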
In an aspect, the primers for PCR2 comprise adapter tailed primers for sequencing.
In an aspect, the primers are degenerate.
According to an aspect, there is provided a method for sequencing degraded DNA, the method comprising amplifying redundant amplicons such that each region of the target DNA sequence is covered by multiple amplicons, wherein the generation of specific amplicons is determined automatically by a combination of primer-template matching and the pattern of DNA degradation in the target sequence.
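The idea that degradation itself selects which amplicons form can be illustrated with a small model: a forward/reverse pair yields product only if the reverse site lies downstream of the forward site and the whole span sits within one intact template fragment. The sketch below assumes hypothetical primer coordinates and fragment intervals, ignores primer-template mismatches, and then counts how many amplicons cover each barcode position, which is the redundancy the method relies on.

```python
from typing import Dict, List, Tuple

Interval = Tuple[int, int]  # half-open [start, end) in barcode coordinates
Product = Tuple[str, str, Interval]


def amplifiable(forward: Dict[str, Interval], reverse: Dict[str, Interval],
                fragments: List[Interval]) -> List[Product]:
    """Primer pairs whose whole span lies inside a single intact template fragment."""
    products: List[Product] = []
    for f_name, (f_start, f_end) in forward.items():
        for r_name, (r_start, r_end) in reverse.items():
            if r_start < f_end:
                continue  # reverse site must lie downstream of the forward site
            if any(a <= f_start and r_end <= b for a, b in fragments):
                products.append((f_name, r_name, (f_start, r_end)))
    return products


def coverage(products: List[Product], length: int) -> List[int]:
    """Number of amplicons covering each barcode position."""
    depth = [0] * length
    for _, _, (start, end) in products:
        for pos in range(max(start, 0), min(end, length)):
            depth[pos] += 1
    return depth


# Hypothetical 40-bp target, primer sites and surviving template fragments.
fwd = {"F1": (0, 5), "F2": (15, 20)}
rev = {"R1": (20, 25), "R2": (35, 40)}
frags = [(0, 26), (14, 40)]
products = amplifiable(fwd, rev, frags)  # F1+R2 fails: no fragment spans 0-40
print(products)
print(coverage(products, 40))
```

Even though the long F1+R2 product cannot form, every position is still covered by at least one shorter amplicon, and the central region by three.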
In accordance with an aspect, there is provided a method of amplifying a barcode region of a degraded DNA sample, the method comprising:
performing at least a PCR1a reaction and a PCR1b reaction utilizing a plurality of forward and reverse primers, respectively yielding a PCR1a complement of amplicons and a PCR1b complement of amplicons, wherein the plurality of forward primers comprise primers F1, F2, ..., Fn, in order from upstream to downstream of the target sequence, wherein n is a whole number;
wherein the plurality of reverse primers comprise primers R1, R2, ..., Rm, in order from upstream to downstream of the target sequence, wherein m is a whole number;
wherein the plurality of reverse primers are downstream of F1 and the plurality of forward primers are upstream of Rm;
wherein the PCR1a reaction comprises each odd-numbered forward primer starting with F1 and further comprises all or substantially all of the reverse primers;
and wherein the PCR1b reaction comprises each even-numbered forward primer starting with F2 and further comprises all or substantially all of the reverse primers that are upstream of F2.
In an aspect, the forward and reverse primers comprise block elongation moieties to block elongation from the 5' end of the primers and reduce non-target amplification.
In an aspect, the block elongation moieties comprise non-complementary tails.
In an aspect, the method further comprises performing a plurality of PCR2 reactions, PCR2_1, PCR2_2, ..., PCR2_n, to amplify the PCR1a and PCR1b complements of amplicons, wherein each PCR2 reaction uses a different forward primer and a different set of one or more downstream reverse primers; and wherein the PCR1a complement of amplicons is amplified using odd-numbered forward primers and the PCR1b complement of amplicons is amplified using even-numbered forward primers.
In an aspect, the method further comprises pooling the resulting amplicons.
In an aspect, the primers for PCR2 are adapter-tailed for sequence analysis.
In an aspect, the primers for PCR2 are MID-tagged to associate amplicons with specific specimens, such that multiple specimens can be sequenced simultaneously.
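Associating reads with specimens via MID tags amounts to demultiplexing by tag. A minimal sketch follows, assuming each read begins with its MID tag, exact matching only, and that no tag is a prefix of another; the tag sequences and specimen names in the usage example are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def demultiplex(reads: Iterable[str], mids: Dict[str, str]) -> Dict[str, List[str]]:
    """Bin reads by the MID tag they start with, stripping the tag.

    Exact prefix matching only; tags are assumed not to be prefixes of one
    another. Reads matching no tag are collected under 'unassigned'.
    """
    bins: Dict[str, List[str]] = defaultdict(list)
    for read in reads:
        for specimen, tag in mids.items():
            if read.startswith(tag):
                bins[specimen].append(read[len(tag):])
                break
        else:  # no tag matched this read
            bins["unassigned"].append(read)
    return dict(bins)


# Hypothetical tags and reads.
mids = {"specimen_01": "ACGTAC", "specimen_02": "TGCATG"}
reads = ["ACGTACGGGG", "TGCATGCCCC", "AAAAAAAAAA"]
print(demultiplex(reads, mids))
```

Real pipelines usually tolerate a small number of mismatches in the tag; exact matching keeps the sketch short.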
In an aspect, n is from 2-10, such as 6.
In an aspect, m is from 2-10, such as 6.
In an aspect, the forward and reverse primers are as defined in Table 4.
In an aspect, the template is not depleted through use of the method.
In accordance with an aspect, there is provided a method of amplifying degraded DNA according to the scheme shown in Figures 2a and 2b herein.
In an aspect, the method is for taxonomic classification of unknown specimens.
In an aspect, the primers are degenerate.
In an aspect, the method is for analyzing a plurality of specimens simultaneously.
In an aspect, the method is for amplification of a sample comprising small amounts of degraded DNA, such as at least about 0.1 ng of degraded DNA, such as at least about 0.5 ng, about 1 ng, about 10 ng, about 100 ng, about 500 ng, or from about 2 µg to about 5 µg of degraded DNA.
The practice of the present subject matter may employ, unless otherwise indicated, conventional techniques and descriptions of organic chemistry, molecular biology (including recombinant techniques), cell biology, and biochemistry, which are within the skill of the art.
Such conventional techniques include, but are not limited to, preparation of synthetic polynucleotides, polymerization techniques, chemical and physical analysis of polymer particles, preparation of nucleic acid libraries, nucleic acid sequencing and analysis, and the like. Specific illustrations of suitable techniques can be had by reference to the examples provided herein. Other equivalent conventional procedures can also be used.
Such conventional techniques and descriptions can be found in standard laboratory manuals such as Genome Analysis: A Laboratory Manual Series (Vols. I-IV), PCR Primer: A Laboratory Manual, and Molecular Cloning: A Laboratory Manual (all from Cold Spring Harbor Laboratory Press); Hermanson, Bioconjugate Techniques, Second Edition (Academic Press, 2008); Merkus, Particle Size Measurements (Springer, 2009); Rubinstein and Colby, Polymer Physics (Oxford University Press, 2003); and the like.
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as is commonly understood by one of ordinary skill in the art to which these inventions belong. All patents, patent applications, published applications, and other publications referred
to herein, both supra and infra, are incorporated by reference in their entirety. If a definition and/or description is set forth herein that is contrary to or otherwise inconsistent with any definition set forth in the patents, patent applications, published applications, and other publications that are herein incorporated by reference, the definition and/or description set forth herein prevails over the definition that is incorporated by reference.
As used herein, the terms "comprises," "comprising," "includes," "including," "has," "having" or any other variation thereof, are intended to cover a non-exclusive inclusion. For example, a process, method, article, or apparatus that comprises a list of features is not necessarily limited only to those features but may include other features not expressly listed or inherent to such process, method, article, or apparatus. Further, unless expressly stated to the contrary, "or" refers to an inclusive-or and not to an exclusive-or. For example, a condition A or B is satisfied by any one of the following: A is true (or present) and B is false (or not present), A is false (or not present) and B is true (or present), and both A and B are true (or present).
Brief Description of the Drawings
The above and other aspects, features and other advantages of the present invention will be more clearly understood from the following detailed description taken in conjunction with the accompanying drawings, in which:
Figure 1 schematically depicts the two-stage multiplex nested PCR/NGS methodology of the present invention.
Figure 2 schematically depicts primer positions for the first and second rounds of PCR (a) and all possible final amplicons (b). The initial round of PCR (PCR1) includes two separate reactions (a, above the broken line) that use primers with 10 bp tails and genomic DNA as template (shown in parentheses below the reaction names). The second round of PCR (PCR2) includes six separate reactions (a, below the broken line) that use adapter-tailed primers and the products of the first-round reactions as template (shown in parentheses below the reaction names). PCR2 can generate up to 15 amplicons spanning the entire COI barcode region (b). To assign each amplicon to a particular type specimen, each forward PCR2 primer is tailed with a MID tag unique to that specimen. For increased multiplexing, each reverse PCR2 primer can also be tailed with a MID tag, which allows a large number of possible combinations (e.g., adding 96 unique MID tags to the forward primers and 4 unique MID tags to the reverse primers allows 384 specimens to be multiplexed and individually tracked).
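The combinatorial tagging in Figure 2 works because each specimen is identified by its (forward MID, reverse MID) pair rather than by one tag alone, which gives 96 × 4 = 384 trackable specimens. The minimal Python sketch below illustrates that lookup. It assumes a hypothetical read layout: the forward MID sits at the 5' end and the reverse MID appears as its reverse complement at the 3' end. The toy 4-nt tags stand in for real MID sequences, and the sketch requires exact tag matches without error tolerance.

```python
from itertools import product

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of an uppercase DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def build_plate_map(forward_mids, reverse_mids):
    """Give every (forward MID, reverse MID) combination its own specimen index."""
    return {pair: index for index, pair in enumerate(product(forward_mids, reverse_mids))}


def assign_read(read: str, plate_map, mid_len: int):
    """Return the specimen index for a read, or None if the tag pair is unknown.

    Hypothetical read layout: forward MID at the 5' end, reverse MID present as
    its reverse complement at the 3' end. Tags must match exactly.
    """
    key = (read[:mid_len], revcomp(read[-mid_len:]))
    return plate_map.get(key)


if __name__ == "__main__":
    # Toy 4-nt tags (hypothetical): 96 forward and 4 reverse MIDs.
    forward_mids = ["".join(p) for p in product("ACGT", repeat=4)][:96]
    reverse_mids = ["AACC", "GGTT", "CAGT", "TGCA"]
    plate_map = build_plate_map(forward_mids, reverse_mids)
    print(len(plate_map))  # 384 distinct specimens

    read = forward_mids[5] + "ACGTTGCA" + revcomp(reverse_mids[2])
    print(assign_read(read, plate_map, mid_len=4))  # 5 * 4 + 2 = 22
```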
Figure 3 shows the recovery of sequences from ten type specimens in each of three DNA categories: (a) number of reads; (b) per-base coverage; (c) number of base pairs (bp) recovered via NGS. HQ, high quality; MQ, medium quality; LQ, low quality.
Mean (horizontal black line), standard deviation (edges of box), min and max (whiskers, *) are shown. The horizontal broken line in (c) represents a full-length (658bp) barcode.
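Panels (b) and (c) of Figure 3 summarize how reads pile up along the 658 bp barcode. The Python below is a minimal sketch of how such summaries can be computed, not necessarily the analysis used for Figure 3. It derives per-base coverage from read alignments given as hypothetical half-open (start, end) coordinates on the barcode. It then counts the base pairs covered at or above an assumed minimum depth.

```python
BARCODE_LENGTH = 658  # full-length COI barcode, in bp


def per_base_coverage(alignments, region_length=BARCODE_LENGTH):
    """Depth at each position, from half-open (start, end) read alignments.

    Uses a difference array: +1 where a read starts, -1 where it ends.
    """
    delta = [0] * (region_length + 1)
    for start, end in alignments:
        start, end = max(start, 0), min(end, region_length)
        if start < end:
            delta[start] += 1
            delta[end] -= 1
    coverage, depth = [], 0
    for position in range(region_length):
        depth += delta[position]
        coverage.append(depth)
    return coverage


def bases_recovered(coverage, min_depth=1):
    """Number of positions covered by at least min_depth reads."""
    return sum(1 for depth in coverage if depth >= min_depth)


if __name__ == "__main__":
    # Hypothetical alignments leaving a gap between bp 200 and bp 400.
    cov = per_base_coverage([(0, 200), (150, 200), (400, 658)])
    print(bases_recovered(cov))  # 200 + 258 = 458
```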
Figure 4 shows a neighbor-joining tree demonstrating 100% concordance between sequences generated from type specimens using NGS and Sanger sequencing. For each species, BOLD Process IDs are shown for both the Sanger and the NGS-generated (outlined in red) sequences.
Figure 5 shows a neighbor-joining tree of barcodes generated from century-old type specimens and contemporary congeneric taxa (where available). Barcodes from the 26 century-old specimens (outlined in red) were generated via NGS. Four cases involve confirmed or suspected synonymy: Celerna amplimargo and C. lerne, Aeolochroma caesia and A. saturataria, Sarcinodes subvirgata and S. holzi, and Pingasa furvifrons and P. nobilis.
Figure 6 schematically shows the alignments of sequence records derived from two type specimens of Geometridae, one with high-quality DNA (a) and one with low-quality DNA (b). The alignments show only a single representative of each distinct sequence. In many cases, there were hundreds or thousands of reads with a particular sequence. High-quality reads have high coverage across the entire 658 bp barcode region and originate from a single source, as indicated by a single nucleotide (color) at each position in the contig. Low-quality reads do not span the entire barcode region (i.e., they leave regions without coverage) and often originate from multiple sources, as indicated by multiple nucleotides (colors) at certain positions in the contig.
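The multi-source signal described for Figure 6(b), where more than one nucleotide appears at a contig position, can be flagged programmatically. The minimal Python sketch below uses assumed conventions. Reads are already aligned to equal length, with '-' marking positions a read does not cover. A column is flagged when two or more bases each reach a minimum fraction of the reads covering it. The 10% default is an arbitrary illustrative threshold, not one from this disclosure.

```python
from collections import Counter


def mixed_positions(aligned_reads, min_fraction=0.10):
    """Return 0-based columns where more than one base is well supported.

    aligned_reads: equal-length strings; '-' means the read has no coverage there.
    """
    flagged = []
    for position, column in enumerate(zip(*aligned_reads)):
        counts = Counter(base for base in column if base != "-")
        depth = sum(counts.values())
        if depth == 0:
            continue  # no read covers this column
        supported = [base for base, n in counts.items() if n / depth >= min_fraction]
        if len(supported) > 1:
            flagged.append(position)
    return flagged


if __name__ == "__main__":
    single_source = ["ACGT", "ACGT", "AC--"]
    mixed_source = ["ACGT", "ACGT", "ATGT", "AC--"]
    print(mixed_positions(single_source))                   # []
    print(mixed_positions(mixed_source, min_fraction=0.2))  # [1]
```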
Figure 7 shows that sequence recovery is not negatively affected when NGS throughput is increased by analyzing 95 samples simultaneously. "10-plex" refers to amplifying and sequencing 10 samples in a single process, while "95-plex" refers to amplifying and sequencing 95 samples in a single process. In addition to decreasing processing time, moving from a 10-plex to a 95-plex system cuts costs almost 10-fold. A similar move to a 384-plex system is currently being developed and would cut costs significantly further.
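The near 10-fold saving follows from a simple idealization, which is an assumption introduced here rather than a statement of the disclosure. Suppose the cost of a run is dominated by fixed costs C shared by all N multiplexed samples. Then the per-sample cost c(N) scales as 1/N:

$$
c(N) \approx \frac{C}{N}, \qquad \frac{c(10)}{c(95)} = \frac{95}{10} = 9.5, \qquad \frac{c(95)}{c(384)} = \frac{384}{95} \approx 4.0
$$

Under the same assumption, moving from 95-plex to 384-plex would reduce per-sample cost roughly 4-fold. Any per-sample costs, such as reagents and tagged primers, would make the real saving smaller than this bound.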
Figure 8 shows the effect of designing primers to target a specific taxonomic group. In this example, primers designed to target animal DNA in general are compared with primers for the same region designed to target vertebrate DNA specifically. Both primer sets were used to amplify the same mammalian DNA, and the vertebrate primers performed significantly better than the general primers. By making similar
primer modifications, the NGS method can theoretically be applied to any genetic sequence in any type of organism.
Figure 9 shows PCR success rates for general and vertebrate-specific primers. Both primer sets target the same gene region. To compare the primer sets directly, each set was used to amplify the same DNA from 95 fresh and 95 degraded vertebrate samples. In both cases, the vertebrate-specific primers outperformed the general primers.
Detailed Description of the Invention
The following description of various exemplary embodiments is exemplary and explanatory only and is not to be construed as limiting or restrictive in any way. Other embodiments, features, objects, and advantages of the present teachings will be apparent from the description and accompanying drawings, and from the claims.
As used herein, "amplify", "amplifying" or "amplification reaction" and their derivatives, refer generally to any action or process whereby at least a portion of a nucleic acid molecule (referred to as a template nucleic acid molecule) is replicated or copied into at least one additional nucleic acid molecule. The additional nucleic acid molecule optionally includes sequence that is substantially identical or substantially complementary to at least some portion of the template nucleic acid molecule. The template nucleic acid molecule can be single-stranded or double-stranded and the additional nucleic acid molecule can independently be single-stranded or double-stranded. In some embodiments, amplification includes a template-dependent in vitro enzyme-catalyzed reaction for the production of at least one copy of at least some portion of the nucleic acid molecule or the production of at least one copy of a nucleic acid sequence that is complementary to at least some portion of the nucleic acid molecule.
Amplification optionally includes linear or exponential replication of a nucleic acid molecule.
In some embodiments, such amplification is performed using isothermal conditions; in other embodiments, such amplification can include thermocycling. In some embodiments, the amplification is a multiplex amplification that includes the simultaneous amplification of a plurality of target sequences in a single amplification reaction. At least some of the target sequences can be situated on the same nucleic acid molecule or on different target nucleic acid molecules included in the single amplification reaction. In some embodiments, "amplification" includes amplification of at least some portion of DNA- and RNA-based nucleic acids alone, or in combination. The amplification reaction can include single- or double-stranded nucleic acid substrates and can further include any of the amplification processes known to one of
ordinary skill in the art. In some embodiments, the amplification reaction includes polymerase chain reaction (PCR).
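The idealized textbook models below make the difference between linear and exponential amplification concrete. E is a per-cycle efficiency between 0 and 1. This is an assumption, since real efficiencies vary by cycle and template. In exponential amplification such as PCR, every copy made so far can serve as a template in the next cycle. In linear amplification, only the original templates are copied.

```python
def exponential_copies(initial: float, efficiency: float, cycles: int) -> float:
    """Idealized PCR yield: every existing copy can be copied each cycle, N = N0 * (1 + E)**n."""
    return initial * (1.0 + efficiency) ** cycles


def linear_copies(initial: float, efficiency: float, cycles: int) -> float:
    """Idealized linear yield: only the original templates are copied, N = N0 * (1 + E * n)."""
    return initial * (1.0 + efficiency * cycles)


if __name__ == "__main__":
    # One starting template, perfect efficiency, 10 cycles.
    print(exponential_copies(1, 1.0, 10))  # 1024.0
    print(linear_copies(1, 1.0, 10))       # 11.0
```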
As used herein, "amplification conditions" and its derivatives generally refer to conditions suitable for amplifying one or more nucleic acid sequences. Such amplification can be linear or exponential. In some embodiments, the amplification conditions can include isothermal conditions or alternatively can include thermocycling conditions, or a combination of isothermal and thermocycling conditions. In some embodiments, the conditions suitable for amplifying one or more nucleic acid sequences include polymerase chain reaction (PCR) conditions. Typically, the amplification conditions refer to a reaction mixture that is sufficient to amplify nucleic acids such as one or more target sequences, or to amplify an amplified target sequence ligated to one or more adapters, e.g., an adapter-ligated amplified target sequence.
Generally, the amplification conditions include a catalyst for amplification or for nucleic acid synthesis, for example a polymerase; a primer that possesses some degree of complementarity to the nucleic acid to be amplified; and nucleotides, such as deoxyribonucleotide triphosphates (dNTPs), to promote extension of the primer once hybridized to the nucleic acid. The amplification conditions can require hybridization or annealing of a primer to a nucleic acid, extension of the primer, and a denaturing step in which the extended primer is separated from the nucleic acid sequence undergoing amplification. Typically, but not necessarily, amplification conditions can include thermocycling; in some embodiments, amplification conditions include a plurality of cycles in which the steps of annealing, extending and separating are repeated. Typically, the amplification conditions include cations such as Mg2+ or Mn2+ (e.g., MgCl2) and can also include various modifiers of ionic strength.
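The repeated separate/anneal/extend cycle described above can be represented as a simple data structure. The Python sketch below is illustrative only. The temperatures, hold times and cycle count in the example are generic placeholder values chosen for the demonstration. They are not the conditions used in this disclosure.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    name: str
    temp_c: float
    seconds: int


def pcr_program(cycles, denature, anneal, extend, initial=None, final=None):
    """Expand a cycling profile into the full ordered list of thermocycler steps."""
    steps = [initial] if initial is not None else []
    for _ in range(cycles):
        steps.extend([denature, anneal, extend])  # separate, anneal, extend
    if final is not None:
        steps.append(final)
    return steps


if __name__ == "__main__":
    # Placeholder values for illustration only.
    program = pcr_program(
        cycles=35,
        denature=Step("denature", 94.0, 30),
        anneal=Step("anneal", 50.0, 40),
        extend=Step("extend", 72.0, 60),
        initial=Step("initial denaturation", 94.0, 120),
        final=Step("final extension", 72.0, 300),
    )
    print(len(program))                            # 1 + 35 * 3 + 1 = 107
    print(sum(step.seconds for step in program))  # 4970 s of hold time (ramps excluded)
```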
As used herein, "target sequence" or "target sequence of interest" and its derivatives, refers generally to any single or double-stranded nucleic acid sequence that can be amplified or synthesized according to the disclosure, including any nucleic acid sequence suspected or expected to be present in a sample. In some embodiments, the target sequence is present in double-stranded form and includes at least a portion of the particular nucleotide sequence to be amplified or synthesized, or its complement, prior to the addition of target-specific primers or appended adapters. Target sequences can include the nucleic acids to which primers useful in the amplification or synthesis reaction can hybridize prior to extension by a polymerase. In some embodiments, the term refers to a nucleic acid sequence whose sequence identity, ordering or location of nucleotides is determined by one or more of the methods of the disclosure.
As defined herein, "sample" is used in its broadest sense and includes any specimen, culture and the like that is suspected of including a target. In some embodiments, the sample comprises DNA, RNA, PNA, LNA, chimeric, hybrid, or multiplex forms of nucleic acids. The sample can include any biological, animal, avian, clinical, surgical, agricultural, atmospheric or aquatic-based specimen containing one or more nucleic acids. The term also includes any isolated nucleic acid sample, such as genomic DNA, or a fresh-frozen or formalin-fixed paraffin-embedded nucleic acid specimen.
As used herein, "degraded DNA" is used in its broadest sense to include DNA that is "falling apart," or broken down into smaller pieces. Degraded DNA may result from using very old DNA samples; using DNA extracted from formalin-fixed paraffin-embedded samples; freezing and thawing DNA samples repeatedly; leaving DNA samples at room temperature; or exposing DNA samples to heat or physical shearing.
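A simple random-breakage model illustrates why degradation favors short targets. The model is an assumption introduced here and is not part of the definition. If each phosphodiester bond in a template breaks independently with probability p, a stretch of a base pairs survives intact with probability (1 - p)^(a - 1). The Python sketch compares a full-length 658 bp barcode with shorter amplicon lengths for a hypothetical value of p.

```python
def intact_fraction(length_bp: int, break_prob: float) -> float:
    """Probability that a length_bp stretch contains no break.

    Assumes each of its (length_bp - 1) internal bonds breaks independently
    with probability break_prob (a simplified random-breakage model).
    """
    if length_bp < 1:
        raise ValueError("length_bp must be at least 1")
    return (1.0 - break_prob) ** (length_bp - 1)


if __name__ == "__main__":
    p = 0.005  # hypothetical: roughly one break every 200 bp on average
    for length in (150, 300, 658):
        print(length, round(intact_fraction(length, p), 3))
```

Under this model, the chance that an intact template spans the target falls off exponentially with its length. This is consistent with recovering a barcode from several shorter amplicons rather than one long one.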
As used herein, the term "primer" and its derivatives refer generally to any polynucleotide that can hybridize to a target sequence of interest. In some embodiments, the primer can also serve to prime nucleic acid synthesis. Typically, the primer functions as a substrate onto which nucleotides can be polymerized by a polymerase; in some embodiments, however, the primer can become incorporated into the synthesized nucleic acid strand and provide a site to which another primer can hybridize to prime synthesis of a new strand that is complementary to the synthesized nucleic acid molecule. The primer may comprise any combination of nucleotides or analogs thereof, which may optionally be linked to form a linear polymer of any suitable length. In some embodiments, the primer is a single-stranded oligonucleotide or polynucleotide. (For purposes of this disclosure, the terms "polynucleotide" and "oligonucleotide" are used interchangeably and do not necessarily indicate any difference in length between the two.) In some embodiments, the primer is single-stranded, but it can also be double-stranded. The primer optionally occurs naturally, as in a purified restriction digest, or can be produced synthetically. In some embodiments, the primer acts as a point of initiation for amplification or synthesis when exposed to amplification or synthesis conditions; such amplification or synthesis can occur in a template-dependent fashion and optionally results in formation of a primer extension product that is complementary to at least a portion of the target sequence. Exemplary amplification or synthesis conditions can include contacting the primer with a polynucleotide template (e.g., a template including a target sequence), nucleotides and an inducing agent such as a polymerase at a suitable temperature and pH to induce polymerization of nucleotides onto an end of the target-specific primer. If
double-stranded, the primer can optionally be treated to separate its strands before being used to prepare primer extension products. In some embodiments, the primer is an oligodeoxyribonucleotide or an oligoribonucleotide. In some embodiments, the primer can include one or more nucleotide analogs. The exact length and/or composition, including sequence, of the target-specific primer can influence many properties, including melting temperature (Tm), GC content, formation of secondary structures, repeat nucleotide motifs, length of predicted primer extension products, extent of coverage across a nucleic acid molecule of interest, number of primers present in a single amplification or synthesis reaction, presence of nucleotide analogs or modified nucleotides within the primers, and the like. In some embodiments, a primer can be paired with a compatible primer within an amplification or synthesis reaction to form a primer pair consisting of a forward primer and a reverse primer.
In some embodiments, the forward primer of the primer pair includes a sequence that is substantially complementary to at least a portion of a strand of a nucleic acid molecule, and the reverse primer of the primer pair includes a sequence that is substantially identical to at least a portion of the strand. In some embodiments, the forward primer and the reverse primer are capable of hybridizing to opposite strands of a nucleic acid duplex. Optionally, the forward primer primes synthesis of a first nucleic acid strand, and the reverse primer primes synthesis of a second nucleic acid strand, wherein the first and second strands are substantially complementary to each other, or can hybridize to form a double-stranded nucleic acid molecule. In some embodiments, one end of an amplification or synthesis product is defined by the forward primer and the other end of the amplification or synthesis product is defined by the reverse primer. In some embodiments, where the amplification or synthesis of lengthy primer extension products is required, such as amplifying an exon, coding region, or gene, several primer pairs can be created that span the desired length to enable sufficient amplification of the region. In some embodiments, a primer can include one or more cleavable groups. In some embodiments, primer lengths are in the range of about 10 to about 60 nucleotides, about 12 to about 50 nucleotides, or about 15 to about 40 nucleotides.
Typically, a primer is capable of hybridizing to a corresponding target sequence and undergoing primer extension when exposed to amplification conditions in the presence of dNTPs and a polymerase. In some instances, the particular nucleotide sequence of the primer, or a portion of it, is known at the outset of the amplification reaction or can be determined by one or more of the methods disclosed herein. In some embodiments, the primer includes one or more cleavable groups at one or more locations within the primer.
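The primer-pair relationships above can be sketched in Python: each primer defines one end of the product, and several pairs can tile a long region. The sketch is simplified and rests on stated assumptions:

- Exact string matching stands in for hybridization.
- The template is given as its top strand, written 5' to 3'.
- The `tile` helper and its amplicon length and overlap parameters are hypothetical, not the primer design used for the COI barcode.

In the terms used above, the forward primer is identical to the top strand and therefore complementary to the bottom strand. The reverse primer is identical to a portion of that bottom strand.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of an uppercase DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def amplicon(top_strand: str, forward: str, reverse: str):
    """Product defined by a primer pair on a top-strand template, or None.

    The forward primer matches the top strand directly; the reverse primer
    anneals to the top strand, so its reverse complement appears downstream.
    """
    start = top_strand.find(forward)
    if start == -1:
        return None
    site = top_strand.find(revcomp(reverse), start + len(forward))
    if site == -1:
        return None
    return top_strand[start:site + len(reverse)]


def tile(region_length: int, amplicon_length: int, overlap: int):
    """Half-open (start, end) windows of overlapping amplicons covering a region."""
    if not 0 <= overlap < amplicon_length:
        raise ValueError("overlap must be non-negative and shorter than the amplicon")
    step = amplicon_length - overlap
    windows, start = [], 0
    while True:
        end = min(start + amplicon_length, region_length)
        windows.append((start, end))
        if end == region_length:
            return windows
        start += step


if __name__ == "__main__":
    template = "AAAA" + "GATTACA" + "CCCCCCCC" + "TGCATGC" + "GGGG"
    print(amplicon(template, "GATTACA", revcomp("TGCATGC")))
    print(tile(658, 200, 50))
```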
As used herein, "target-specific primer" and its derivatives refer generally to a single-stranded or double-stranded polynucleotide, typically an oligonucleotide, that includes at least one sequence that is at least 50% complementary, typically at least 75% complementary or at least 85% complementary, more typically at least 90% complementary, more typically at least 95% complementary, more typically at least 98% or at least 99% complementary, or identical, to at least a portion of a nucleic acid molecule that includes a target sequence. In such instances, the target-specific primer and target sequence are described as "corresponding" to each other.
In some embodiments, the target-specific primer is capable of hybridizing to at least a portion of its corresponding target sequence (or to a complement of the target sequence); such hybridization can optionally be performed under standard hybridization conditions or under stringent hybridization conditions. In some embodiments, the target-specific primer is not capable of hybridizing to the target sequence, or to its complement, but is capable of hybridizing to a portion of a nucleic acid strand including the target sequence, or to its complement. In some embodiments, the target-specific primer includes at least one sequence that is at least 75% complementary, typically at least 85% complementary, more typically at least 90% complementary, more typically at least 95% complementary, more typically at least 98% complementary, or more typically at least 99% complementary, to at least a portion of the target sequence itself; in other embodiments, the target-specific primer includes at least one sequence that is at least 75% complementary, typically at least 85% complementary, more typically at least 90% complementary, more typically at least 95% complementary, more typically at least 98% complementary, or more typically at least 99% complementary, to at least a portion of the nucleic acid molecule other than the target sequence.
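The percent-complementarity thresholds above (75%, 85%, 90% and so on) can be illustrated with a simple position-by-position count. This sketch assumes an ungapped, equal-length alignment in which the primer pairs antiparallel with the target portion; the function names are hypothetical, and practical primer-design tools typically also weigh mismatch position and duplex stability rather than relying on a raw percentage.

```python
# Watson-Crick partners for unambiguous bases.
_PAIR = {"A": "T", "T": "A", "C": "G", "G": "C"}


def percent_complementary(primer: str, target: str) -> float:
    """Percent of positions where `primer` pairs with `target` (both written 5'->3').

    The two sequences are aligned antiparallel without gaps, so the primer's
    first base is compared with the target's last base, and so on.
    """
    if len(primer) != len(target) or not primer:
        raise ValueError("primer and target portion must be non-empty and equal length")
    matches = sum(_PAIR.get(p) == t for p, t in zip(primer, reversed(target)))
    return 100.0 * matches / len(primer)


def meets_threshold(primer: str, target: str, minimum: float = 75.0) -> bool:
    """True if the primer is at least `minimum` percent complementary to the target."""
    return percent_complementary(primer, target) >= minimum


print(percent_complementary("AAAA", "TTTG"))  # 75.0
```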
In some embodiments, the target-specific primer is substantially non-complementary to other target sequences present in the sample; optionally, the target-specific primer is substantially non-complementary to other nucleic acid molecules present in the sample. In some embodiments, nucleic acid molecules present in the sample that do not include or correspond to a target sequence (or to a complement of the target sequence) are referred to as "non-specific" sequences or "non-specific nucleic acids". In some embodiments, the target-specific primer is designed to include a nucleotide sequence that is substantially complementary to at least a portion of its corresponding target sequence. In some embodiments, a target-specific primer is at least 95% complementary, or at least 99% complementary, or identical, across its entire length to at least a portion of a nucleic acid molecule that includes its corresponding target sequence. In some embodiments, a target-specific primer can be at least 90%, at least 95%
complementary, at least 98% complementary or at least 99% complementary, or identical, across its entire length to at least a portion of its corresponding target sequence. In some embodiments, a forward target-specific primer and a reverse target-specific primer define a target-specific primer pair that can be used to amplify the target sequence via template-dependent primer extension. Typically, each primer of a target-specific primer pair includes at least one sequence that is substantially complementary to at least a portion of a nucleic acid molecule including a corresponding target sequence but that is less than 50%
complementary to at least one other target sequence in the sample. In some embodiments, amplification can be performed using multiple target-specific primer pairs in a single amplification reaction, wherein each primer pair includes a forward target-specific primer and a reverse target-specific primer, each including at least one sequence that is substantially complementary or substantially identical to a corresponding target sequence in the sample, and each primer pair having a different corresponding target sequence. In some embodiments, the target-specific primer can be substantially non-complementary at its 3' end or its 5' end to any other target-specific primer present in an amplification reaction. In some embodiments, the target-specific primer can exhibit minimal cross-hybridization to other target-specific primers in the amplification reaction. In some embodiments, target-specific primers exhibit minimal cross-hybridization to non-specific sequences in the amplification reaction mixture. In some embodiments, the target-specific primers exhibit minimal self-complementarity. In some embodiments, the target-specific primers can include one or more cleavable groups located at the 3' end. In some embodiments, the target-specific primers can include one or more cleavable groups located near or about a central nucleotide of the target-specific primer. In some embodiments, one or more target-specific primers include only non-cleavable nucleotides at the 5' end of the target-specific primer. In some embodiments, a target-specific primer has minimal nucleotide sequence overlap at the 3' end or the 5' end of the primer as compared to one or more different target-specific primers, optionally in the same amplification reaction. In some embodiments, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 or more target-specific primers in a single reaction mixture include one or more of the above embodiments.
In some embodiments, substantially all of the plurality of target-specific primers in a single reaction mixture include one or more of the above embodiments.
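One common way to screen a multiplex primer set for the 3'-end complementarity and self-complementarity discussed above is to ask whether the last few bases at a primer's 3' end could pair, without gaps, with any stretch of another primer (or of itself). The sketch below is a deliberately crude, hypothetical screen: the window size `k`, the exact-match rule and the primer sequences are assumptions, and it ignores partial matches and duplex stability.

```python
from itertools import product
from typing import Dict, List, Tuple

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase A/C/G/T sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def three_prime_pairs_with(primer_a: str, primer_b: str, k: int = 4) -> bool:
    """True if the 3'-terminal k bases of primer_a can pair antiparallel with primer_b."""
    return reverse_complement(primer_a[-k:]) in primer_b


def flag_dimers(primers: Dict[str, str], k: int = 4) -> List[Tuple[str, str]]:
    """List ordered (a, b) name pairs, self-pairs included, where a's 3' end pairs with b."""
    return [
        (name_a, name_b)
        for (name_a, seq_a), (name_b, seq_b) in product(primers.items(), repeat=2)
        if three_prime_pairs_with(seq_a, seq_b, k)
    ]


# Hypothetical primer set.
primers = {"p1": "AAAAACGTA", "p2": "GGGTACGGG", "p3": "CCCCCCCC"}
print(flag_dimers(primers))  # [('p1', 'p2')]
```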
As used herein, "polymerase" and its derivatives generally refer to any enzyme that can catalyze the polymerization of nucleotides (including analogs thereof) into a nucleic acid strand. Typically but not necessarily, such nucleotide polymerization can occur in a template-dependent fashion. Such polymerases can include without limitation naturally occurring polymerases and any subunits and truncations thereof, mutant polymerases, variant polymerases, recombinant, fusion or otherwise engineered polymerases, chemically modified polymerases, synthetic molecules or assemblies, and any analogs, derivatives or fragments thereof that retain the ability to catalyze such polymerization. Optionally, the polymerase can be a mutant polymerase comprising one or more mutations involving the replacement of one or more amino acids with other amino acids, the insertion or deletion of one or more amino acids from the polymerase, or the linkage of parts of two or more polymerases.
Typically, the polymerase comprises one or more active sites at which nucleotide binding and/or catalysis of nucleotide polymerization can occur. Some exemplary polymerases include without limitation DNA polymerases and RNA polymerases. The term "polymerase" and its variants, as used herein, also refers to fusion proteins comprising at least two portions linked to each other, where the first portion comprises a peptide that can catalyze the polymerization of nucleotides into a nucleic acid strand and is linked to a second portion that comprises a second polypeptide.
In some embodiments, the second polypeptide can include a reporter enzyme or a processivity-enhancing domain. Optionally, the polymerase can possess 5' exonuclease activity or terminal transferase activity. In some embodiments, the polymerase can optionally be reactivated, for example through the use of heat, chemicals or re-addition of new amounts of polymerase into a reaction mixture. In some embodiments, the polymerase can include a hot-start polymerase or an aptamer-based polymerase that optionally can be reactivated.
As used herein, the term "nucleotide" and its variants comprise any compound, including without limitation any naturally occurring nucleotide or analog thereof, which can bind selectively to, or can be polymerized by, a polymerase. Typically, but not necessarily, selective binding of the nucleotide to the polymerase is followed by polymerization of the nucleotide into a nucleic acid strand by the polymerase; occasionally, however, the nucleotide may dissociate from the polymerase without becoming incorporated into the nucleic acid strand, an event referred to herein as a "non-productive" event. Such nucleotides include not only naturally occurring nucleotides but also any analogs, regardless of their structure, that can bind selectively to, or can be polymerized by, a polymerase.
The term "extension" and its variants, as used herein, when used in reference to a given primer, comprises any in vivo or in vitro enzymatic activity characteristic of a given polymerase that relates to polymerization of one or more nucleotides onto an end of an existing nucleic acid molecule. Typically but not necessarily, such primer extension occurs in a template-dependent fashion; during template-dependent extension, the order and selection of bases is driven by established base-pairing rules, which can include Watson-Crick type base-pairing rules or, alternatively (and especially in the case of extension reactions involving nucleotide analogs), some other type of base-pairing paradigm. In one non-limiting example, extension occurs via polymerization of nucleotides on the 3'-OH end of the nucleic acid molecule by the polymerase.
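Template-dependent extension from the 3'-OH end, as defined above, can be mimicked at the string level. This is only a sketch under simplifying assumptions: both sequences are written 5'->3', the primer must be perfectly complementary to its binding site, only Watson-Crick pairing of A/C/G/T is modelled, and extension simply runs to the 5' end of the template. The function name and sequences are hypothetical.

```python
from typing import Optional

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase A/C/G/T sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def extend_primer(primer: str, template: str) -> Optional[str]:
    """Extend `primer` along `template` (both 5'->3') and return the new strand.

    The primer anneals antiparallel at the first site on the template equal to its
    reverse complement. Its 3' end faces the template's 5' end, so each added
    nucleotide pairs with the next template base toward that 5' end.
    """
    site = template.find(reverse_complement(primer))
    if site == -1:
        return None  # no perfectly complementary binding site
    added = reverse_complement(template[:site])  # bases copied during extension
    return primer + added


print(extend_primer("AACG", "GATTACACGTT"))  # AACGTGTAATC
```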
As used herein, "multiplex identifier tag (MID)" or "DNA tagging sequence" and its derivatives, refers generally to a unique short (6-14 nucleotide) nucleic acid sequence within an adapter that can act as a "key" to distinguish or separate a plurality of amplified target sequences in a sample. For the purposes of this disclosure, a DNA barcode or DNA tagging sequence can be incorporated into the nucleotide sequence of an adapter.
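Because the amplicons from each specimen carry a different MID, sequence reads can be sorted back to their specimen by the tag. The sketch below assumes, hypothetically, that adapter sequence has already been trimmed so each read begins with its MID, that MIDs must match exactly, and that no MID is a prefix of another; the tag sequences and specimen names are invented for illustration.

```python
from typing import Dict, List


def demultiplex(reads: List[str], mids: Dict[str, str]) -> Dict[str, List[str]]:
    """Bin reads by an exact MID match at the read start; the MID is then stripped."""
    bins: Dict[str, List[str]] = {name: [] for name in mids}
    bins["unassigned"] = []
    for read in reads:
        for name, tag in mids.items():
            if read.startswith(tag):
                bins[name].append(read[len(tag):])
                break
        else:  # no MID matched this read
            bins["unassigned"].append(read)
    return bins


# Hypothetical 8-nt MIDs (within the 6-14 nt range given in the definition).
mids = {"specimen_1": "ACACGTAC", "specimen_2": "TGTGCATG"}
reads = ["ACACGTACGGATTC", "TGTGCATGCCAT", "GGGGGGGG"]
print(demultiplex(reads, mids))
```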
As used herein, a "barcode" is a short DNA sequence from a standardized location in the genome that is used to identify species.
As defined herein "multiplex amplification" refers to selective and non-random amplification of two or more target sequences within a sample using at least one target-specific primer. In some embodiments, multiplex amplification is performed such that some or all of the target sequences are amplified within a single reaction vessel. The "plexy" or "plex" of a given multiplex amplification refers generally to the number of different target-specific sequences that are amplified during that single multiplex amplification. In some embodiments, the plexy can be about 12-plex, 24-plex, 48-plex, 96-plex, 192-plex, 384-plex, 768-plex, 1536-plex, 3072-plex, 6144-plex or higher.
As used herein, "nested PCR" means that two pairs of PCR primers are used for a single locus. The first pair amplifies the locus as in a conventional PCR experiment. The second pair of primers (nested primers) binds within the first PCR product and produces a second PCR product that is shorter than the first one.
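The nested-PCR definition can be sketched as two rounds of exact-match in silico PCR, where the second (nested) primer pair is searched for only inside the first product. In this hypothetical sketch, each primer is written 5'->3', the forward primer appears verbatim on the given strand and the reverse primer's reverse complement appears downstream of it; mismatches and degenerate bases are not modelled, and all sequences are invented.

```python
from typing import Optional, Tuple

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase A/C/G/T sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def amplify(template: str, forward: str, reverse: str) -> Optional[str]:
    """Exact-match in silico PCR on the strand given as `template` (5'->3')."""
    start = template.find(forward)
    if start == -1:
        return None
    site = reverse_complement(reverse)  # the reverse primer anneals to this strand here
    end = template.find(site, start + len(forward))
    if end == -1:
        return None
    return template[start:end + len(site)]


def nested_pcr(template: str, outer: Tuple[str, str], inner: Tuple[str, str]):
    """Round 1 with the outer pair; round 2 with the inner pair on the round-1 product."""
    first = amplify(template, *outer)
    second = amplify(first, *inner) if first is not None else None
    return first, second


# Hypothetical locus: outer F, spacer, inner F, spacer, inner R site, spacer, outer R site.
template = "TT" + "AAACCC" + "GG" + "ACGTAC" + "TATA" + "TGTGTG" + "GG" + "CTCTCT" + "TT"
first, second = nested_pcr(template, outer=("AAACCC", "AGAGAG"), inner=("ACGTAC", "CACACA"))
print(first)   # AAACCCGGACGTACTATATGTGTGGGCTCTCT
print(second)  # ACGTACTATATGTGTG
```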
As used herein, "Next Generation Sequencing (NGS)" refers to various types of massively parallel sequencing techniques. NGS extends the process of sequencing by sequencing millions of fragments in parallel. An NGS workflow typically comprises library preparation, cluster generation, sequencing and data analysis. Several different types of NGS platforms are commercially available.
DNA barcoding is a new system of species identification and discovery using a short section of DNA from a standardized region of the genome [1]. This DNA sequence is then used to identify different species in a manner analogous to a supermarket scanner using black stripes
of the UPC barcode to identify purchases. It would be very beneficial to be able to barcode any type of sample from any source, no matter how old or how it has been treated.
In particular, it is beneficial to be able to barcode specimens in which the DNA may be degraded to varying extents.
The method of the invention (schematically shown in Figure 1) incorporates a novel two-stage multiplex nested PCR approach to first amplify very small amounts of degraded DNA, producing a plurality of amplicons that cover the entire region of the gene or sequence of interest. The amplicons are then subjected to sequence characterization by NGS methods. The method of the invention uses NGS to characterize and recover the sequences of the pool of amplicons produced by the multiplex PCR from specimens with varying DNA quality (i.e. different levels of degradation). Together, these two stages provide a novel method of amplification and characterization of DNA sequences from degraded sources.
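Whether a set of amplicons covers the entire region of interest can be checked by merging their coordinates and reporting any gaps. The sketch below uses hypothetical half-open (start, end) amplicon coordinates on a 658-bp region, the length of the COI Folmer region discussed in this document; the coordinates are invented and are not the invention's actual primer set.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # half-open (start, end) coordinates on the target region


def coverage_gaps(region_length: int, amplicons: List[Interval]) -> List[Interval]:
    """Return the stretches of [0, region_length) not covered by any amplicon."""
    gaps: List[Interval] = []
    covered_to = 0  # every position before this one is already covered
    for start, end in sorted(amplicons):
        if start > covered_to:
            gaps.append((covered_to, min(start, region_length)))
        covered_to = max(covered_to, end)
        if covered_to >= region_length:
            break
    if covered_to < region_length:
        gaps.append((covered_to, region_length))
    return gaps


# Hypothetical overlapping amplicons tiled across a 658-bp barcode region.
tiles = [(0, 200), (150, 350), (300, 500), (450, 658)]
print(coverage_gaps(658, tiles))                   # []
print(coverage_gaps(658, [(0, 200), (250, 658)]))  # [(200, 250)]
```

An empty result means the amplicons jointly span the whole region; any reported gap marks a stretch for which no amplicon would supply sequence.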
In one aspect, the present method can be used to sequence essentially the entire barcode of a specimen that may have varying degrees of DNA degradation, including specimens with almost no intact DNA. The present method can be used with specimen samples as small as 2 µg, or larger, containing various degrees of degraded DNA. This provides utility not only for identifying and confirming species but also for a variety of other applications, including biomedicine, forensics and environmental DNA (eDNA) monitoring, where assembly of longer sequences from trace amounts of fragmented DNA is necessary. The present method can also be applied to foods into which artificial sequences may have been inserted. In general, the method provides for recovery of barcode sequences (or any desired sequence) and may promote development of portable devices for DNA barcoding.
A mitochondrial gene barcode is used to enable the identification of most animal species. For plants, mitochondrial genes do not differ sufficiently to distinguish among closely related species. The gene region used for almost all animal groups is a 658-base-pair region in the mitochondrial cytochrome c oxidase 1 gene ("COI") (the Folmer region). It is highly effective in identifying a range of animal groups, including birds, butterflies, fish and flies. The COI barcode is not effective for identifying plants because it evolves too slowly; instead, two gene regions in the chloroplast, matK and rbcL, are approved as the barcode regions for land plants. For fungi, the internal transcribed spacer (ITS) region may be used. Other barcode regions are being identified, and it will be understood that the methods described herein are applicable to any barcode region, whether currently known or identified in the future.
The method of the invention has been demonstrated herein to recover the barcode region for COI from small amounts of template DNA, initially from a small number of Lepidoptera and subsequently from samples spanning several major insect orders, as well as arachnids, marine invertebrates, and land- and aquatic-based vertebrates.
However, it would be understood by one of skill in the art that the present method is broadly applicable and, in aspects, can also be scaled for plants or other organisms, as well as for other gene regions. The method can be provided as a system in separate kits for invertebrates, mammals, fish and birds, as non-limiting examples.
The present two-stage multiplex PCR/NGS approach allows all fragments to be amplified in a single multiplex PCR, and the multiplex nature of the PCR reactions allows for high primer redundancy. As a result, each DNA extract processed with the present approach is exposed to amplification by approximately 2010 primers versus the approximately 20 primers used in an analogous Sanger analysis. The multiplex PCR is performed such that amplicons initially generated using a plurality of primers act as a template for a subsequent round of multiplex PCR with different primer characteristics.
This is also in contrast to the traditional Sanger approach, which utilizes separate PCR reactions for each fragment.
In the method, a first step of multiplex PCR (PCR1) is performed using nested degenerate primers designed to hybridize to the extracted target DNA template.
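A degenerate primer is written with IUPAC ambiguity codes and stands for a pool of distinct oligonucleotide sequences, which is one reason a degenerate primer set can be far more diverse than its count of named primers. The sketch below computes the degeneracy of a primer and enumerates the concrete sequences it represents, using the standard IUPAC nucleotide codes; the example primers are hypothetical, and the sketch is not meant to reproduce the primer counts reported in this document.

```python
from itertools import product
from typing import List

# Standard IUPAC nucleotide ambiguity codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def degeneracy(primer: str) -> int:
    """Number of distinct A/C/G/T sequences a degenerate primer represents."""
    count = 1
    for code in primer.upper():
        count *= len(IUPAC[code])
    return count


def expand(primer: str) -> List[str]:
    """Enumerate every concrete sequence encoded by a degenerate primer."""
    return ["".join(bases) for bases in product(*(IUPAC[c] for c in primer.upper()))]


print(degeneracy("ACGRYN"))  # 16
print(expand("AR"))          # ['AA', 'AG']
```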
To avoid preferential amplification of certain fragments and amplification bias, a second round of multiplex PCR (PCR2) is performed, targeting non-adjacent fragments of the DNA template and using the first-round (PCR1) products (amplicons) as template. In effect, the same reaction is repeated, this time in a nested format (nested PCR).
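The text does not spell out how non-adjacent fragments are grouped. One simple assumption, common in tiled-amplicon designs, is to deal consecutive, overlapping fragments round-robin into separate pools so that no pool contains two neighbouring (and hence overlapping) fragments. The sketch below shows that hypothetical grouping with invented fragment coordinates.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # half-open (start, end) coordinates


def split_non_adjacent(fragments: List[Interval], n_pools: int = 2) -> List[List[Interval]]:
    """Deal fragments (sorted by start) round-robin so neighbours land in different pools."""
    pools: List[List[Interval]] = [[] for _ in range(n_pools)]
    for i, fragment in enumerate(sorted(fragments)):
        pools[i % n_pools].append(fragment)
    return pools


def pool_has_overlap(pool: List[Interval]) -> bool:
    """True if any two intervals in the pool overlap (checking sorted neighbours suffices)."""
    ordered = sorted(pool)
    return any(prev_end > next_start
               for (_, prev_end), (next_start, _) in zip(ordered, ordered[1:]))


fragments = [(0, 200), (150, 350), (300, 500), (450, 658)]
pools = split_non_adjacent(fragments)
print(pools)                                  # [[(0, 200), (300, 500)], [(150, 350), (450, 658)]]
print([pool_has_overlap(p) for p in pools])   # [False, False]
```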
In the first stage (PCR1), 10-bp-tailed primers are used, while in PCR2 the primers are tailed with adapter sequences and with a multiplex identifier tag (MID).
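The two tailing schemes can be sketched as simple sequence concatenation. In this hypothetical sketch, a PCR1 primer is a 10-bp tail joined to the 5' end of a target-specific sequence, and a PCR2 primer is assumed to be laid out 5'-adapter-MID-target-specific-3', a common fusion-primer layout that is not specified here. All sequences are placeholders; the 6-14 nt MID length check follows the MID definition given earlier in this document.

```python
_BASES = set("ACGT")


def _check_dna(name: str, seq: str) -> None:
    """Reject empty sequences or characters other than A/C/G/T."""
    if not seq or set(seq) - _BASES:
        raise ValueError(f"{name} must be a non-empty A/C/G/T sequence")


def pcr1_primer(tail: str, target_specific: str) -> str:
    """5'-[10-bp tail][target-specific sequence]-3'."""
    _check_dna("tail", tail)
    _check_dna("target_specific", target_specific)
    if len(tail) != 10:
        raise ValueError("PCR1 tail must be 10 bp")
    return tail + target_specific


def pcr2_primer(adapter: str, mid: str, target_specific: str) -> str:
    """5'-[adapter][MID][target-specific sequence]-3' (assumed layout)."""
    for name, seq in (("adapter", adapter), ("mid", mid), ("target_specific", target_specific)):
        _check_dna(name, seq)
    if not 6 <= len(mid) <= 14:
        raise ValueError("MID must be 6-14 nt")
    return adapter + mid + target_specific


print(pcr1_primer("ACGTACGTAC", "GGATCC"))    # ACGTACGTACGGATCC
print(pcr2_primer("AAAA", "CCCCCC", "GGGG"))  # AAAACCCCCCGGGG
```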
To produce more template options for the second multiplex PCR (PCR2) without increasing bias, each of the two first-step PCR1 reactions contains selected forward primers for all downstream reverse primers, allowing the same region of DNA to be covered by multiple amplicons and thus producing more redundancy. Each primer pair in the multiplex second-stage PCR2 is therefore provided with multiple template amplicon options, of which only one needs to work to obtain full coverage of the target sequence. To further neutralize amplification bias, reactions are split into six reactions, one for each forward primer, which is paired with the next 1-3 downstream reverse primers. Taken together, this cumulatively prevents overlap amplification,
- 22 -reduces amplification bias and results in redundant amplification so that if one particular primer set is not effective or fails then another is likely to cover for it.
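The reaction layout can be made concrete with a small Python model. It takes the primer groupings from the protocol given later in this description (PCR1a: F1, F3, F5 + R1-R6; PCR1b: F2, F4, F6 + R2-R6; PCR2: each forward primer with the next one to three downstream reverse primers). The positional rule used to decide which PCR1 amplicon can template a PCR2 pair (amplicon Fa-Rb spans pair Fi-Rj when a <= i and b >= j) is a simplifying assumption of this sketch, not a statement from the source, and primer tails are ignored.

```python
N_PRIMERS = 6  # six forward (F1-F6) and six reverse (R1-R6) primers

# PCR1 reactions: (forward primer indices, reverse primer indices).
PCR1 = {
    "PCR1a": ([1, 3, 5], list(range(1, N_PRIMERS + 1))),  # F1, F3, F5 + R1-R6
    "PCR1b": ([2, 4, 6], list(range(2, N_PRIMERS + 1))),  # F2, F4, F6 + R2-R6
}


def pcr2_reactions() -> dict:
    """One PCR2 reaction per forward primer, paired with up to three downstream reverse primers."""
    reactions = {}
    for f in range(1, N_PRIMERS + 1):
        reverse = list(range(f, min(f + 2, N_PRIMERS) + 1))
        template = "PCR1a" if f % 2 == 1 else "PCR1b"  # odd F uses PCR1a, even F uses PCR1b
        reactions[f"F{f}"] = (template, reverse)
    return reactions


def template_options(f: int, r: int, source: str) -> list:
    """PCR1 amplicons (Fa, Rb) in `source` assumed to span both sites of the PCR2 pair Ff-Rr.

    Simplifying assumption: primer sites are ordered along the barcode, so Fa-Rb is formed
    when b >= a, and it contains the Ff and Rr sites when a <= f and b >= r.
    """
    forward, reverse = PCR1[source]
    return [(a, b) for a in forward for b in reverse if b >= a and a <= f and b >= r]


for name, (source, reverse) in pcr2_reactions().items():
    for r in reverse:
        n = len(template_options(int(name[1:]), r, source))
        print(f"{name}+R{r}: {n} candidate template amplicon(s) in {source}")
```

Under these assumptions the six PCR2 reactions contain 15 primer pairs, and every pair has at least two candidate PCR1 templates, which is the redundancy the paragraph above relies on.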
To further prevent primers from being incorporated into the middle of sequence reads, certain reverse primers are omitted from PCR1 so that overlap amplicons cannot form. However, this reduces the number of amplicons available as templates for PCR2, which leads to a loss of amplification redundancy. Unwanted elongation by the polymerase is therefore also blocked in PCR1. This was achieved by using non-complementary tails on the PCR1 primers; however, any agent that blocks elongation from the 5' end of the primers (e.g., chemical conjugation) can be used.
For NGS, the primers used in PCR2 are tailed with adapter sequences and with multiplex identifier (MID) tags to distinguish sequence reads from each specimen. The superior performance of NGS in sequence recovery is likely due, in part, to the multiplex design of the PCR reactions, which allowed high primer redundancy. As a result, each DNA extract processed with the current protocol was exposed to amplification by 2010 primers, versus the approximately 20 primers used in an analogous Sanger analysis. The high diversity of primers undoubtedly increased the chance of achieving the primer-template homology necessary for successful amplification. The higher success of the NGS protocol (compared to Sanger sequencing) is likely also a consequence of the greater sensitivity of these sequencing platforms. This difference was evidenced by the fact that, in the initial experiment, 16 of the 20 specimens that failed to generate a 164 bp sequence via Sanger analysis generated sequence reads for the same region with NGS. Subsequent experiments comparing Sanger sequencing to the NGS method showed a 5- to 20-fold increase in the number of recovered barcode sequences using the NGS method (Table 1). Furthermore, while increased sample age has a strong negative effect on barcode recovery via Sanger analysis, the NGS method recovers long barcodes regardless of age (Figure 6). The results show that it was possible to recover a full-length COI barcode with NGS from specimens that failed with Sanger analysis.
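Tailing a PCR2 primer amounts to concatenating sequences 5' to 3'. The Python sketch below assembles a tailed primer from the forward adapter and key sequence listed in Table 4 (CCATCTCATCCCTGCGTGTCTCCGA followed by CTCAG) and a template-specific primer. Table 4 names the MID used for each primer but does not list the MID sequence itself, so the `mid` argument and its placement between the key and the template-specific primer are assumptions; `tailed_primer` is a hypothetical helper name.

```python
ADAPTER_A = "CCATCTCATCCCTGCGTGTCTCCGA"   # forward adapter as listed in Table 4
KEY = "CTCAG"                             # sequence following the adapter in Table 4
ADAPTER_TRP1 = "CCTCTCTATGGGCAGTCGGTGAT"  # reverse (trP1) adapter as listed in Table 4


def tailed_primer(template_specific: str, adapter: str, key: str = "", mid: str = "") -> str:
    """Build a 5'->3' tailed primer: adapter, optional key, optional MID (assumed position), primer."""
    return adapter + key + mid + template_specific


lep_f1 = "ATTCAACCAATCATAAAGATATTGG"  # LepF1 template-specific sequence (Table 4)
print(tailed_primer(lep_f1, ADAPTER_A, KEY))
# CCATCTCATCCCTGCGTGTCTCCGACTCAGATTCAACCAATCATAAAGATATTGG
```

With an empty `mid`, the output reproduces the LepF1-ion1 sequence as it appears in Table 4.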
Table 1 - Direct comparison of the Sanger and NGS methods on various taxonomic groups. This table compares the results of analyzing the same DNA using the best available Sanger sequencing method and the NGS method. The results of each experiment (experiment numbers correspond to those in Table 2) show that the NGS method yields significantly more and longer barcode sequences than the Sanger method, and in two cases is the only method that produced barcode sequences.

Exp | Taxon | Samples | Recovered seqs (Sanger / NGS) | Min seq length, bp (Sanger / NGS) | Max seq length, bp (Sanger / NGS) | Mean seq length, bp (Sanger / NGS) | Mode seq length, bp (Sanger / NGS) | Success rate (Sanger / NGS)
3 | Lepidoptera | 95 | 10 / 74 | 84 / 39 | 407 / 658 | 158 / 342 | 164 / 279 | 11% / 78%
7 | Coleoptera | 846 | 110 / 568 | 86 / 35 | 658 / 658 | 193 / 334 | 164 / 658 | 13% / 67%
8 | Arachnids | 94 | 7 / 89 | 164 / 95 | 480 / 658 | 299 / 495 | 307 / 658 | 7% / 95%
9 | Arachnids | 190 | 0 / 164 | 0 / 52 | 0 / 658 | 0 / 426 | 0 / 658 | 0% / 86%
10 | Reptiles/amphibians | 95 | 1 / 21 | 166 / 55 | 166 / 371 | 166 / 150 | 166 / None | 1% / 22%
13 | Mammals | 95 | 0 / 23 | 0 / 56 | 0 / 658 | 0 / 239 | 0 / None | 0% / 24%
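The success rates and fold increases in Table 1 follow directly from the recovered-sequence counts. The Python sketch below recomputes them from the counts transcribed from Table 1; where Sanger recovered no sequences, the fold increase is undefined and is returned as `None`. The function names are illustrative.

```python
# (samples, sequences recovered by Sanger, sequences recovered by NGS), transcribed from Table 1
TABLE1 = {
    "Lepidoptera (exp. 3)": (95, 10, 74),
    "Coleoptera (exp. 7)": (846, 110, 568),
    "Arachnids (exp. 8)": (94, 7, 89),
    "Arachnids (exp. 9)": (190, 0, 164),
    "Reptiles/amphibians": (95, 1, 21),
    "Mammals (exp. 13)": (95, 0, 23),
}


def success_rate(recovered: int, samples: int) -> int:
    """Percentage of samples yielding a barcode sequence, rounded to the nearest integer."""
    return round(100 * recovered / samples)


def fold_increase(sanger: int, ngs: int):
    """NGS-to-Sanger ratio of recovered sequences; None when Sanger recovered nothing."""
    return None if sanger == 0 else ngs / sanger


for group, (n, sanger, ngs) in TABLE1.items():
    fold = fold_increase(sanger, ngs)
    fold_text = "n/a" if fold is None else f"{fold:.1f}x"
    print(f"{group}: Sanger {success_rate(sanger, n)}%, NGS {success_rate(ngs, n)}%, fold {fold_text}")
```

The recomputed percentages match the success-rate columns of Table 1.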
Shokralla et al. [20,21] used NGS to recover full-length barcodes from freshly collected specimens of Lepidoptera with a single primer pair. However, the present method demonstrates that NGS can regularly recover complete or near-complete barcodes from century-old specimens with heavily degraded DNA. Moreover, because it requires little template DNA, much of each DNA extract remains for future analysis. Although analytical costs were approximately $10 CAD per specimen, a 4-fold increase in the number of specimens processed in each run is feasible by moving to an NGS platform that generates more reads, resulting in an estimated cost of less than $3 per sample.
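The cost estimate can be read as a simple scaling argument. If one assumes (this is not stated in the text) that the per-specimen cost is dominated by costs incurred per sequencing run, then processing k times as many specimens per run divides the per-specimen cost by roughly k:

```latex
c_{\text{new}} \approx \frac{c_{\text{current}}}{k}
  = \frac{\$10\ \text{CAD}}{4}
  = \$2.50\ \text{CAD} < \$3
```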
Although initially applied to only 10 samples simultaneously, the NGS method can be applied to 96 samples simultaneously without a decrease in sequence recovery (Figure 8).
Additionally, subsequent experiments have demonstrated the method to be successful for over 400 different families of animals, covering several different phyla (Table 2).
Samples fixed in formalin and preserved in ethanol were also successfully analyzed, from all major groups examined to date: spiders, freshwater insects, molluscs, crustaceans, reptiles, and mammals (Table 2). DNA barcodes have also been successfully recovered from forensic specimens and samples of heavily processed materials confiscated by wildlife enforcement officers (data not shown).
Table 2 - This table lists each experiment used to develop, optimize, and enhance the NGS method. The purpose of each experiment is included, as well as information on the samples employed for each experiment. The type of sequencing used for each experiment and overall success rates are listed. The degree of DNA damage was estimated based on the ease with which barcodes could be amplified using Sanger sequencing methods.

Exp | Samples | Purpose | No. samples | No. families | Preservation method | Cause of DNA damage | Degree of DNA damage | Sequencing method | Success rate
1 | Lepidoptera (types) | Initial test | 30 | 1 | Dry | Age | Low, medium, high | Sanger, NGS | 100%
2 | Lepidoptera (types) | Test high throughput | 95 | 8 | Dry | Age | High | NGS | 100%
3 | Lepidoptera | Compare performance to Sanger | 94 | 20 | Dry | Unknown | High | Sanger, NGS | 78%
4 | Mixed arthropods | Primer test on other taxa | 376 | 371 | Dry | None | None | Sanger, NGS | 99%
5 | Freshwater invertebrates | Primer test on other taxa | 94 | 85 | Fluid | None | None | Sanger, NGS | 96%
6 | Vertebrates | Primer test on other taxa | 95 | 93 | Fluid | None | None | Sanger, NGS | 98%
7 | Coleoptera | Large-scale primer test on problematic group | 846 | 70 | Dry | Age, unknown | High | Sanger, NGS | 67%
8 | Arachnids | Test ethanol-preserved specimens | 94 | 14 | Fluid | Age | High | Sanger, NGS | 95%
9 | Arachnids | Test formalin-fixed specimens | 190 | 20 | Fluid | Formalin fixation | High | NGS | 86%
10 | Reptiles/amphibians | Test formalin-fixed specimens | 95 | 12 | Fluid | Formalin fixation | High | Sanger, NGS | 22%
11 | EPTs, Diptera | Test formalin-fixed specimens |  | unknown | Fluid | Formalin fixation | High | NGS | 47%
12 | Molluscs, crustaceans | Test formalin-fixed specimens | 95 | 66 | Fluid | Formalin fixation | High | NGS | 19%
13 | Mammals | Test formalin-fixed specimens | 95 | 15 | Fluid | Formalin fixation | High | Sanger, NGS | 24%
Furthermore, new primer sets can be developed, a task facilitated by the well-parameterized barcode reference library for the animal kingdom, and subsequent experiments have demonstrated that a primer set designed for vertebrates provides increased barcode amplification compared with the standard primers outlined here (Figure 9). Indeed, the method of the invention can be used to amplify and sequence any desired degraded DNA, as primers can be tailored to any given sequence. Past research has employed NGS to sequence genomes, but this study has demonstrated its value in probing sequence diversity in single gene regions when combined with two-step multiplex PCR as described herein. A large-scale program to sequence type specimens would represent a major advance in stabilizing and validating the application of scientific names. Moreover, because many type specimens derive from developing nations, it would represent an important step in the repatriation of knowledge that will aid these nations in managing their biodiversity by enabling DNA-powered identification systems, a major advance in settings where the scientific workforce is small and biodiversity is high.
In a specific aspect, the method described herein involves the following protocol:
1) PCR1
a. Two separate reactions: one with primers F1, F3, F5 + R1-R6 (PCR1a), the other with F2, F4, F6 + R2-R6 (PCR1b). Separate reactions are necessary to prevent non-target amplification. Primers are tailed with a short non-complementary sequence to prevent another form of non-target amplification.
2) PCR2
a. Six separate reactions, one for each forward primer plus the next three downstream reverse primers (or the next two or next one, if only two or one downstream reverse primers exist). These reactions use PCR1 product as template (PCR1a product for the PCR2 F1, F3, F5 reactions, and PCR1b product for the PCR2 F2, F4, F6 reactions). PCR2 forward and reverse primers contain MID tags to associate amplicons with individual specimens, so that multiple specimens can be sequenced simultaneously.
3) PCR purification
a. Following PCR2, all six reactions (or all six plates of reactions in the case of the 95-plex version) are pooled. This is possible because the MID tags allow the resulting sequence reads to be re-associated with their original sample. An aliquot of the pooled reactions is purified for sequencing.
4) Sequencing
a. Purified products are quantified and diluted according to the sequencing manufacturer's recommendations. The diluted product is then sequenced following the manufacturer's instructions.
5) Data Analysis
a. The resulting sequence reads are demultiplexed via the MID tags and split into separate datasets, one for each specimen (typically 95 datasets).
b. Each dataset is processed through a bioinformatics pipeline that trims off primer, MID, and adapter sequences and filters out low-quality reads. The filtered reads, which ideally overlap with one another, are then assembled into a contiguous sequence (contig), which ideally will be a full-length barcode. Contig formation can involve alignment of reads to a reference, but can in principle also be de novo (i.e., without a reference sequence).
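Step 5a can be sketched in Python as follows. This is a minimal illustration, not the pipeline actually used: the MID sequences and the mean-quality threshold are hypothetical placeholders (the real MID sequences are not listed in this description), reads are assumed to begin with the MID, and the primer/adapter trimming and contig assembly of step 5b are omitted.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical MID barcodes: placeholders, not the tags used in the described experiments.
MIDS = {"specimen_01": "ACGTACGTAC", "specimen_02": "TGCATGCATG"}


def demultiplex(reads, mids, min_mean_q=20):
    """Split (sequence, per-base Phred qualities) reads into per-specimen bins by MID prefix.

    The MID is trimmed off, and reads whose remaining bases have a mean quality below
    `min_mean_q` (a hypothetical threshold) are discarded.
    """
    bins = defaultdict(list)
    for seq, quals in reads:
        for specimen, mid in mids.items():
            if seq.startswith(mid):
                insert, insert_q = seq[len(mid):], quals[len(mid):]
                if insert and mean(insert_q) >= min_mean_q:
                    bins[specimen].append(insert)
                break  # each read is assigned to at most one specimen
    return dict(bins)


reads = [
    ("ACGTACGTACGGATTC", [30] * 16),  # specimen_01, good quality: kept
    ("TGCATGCATGAAAA", [10] * 14),    # specimen_02, low quality: filtered out
    ("NNNNNNNNNNAAAA", [30] * 14),    # no recognizable MID: unassigned
]
print(demultiplex(reads, MIDS))  # {'specimen_01': ['GGATTC']}
```

Exact prefix matching is the simplest choice; a production pipeline would typically tolerate sequencing errors in the MID and handle reads from both strands.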
Examples
Materials and Methods
The following pertains specifically to the initial experiment, the purpose of which was to develop and optimize the NGS method. Subsequent experiments contained minor modifications, such as the use of additional MID tags in the primer sequences to increase throughput, or the use of different taxa and associated primers, but the overall design and principle of the protocol remained the same.
Type Specimens
Tissue samples were obtained from 1820 specimens (mostly primary types, but some equally important non-types) of Geometridae (Lepidoptera) from the Natural History Museum (London) as part of a project to develop a strongly validated taxonomic system to support species inventories and studies of host plant use in Papua New Guinea [22,23].
Genitalic dissections of these specimens generated residual tissue that was held frozen until its use in the present study.
DNA Extraction
All tissue samples were processed in an isolated 'clean' laboratory at the Canadian Centre for DNA Barcoding (CCDB; www.ccdb.ca) with dedicated reagents, supplies and protective clothing. Each sample was incubated overnight in lysis buffer, following a modified
protocol of Knolke et al. [24], before DNA was extracted using a silica membrane-based method in either single-column or 96-well plate format [25]. To maximize the concentration of extracted DNA, each silica membrane was eluted with 30 µL of pre-warmed (56 °C) 10 mM Tris-HCl.
Sanger Sequencing
Since DNA quality varies greatly, even among specimens of similar age [2,8], each DNA extract was initially assessed by Sanger analysis. This involved an attempt to amplify both the 164 bp (C microLepF1 t1 + C TypeR1) and the 94 bp (C TypeF1 + C TypeR1) regions of the COI barcode [2,9]. PCR amplification and cycle sequencing employed standard CCDB protocols [2,25,26], with amplicons bidirectionally sequenced on an ABI 3730XL (Applied Biosystems). All traces were edited using CodonCode v. 4.2.7 (CodonCode Corporation), and the resulting 164 bp and 94 bp sequences were validated by comparison with sequences from conspecific individuals or, when these were unavailable, by Neighbor-Joining (NJ) analysis to ensure that each sequence branched as expected. These tests for sequence recovery permitted the assignment of DNA from each specimen to one of three categories: 1) High Quality (HQ), those that generated a 164 bp sequence; 2) Medium Quality (MQ), those that generated a 94 bp sequence; and 3) Low Quality (LQ), those that failed to generate any sequence. The present study examined ten specimens from each category with the goal of developing an NGS protocol effective across varying levels of DNA degradation. The specimens selected for analysis (Table 3) included a single representative from each of 30 genera in the family Geometridae, all more than a century old (mean age = 111 years). Sequences, electropherograms and primer details for the specimens are on BOLD (dx.doi.org/10.5883/DS-NGSTYPES) and GenBank (see Table 3 for accession numbers).
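The three-way Sanger classification is a simple decision rule, encoded in the Python sketch below. The enum and function names are illustrative only. Treating a specimen that yields the 164 bp fragment as HQ regardless of the 94 bp result is our reading of the category definitions above.

```python
from enum import Enum


class DnaQuality(Enum):
    HQ = "High Quality"    # 164 bp COI fragment recovered by Sanger
    MQ = "Medium Quality"  # only the 94 bp fragment recovered
    LQ = "Low Quality"     # no Sanger sequence recovered


def classify(got_164bp: bool, got_94bp: bool) -> DnaQuality:
    """Assign a DNA extract to a quality category from its Sanger recovery results."""
    if got_164bp:
        return DnaQuality.HQ
    if got_94bp:
        return DnaQuality.MQ
    return DnaQuality.LQ


print(classify(False, True).name)  # MQ
```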
Table 3. Type specimens analyzed, including sequencing results and accession numbers

Process ID (Sanger / NGS) | Identification | Status | Age (yrs) | Sanger group | NGS reads | Min. contig cov. | Max. contig cov. | Avg. contig cov. | Sequence recovered by NGS (bp) | Sanger GenBank acc. | NGS GenBank acc. | Sequence Read Archive acc.
/ PNGTY1837- | Myrioblephara mixticolor | Syntype | 104 | HQ | 143804 | 7 | 115751 | 29924 | 658 | pending | pending | SRR1867808
/ PNGTY1827- | Cassephyra plenimargo | Holotype | 112 | HQ | 213007 | 72 | 146189 | 42992 | 658 | pending | pending | SRR1867811
/ PNGTY1843- | Psilalcis auropurpurea | Syntype | 109 | HQ | 106286 | 0 | 74012 | 20477 | 448 | pending | pending | SRR1867812
/ PNGTY1839- | Paralcidia marginata | Syntype | 110 | HQ | 221885 | 5 | 168474 | 44541 | 657 | pending | pending | SRR1867813
/ PNGTY1823- | Atmoceras plumosa | Syntype | 110 | HQ | 143340 | 30 | 76376 | 28215 | 658 | pending | pending | SRR1867814
 | Tripteridia viridisecta | Syntype | 110 | HQ | 188107 | 1 | 101191 | 37855 | 570 | pending | pending | SRR1867815
 | Gymnoscelis ochriplaga | Holotype | 111 | HQ | 186897 | 1 | 103399 | 38169 | 657 | pending | pending | SRR1867816
 | Axinoptera fasciata | Holotype | 110 | HQ | 166116 | 0 | 83389 | 31838 | 474 | pending | pending | SRR1867817
 | Calluga semirasata | Holotype | 112 | HQ | 215946 | 1 | 154302 | 43408 | 658 | pending | pending | SRR1867818
 | Eois semirubra | Holotype | 123 | HQ | 232024 | 106 | 143908 | 46803 | 658 | pending | pending | SRR1867819
/ PNGTY1831- | Collix ghosha dichobathra | N/A | 112 | MQ | 11665 | 0 | 11082 | 2142 | 459 | pending | pending | SRR1945335
/ PNGTY1838- | Papuarisme brunneata | Holotype | 111 | MQ | 6479 | 0 | 5747 | 1165 | 569 | pending | pending | SRR1945382
/ PNGTY1835- | Hyposidra apiciftdva | N/A | 109 | MQ | 62208 | 0 | 45441 | 12301 | 570 | pending | pending | SRR1945383
/ PNGTY1836- | Milionia know/el | Syntype | 101 | MQ | 44190 | 1 | 43301 | 9153 | 658 | pending | pending | SRR1945384
PNGTY587-13 / PNGTY1846- | Ctimene basistraga obsoleta | Syntype | 118 | MQ | 546 | 0 | 105 | 31 | 323 | pending | pending | SRR1946575
PNGTY639-13 / PNGTY1842- | Psendensemia bursadoides dignitosa | Syntype | 102 | MQ | 134542 | 37 | 82031 | 27382 | 658 | pending | pending | SRR1945385
PNGTY917-13 / PNGTY1840- | Pingasa nobilis furvifrons | Holotype | 108 | MQ | 46793 | 6 | 24276 | 9516 | 658 | pending | pending | SRR1945386
/ PNGTY1821- | Aeolochroma caesia | Holotype | 121 | MQ | 68837 | 3 | 43442 | 14002 | 658 | pending | pending | SRR1945387
/ PNGTY1844- | Sarcinodes subvirgata | Holotype | 106 | MQ | 99655 | 86 | 42405 | 20379 | 658 | pending | pending | SRR1945388
/ PNGTY1828- | Celerena lerne amplimargo | Holotype | 105 | MQ | 113363 | 1 | 42333 | 21657 | 569 | pending | pending | SRR1945389
/ PNGTY1832- | Dyscheralcis retroflexa | Syntype | 104 | LQ | 2681 | 0 | 1278 | 424 | 514 | N/A | pending | SRR1867935
 | Alcis irrufata | Holotype | 110 | LQ | 49944 | 2 | 37401 | 8881 | 657 | N/A | pending | SRR1867936
/ PNGTY1830- | Cleora repetita suffusa | Syntype |  | LQ | 7632 | 1 | 5708 | 731 | 454 | N/A | pending | SRR1867937
PNGTY008-12* / N/A | Spectrobasis differens | Syntype | 109 | LQ | 1468 | 0 | 1157 | 280 | 237 | N/A | N/A | SRR1867938
PNGTY073-12* / N/A | Desmoclysha umpuncta | Holotype | 104 | LQ | 320 | 0 | 116 | 40 | 357 |  |  | 
PNGTY102-12* / N/A | Sterrhochaeta minuta | Syntype | 120 | LQ | 3081 | 0 | 215 | 91 | 324 |  |  | 
PNGTY120-12* / N/A | Propithex alternata | Holotype | 118 | LQ | 14863 | 0 | 1554 | 163 | 323 |  |  | 
/ PNGTY1825- | Bursadopsis plenifascia | Syntype | 105 | LQ | 133263 | 4 | 71444 | 20742 | 634 | N/A | pending | SRR1867942
/ PNGTY1829- | Chloroclystis rufofasciata | Syntype | 117 | LQ | 4411 | 0 | 1685 | 647 | 419 | N/A | pending | SRR1867943
 | Polyacme straminea brunneata | Holotype | 112 | LQ | 141402 | 23 | 71197 | 28117 | 658 | N/A | pending | SRR1867944

The four Process IDs marked with an asterisk (*) represent specimens for which NGS analysis generated sequence reads from multiple species. HQ - high quality; MQ - medium quality; LQ - low quality; N/A - not applicable.
Next-generation Sequencing
DNA degradation often limits PCR amplicons to <200 bp in specimens more than 50 years old [18], precluding efforts to recover the entire barcode region with one or two primer sets. Consequently, primer sets were designed to amplify fragments ranging in length from 120 bp to 148 bp, with enough overlap to permit recovery of the 658 bp barcode region. These primers needed to be tailed with adapter sequences for analysis on an Ion Torrent PGM (Life Technologies) and with multiplex identifier (MID) tags to distinguish sequence reads from each specimen. Ten sets of MID-tagged primers, each consisting of six forward and six reverse primers, were employed to analyze ten type specimens per NGS run (Table 4).
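Whether a set of short amplicons tiles the 658 bp barcode can be checked with a few lines of Python. The amplicon coordinates below are hypothetical, since primer positions are not given in this description, and primer trimming (which shortens the usable part of each read) is ignored. `check_tiling` is an illustrative name.

```python
def check_tiling(amplicons, target_len=658, min_len=120, max_len=148):
    """Check 1-based inclusive amplicon intervals against a target region.

    Returns (covers_target, lengths_in_range, overlaps_between_neighbours).
    """
    ivs = sorted(amplicons)
    lengths_ok = all(min_len <= end - start + 1 <= max_len for start, end in ivs)
    overlaps = [prev_end - start + 1 for (_, prev_end), (start, _) in zip(ivs, ivs[1:])]
    covered_to = 0  # last position covered without a gap, starting from position 1
    for start, end in ivs:
        if start > covered_to + 1:
            break  # gap found: coverage stops here
        covered_to = max(covered_to, end)
    return covered_to >= target_len, lengths_ok, overlaps


# Hypothetical six-amplicon layout over the 658 bp COI barcode region.
layout = [(1, 130), (110, 245), (225, 365), (345, 480), (460, 600), (525, 658)]
print(check_tiling(layout))  # (True, True, [21, 21, 21, 21, 76])
```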
Table 4. Primers used in the first (PCR1) and second (PCR2) reactions to allow the analysis of 10 specimens in an Ion Torrent PGM run.
PCR Code Primer Name Sequence (5'-3') MID
Adapter ATTCAACCAATCATA
Fl LepF 1-Sanger-ion 1 None None AAGATATTGG
AT TRRWRA TGATCAA
F2 AncientLepF2-Sanger-ion 1 None None RTWTATAAT
TTATAATTGGDGGRT
F3 AncientLepF3 -Sanger-ion 1 None None AGWAGWATWRTWR
F4 AncientLepF4-Sanger-ion 1 None None AWAVVVGG
ATTTTTWSWCTWCA
F5 AncientLepF5 -Sanger-ion 1 None None TWTDGCWGG
TATTTGTWTGAKCW
F6 AncientLepF6-Sanger-ion 1 None None RTWKKWATTAC
WGGTATWACTATRA
R1 AncientLepR 1 -Sanger-ion 1 None None ARAAAAT TA T
TCARAAWCTWATRT
R2 AncientLepR2-Sanger-ion2 None None TRTTTADWCG
ARDGGDGGRTAWAC
R3 AncientLepR3-Sanger-ion3 None None WGTTCAWCC
GTWGWAATRAARTT
R4 AncientLepR4-Sanger-ion4 None None DATWGCWCC
GTTARWARTATDGT
R5 AncientLepR5-Sanger-ion5 None None RATDGCWCC
TAAACTTCTGGATGT
R6 LepR1-Sanger-ion6 None None CCAAAAAATCA
CCATCTCATCCCTGCGTGTCTCCGA
IonXpre Fl LepF 1-ion 1 CTCAG AT TCAAC CAA A
ss 1 TCATAAAGATATTGG
CCATCTCATCCCTGCGTGTCTCCGA
IonXpre F2 AncientLepF2-ion 1 CTCAG AT TRRWRAT G A
ss 1 ATCAARTWTATAAT
CCATCTCATCCCTGCGTGTCTCCGA
IonXpre F3 AncientLepF3 -ion 1 CTCAG TTATAATTGG A
ss 1 DGGRTTTGGWAATTG
CCATCTCATCCCTGCGTGTCTCCGA
IonXpre F4 AncientLepF4-ion 1 CTCAG AGWAGWATW A
ss 1 RTWRAWAVWGG
CCATCTCATCCCTGCGTGTCTCCGA
IonXpre F5 AncientLepF5 -ion 1 CTCAG ATTTTTWSWC A
ss 1 TWCATWTDGCWGG
CCATCTCATCCCTGCGTGTCTCCGA
IonXpre F6 AncientLepF6-ion 1 CTCAG TATTTGTWTG A
ss 1 AKCWRTWKKWATTAC
CCATCTCATCCCTGCGTGTCTCCGA
= IonXpre PCR2 Fl LepF 1 -ion2 CTCAG ATTCAACCAA A
ss2 TCATAAAGATATTGG
CCATCTCATCCCTGCGTGTCTCCGA
IonXpre F2 AncientLepF2-ion2 CTCAG ATTRRWRAT A
ss2 GATCAARTWTATAAT
CCATCTCATCCCTGCGTGTCTCCGA
IonXpre F3 AncientLepF3-ion2 CTCAG TTATAATTGG A
ss2 DGGRTTTGGWAATTG
CCATCTCATCCCTGCGTGTCTCCGA
IonXpre F4 AncientLepF4-ion2 CTCAG AGWAGWAT A
ss2 WRTWRAWAVWGG
CCATCTCATCCCTGCGTGTCTCCGA
IonXpre F5 AncientLepF5-ion2 CTCAG ATTTTTWSWC A
ss2 TWCATWTDGCWGG
CCATCTCATCCCTGCGTGTCTCCGA
IonXpre F6 AncientLepF6-ion2 CTCAG TATTTGTWTG A
ss2 AKCWRTWKKWATTAC
CCATCTCATCCCTGCGTGTCTCCGA
IonXpre Fl LepF I -ion3 CTCAG ATTCAACCAA A
ss3 TCATAAAGATATTGG
CCATCTCATCCCTGCGTGTCTCCGA
IonXpre F2 AncientLepF2-ion3 CTCAG ATTRRWRATG A
ss3 ATCAARTWTATAAT
CCATCTCATCCCTGCGTGTCTCCGA
IonXpre PCR2 F3 AncientLepF3-ion3 CTCAG TTATAATTGG A
ss3 DGGRTTTGGWAATTG
CCATCTCATCCCTGCGTGTCTCCGA
IonXpre F4 AncientLepF4-ion3 CTCAG AGWAGWATW A
ss3 RTWRAWAVWGG
CCATCTCATCCCTGCGTGTCTCCGA
IonXpre F5 AncientLepF5-ion3 CTCAG ATTTTTWSWC A
ss3 TWCATWTDGCWGG
CCATCTCATCCCTGCGTGTCTCCGA
IonXpre F6 AncientLepF6-ion3 CTCAG TATTTGTWTG A
ss3 AKCWRTWKKWATTAC
CCATCTCATCCCTGCGTGTCTCCGA
IonXpre Fl LepF I -ion4 CTCAG ATTCAACCAA A
ss4 TCATAAAGATATTGG
CCATCTCATCCCTGCGTGTCTCCGA
IonXpre F2 AncientLepF2-ion4 CTCAG ATTRRWRATG A
ss4 ATCAARTWTATAAT
CCATCTCATCCCTGCGTGTCTCCGA
IonXpre F3 AncientLepF3-ion4 CTCAG TTATAATTGG A
ss4 DGGRTTTGGWAATTG
CCATCTCATCCCTGCGTGTCTCCGA
IonXpre F4 AncientLepF4-ion4 CTCAG AGWAGWATW A
ss4 RTWRAWAVWGG
CCATCTCATCCCTGCGTGTCTCCGA
IonXpre F5 AncientLepF5-ion4 CTCAG ATTTTTWSWC A
ss4 TWCATWTDGCWGG
CCATCTCATCCCTGCGTGTCTCCGA
IonXpre F6 AncientLepF6-ion4 CTCAG TATTTGTWTG A
ss4 AKCWRTWKKWATTAC
CCATCTCATCCCTGCGTGTCTCCGA
IonXpre Fl LepF I -ion5 CTCAG ATTCAACCAA A
ss5 TCATAAAGATATTGG
CCATCTCATCCCTGCGTGTCTCCGA
IonXpre PCR2 F2 AncientLepF2-ion5 CTCAG ATTRRWRAT A
ss5 GATCAARTWTATAAT
CCATCTCATCCCTGCGTGTCTCCGA
IonXpre F3 AncientLepF3-ion5 CTCAG TTATAATTGG A
ss5 DGGRTTTGGWAATTG
CCATCTCATCCCTGCGTGTCTCCGA
IonXpre F4 AncientLepF4-ion5 CTCAG AGWAGWAT A
ss5 WRTWRAWAVWGG
CCATCTCATCCCTGCGTGTCTCCGA
IonXpre F5 AncientLepF5-ion5 CTCAG ATTTTTWSW A
ss5 CTWCATWTDGCWGG
CCATCTCATCCCTGCGTGTCTCCGA
IonXpre F6 AncientLepF6-ion5 CTCAG TATTTGTWTG A
ss5 AKCWRTWKKWATTAC
CCATCTCATCCCTGCGTGTCTCCGA
IonXpre Fl LepF I -ion6 CTCAG ATTCAACCAA A
ss6 TCATAAAGATATTGG
CCATCTCATCCCTGCGTGTCTCCGA
IonXpre F2 AncientLepF2-ion6 CTCAG ATTRRWRATG A
ss6 ATCAARTWTATAAT
CCATCTCATCCCTGCGTGTCTCCGA
IonXpre F3 AncientLepF3-ion6 CTCAG TTATAATTGG A
ss6 DGGRTTTGGWAATTG
CCATCTCATCCCTGCGTGTCTCCGA
IonXpre F4 AncientLepF4-ion6 CTCAG AGWAGWATW A
ss6 RTWRAWAVWGG
CCATCTCATCCCTGCGTGTCTCCGA
IonXpre F5 AncientLepF5-ion6 CTCAG ATTTTTWSWC A
ss6 TWCATWTDGCWGG
CCATCTCATCCCTGCGTGTCTCCGA
IonXpre F6 AncientLepF6-ion6 CTCAG TATTTGTWTG A
ss6 AKCWRTWKKWATTAC
PCR2 | F1 | LepF1-ion7 | CCATCTCATCCCTGCGTGTCTCCGA CTCAG ATTCAACCAATCATAAAGATATTGG | IonXpress7 | A
PCR2 | F2 | AncientLepF2-ion7 | CCATCTCATCCCTGCGTGTCTCCGA CTCAG ATTRRWRATGATCAARTWTATAAT | IonXpress7 | A
PCR2 | F3 | AncientLepF3-ion7 | CCATCTCATCCCTGCGTGTCTCCGA CTCAG TTATAATTGGDGGRTTTGGWAATTG | IonXpress7 | A
PCR2 | F4 | AncientLepF4-ion7 | CCATCTCATCCCTGCGTGTCTCCGA CTCAG AGWAGWATWRTWRAWAVWGG | IonXpress7 | A
PCR2 | F5 | AncientLepF5-ion7 | CCATCTCATCCCTGCGTGTCTCCGA CTCAG ATTTTTWSWCTWCATWTDGCWGG | IonXpress7 | A
PCR2 | F6 | AncientLepF6-ion7 | CCATCTCATCCCTGCGTGTCTCCGA CTCAG TATTTGTWTGAKCWRTWKKWATTAC | IonXpress7 | A
PCR2 | F1 | LepF1-ion8 | CCATCTCATCCCTGCGTGTCTCCGA CTCAG ATTCAACCAATCATAAAGATATTGG | IonXpress8 | A
PCR2 | F2 | AncientLepF2-ion8 | CCATCTCATCCCTGCGTGTCTCCGA CTCAG ATTRRWRATGATCAARTWTATAAT | IonXpress8 | A
PCR2 | F3 | AncientLepF3-ion8 | CCATCTCATCCCTGCGTGTCTCCGA CTCAG TTATAATTGGDGGRTTTGGWAATTG | IonXpress8 | A
PCR2 | F4 | AncientLepF4-ion8 | CCATCTCATCCCTGCGTGTCTCCGA CTCAG AGWAGWATWRTWRAWAVWGG | IonXpress8 | A
PCR2 | F5 | AncientLepF5-ion8 | CCATCTCATCCCTGCGTGTCTCCGA CTCAG ATTTTTWSWCTWCATWTDGCWGG | IonXpress8 | A
PCR2 | F6 | AncientLepF6-ion8 | CCATCTCATCCCTGCGTGTCTCCGA CTCAG TATTTGTWTGAKCWRTWKKWATTAC | IonXpress8 | A
PCR2 | F1 | LepF1-ion9 | CCATCTCATCCCTGCGTGTCTCCGA CTCAG ATTCAACCAATCATAAAGATATTGG | IonXpress9 | A
PCR2 | F2 | AncientLepF2-ion9 | CCATCTCATCCCTGCGTGTCTCCGA CTCAG ATTRRWRATGATCAARTWTATAAT | IonXpress9 | A
PCR2 | F3 | AncientLepF3-ion9 | CCATCTCATCCCTGCGTGTCTCCGA CTCAG TTATAATTGGDGGRTTTGGWAATTG | IonXpress9 | A
PCR2 | F4 | AncientLepF4-ion9 | CCATCTCATCCCTGCGTGTCTCCGA CTCAG AGWAGWATWRTWRAWAVWGG | IonXpress9 | A
PCR2 | F5 | AncientLepF5-ion9 | CCATCTCATCCCTGCGTGTCTCCGA CTCAG ATTTTTWSWCTWCATWTDGCWGG | IonXpress9 | A
PCR2 | F6 | AncientLepF6-ion9 | CCATCTCATCCCTGCGTGTCTCCGA CTCAG TATTTGTWTGAKCWRTWKKWATTAC | IonXpress9 | A
PCR2 | F1 | LepF1-ion10 | CCATCTCATCCCTGCGTGTCTCCGA CTCAG ATTCAACCAATCATAAAGATATTGG | IonXpress10 | A
PCR2 | F2 | AncientLepF2-ion10 | CCATCTCATCCCTGCGTGTCTCCGA CTCAG ATTRRWRATGATCAARTWTATAAT | IonXpress10 | A
PCR2 | F3 | AncientLepF3-ion10 | CCATCTCATCCCTGCGTGTCTCCGA CTCAG TTATAATTGGDGGRTTTGGWAATTG | IonXpress10 | A
PCR2 | F4 | AncientLepF4-ion10 | CCATCTCATCCCTGCGTGTCTCCGA CTCAG AGWAGWATWRTWRAWAVWGG | IonXpress10 | A
PCR2 | F5 | AncientLepF5-ion10 | CCATCTCATCCCTGCGTGTCTCCGA CTCAG ATTTTTWSWCTWCATWTDGCWGG | IonXpress10 | A
PCR2 | F6 | AncientLepF6-ion10 | CCATCTCATCCCTGCGTGTCTCCGA CTCAG TATTTGTWTGAKCWRTWKKWATTAC | IonXpress10 | A
PCR2 | R1 | AncientLepR1-ion1-trP1 | CCTCTCTATGGGCAGTCGGTGAT WGGTATWACTATRAARAAAATTAT | IonXpress1 | trP1
PCR2 | R2 | AncientLepR2-ion2-trP1 | CCTCTCTATGGGCAGTCGGTGAT TCARAAWCTWATRTTRTTTADWCG | IonXpress2 | trP1
PCR2 | R3 | AncientLepR3-ion3-trP1 | CCTCTCTATGGGCAGTCGGTGAT ARDGGDGGRTAWACWGTTCAWCC | IonXpress3 | trP1
PCR2 | R4 | AncientLepR4-ion4-trP1 | CCTCTCTATGGGCAGTCGGTGAT GTWGWAATRAARTTDATWGCWCC | IonXpress4 | trP1
PCR2 | R5 | AncientLepR5-ion5-trP1 | CCTCTCTATGGGCAGTCGGTGAT GTTARWARTATDGTRATDGCWCC | IonXpress5 | trP1
PCR2 | R6 | LepR1-ion6-trP1 | CCTCTCTATGGGCAGTCGGTGAT TAAACTTCTGGATGTCCAAAAAATCA | IonXpress6 | trP1
The "Code" column refers to primer labels in Fig. 2. The COI binding region within each primer sequence is shown in black, while the 10bp tail (PCR1) or MID tag (PCR2) is shown in blue. The "key sequence" (required for Ion Torrent sequencing) is shown in green and the sequencing adapters are shown in red. The 10bp tails on the PCR1 primers are technically IonXpress MID tags, but they serve only to block short amplicons from acting as primers during PCR1. They were chosen over random decamer tails to maximize primer-template matching in PCR2. The same forward and reverse PCR1 primers are used for all ten samples in the first round of PCR. In the second round of PCR, samples are discriminated by using ten different sets of MID-tagged forward PCR2 primers (the same set of PCR2 reverse primers is used for all ten samples).
Optimization of NGS Protocols
Optimization studies tested the impact of varied primer combinations, number of PCR cycles, differential concentrations of primers and nesting of PCRs. Efforts to multiplex all six forward and reverse primers in a single reaction were unsuccessful because the small regions of overlap between adjacent fragments were preferentially amplified over the six target fragments. Splitting the PCR into two reactions, each targeting non-adjacent fragments (e.g. PCR1 = F1+R1, F3+R3, F5+R5; PCR2 = F2+R2, F4+R4, F6+R6), solved this issue, but revealed another problem: the dominance of certain amplicons. This problem was overcome by mixing the forward primers with the full complement of reverse primers (e.g. PCR1 = F1+F3+F5 + six reverse primers; PCR2 = F2+F4+F6 + five reverse primers). This allowed each forward primer to potentially pair with several downstream reverse primers, creating redundancy that improved sequence recovery while reducing the dominance of any particular amplicon. For example, depending upon the quality of the template DNA, the barcode segment amplified by primers F4+R4 could be amplified by any of the twelve combinations of F1, F2, F3 or F4 paired with R4, R5 or R6. This redundancy aided the recovery of full-length barcodes from specimens with varied degrees of DNA degradation or with particular primer mismatches (as evidenced by the lack of a certain product in the final sequence array).
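The twelve-way redundancy for the F4+R4 segment can be checked by enumeration. The sketch below is a simplified model, not the authors' analysis: primers are indexed by the segment they bound, and a product from Fi+Rj is taken to span segments i through j, so it covers segment k whenever i <= k <= j. Which five reverse primers go into the second first-round reaction is not stated in the text; R2 to R6 is an assumption here (it does not affect segment 4). Whether the longer products actually form still depends on template quality, as the text notes.

```python
# Simplified model: primers Fk/Rk bound barcode segment k, and a product from
# Fi + Rj spans segments i..j. It therefore covers segment k when i <= k <= j.
# (Fi+1 + Ri gives only the short overlap product, which covers no whole segment.)

def pairs_covering(segment, forward_idx, reverse_idx):
    """Return the (Fi, Rj) pairs in one reaction whose product spans `segment`."""
    return [
        (f"F{i}", f"R{j}")
        for i in forward_idx
        for j in reverse_idx
        if i <= segment <= j
    ]

# First-round reactions as described in the text. The exact five reverse
# primers in PCR 1.2 are an assumption (R2..R6).
PCR1_1 = ([1, 3, 5], [1, 2, 3, 4, 5, 6])
PCR1_2 = ([2, 4, 6], [2, 3, 4, 5, 6])

covering = pairs_covering(4, *PCR1_1) + pairs_covering(4, *PCR1_2)
print(len(covering), sorted(covering))
```

Splitting odd and even forward primers between the two reactions loses none of these combinations, because every forward primer still meets all reverse primers downstream of it.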
When DNA quality is poor, primer binding becomes increasingly important to "kick start" amplification [26]. Perfect primer binding is impossible when diverse taxa are analyzed, but the prospects for recovery of desired amplicons can be improved by raising the number of PCR cycles and by increasing the primer degeneracy. Both tactics were employed in the present NGS protocol.
Two rounds of PCR were employed, with 60 cycles in the first and 40 cycles in the second.
All forward and reverse primers included degeneracy at the sites most important for primer binding (3' terminus for F, 5' terminus for R). Considering this degeneracy, the 12 forward and reverse primers were actually a cocktail of 2010 primers. Other factors were also found to have important impacts on final outcomes. For example, initial tests revealed that primers with the 33bp-40bp adapter/MID tails required for NGS were less effective in generating product than the same primers without tails, a difference that was particularly strong for LQ extracts. This difference was probably due to interference with primer binding caused by the formation of secondary structures in the primers with tails. Although primers without tails produced the highest amplification success, their use allowed short, non-target amplicons to act as primers, generating chimeric amplicons that combined sequence information from primers and the specimen.
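The size of a degenerate primer cocktail follows from the IUPAC ambiguity codes. Each position contributes a factor equal to the number of bases its code allows, and the primer's degeneracy is the product of those factors. Below is a minimal Python sketch. The function names are illustrative, not part of the protocol, and the example input is the AncientLepF3 binding region from Table 4.

```python
from itertools import product
from math import prod

# IUPAC nucleotide codes mapped to the bases each one stands for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

def degeneracy(primer: str) -> int:
    """Number of distinct oligonucleotides encoded by a degenerate primer."""
    return prod(len(IUPAC[base]) for base in primer.upper())

def expand(primer: str) -> list[str]:
    """All concrete sequences encoded by a degenerate primer."""
    choices = (IUPAC[base] for base in primer.upper())
    return ["".join(bases) for bases in product(*choices)]

f3 = "TTATAATTGGDGGRTTTGGWAATTG"  # AncientLepF3 COI-binding region (Table 4)
print(degeneracy(f3))  # D x R x W = 3 * 2 * 2 = 12
print(expand(f3)[:3])
```

Summing `degeneracy` over every primer in a reaction gives the number of distinct oligonucleotides competing for template in that reaction.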
To overcome this problem, 10bp tails lacking complementarity to any region in the target genomes were added to the 5' terminus of all primers. Their presence inhibited polymerase elongation when short amplicons or primer dimers attempted to act as primers, preventing the formation of chimeric amplicons while avoiding the secondary structure issues inherent with longer tails. Although the first round of PCR was effective in generating amplicons, a second round of PCR was used to introduce the adapter-tailed primers for sequence analysis. This second round likely had the additional benefit of reducing amplification bias, because it involved six separate reactions, one for each forward primer, which limited competition among primers.
Final NGS Protocol
These experimental studies led to the development of a two-stage, nested, multiplex PCR protocol which produced sequence records spanning the barcode region. The first round of PCR included two reactions for each specimen (PCR 1.1 and PCR 1.2 in Fig. 2a), each consuming 2 µL of genomic DNA as template. Each reaction included three forward primers (F1+F3+F5 or F2+F4+F6) with six and five reverse primers respectively, allowing each forward primer to generate from 1-6 amplicons, depending on the quality of DNA and its binding position in relation to the reverse primers. Detailed reaction components (final volume = 12.5 µL) are provided in Table 5. Thermocycling consisted of 94 °C for 2 minutes, 60 cycles of {94 °C for 40 seconds, 48 °C for 40 seconds, 72 °C for 30 seconds}, and a final extension of 72 °C for 5 minutes.
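The two thermocycling programs (60 cycles in the first round and 40 in the second, with otherwise identical steps) can be written down as data. This makes the total cycle count and the time spent at set temperatures explicit. The sketch ignores block ramp rates, which is an assumption, so real run times will be somewhat longer. The class and variable names are illustrative.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Step:
    temp_c: float
    seconds: int

@dataclass(frozen=True)
class Program:
    initial: Step
    cycle: tuple[Step, ...]
    n_cycles: int
    final: Step

    def hold_seconds(self) -> int:
        """Total time at set temperatures; ramping between steps is ignored."""
        per_cycle = sum(step.seconds for step in self.cycle)
        return self.initial.seconds + self.n_cycles * per_cycle + self.final.seconds

# Denature 40 s, anneal 40 s at 48 °C, extend 30 s: shared by both rounds.
CYCLE = (Step(94, 40), Step(48, 40), Step(72, 30))
PCR1 = Program(Step(94, 120), CYCLE, 60, Step(72, 300))
PCR2 = Program(Step(94, 120), CYCLE, 40, Step(72, 300))

for name, prog in [("PCR1", PCR1), ("PCR2", PCR2)]:
    print(name, prog.n_cycles, "cycles,", round(prog.hold_seconds() / 60, 1), "min")
print("total cycles:", PCR1.n_cycles + PCR2.n_cycles)
```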
Table 5. Components of PCR reactions in the NGS protocol.
Component | PCR 1.1 | PCR 1.2 | PCR 2.1-2.4 | PCR 2.5 | PCR 2.6
10% Trehalose | 5.125 µL | 5.25 µL | 5.75 µL | 5.875 µL | 6.0 µL
H2O | 0.13 µL | 0.13 µL | 0.13 µL | 0.13 µL | 0.13 µL
5X Buffer | 2.5 µL | 2.5 µL | 2.5 µL | 2.5 µL | 2.5 µL
25 mM MgCl2 | 1.25 µL | 1.25 µL | 1.25 µL | 1.25 µL | 1.25 µL
µM primers | 0.125 µL each | 0.125 µL each | 0.125 µL each | 0.125 µL each | 0.125 µL each
10 µM dNTP | 0.0625 µL | 0.0625 µL | 0.0625 µL | 0.0625 µL | 0.0625 µL
Taq (5 U/µL) | 0.06 µL | 0.06 µL | 0.06 µL | 0.06 µL | 0.06 µL
Template | 2 µL | 2 µL | 2 µL | 2 µL | 2 µL
TOTAL | 12.5 µL | 12.5 µL | 12.5 µL | 12.5 µL | 12.5 µL
Reactions differ only in the number of primers and the amount of trehalose.
Trehalose sourced from Fluka Analytical; Hyclone ultra-pure water from Thermo Fisher Scientific; Buffer, MgCl2, and Taq polymerase from KAPA Biosystems; primers from Integrated DNA Technologies.
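In Table 5 the trehalose volume shrinks by exactly 0.125 µL for every primer added, so trehalose plus primers always totals 6.25 µL in each column. The helper below expresses that pattern, for example when planning a reaction with a different primer count. The helper name and its error check are illustrative additions, not part of the published protocol. The primer counts are taken from the reaction descriptions in the text.

```python
# Per-reaction volumes (µL) from Table 5.
PRIMER_UL_EACH = 0.125
TREHALOSE_PLUS_PRIMERS_UL = 6.25  # constant across every Table 5 column

def trehalose_ul(n_primers: int) -> float:
    """10% trehalose volume that keeps the reaction volume fixed for n primers."""
    volume = TREHALOSE_PLUS_PRIMERS_UL - n_primers * PRIMER_UL_EACH
    if volume < 0:
        raise ValueError("too many primers for the fixed reaction volume")
    return volume

# Primer counts: PCR 1.1 = 3 F + 6 R, PCR 1.2 = 3 F + 5 R, PCR 2.1-2.4 = 1 F + 3 R,
# PCR 2.5 = F5 + R5 + R6, PCR 2.6 = F6 + R6.
reactions = {"PCR 1.1": 9, "PCR 1.2": 8, "PCR 2.1-2.4": 4, "PCR 2.5": 3, "PCR 2.6": 2}
for name, n in reactions.items():
    print(f"{name}: {n} primers -> {trehalose_ul(n)} µL trehalose")
```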
Figure 2 shows the primer positions for the first and second rounds of PCR (a) and all possible final amplicons (b). The initial round of PCR includes two separate reactions (a, above broken line) using 10bp-tailed primers and genomic DNA as template (shown in parentheses below reaction names). The second round of PCR includes six separate reactions (a, below broken line) using adapter-tailed primers and the products from the first PCR reactions as template (shown in parentheses below reaction names). The second PCR can generate up to 15 amplicons spanning the entire COI barcode region (b). To assign each amplicon to a particular type specimen, each forward PCR2 primer is tailed with a MID tag unique to that specimen. To assign each amplicon to a particular reaction (i.e. 2.1, 2.2, 2.3, etc.), each reverse PCR2 primer is tailed with a MID tag unique to each reaction in the second round of PCR.
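The dual tagging scheme lets each read be traced to both a specimen (forward MID) and a second-round reaction (reverse MID). Below is a minimal demultiplexing sketch under stated assumptions. The 10-mer tags are invented placeholders, because the real IonXpress barcode sequences are not listed in this section. Reads are assumed to be full length and forward oriented. A production pipeline would also tolerate sequencing errors in the tags and handle reverse-oriented or truncated reads.

```python
from typing import Optional, Tuple

# Placeholder MID tags (hypothetical): not the real IonXpress barcodes.
FORWARD_MIDS = {"ACGTTGCAAC": "specimen 1", "TGCAACGTTG": "specimen 2"}
REVERSE_MIDS = {"GATCCTAGGA": "PCR 2.1", "CTAGGATCCT": "PCR 2.2"}

_COMPLEMENT = str.maketrans("ACGT", "TGCA")

def revcomp(seq: str) -> str:
    """Reverse complement of an unambiguous DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]

def assign(read: str) -> Optional[Tuple[str, str]]:
    """Map a read to (specimen, reaction) via its two MID tags, or None if either is unknown.

    Assumes the forward MID sits at the 5' end of the read and the reverse
    complement of the reverse MID sits at its 3' end.
    """
    specimen = next((s for mid, s in FORWARD_MIDS.items() if read.startswith(mid)), None)
    reaction = next((r for mid, r in REVERSE_MIDS.items() if read.endswith(revcomp(mid))), None)
    if specimen is None or reaction is None:
        return None
    return specimen, reaction

read = "ACGTTGCAAC" + "TTATAATTGGAGGATTTGG" + revcomp("GATCCTAGGA")
print(assign(read))
```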
The second round of PCR used product from the first PCR reactions as template and included six reactions per specimen (PCR 2.1-2.6 in Fig. 2a), each coupling a single forward primer with one to three reverse primers and using 2 µL of the appropriate primary PCR product as template. It boosted amplicon yields while also adding the required sequencing adapters. Each secondary PCR generated 1-3 amplicons which collectively spanned the COI barcode region (Fig. 2b). The first four PCRs (2.1-2.4 in Fig. 2a) contained forward primers F1-F4, each combined with the three immediately downstream reverse primers (e.g. F1+R1+R2+R3). The fifth PCR (2.5 in Fig. 2a) combined F5 with R5 and R6, while the sixth PCR (2.6 in Fig. 2a) combined F6 with R6. All of these reactions employed primers with adapter tails and MID tags to enable NGS to discriminate fragments and/or individuals in post processing. Detailed reaction components (final volume = 12.5 µL) are provided in Table 5. Thermocycling consisted of 94 °C for 2 minutes, 40 cycles of {94 °C for 40 seconds, 48 °C for 40 seconds, 72 °C for 30 seconds}, and a final extension of 72 °C for 5 minutes.
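The second-round layout follows a simple rule: forward primer Fi is paired with Ri and up to two further downstream reverse primers, truncated at R6. Generating the layout from that rule reproduces the six reactions and the maximum of 15 amplicons. This is a sketch; the function name and parameters are illustrative.

```python
def pcr2_reactions(n_segments: int = 6, max_reverse: int = 3) -> dict[str, list[str]]:
    """Pair forward primer Fi with up to `max_reverse` reverse primers starting at Ri."""
    reactions = {}
    for i in range(1, n_segments + 1):
        last = min(i + max_reverse - 1, n_segments)
        reactions[f"PCR 2.{i}"] = [f"F{i}"] + [f"R{j}" for j in range(i, last + 1)]
    return reactions

rx = pcr2_reactions()
for name, primers in rx.items():
    print(name, "+".join(primers))

# With one forward primer per reaction, each reverse primer can give at most one amplicon.
max_amplicons = sum(len(primers) - 1 for primers in rx.values())
print("maximum amplicons:", max_amplicons)
```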
The secondary PCR products from each specimen (six reactions) were pooled and a double size selection protocol (PCRClean DX kit, Aline Biosciences) was employed to remove genomic DNA, primer dimers and residual primers. The first cleanup step was designed to remove any high molecular weight genomic DNA (>800bp) that might reflect recent contamination (e.g. human DNA from researchers working with the specimens). Briefly, the PCR product and magnetic beads were incubated in a 2:1 ratio (volume PCR product: volume beads) for 8 minutes at room temperature followed by 2 minutes on a magnet. The pellet of beads was discarded, while the supernatant was retained for the second cleanup step, which was designed to bind fragments of 250bp-700bp (i.e. the PCR products) onto beads while lower molecular weight DNA (primer dimers, residual primers) remained in solution. This step was carried out by adding enough beads and sterile water to generate a 5:4 ratio (PCR product: beads), incubating for 8 minutes, and then holding for 2 minutes on a magnet. The supernatant was discarded and the pellet of beads was washed three times with 80% ethanol before the PCR products were eluted from the beads with 36 µL of sterile water. Following cleanup, the concentration of each purified PCR product was measured on a Qubit 2.0 fluorometer using the Qubit dsDNA HS Assay Kit (Life Technologies), and all 10 samples were normalized to 1 ng/µL and mixed in equal proportions. From this mixture, the final sequencing template library was created by making a 1/300 dilution. An Ion PGM Template OT2 400 kit (Life Technologies) was used for template preparation, and sequencing was carried out on an Ion Torrent PGM following the manufacturer's instructions. Sequencing was performed on a 316 chip using an Ion PGM Sequencing 400 Kit (Life Technologies).
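Normalizing each purified product to 1 ng/µL is a C1·V1 = C2·V2 dilution, and the 1/300 dilution of the pooled library follows the same arithmetic. The sketch below uses a hypothetical Qubit reading (12.5 ng/µL) and a hypothetical 2 µL aliquot purely to show the calculation. The function name is illustrative.

```python
def water_to_add(conc_ng_per_ul: float, sample_ul: float, target_ng_per_ul: float = 1.0) -> float:
    """Water (µL) that brings `sample_ul` of DNA at `conc_ng_per_ul` down to the target.

    From C1 * V1 = C2 * V2: final volume V2 = C1 * V1 / C2, and water = V2 - V1.
    """
    if conc_ng_per_ul < target_ng_per_ul:
        raise ValueError("sample is already below the target concentration")
    final_ul = conc_ng_per_ul * sample_ul / target_ng_per_ul
    return final_ul - sample_ul

# Hypothetical reading: 12.5 ng/µL, taking 2 µL of the purified product.
print(water_to_add(12.5, 2.0))  # 23.0 µL of water, giving 25 µL at 1 ng/µL

# The equal-proportion pool (1 ng/µL) diluted 1/300 for template preparation.
pooled_pg_per_ul = 1.0 * 1000 / 300  # about 3.33 pg/µL
print(round(pooled_pg_per_ul, 2))
```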
Data Analysis
Raw data from each Ion Torrent PGM run were uploaded to the Galaxy platform for analysis (https://usegalaxy.org/) [27]. Several filters were applied to remove low quality, short, and non-target reads before an alignment was constructed to assemble the full barcode contig. Representative examples of the sequence reads recovered from HQ and LQ extracts are shown in Figure 6. The resultant FASTA file was then exported to permit comparisons with Sanger-generated sequences in BOLD. The authenticity of each NGS-generated sequence was subsequently validated by querying the sequence against the BOLD Identification Engine (www.boldsystems.org) to check for contamination or non-target amplification.
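The text does not name the Galaxy tools or cut-offs used for read filtering. The sketch below only illustrates what a length-and-quality filter does. The 100 bp minimum length and mean Phred quality of 20 are hypothetical thresholds, Phred+33 quality encoding is assumed, and the FASTQ input is assumed to be well formed.

```python
from statistics import mean
from typing import Iterable, Iterator, Tuple

def read_fastq(lines: Iterable[str]) -> Iterator[Tuple[str, str, str]]:
    """Yield (header, sequence, quality) tuples from well-formed 4-line FASTQ records."""
    it = iter(lines)
    for header in it:
        if not header.strip():
            continue  # tolerate blank lines between records
        seq = next(it).strip()
        next(it)  # the '+' separator line
        qual = next(it).strip()
        yield header.strip(), seq, qual

def passes(seq: str, qual: str, min_len: int = 100, min_mean_q: float = 20.0) -> bool:
    """True if the read is at least `min_len` bp with mean Phred+33 quality >= `min_mean_q`."""
    if len(seq) < min_len or not qual:
        return False
    return mean(ord(c) - 33 for c in qual) >= min_mean_q

records = [
    "@r1", "A" * 120, "+", "I" * 120,  # 120 bp at Q40: kept
    "@r2", "A" * 50, "+", "I" * 50,    # too short: dropped
    "@r3", "C" * 120, "+", "#" * 120,  # Q2 throughout: dropped
]
kept = [h for h, s, q in read_fastq(records) if passes(s, q)]
print(kept)
```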
Further validation was performed via Neighbor-Joining (NJ) analysis that included the NGS-generated sequences as well as sequences from recently collected specimens of the same species or close relatives. The compiled reads from each run were deposited in the Sequence Read Archive (SRA; http://www.ncbi.nlm.nih.gov/sra) under study accession SRP055961 (see Table 3 for individual sample accession numbers), while the barcode contig for each specimen was deposited in the BOLD dataset (dx.doi.org/10.5883/DS-NGSTYPES) and in GenBank (see Table 3 for accession numbers).
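The NJ validation step is named but not detailed. The following is a minimal textbook Saitou-Nei neighbor-joining implementation on uncorrected p-distances, which is an assumption. The study may have used a different distance model (the Kimura 2-parameter model is common for barcode data) and dedicated software. The usage example runs on a small illustrative distance matrix, not on data from this study.

```python
from itertools import combinations

def p_distance(a: str, b: str) -> float:
    """Proportion of differing sites between two aligned sequences of equal length,
    skipping positions where either sequence has a gap ('-') or an 'N'."""
    sites = [(x, y) for x, y in zip(a.upper(), b.upper()) if x not in "-N" and y not in "-N"]
    if not sites:
        raise ValueError("no comparable sites")
    return sum(x != y for x, y in sites) / len(sites)

def neighbor_joining(names, dist):
    """Saitou-Nei neighbor joining.

    names: taxon labels; dist: dict mapping frozenset({i, j}) -> distance.
    Returns the unrooted tree as a Newick string.
    """
    d = dict(dist)

    def D(i, j):
        return 0.0 if i == j else d[frozenset((i, j))]

    nodes = list(names)
    while len(nodes) > 3:
        n = len(nodes)
        r = {i: sum(D(i, k) for k in nodes) for i in nodes}
        # Join the pair minimising Q(i, j) = (n - 2) * d(i, j) - r_i - r_j.
        i, j = min(combinations(nodes, 2),
                   key=lambda p: (n - 2) * D(*p) - r[p[0]] - r[p[1]])
        li = 0.5 * D(i, j) + (r[i] - r[j]) / (2 * (n - 2))
        lj = D(i, j) - li
        u = f"({i}:{li:g},{j}:{lj:g})"
        # Distances from the new internal node to every remaining node.
        for k in nodes:
            if k not in (i, j):
                d[frozenset((u, k))] = 0.5 * (D(i, k) + D(j, k) - D(i, j))
        nodes = [k for k in nodes if k not in (i, j)] + [u]
    a, b, c = nodes
    la = 0.5 * (D(a, b) + D(a, c) - D(b, c))
    return f"({a}:{la:g},{b}:{D(a, b) - la:g},{c}:{D(a, c) - la:g});"

labels = "abcde"
matrix = [[0, 5, 9, 9, 8], [5, 0, 10, 10, 9], [9, 10, 0, 8, 7],
          [9, 10, 8, 0, 3], [8, 9, 7, 3, 0]]
dist = {frozenset((labels[x], labels[y])): matrix[x][y]
        for x in range(5) for y in range(x + 1, 5)}
tree = neighbor_joining(list(labels), dist)
print(tree)
```

In practice the distance dictionary would be filled with `p_distance` (or a corrected distance) between each pair of aligned barcode sequences.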
Results
Because the NGS protocol allowed the simultaneous processing of ten specimens, just three runs were required to analyze the 30 specimens. The average number of sequence reads per specimen showed five-fold variation (182K, 59K, 36K), while the average depth of coverage per base showed six-fold variation (36K, 12K, 6K) across the three DNA categories (Figs. 3a and 3b). The number of reads per specimen averaged 90K, resulting in an average coverage depth of 18K per base. Sequences were recovered from every specimen, with recovered sequences averaging 610bp, 578bp, and 458bp for the HQ, MQ and LQ extracts respectively (Fig. 3c). Barcode compliant sequences (>487bp) were recovered from 8 HQ, 8 MQ, and 4 LQ specimens (Table 3), while sequence records >400bp were recovered from 25 of 30 specimens (83%). In fact, more than 200bp of sequence data was recovered from all 30 specimens (Table 3). The recovery of sequences from ten type specimens in each of the three DNA categories is shown in Figure 3.
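The length-based summaries in this paragraph can be recomputed from the sequence lengths recovered by NGS in Table 3. The only input to the check below is a transcription of that column, in table order. The variable names are illustrative, and "barcode compliant" is applied here purely as the length threshold stated in the text.

```python
from statistics import mean

# NGS sequence lengths (bp) transcribed from Table 3, in table order.
contig_bp = {
    "HQ": [658, 658, 448, 657, 658, 570, 657, 474, 658, 658],
    "MQ": [459, 569, 570, 658, 323, 658, 658, 658, 658, 569],
    "LQ": [514, 657, 454, 237, 357, 324, 323, 634, 419, 658],
}
BARCODE_COMPLIANT_BP = 487  # length threshold used in the text (>487bp)

for group, lengths in contig_bp.items():
    compliant = sum(bp > BARCODE_COMPLIANT_BP for bp in lengths)
    print(f"{group}: mean {mean(lengths):.0f} bp, {compliant} barcode compliant")

all_bp = [bp for lengths in contig_bp.values() for bp in lengths]
over_400 = sum(bp > 400 for bp in all_bp)
print(f">400 bp: {over_400}/{len(all_bp)} ({over_400 / len(all_bp):.0%}); shortest {min(all_bp)} bp")
```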
The sequences generated by NGS from the HQ and MQ specimens perfectly matched, in their zones of overlap, the shorter sequences generated by Sanger analysis (Fig. 4).
Although the protocol involves 100 cycles of PCR amplification (60 in the first round and 40 in the second), there was no evidence of artifacts when the NGS sequences were compared to their Sanger counterparts (Fig. 4).
Further confirmation of their validity was provided by the fact that they grouped with sequences from closely allied taxa (Fig. 5). It was more difficult to verify the sequences obtained via NGS from the LQ specimens because they had no Sanger counterparts for comparison. In six cases, the NGS sequences clearly derived from a single species, but reads from the other four specimens appeared to originate from two or more species. Obvious contaminants (e.g. fungi, bacteria) were easily removed during post processing, but some sequences in these four records appeared to derive from closely allied species or pseudogenes. In principle, the contaminants and authentic sequences could be discriminated if reference sequences were available from modern specimens of these species, but they were not. Because the four specimens showing these admixtures generated the fewest sequence reads and the lowest depth of coverage, it is likely that their DNA was heavily degraded (Table 3). Once contemporary sequences for these species become available, it should be possible to recognize the authentic sequences.
To summarize, the current method works on a plurality of samples simultaneously, with high success rates for better-preserved degraded DNA and a slight drop for lower quality degraded DNA. The method still works for samples that may contain almost no intact DNA.
Even the lowest quality degraded DNA was amplified and characterized using the method of the invention, which was shown to recover >500bp sequences from samples that failed using traditional Sanger approaches. The method may be used universally on any type of degraded DNA sample for many applications, including environmental, forensic and food industry applications (cooked foods contain degraded DNA), and generally in any application where DNA is degraded due to age, environment, processing and so forth. The method can be customized for invertebrates, mammals, fish, birds and so forth. In one aspect, the method effectively amplifies and characterizes entire barcode regions for use in biological classification. This will be helpful for classification of old specimens such as those found in museums [2-5,28], as demonstrated in two recent studies [29,30].
The invention can be provided as a system in a kit containing the desired primers, buffers, enzymes, instructions for use and so forth. A kit may be customized for a particular type of specimen, such as a specimen that comprises degraded DNA.
It is to be noted that the term "a" or "an" entity refers to one or more of that entity. For example, "a characteristic" refers to one or more characteristics or at least one characteristic. As such, the terms "a" (or "an"), "one or more" and "at least one" are used interchangeably herein. It is also to be noted that the terms "comprising", "including", and "having" have been used interchangeably.
Ranges: throughout this disclosure, various aspects described herein can be presented in a range format. It should be understood that the description in range format is merely for convenience and brevity and should not be construed as an inflexible limitation on the scope described herein. Accordingly, the description of a range should be considered to have specifically disclosed all the possible subranges as well as individual numerical values within that range. For example, description of a range such as from 1 to 6 should be considered to have specifically disclosed subranges such as from 1 to 3, from 1 to 4, from 1 to 5, from 2 to 4, from 2 to 6, from 3 to 6 etc., as well as individual numbers within that range, for example, 1, 2, 2.7, 3, 4, 5, 5.3, and 6. This applies regardless of the breadth of the range.
It will be understood that any aspects described as "comprising" certain components may also "consist of" or "consist essentially of," wherein "consisting of" has a closed-ended or restrictive meaning and "consisting essentially of" means including the components specified but excluding other components except for materials present as impurities, unavoidable materials present as a result of processes used to provide the components, and components added for a purpose other than achieving the technical effect described herein. For example, a composition defined using the phrase "consisting essentially of" encompasses any known pharmaceutically acceptable additive, excipient, diluent, carrier, and the like. Typically, a composition consisting essentially of a set of components will comprise less than 5% by weight, typically less than 3% by weight, more typically less than 1% by weight of non-specified components.
It will be understood that any component defined herein as being included may be explicitly excluded from the claimed invention by way of proviso or negative limitation.
Many patent applications, patents, and publications are referred to herein to assist in understanding the aspects described. Each of these references is incorporated herein by reference in its entirety.
The foregoing examples and detailed description are offered by way of illustration and not by way of limitation. All publications and patent applications cited in this specification are herein incorporated by reference as if each individual publication or patent application were specifically and individually indicated to be incorporated by reference.
Although the foregoing invention has been described in some detail by way of illustration and example for purposes of clarity of understanding, it will be readily apparent to those of ordinary skill in the art in light of the teachings of this invention that certain changes and modifications may be made thereto without departing from the scope of the appended claims.
References
1. Hebert PDN, Cywinska A, Ball S, deWaard JR. Biological identifications through DNA barcodes. Proc R Soc Lond B Biol Sci. 2003; 270: 313-321.
2. Hebert PDN, deWaard JR, Zakharov EV, Prosser SWJ, Sones JE, McKeown JTA, et al. A DNA 'Barcode Blitz': Rapid digitization and sequencing of a natural history collection. PLoS ONE. 2013; 8: e68535. doi:10.1371/journal.pone.0068535.
3. Mutanen M, Kekkonen M, Prosser SW, Hebert PDN, Kaila L. One species in eight: DNA barcodes from type specimens resolve a taxonomic quagmire. Mol Ecol Resour. 2014; doi:10.1111/1755-0998.12361.
4. Hausmann A, Hebert PDN, Mitchell A, Rougerie R, Sommerer M, Edwards T. Revision of the Australian Oenochroma vinaria Guenee, 1858 species-complex (Lepidoptera: Geometridae, Oenochrominae): DNA barcoding reveals cryptic diversity and assesses status of type specimen without dissection. Zootaxa. 2009a; 2239: 1-21.
5. Kirchman JJ, Witt CC, McGuire JA, Graves GR. DNA from a 100-year-old holotype confirms the validity of a potentially extinct hummingbird species. Biol Lett. 2010; 6: 112-115.
6. Gilbert MTP, Moore W, Melchior L, Worobey M. DNA extraction from dry museum beetles without conferring external morphological damage. PLoS ONE. 2007; 2: e272. doi:10.1371/journal.pone.0000272.
7. Thomsen PF, Elias S, Gilbert MTP, Haile J, Munch K, Kuzmina S, et al. Non-destructive sampling of ancient insect DNA. PLoS ONE. 2009; 4: e5048. doi:10.1371/journal.pone.0005048.
8. Dean MD, Ballard JWO. Factors affecting mitochondrial DNA quality from museum preserved Drosophila simulans. Entomol Exp Appl. 2001; 98: 279-283.
9. Hernandez-Triana LM, Prosser SW, Rodriguez-Perez MA, Chaverri LG, Hebert PDN, Gregory TR. Recovery of DNA barcodes from blackfly museum specimens (Diptera: Simuliidae) using primer sets that target a variety of sequence lengths. Mol Ecol Resour. 2013; 14: 508-518. doi:10.1111/1755-0998.12208.
10. Van Houdt JKJ, Breman FC, Virgilio M, De Meyer M. Recovering full DNA barcodes from natural history collections of Tephritid fruitflies (Tephritidae, Diptera) using mini barcodes. Mol Ecol Resour. 2010; 10: 459-465.
11. Bluemel JK, King RA, Virant-Doberlet M, Symondson WOC. Primers for identification of type and other archived specimens of Aphrodes leafhoppers (Hemiptera, Cicadellidae). Mol Ecol Resour. 2011; 11: 770-774.
12. Hausmann A, Sommerer M, Rougerie R, Hebert P. Hypobapta tachyhalotaria n. sp. from Tasmania – an example of a new species revealed by DNA barcoding (Lepidoptera, Geometridae). Spixiana. 2009b; 32: 161-166.
13. Lees DC, Lack HW, Rougerie R, Hernandez-Lopez A, Raus T, Avtzis ND, et al. Tracking origins of invasive herbivores using herbaria and archival DNA: the case of the horse-chestnut leafminer. Front Ecol Environ. 2011; 9: 322-328.
14. Rougerie R, Naumann S, Nassig WA. Morphology and molecules reveal unexpected cryptic diversity in the enigmatic genus Sinobirma Bryk, 1944 (Lepidoptera: Saturniidae). PLoS ONE. 2012; 7: e43920. doi:10.1371/journal.pone.0043920.
15. Lees DC, Rougerie R, Zeller-Lukashort C, Kristensen NP. DNA mini-barcodes in taxonomic assignment: a morphologically unique new homoneurous moth clade from the Indian Himalayas described in Micropterix (Lepidoptera, Micropterigidae). Zool Scr. 2010; 39: 642-661.
16. Strutzenberger P, Brehm G, Fiedler K. DNA barcode sequencing from old type specimens as a tool in taxonomy: A case study in the diverse genus Eois (Lepidoptera: Geometridae). PLoS ONE. 2012; 7: e49710.
17. Zimmermann J, Hajibabaei M, Blackburn DC, Hanken J, Cantin E, Posfai J, et al. DNA damage in preserved specimens and tissue samples: a molecular assessment. Front Zool. 2008; 5: 18.
18. Allentoft ME, Collins M, Harker D, Haile J, Oskam CL, Hale ML, et al. The half-life of DNA in bone: measuring decay kinetics in 158 dated fossils. Proc Biol Sci. 2012; 279: 4724-4733.
19. Rowe KC, Singhal S, Macmanes MD, Ayroles JF, Morelli TL, Rubidge EM, et al. Museum genomics: low-cost and high-accuracy genetic data from historical specimens. Mol Ecol Resour. 2011; 11: 1082-1092. doi:10.1111/j.1755-0998.2011.03052.x.
20. Shokralla S, Gibson JF, Nikbakht H, Janzen DH, Hallwachs W, Hajibabaei M. Next-generation DNA barcoding: using next-generation sequencing to enhance and accelerate DNA barcode capture from single specimens. Mol Ecol Resour. 2014; 14: 892-901.
21. Shokralla S, Porter T, Gibson J, Dobosz R, Janzen DH, et al. Massively parallel multiplex DNA sequencing for specimen identification using an Illumina MiSeq platform. Sci Rep. 2015; 5: 9687.
22. Holloway JD, Miller SE, Pollock DM, Helgen L, Darrow K. GONGED (Geometridae of New Guinea Electronic Database): a progress report on development of an online facility of images. Spixiana. 2009; 32: 122-123.
23. Miller SE. DNA barcode enabled ecological research on Geometridae in Papua New Guinea. Spixiana. 2014; 37: 245-246.
24. Knolke S, Erlacher S, Hausmann A, Miller MA, Segerer AH. A procedure for combined genitalia dissection and DNA extraction in Lepidoptera. Insect Syst Evol. 2005; 35: 401-409.
25. Ivanova NV, deWaard JR, Hebert PDN. An inexpensive, automation-friendly protocol for recovering high-quality DNA. Mol Ecol Notes. 2006; 6: 998-1002.
26. deWaard JR, Ivanova NV, Hajibabaei M, Hebert PDN. Assembling DNA barcodes. Methods Mol Biol. 2008; 410: 275-293.
27. Blankenberg D, Von Kuster G, Coraor N, Ananda G, Lazarus R, Mangan M, et al. Galaxy: a web-based genome analysis tool for experimentalists. Curr Protoc Mol Biol. 2010; Chapter 19: Unit 19.10.1-21.
Sanger Sequencing
Since DNA quality varies greatly, even among specimens of similar age [2,8], each DNA extract was initially assessed by Sanger analysis. This involved an attempt to amplify both 164bp (C_microLepF1_t1 + C_TypeR1) and 94bp (C_TypeF1 + C_TypeR1) regions of the COI barcode [2,9]. PCR amplification and cycle sequencing employed standard CCDB protocols [2,25,26], with amplicons bidirectionally sequenced on an ABI 3730XL (Applied Biosystems). All traces were edited using CodonCode v. 4.2.7 (CodonCode Corporation), and the resulting 164bp and 94bp sequences were validated by comparison with sequences from conspecific individuals or, when these were unavailable, by Neighbor-Joining (NJ) analysis to ensure that each sequence branched as expected. These tests for sequence recovery permitted the assignment of DNA from each specimen to one of three categories: 1) High Quality (HQ): those that generated a 164bp sequence; 2) Medium Quality (MQ): those that generated a 94bp sequence; and 3) Low Quality (LQ): those that failed to generate any sequence. The present study examined ten specimens from each category with the goal of developing an NGS protocol effective across varying levels of DNA degradation. The specimens (Table 3) selected for analysis included a single representative from each of 30 genera in the family Geometridae, all more than a century old (mean age = 111 years). Sequences, electropherograms and primer details for the specimens are on BOLD (dx.doi.org/10.5883/DS-NGSTYPES) and GenBank (see Table 3 for accession numbers).
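The three-way triage is a simple decision rule. A specimen is HQ if the 164bp amplicon yields a sequence, MQ if only the 94bp amplicon does, and LQ otherwise. The sketch below encodes that rule. Representing a failed amplicon as `None` or an empty string is an illustrative convention, not something specified in the text.

```python
from enum import Enum
from typing import Optional

class Quality(Enum):
    HQ = "High Quality"
    MQ = "Medium Quality"
    LQ = "Low Quality"

def classify(seq_164: Optional[str], seq_94: Optional[str]) -> Quality:
    """Assign a DNA quality category from the Sanger screening outcome.

    None (or an empty string) means that amplicon failed to yield a sequence.
    The 164bp result is checked first, since it is the more demanding test.
    """
    if seq_164:
        return Quality.HQ
    if seq_94:
        return Quality.MQ
    return Quality.LQ

print(classify(None, "A" * 94))
```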
Table 3. Type specimens analyzed, including sequencing results and accession numbers.
Process ID No. (Sanger/NGS) | Identification | Age (Yrs) | Status | Sanger Group | NGS Reads | Min. Cov. | Max. Cov. | Avg. Cov. | Sequence Recovered by NGS (bp) | Sanger GenBank Acc. | NGS Contig GenBank Acc. | Sequence Read Archive Acc.
Myrioblephara / PNGTY1837- 104 Syntype HQ 143804 7 115751 29924 658 pending pending SRR1867808 mixticolor I Cassephyra P
112 Holotype HQ 213007 72 146189 42992 658 pending pending SRR1867811 PNGTY1827- plenimargo o N) .
, I Psilalcis 109 Syntype HQ 106286 0 74012 20477 448 pending pending SRR1867812 ,I, PNGTY1843- auropurpurea , , , I Paralcidia 110 Syntype HQ 221885 5 168474 44541 657 pending pending SRR1867813 PNGTY1839- marginata Iv I Atmoceras n 110 Syntype HQ 143340 30 76376 28215 658 pending pending SRR1867814 1-3 PNGTY1823- plumosa n tµ...) 'a tA
o o --.) o 14 / Tripteridia 110 viridisecta Syntype HQ 188107 1 101191 37855 570 pending pending SRR1867815 14 / Gymnoscelis 111 PNGTY1834- ochriplaga Holotype HQ 186897 1 103399 38169 657 pending pending SRR1867816 14 / Axinoptera 110 fiata Holotype HQ 166116 0 83389 31838 474 pending pending SRR1867817 asc 14 / Calluga 112 semirasata Holotype HQ 215946 1 154302 43408 658 pending pending SRR1867818 123 Eois semirubra Holotype HQ 232024 106 143908 46803 658 pending pending SRR1867819 / PNGTY1831- 112 Collix ghoshaN/A MQ 11665 0 11082 2142 459 pending pending SRR1945335 dichobathra / PNGTY1838- 111 PapuarismeHolotype MQ 6479 0 5747 1165 569 pending pending SRR1945382 brunneata n.) o 1--, / PNGTY1835- 109 HyposidraN/A MQ 62208 0 45441 12301 570 pending pending SRR1945383 apiciftdva un / PNGTY1836- 101 Milionia know/el Syntype MQ 44190 1 43301 9153 658 pending pending SRR1945384 PNGTY587-13 Ctimene / PNGTY1846- 118 basistraga Syntype MQ 546 0 105 31 323 pending pending SRR1946575 15 obsoleta PNGTY639-13 Psendensemia P
/ PNGTY1842- 102 bursadoides Syntype MQ 134542 37 82031 27382 658 pending pending SRR1945385 , .
15 dignitosa co Lo N) u, PNGTY917-13 0"
Pingasa nob//is / PNGTY1840- 108 Holotype MQ 46793 6 24276 9516 658 pending pending SRR1945386 furvifrons , , , Aeolochroma / PNGTY1821- 121 Holotype MQ 68837 3 43442 14002 658 pending pending SRR1945387 caesia Sarcinodes / PNGTY1844- 106 Holotype MQ 99655 86 42405 20379 658 pending pending SRR1945388 subvirgata Iv n Celerena /erne n / PNGTY1828- 105 Holotype MQ 113363 1 42333 21657 569 pending -- pending -- SRR1945389 tµ...) amplimargo 'a un o o o C
n.) o / PNGTY1832- 104 Dyscheralcis Syntype LQ 2681 0 1278 424 514 N/A pending SRR1867935 retroflexa un 110 Alcis irrufata Holotype LQ 49944 2 37401 8881 657 N/A pending SRR1867936 14 / Cleora repetita Syntype LQ 7632 1 5708 731 454 N/A pending SRR1867937 PNGTY1830- suffusa P
PNGTY008- Spectrobasis .
N) 109 Syntype LQ 1468 0 1157 280 237 N/A N/A SRR1867938 u, 12* / N/A difjerens t.,..) N) .
PNGTY073- Desmoclysha 104 Holotype LQ 320 0 116 40 357 , 12* / N/A umpuncta , , PNGTY102- Sterrhochaeta 120 Syntype LQ 3081 0 215 91 324 12* / N/A minuta PNGTY120- Propithex 118 Holotype LQ 14863 0 1554 163 323 12* / N/A alternata Iv / PNGTY1825- 105 Bursadopsis Syntype LQ 133263 4 71444 20742 634 N/A pending SRR1867942 n plenifascia n t...., 'a u, =
14 / Chloroclystis 117 Syntype LQ 4411 0 1685 647 419 N/A pending SRR1867943 PNGTY1829- rufofasciata Polyacme 112 straminea Holotype LQ 141402 23 71197 28117 658 N/A pending SRR1867944 brunneata
The four Process IDs marked with an asterisk (*) represent specimens where NGS analysis generated sequence reads from multiple species. HQ, high quality; MQ, medium quality; LQ, low quality; N/A, not applicable.
Next-generation Sequencing
DNA degradation often limits PCR amplicons to <200bp in specimens that are more than 50 years old [18], precluding efforts to recover the entire barcode region with one or two primer sets. As a consequence, primer sets were designed to amplify fragments ranging in length from 120bp to 148bp with enough overlap to permit recovery of the 658bp barcode region. These primers needed to be tailed with adapter sequences for analysis on an Ion Torrent PGM (Life Technologies) and with multiplex identifier (MID) tags to distinguish sequence reads from each specimen. Ten sets of six MID-tagged forward primers, used together with a common set of six MID-tagged reverse primers, were employed to analyze ten type specimens per NGS run (Table 4).
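The design constraint described here is that six short fragments (120bp to 148bp each) must overlap end to end so that together they cover the whole 658bp barcode. The sketch below is a small coverage checker for that constraint. The fragment coordinates are hypothetical: the actual primer positions are shown only graphically in Fig. 2. Coordinates are 0-based and half-open.

```python
def check_tiling(fragments, region_len=658, min_overlap=1):
    """True if (start, end) fragments, 0-based half-open, cover [0, region_len)
    and every neighbouring pair overlaps by at least `min_overlap` bp."""
    frags = sorted(fragments)
    if frags[0][0] > 0 or frags[-1][1] < region_len:
        return False
    for (_, end_prev), (start_next, _) in zip(frags, frags[1:]):
        if end_prev - start_next < min_overlap:
            return False
    return True

# Hypothetical fragment coordinates, for illustration only.
FRAGMENTS = [(0, 140), (115, 255), (230, 375), (350, 490), (465, 600), (520, 658)]

print(check_tiling(FRAGMENTS), all(120 <= e - s <= 148 for s, e in FRAGMENTS))
```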
Table 4. Primers used in the first (PCR1) and second (PCR2) reactions to allow the analysis of 10 specimens in an Ion Torrent PGM run.
PCR | Code | Primer name | Sequence (5'-3') | MID | Adapter
PCR1 | F1 | LepF1-Sanger-ion1 | ATTCAACCAATCATAAAGATATTGG | None | None
PCR1 | F2 | AncientLepF2-Sanger-ion1 | ATTRRWRATGATCAARTWTATAAT | None | None
PCR1 | F3 | AncientLepF3-Sanger-ion1 | TTATAATTGGDGGRT | None | None
PCR1 | F4 | AncientLepF4-Sanger-ion1 | AGWAGWATWRTWRAWAVWGG | None | None
PCR1 | F5 | AncientLepF5-Sanger-ion1 | ATTTTTWSWCTWCATWTDGCWGG | None | None
PCR1 | F6 | AncientLepF6-Sanger-ion1 | TATTTGTWTGAKCWRTWKKWATTAC | None | None
PCR1 | R1 | AncientLepR1-Sanger-ion1 | WGGTATWACTATRAARAAAATTAT | None | None
PCR1 | R2 | AncientLepR2-Sanger-ion2 | TCARAAWCTWATRTTRTTTADWCG | None | None
PCR1 | R3 | AncientLepR3-Sanger-ion3 | ARDGGDGGRTAWACWGTTCAWCC | None | None
PCR1 | R4 | AncientLepR4-Sanger-ion4 | GTWGWAATRAARTTDATWGCWCC | None | None
PCR1 | R5 | AncientLepR5-Sanger-ion5 | GTTARWARTATDGTRATDGCWCC | None | None
PCR1 | R6 | LepR1-Sanger-ion6 | TAAACTTCTGGATGTCCAAAAAATCA | None | None
PCR2 | F1 | LepF1-ion1 | CCATCTCATCCCTGCGTGTCTCCGA CTCAG ATTCAACCAATCATAAAGATATTGG | IonXpress 1 | A
PCR2 | F2 | AncientLepF2-ion1 | CCATCTCATCCCTGCGTGTCTCCGA CTCAG ATTRRWRATGATCAARTWTATAAT | IonXpress 1 | A
PCR2 | F3 | AncientLepF3-ion1 | CCATCTCATCCCTGCGTGTCTCCGA CTCAG TTATAATTGGDGGRTTTGGWAATTG | IonXpress 1 | A
PCR2 | F4 | AncientLepF4-ion1 | CCATCTCATCCCTGCGTGTCTCCGA CTCAG AGWAGWATWRTWRAWAVWGG | IonXpress 1 | A
PCR2 | F5 | AncientLepF5-ion1 | CCATCTCATCCCTGCGTGTCTCCGA CTCAG ATTTTTWSWCTWCATWTDGCWGG | IonXpress 1 | A
PCR2 | F6 | AncientLepF6-ion1 | CCATCTCATCCCTGCGTGTCTCCGA CTCAG TATTTGTWTGAKCWRTWKKWATTAC | IonXpress 1 | A
PCR2 | F1 | LepF1-ion2 | CCATCTCATCCCTGCGTGTCTCCGA CTCAG ATTCAACCAATCATAAAGATATTGG | IonXpress 2 | A
PCR2 | F2 | AncientLepF2-ion2 | CCATCTCATCCCTGCGTGTCTCCGA CTCAG ATTRRWRATGATCAARTWTATAAT | IonXpress 2 | A
PCR2 | F3 | AncientLepF3-ion2 | CCATCTCATCCCTGCGTGTCTCCGA CTCAG TTATAATTGGDGGRTTTGGWAATTG | IonXpress 2 | A
PCR2 | F4 | AncientLepF4-ion2 | CCATCTCATCCCTGCGTGTCTCCGA CTCAG AGWAGWATWRTWRAWAVWGG | IonXpress 2 | A
PCR2 | F5 | AncientLepF5-ion2 | CCATCTCATCCCTGCGTGTCTCCGA CTCAG ATTTTTWSWCTWCATWTDGCWGG | IonXpress 2 | A
PCR2 | F6 | AncientLepF6-ion2 | CCATCTCATCCCTGCGTGTCTCCGA CTCAG TATTTGTWTGAKCWRTWKKWATTAC | IonXpress 2 | A
PCR2 | F1 | LepF1-ion3 | CCATCTCATCCCTGCGTGTCTCCGA CTCAG ATTCAACCAATCATAAAGATATTGG | IonXpress 3 | A
PCR2 | F2 | AncientLepF2-ion3 | CCATCTCATCCCTGCGTGTCTCCGA CTCAG ATTRRWRATGATCAARTWTATAAT | IonXpress 3 | A
PCR2 | F3 | AncientLepF3-ion3 | CCATCTCATCCCTGCGTGTCTCCGA CTCAG TTATAATTGGDGGRTTTGGWAATTG | IonXpress 3 | A
PCR2 | F4 | AncientLepF4-ion3 | CCATCTCATCCCTGCGTGTCTCCGA CTCAG AGWAGWATWRTWRAWAVWGG | IonXpress 3 | A
PCR2 | F5 | AncientLepF5-ion3 | CCATCTCATCCCTGCGTGTCTCCGA CTCAG ATTTTTWSWCTWCATWTDGCWGG | IonXpress 3 | A
PCR2 | F6 | AncientLepF6-ion3 | CCATCTCATCCCTGCGTGTCTCCGA CTCAG TATTTGTWTGAKCWRTWKKWATTAC | IonXpress 3 | A
PCR2 | F1 | LepF1-ion4 | CCATCTCATCCCTGCGTGTCTCCGA CTCAG ATTCAACCAATCATAAAGATATTGG | IonXpress 4 | A
PCR2 | F2 | AncientLepF2-ion4 | CCATCTCATCCCTGCGTGTCTCCGA CTCAG ATTRRWRATGATCAARTWTATAAT | IonXpress 4 | A
PCR2 | F3 | AncientLepF3-ion4 | CCATCTCATCCCTGCGTGTCTCCGA CTCAG TTATAATTGGDGGRTTTGGWAATTG | IonXpress 4 | A
PCR2 | F4 | AncientLepF4-ion4 | CCATCTCATCCCTGCGTGTCTCCGA CTCAG AGWAGWATWRTWRAWAVWGG | IonXpress 4 | A
PCR2 | F5 | AncientLepF5-ion4 | CCATCTCATCCCTGCGTGTCTCCGA CTCAG ATTTTTWSWCTWCATWTDGCWGG | IonXpress 4 | A
PCR2 | F6 | AncientLepF6-ion4 | CCATCTCATCCCTGCGTGTCTCCGA CTCAG TATTTGTWTGAKCWRTWKKWATTAC | IonXpress 4 | A
PCR2 | F1 | LepF1-ion5 | CCATCTCATCCCTGCGTGTCTCCGA CTCAG ATTCAACCAATCATAAAGATATTGG | IonXpress 5 | A
PCR2 | F2 | AncientLepF2-ion5 | CCATCTCATCCCTGCGTGTCTCCGA CTCAG ATTRRWRATGATCAARTWTATAAT | IonXpress 5 | A
PCR2 | F3 | AncientLepF3-ion5 | CCATCTCATCCCTGCGTGTCTCCGA CTCAG TTATAATTGGDGGRTTTGGWAATTG | IonXpress 5 | A
PCR2 | F4 | AncientLepF4-ion5 | CCATCTCATCCCTGCGTGTCTCCGA CTCAG AGWAGWATWRTWRAWAVWGG | IonXpress 5 | A
PCR2 | F5 | AncientLepF5-ion5 | CCATCTCATCCCTGCGTGTCTCCGA CTCAG ATTTTTWSWCTWCATWTDGCWGG | IonXpress 5 | A
PCR2 | F6 | AncientLepF6-ion5 | CCATCTCATCCCTGCGTGTCTCCGA CTCAG TATTTGTWTGAKCWRTWKKWATTAC | IonXpress 5 | A
PCR2 | F1 | LepF1-ion6 | CCATCTCATCCCTGCGTGTCTCCGA CTCAG ATTCAACCAATCATAAAGATATTGG | IonXpress 6 | A
PCR2 | F2 | AncientLepF2-ion6 | CCATCTCATCCCTGCGTGTCTCCGA CTCAG ATTRRWRATGATCAARTWTATAAT | IonXpress 6 | A
PCR2 | F3 | AncientLepF3-ion6 | CCATCTCATCCCTGCGTGTCTCCGA CTCAG TTATAATTGGDGGRTTTGGWAATTG | IonXpress 6 | A
PCR2 | F4 | AncientLepF4-ion6 | CCATCTCATCCCTGCGTGTCTCCGA CTCAG AGWAGWATWRTWRAWAVWGG | IonXpress 6 | A
PCR2 | F5 | AncientLepF5-ion6 | CCATCTCATCCCTGCGTGTCTCCGA CTCAG ATTTTTWSWCTWCATWTDGCWGG | IonXpress 6 | A
PCR2 | F6 | AncientLepF6-ion6 | CCATCTCATCCCTGCGTGTCTCCGA CTCAG TATTTGTWTGAKCWRTWKKWATTAC | IonXpress 6 | A
PCR2 | F1 | LepF1-ion7 | CCATCTCATCCCTGCGTGTCTCCGA CTCAG ATTCAACCAATCATAAAGATATTGG | IonXpress 7 | A
PCR2 | F2 | AncientLepF2-ion7 | CCATCTCATCCCTGCGTGTCTCCGA CTCAG ATTRRWRATGATCAARTWTATAAT | IonXpress 7 | A
PCR2 | F3 | AncientLepF3-ion7 | CCATCTCATCCCTGCGTGTCTCCGA CTCAG TTATAATTGGDGGRTTTGGWAATTG | IonXpress 7 | A
PCR2 | F4 | AncientLepF4-ion7 | CCATCTCATCCCTGCGTGTCTCCGA CTCAG AGWAGWATWRTWRAWAVWGG | IonXpress 7 | A
PCR2 | F5 | AncientLepF5-ion7 | CCATCTCATCCCTGCGTGTCTCCGA CTCAG ATTTTTWSWCTWCATWTDGCWGG | IonXpress 7 | A
PCR2 | F6 | AncientLepF6-ion7 | CCATCTCATCCCTGCGTGTCTCCGA CTCAG TATTTGTWTGAKCWRTWKKWATTAC | IonXpress 7 | A
PCR2 | F1 | LepF1-ion8 | CCATCTCATCCCTGCGTGTCTCCGA CTCAG ATTCAACCAATCATAAAGATATTGG | IonXpress 8 | A
PCR2 | F2 | AncientLepF2-ion8 | CCATCTCATCCCTGCGTGTCTCCGA CTCAG ATTRRWRATGATCAARTWTATAAT | IonXpress 8 | A
PCR2 | F3 | AncientLepF3-ion8 | CCATCTCATCCCTGCGTGTCTCCGA CTCAG TTATAATTGGDGGRTTTGGWAATTG | IonXpress 8 | A
PCR2 | F4 | AncientLepF4-ion8 | CCATCTCATCCCTGCGTGTCTCCGA CTCAG AGWAGWATWRTWRAWAVWGG | IonXpress 8 | A
PCR2 | F5 | AncientLepF5-ion8 | CCATCTCATCCCTGCGTGTCTCCGA CTCAG ATTTTTWSWCTWCATWTDGCWGG | IonXpress 8 | A
PCR2 | F6 | AncientLepF6-ion8 | CCATCTCATCCCTGCGTGTCTCCGA CTCAG TATTTGTWTGAKCWRTWKKWATTAC | IonXpress 8 | A
PCR2 | F1 | LepF1-ion9 | CCATCTCATCCCTGCGTGTCTCCGA CTCAG ATTCAACCAATCATAAAGATATTGG | IonXpress 9 | A
PCR2 | F2 | AncientLepF2-ion9 | CCATCTCATCCCTGCGTGTCTCCGA CTCAG ATTRRWRATGATCAARTWTATAAT | IonXpress 9 | A
PCR2 | F3 | AncientLepF3-ion9 | CCATCTCATCCCTGCGTGTCTCCGA CTCAG TTATAATTGGDGGRTTTGGWAATTG | IonXpress 9 | A
PCR2 | F4 | AncientLepF4-ion9 | CCATCTCATCCCTGCGTGTCTCCGA CTCAG AGWAGWATWRTWRAWAVWGG | IonXpress 9 | A
PCR2 | F5 | AncientLepF5-ion9 | CCATCTCATCCCTGCGTGTCTCCGA CTCAG ATTTTTWSWCTWCATWTDGCWGG | IonXpress 9 | A
PCR2 | F6 | AncientLepF6-ion9 | CCATCTCATCCCTGCGTGTCTCCGA CTCAG TATTTGTWTGAKCWRTWKKWATTAC | IonXpress 9 | A
PCR2 | F1 | LepF1-ion10 | CCATCTCATCCCTGCGTGTCTCCGA CTCAG ATTCAACCAATCATAAAGATATTGG | IonXpress 10 | A
PCR2 | F2 | AncientLepF2-ion10 | CCATCTCATCCCTGCGTGTCTCCGA CTCAG ATTRRWRATGATCAARTWTATAAT | IonXpress 10 | A
PCR2 | F3 | AncientLepF3-ion10 | CCATCTCATCCCTGCGTGTCTCCGA CTCAG TTATAATTGGDGGRTTTGGWAATTG | IonXpress 10 | A
PCR2 | F4 | AncientLepF4-ion10 | CCATCTCATCCCTGCGTGTCTCCGA CTCAG AGWAGWATWRTWRAWAVWGG | IonXpress 10 | A
PCR2 | F5 | AncientLepF5-ion10 | CCATCTCATCCCTGCGTGTCTCCGA CTCAG ATTTTTWSWCTWCATWTDGCWGG | IonXpress 10 | A
PCR2 | F6 | AncientLepF6-ion10 | CCATCTCATCCCTGCGTGTCTCCGA CTCAG TATTTGTWTGAKCWRTWKKWATTAC | IonXpress 10 | A
PCR2 | R1 | AncientLepR1-ion1-trP1 | CCTCTCTATGGGCAGTCGGTGAT WGGTATWACTATRAARAAAATTAT | IonXpress 1 | trP1
PCR2 | R2 | AncientLepR2-ion2-trP1 | CCTCTCTATGGGCAGTCGGTGAT TCARAAWCTWATRTTRTTTADWCG | IonXpress 2 | trP1
PCR2 | R3 | AncientLepR3-ion3-trP1 | CCTCTCTATGGGCAGTCGGTGAT ARDGGDGGRTAWACWGTTCAWCC | IonXpress 3 | trP1
PCR2 | R4 | AncientLepR4-ion4-trP1 | CCTCTCTATGGGCAGTCGGTGAT GTWGWAATRAARTTDATWGCWCC | IonXpress 4 | trP1
PCR2 | R5 | AncientLepR5-ion5-trP1 | CCTCTCTATGGGCAGTCGGTGAT GTTARWARTATDGTRATDGCWCC | IonXpress 5 | trP1
PCR2 | R6 | LepR1-ion6-trP1 | CCTCTCTATGGGCAGTCGGTGAT TAAACTTCTGGATGTCCAAAAAATCA | IonXpress 6 | trP1
The "Code" column refers to primer labels in Fig. 2. The COI binding region within each primer sequence is shown in black, while the 10bp tail (PCR1) or MID tag (PCR2) is shown in blue.
The "key sequence" (required for Ion Torrent sequencing) is shown in green and the sequencing adapters are shown in red. The 10bp tails on the PCR1 primers are technically IonXpress MID tags, but they serve only to block short amplicons from acting as primers during PCR1. They were chosen over random decamer tails to maximize primer-template matching in PCR2. The same forward and reverse PCR1 primers are used for all ten samples in the first round of PCR. In the second round of PCR, samples are discriminated by using ten different sets of MID-tagged forward PCR2 primers (the same set of PCR2 reverse primers is used for all ten samples).
Optimization of NGS Protocols
Optimization studies tested the impact of varied primer combinations, number of PCR cycles, differential primer concentrations, and nesting of PCRs. Efforts to multiplex all six forward and six reverse primers in a single reaction were unsuccessful because the short regions of overlap between adjacent fragments were preferentially amplified over the six target fragments. Splitting the PCR into two reactions, each targeting non-adjacent fragments (e.g. PCR1 = F1+R1, F3+R3, F5+R5; PCR2 = F2+R2, F4+R4, F6+R6), solved this issue but revealed another problem: the dominance of certain amplicons. This problem was overcome by mixing the forward primers with the full complement of reverse primers (e.g. PCR1 = F1+F3+F5 + six reverse primers; PCR2 = F2+F4+F6 + five reverse primers). This allowed each forward primer to pair with several downstream reverse primers, creating redundancy that improved sequence recovery while reducing the dominance of any particular amplicon. For example, depending upon the quality of the template DNA, the barcode segment amplified by primers F4+R4 could be amplified by any of the twelve combinations of F1, F2, F3 or F4 paired with R4, R5 or R6. This redundancy aided the recovery of full-length barcodes from specimens with varied degrees of DNA degradation or with particular primer mismatches (as evidenced by the lack of a certain product in the final sequence array).
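The redundancy argument can be checked by enumeration. The Python sketch below assumes, as a simplification, that F_i sits at the upstream end of tile i and R_j at the downstream end of tile j, so that F_i+R_j amplifies tiles i through j whenever i <= j. The optional max_tiles parameter is a hypothetical stand-in for template degradation: it caps how many consecutive tiles a surviving DNA fragment can span.

```python
from itertools import product
from typing import List, Optional, Tuple

N_TILES = 6  # six forward (F1-F6) and six reverse (R1-R6) primers


def pairs_covering(segment: int, max_tiles: Optional[int] = None) -> List[Tuple[str, str]]:
    """List primer pairs whose amplicon includes tile `segment` (1-based).

    Assumes F_i + R_j amplifies tiles i..j when i <= j; `max_tiles` optionally
    limits amplicon length to mimic degraded template (a hypothetical parameter).
    """
    pairs = []
    for i, j in product(range(1, N_TILES + 1), repeat=2):
        spans_segment = i <= segment <= j
        short_enough = max_tiles is None or (j - i + 1) <= max_tiles
        if spans_segment and short_enough:
            pairs.append((f"F{i}", f"R{j}"))
    return pairs


print(len(pairs_covering(4)))          # every pair whose product includes the F4+R4 tile
print(pairs_covering(4, max_tiles=2))  # pairs left when only short templates survive
```

Without a length cap the count matches the twelve combinations listed in the text; with a two-tile cap only three pairs remain, which illustrates why redundancy matters most for heavily degraded extracts.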
When DNA quality is poor, primer binding becomes increasingly important to "kick start" amplification [26]. Perfect primer binding is impossible when diverse taxa are analyzed, but the prospects for recovering the desired amplicons can be improved by raising the number of PCR cycles and by increasing primer degeneracy. Both tactics were employed in the present NGS protocol.
Two rounds of PCR were employed, with 60 cycles in the first and 40 cycles in the second.
All forward and reverse primers included degeneracy at the sites most important for primer binding (3' terminus for F, 5' terminus for R). Considering this degeneracy, the 12 forward and reverse primers were actually a cocktail of 2010 primers. Other factors also had important impacts on final outcomes. For example, initial tests revealed that primers with the 33bp-40bp adapter/MID tails required for NGS were less effective in generating product than the same primers without tails, a difference that was particularly strong for LQ extracts. This difference was probably due to secondary structures formed by the tailed primers, which interfered with primer binding. Although untailed primers produced the highest amplification success, their use allowed short, non-target amplicons to act as primers, generating chimeric amplicons that combined sequence information from the primers and the specimen.
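How degeneracy inflates the number of distinct oligonucleotides can be computed directly from the IUPAC codes in Table 4. This Python sketch counts and enumerates the concrete sequences encoded by a degenerate primer. The total for a complete primer set depends on exactly which sequences are counted, so the sketch is not offered as a check of the figure of 2010 quoted above.

```python
from itertools import product
from math import prod
from typing import List

# IUPAC nucleotide codes mapped to the bases each one represents.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def degeneracy(primer: str) -> int:
    """Number of distinct oligonucleotides encoded by a degenerate primer."""
    return prod(len(IUPAC[base]) for base in primer.upper())


def expand(primer: str) -> List[str]:
    """Enumerate every concrete sequence represented by a degenerate primer."""
    choices = [IUPAC[base] for base in primer.upper()]
    return ["".join(combo) for combo in product(*choices)]


# COI binding regions taken from Table 4.
lep_f1 = "ATTCAACCAATCATAAAGATATTGG"         # LepF1: no degenerate sites
ancient_lep_r1 = "WGGTATWACTATRAARAAAATTAT"  # AncientLepR1: two W and two R sites
print(degeneracy(lep_f1), degeneracy(ancient_lep_r1), len(expand(ancient_lep_r1)))
```

Each twofold code (R, Y, S, W, K, M) doubles the count and each threefold code (B, D, H, V) triples it, which is why a few degenerate positions near the binding terminus quickly multiply the size of the cocktail.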
To overcome this problem, 10bp tails lacking complementarity to any region in the target genomes were added to the 5' terminus of all primers. These tails inhibited polymerase elongation when short amplicons or primer dimers attempted to act as primers, preventing the formation of chimeric amplicons while avoiding the secondary structure issues associated with longer tails. Although the first round of PCR was effective in generating amplicons, a second round of PCR was used to introduce the adapter-tailed primers for sequence analysis. This second round likely had the additional benefit of reducing amplification bias: because it involved six separate reactions, one for each forward primer, competition among primers was limited.
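One simple way to screen a candidate tail for unwanted complementarity is to look for k-mers shared between the tail (on both strands) and the target sequence. The Python sketch below is an illustrative assumption, not the screening procedure used by the authors: the k-mer size of 6 and the example tails are hypothetical, the short target stands in for a full reference genome, and degenerate bases in the target are not handled.

```python
from typing import Set

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an unambiguous DNA sequence."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def shared_kmers(tail: str, target: str, k: int = 6) -> Set[str]:
    """k-mers from the tail or its reverse complement that also occur in the target."""
    target = target.upper()
    hits = set()
    for strand in (tail.upper(), reverse_complement(tail)):
        for i in range(len(strand) - k + 1):
            kmer = strand[i:i + k]
            if kmer in target:
                hits.add(kmer)
    return hits


# Target: the LepF1 binding site from Table 4 (a stand-in for a full reference sequence).
target = "ATTCAACCAATCATAAAGATATTGG"
print(shared_kmers("GCGCGCGCGC", target))  # hypothetical tail with no shared k-mer
print(shared_kmers("CCAATATCTT", target))  # hypothetical tail matching on the reverse strand
```

A tail that returns an empty set at the chosen k is a candidate; in practice the screen would be run against every reference sequence expected in the extract.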
Final NGS Protocol
These experimental studies led to the development of a two-stage, nested, multiplex PCR protocol which produced sequence records spanning the barcode region. The first round of PCR included two reactions for each specimen (PCR 1.1 and PCR 1.2 in Fig. 2a), each consuming 2 µL of genomic DNA as template. Each reaction included three forward primers (F1+F3+F5 or F2+F4+F6) with six and five reverse primers respectively, allowing each forward primer to generate 1-6 amplicons, depending on the quality of the DNA and the primer's binding position relative to the reverse primers. Detailed reaction components (final volume = 12.5 µL) are provided in Table 5. Thermocycling consisted of 94 °C for 2 minutes, 60 cycles of {94 °C for 40 seconds, 48 °C for 40 seconds, 72 °C for 30 seconds}, and a final extension at 72 °C for 5 minutes.
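The thermocycling programs can be written down as data, which makes the cycle counts and total hold times explicit. This is a minimal Python sketch: the class names are hypothetical, the totals ignore ramp times (which depend on the instrument), and the secondary program uses the same steps with 40 cycles, as stated earlier in this section.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Step:
    temp_c: float
    seconds: int


@dataclass(frozen=True)
class Program:
    initial: Step
    cycle: Tuple[Step, ...]
    cycles: int
    final: Step

    def hold_seconds(self) -> int:
        """Total time spent at temperature, excluding ramping."""
        per_cycle = sum(step.seconds for step in self.cycle)
        return self.initial.seconds + self.cycles * per_cycle + self.final.seconds


# Denaturation, annealing and extension steps shared by both rounds.
CYCLE = (Step(94, 40), Step(48, 40), Step(72, 30))
PCR1 = Program(initial=Step(94, 120), cycle=CYCLE, cycles=60, final=Step(72, 300))
PCR2 = Program(initial=Step(94, 120), cycle=CYCLE, cycles=40, final=Step(72, 300))

for name, program in (("PCR1", PCR1), ("PCR2", PCR2)):
    print(name, program.cycles, "cycles,", program.hold_seconds() / 60, "min at temperature")
print("total cycles:", PCR1.cycles + PCR2.cycles)
```

Keeping the program as data makes it easy to see that the two rounds together amount to 100 cycles, the figure discussed in the Results.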
Table 5. Components of PCR reactions in the NGS protocol.
Component | PCR 1.1 | PCR 1.2 | PCR 2.1-2.4 | PCR 2.5 | PCR 2.6
10% Trehalose | 5.125 µL | 5.25 µL | 5.75 µL | 5.875 µL | 6.0 µL
H2O | 0.13 µL | 0.13 µL | 0.13 µL | 0.13 µL | 0.13 µL
5X Buffer | 2.5 µL | 2.5 µL | 2.5 µL | 2.5 µL | 2.5 µL
25 mM MgCl2 | 1.25 µL | 1.25 µL | 1.25 µL | 1.25 µL | 1.25 µL
µM primers | 0.125 µL each | 0.125 µL each | 0.125 µL each | 0.125 µL each | 0.125 µL each
10 µM dNTP | 0.0625 µL | 0.0625 µL | 0.0625 µL | 0.0625 µL | 0.0625 µL
Taq (5 U/µL) | 0.06 µL | 0.06 µL | 0.06 µL | 0.06 µL | 0.06 µL
Template | 2 µL | 2 µL | 2 µL | 2 µL | 2 µL
TOTAL | 12.5 µL | 12.5 µL | 12.5 µL | 12.5 µL | 12.5 µL
Reactions differ only in the number of primers and the amount of trehalose.
Trehalose sourced from Fluka Analytical; HyClone ultra-pure water from Thermo Fisher Scientific; buffer, MgCl2, and Taq polymerase from KAPA Biosystems; primers from Integrated DNA Technologies.
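Because the reactions differ only in primer number and trehalose volume, the trehalose row of Table 5 can be derived from the primer count: each primer removed relative to PCR 1.1 is replaced by the same volume of trehalose. The Python sketch below illustrates that bookkeeping. The primer counts per reaction are taken from the reaction descriptions in this section (for example, PCR 1.1 contains three forward and six reverse primers); the constant names are illustrative.

```python
PRIMER_VOLUME_UL = 0.125        # volume of each primer per reaction (Table 5)
REFERENCE_PRIMERS = 9           # PCR 1.1: F1, F3, F5 plus six reverse primers
REFERENCE_TREHALOSE_UL = 5.125  # trehalose volume in PCR 1.1 (Table 5)

# Primers per reaction: forward primers plus the reverse primers paired with them.
PRIMER_COUNTS = {"PCR 1.1": 9, "PCR 1.2": 8, "PCR 2.1-2.4": 4, "PCR 2.5": 3, "PCR 2.6": 2}


def trehalose_volume(n_primers: int) -> float:
    """Trehalose (uL) that offsets the change in primer volume relative to PCR 1.1."""
    return REFERENCE_TREHALOSE_UL + (REFERENCE_PRIMERS - n_primers) * PRIMER_VOLUME_UL


for reaction, n in PRIMER_COUNTS.items():
    print(f"{reaction}: {n} primers, {trehalose_volume(n)} uL trehalose")
```

The computed values reproduce the trehalose row of Table 5, which is consistent with trehalose acting as the volume make-up component.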
Figure 2 shows the primer positions for the first and second rounds of PCR (a) and all possible final amplicons (b). The initial round of PCR includes two separate reactions (a, above the broken line) using 10bp-tailed primers and genomic DNA as template (shown in parentheses below the reaction names). The second round of PCR includes six separate reactions (a, below the broken line) using adapter-tailed primers and the products of the first PCR reactions as template (shown in parentheses below the reaction names). The second PCR can generate up to 15 amplicons spanning the entire COI barcode region (b). To assign each amplicon to a particular type specimen, each forward PCR2 primer is tailed with a MID tag unique to that specimen. To assign each amplicon to a particular reaction (i.e. 2.1, 2.2, 2.3, etc.), each reverse PCR2 primer is tailed with a MID tag unique to each reaction in the second round of PCR.
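The two-tag scheme amounts to a lookup on both ends of each read: the forward tag identifies the specimen and the reverse tag identifies the reaction. The Python sketch below assumes adapter- and key-trimmed reads that begin with the forward tag and end with the reverse complement of the reverse tag. The tag sequences are hypothetical placeholders rather than the IonXpress barcodes, and exact matching is used for simplicity.

```python
from typing import Dict, Optional, Tuple

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.upper().translate(_COMPLEMENT)[::-1]


# Hypothetical 10-nt tags (placeholders, not the real IonXpress MID sequences).
FORWARD_TAGS: Dict[str, str] = {"CTAGCTAGCA": "specimen 1", "GATCGATCGA": "specimen 2"}
REVERSE_TAGS: Dict[str, str] = {"TTGACCAGAA": "PCR 2.1", "AGCATTCGGA": "PCR 2.2"}


def demultiplex(read: str, tag_len: int = 10) -> Optional[Tuple[str, str]]:
    """Return (specimen, reaction) for a trimmed read, or None if either tag is unknown."""
    read = read.upper()
    if len(read) < 2 * tag_len:
        return None
    specimen = FORWARD_TAGS.get(read[:tag_len])
    # The reverse primer's tag is read on the opposite strand, so reverse-complement it.
    reaction = REVERSE_TAGS.get(reverse_complement(read[-tag_len:]))
    if specimen is None or reaction is None:
        return None
    return specimen, reaction


example = "CTAGCTAGCA" + "ACGTTTTTGG" + reverse_complement("TTGACCAGAA")
print(demultiplex(example))
```

A production pipeline would tolerate sequencing errors in the tags (for example by allowing one mismatch) and would trim tags and primers before assembly.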
The second round of PCR used product from the first PCR reactions as template and included six reactions per specimen (PCR 2.1-2.6 in Fig. 2a), each coupling a single forward primer with one to three reverse primers and using 2 µL of the appropriate primary PCR product as template. This round boosted amplicon yields while also adding the required sequencing adapters. Each secondary PCR generated 1-3 amplicons, which collectively spanned the COI barcode region (Fig. 2b). The first four PCRs (2.1-2.4 in Fig. 2a) contained forward primers F1-F4, each combined with the three immediately downstream reverse primers (e.g. F1+R1+R2+R3). The fifth PCR (2.5 in Fig. 2a) combined F5 with R5 and R6, while the sixth PCR (2.6 in Fig. 2a) combined F6 with R6. All of these reactions employed primers with adapter tails and MID tags so that fragments and/or individuals could be discriminated during post-processing of the NGS data. Detailed reaction components (final volume = 12.5 µL) are provided in Table 5. Thermocycling consisted of 94 °C for 2 minutes, 40 cycles of {94 °C for 40 seconds, 48 °C for 40 seconds, 72 °C for 30 seconds}, and a final extension at 72 °C for 5 minutes.
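The secondary reaction layout follows a simple rule: forward primer F_i is paired with up to three reverse primers starting at R_i, and its template is whichever primary reaction contained F_i. The Python sketch below encodes that rule (the function names are illustrative) and counts the resulting amplicons.

```python
from typing import Dict, List, Tuple

N_PRIMERS = 6


def pcr2_layout(n: int = N_PRIMERS, max_reverse: int = 3) -> Dict[str, dict]:
    """Map each secondary reaction to its forward primer, reverse primers and PCR1 template."""
    layout = {}
    for i in range(1, n + 1):
        reverse = [f"R{j}" for j in range(i, min(i + max_reverse, n + 1))]
        # F1, F3 and F5 were amplified in PCR 1.1; F2, F4 and F6 in PCR 1.2.
        template = "PCR 1.1" if i % 2 == 1 else "PCR 1.2"
        layout[f"PCR 2.{i}"] = {"forward": f"F{i}", "reverse": reverse, "template": template}
    return layout


def amplicons(layout: Dict[str, dict]) -> List[Tuple[str, str]]:
    """All forward/reverse pairs that can yield a product in the secondary round."""
    return [(rxn["forward"], rev) for rxn in layout.values() for rev in rxn["reverse"]]


layout = pcr2_layout()
for name, rxn in layout.items():
    print(name, rxn["forward"], "+", "+".join(rxn["reverse"]), "template:", rxn["template"])
print(len(amplicons(layout)), "possible amplicons")
```

Four reactions with three reverse primers, one with two and one with one give 3 + 3 + 3 + 3 + 2 + 1 = 15 possible amplicons, matching the count shown in Figure 2b.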
The secondary PCR products from each specimen (six reactions) were pooled, and a double size selection protocol (PCRClean DX kit, Aline Biosciences) was employed to remove genomic DNA, primer dimers and residual primers. The first cleanup step was designed to remove any high molecular weight genomic DNA (>800bp) that might reflect recent contamination (e.g. human DNA from researchers handling the specimens). Briefly, the PCR product and magnetic beads were incubated in a 2:1 ratio (volume PCR product: volume beads) for 8 minutes at room temperature, followed by 2 minutes on a magnet. The pellet of beads was discarded, while the supernatant was retained for the second cleanup step, which was designed to bind fragments of 250bp-700bp (i.e. the PCR products) to the beads while lower molecular weight DNA (primer dimers, residual primers) remained in solution. This step was carried out by adding enough beads and sterile water to generate a 5:4 ratio (PCR product: beads), incubating for 8 minutes, and then placing the mixture on a magnet for 2 minutes. The supernatant was discarded and the pellet of beads was washed three times with 80% ethanol before the PCR products were eluted from the beads with 36 µL of sterile water.
Following cleanup, the concentration of each purified PCR product was measured on a Qubit 2.0 fluorometer using the Qubit dsDNA HS Assay Kit (Life Technologies), and all 10 samples were normalized to 1 ng/µL and mixed in equal proportions. The final sequencing template library was created by making a 1/300 dilution of this mixture. An Ion PGM Template OT2 400 Kit (Life Technologies) was used for template preparation, and sequencing was carried out on an Ion Torrent PGM following the manufacturer's instructions, using a 316 chip and an Ion PGM Sequencing 400 Kit (Life Technologies).
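Normalization and dilution are C1V1 = C2V2 arithmetic. The Python sketch below computes the sample and water volumes needed to reach a target concentration, and the concentration after the 1/300 dilution. The example concentration and final volume are hypothetical.

```python
from typing import Tuple


def normalize(conc_ng_per_ul: float, final_volume_ul: float,
              target_ng_per_ul: float = 1.0) -> Tuple[float, float]:
    """Return (sample_ul, water_ul) that bring a sample to the target concentration."""
    if conc_ng_per_ul < target_ng_per_ul:
        raise ValueError("sample is already below the target concentration")
    sample_ul = target_ng_per_ul * final_volume_ul / conc_ng_per_ul  # C1V1 = C2V2
    return sample_ul, final_volume_ul - sample_ul


def diluted_pg_per_ul(pool_ng_per_ul: float = 1.0, dilution_factor: int = 300) -> float:
    """Concentration (pg/uL) of the pooled library after a 1/dilution_factor dilution."""
    return pool_ng_per_ul * 1000 / dilution_factor


# Hypothetical example: a purified product measured at 5 ng/uL, normalized in 10 uL.
print(normalize(5.0, 10.0))
print(round(diluted_pg_per_ul(), 2), "pg/uL")
```

Because every sample is first brought to the same concentration, mixing equal volumes yields an equimolar-by-mass pool before the final dilution.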
Data Analysis
Raw data from each Ion Torrent PGM run were uploaded to the Galaxy platform for analysis (https://usegalaxy.org/) [27]. Several filters were applied to remove low quality, short, and non-target reads before an alignment was constructed to assemble the full barcode contig.
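The filtering and contig-assembly steps were carried out with Galaxy tools, which are not specified here. As an illustration only, the Python sketch below applies hypothetical length and mean-quality thresholds to reads and then builds a majority-rule consensus from reads that are assumed to be already aligned to a common coordinate frame (gaps written as '-').

```python
from collections import Counter
from typing import List, Sequence, Tuple

Read = Tuple[str, Sequence[int]]  # (bases, per-base Phred qualities)


def filter_reads(reads: List[Read], min_len: int = 100, min_mean_q: float = 20) -> List[Read]:
    """Keep reads that pass hypothetical length and mean-quality thresholds."""
    kept = []
    for seq, quals in reads:
        if len(seq) >= min_len and quals and sum(quals) / len(quals) >= min_mean_q:
            kept.append((seq, quals))
    return kept


def consensus(aligned: List[str], gap: str = "-") -> str:
    """Majority-rule consensus; columns with no base call become 'N'."""
    width = max(len(r) for r in aligned)
    calls = []
    for i in range(width):
        column = [r[i] for r in aligned if i < len(r) and r[i] != gap]
        calls.append(Counter(column).most_common(1)[0][0] if column else "N")
    return "".join(calls)


reads = [("A" * 150, [30] * 150), ("A" * 50, [30] * 50), ("A" * 150, [10] * 150)]
print(len(filter_reads(reads)))  # only the long, high-quality read passes
print(consensus(["ACGT-", "ACGTA", "ATGTA", "--GTA"]))
```

Non-target reads are not removed by this sketch; in the workflow described here, that check relies on comparison against reference libraries such as BOLD.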
Representative examples of the sequence reads recovered from HQ and LQ extracts are shown in Figure 6. The resultant FASTA file was then exported to permit comparisons with Sanger-generated sequences in BOLD. The authenticity of each NGS-generated sequence was subsequently validated by querying it against the BOLD Identification Engine (www.boldsystems.org) to check for contamination or non-target amplification.
Further validation was performed via Neighbor-Joining (NJ) analysis that included the NGS-generated sequences as well as sequences from recently collected specimens of the same species or close relatives. The compiled reads from each run were deposited in the Sequence Read Archive (SRA; http://www.ncbi.nlm.nih.gov/sra) under study accession SRP055961 (see Table 3 for individual sample accession numbers), while the barcode contig for each specimen was deposited in the BOLD dataset (dx.doi.org/10.5883/DS-NGSTYPES) and in GenBank (see Table 3 for accession numbers).
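Neighbor-Joining trees are typically built from a matrix of pairwise distances, and the simplest such distance is the uncorrected p-distance. The Python sketch below computes it over sites where both aligned sequences carry a base, which is also how an NGS contig can be compared with a shorter Sanger sequence in their zone of overlap. The sequence names and bases are hypothetical, and the distance model behind the published trees is not stated here.

```python
from itertools import combinations
from typing import Dict, Tuple


def p_distance(a: str, b: str, gap: str = "-") -> float:
    """Proportion of differing sites among positions where both sequences have a base."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = differences = 0
    for x, y in zip(a.upper(), b.upper()):
        if x in (gap, "N") or y in (gap, "N"):
            continue  # skip gaps and ambiguous calls
        compared += 1
        differences += x != y
    if compared == 0:
        raise ValueError("the sequences do not overlap")
    return differences / compared


def distance_matrix(seqs: Dict[str, str]) -> Dict[Tuple[str, str], float]:
    """Pairwise p-distances, the usual input to a Neighbor-Joining algorithm."""
    return {(m, n): p_distance(seqs[m], seqs[n]) for m, n in combinations(seqs, 2)}


aligned = {
    "ngs_contig": "ACGTACGTAC",
    "sanger_read": "---TACGTAC",  # shorter read: compared only where it overlaps
    "relative": "ACGTACGTAA",
}
print(distance_matrix(aligned))
```

A distance of zero between an NGS contig and its Sanger counterpart corresponds to the perfect matches in the zones of overlap reported in the Results.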
Results
Because the NGS protocol allowed ten specimens to be processed simultaneously, just three runs were required to analyze the 30 specimens. The average number of sequence reads per specimen showed five-fold variation (182K, 59K, 36K), while the average depth of coverage per base showed six-fold variation (36K, 12K, 6K) across the three DNA categories (Figs. 3a and 3b). The number of reads per specimen averaged 90K, resulting in an average coverage depth of 18K per base. Sequences were recovered from every specimen, with average lengths of 610bp, 578bp, and 458bp for the HQ, MQ and LQ extracts respectively (Fig. 3c). Barcode compliant sequences (>487bp) were recovered from 8 HQ, 8 MQ, and 4 LQ specimens (Table 3), while sequence records >400bp were recovered from 25 of 30 specimens (83%). In fact, more than 200bp of sequence data was recovered from all 30 specimens (Table 3). The recovery of sequences from ten type specimens in each of the three DNA categories is shown in Figure 3.
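The LQ summary figures can be reproduced from the per-specimen sequence lengths in Table 3. The short Python sketch below computes the mean length and the number of records that are barcode compliant (>487bp), exceed 400bp, and exceed 200bp.

```python
from statistics import mean
from typing import Dict, List

# Sequence lengths (bp) of the ten LQ specimens, in the order listed in Table 3.
LQ_LENGTHS: List[int] = [514, 657, 454, 237, 357, 324, 323, 634, 419, 658]


def summarize(lengths: List[int], compliant_bp: int = 487) -> Dict[str, float]:
    """Summary statistics for a list of recovered sequence lengths."""
    return {
        "mean_bp": mean(lengths),
        "barcode_compliant": sum(n > compliant_bp for n in lengths),
        "over_400bp": sum(n > 400 for n in lengths),
        "over_200bp": sum(n > 200 for n in lengths),
    }


print(summarize(LQ_LENGTHS))
```

The mean of about 458bp and the four barcode-compliant records agree with the LQ values reported above.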
The NGS-generated sequences from the HQ and MQ specimens matched the shorter Sanger sequences perfectly in their zones of overlap (Fig. 4).
Although the protocol involves 100 cycles of PCR amplification (60 + 40), there was no evidence of artifacts when the NGS sequences were compared to their Sanger counterparts (Fig. 4).
Further confirmation of their validity was provided by the fact that they grouped with sequences from closely allied taxa (Fig. 5). It was more difficult to verify the sequences obtained via NGS from the LQ specimens because they had no Sanger counterparts for comparison. In six cases, the NGS sequences clearly derived from a single species, but reads from the other four specimens appeared to originate from two or more species. Obvious contaminants (e.g. fungi, bacteria) were easily removed during post-processing, but some sequences in these four records appeared to derive from closely allied species or pseudogenes. In principle, contaminants and authentic sequences could be discriminated if reference sequences were available from modern specimens of these species, but none were available. Because the four specimens showing these admixtures generated the fewest sequence reads and the lowest depth of coverage, it is likely that their DNA was heavily degraded (Table 3). Once contemporary sequences for these species become available, it should be possible to recognize the authentic sequences.
To summarize, the current method works on a plurality of samples simultaneously, with high success rates for higher quality degraded DNA and only a slight drop for lower quality degraded DNA. The method still works for samples that may contain almost no intact DNA.
Even the lowest quality degraded DNA was amplified and characterized using the method of the invention, which recovered >500bp sequences from samples that failed using traditional Sanger approaches. The method may be used on any type of degraded DNA sample in many applications, including environmental, forensic and food industry applications (cooked foods contain degraded DNA), and generally in any application where DNA is degraded due to age, environment, processing and so forth. The method can be customized for invertebrates, mammals, fish, birds and so forth. In one aspect, the method effectively amplifies and characterizes entire barcode regions for use in biological classification. This will be helpful for the classification of old specimens, such as those found in museums [2-5,28], as demonstrated in two recent studies [29,30].
The invention can be provided as a system in a kit containing the desired primers, buffers, enzymes, instructions for use and so forth. A kit may be customized for a particular type of specimen comprising degraded DNA.
It is to be noted that the term "a" or "an" entity refers to one or more of that entity. For example, "a characteristic" refers to one or more characteristics or at least one characteristic. As such, the terms "a" (or "an"), "one or more" and "at least one" are used interchangeably herein. It is also to be noted that the terms "comprising", "including", and "having" have been used interchangeably.
Ranges: throughout this disclosure, various aspects described herein can be presented in a range format. It should be understood that the description in range format is merely for convenience and brevity and should not be construed as an inflexible limitation on the scope described herein. Accordingly, the description of a range should be considered to have specifically disclosed all the possible subranges as well as individual numerical values within that range. For example, description of a range such as from 1 to 6 should be considered to have specifically disclosed subranges such as from 1 to 3, from 1 to 4, from 1 to 5, from 2 to 4, from 2 to 6, from 3 to 6 etc., as well as individual numbers within that range, for example, 1, 2, 2.7, 3, 4, 5, 5.3, and 6. This applies regardless of the breadth of the range.
It will be understood that any aspects described as "comprising" certain components may also "consist of" or "consist essentially of," wherein "consisting of" has a closed-ended or restrictive meaning and "consisting essentially of" means including the components specified but excluding other components except for materials present as impurities, unavoidable materials present as a result of processes used to provide the components, and components added for a purpose other than achieving the technical effect described herein. For example, a composition defined using the phrase "consisting essentially of" encompasses any known pharmaceutically acceptable additive, excipient, diluent, carrier, and the like. Typically, a composition consisting essentially of a set of components will comprise less than 5% by weight, typically less than 3% by weight, more typically less than 1% by weight of non-specified components.
It will be understood that any component defined herein as being included may be explicitly excluded from the claimed invention by way of proviso or negative limitation.
Many patent applications, patents, and publications are referred to herein to assist in understanding the aspects described. Each of these references is incorporated herein by reference in its entirety.
The foregoing examples and detailed description are offered by way of illustration and not by way of limitation. All publications and patent applications cited in this specification are herein incorporated by reference as if each individual publication or patent application were specifically and individually indicated to be incorporated by reference.
Although the foregoing invention has been described in some detail by way of illustration and example for purposes of clarity of understanding, it will be readily apparent to those of ordinary skill in the art in light of the teachings of this invention that certain changes and modifications may be made thereto without departing from the scope of the appended claims.
References
1. Hebert PDN, Cywinska A, Ball S, deWaard JR. Biological identifications through DNA barcodes. Proc R Soc Lond B Biol Sci. 2003; 270: 313-321.
2. Hebert PDN, deWaard JR, Zakharov EV, Prosser SWJ, Sones JE, McKeown JTA, et al. A DNA 'Barcode Blitz': Rapid digitization and sequencing of a natural history collection. PLoS ONE. 2013; 8: e68535. doi:10.1371/journal.pone.0068535.
3. Mutanen M, Kekkonen M, Prosser SW, Hebert PDN, Kaila L. One species in eight: DNA barcodes from type specimens resolve a taxonomic quagmire. Mol Ecol Resour. 2014; doi:10.1111/1755-0998.12361.
4. Hausmann A, Hebert PDN, Mitchell A, Rougerie A, Sommerer M, Edwards T. Revision of the Australian Oenochroma vinaria Guenee, 1858 species-complex (Lepidoptera: Geometridae, Oenochrominae): DNA barcoding reveals cryptic diversity and assesses status of type specimen without dissection. Zootaxa. 2009a; 2239: 1-21.
5. Kirchman JJ, Witt CC, McGuire JA, Graves GR. DNA from a 100-year-old holotype confirms the validity of a potentially extinct hummingbird species. Biol Lett. 2010; 6: 112-115.
6. Gilbert MTP, Moore W, Melchior L, Worobey M. DNA extraction from dry museum beetles without conferring external morphological damage. PLoS ONE. 2007; 2: e272. doi:10.1371/journal.pone.0000272.
7. Thomsen PF, Elias S, Gilbert MTP, Haile J, Munch K, Kuzmina S, et al. Non-destructive sampling of ancient insect DNA. PLoS ONE. 2009; 4: e5048. doi:10.1371/journal.pone.0005048.
8. Dean MD, Ballard JWO. Factors affecting mitochondrial DNA quality from museum preserved Drosophila simulans. Entomol Exp Appl. 2001; 98: 279-283.
9. Hernandez-Triana LM, Prosser SW, Rodriguez-Perez MA, Chaverri LG, Hebert PDN, Gregory TR. Recovery of DNA barcodes from blackfly museum specimens (Diptera: Simuliidae) using primer sets that target a variety of sequence lengths. Mol Ecol Resour. 2013; 14: 508-518. doi:10.1111/1755-0998.12208.
10. Van Houdt JKJ, Breman FC, Virgilio M, De Meyer M. Recovering full DNA barcodes from natural history collections of Tephritid fruitflies (Tephritidae, Diptera) using mini barcodes. Mol Ecol Resour. 2010; 10: 459-465.
11. Bluemel JK, King RA, Virant-Doberlet M, Symondson WOC. Primers for identification of type and other archived specimens of Aphrodes leafhoppers (Hemiptera, Cicadellidae). Mol Ecol Resour. 2011; 11: 770-774.
12. Hausmann A, Sommerer M, Rougerie R, Hebert P. Hypobapta tachyhalotaria n. sp. from Tasmania - an example of a new species revealed by DNA barcoding (Lepidoptera, Geometridae). Spixiana. 2009b; 32: 161-166.
13. Lees DC, Lack HW, Rougerie R, Hernandez-Lopez A, Raus T, Avtzis ND, et al. Tracking origins of invasive herbivores using herbaria and archival DNA: the case of the horse-chestnut leafminer. Front Ecol Environ. 2011; 9: 322-328.
14. Rougerie R, Naumann S, Nassig WA. Morphology and molecules reveal unexpected cryptic diversity in the enigmatic genus Sinobirma Bryk, 1944 (Lepidoptera: Saturniidae). PLoS ONE. 2012; 7: e43920. doi:10.1371/journal.pone.0043920.
15. Lees DC, Rougerie R, Zeller-Lukashort C, Kristensen NP. DNA mini-barcodes in taxonomic assignment: a morphologically unique new homoneurous moth clade from the Indian Himalayas described in Micropterix (Lepidoptera, Micropterigidae). Zool Scr. 2010; 39: 642-661.
16. Strutzenberger P, Brehm G, Fiedler K. DNA barcode sequencing from old type specimens as a tool in taxonomy: A case study in the diverse genus Eois (Lepidoptera: Geometridae). PLoS ONE. 2012; 7: e49710.
17. Zimmermann J, Hajibabaei M, Blackburn DC, Hanken J, Cantin E, Posfai J, et al. DNA damage in preserved specimens and tissue samples: a molecular assessment. Front Zool. 2008; 5: 18.
18. Allentoft ME, Collins M, Harker D, Haile J, Oskam CL, Hale ML, et al. The half-life of DNA in bone: measuring decay kinetics in 158 dated fossils. Proc Biol Sci. 2012; 279: 4724-4733.
19. Rowe KC, Singhal S, Macmanes MD, Ayroles JF, Morelli TL, Rubidge EM, et al. Museum genomics: low-cost and high-accuracy genetic data from historical specimens. Mol Ecol Resour. 2011; 11: 1082-1092. doi:10.1111/j.1755-0998.2011.03052.x.
20. Shokralla S, Gibson JF, Nikbakht H, Janzen DH, Hallwachs W, Hajibabaei M. Next-generation DNA barcoding: using next-generation sequencing to enhance and accelerate DNA barcode capture from single specimens. Mol Ecol Resour. 2014; 14: 892-901.
21. Shokralla S, Porter T, Gibson J, Dobosz R, Janzen DH, et al. Massively parallel multiplex DNA sequencing for specimen identification using an Illumina MiSeq platform. Sci Rep. 2015; 5: 9687.
22. Holloway JD, Miller SE, Pollock DM, Helgen L, Darrow K. GONGED (Geometridae of New Guinea Electronic Database): a progress report on development of an online facility of images. Spixiana. 2009; 32: 122-123.
23. Miller SE. DNA barcode enabled ecological research on Geometridae in Papua New Guinea. Spixiana. 2014; 37: 245-246.
24. Knolke S, Erlacher S, Hausmann A, Miller MA, Segerer AH. A procedure for combined genitalia dissection and DNA extraction in Lepidoptera. Insect Syst Evol. 2005; 35: 401-409.
25. Ivanova NV, deWaard JR, Hebert PDN. An inexpensive, automation-friendly protocol for recovering high-quality DNA. Mol Ecol Notes. 2006; 6: 998-1002.
26. deWaard JR, Ivanova NV, Hajibabaei M, Hebert PDN. Assembling DNA barcodes. Methods Mol Biol. 2008; 410: 275-293.
27. Blankenberg D, Von Kuster G, Coraor N, Ananda G, Lazarus R, Mangan M, et al. Galaxy: a web-based genome analysis tool for experimentalists. Curr Protoc Mol Biol. 2010; Chapter 19: Unit 19.10: 11-21.
28. Miller SE, Hausmann A, Hallwachs W, Janzen DH. Advancing taxonomy and bioinventories with DNA barcodes. Phil Trans R Soc B. 2016; 371: 20150339. doi:10.1098/rstb.2015.0339.
29. Speidel W, Hausmann A, Muller GC, Kravchenko V, Mooser J, Witt TJ, et al. Taxonomy 2.0: Sequencing of old type specimens supports the description of two new species of the Lasiocampa decolorata group from Morocco (Lepidoptera: Lasiocampidae). Zootaxa. 2015; 3999: 401-412.
30. Hausmann A, Miller SE, Holloway JD, deWaard JR, Pollock D, Prosser SWJ, Hebert PDN. Calibrating the taxonomy of a megadiverse insect family: 3000 DNA barcodes from geometrid type. Genome. 2016; 0,0. doi:10.1139/gen-2015-0197.
Claims (40)
1. A two stage method for obtaining a full length barcode sequence from specimens with degraded DNA, the method comprising two step multiplex nested PCR utilizing primers that hybridize to portions of the barcode sequence that can pair in any combination to generate a plurality of amplicons that span the entire barcode sequence while avoiding overlap amplification, primer incorporation and/or primer dimer sequencing; and NGS for sequencing the plurality of amplicons generated by the two step multiplex nested PCR and providing the barcode sequence.
2. The method of claim 1, wherein said two step multiplex nested PCR co-amplifies fragments covering the barcode region.
3. The method of claim 2, wherein said two step multiplex nested PCR comprises:
a first multiplex PCR1, wherein said primers hybridize to the barcode sequence while simultaneously blocking undesired elongation, to form a plurality of amplicons; and a second multiplex PCR2 utilizing the amplicons from PCR1 as a template and a plurality of adapter-tailed primers, wherein in PCR1 forward primers are selected for all downstream reverse primers.
4. The method of claim 3, wherein said primers of PCR1 are tailed with short, non-complementary sequences.
5. The method of any one of claims 1 to 4, wherein said specimen contains at least 0.1 ng of degraded DNA, such as at least 0.5 ng, 1 ng, 10 ng, 100 ng, 500 ng, or from 2 µg to 5 µg of degraded DNA.
6. The method of any one of claims 1 to 5, wherein said barcode sequence is the 658 base-pair region in the mitochondrial cytochrome c oxidase 1 gene (COI).
7. The method of any one of claims 1 to 5, wherein said barcode sequence is matK or rbcL for identifying plants.
8. A method to generate a plurality of redundant amplicons for a target degenerated DNA sequence, the method comprising:
(a) performing a first multiplex nested PCR using a plurality of primers that hybridize to portions of the target DNA sequence while blocking undesired elongation to form a plurality of amplicons, wherein forward primers are selected with all downstream reverse primers to produce amplicon redundancy;
(b) using the amplicon products of (a) as a template, performing a second multiplex nested PCR comprising a plurality of adapter-tailed primers with optional MID tags that hybridize to the amplicon products of (a); and
(c) pooling the products from (b).
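Claims 3 and 8 pair each forward primer with all downstream reverse primers to produce amplicon redundancy. The minimal Python sketch below enumerates the resulting primer pairs. The primer names and binding coordinates are hypothetical placeholders (the actual primers are defined in the patent's Table 4, which is not reproduced here), and treating a pair as productive only when the reverse site lies downstream of the forward site is the sole rule applied.

```python
from itertools import product

# Hypothetical coordinates in bp from the 5' end of the barcode region.
# Illustrative only; these are NOT the primers defined in Table 4.
FORWARD = {"F1": 0, "F2": 100, "F3": 200}    # start of each forward primer site
REVERSE = {"R1": 180, "R2": 300, "R3": 420}  # far end of each reverse primer site


def pair_with_downstream(forward, reverse):
    """Pair every forward primer with every reverse primer binding downstream of it.

    Returns (forward name, reverse name, approximate span in bp) tuples.
    """
    pairs = []
    for (f_name, f_pos), (r_name, r_pos) in product(forward.items(), reverse.items()):
        if r_pos > f_pos:  # only a downstream reverse primer can yield a product
            pairs.append((f_name, r_name, r_pos - f_pos))
    return pairs


if __name__ == "__main__":
    for f_name, r_name, span in pair_with_downstream(FORWARD, REVERSE):
        print(f"{f_name}-{r_name}: ~{span} bp")
```

With these placeholder coordinates, F1 and F2 each pair with all three reverse primers and F3 pairs with R2 and R3, giving eight overlapping products. In a degraded sample, which of these products actually form would depend on primer-template matching and the surviving fragment lengths, as claim 24 describes.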
9. The method of claim 8, further comprising removing undesired genomic DNA, primer dimers and/or residual primers.
10. The method of claim 8 or 9, further comprising performing next generation sequencing on the pooled products from (c).
11. A method for amplifying and characterizing a barcode region from the cytochrome c oxidase 1 gene (COI) in a small specimen of degraded DNA using multiplex PCR, the method comprising:
- extracting the degraded DNA to provide a linear template;
- performing a first multiplex nested PCR1 using a plurality of forward primers and downstream reverse primers that hybridize to regions of said barcode region and simultaneously block undesired elongation such that a plurality of amplicons is created;
- performing a second multiplex PCR2 using the multiple amplicons generated from the first PCR1 reaction as a template and adapter-tailed primers with optional multiplex identifier tags (MID) that hybridize to portions of said amplicons to generate a more degenerate, larger pool of amplicons;
- pooling all amplicon products, said amplicon products spanning and overlapping the entire length of said barcode region; and
- performing next generation sequencing on the pooled amplicon products to determine the barcode sequence.
12. The method of claim 11, further comprising comparing said determined barcode sequence to a bank of characterized sequences to determine the species of said specimen.
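Claim 12 compares the determined barcode against a bank of characterized sequences. The patent text here does not specify a comparison algorithm, so the sketch below swaps in one simple option: pick the reference with the smallest uncorrected p-distance. It assumes the query and references are already aligned to the same barcode coordinates, skips ambiguous (N) and gap sites, and uses a toy reference bank whose names and sequences are placeholders, not real barcodes. Dedicated identification engines use their own, more elaborate methods.

```python
def p_distance(a, b):
    """Uncorrected p-distance between two aligned sequences.

    Sites where either sequence has an N or a gap are skipped.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to equal length")
    compared = differences = 0
    for x, y in zip(a.upper(), b.upper()):
        if x in "N-" or y in "N-":
            continue
        compared += 1
        differences += x != y  # bool counts as 0 or 1
    if compared == 0:
        raise ValueError("no comparable sites")
    return differences / compared


def best_match(query, bank):
    """Return (reference name, p-distance) for the closest reference."""
    scored = ((name, p_distance(query, ref)) for name, ref in bank.items())
    return min(scored, key=lambda item: item[1])


# Toy reference bank (hypothetical names and sequences).
BANK = {"species_A": "ACGTACGTAC", "species_B": "ACGTTCGTAA"}

if __name__ == "__main__":
    print(best_match("ACGTACGTNC", BANK))
```

For the toy query, the N at position 9 is ignored, so species_A matches at every remaining site while species_B differs at 2 of 9 compared sites.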
13. A method for detection and identification of a barcode region of the COI gene in a small specimen containing degraded DNA to identify the phylogeny of said specimen, the method comprising:
- extracting linear degraded DNA from said specimen;
- performing two step multiplex nested PCR on said linear degraded DNA using primers that hybridize to said barcode region to create a plurality of redundant amplicons spanning the barcode region of the COI gene;
- performing next generation sequencing on said redundant amplicons to provide a sequence of the barcode region of the COI gene; and
- classifying said specimen.
14. A kit for performing two step multiplex nested PCR on a small specimen comprising degraded DNA in order to determine the barcode region of the COI gene and thus classify the specimen, the kit comprising: primers specific for said barcode sequence, buffers, optional stabilizers, enzymes and instructions for use.
15. A method for amplifying degraded DNA, the method comprising:
amplifying the degraded DNA in a PCR1 reaction in at least two separate reaction vessels using pairs of nested forward and reverse primers, wherein the two reaction vessels comprise different combinations of the forward and reverse primers to produce a plurality of redundant amplicons; and amplifying the redundant amplicons in a PCR2 reaction using one reaction vessel per forward primer, wherein each forward primer is mixed with a different combination of reverse primers.
16. The method of claim 15, wherein the forward and reverse primers in the PCR1 reaction comprise block elongation moieties to block elongation from the 5' end of the primers.
17. The method of claim 16, wherein the block elongation moieties comprise non-complementary tails.
18. The method of any one of claims 15 to 17, comprising from about 2 to about 10 forward primers and from about 2 to about 10 reverse primers.
19. The method of claim 18, comprising 6 forward primers (F1, F2, F3, F4, F5, and F6) and 6 reverse primers (R1, R2, R3, R4, R5, and R6).
20. The method of claim 19, wherein for PCR1, F1, F3, and F5 are paired with R1, R2, R3, R4, R5, and R6 and F2, F4, and F6 are paired with R1, R2, R3, R4, and R5.
21. The method of claim 20, wherein for PCR2, F1 is paired with R1, R2, and R3; F2 is paired with R2, R3, and R4; F3 is paired with R3, R4, and R5; F4 is paired with R5 and R6; and F6 is paired with R6.
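Claims 15 and 21 run PCR2 with one reaction vessel per forward primer, each holding a different combination of reverse primers. The Python sketch below transcribes the PCR2 pairings of claim 21, as reproduced here, into a vessel layout; the data structure and the function name `pcr2_vessels` are illustrative assumptions, not part of the patent.

```python
# PCR2 pairings transcribed from claim 21 (forward primer -> reverse primers).
PCR2_PAIRS = {
    "F1": ["R1", "R2", "R3"],
    "F2": ["R2", "R3", "R4"],
    "F3": ["R3", "R4", "R5"],
    "F4": ["R5", "R6"],
    "F6": ["R6"],
}


def pcr2_vessels(pairs):
    """Build one reaction vessel per forward primer with its reverse primers."""
    vessels = []
    for number, (fwd, revs) in enumerate(pairs.items(), start=1):
        vessels.append({
            "vessel": number,
            "primers": [fwd, *revs],                      # everything added to the tube
            "targets": [f"{fwd}-{rev}" for rev in revs],  # intended primer pairs
        })
    return vessels


if __name__ == "__main__":
    for v in pcr2_vessels(PCR2_PAIRS):
        print(v["vessel"], v["primers"], v["targets"])
```

Transcribed this way, the claim 21 scheme uses five vessels that together target twelve forward/reverse combinations.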
22. The method of any one of claims 15 to 21, wherein the primers for PCR2 comprise adapter tailed primers for sequencing.
23. The method of any one of claims 15 to 22, wherein the primers are degenerate.
24. A method for sequencing degraded DNA, the method comprising amplifying redundant amplicons such that each region of the target DNA sequence is covered by multiple amplicons, wherein the generation of specific amplicons is determined automatically by a combination of primer-template matching and the pattern of DNA degradation in the target sequence.
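Claim 24 requires that each region of the target be covered by multiple amplicons. A per-base depth calculation makes that condition checkable. The sketch below uses a difference array; the 658 bp length comes from claim 6, while the amplicon coordinates are hypothetical and half-open ([start, end)).

```python
BARCODE_LEN = 658  # COI barcode length from claim 6

# Hypothetical half-open amplicon coordinates [start, end) on the barcode.
AMPLICONS = [(0, 250), (0, 400), (150, 400), (150, 550),
             (300, 550), (300, 658), (450, 658)]


def coverage_depth(amplicons, length):
    """Return per-base depth over positions 0..length-1."""
    diff = [0] * (length + 1)
    for start, end in amplicons:
        start, end = max(start, 0), min(end, length)  # clip to the target
        if start < end:
            diff[start] += 1
            diff[end] -= 1
    depth, running = [], 0
    for i in range(length):
        running += diff[i]  # prefix sum turns the deltas into depth
        depth.append(running)
    return depth


if __name__ == "__main__":
    depth = coverage_depth(AMPLICONS, BARCODE_LEN)
    print("minimum depth:", min(depth))
    print("positions below 2x:", [i for i, d in enumerate(depth) if d < 2])
```

With these placeholder amplicons every base is covered at least twice, and the overlap peaks at five amplicons between positions 300 and 399.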
25. A method of amplifying a barcode region of a degraded DNA sample, the method comprising:
performing at least a PCR1a reaction and a PCR1b reaction utilizing a plurality of forward and reverse primers, respectively yielding a PCR1a complement of amplicons and a PCR1b complement of amplicons, wherein the plurality of forward primers comprise primers F1, F2, ..., Fn, in order from upstream to downstream of the target sequence, wherein n is a whole number;
wherein the plurality of reverse primers comprise primers R1, R2, ..., Rm, in order from upstream to downstream of the target sequence, wherein m is a whole number;
wherein the plurality of reverse primers are downstream of F1 and the plurality of forward primers are upstream of Rm;
wherein the PCR1a reaction comprises each odd-numbered forward primer starting with F1 and further comprises all or substantially all of the reverse primers;
and wherein the PCR1b reaction comprises each even-numbered forward primer starting with F2 and further comprises all or substantially all of the reverse primers that are upstream of F2.
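The PCR1a/PCR1b split in claim 25 assigns odd-numbered forward primers to one reaction and even-numbered forward primers to the other. The sketch below is an illustration under stated assumptions: it takes primer names already ordered upstream to downstream, gives PCR1a every reverse primer, and lets the caller pass the PCR1b reverse-primer subset explicitly. The example uses the six-primer layout of claims 19 and 20, where F2, F4 and F6 are paired with R1 to R5. The function name `split_pcr1` is hypothetical.

```python
FORWARD = ["F1", "F2", "F3", "F4", "F5", "F6"]  # ordered upstream -> downstream
REVERSE = ["R1", "R2", "R3", "R4", "R5", "R6"]


def split_pcr1(forward, reverse, pcr1b_reverse=None):
    """Assign odd-numbered forward primers (F1, F3, ...) to PCR1a and
    even-numbered ones (F2, F4, ...) to PCR1b.

    PCR1a receives every reverse primer. PCR1b receives `pcr1b_reverse`
    when given, otherwise every reverse primer.
    """
    pcr1a = {"forward": forward[0::2], "reverse": list(reverse)}
    pcr1b = {
        "forward": forward[1::2],
        "reverse": list(reverse if pcr1b_reverse is None else pcr1b_reverse),
    }
    return pcr1a, pcr1b


if __name__ == "__main__":
    # Claim 20 layout: F2, F4 and F6 are paired with R1 to R5 only.
    a, b = split_pcr1(FORWARD, REVERSE, pcr1b_reverse=REVERSE[:5])
    print("PCR1a:", a)
    print("PCR1b:", b)
```

Because numbering follows position, slicing with a step of two keeps adjacent forward primers in different reactions.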
26. The method of claim 25, wherein the forward and reverse primers comprise block elongation moieties to block elongation from the 5' end of the primers and reduce non-target amplification.
27. The method of claim 26, wherein the block elongation moieties comprise non-complementary tails.
28. The method of any one of claims 25 to 27, further comprising performing a plurality of PCR2 reactions, PCR2-1, PCR2-2, ..., PCR2-n, to amplify the PCR1a and PCR1b complements of amplicons, wherein each PCR2 reaction uses a different forward primer and a different set of one or more downstream reverse primers; and wherein the PCR1a complement of amplicons is amplified using odd-numbered forward primers and the PCR1b complement of amplicons is amplified using even-numbered forward primers.
29. The method of claim 28, further comprising pooling the resulting amplicons.
30. The method of claim 28 or 29, wherein the primers for PCR2 are adapter-tailed for sequence analysis.
31. The method of any one of claims 28 to 30, wherein the primers for PCR2 are MID-tagged to associate amplicons with specific specimens, such that multiple specimens can be sequenced simultaneously.
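Claim 31 uses MID tags so that amplicons from multiple specimens can be sequenced together and then assigned back to their specimens. A minimal demultiplexing sketch follows. It assumes the MID sits at the very start of each read, that all MIDs share one length, and that only exact matches are accepted; the tag sequences and specimen names are hypothetical. Production pipelines often also tolerate sequencing errors in the tag.

```python
from collections import defaultdict

# Hypothetical 6-nt MID tags mapped to specimen IDs.
MIDS = {"ACGAGT": "specimen_1", "TCTGCA": "specimen_2"}


def demultiplex(reads, mids):
    """Assign each read to a specimen by an exact MID match at the read start.

    Assigned reads have the MID trimmed; unmatched reads are kept whole
    under "unassigned".
    """
    mid_len = len(next(iter(mids)))  # assumes all MIDs share one length
    bins = defaultdict(list)
    for read in reads:
        specimen = mids.get(read[:mid_len])
        if specimen is None:
            bins["unassigned"].append(read)
        else:
            bins[specimen].append(read[mid_len:])
    return dict(bins)


if __name__ == "__main__":
    reads = ["ACGAGTTTTT", "TCTGCAGGGG", "NNNNNNAAAA"]
    print(demultiplex(reads, MIDS))
```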
32. The method of any one of claims 25 to 31, wherein n is from 2 to 10, such as 6.
33. The method of any one of claims 25 to 32, wherein m is from 2 to 10, such as 6.
34. The method of any one of claims 25 to 33, wherein the forward and reverse primers are as defined in Table 4.
35. The method of any one of claims 1 to 34, wherein the template is not depleted through use of the method.
36. A method of amplifying degraded DNA according to the scheme shown in Figures 2a and 2b herein.
37. The method of any one of claims 1 to 36, for taxonomic classification of unknown specimens.
38. The method of any one of claims 1 to 37, wherein the primers are degenerate.
39. The method of any one of claims 1 to 38, for analyzing a plurality of specimens simultaneously.
40. The method of any one of claims 1 to 39, wherein the method is for amplification of a sample comprising small amounts of degraded DNA, such as at least about 0.1 ng of degraded DNA, such as at least about 0.5 ng, about 1 ng, about 10 ng, about 100 ng, about 500 ng, or from about 2 µg to about 5 µg of degraded DNA.
Applications Claiming Priority (3)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| US201562206487P | 2015-08-18 | 2015-08-18 | |
| US62/206,487 | 2015-08-18 | | |
| PCT/CA2016/050970 WO2017027975A1 (en) | 2015-08-18 | 2016-08-18 | Method to amplify dna sequences from degraded sources |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| CA3032535A1 (en) | 2017-02-23 |
Family ID: 58050529
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| CA3032535A (abandoned; CA3032535A1) | Method to amplify dna sequences from degraded sources | 2015-08-18 | 2016-08-18 |
Country Status (2)
| Country | Link |
|---|---|
| CA (1) | CA3032535A1 (en) |
| WO (1) | WO2017027975A1 (en) |
Families Citing this family (6)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| WO2018223188A1 (en) * | 2017-06-06 | 2018-12-13 | Murdoch Childrens Research Institute | Assay |
| CN108192897A (en) * | 2018-02-11 | 2018-06-22 | 云南省烟草农业科学研究院 | A tobacco rbcL gene fragment and its application |
| CN109979536B (en) * | 2019-03-07 | 2022-12-23 | 青岛市疾病预防控制中心(青岛市预防医学研究院) | Species identification method based on DNA bar code |
| CN110331210B (en) * | 2019-04-29 | 2022-08-02 | 华南农业大学 | Mini-Barcoding primer for acquiring DNA bar code of collected beetle specimen and application thereof |
| CN113140256A (en) * | 2020-01-20 | 2021-07-20 | 深圳华大智造科技有限公司 | Substance DNA tracing method |
| CN113186338B (en) * | 2020-09-14 | 2022-07-26 | 中国科学院植物研究所 | Universal primer for identifying angiosperm plant species and application thereof |
2016
- 2016-08-18: CA application CA3032535A (publication CA3032535A1), abandoned
- 2016-08-18: WO application PCT/CA2016/050970 (publication WO2017027975A1), ceased
Also Published As
| Publication number | Publication date |
|---|---|
| WO2017027975A1 (en) | 2017-02-23 |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| Burrell et al. | The use of museum specimens with high-throughput DNA sequencers | |
| Baloğlu et al. | A workflow for accurate metabarcoding using nanopore MinION sequencing | |
| Duhaime et al. | Towards quantitative metagenomics of wild viruses and other ultra‐low concentration DNA samples: a rigorous assessment and optimization of the linker amplification method | |
| US9809840B2 (en) | Reference markers for biological samples | |
| JP7332733B2 (en) | High molecular weight DNA sample tracking tags for next generation sequencing | |
| Shokralla et al. | Next‐generation DNA barcoding: using next‐generation sequencing to enhance and accelerate DNA barcode capture from single specimens | |
| CA3032535A1 (en) | Method to amplify dna sequences from degraded sources | |
| Zhang et al. | Lighting up single-nucleotide variation in situ in single cells and tissues | |
| US10102337B2 (en) | Digital measurements from targeted sequencing | |
| US20160115544A1 (en) | Molecular barcoding for multiplex sequencing | |
| CN110878345A (en) | Increasing confidence in allele calls by molecular counting | |
| CN105121664A (en) | Methods of sequencing nucleic acids in mixtures and compositions related thereto | |
| KR20110106922A (en) | Single Cell Nucleic Acid Analysis | |
| CA3057163A1 (en) | Methods of attaching adapters to sample nucleic acids | |
| D’Ercole et al. | A SMRT approach for targeted amplicon sequencing of museum specimens (Lepidoptera)—patterns of nucleotide misincorporation | |
| Méndez-García et al. | Metagenomic protocols and strategies | |
| CN109072296B (en) | Methods for direct target sequencing using nuclease protection | |
| Hsieh et al. | A rapid insect species identification system using mini‐barcode pyrosequencing | |
| WO2018235938A1 (en) | Method of sequencing and analyzing nucleic acid | |
| WO2019117704A1 (en) | Methods for detecting pathogenicity of ganoderma sp. | |
| US10927405B2 (en) | Molecular tag attachment and transfer | |
| Puritz et al. | Expressed Exome Capture Sequencing (EecSeq): a method for cost-effective exome sequencing for all organisms with or without genomic resources | |
| Bargiela et al. | Metagenomic protocols and strategies | |
| Wilson | | Assessing Deep Sequencing Technology for Human Forensic Mitochondrial DNA Analysis |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | FZDE | Discontinued | Effective date: 20221108 |