WO2019086531A1 - Linear consensus sequencing - Google Patents
- Publication number
- WO2019086531A1 (PCT/EP2018/079854)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- nucleic acid
- target nucleic
- adaptor
- strand
- single stranded
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Ceased
Classifications
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12Q—MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
- C12Q1/00—Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
- C12Q1/68—Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
- C12Q1/6806—Preparing nucleic acids for analysis, e.g. for polymerase chain reaction [PCR] assay
Definitions
- Consensus sequencing on single molecule sequencers generally relies on sequencing a circular template such that the library insert is read multiple times in a single long polymerase read.
- the use of circular templates in sequencing is also known in the art. See U.S. Pat. Nos. 7,302,146 and 8,153,375.
- Linear nucleic acids can be converted into a circular form for amplification and subsequent detection and quantification, see U.S. Pat. No. RE44265.
- the present invention is a novel efficient method of creating nucleic acid templates suitable for sequencing.
- the method is an improvement that uncouples circle formation and sequencing.
- the method yields a linear double stranded product which contains many copies of the library molecule.
- the product can be read in a linear manner (single pass) so that the original short insert is read multiple times without the need to sequence it in a circular manner.
- the method has multiple advantages described in detail below.
- the invention is a method of forming a concatenated nucleic acid template for analysis, comprising the steps of: ligating at least one end of a double stranded target nucleic acid to a first adaptor to form an adapted target nucleic acid; separating the strands of the adapted target nucleic acid to form a single stranded adapted target nucleic acid; circularizing the single stranded adapted target nucleic acid to form a single stranded circle comprising at least one first adaptor sequence; annealing a primer to the single stranded circle; extending the primer with a DNA polymerase to generate a nucleic acid strand comprising multiple copies of the target nucleic acid; generating a copy strand of the nucleic acid strand forming a concatemer comprising multiple copies of the target nucleic acid; ligating a second adaptor to the concatemer wherein one strand of the second adaptor comprises a primer binding site thereby forming a
- the method may further comprise a step of amplifying the adapted target nucleic acid.
- the amplification primers may comprise sequences at least partially complementary to the adaptors.
- the adaptors may comprise modifications for capturing a strand of single stranded adapted target nucleic acids after strand separation. The modification can be a 5'-phosphate group or a ligand for a capture molecule.
- the method can further comprise a step of removing one of the two single strands of the adapted target nucleic acid, e.g., by exonuclease digestion or affinity capture.
- the first adaptor may comprise a barcode such as a unique molecular ID (UID) and a sample ID (SID).
- the circularization step may comprise the use of a circularization probe.
- the method may further comprise a step of removing uncircularized nucleic acids after circularization.
- in some embodiments the primer is target-specific, and in other embodiments the primer is a collection of one or more random sequences such as a mixture of random hexamers.
- the DNA polymerase may be a strand displacing polymerase.
- the amplification may be multiple displacement amplification (MDA) or its variation, rolling circle amplification (RCA).
- the concatemer may be fragmented to desired size.
- the invention is a method of making a library of concatenated nucleic acid templates for sequencing comprising: ligating at least one end of double stranded target nucleic acids in a sample to a first adaptor to form adapted target nucleic acids; separating the strands of the adapted target nucleic acids to form single stranded adapted target nucleic acids; circularizing the single stranded adapted target nucleic acids to form single stranded circles comprising at least one first adaptor sequence; annealing a primer to the adaptor sequence in the single stranded circles; extending the primer with a DNA polymerase to generate nucleic acid strands comprising multiple copies of the target nucleic acid; generating copy strands of the nucleic acid strands forming concatemers comprising multiple copies of the target nucleic acid; ligating second adaptors to the concatemers wherein one strand of the second adaptor comprises a sequencing primer binding site thereby forming a library of
- the method may comprise a step of amplifying the adapted target nucleic acids.
- the amplification primers for the library may comprise sequences at least partially complementary to the adaptors.
- the adaptors may comprise modifications for capturing a strand of single stranded adapted target nucleic acids, such as a 5'-phosphate group or a ligand for a capture molecule.
- the method may comprise a step of removing one of the two single strands of the adapted target nucleic acids, e.g., by exonuclease digestion or affinity capture.
- the first adaptor may comprise at least one barcode such as a unique molecular ID (UID) and a sample ID (SID).
- the circularization step may comprise the use of a circularization probe.
- the method may comprise a step of removing uncircularized nucleic acids.
- the primer for the library may be a collection of one or more random sequences such as a mixture of random hexamers.
- the DNA polymerase may be a strand displacing polymerase.
- the amplification for the library may be multiple displacement amplification (MDA) or its variation, rolling circle amplification (RCA).
- the method may further comprise a step of fragmenting the concatemer.
- the method may further comprise a step of debranching the concatemer.
- the invention is a method of forming a concatenated nucleic acid template for analysis, comprising the steps of: separating the strands of a target nucleic acid in a sample to form a single stranded target nucleic acid; circularizing the single stranded target nucleic acid to form a single stranded circle; annealing a primer to the single stranded circle; extending the primer with a DNA polymerase to generate a nucleic acid strand comprising multiple copies of the target nucleic acid; generating a copy strand of the nucleic acid strand forming a concatemer comprising multiple copies of the target nucleic acid; ligating an adaptor to the concatemer wherein one strand of the adaptor comprises a primer binding site thereby forming a concatenated nucleic acid template for analysis.
- the method may further comprise a step of fragmenting the concatemers.
- the method may further comprise a step of debranching the concatemers.
- the invention is a method of determining the sequence of a double-stranded target nucleic acid in a sample comprising: forming a concatenated nucleic acid template by any of the methods set forth above; contacting the sample with a sequencing primer complementary to the primer binding site in the second adaptor; and extending the sequencing primer with a nucleic acid polymerase thereby determining the sequence of the target nucleic acid.
- the sequence may be determined by a method utilizing a nanopore.
- Figure 1 shows a general scheme of the Linear Consensus Sequencing method.
- Figure 2 is a detailed diagram of the sequencing step.
- Figure 3 is a diagram of a consensus read obtained from subreads.
- Figure 4 shows a reduction in error rate through consensus sequencing.
DETAILED DESCRIPTION OF THE INVENTION
Definitions
- sample refers to any composition containing or presumed to contain target nucleic acid.
- sample includes a sample of tissue or fluid isolated from an individual, for example, skin, plasma, serum, spinal fluid, lymph fluid, synovial fluid, urine, tears, blood cells, organs and tumors, and also to samples of in vitro cultures established from cells taken from an individual, including the formalin-fixed paraffin embedded tissues (FFPET) and nucleic acids isolated therefrom.
- a sample may also include cell-free material, such as cell-free blood fraction that contains cell-free DNA (cfDNA) or circulating tumor DNA (ctDNA).
- nucleic acid refers to polymers of nucleotides (e.g., ribonucleotides and deoxyribonucleotides, both natural and non-natural) including DNA, RNA, and their subcategories, such as cDNA, mRNA, etc.
- a nucleic acid may be single-stranded or double-stranded and will generally contain 5'-3' phosphodiester bonds, although in some cases, nucleotide analogs may have other linkages.
- Nucleic acids may include naturally occurring bases (adenosine, guanosine, cytosine, uracil and thymidine) as well as non-natural bases.
- non-natural bases include those described in, e.g., Seela et al, (1999) Helv. Chim. Acta 82:1640.
- the non-natural bases may have a particular function, e.g., increasing the stability of the nucleic acid duplex, inhibiting nuclease digestion or blocking primer extension or strand polymerization.
- "Polynucleotide" and "oligonucleotide" are used interchangeably.
- Polynucleotide is a single-stranded or a double-stranded nucleic acid.
- Oligonucleotide is a term sometimes used to describe a shorter polynucleotide.
- Oligonucleotides are prepared by any suitable method known in the art, for example, by a method involving direct chemical synthesis as described in Narang et al. (1979) Meth. Enzymol. 68:90-99; Brown et al. (1979) Meth. Enzymol. 68:109-151; Beaucage et al. (1981) Tetrahedron Lett. 22:1859-1862; Matteucci et al. (1981) J. Am. Chem. Soc. 103:3185-3191.
- primer refers to a single-stranded oligonucleotide which hybridizes with a sequence in the target nucleic acid ("primer binding site") and is capable of acting as a point of initiation of synthesis along a complementary strand of nucleic acid under conditions suitable for such synthesis.
- adaptor means a nucleotide sequence that may be added to another sequence so as to impart additional properties to that sequence.
- An adaptor is typically an oligonucleotide that can be single- or double-stranded, or may have both a single-stranded portion and a double-stranded portion.
- Ligation refers to a condensation reaction joining two nucleic acid strands wherein a 5'-phosphate group of one molecule reacts with the 3'-hydroxyl group of another molecule.
- Ligation is typically an enzymatic reaction catalyzed by a ligase or a topoisomerase.
- Ligation may join two single strands to create one single-stranded molecule.
- Ligation may also join two strands each belonging to a double-stranded molecule thus joining two double-stranded molecules.
- Ligation may also join both strands of a double-stranded molecule to both strands of another double-stranded molecule thus joining two double-stranded molecules.
- Ligation may also join two ends of a strand within a double-stranded molecule thus repairing a nick in the double-stranded molecule.
- barcode refers to a nucleic acid sequence that can be detected and identified. Barcodes can be incorporated into various nucleic acids. Barcodes are sufficiently long, e.g., 2, 5, or 20 nucleotides, so that in a sample, the nucleic acids incorporating the barcodes can be distinguished or grouped according to the barcodes.
- multiplex identifier or "MID" refers to a barcode that identifies the source of a target nucleic acid (e.g., a sample from which the nucleic acid is derived). All or substantially all the target nucleic acids from the same sample will share the same MID. Target nucleic acids from different sources or samples can be mixed and sequenced simultaneously. Using the MIDs, the sequence reads can be assigned to the individual samples from which the target nucleic acids originated.
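The assignment of reads to samples by MID can be sketched as follows. This is a minimal illustration only: it assumes the MID sits at the start of each read and is matched exactly, and the function name, MID sequences and sample names are hypothetical, not taken from this document.

```python
from collections import defaultdict


def demultiplex(reads, mid_to_sample, mid_length):
    """Group reads by the MID found in their first `mid_length` bases."""
    by_sample = defaultdict(list)
    for read in reads:
        mid, insert = read[:mid_length], read[mid_length:]
        # Reads whose MID is not recognized are kept apart rather than guessed.
        sample = mid_to_sample.get(mid, "unassigned")
        by_sample[sample].append(insert)
    return dict(by_sample)


# Hypothetical MIDs and reads from a mixed (multiplexed) run.
mids = {"ACGT": "sample_A", "TGCA": "sample_B"}
reads = ["ACGTTTGACC", "TGCAGGATCA", "ACGTCCATGA", "GGGGAAAAAA"]
assignments = demultiplex(reads, mids, mid_length=4)
print(assignments)
```

Real reads carry sequencing errors, so a practical demultiplexer would normally tolerate a small number of mismatches in the MID; exact matching is used here only to keep the sketch short.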
- universal primer and "universal priming binding site" or "universal priming site" refer to a primer and primer binding site present in (typically, through in vitro addition to) different target nucleic acids.
- the universal priming site is added to the plurality of target nucleic acids using adaptors or using target-specific (non-universal) primers having the universal priming site in the 5'-portion.
- the universal primer can bind to and direct primer extension from the universal priming site.
- the term “universal” refers to a nucleic acid molecule (e.g., primer or other oligonucleotide) that can be added to any target nucleic acid and perform its function irrespectively of the target nucleic acid sequence.
- the universal molecule may perform its function by hybridizing to the complement, e.g., a universal primer to a universal primer binding site or a universal circularization oligonucleotide to a universal primer sequence.
- target refers to a portion of the nucleic acid sequence in the sample which is to be detected or analyzed.
- target includes all variants of the target sequence, e.g., one or more mutant variants and the wild type variant.
- Amplification refers to a process of making additional copies of the target nucleic acid. Amplification can have more than one cycle, e.g., multiple cycles of exponential amplification. Amplification may also have only one cycle.
- the copy may have additional sequences, e.g., those present in the primers used for amplification.
- the copies may be interspersed with adaptor sequences.
- the product is linearized and sequenced on a single molecule sequencer in a linear manner (single pass). In this process, the original target sequence is read multiple times without the need to sequence it in a circular manner. The parsing of sub reads and identification of unique molecules is aided by the addition of an intervening adaptor sequence.
- the adaptor may contain a constant or variable unique region (barcode).
- the resulting long concatemer molecule consists of sense or antisense (not both) repeats of the original target molecule.
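The parsing of subreads described above can be sketched as a split of the single-pass read at each occurrence of the intervening adaptor. This is an illustrative sketch under simplifying assumptions: the adaptor is matched exactly (no sequencing errors), and the adaptor and target sequences are hypothetical placeholders.

```python
def split_subreads(read, adaptor, min_length=1):
    """Split a single-pass concatemer read at exact adaptor matches."""
    pieces = read.split(adaptor)
    # The first and last pieces may be partial copies cut off where the
    # concatemer was sheared, so only pieces bounded by an adaptor on
    # both sides are kept.
    interior = pieces[1:-1]
    return [piece for piece in interior if len(piece) >= min_length]


ADAPTOR = "GATTACA"     # hypothetical registration (adaptor) sequence
TARGET = "CCGTAAGCTT"   # hypothetical target insert

# A read that starts and ends in the middle of a target copy.
read = "GCTT" + ADAPTOR + TARGET + ADAPTOR + TARGET + ADAPTOR + "CCGT"
subreads = split_subreads(read, ADAPTOR)
print(subreads)
```

With real single molecule reads the adaptor would have to be located by approximate matching or alignment, since the raw error rate makes exact matches unreliable.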
- the advantages include low target molecule input.
- the MDA reaction is capable of generating microgram quantities of DNA from a very small amount of template.
- This method is suitable for clinical applications such as analysis of cell-free DNA (cfDNA, including circulating tumor DNA, ctDNA and cell-free fetal DNA).
- Another advantage is high fidelity stemming from the use of a proofreading DNA polymerase to copy the circularized molecule.
- Yet another advantage is obviating the need of barcodes such as UIDs to track the copies of an original molecule (as is the case with PCR amplification).
- with RCA/MDA, the copies of the original molecule are joined in the concatemer.
- Yet another advantage is the ability to adjust the length of the template to meet the need of the sequencer.
- the concatemer molecule can be sheared to a desired length as dictated by sequencing read length and original insert size. The steps of the method are shown in a diagram in Figure 1 and described in detail below.
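The relationship between shearing length, insert size and the number of passes over the target is simple arithmetic, sketched below. The lengths used (a 170-nucleotide insert, a 30-nucleotide adaptor, a 10,000-nucleotide fragment) are hypothetical values chosen for illustration and do not come from this document.

```python
def copies_per_fragment(fragment_length, insert_length, adaptor_length):
    """Maximum number of complete insert-plus-adaptor units that fit in a fragment."""
    unit_length = insert_length + adaptor_length
    return fragment_length // unit_length


def fragment_length_for(copies, insert_length, adaptor_length):
    """Combined length of the requested number of insert-plus-adaptor units."""
    return copies * (insert_length + adaptor_length)


# Hypothetical lengths, in nucleotides.
n_copies = copies_per_fragment(10_000, insert_length=170, adaptor_length=30)
target_length = fragment_length_for(10, insert_length=170, adaptor_length=30)
print(n_copies, target_length)
```

Because shearing does not start at a unit boundary, the first and last copies in a fragment are usually partial, so the value from `fragment_length_for` is best read as a lower bound on the shearing target.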
- the present invention comprises detecting a target nucleic acid in a sample.
- the sample is derived from a subject or a patient.
- the sample may comprise a fragment of a solid tissue or a solid tumor derived from the subject or the patient, e.g., by biopsy.
- the sample may also comprise body fluids (e.g., urine, sputum, serum, plasma or lymph, saliva, sweat, tear, cerebrospinal fluid, amniotic fluid, synovial fluid, pericardial fluid, peritoneal fluid, pleural fluid, cystic fluid, bile, gastric fluid, intestinal fluid, and/or fecal samples).
- the sample may comprise whole blood or blood fractions where tumor cells may be present.
- the sample, especially a liquid sample, may comprise cell-free material such as cell-free DNA or RNA including cell-free tumor DNA or tumor RNA.
- the present invention is especially suitable for analyzing rare and low quantity targets.
- the sample is a cell-free sample, e.g., cell-free blood-derived sample where cell-free tumor DNA or tumor RNA are present.
- the sample is a cultured sample, e.g., a culture or culture supernatant containing or suspected to contain an infectious agent or nucleic acids derived from the infectious agent.
- the infectious agent is a bacterium, a protozoan, a virus or a mycoplasma.
- the target nucleic acid is characteristic of a human subject, e.g., the HLA or KIR sequence defining the subject's unique HLA or KIR genotype.
- all the sequences in the sample are target nucleic acids e.g., in shotgun genomic sequencing.
- a double-stranded target nucleic acid is converted into the template configuration of the invention.
- the target nucleic acid occurs in nature in a single-stranded form (e.g., RNA, including mRNA, micro RNA, viral RNA; or single-stranded viral DNA).
- the single-stranded target nucleic acid is converted into double-stranded form to enable the further steps of the claimed method.
- target nucleic acids may be fragmented although in some applications longer target nucleic acids may be desired to achieve a longer read.
- the target nucleic acid is naturally fragmented, e.g., circulating cell-free DNA (cfDNA) or chemically degraded DNA such as that found in preserved samples.
- the target nucleic acid is fragmented in vitro, e.g., by physical means such as sonication or by endonuclease digestion, e.g., restriction digestion.
- the invention is a method comprising a step of amplifying the target nucleic acid.
- PCR amplification is used to target a specific genomic region for further analysis of the template prepared by the method of the invention.
- the amplification may be by polymerase chain reaction (PCR) or any other method that utilizes oligonucleotide primers.
- the amplification primers are used to introduce auxiliary sequences into the target nucleic acid.
- the amplification primers are used to introduce DNA modifications, e.g., those to enable exonuclease digestion or target capture e.g., with an affinity molecule such as streptavidin capturing biotinylated nucleic acids.
- the target-specific primers are used as a pair of distinct oligonucleotides, e.g., a forward and a reverse primer.
- a sequence can be added to the 5'-end of the forward and the reverse primer.
- a universal sequence comprising an adaptor is added.
- the method may also utilize an adaptor such as a universal adaptor sequence that is conjugated to the target sequence.
- the adaptor comprises a registration sequence (a known sequence that can be used bioinformatically to separate out the iterations of the target sequence within the concatemer).
- the adaptor can be used to incorporate a modification that can be used to enrich for a single stranded template prior to circularization.
- the modification is a ligand for a capture moiety, e.g., biotin.
- the modification is a 5'-phosphorylation protecting the strand from exonuclease degradation that removes the complementary strand without the modification.
- the adaptor may also comprise primer binding sites such as amplification primer binding sites and sequencing primer binding sites.
- the modifications described above e.g., biotin or 5'-phosphate
- adaptors are added independent of the sequence of the target nucleic acid, for example, by ligation.
- the target nucleic acids in a sample receive the same adaptor molecule at each end.
- the adaptor may have a Y-structure, see e.g., U.S. Patent Nos. 8053192, 8182989 and 8822150.
- the adaptor molecules are ligated to the target nucleic acid.
- the ligation can be a blunt-end ligation or a more efficient cohesive-end ligation.
- the target nucleic acid or the adaptors may be rendered blunt-ended by strand-filling, i.e., extending a 3'-terminus by a DNA polymerase to eliminate a 5'-overhang.
- the blunt-ended adaptors and target nucleic acid may be rendered cohesive by addition of a single nucleotide to the 3'-end of the adaptor and a single complementary nucleotide to the 3'-ends of the target nucleic acid, e.g., by a DNA polymerase or a terminal transferase.
- the adaptors and the target nucleic acid may acquire cohesive ends (overhangs) by digestion with restriction endonucleases. The latter option is more advantageous for target sequences that are known to contain the restriction enzyme recognition site. In some embodiments, other enzymatic steps may be required to accomplish the ligation.
- a polynucleotide kinase may be used to add 5'-phosphates to the target nucleic acid molecules and adaptor molecules.
- adaptors are present in the 5'-regions of target-specific primers and are added via primer extension or amplification.
- the adaptor molecules are in vitro synthesized artificial sequences. In other embodiments, the adaptor molecules are in vitro synthesized naturally-occurring sequences. In yet other embodiments, the adaptor molecules are isolated naturally occurring molecules.
- the invention comprises introduction of barcodes into the target nucleic acids. Sequencing individual molecules typically requires molecular barcodes such as described e.g., in U.S. Patent Nos. 7,393,665, 8,168,385, 8,481,292, 8,685,678, and 8,722,368.
- a unique molecular barcode is a short artificial sequence added to each molecule in a sample such as a patient's sample typically during the earliest steps of in vitro manipulations. The barcode marks the molecule and its progeny.
- the unique molecular barcode (UID) has multiple uses.
- Barcodes allow tracking each individual nucleic acid molecule in the sample to assess, e.g., the presence and amount of circulating tumor DNA (ctDNA) molecules in a patient's blood in order to detect and monitor cancer without a biopsy. See U.S. patent applications 14/209,807 and 14/774,518.
- Unique molecular barcodes can also be used for sequencing error correction. The entire progeny of a single target molecule is marked with the same barcode and forms a barcoded family. A variation in the sequence not shared by all members of the barcoded family is discarded as an artifact and not a true mutation. Barcodes can also be used for positional deduplication and target quantification, as the entire family represents a single molecule in the original sample. See Id.
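The family-based error correction described above can be sketched as follows. This is a simplified illustration, assuming that reads in a family are already aligned to the reference, have equal length and contain no insertions or deletions; the function names, UIDs and sequences are hypothetical.

```python
from collections import defaultdict


def group_by_uid(tagged_reads):
    """Collect (uid, sequence) pairs into barcoded families."""
    families = defaultdict(list)
    for uid, sequence in tagged_reads:
        families[uid].append(sequence)
    return dict(families)


def shared_variants(reference, family):
    """Return {position: base} for differences carried by every read in the family."""
    variants = {}
    for position, ref_base in enumerate(reference):
        bases = {read[position] for read in family}
        # A difference seen in only some family members is treated as an
        # artifact; only differences shared by all members are kept.
        if len(bases) == 1:
            base = bases.pop()
            if base != ref_base:
                variants[position] = base
    return variants


reference = "ACGTACGT"
tagged = [
    ("UID1", "ACGAACGT"), ("UID1", "ACGAACGT"), ("UID1", "ACGAACCT"),
    ("UID2", "ACGTACGT"), ("UID2", "ACGTTCGT"),
]
families = group_by_uid(tagged)
calls = {uid: shared_variants(reference, family) for uid, family in families.items()}
print(calls)
```

In this example the T-to-A change at position 3 is present in every UID1 read and is retained, while the changes seen in a single read of either family are discarded.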
- barcodes also serve as registration sequences separating copies of the target nucleic acids within concatemers described herein.
- adaptors comprise one or more barcodes.
- a barcode can be a multiplex sample ID (MID) used to identify the source of the sample where samples are mixed (multiplexed).
- the barcode may also serve as a unique molecular ID (UID) used to identify each original molecule and its progeny.
- the barcode may also be a combination of a UID and an MID. In some embodiments, a single barcode is used as both UID and MID.
- each barcode comprises a predefined sequence. In other embodiments, the barcode comprises a random sequence. Barcodes can be 1-20 nucleotides long.
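The choice of barcode length can be illustrated with a short calculation. A random barcode of length n drawn from four bases has 4^n possible sequences. The collision estimate below uses the standard birthday-problem approximation, which is an added assumption for illustration (uniformly random, independently assigned barcodes) and is not part of this document.

```python
import math


def barcode_diversity(length):
    """Number of distinct barcodes of the given length over A, C, G, T."""
    return 4 ** length


def collision_probability(n_molecules, length):
    """Approximate chance that at least two molecules receive the same barcode.

    Birthday-problem approximation: 1 - exp(-m(m-1) / (2N)).
    """
    n_codes = barcode_diversity(length)
    return 1.0 - math.exp(-n_molecules * (n_molecules - 1) / (2.0 * n_codes))


for length in (6, 10, 20):
    print(length, barcode_diversity(length), collision_probability(1000, length))
```

The printout shows how quickly the chance of two molecules sharing a barcode falls as the barcode gets longer for a fixed number of molecules.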
- the method further comprises a step of separating the strands of the target nucleic acid or the adapted target nucleic acid. In some embodiments, both of the separated strands are retained for downstream analysis, e.g., sequencing.
- the two strands may be separated by physical means, e.g., alkaline denaturation or heat denaturation.
- the strands can also be separated enzymatically, e.g., by selective degradation of one strand by a nuclease such as an endo- or an exonuclease or a combination thereof.
- one strand can be marked by the presence of deoxyuracils that are converted to abasic sites by uracil-N-DNA glycosylase (UNG).
- the same strand is subsequently degraded by heat with an optional aid of an exonuclease.
- one strand is retained for further analysis via affinity capture.
- a biotinylated strand may be captured on a streptavidin containing substrate after denaturation and retained while the complementary strand may not be retained or be discarded.
- the method further comprises a step of circularizing a single stranded nucleic acid (see Figure 1).
- the method comprises direct ligation of the ends of the single stranded nucleic acid.
- the ligation step utilizes a ligase or another enzyme with a similar activity or a non-enzymatic reagent.
- the ligase can be a DNA or RNA ligase, e.g., of viral or bacterial origin such as T4 or E. coli ligase, or thermostable ligases Afu, Taq, Tfl or Tth.
- an alternative enzyme, e.g., topoisomerase, can be used.
- a non-enzymatic reagent can be used to form the phosphodiester bond between the 5'-phosphate of the primer extension product and the 3'-OH of the adaptor as described e.g., in US20140193860.
- the ligase is a single stranded DNA ligase such as one available in Accel-NGS™ 1S DNA Library Kit (Swift Biosciences, Ann Arbor, Mich.) or Thermophage Ligase or its derivatives such as Circligase™ (Epicentre Tech., Madison, Wis.).
- circularization step utilizes a circularization probe.
- the circularization probe is complementary to the adaptor sequences.
- the step includes contacting the 5'-phosphorylated strand of the adapted target nucleic acid with a circularization oligonucleotide (probe) to generate a hybrid structure wherein the adaptor sequences in the adapted target nucleic acid strand are hybridized to the circularization probe so that the ends of the strand are brought into proximity.
- the strands are brought into ligatable proximity.
- a gap is left between the 5'- and 3'-ends of the target nucleic acid strand.
- the circularization further comprises an extension step wherein the 3'-end of the adapted target nucleic acid is extended to come into ligatable proximity with its 5'-end.
- the invention further comprises a ligation step comprising ligating the 5'- and 3'-ends of the adapted target nucleic acid thereby forming a circular single stranded molecule.
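The geometry of probe-assisted circularization can be sketched in code. The example below derives a probe that is complementary to the junction formed when the 3'-end of a strand meets its own 5'-end, which corresponds to the case where the ends are brought directly into ligatable proximity with no gap. The sequences, arm length and function names are hypothetical; this is an illustrative sketch rather than a probe design procedure from this document.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of an uppercase DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def circularization_probe(strand, arm_length):
    """Probe (written 5' to 3') spanning the 3'/5' junction of `strand`."""
    # Reading 5' to 3' across the closed circle, the last bases of the
    # strand are followed directly by its first bases.
    junction = strand[-arm_length:] + strand[:arm_length]
    return reverse_complement(junction)


# Hypothetical adapted strand: 5' adaptor part, target, 3' adaptor part.
strand = "AACCGG" + "TTTTTTTT" + "GATCAT"
probe = circularization_probe(strand, arm_length=6)
print(probe)
```

Each half of the probe pairs with one adaptor-derived end of the strand, holding the 3'-hydroxyl next to the 5'-phosphate so that a ligase can close the circle.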
- the invention comprises an exonuclease digestion step wherein the linear nucleic acids possibly comprising uncircularized target nucleic acids or adapted target nucleic acids or any oligonucleotides present in the reaction mixture are removed from the reaction mixture.
- the exonuclease is a bacterial exonuclease such as E. coli exonuclease, e.g., Exo V, Exo III, Exo VI, Exo I, T5 exonuclease, T7 exonuclease or Lambda exonuclease or a combination thereof.
- the invention comprises a primer extension step. Following the formation of a circle, a primer may anneal to the primer binding site to initiate strand synthesis by primer extension.
- a gene-specific primer is used.
- a universal primer e.g., a primer with a binding site present in the adaptor sequence can be used.
- random priming e.g., with a plurality of random oligonucleotides is used. In some embodiments, the priming utilizes a plurality of random hexamers.
- a strand displacing polymerase is used to perform rolling circle amplification (RCA) comprising synthesis of a strand comprising multiple iterations of a sequence complementary to a circular molecule from a single primer. Extension of a reverse primer binding to the primer binding site in adaptor sequence in the nascent strand is used to synthesize a second strand forming a double-stranded copy molecule.
- the DNA polymerase used for RCA is a strand displacing polymerase.
- the DNA polymerase has a 3'-5' exonuclease activity.
- the polymerase is a viral polymerase, e.g., phi29 DNA polymerase.
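The product of rolling circle amplification can be modeled in a few lines. The sketch below builds the first strand as tandem repeats complementary to the circle and then the copy strand that makes the concatemer double-stranded. It ignores where on the circle the primer anneals and assumes a fixed number of complete passes; the adaptor and target sequences are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of an uppercase DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def rolling_circle_product(circle, copies):
    """First strand: tandem repeats complementary to the circular template."""
    return reverse_complement(circle) * copies


ADAPTOR = "GATTACA"        # hypothetical first adaptor (registration sequence)
TARGET = "CCGTAAGCTT"      # hypothetical target insert
circle = ADAPTOR + TARGET  # circle written from an arbitrary starting point

first_strand = rolling_circle_product(circle, copies=3)
second_strand = reverse_complement(first_strand)  # copy strand of the concatemer
print(first_strand)
print(second_strand)
```

The second strand reproduces the original circle sequence in tandem, so each strand of the double-stranded concatemer carries the target copies separated by the adaptor.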
- a strand displacing polymerase is used to perform multiple displacement amplification (MDA) from multiple primers annealing throughout the template strand.
- the primers are a collection of oligonucleotides.
- the oligonucleotides in the collection have a random sequence, e.g., the primers are a collection of random hexamers.
- the primers have modifications that prevent their degradation by a 5'-3' exonuclease, e.g., the exonuclease activity of the DNA polymerase.
- the same collection of primers primes synthesis of a second strand forming a double-stranded copy molecule.
- the DNA polymerase used for RCA is a strand displacing polymerase.
- the DNA polymerase has a 3'-5' exonuclease activity.
- the polymerase is a viral polymerase, e.g., phi29 DNA polymerase.
- a combination of RCA and MDA is used.
- the invention comprises a step of forming a concatemer.
- a concatemer is a nucleic acid containing multiple copies of a particular sequence (such as a target sequence) arranged in tandem and separated by a registration sequence (such as an adaptor).
- the concatemer is a product of rolling circle amplification (RCA).
- the concatemer is a product of multiple displacement amplification (MDA).
- the concatemer is a product of a combination of RCA and MDA.
- the linear concatemers are reduced in size by a method selected from physical shearing (heat, sonication, size exclusion chromatography, electrophoresis, hydrodynamic shearing), enzymatic digestion (partial restriction digestion, DNase I digestion, transposase fragmentation, any non-specific nuclease) or chemical shearing, e.g., treatment with divalent metal cations.
- the amplification products including MDA products are debranched. The debranching may comprise treatment of the strand displacing polymerase products with an additional enzyme such as a single strand-specific endonuclease, e.g., S1 endonuclease.
- the present invention further comprises a step of ligating a second adaptor to the ends of the concatemer comprising multiple copies of the adapted target nucleic acid.
- the ligation of the second adaptor can be a blunt-end or a cohesive-end ligation.
- the target nucleic acid or the adaptors may be rendered blunt-ended by strand-filling, i.e., extending a 3'-terminus by a DNA polymerase to eliminate a 5'-overhang.
- the blunt-ended adaptors and target nucleic acid may be rendered cohesive by addition of a single nucleotide to the 3'-end of the adaptor and a single complementary nucleotide to the 3'-ends of the target nucleic acid, e.g., by a DNA polymerase or a terminal transferase.
- the adaptors and the target nucleic acid may acquire cohesive ends (overhangs) by digestion with restriction endonucleases.
- a polynucleotide kinase may be used to add 5'-phosphates to the target nucleic acid molecules and adaptor molecules.
- the adaptor comprises a single-stranded region.
- the single stranded region may comprise two non-complementary strands (Y-adaptor) or comprise only one strand (asymmetric adaptor) as shown in Figure 1, where one strand is longer.
- the invention is a method of making a library of target nucleic acids.
- the library comprises linear concatemers of target nucleic acids separated by a registration sequence.
- the library is suitable for sequencing of the target nucleic acids.
- the method comprises a step wherein first adaptors are ligated to the multiple target nucleic acids in the sample to create a library of adapted molecules.
- the molecules in the library comprise target sequences flanked by adaptor sequences.
- the adapted target nucleic acids are circularized as described herein to form a library of circular adapted target nucleic acids.
- the library is then subjected to primer extension to form a library of concatemers each comprising multiple copies of an adapted target nucleic acid.
- the library of linear concatemers is adjusted for size as described herein.
- the method further comprises a step wherein second adaptors are ligated to the library of concatemers.
- the final product is a library of concatemers of adapted target nucleic acids flanked by second adaptors.
- Each molecule in the library comprises multiple iterations of the target nucleic acid separated by registration sequences.
- the registration sequence comprises all or a part of the sequence of the first adaptor.
- the present invention comprises detecting target nucleic acids in a sample by nucleic acid sequencing. Multiple nucleic acids, including all the nucleic acids in a sample may be converted into the template configuration of the invention and sequenced.
- the library of concatemers of target nucleic acids can be subjected to nucleic acid sequencing.
- the single stranded region of the second adaptor comprises a primer binding site, e.g., a sequencing primer binding site. As shown in Figure 1, the primer binding sites can initiate a sequencing read of each strand.
- the sequencing read will contain multiple copies of each strand of the target sequence, e.g., the read of the top strand and the bottom strand of the concatemer. Each read will thereby contain multiple iterations of the (+) strand and the (-) strand of the target nucleic acid.
- Sequencing can be performed by any method known in the art. High-throughput single molecule sequencing is especially advantageous. Examples of such technologies include the Illumina HiSeq platform (Illumina, San Diego, Cal.), the Ion Torrent platform (Life Technologies, Grand Island, NY), the Pacific Biosciences platform utilizing SMRT technology (Pacific Biosciences, Menlo Park, Cal.), a platform utilizing nanopore technology such as those manufactured by Oxford Nanopore Technologies (Oxford, UK) or Roche Sequencing Solutions (Santa Clara, Cal.), and any other presently existing or future DNA sequencing technology that does or does not involve sequencing by synthesis.
- the sequencing step may utilize platform-specific sequencing primers. Binding sites for these primers may be introduced in the method of the invention as described herein, i.e., by being a part of second adaptors or amplification primers.
- the invention comprises a step of determining a consensus sequence. Sequencing of concatemer library molecules produces reads which contain tandemly repeated copies of the target molecule (sub-reads). The sub-read sequences can be used to compute a high-accuracy consensus sequence of the original target nucleic acid molecule (Figure 3). Figure 4 demonstrates that as the number of sub-reads used to compute the consensus sequence increases, the error rate decreases drastically from over 5% to about 0.5%.
- the sequencing step involves sequence analysis including a step of sequence aligning.
- aligning is used to determine a consensus sequence from a plurality of sequences, e.g., a plurality having the same barcodes (UID).
- barcodes are used to determine a consensus from a plurality of sequences all having an identical barcode (UID).
- barcodes (UIDs) are used to eliminate artifacts, i.e., variations existing in some but not all sequences having an identical barcode (UID). Such artifacts resulting from PCR errors or sequencing errors can be eliminated.
- the number of each sequence in the sample can be quantified by quantifying relative numbers of sequences with each barcode (UID) in the sample.
- Each UID represents a single molecule in the original sample and counting different UIDs associated with each sequence variant can determine the fraction of each sequence in the original sample.
- a person skilled in the art will be able to determine the number of sequence reads necessary to determine a consensus sequence.
- the relevant number is reads per UID ("sequence depth") necessary for an accurate quantitative result.
- the desired depth is 5-50 reads per UID.
- Adapted target nucleic acids were prepared using the Kapa Hyper Plus kit (Roche Sequencing Solutions, Pleasanton, Cal.) according to the manufacturer's instructions. Input of E. coli genomic DNA was 500 ng per reaction in 35 µL of water. The Kapa Hyper Plus library workflow was used for shearing, repair and A-tailing. DNA was sheared with Kapa Frag into fragments of approximately 300 bp.
- Size selection and purification of the fragmented DNA was performed using Kapa Pure beads, with a 0.9X - 0.7X cut.
- the resulting library had an insert size of approximately 200 bp.
- oligonucleotides were suspended in annealing buffer (50 mM
- the ligation reaction was set up as follows and incubated at 20 °C for 15 minutes.
- This step comprises the denaturation of the template using heat which renders the library molecules single stranded, followed by single stranded circularization using a single strand DNA-specific ligase.
- thermostable single stranded binding protein such as that provided by NEB: ET SSB (M2401S). This specific protein is not required - any thermostable SSB (e.g., archaeal) would suffice.
- Epicentre CircLigase (CL4111K) or CircLigase II (CL9021K) were used according to the manufacturer's instructions.
- the adapter ligated DNA from the previous step was incubated with or without ET SSB protein added.
- the reaction was incubated at 95°C for 15 minutes after which the reaction was cooled to 60°C and the CircLigase master mix added.
- Circularization step was performed as follows:
- This step uses single strand-specific (Exonuclease I) and double strand-specific (Exonuclease III) DNA exonucleases to remove all non-circularized molecules.
- the enzymes Exonuclease I (M0293S) and Exonuclease III (M0206S) from New England Biolabs were used according to the manufacturer's instructions.
- the circular templates were diluted and phosphorothioate-protected random heptamers were annealed.
- the heptamers (exonuclease-resistant random primers, ThermoFisher Scientific, S0181) and phi29 DNA polymerase, 10 U/µL (NEB M0269S), were used according to the manufacturers' instructions.
- Reaction components (volume; final concentration or amount): annealed DNA (20 µL); phi29 DNA polymerase (2 µL; 20 U); 10X phi29 polymerase buffer (5 µL; 1X); dNTP mix, 10 mM each dNTP (0.2 mM each dNTP).
- This step comprises shearing the MDA product to the desired size followed by sequencing adapter ligation, using standard Kapa Hyper Prep chemistry.
- oligonucleotides were suspended in annealing buffer (50 mM NaCl; 10 mM Tris, pH 8.0) to a concentration of 100 µM. The two oligonucleotides comprising the adaptor were combined in equimolar amounts. The mixed oligonucleotides were heated to 94°C for 2 minutes and gradually cooled at a 2% ramp to 20°C.
- Figure 3 shows how consensus analysis can be performed from the RCA concatemers to obtain higher accuracy.
- the sequencing polymerase reads through the concatemer, which contains n sub-reads. Each sub-read is a copy of the initial DNA molecule and can be flanked by key registration sequences. The key registration sequences are used to parse out the sub-reads, which are then aligned against each other to create a consensus read. Alignment with the Burrows-Wheeler Aligner (BWA) was used to create the consensus reads. The error rate of the consensus was determined by calculating the percentage of mismatched bases against a reference sequence. Mutations that are not found in the majority of the sub-reads are discarded, resulting in a reduction of the error rate (Figure 4).
- BWA Burrows-Wheeler Aligner
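The consensus step described above can be illustrated with a minimal sketch. It assumes the sub-reads have already been parsed out and brought to a common length with no insertions or deletions, and it substitutes a simple column-wise majority vote for the BWA-based alignment used in the example. The function names `majority_consensus` and `mismatch_rate` and the toy sequences are illustrative assumptions, not part of the described method.

```python
from collections import Counter


def majority_consensus(subreads):
    """Column-wise majority vote over equal-length, pre-aligned sub-reads."""
    if not subreads:
        raise ValueError("need at least one sub-read")
    length = len(subreads[0])
    if any(len(s) != length for s in subreads):
        raise ValueError("sub-reads must be aligned to the same length")
    consensus = []
    for column in zip(*subreads):
        # Keep the most frequent base; ties go to the base seen first.
        base, _count = Counter(column).most_common(1)[0]
        consensus.append(base)
    return "".join(consensus)


def mismatch_rate(seq, reference):
    """Fraction of positions at which seq differs from the reference."""
    if not reference or len(seq) != len(reference):
        raise ValueError("sequences must be non-empty and of equal length")
    mismatches = sum(a != b for a, b in zip(seq, reference))
    return mismatches / len(reference)


# Toy data (hypothetical): five sub-reads, three of which carry one error each.
reference = "ACGTACGTAC"
subreads = ["ACGTACGTAC", "ACGAACGTAC", "ACGTACGTTC", "ACGTACGTAC", "CCGTACGTAC"]

consensus = majority_consensus(subreads)
print(mismatch_rate(subreads[1], reference))  # 0.1
print(mismatch_rate(consensus, reference))    # 0.0
```

Because each error in the toy data occurs in only one of the five sub-reads, the vote removes all of them. Real sub-reads contain insertions and deletions, which is why an aligner is needed before any per-position vote.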
Abstract
The invention is a novel method of sequencing nucleic acids involving the generation of a library of linear concatenated target nucleic acids, comprising the steps of: (i) ligating at least one end of a double stranded target nucleic acid to a first adaptor to form an adapted target nucleic acid; (ii) separating the strands of the adapted target nucleic acid to form a single stranded adapted target nucleic acid; (iii) circularizing the single stranded adapted target nucleic acid to form a single stranded circle comprising at least one first adaptor sequence; (iv) annealing a primer to the single stranded circle; (v) extending the primer with a DNA polymerase to generate a nucleic acid strand comprising multiple copies of the target nucleic acid; (vi) generating a copy strand of the nucleic acid strand from step (v), forming a concatemer comprising multiple copies of the target nucleic acid; and (vii) ligating a second adaptor to the concatemer, wherein one strand of the second adaptor comprises a primer binding site, thereby forming a concatenated nucleic acid template for analysis.
Description
LINEAR CONSENSUS SEQUENCING
FIELD OF THE INVENTION
The invention relates to the field of nucleic acid analysis and, more specifically, to preparing templates for nucleic acid sequencing.
BACKGROUND OF THE INVENTION
Consensus sequencing on single molecule sequencers generally relies on sequencing a circular template such that the library insert is read multiple times in a single long polymerase read. The use of circular templates in sequencing is also known in the art. See U.S. Pat. Nos. 7,302,146 and 8,153,375. Linear nucleic acids can be converted into a circular form for amplification and subsequent detection and quantification, see U.S. Pat. No. RE44265. The present invention is a novel efficient method of creating nucleic acid templates suitable for sequencing. The method is an improvement that uncouples circle formation and sequencing. The method yields a linear double stranded product which contains many copies of the library molecule. The product can be read in a linear manner (single pass) so that the original short insert is read multiple times without the need to sequence it in a circular manner. The method has multiple advantages described in detail below.
SUMMARY OF THE INVENTION
In some embodiments, the invention is a method of forming a concatenated nucleic acid template for analysis, comprising the steps of: ligating at least one end of a double stranded target nucleic acid to a first adaptor to form an adapted target nucleic acid; separating the strands of the adapted target nucleic acid to form a single stranded adapted target nucleic acid; circularizing the single stranded adapted target nucleic acid to form a single stranded circle comprising at least one first adaptor sequence; annealing a primer to the single stranded circle; extending the primer with a DNA polymerase to generate a nucleic acid strand comprising multiple copies of the target nucleic acid; generating a copy strand of the nucleic acid strand forming a concatemer comprising multiple copies of the target nucleic acid; ligating a second adaptor to the concatemer, wherein one strand of the second adaptor comprises a primer binding site, thereby forming a concatenated nucleic acid template for analysis. The method may further comprise a step of amplifying the adapted target nucleic acid. The amplification primers may comprise sequences at least partially complementary to the adaptors. The adaptors may comprise modifications for capturing a strand of single stranded adapted target nucleic acids after strand separation. The modification can be a 5'-phosphate group or a ligand for a capture molecule. The method can further comprise a step of removing one of the two single strands of the adapted target nucleic acid, e.g., by exonuclease digestion or affinity capture. The first adaptor may comprise a barcode such as a unique molecular ID (UID) and a sample ID (SID). The circularization step may comprise the use of a circularization probe. The method may further comprise a step of removing uncircularized nucleic acids after circularization. In some embodiments, the primer is target-specific and in other embodiments the primer is a collection of one or more random sequences such as a mixture of random hexamers. The DNA polymerase may be a strand displacing polymerase. The amplification may be multiple displacement amplification (MDA) or its variation rolling circle amplification (RCA). The concatemer may be fragmented to the desired size.
In some embodiments, the invention is a method of making a library of concatenated nucleic acid templates for sequencing comprising: ligating at least one end of double stranded target nucleic acids in a sample to a first adaptor to form adapted target nucleic acids; separating the strands of the adapted target nucleic acids to form single stranded adapted target nucleic acids; circularizing the single stranded adapted target nucleic acids to form single stranded circles comprising at least one first adaptor sequence; annealing a primer to the adaptor sequence in the single stranded circles; extending the primer with a DNA polymerase to generate nucleic acid strands comprising multiple copies of the target nucleic acid; generating copy strands of the nucleic acid strands forming concatemers comprising multiple copies of the target nucleic acid; ligating second adaptors to the concatemers, wherein one strand of the second adaptor comprises a sequencing primer binding site, thereby forming a library of concatenated nucleic acid templates for sequencing. The method may comprise a step of amplifying the adapted target nucleic acids. The amplification primers for the library may comprise sequences at least partially complementary to the adaptors. The adaptors may comprise modifications for capturing a strand of single stranded adapted target nucleic acids, such as a 5'-phosphate group or a ligand for a capture molecule. The method may comprise a step of removing one of the two single strands of the adapted target nucleic acids, e.g., by exonuclease digestion or affinity capture. The first adaptor may comprise at least one barcode such as a unique molecular ID (UID) and a sample ID (SID). The circularization step may comprise the use of a circularization probe. The method may comprise a step of removing uncircularized nucleic acids. The primer for the library may be a collection of one or more random sequences such as a mixture of random hexamers. The DNA polymerase may be a strand displacing polymerase. The amplification for the library may be multiple displacement amplification (MDA) or its variation rolling circle amplification (RCA). The method may further comprise a step of fragmenting the concatemer. The method may further comprise a step of debranching the concatemer.
In some embodiments, the invention is a method of forming a concatenated nucleic acid template for analysis, comprising the steps of: separating the strands of a target nucleic acid in a sample to form a single stranded target nucleic acid; circularizing the single stranded target nucleic acid to form a single stranded circle; annealing a primer to the single stranded circle; extending the primer with a DNA polymerase to generate a nucleic acid strand comprising multiple copies of the target nucleic acid; generating a copy strand of the nucleic acid strand forming a concatemer comprising multiple copies of the target nucleic acid; ligating an adaptor to the concatemer wherein one strand of the adaptor comprises a primer binding site thereby forming a concatenated nucleic acid template for analysis. The method may further comprise a step of fragmenting the concatemers. The method may further comprise a step of debranching the concatemers.
In some embodiments, the invention is a method of determining the sequence of a double-stranded target nucleic acid in a sample comprising: forming a concatenated nucleic acid template by any of the methods set forth above; contacting the sample with a sequencing primer complementary to the primer binding site in the second adaptor; and extending the sequencing primer with a nucleic acid polymerase thereby determining the sequence of the target nucleic acid. The sequence may be determined by a method utilizing a nanopore.
BRIEF DESCRIPTION OF THE DRAWINGS
Figure 1 shows a general scheme of the Linear Consensus Sequencing method. Figure 2 is a detailed diagram of the sequencing step. Figure 3 is a diagram of a consensus read obtained from subreads. Figure 4 shows a reduction in error rate through consensus sequencing.
DETAILED DESCRIPTION OF THE INVENTION
Definitions
The following definitions aid in understanding of this disclosure.
The term "sample" refers to any composition containing or presumed to contain target nucleic acid. This includes a sample of tissue or fluid isolated from an individual, for example, skin, plasma, serum, spinal fluid, lymph fluid, synovial fluid, urine, tears, blood cells, organs and tumors, and also samples of in vitro cultures established from cells taken from an individual, including formalin-fixed paraffin embedded tissues (FFPET) and nucleic acids isolated therefrom. A sample may also include cell-free material, such as a cell-free blood fraction that contains cell-free DNA (cfDNA) or circulating tumor DNA (ctDNA).
The term "nucleic acid" refers to polymers of nucleotides (e.g., ribonucleotides and deoxyribonucleotides, both natural and non-natural) including DNA, RNA, and their subcategories, such as cDNA, mRNA, etc. A nucleic acid may be single-stranded or double-stranded and will generally contain 5'-3' phosphodiester bonds, although in some cases, nucleotide analogs may have other linkages. Nucleic acids may include naturally occurring bases (adenine, guanine, cytosine, uracil and thymine) as well as non-natural bases. Some examples of non-natural bases include those described in, e.g., Seela et al. (1999) Helv. Chim. Acta 82:1640. The non-natural bases may have a particular function, e.g., increasing the stability of the nucleic acid duplex, inhibiting nuclease digestion or blocking primer extension or strand polymerization.
The terms "polynucleotide" and "oligonucleotide" are used interchangeably. Polynucleotide is a single-stranded or a double-stranded nucleic acid. Oligonucleotide is a term sometimes used to describe a shorter polynucleotide. Oligonucleotides are prepared by any suitable method known in the art, for example, by a method involving direct chemical synthesis as described in Narang et al. (1979) Meth. Enzymol. 68:90-99; Brown et al. (1979) Meth. Enzymol. 68:109-151; Beaucage et al. (1981) Tetrahedron Lett. 22:1859-1862; Matteucci et al. (1981) J. Am. Chem. Soc. 103:3185-3191.
The term "primer" refers to a single-stranded oligonucleotide which hybridizes with a sequence in the target nucleic acid ("primer binding site") and is capable of acting as a point of initiation of synthesis along a complementary strand of nucleic acid under conditions suitable for such synthesis.
The term "adaptor" means a nucleotide sequence that may be added to another sequence so as to impart additional properties to that sequence. An adaptor is typically an oligonucleotide that can be single- or double-stranded, or may have both a single-stranded portion and a double-stranded portion.
The term "ligation" refers to a condensation reaction joining two nucleic acid strands wherein a 5'-phosphate group of one molecule reacts with the 3'-hydroxyl group of another molecule. Ligation is typically an enzymatic reaction catalyzed by a ligase or a topoisomerase. Ligation may join two single strands to create one single-stranded molecule. Ligation may also join two strands each belonging to a double-stranded molecule thus joining two double-stranded molecules. Ligation may also join both strands of a double-stranded molecule to both strands of another double-stranded molecule thus joining two double-stranded molecules. Ligation may also join two ends of a strand within a double-stranded molecule thus repairing a nick in the double-stranded molecule.
The term "barcode" refers to a nucleic acid sequence that can be detected and identified. Barcodes can be incorporated into various nucleic acids. Barcodes are sufficiently long e.g., 2, 5, 20 nucleotides, so that in a sample, the nucleic acids incorporating the barcodes can be distinguished or grouped according to the barcodes.
The term "multiplex identifier" or "MID" refers to a barcode that identifies the source of a target nucleic acid (e.g., a sample from which the nucleic acid is derived). All or substantially all the target nucleic acids from the same sample will share the same MID. Target nucleic acids from different sources or samples can be mixed and sequenced simultaneously. Using the MIDs, the sequence reads can be assigned to the individual samples from which the target nucleic acids originated.
The term "unique molecular identifier" or "UID" refers to a barcode that identifies a nucleic acid to which it is attached. All or substantially all the target nucleic acids from the same sample will have different UIDs. All or substantially all of the progeny (e.g., amplicons) derived from the same original target nucleic acid will share the same UID.
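The roles of the MID and the UID can be illustrated with a minimal sketch that assigns reads to samples by MID and then groups them by UID within each sample. The read layout (MID, then UID, then insert, each barcode of fixed length), the function name `group_by_barcodes` and the toy reads are assumptions made for illustration only; the source does not specify where the barcodes sit in a read.

```python
from collections import defaultdict


def group_by_barcodes(reads, mid_len, uid_len):
    """Return {MID: {UID: [insert, ...]}} for reads laid out as MID + UID + insert."""
    groups = defaultdict(lambda: defaultdict(list))
    for read in reads:
        if len(read) < mid_len + uid_len:
            continue  # too short to carry both barcodes
        mid = read[:mid_len]
        uid = read[mid_len:mid_len + uid_len]
        groups[mid][uid].append(read[mid_len + uid_len:])
    # Convert to plain dicts so missing keys raise instead of being created.
    return {mid: dict(uids) for mid, uids in groups.items()}


# Hypothetical reads: 2-nt MID, 3-nt UID, then the insert.
reads = ["AAGGTACGT", "AAGGTACGT", "AACCTACGA", "TTGGTACGT"]
groups = group_by_barcodes(reads, mid_len=2, uid_len=3)

print(sorted(groups))             # ['AA', 'TT']
print(len(groups["AA"]))          # 2
print(groups["AA"]["GGT"])        # ['ACGT', 'ACGT']
```

In this toy data, sample `AA` contains two distinct UIDs and therefore two original molecules, one of which was read twice.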
The terms "universal primer" and "universal priming binding site" or "universal priming site" refer to a primer and primer binding site present in (typically, through in vitro addition to) different target nucleic acids. The universal priming site is added to the plurality of target nucleic acids using adaptors or using target-specific (non-universal) primers having the universal priming site in the 5'-portion. The universal primer can bind to and direct primer extension from the universal priming site.
More generally, the term "universal" refers to a nucleic acid molecule (e.g., primer or other oligonucleotide) that can be added to any target nucleic acid and perform its function irrespective of the target nucleic acid sequence. The universal molecule may perform its function by hybridizing to the complement, e.g., a universal primer to a universal primer binding site or a universal circularization oligonucleotide to a universal primer sequence.
As used herein, the terms "target sequence", "target nucleic acid" or "target" refer to a portion of the nucleic acid sequence in the sample which is to be detected or analyzed. The term target includes all variants of the target sequence, e.g., one or more mutant variants and the wild type variant.
The term "amplification" refers to a process of making additional copies of the target nucleic acid. Amplification can have more than one cycle, e.g., multiple cycles of exponential amplification. Amplification may also have only one cycle (making a single copy of the target nucleic acid). The copy may have additional sequences, e.g., those present in the primers used for amplification.
The term "sequencing" refers to any method of determining the sequence of nucleotides in the target nucleic acid.
The invention is an improved method of generating templates for single-molecule sequencing. Consensus sequencing on single molecule sequencers generally relies on sequencing a circular template such that the target molecule is read multiple times in a single long polymerase read. The present invention uncouples this process: a circular template is produced first and subjected to rolling circle amplification (RCA) or multiple displacement amplification (MDA) to produce a long (longer than the read length of the sequencers) double stranded product. Advantageously, RCA and MDA utilize a proofreading, strand-displacing enzyme to yield a product which contains many copies of the target molecule. The copies may be interspersed with adapter sequences. The product is linearized and sequenced on a single molecule sequencer in a linear manner (single pass). In this process, the original target sequence is read multiple times without the need to sequence it in a circular manner. The parsing of sub-reads and identification of unique molecules is aided by the addition of an intervening adaptor sequence. The adaptor may contain a constant or variable unique region (barcode). The resulting long concatemer molecule consists of sense or antisense (not both) repeats of the original target molecule.
The advantages include low target molecule input. The MDA reaction is capable of generating microgram quantities of DNA from a very small amount of template. This method is suitable for clinical applications such as analysis of cell-free DNA (cfDNA, including circulating tumor DNA, ctDNA, and cell-free fetal DNA). Another advantage is high fidelity stemming from the use of a proofreading DNA polymerase to copy the circularized molecule. Yet another advantage is obviating the need for barcodes such as UIDs to track the copies of an original molecule (as is the case with PCR amplification). In the case of RCA/MDA, the copies of the original molecule are joined in the concatemer. Yet another advantage is the ability to adjust the length of the template to meet the needs of the sequencer. The concatemer molecule can be sheared to the desired length as dictated by the sequencing read length and the original insert size. The steps of the method are shown in a diagram in Figure 1 and described in detail below.
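The relationship between shearing length, read length and insert size can be made concrete with a small arithmetic sketch. It assumes each repeat unit of the concatemer is one insert plus one intervening adaptor, and it treats the number of whole repeat units that fit in a fragment as an upper bound on the number of complete sub-reads. The function names and the 40-nt adaptor length are hypothetical; the 200 bp insert size is taken from the example library.

```python
def max_repeat_units(fragment_len, insert_len, adaptor_len):
    """Upper bound on whole (insert + adaptor) repeat units in a sheared fragment."""
    unit = insert_len + adaptor_len
    if unit <= 0:
        raise ValueError("repeat unit length must be positive")
    return fragment_len // unit


def shear_length(desired_units, insert_len, adaptor_len, max_read_len):
    """Fragment length holding the desired repeat units, capped at the read length."""
    return min(desired_units * (insert_len + adaptor_len), max_read_len)


# 200 bp insert (as in the example library) and a hypothetical 40-nt adaptor.
print(max_repeat_units(5000, 200, 40))     # 20
print(shear_length(10, 200, 40, 10000))    # 2400
print(shear_length(50, 200, 40, 10000))    # 10000
```

The last call shows the cap: fifty repeat units would need 12,000 nt, so the fragment length is limited by the assumed 10,000-nt read length instead.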
The present invention comprises detecting a target nucleic acid in a sample. In some embodiments, the sample is derived from a subject or a patient. In some embodiments the sample may comprise a fragment of a solid tissue or a solid tumor derived from the subject or the patient, e.g., by biopsy. The sample may also comprise body fluids (e.g., urine, sputum, serum, plasma, lymph, saliva, sweat, tears, cerebrospinal fluid, amniotic fluid, synovial fluid, pericardial fluid, peritoneal fluid, pleural fluid, cystic fluid, bile, gastric fluid, intestinal fluid, and/or fecal samples). The sample may comprise whole blood or blood fractions where tumor cells may be present. In some embodiments, the sample, especially a liquid sample, may comprise cell-free material such as cell-free DNA or RNA including cell-free tumor DNA or tumor RNA. The present invention is especially suitable for analyzing rare and low quantity targets. In some embodiments, the sample is a cell-free sample, e.g., a cell-free blood-derived sample where cell-free tumor DNA or tumor RNA are present. In other embodiments, the sample is a cultured sample, e.g., a culture or culture supernatant containing or suspected to contain an infectious agent or nucleic acids derived from the infectious agent. In some embodiments, the infectious agent is a bacterium, a protozoan, a virus or a mycoplasma.
A target nucleic acid is the nucleic acid of interest that may be present in the sample. In some embodiments, the target nucleic acid is a gene or a gene fragment. In other embodiments, the target nucleic acid contains a genetic variant, e.g., a polymorphism, including a single nucleotide polymorphism or variant (SNP or SNV), or a genetic rearrangement resulting, e.g., in a gene fusion. In some embodiments, the target nucleic acid comprises a biomarker. In other embodiments, the target nucleic acid is characteristic of a particular organism, e.g., aids in identification of the pathogenic organism or a characteristic of the pathogenic organism, e.g., drug sensitivity or drug resistance. In yet other embodiments, the target nucleic acid is characteristic of a human subject, e.g., the HLA or KIR sequence defining the subject's unique HLA or KIR genotype. In yet other embodiments, all the sequences in the sample are target nucleic acids, e.g., in shotgun genomic sequencing.
In an embodiment of the invention, a double-stranded target nucleic acid is converted into the template configuration of the invention. In some embodiments, the target nucleic acid occurs in nature in a single-stranded form (e.g., RNA, including mRNA, micro RNA, viral RNA; or single-stranded viral DNA). The single-stranded target nucleic acid is converted into double-stranded form to enable the further steps of the claimed method.
Longer target nucleic acids may be fragmented, although in some applications longer target nucleic acids may be desired to achieve a longer read. In some embodiments, the target nucleic acid is naturally fragmented, e.g., circulating cell-free DNA (cfDNA) or chemically degraded DNA such as that found in preserved samples. In other embodiments, the target nucleic acid is fragmented in vitro, e.g., by physical means such as sonication or by endonuclease digestion, e.g., restriction digestion.
In some embodiments, the invention is a method comprising a step of amplifying the target nucleic acid. In some embodiments, PCR amplification is used to target a specific genomic region for further analysis of the template prepared by the method of the invention. The amplification may be by polymerase chain reaction (PCR) or any other method that utilizes oligonucleotide primers. In some embodiments, the amplification primers are used to introduce auxiliary sequences into the target nucleic acid. In some embodiments, the amplification primers are used to introduce DNA modifications, e.g., those to enable exonuclease digestion or target capture, e.g., with an affinity molecule such as streptavidin capturing biotinylated nucleic acids. Various PCR conditions are described in PCR Strategies (M. A. Innis, D. H. Gelfand, and J. J. Sninsky eds., 1995, Academic Press, San Diego, CA) at Chapter 14; PCR Protocols: A Guide to Methods and Applications (M. A. Innis, D. H. Gelfand, J. J. Sninsky, and T. J. White eds., Academic Press, NY, 1990).
Typically, the target-specific primers are used as a pair of distinct oligonucleotides, e.g., a forward and a reverse primer. For subsequent steps, a sequence can be added to the 5'-end of the forward and the reverse primer. In some embodiments, a universal sequence comprising an adaptor is added.
The method may also utilize an adaptor such as a universal adaptor sequence that is conjugated to the target sequence. In some embodiments, the adaptor comprises a registration sequence (a known sequence that can be used bioinformatically to separate out the iterations of the target sequence within the concatemer). In other embodiments, the adaptor can be used to incorporate a modification that can be used to enrich for a single stranded template prior to circularization. In some embodiments, the modification is a ligand for a capture moiety, e.g., biotin. In other embodiments, the modification is a 5'-phosphorylation protecting the strand from exonuclease degradation that removes the complementary strand without the modification.
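The use of a registration sequence to separate the iterations of the target within a concatemer read can be sketched as follows. This is a minimal illustration that assumes the registration sequence appears in the read without sequencing errors, so exact string matching is enough; real single-molecule reads would need error-tolerant matching. The function name `split_subreads`, the 4-nt registration sequence and the toy read are hypothetical.

```python
def split_subreads(read, registration):
    """Split a concatemer read into sub-reads at exact copies of the registration sequence."""
    if not registration:
        raise ValueError("registration sequence must be non-empty")
    pieces = read.split(registration)
    # The first and last pieces are usually partial copies cut off by shearing,
    # so keep only pieces bounded by a registration sequence on both sides.
    return [piece for piece in pieces[1:-1] if piece]


# Hypothetical read: partial copy, two full copies of ATTAGA, partial copy.
registration = "CCGG"
read = "GA" + "CCGG" + "ATTAGA" + "CCGG" + "ATTAGA" + "CCGG" + "ATT"

print(split_subreads(read, registration))  # ['ATTAGA', 'ATTAGA']
```

Discarding the two end pieces is a design choice of this sketch: it trades a small amount of data for sub-reads that are known to span the full insert.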
The adaptor may also comprise primer binding sites such as amplification primer binding sites and sequencing primer binding sites. In some embodiments, the modifications described above (e.g., biotin or 5'-phosphate) are introduced using the primer binding to the primer binding site. In some embodiments, adaptors are added independent of the sequence of the target nucleic acid, for example, by ligation. In such embodiments, the target nucleic acids in a sample receive the same adaptor molecule at each end. To distinguish the strands of the resulting adapted target nucleic acid, the adaptor may have a Y-structure, see e.g., U.S. Patent Nos. 8053192, 8182989 and 8822150.
In some embodiments of the present invention, the adaptor molecules are ligated to the target nucleic acid. The ligation can be a blunt-end ligation or a more efficient cohesive-end ligation. The target nucleic acid or the adaptors may be rendered blunt-ended by strand-filling, i.e., extending a 3'-terminus by a DNA polymerase to eliminate a 5'-overhang. In some embodiments, the blunt-ended adaptors and target nucleic acid may be rendered cohesive by addition of a single nucleotide to the 3'-end of the adaptor and a single complementary nucleotide to the 3'-ends of the target nucleic acid, e.g., by a DNA polymerase or a terminal transferase. In yet other embodiments, the adaptors and the target nucleic acid may acquire cohesive ends (overhangs) by digestion with restriction endonucleases. The latter option is more advantageous for known target sequences that are known to contain the restriction enzyme recognition site. In some embodiments, other enzymatic steps may be required to accomplish the ligation. In some embodiments, a polynucleotide kinase may be used to add 5'-phosphates to the target nucleic acid molecules and adaptor molecules.
In other embodiments, adaptors are present in the 5'-regions of target-specific primers and are added via primer extension or amplification.
In some embodiments, the adaptor molecules are in vitro synthesized artificial sequences. In other embodiments, the adaptor molecules are in vitro synthesized naturally-occurring sequences. In yet other embodiments, the adaptor molecules are isolated naturally occurring molecules.
In some embodiments, the invention comprises introduction of barcodes into the target nucleic acids. Sequencing individual molecules typically requires molecular barcodes such as described e.g., in U.S. Patent Nos. 7,393,665, 8,168,385, 8,481,292, 8,685,678, and 8,722,368. A unique molecular barcode is a short artificial sequence added to each molecule in a sample such as a patient's sample typically during the earliest steps of in vitro manipulations. The barcode marks the molecule and its progeny. The unique molecular barcode (UID) has multiple uses. Barcodes allow tracking each individual nucleic acid molecule in the sample to assess, e.g., the presence and amount of circulating tumor DNA (ctDNA) molecules in a patient's blood in order to detect and monitor cancer without a biopsy. See U.S. patent applications 14/209,807 and 14/774,518. Unique molecular barcodes can also be used for sequencing error correction. The entire progeny of a single target molecule is marked with the same barcode and forms a barcoded family. A variation in the sequence not shared by all members of the barcoded family is discarded as an artifact and not a true mutation. Barcodes can also be used for positional deduplication and target quantification, as the entire family represents a single molecule in the original sample. See Id.
In the context of the present invention, barcodes also serve as registration sequences separating copies of the target nucleic acids within concatemers described herein.
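The error-correction rule described above (a variation not shared by all members of a barcoded family is discarded as an artifact) can be illustrated with a minimal Python sketch. It assumes the reads of one family are already aligned to the reference and have equal length; the function name `shared_variants` is hypothetical and not part of the described method.

```python
def shared_variants(family_seqs, reference):
    """Return {position: base} for variants carried by every member of one
    barcoded family. Assumes all sequences are pre-aligned to `reference`
    and have the same length (a simplifying assumption)."""
    variants = {}
    for pos, ref_base in enumerate(reference):
        # Collect the distinct bases observed at this position in the family.
        bases = {seq[pos] for seq in family_seqs}
        if len(bases) == 1:
            base = next(iter(bases))
            if base != ref_base:
                # Shared by all members: kept as a candidate true variant.
                variants[pos] = base
        # Positions where members disagree are treated as artifacts.
    return variants


reference = "ACGTACGT"
family = ["ACGTTCGT", "ACGTTCGT", "ACGATCGT"]
print(shared_variants(family, reference))
```

In this toy family, the change at position 4 is present in every member and is retained, while the change at position 3 appears in only one member and is dropped.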
In some embodiments of the present invention, adaptors comprise one or more barcodes. A barcode can be a multiplex sample ID (MID) used to identify the source of the sample where samples are mixed (multiplexed). The barcode may also serve as a unique molecular ID (UID) used to identify each original molecule and its progeny. The barcode may also be a combination of a UID and an MID. In some embodiments, a single barcode is used as both UID and MID.
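As a rough guide to barcode capacity, a random barcode of n nucleotides can take at most 4^n distinct values. The Python sketch below computes that bound and the shortest length whose bound covers a given number of molecules. The function names are hypothetical, and the calculation ignores barcode collisions and synthesis bias, so it is a lower bound on the length needed in practice.

```python
def barcode_diversity(length):
    """Maximum number of distinct random barcodes of a given length."""
    return 4 ** length


def min_length_for(n_molecules):
    """Shortest barcode length whose diversity is at least n_molecules.
    Ignores collisions, so real designs would typically use more bases."""
    length = 1
    while barcode_diversity(length) < n_molecules:
        length += 1
    return length


print(barcode_diversity(6))
print(min_length_for(1000))
```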
In some embodiments, each barcode comprises a predefined sequence. In other embodiments, the barcode comprises a random sequence. Barcodes can be 1-20 nucleotides long.
The method further comprises a step of separating the strands of the target nucleic acid or the adapted target nucleic acid. In some embodiments, both of the separated strands are retained for downstream analysis, e.g., sequencing. The two strands may be separated by physical means, e.g., alkaline denaturation or heat denaturation. The strands can also be separated enzymatically, e.g., by selective degradation of one strand by a nuclease such as an endo- or an exonuclease or a combination thereof. For example, one strand can be marked by the presence of deoxyuracils that are converted to abasic sites by uracil-N-DNA glycosylase (UNG). The same strand is subsequently degraded by heat with an optional aid of an exonuclease. In other embodiments, one strand is retained for further analysis via affinity capture. For example, a biotinylated strand may be captured on a streptavidin containing substrate after denaturation and retained while the complementary strand may not be retained or be discarded.
The method further comprises a step of circularizing a single stranded nucleic acid (see Figure 1). In some embodiments, the method comprises direct ligation of the ends of the single stranded nucleic acid. The ligation step utilizes a ligase or another enzyme with a similar activity or a non-enzymatic reagent. The ligase can be a DNA or RNA ligase, e.g., of viral or bacterial origin such as T4 or E. coli ligase, or thermostable ligases Afu, Taq, Tfl or Tth. In some embodiments, an alternative enzyme, e.g., topoisomerase can be used. Further, a non-enzymatic reagent can be used to form the phosphodiester bond between the 5'-phosphate of the primer extension product and the 3'-OH of the adaptor as described e.g., in US20140193860. In some embodiments, the ligase is a single stranded DNA ligase such as one available in Accel-NGS™ 1S DNA Library Kit (Swift Biosciences, Ann Arbor, Mich.) or Thermophage Ligase or its derivatives such as Circligase™ (Epicentre Tech., Madison, Wisc.).
In some embodiments, the circularization step utilizes a circularization probe. In some embodiments, the circularization probe is complementary to the adaptor sequences. The step includes contacting the 5'-phosphorylated strand of the adapted target nucleic acid with a circularization oligonucleotide (probe) to generate a hybrid structure wherein the adaptor sequences in the adapted target nucleic acid strand are hybridized to the circularization probe so that the ends of the strand are brought into proximity. In some embodiments, the strands are brought into ligatable proximity. In other embodiments, a gap is left between the 5'- and 3'-ends of the target nucleic acid strand. In such embodiments, the circularization further comprises an extension step wherein the 3'-end of the adapted target nucleic acid is extended to come into ligatable proximity with its 5'-end.
The invention further comprises a ligation step comprising ligating the 5'- and 3'-ends of the adapted target nucleic acid thereby forming a circular single stranded molecule.
In some embodiments, the invention comprises an exonuclease digestion step wherein the linear nucleic acids possibly comprising uncircularized target nucleic acids or adapted target nucleic acids or any oligonucleotides present in the reaction mixture are removed from the reaction mixture. In some embodiments, the exonuclease is a bacterial exonuclease such as E. coli exonuclease, e.g., Exo V, Exo III, Exo VI, Exo I, T5 exonuclease, T7 exonuclease or Lambda exonuclease or a combination thereof.
The invention comprises a primer extension step. Following the formation of a circle, a primer may anneal to the primer binding site to initiate strand synthesis by primer extension. In some embodiments, a gene-specific primer is used. In other embodiments, a universal primer (e.g., a primer with a binding site present in the adaptor sequence) can be used. In yet other embodiments, random priming, e.g., with a plurality of random oligonucleotides, is used. In some embodiments, the priming utilizes a plurality of random hexamers.
In some embodiments, a strand displacing polymerase is used to perform rolling circle amplification (RCA) comprising synthesis of a strand comprising multiple iterations of a sequence complementary to a circular molecule from a single primer. Extension of a reverse primer binding to the primer binding site in the adaptor sequence in the nascent strand is used to synthesize a second strand forming a double-stranded copy molecule. In some embodiments, the DNA polymerase used for RCA is a strand displacing polymerase. In some embodiments, the DNA polymerase has a 3'-5' exonuclease activity. In some embodiments, the polymerase is a viral polymerase, e.g., phi29 DNA polymerase.
In some embodiments, a strand displacing polymerase is used to perform a multiple displacement amplification (MDA) from multiple primers annealing throughout the template strand. In some embodiments, the primers are a collection of oligonucleotides. In some embodiments, the oligonucleotides in the collection have a random sequence, e.g., the primers are a collection of random hexamers. In some embodiments, the primers have modifications that prevent their degradation by a 5'-3' exonuclease, e.g., the exonuclease activity of the DNA polymerase. In the reaction mixture, the same collection of primers primes synthesis of a second strand forming a double-stranded copy molecule. In some embodiments, the DNA polymerase used for RCA is a strand displacing polymerase. In some embodiments, the DNA polymerase has a 3'-5' exonuclease activity. In some embodiments, the polymerase is a viral polymerase, e.g., phi29 DNA polymerase.
In some embodiments, a combination of RCA and MDA is used.
The invention comprises a step of forming a concatemer. In the context of the present invention, a concatemer is a nucleic acid containing multiple copies of a particular sequence (such as a target sequence) arranged in tandem and separated by a registration sequence (such as an adaptor). In some embodiments, the concatemer is a product of rolling circle amplification (RCA). In other embodiments, the concatemer is a product of multiple displacement amplification (MDA). In some embodiments, the concatemer is a product of a combination of RCA and MDA. In some embodiments, the linear concatemers are reduced in size by a method selected from physical shearing (heat, sonication, size exclusion chromatography, electrophoresis, hydrodynamic shearing), enzymatic digestion (partial restriction digestion, DNase I digestion, transposase fragmentation, any non-specific nuclease) or chemical shearing, e.g., treatment with divalent metal cations. In some embodiments, the amplification products including MDA products are debranched. The debranching may comprise treatment of the strand displacing polymerase products with an additional enzyme such as a single strand-specific endonuclease, e.g., S1 endonuclease.
The present invention further comprises a step of ligating a second adaptor to the ends of the concatemer comprising multiple copies of the adapted target nucleic acid. As with the ligation of the first adaptor used in the prior step, the ligation of the second adaptor can be a blunt-end or a cohesive-end ligation. The target nucleic acid or the adaptors may be rendered blunt-ended by strand-filling, i.e., extending a 3'-terminus by a DNA polymerase to eliminate a 5'-overhang. In some embodiments, the blunt-ended adaptors and target nucleic acid may be rendered cohesive by addition of a single nucleotide to the 3'-end of the adaptor and a single complementary nucleotide to the 3'-ends of the target nucleic acid, e.g., by a DNA polymerase or a terminal transferase. In yet other embodiments, the adaptors and the target nucleic acid may acquire cohesive ends (overhangs) by digestion with restriction endonucleases. In some embodiments, a polynucleotide kinase may be used to add 5'-phosphates to the target nucleic acid molecules and adaptor molecules.
In some embodiments, the adaptor comprises a single-stranded region. The single stranded region may comprise two non-complementary strands (Y-adaptor) or comprise only one strand (asymmetric adaptor) as shown in Figure 1, where one strand is longer.
In some embodiments, the invention is a method of making a library of target nucleic acids. Specifically, the library comprises linear concatemers of target nucleic acids separated by a registration sequence. In some embodiments, the library is suitable for sequencing of the target nucleic acids. The method comprises a step wherein first adaptors are ligated to the multiple target nucleic acids in the sample to create a library of adapted molecules. The molecules in the library comprise target sequences flanked by adaptor sequences. The adapted target nucleic acids are circularized as described herein to form a library of circular adapted target nucleic acids. The library is then subjected to primer extension to form a library of concatemers each comprising multiple copies of an adapted target nucleic acid. In some embodiments, the library of linear concatemers is adjusted for size as described herein. The method further comprises a step wherein second adaptors are ligated to the library of concatemers. The final product is a library of concatemers of adapted target nucleic acids flanked by second adaptors. Each molecule in the library comprises multiple iterations of the target nucleic acid separated by registration sequences. The registration sequence comprises all or a part of the sequence of the first adaptor.
In some embodiments, the present invention comprises detecting target nucleic acids in a sample by nucleic acid sequencing. Multiple nucleic acids, including all the nucleic acids in a sample, may be converted into the template configuration of the invention and sequenced. In some embodiments, the library of concatemers of target nucleic acids can be subjected to nucleic acid sequencing. The single stranded region of the second adaptor comprises a primer binding site, e.g., a sequencing primer binding site. As shown in Figure 1, the primer binding sites can initiate a sequencing read of each strand. The sequencing read will contain multiple copies of each strand of the target sequence, e.g., the read of the top strand and the bottom strand of the concatemer. Thereby each read will contain multiple iterations of the (+) strand and the (-) strand of the target nucleic acid.
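The relationship between the two strands of a double-stranded concatemer can be modelled in silico. The Python sketch below is an illustration only: it builds a top strand as tandem copies of a registration sequence followed by a target sequence, and derives the bottom strand as its reverse complement, so that a read of either strand carries repeated copies of one orientation of the target. The function names and the toy sequences are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def concatemer_strands(target, registration, copies):
    """Return (top, bottom) strands, both written 5'->3', of a model
    concatemer made of `copies` tandem registration+target units."""
    top = (registration + target) * copies
    bottom = reverse_complement(top)
    return top, bottom


top, bottom = concatemer_strands("AACG", "CCA", 3)
print(top)
print(bottom)
# The top strand carries the target as given; the bottom strand carries
# its reverse complement the same number of times.
print(top.count("AACG"), bottom.count(reverse_complement("AACG")))
```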
Sequencing can be performed by any method known in the art. Especially advantageous is the high-throughput single molecule sequencing. Examples of such technologies include the Illumina HiSeq platform (Illumina, San Diego, Cal.), Ion Torrent platform (Life Technologies, Grand Island, NY), Pacific Biosciences platform utilizing the SMRT (Pacific Biosciences, Menlo Park, Cal.) or a platform utilizing nanopore technology such as those manufactured by Oxford Nanopore Technologies (Oxford, UK) or Roche Sequencing Solutions (Santa Clara, Cal.) and any other presently existing or future DNA sequencing technology that does or does not involve sequencing by synthesis. The sequencing step may utilize platform-specific sequencing primers. Binding sites for these primers may be introduced in the method of the invention as described herein, i.e., by being a part of second adaptors or amplification primers.
Analysis and error correction
In some embodiments, the invention comprises a step of determining a consensus sequence. Sequencing of concatemer library molecules produces reads which contain tandemly repeated copies of the target molecule (sub-reads). The sub-read sequences can be used to compute a high-accuracy consensus sequence of the original target nucleic acid molecule (Figure 3). Figure 4 demonstrates that as the number of sub-reads used to compute the consensus sequence increases, the error rate decreases drastically from over 5% to about 0.5%.
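The trend of falling error rate with more sub-reads can be illustrated with an idealized model. The Python sketch below is not the analysis behind Figure 4: it assumes each sub-read is wrong at a given position independently with the same probability p, and that a majority-vote consensus is wrong only when more than half of the n sub-reads are wrong. Real sequencing errors are neither independent nor uniform, so the numbers are illustrative. The function name `majority_error` is hypothetical.

```python
from math import comb


def majority_error(p, n):
    """Probability that more than half of n independent sub-reads are
    wrong at one position, given a per-sub-read error probability p."""
    k_min = n // 2 + 1  # smallest number of wrong sub-reads forming a majority
    return sum(
        comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(k_min, n + 1)
    )


for n in (1, 3, 5, 9):
    print(n, majority_error(0.05, n))
```

Under these assumptions a 5% raw error rate drops to 0.725% with three sub-reads, which is the same direction of effect the text describes.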
In some embodiments, the sequencing step involves sequence analysis including a step of sequence aligning. In some embodiments, aligning is used to determine a consensus sequence from a plurality of sequences, e.g., a plurality having the same barcodes (UID). In some embodiments barcodes (UIDs) are used to determine a consensus from a plurality of sequences all having an identical barcode (UID). In other embodiments, barcodes (UIDs) are used to eliminate artifacts, i.e., variations existing in some but not all sequences having an identical barcode (UID). Such artifacts resulting from PCR errors or sequencing errors can be eliminated. In some embodiments, the number of each sequence in the sample can be quantified by quantifying relative numbers of sequences with each barcode (UID) in the sample. Each UID represents a single molecule in the original sample and counting different UIDs associated with each sequence variant can determine the fraction of each sequence in the original sample. A person skilled in the art will be able to determine the number of sequence reads necessary to determine a consensus sequence. In some embodiments, the relevant number is reads per UID ("sequence depth") necessary for an accurate quantitative result. In some embodiments, the desired depth is 5-50 reads per UID.
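The UID-based quantification described above can be sketched in Python. The example assumes each read has already been reduced to a (UID, variant label) pair and that every UID maps to a single variant after consensus; the function names and the toy data are hypothetical.

```python
from collections import Counter, defaultdict


def reads_per_uid(reads):
    """Sequence depth: number of reads observed for each UID."""
    return Counter(uid for uid, _ in reads)


def variant_fractions(reads):
    """Fraction of original molecules carrying each variant, counting
    each distinct UID once regardless of how many reads it produced."""
    uids_per_variant = defaultdict(set)
    for uid, variant in reads:
        uids_per_variant[variant].add(uid)
    total = sum(len(uids) for uids in uids_per_variant.values())
    if total == 0:
        return {}
    return {v: len(uids) / total for v, uids in uids_per_variant.items()}


reads = [
    ("u1", "wt"), ("u1", "wt"), ("u2", "wt"),
    ("u3", "mut"), ("u3", "mut"), ("u3", "mut"), ("u4", "wt"),
]
print(reads_per_uid(reads))
print(variant_fractions(reads))
```

Counting UIDs rather than reads keeps a heavily amplified molecule (here `u3`) from inflating the apparent fraction of its variant.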
EXAMPLES
Example 1. DNA fragmentation and adapter ligation
Adapted target nucleic acids were prepared using the Kapa Hyper Plus kit (Roche Sequencing Solutions, Pleasanton, Cal.) according to the manufacturer's instructions. Input of E. coli genomic DNA was 500 ng per reaction in 35 µl water. The Kapa Hyper Plus library workflow was used for shearing, repair and A-tailing. DNA was sheared with Kapa Frag into fragments of approximately 300 bp.
Size selection and purification of the fragmented DNA was performed using Kapa Pure beads, with a 0.9X - 0.7X cut. The resulting library had an insert size of approximately 200 bp.
A-tailing was performed using the Kapa reagents and protocols. For adaptor annealing, oligonucleotides were suspended in annealing buffer (50 mM NaCl; 10 mM Tris, pH 8.0) to a concentration of 100 µM. Two oligos comprising the adapter were combined in equal molar amounts. The mixed oligonucleotides were heated to 94°C for 2 minutes and gradually cooled at 2% ramp to 20°C.
Adaptors
SEQ ID NO: 1 /5phos/GACACGACGCTCTTCCGATCT
SEQ ID NO: 2 /5phos/GATCGGAAGAGCACACGTCT
Annealed adaptor structure
/5phos/GACACGAC
               \
                GCTCTTCCGATCT -
                ||||||||||||
                CGAGAAGGCTAG /5phos/
               /
       TCTGCACA
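The geometry of the annealed Y-adaptor can be checked from the two oligonucleotide sequences alone. The Python sketch below takes SEQ ID NO: 1 and SEQ ID NO: 2 as listed above, finds the longest 5' portion of SEQ ID NO: 2 whose reverse complement occurs in SEQ ID NO: 1 (the double-stranded stem), and reports what remains 3' of the stem on SEQ ID NO: 1. The helper names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

SEQ_ID_1 = "GACACGACGCTCTTCCGATCT"
SEQ_ID_2 = "GATCGGAAGAGCACACGTCT"


def reverse_complement(seq):
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def stem_length(top, bottom):
    """Largest k such that the first k bases of `bottom` can base-pair
    with a contiguous stretch of `top`."""
    for k in range(len(bottom), 0, -1):
        if reverse_complement(bottom[:k]) in top:
            return k
    return 0


k = stem_length(SEQ_ID_1, SEQ_ID_2)
stem = reverse_complement(SEQ_ID_2[:k])
start = SEQ_ID_1.index(stem)
print("stem length:", k)
print("5' arm of SEQ ID NO: 1:", SEQ_ID_1[:start])
print("3' overhang of SEQ ID NO: 1:", SEQ_ID_1[start + k:])
print("3' arm of SEQ ID NO: 2:", SEQ_ID_2[k:])
```

The single unpaired 3' nucleotide on SEQ ID NO: 1 is consistent with ligation to A-tailed fragments, as used in this example.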
The ligation reaction was set up as follows and incubated at 20 °C for 15 minutes.
88 µl Kapa Pure, eluted in 10 µl 10 mM Tris pH 8)
Example 2. Circularization
This step comprises the denaturation of the template using heat, which renders the library molecules single stranded, followed by single stranded circularization using a single strand DNA-specific ligase. As a potential aid in denaturation we explored the use of thermostable single stranded binding protein such as that provided by NEB: ET SSB (M2401S). This specific protein is not required; any thermostable SSB (e.g., archaeal) would suffice. Epicentre CircLigase (CL4111K) or CircLigase II (CL9021K) (Epicentre, Madison, Wisc.) were used according to the manufacturer's instructions.
For the denaturation step, the adapter ligated DNA from the previous step was incubated with or without ET SSB protein added.
The reaction was incubated at 95°C for 15 minutes after which the reaction was cooled to 60°C and the CircLigase master mix added.
Circularization step was performed as follows:
Incubation was at 60°C for 2 hours, followed by 80°C for 10 minutes.
Example 3. Exonuclease digestion of linear molecules
This step uses single strand-specific (Exonuclease I) and double strand-specific (Exonuclease III) DNA exonucleases to remove all non-circularized molecules. The enzymes Exonuclease I (M0293S) and Exonuclease III (M0206S) from New England Biolabs were used according to the manufacturer's instructions.
The digestion was followed by Kapa Pure SPRI bead cleanup (add 40 µl Kapa Pure, eluted in 20 µl 10 mM Tris pH 8.0). This library is ready for single molecule sequencing.
Example 4. Random primer annealing and multiple displacement amplification (MDA)
The circular templates were diluted and phosphorothioate-protected random heptamers were annealed. The heptamers (exonuclease resistant random primers, ThermoFisher Scientific (S0181)) and phi29 DNA polymerase 10 U/µl (NEB M0269S) were used according to the manufacturers' instructions.
| Component | Volume | Concentration |
|---|---|---|
| Annealed DNA (or NTC) | 20 µl | |
| phi29 DNA polymerase | 2 µl | 20 U |
| phi29 polymerase buffer 10X | 5 µl | 1X |
| dNTP mix (10 mM each dNTP) | 1 µl | 0.2 mM each dNTP |
| BSA 100X (NEB) | 1 µl | 2X |
| water | 21 µl | |
| Total reaction volume | 50 µl | |
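The final concentrations in the reaction set-up follow from the stock concentrations and volumes by simple dilution arithmetic. The Python sketch below reproduces that arithmetic as a consistency check; the dictionary keys and the function name are chosen for illustration only.

```python
def final_concentration(stock, volume_ul, total_ul):
    """Final concentration after diluting `volume_ul` of a stock into a
    reaction of `total_ul` (same units as `stock`)."""
    return stock * volume_ul / total_ul


# Component volumes in microliters, as listed for the MDA reaction.
volumes_ul = {
    "annealed DNA (or NTC)": 20,
    "phi29 DNA polymerase": 2,
    "phi29 polymerase buffer 10X": 5,
    "dNTP mix": 1,
    "BSA 100X": 1,
    "water": 21,
}

total_ul = sum(volumes_ul.values())
print("total volume (ul):", total_ul)
print("buffer (X):", final_concentration(10, volumes_ul["phi29 polymerase buffer 10X"], total_ul))
print("each dNTP (mM):", final_concentration(10, volumes_ul["dNTP mix"], total_ul))
print("BSA (X):", final_concentration(100, volumes_ul["BSA 100X"], total_ul))
# Enzyme amount: 10 U/ul stock times the volume added.
print("phi29 polymerase (U):", 10 * volumes_ul["phi29 DNA polymerase"])
```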
Incubation was at 30°C for 4 hours followed by 10 minutes at 65°C, with the heated lid set to 80°C. The reaction was followed by Kapa Pure SPRI bead cleanup (add 50 µl Kapa Pure, eluted in 20 µl 10 mM Tris pH 8.0).
Example 5. Sequencing library preparation
This step comprises shearing the MDA product to the desired size followed by sequencing adapter ligation, using standard Kapa Hyper Prep chemistry. We sheared two micrograms of the MDA product using Covaris gTube to the desired insert size (e.g., 6 kb) according to the manufacturer's instructions. 150 µl of the sheared genomic DNA was recovered. Shearing was followed by Kapa Pure SPRI bead cleanup (add 90 µl Kapa Pure, eluted in 20 µl 10 mM Tris pH 8.0), and the eluate was diluted to 10 ng/µl.
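A back-of-the-envelope estimate of how many target copies a sheared fragment can hold is sketched below in Python. It rests on assumptions not stated in the example: that each repeat unit is one insert of about 200 bp (the insert size reported in Example 1) plus the full length of both first-adaptor oligonucleotides (SEQ ID NO: 1 and SEQ ID NO: 2), and that the fragment is exactly 6 kb. The function name is hypothetical and the result is only an order-of-magnitude guide.

```python
def copies_per_fragment(fragment_bp, insert_bp, adaptor_nt):
    """Whole repeat units (insert plus adaptor sequence) that fit in a
    sheared concatemer fragment of the given length."""
    unit = insert_bp + adaptor_nt
    return fragment_bp // unit


# Assumed adaptor contribution per unit: both first-adaptor oligos in full.
ADAPTOR_NT = len("GACACGACGCTCTTCCGATCT") + len("GATCGGAAGAGCACACGTCT")

print(copies_per_fragment(6000, 200, ADAPTOR_NT))
```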
For adaptor ligation, the DNA was A-tailed as follows:
Incubation was at 20°C for 30 minutes followed by 65°C for 30 minutes. To anneal the adaptors, oligonucleotides were suspended in annealing buffer (50 mM NaCl; 10 mM Tris, pH 8.0) to a concentration of 100 µM. Two oligos comprising the adaptor were combined in equal molar amounts. The mixed oligonucleotides were heated to 94°C for 2 minutes and gradually cooled at 2% ramp to 20°C.
Adaptors
SEQ ID NO: 3 /5phos/AATCTCTCTCTACTGACTGTCCTCCTCCTCC*G*T*T
SEQ ID NO: 4 /5phos/ATCTCTCTCTTTTCCTCCTCCTCCGTTGTTGTT*G*T*T
SEQ ID NO: 5 GAGAGAGATT
Annealed adaptor structure
                  TTTTCCTCCTCCTCCGTTGTTGTT*G*T*T 3'
                 /
/5phos/ ATCTCTCTC
    3' TTAGAGAGAG 5'
Adaptor ligation was performed as follows:
The reaction was incubated at 20 °C for 15 minutes followed by Kapa Pure SPRI bead cleanup (add 88 µl Kapa Pure, eluted in 10 µl 10 mM Tris pH 8). The library is ready for single molecule sequencing.
Example 6. Building a consensus sequence
Figure 3 shows how consensus analysis can be performed from the RCA concatemers to obtain higher accuracy. The sequencing polymerase reads through the concatemer that contains n subreads. Each subread is a copy of the initial DNA molecule and can be flanked by key registration sequences. The key registration sequences are used to parse out the subreads, which are then aligned against each other to create a consensus read. To obtain consensus reads, alignment was done with the Burrows-Wheeler Aligner (BWA). The error rate of the consensus was determined by calculating the percentage of mismatched bases against a reference sequence. Mutations that are not found in the majority of the subreads are discarded, resulting in a reduction of error rate (Figure 4).
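The parsing and consensus logic of this example can be sketched in simplified form in Python. This is not the BWA-based procedure used in the example: it assumes the registration sequence appears without errors in the read, splits on exact matches, keeps only subreads of the most common length instead of aligning them, and takes a column-wise majority vote. The function names and the toy read are hypothetical.

```python
from collections import Counter


def split_subreads(read, registration):
    """Split a concatemer read on exact registration-sequence matches.
    The first and last pieces are dropped because they may be partial."""
    parts = read.split(registration)
    return [p for p in parts[1:-1] if p]


def majority_consensus(subreads):
    """Column-wise majority vote over subreads of the most common length
    (a crude stand-in for a real multiple alignment)."""
    length = Counter(len(s) for s in subreads).most_common(1)[0][0]
    kept = [s for s in subreads if len(s) == length]
    return "".join(Counter(col).most_common(1)[0][0] for col in zip(*kept))


def mismatch_rate(seq, reference):
    """Fraction of positions at which two equal-length sequences differ."""
    return sum(a != b for a, b in zip(seq, reference)) / len(reference)


reg = "TTTTTT"
read = "GCA" + reg + "ACGTAGCA" + reg + "ACGTCGCA" + reg + "ACGTAGCA" + reg + "ACG"
subreads = split_subreads(read, reg)
consensus = majority_consensus(subreads)
print(subreads)
print(consensus, mismatch_rate(consensus, "ACGTAGCA"))
```

In the toy read, the single-base error in the second subread is outvoted by the other two copies, so the consensus matches the reference.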
Claims
PATENT CLAIMS
1. A method of forming a concatenated nucleic acid template for analysis, comprising the steps of:
a) ligating at least one end of a double stranded target nucleic acid to a first adaptor to form an adapted target nucleic acid;
b) separating the strands of the adapted target nucleic acid to form a single stranded adapted target nucleic acid;
c) circularizing the single stranded adapted target nucleic acid to form a single stranded circle comprising at least one first adaptor sequence;
d) annealing a primer to the single stranded circle;
e) extending the primer with a DNA polymerase to generate a nucleic acid strand comprising multiple copies of the target nucleic acid;
f) generating a copy strand of the nucleic acid strand from step e) forming a concatemer comprising multiple copies of the target nucleic acid;
g) ligating second adaptor to the concatemer wherein one strand of the second adaptor comprises a primer binding site thereby forming a concatenated nucleic acid template for analysis.
2. The method of claim 1, further comprising a step of amplifying the adapted target nucleic acid after step a).
3. The method of claim 1-2, wherein adaptors comprise modifications for capturing a strand of single stranded adapted target nucleic acids after step b).
4. The method of claim 1-3, wherein the first adaptor comprises at least one barcode.
5. The method of claim 1-4, wherein the circularization step comprises the use of a circularization probe.
6. The method of claim 1-5, further comprising a step of removing uncircularized nucleic acids after step c).
7. The method of claim 1-5, wherein the steps e) and f) comprise multiple displacement amplification (MDA).
8. The method of claim 1-5, further comprising the step of fragmenting or debranching the concatemer.
9. A method of making a library of concatenated nucleic acid templates for sequencing comprising:
a) ligating at least one end of double stranded target nucleic acids in a sample to a first adaptor to form adapted target nucleic acids;
b) separating the strands of the adapted target nucleic acids to form single stranded adapted target nucleic acids;
c) circularizing the single stranded adapted target nucleic acids to form single stranded circles comprising at least one first adaptor sequence;
d) annealing a primer to the adaptor sequence in the single stranded circles;
e) extending the primer with a DNA polymerase to generate nucleic acid strands comprising multiple copies of the target nucleic acid;
f) generating copy strands of the nucleic acid strands from step e) forming concatemers comprising multiple copies of the target nucleic acid;
g) ligating second adaptors to the concatemers wherein one strand of the second adaptor comprises a sequencing primer binding site thereby forming a library of concatenated nucleic acid templates for sequencing.
10. The method of claim 9, further comprising a step of amplifying the adapted target nucleic acids after step a).
11. The method of claim 9-10, wherein adaptors comprise modifications for capturing a strand of single stranded adapted target nucleic acids after step b).
12. The method of claim 9-11, wherein the first adaptor comprises at least one barcode.
13. The method of claim 9-12, wherein the circularization step comprises the use of a circularization probe.
14. The method of claim 9-13, further comprising a step of removing uncircularized nucleic acids after step c).
15. The method of claim 9-13, wherein the steps e) and f) comprise multiple displacement amplification (MDA).
16. The method of claim 9-13, further comprising a step of fragmenting or debranching the concatemers.
17. A method of forming a concatenated nucleic acid template for analysis, comprising the steps of:
a) separating the strands of a target nucleic acid in a sample to form a single stranded target nucleic acid;
b) circularizing the single stranded target nucleic acid to form a single stranded circle;
c) annealing a primer to the single stranded circle;
d) extending the primer with a DNA polymerase to generate a nucleic acid strand comprising multiple copies of the target nucleic acid;
e) generating a copy strand of the nucleic acid strand from step d) forming a concatemer comprising multiple copies of the target nucleic acid;
f) ligating an adaptor to the concatemer wherein one strand of the adaptor comprises a primer binding site thereby forming a concatenated nucleic acid template for analysis.
18. A method of determining the sequence of a double-stranded target nucleic acid in a sample comprising:
a) forming a concatenated nucleic acid template by a method of claim 1-8;
b) contacting the sample with a sequencing primer complementary to the primer binding site in the second adaptor; and
c) extending the sequencing primer with a nucleic acid polymerase thereby determining the sequence of the target nucleic acid.
Applications Claiming Priority (2)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| US201762581591P | 2017-11-03 | 2017-11-03 | |
| US62/581,591 | 2017-11-03 |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| WO2019086531A1 (en) | 2019-05-09 |
Family
ID=64049269
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| PCT/EP2018/079854 Ceased WO2019086531A1 (en) | 2017-11-03 | 2018-10-31 | Linear consensus sequencing |
Country Status (1)
| Country | Link |
|---|---|
| WO (1) | WO2019086531A1 (en) |
Cited By (8)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| EP3574109A4 (en) * | 2017-01-24 | 2020-10-14 | Tsavachidou, Dimitra | METHOD OF CONSTRUCTION OF COPIES OF NUCLEIC ACID MOLECULES |
| WO2021142769A1 (en) * | 2020-01-17 | 2021-07-22 | 深圳华大智造科技有限公司 | Method for synchronously sequencing sense strand and antisense strand of dna |
| WO2021180791A1 (en) * | 2020-03-11 | 2021-09-16 | F. Hoffmann-La Roche Ag | Novel nucleic acid template structure for sequencing |
| WO2022018055A1 (en) * | 2020-07-20 | 2022-01-27 | Westfälische Wilhelms-Universität Münster | Circulation method to sequence immune repertoires of individual cells |
| WO2022015600A3 (en) * | 2020-07-13 | 2022-03-03 | Singular Genomics Systems, Inc. | Methods of sequencing complementary polynucleotides |
| CN114672546A (en) * | 2020-12-24 | 2022-06-28 | 郑州思昆生物工程有限公司 | A kind of template polynucleotide paired-end sequencing method and kit |
| CN115516104A (en) * | 2020-03-03 | 2022-12-23 | 加利福尼亚太平洋生物科学股份有限公司 | Methods and compositions for sequencing double-stranded nucleic acids |
| EP4065706A4 (en) * | 2019-11-25 | 2024-01-17 | William Marsh Rice University | Linear dna assembly for nanopore sequencing |
Citations (11)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US20030068629A1 (en) * | 2001-03-21 | 2003-04-10 | Rothberg Jonathan M. | Apparatus and method for sequencing a nucleic acid |
| US7302146B2 (en) | 2004-09-17 | 2007-11-27 | Pacific Biosciences Of California, Inc. | Apparatus and method for analysis of molecules |
| US7393665B2 (en) | 2005-02-10 | 2008-07-01 | Population Genetics Technologies Ltd | Methods and compositions for tagging and identifying polynucleotides |
| WO2009094583A1 (en) * | 2008-01-23 | 2009-07-30 | Complete Genomics, Inc. | Methods and compositions for preventing bias in amplification and sequencing reactions |
| US8053192B2 (en) | 2007-02-02 | 2011-11-08 | Illumina Cambridge Ltd. | Methods for indexing samples and sequencing multiple polynucleotide templates |
| US8153375B2 (en) | 2008-03-28 | 2012-04-10 | Pacific Biosciences Of California, Inc. | Compositions and methods for nucleic acid sequencing |
| USRE44265E1 (en) | 2001-08-03 | 2013-06-04 | Olink Ab | Nucleic acid amplification method |
| US8481292B2 (en) | 2010-09-21 | 2013-07-09 | Population Genetics Technologies Litd. | Increasing confidence of allele calls with molecular counting |
| US20140193860A1 (en) | 2013-01-09 | 2014-07-10 | The Penn State Research Foundation | Low Sequence Bias Single-Stranded DNA Ligation |
| WO2014196863A1 (en) * | 2013-06-07 | 2014-12-11 | Keygene N.V. | Method for targeted sequencing |
| US20160304954A1 (en) * | 2013-12-11 | 2016-10-20 | Accuragen, Inc. | Compositions and methods for detecting rare sequence variants |
Patent Citations (16)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US20030068629A1 (en) * | 2001-03-21 | 2003-04-10 | Rothberg Jonathan M. | Apparatus and method for sequencing a nucleic acid |
| USRE44265E1 (en) | 2001-08-03 | 2013-06-04 | Olink Ab | Nucleic acid amplification method |
| US7302146B2 (en) | 2004-09-17 | 2007-11-27 | Pacific Biosciences Of California, Inc. | Apparatus and method for analysis of molecules |
| US8168385B2 (en) | 2005-02-10 | 2012-05-01 | Population Genetics Technologies Ltd | Methods and compositions for tagging and identifying polynucleotides |
| US7393665B2 (en) | 2005-02-10 | 2008-07-01 | Population Genetics Technologies Ltd | Methods and compositions for tagging and identifying polynucleotides |
| US8053192B2 (en) | 2007-02-02 | 2011-11-08 | Illumina Cambridge Ltd. | Methods for indexing samples and sequencing multiple polynucleotide templates |
| US8182989B2 (en) | 2007-02-02 | 2012-05-22 | Illumina Cambridge Ltd. | Methods for indexing samples and sequencing multiple polynucleotide templates |
| US8822150B2 (en) | 2007-02-02 | 2014-09-02 | Illumina Cambridge Limited | Methods for indexing samples and sequencing multiple polynucleotide templates |
| WO2009094583A1 (en) * | 2008-01-23 | 2009-07-30 | Complete Genomics, Inc. | Methods and compositions for preventing bias in amplification and sequencing reactions |
| US8153375B2 (en) | 2008-03-28 | 2012-04-10 | Pacific Biosciences Of California, Inc. | Compositions and methods for nucleic acid sequencing |
| US8481292B2 (en) | 2010-09-21 | 2013-07-09 | Population Genetics Technologies Ltd. | Increasing confidence of allele calls with molecular counting |
| US8685678B2 (en) | 2010-09-21 | 2014-04-01 | Population Genetics Technologies Ltd | Increasing confidence of allele calls with molecular counting |
| US8722368B2 (en) | 2010-09-21 | 2014-05-13 | Population Genetics Technologies Ltd. | Method for preparing a counter-tagged population of nucleic acid molecules |
| US20140193860A1 (en) | 2013-01-09 | 2014-07-10 | The Penn State Research Foundation | Low Sequence Bias Single-Stranded DNA Ligation |
| WO2014196863A1 (en) * | 2013-06-07 | 2014-12-11 | Keygene N.V. | Method for targeted sequencing |
| US20160304954A1 (en) * | 2013-12-11 | 2016-10-20 | Accuragen, Inc. | Compositions and methods for detecting rare sequence variants |
Non-Patent Citations (7)
| Title |
|---|
| BEAUCAGE ET AL., TETRAHEDRON LETT., vol. 22, 1981, pages 1859 - 1862 |
| BROWN ET AL., METH. ENZYMOL., vol. 68, 1979, pages 109 - 151 |
| M. A. INNIS, D. H. GELFAND, AND J. J. SNINSKY: "PCR Strategies", 1995, ACADEMIC PRESS, article "Chapter 14" |
| M. A. INNIS, D. H. GELFAND, J. J. SNINSKY, AND T. J. WHITE: "PCR Protocols : A Guide to Methods and Applications", 1990, ACADEMIC PRESS |
| MATTEUCCI ET AL., J. AM. CHEM. SOC., vol. 103, 1981, pages 3185 - 3191 |
| NARANG ET AL., METH. ENZYMOL., vol. 68, 1979, pages 90 - 99 |
| SEELA ET AL., HELV. CHIM. ACTA, vol. 82, 1999, pages 1640 |
Cited By (18)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| EP3574109A4 (en) * | 2017-01-24 | 2020-10-14 | Tsavachidou, Dimitra | Method of construction of copies of nucleic acid molecules |
| EP4253565A3 (en) * | 2017-01-24 | 2024-07-03 | Vastogen, Inc. | Methods for constructing copies of nucleic acid molecules |
| US11981961B2 (en) | 2017-01-24 | 2024-05-14 | Vastogen, Inc. | Methods for constructing copies of nucleic acid molecules |
| EP4065706A4 (en) * | 2019-11-25 | 2024-01-17 | William Marsh Rice University | Linear dna assembly for nanopore sequencing |
| WO2021142769A1 (en) * | 2020-01-17 | 2021-07-22 | 深圳华大智造科技有限公司 | Method for synchronously sequencing sense strand and antisense strand of dna |
| JP7525615B2 (en) | 2020-01-17 | 2024-07-30 | 深▲せん▼華大智造科技有限公司 | Method for synchronizing the sequencing of the sense and antisense strands of DNA |
| CN114846153A (en) * | 2020-01-17 | 2022-08-02 | 深圳华大智造科技股份有限公司 | Method for synchronously sequencing sense strand and antisense strand of DNA |
| JP2023510424A (en) * | 2020-01-17 | 2023-03-13 | 深▲せん▼華大智造科技有限公司 | Methods for Synchronizing Sequencing of Sense and Antisense Strands of DNA |
| CN115516104A (en) * | 2020-03-03 | 2022-12-23 | 加利福尼亚太平洋生物科学股份有限公司 | Methods and compositions for sequencing double-stranded nucleic acids |
| EP4114966B1 (en) * | 2020-03-03 | 2025-02-26 | Pacific Biosciences Of California, Inc. | Methods and compositions for sequencing double stranded nucleic acids |
| US12435370B2 (en) | 2020-03-03 | 2025-10-07 | Pacific Biosciences Of California, Inc. | Methods and compositions for sequencing double stranded nucleic acids |
| WO2021180791A1 (en) * | 2020-03-11 | 2021-09-16 | F. Hoffmann-La Roche Ag | Novel nucleic acid template structure for sequencing |
| US11486004B2 (en) | 2020-07-13 | 2022-11-01 | Singular Genomics Systems, Inc. | Methods of sequencing circular template polynucleotides |
| WO2022015600A3 (en) * | 2020-07-13 | 2022-03-03 | Singular Genomics Systems, Inc. | Methods of sequencing complementary polynucleotides |
| US12110550B2 (en) | 2020-07-13 | 2024-10-08 | Singular Genomics Systems, Inc. | Methods of amplifying circular polynucleotides in situ |
| US12139759B2 (en) | 2020-07-13 | 2024-11-12 | Singular Genomics Systems, Inc. | Methods of sequencing circular template polynucleotides |
| WO2022018055A1 (en) * | 2020-07-20 | 2022-01-27 | Westfälische Wilhelms-Universität Münster | Circulation method to sequence immune repertoires of individual cells |
| CN114672546A (en) * | 2020-12-24 | 2022-06-28 | 郑州思昆生物工程有限公司 | A kind of template polynucleotide paired-end sequencing method and kit |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| US12270070B2 (en) | | Single stranded circular DNA libraries for circular consensus sequencing |
| JP7717903B2 (en) | | Creation of single-stranded circular DNA templates for single-molecule sequencing |
| WO2019086531A1 (en) | | Linear consensus sequencing |
| EP3532635B1 (en) | | Barcoded circular library construction for identification of chimeric products |
| US12110534B2 (en) | | Generation of single-stranded circular DNA templates for single molecule sequencing |
| US11168360B2 (en) | | Circularization methods for single molecule sequencing sample preparation |
| US20200308576A1 (en) | | Novel method for generating circular single-stranded dna libraries |
| US11976275B2 (en) | | Generation of double-stranded DNA templates for single molecule sequencing |
| EP3682027A1 (en) | | Hybridization-extension-ligation strategy for generating circular single-stranded dna libraries |
| US20240209414A1 (en) | | Novel nucleic acid template structure for sequencing |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 18795643; Country of ref document: EP; Kind code of ref document: A1 |
| | NENP | Non-entry into the national phase | Ref country code: DE |
| | 122 | Ep: pct application non-entry in european phase | Ref document number: 18795643; Country of ref document: EP; Kind code of ref document: A1 |