
WO2025178954A1 - Methods for sequencing and counting nucleic acids - Google Patents

Methods for sequencing and counting nucleic acids

Info

Publication number
WO2025178954A1
WO2025178954A1 PCT/US2025/016475 US2025016475W WO2025178954A1 WO 2025178954 A1 WO2025178954 A1 WO 2025178954A1 US 2025016475 W US2025016475 W US 2025016475W WO 2025178954 A1 WO2025178954 A1 WO 2025178954A1
Authority
WO
WIPO (PCT)
Prior art keywords
sequence
sample
random
reads
sequencing
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
PCT/US2025/016475
Other languages
French (fr)
Inventor
Robert MELTZER
Christopher D'amato
Yi XUE
Trinity SMITHERS
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Illumina Inc
Original Assignee
Illumina Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Illumina Inc

Classifications

    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q1/00 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
    • C12Q1/68 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
    • C12Q1/6806 Preparing nucleic acids for analysis, e.g. for polymerase chain reaction [PCR] assay
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N15/00 Mutation or genetic engineering; DNA or RNA concerning genetic engineering, vectors, e.g. plasmids, or their isolation, preparation or purification; Use of hosts therefor
    • C12N15/09 Recombinant DNA-technology
    • C12N15/10 Processes for the isolation, preparation or purification of DNA or RNA
    • C12N15/1034 Isolating an individual clone by screening libraries
    • C12N15/1065 Preparation or screening of tagged libraries, e.g. tagged microorganisms by STM-mutagenesis, tagged polynucleotides, gene tags
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q1/00 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
    • C12Q1/68 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
    • C12Q1/6869 Methods for sequencing

Definitions

  • the present invention provides methods for nucleic acid sequencing in which the number of sequence reads is representative of the number of molecules that were present in a sample.
  • nucleic acids in a sample are copied in a manner such that each copy has a segment of bases that uniquely identifies the original molecule. This can be done by capturing the target molecules with primer oligonucleotides that each have a random sequence at the 3' end. The primer oligonucleotides anneal to the target molecules at random locations, due to the random 3' ends. When the primer oligonucleotides are extended to copy the target molecules, the 5' ends of the resultant copies will include random stretches of bases.
  • the random start sites provide each of the copies of the nucleic acid molecules with a unique sequence.
  • the unique sequence in each of the copies is copied (directly or as, e.g., a second-strand copy) from a segment of a transcript transcribed by an organism.
  • aspects of the disclosure provide methods of preparing a sample for sequencing and counting molecules therein.
  • the methods include providing a sample comprising a plurality of sample polynucleotides and generating a plurality of copies of the sample polynucleotides.
  • a copy of the plurality of copies comprises (i) a sample sequence copied from a sample polynucleotide of the plurality of sample polynucleotides, and (ii) a tag sequence copied from the sample polynucleotide, the tag sequence distinguishing said sample polynucleotide from other sample polynucleotides from the sample.
  • Dual IMI embodiments may include fragmenting the plurality of sample polynucleotides at a plurality of random break sites. Each copy may include the tag sequence copied from the sample polynucleotide at a 5' end and a second tag sequence at a 3' end, the second tag sequence copied from the sample polynucleotide from a segment adjacent one break site of the plurality of random break sites.
  • FIG. 2 shows compositions used in methods of the invention.
  • FIG. 3 shows melting temperatures over [Mg++] for randomer capture sequences.
  • the invention provides methods and compositions that are useful for nucleic acid sequencing methods.
  • claimed methods are useful with particle-templated instant partitions (PIPs) when sequencing target nucleic acids (PIPseq).
  • PIPs particle-templated instant partitions
  • Methods disclosed herein are optimized for preserved, fixed, or otherwise challenging samples such as formalin-fixed, paraffin embedded (FFPE) tissue samples.
  • Methods and compositions of the invention are quantitative in that they are useful to determine identities and quantities of individual molecules present in samples.
  • methods herein are useful for transcriptomics, to measure and show what quantities of different RNA transcripts are being expressed in samples.
  • with aqueous partitions such as PIPs, methods may be performed for individual cells and even for multiple individual cells that are liberated from, or interrogated within, a single sample.
  • a count of reads that map to each transcript provides a count of how many transcripts from each gene are in a sample (e.g., single cell) and those counts can serve as expression levels, or an expression profile, or a transcriptome of the sample.
  • An IMI may be understood to be a transcript-informed sequence that is also used for transcript capture using a randomer capture sequence or that is copied from a random fragmentation position. With random priming (or random fragmentation) the identifying information at the end(s) of each molecule is generated by the transcript sequence. That identifying information is a function of gene identity and the position at which each transcript was annealed to the capture moiety. Randomer capture with IMI analysis has multiple potential uses including, for example, analysis of non-polyadenylated RNA; bacterial transcriptomics; viral gene transcription; analysis of non-coding RNA (lncRNA, etc.); analysis on sequencing platforms that only support single-end reads; and sequencing on other platforms such as Ultima sequencing.
  • Methods may use an IMI or may use dual-IMIs.
  • in dual IMI embodiments, each molecule has randomized initiation positions at both 3' and 5' ends.
  • Dual IMI analysis may have particular advantages in identification of very short transcript fragments.
  • a specific application of dual-IMI approaches may include recovery of severely fragmented RNA from clinical formaldehyde fixed samples or forensic or archeological samples.
  • Preferred embodiments of the disclosure subject the target nucleic acid to fragmentation, even to fragmentation conditions and chemistries that would be recognized as too promiscuous for conventional methods, causing too many random breaks, yielding many short fragments that terminate at essentially random points of breakage.
  • the 3' random ends of the capture oligos anneal to those fragments. After a capture oligo anneals to a fragment, it is extended by a polymerase or reverse transcriptase — copying the fragment until the enzyme reaches the end, the random point of breakage, of the fragment.
  • the enzyme generates a copy of the fragment for which the 5' end starts with the capture oligo and the 3' includes a copy of the broken end of the fragment (plus any additional sequence that may be added 3' of that copy).
  • Some embodiments use template-switching oligos.
  • Some embodiments use blunt-ending and adapter ligation.
  • the 5' end of the copy has one IMI generated by random sequence of the capture oligo.
  • the 3' end of the copy has a second IMI generated by copying the segment of the fragment near the random point of breakage.
  • each copy of one of the fragments includes first and second IMIs, or dual IMIs.
  • a transposase is used to randomly cut the cDNA.
  • Tn5 transposase
  • See RNA sequencing by direct tagmentation of RNA/DNA hybrids, PNAS 117(6):2886-2893, incorporated by reference.
  • the Tn5 transposase randomly binds and cuts double-stranded RNA/DNA and attaches its end sequence to the random cut site.
  • some embodiments of the invention use Tn5 transposase to directly tagment RNA/DNA hybrids and form polynucleotide libraries with intrinsic molecular identifiers (essentially unique sequences of bases originating in genetic material of the organism or biological system being studied).
  • Tn5, an RNase H superfamily member
  • the desired oligo is preferably a PCR handle (aka a universal primer binding site, a sequencing adaptor, a synthetic oligo of known sequence to which a PCR primer anneals, etc.).
  • Methods of the invention may be used with various amounts of input sample, from single cells to large numbers of cells, with a dynamic range spanning numerous orders of magnitude.
  • each copy of a fragment has a first IMI at one end and a second IMI at the other end.
  • Those copies are preferably subject to amplification (either with adapter ligation or with tailed primers) to generate amplicons (e.g., with sequencing adaptors and/or primer binding sites at both ends).
  • amplicons e.g., with sequencing adaptors and/or primer binding sites at both ends.
  • all of the amplicons have first and second IMIs at their respective ends.
  • although an IMI functions like a molecular barcode, it is not the case that dual IMIs are equivalent to a single, double-length barcode.
  • Methods and compositions of the invention are compatible with cell preparation techniques that include aldehyde or formaldehyde fixation protocols.
  • Cell fixation and cell preservation techniques incorporate programmable fixation times, reversible bond formation and cleavage, chemo-selective reactions, and analyte recovery using, e.g., materials and techniques such as those discussed in Gallion, 2021, Preserving single cells in space and time for analytical assays, Trends Anal Chem 122: 115723, incorporated by reference.
  • Samples may be obtained from fixed cells or fixed tissue blocks. Samples that are formaldehyde-fixed, paraffin-embedded (FFPE) may be used.
  • FFPE formaldehyde-fixed, paraffin-embedded
  • methods of the disclosure may include steps for nuclei extraction from FFPE and formaldehyde fixed tissue. According to methods of the invention, fixed cells or nuclei may be captured as normal. Methods may include treatment with proteinase K, optionally using heat-triggered proteinase K activation, with added lysis buffer enzymes to efficiently dissolve and liberate cells and/or nuclei.
  • nucleic acids of interest typically include mRNA or precursor mRNA transcripts.
  • Other RNAs, such as ribosomal RNA (rRNA), may not be of interest to the assay being performed.
  • Methods of the invention include a step for the depletion or removal of nontarget nucleic acids such as rRNA.
  • Ribosomal RNA removal using CRISPR may generally be referred to as CRISPR-based depletion or CRISPR-depletion or CRISPR ribodepletion or similar when applied to ribosomal RNA.
  • CRISPR-depletion technology harnesses the specificity of CRISPR to degrade abundant, uninformative sequences.
  • CRISPR-depletion may be integrated into a stranded total RNA sequencing library prep protocol by adding Cas9/RNA complexes after the adapter ligation step.
  • the Cas9/RNA complexes include a pool of guide RNAs that target and deplete unwanted sequences.
  • CRISPR-depletion technologies are not specific to sequencing platforms.
  • CRISPR-depletion or bulk ribodepletion reagents may be used to remove overabundant human, mouse, or rat rRNA from RNA-Seq libraries to improve sequencing sensitivity and performance. Kits are available commercially to provide a bulk post-library depletion reagent with multiplexing format - designed to remove overabundant human 5S, 5.8S, 18S, 28S, 45S (precursor), mitochondrial 12S and 16S ribosomal RNA.
  • CRISPR-based depletion may include the methods and materials for CRISPR-depletion sold under the trademark CRISPRCLEAN by Jumpcode Genomics, Inc. (San Diego, CA).
  • CRISPR-depletion may use materials or techniques described in US 2022/0145359 or US 2023/0265528, incorporated by reference.
  • magnetic pull-down depletion is an option, using rRNA-specific probes linked to magnetic beads, for example.
  • Single cell embodiments may involve isolating single cells into aqueous partitions or compartments to sequence and count nucleic acid molecules of a single cell. Isolation into partitions may be accomplished by any suitable mechanism and any suitable type of aqueous partition may be used. Exemplary suitable partitions include droplets, wells in a plate, or other fluid partitioning structures. For example, the partitions may be wells, cavities, pockets, or openings in a pico-, nano-, or microtiter plate or substrate, or fluidic harbors (see, e.g., US Pub 2010/0041046 A1, incorporated by reference).
  • the partitions may be wells in a multi-well plate such as a 96-well plate, a 384-well plate, a 1536-well plate, a 3456-well plate, or a 9600-well plate.
  • the partitions may be separate chambers (see, e.g., 20210178395 A1, incorporated by reference).
  • the partitions may be distinct regions defined within a fluidic device (see, e.g., 20200269248 A1, incorporated by reference).
  • the partitions are droplets of an emulsion such as a water-in-oil (W/O) emulsion or a water-in-oil-in-water (W/O/W) emulsion.
  • W/O water-in-oil
  • W/O/W water-in-oil-in-water
  • the number of template particles in the mixture that is sheared or vortexed (e.g., dozens, hundreds, thousands, tens of thousands, or more) is the number of droplets that are formed (along with some "satellite" droplets, or mere bubbles, of miscellaneous sizes and integrities that are not relevant to downstream analysis and can simply be ignored).
  • the resultant droplets that each include a template particle are monodisperse (e.g., same number of polymer subunits among the beads aka particles, and/or same mass each, and/or essentially same size/volume among the particles, e.g., same diameter when viewed under a microscope)
  • the resultant droplets that each include a template particle are monodisperse (each the size of one template particle plus a small, thin shell of aqueous fluid around it). If a droplet forms that contains two or more template particles, then during shearing or vortexing, that droplet breaks into one droplet per each template particle.
  • the capture moiety e.g., 3' sequence of the hybrid capture oligo
  • genomic information that is encoded within the sequence of the source gene for a molecule. Where the other end is generated by random fragmentation, both ends of each molecule are informed by the specific gene identity and the position of the molecule along that gene.
  • Methods and compositions of the invention preferably use a template particle 205, such as a hydrogel bead, linked to one or more capture oligos 207.
  • a 3' end of the capture oligo 207 defines a primer 219 for hybrid capture of a target nucleic acid 213.
  • the primer 219 preferably includes a stretch of random bases and may be referred to as a randomer or random priming sequence. Any suitable number (e.g., from 3 to 30, or fewer, or more) of bases may constitute the random priming sequence.
  • Preferred embodiments use about 6 to about 10, e.g., 8, random bases to make up the primer 219.
  • the capture oligo hybridizes to the target nucleic acid 213 at a random location and primes the synthesis of a first strand copy 235, where a 5' end of the first strand copy 235 begins at a random start location 232 in the target nucleic acid 213.
  • the first strand copy has a 3' end that includes a copy of bases adjacent a second random location 231.
  • Amplification may be performed to generate amplicons ready for sequencing, or sequencing libraries, from the target molecules.
  • Each amplicon may have an IMI or dual IMIs.
  • the disclosed system may use IMIs to deduplicate sequence reads and count template molecules in the sample.
  • Preferred embodiments use dual IMIs.
  • the dual IMI system involves randomer capture (by a segment of random bases on a 3' end of the capture oligo, e.g., that is attached to a bead) to read a random sequence from a 3' end of each molecule (e.g., template RNA).
  • Libraries have randomized segments at both 5' and 3' ends of molecules. Either or both may be used as IMI(s) for sequence read deduplication.
  • Attachment of the end sequence to the cDNA at the random cut site produces a construct, a contiguous DNA molecule that includes a first PCR handle (PE1), a cell barcode, a capture segment, a portion of the cDNA optionally terminating at the random cut site, and a second PCR handle (PE2).
  • PE1 first PCR handle
  • PE2 second PCR handle
  • Amplification of the construct yields amplicons.
  • constructs are amplified with a P5-PE1 hybrid oligo and P7 index primer directly into a sequencing library.
  • the library may be sequenced to assess RNA expression, for example, as described in Hrdlickova, 2017, RNA-Seq methods for transcriptome analysis, WIREs RNA 8(1): 10.1002, incorporated by reference.
  • Constructs or amplicons may include certain primer and index sequences or copies thereof, such as, P5s and P7s.
  • Those sequences may be any arbitrary sequence useful in downstream analysis. For example, they may be additional universal primer binding sites or sequencing adaptors.
  • either or both of the P5s and P7s may be arbitrary universal priming sequence (universal meaning that the sequence information is not specific to the naturally occurring genomic sequence being studied but is instead suited to being amplified using a pair of cognate universal primers, by design).
  • the index segment may be any suitable barcode or index such as may be useful in downstream information processing.
  • Libraries may be sequenced by any suitable method. Suitable methods include Sanger sequencing, Illumina or Ultima sequencing, Roche pyrosequencing, single-molecule, long-read sequencing using platforms offered by Pacific Biosciences or Oxford Nanopore.
  • An example of a sequencing technology that can be used is Illumina sequencing. Illumina sequencing is based on the amplification of DNA on a solid surface using fold-back PCR and anchored primers. Genomic DNA is fragmented and attached to the surface of flow cell channels. Four fluorophore-labeled, reversibly terminating nucleotides are used to perform sequencing. After nucleotide incorporation, a laser is used to excite the fluorophores, and an image is captured, and the identity of the first base is recorded.
  • SAM sequence alignment map format
  • Other methods useful for processing and analyzing sequence reads are discussed in U.S. Pat. No. 8,209,130, which is incorporated by reference. Determining gene expression generally involves counting numbers of unique sequence reads that uniquely map to a human reference genome. Mapping reads to a reference to identify genes may be performed using computer software packages known in the art.
  • mapping reads to a reference and identifying genes gives a quantitative result when reads are deduplicated to yield one read per mRNA from which those reads originated. Because each mRNA is typically copied into cDNA and each cDNA is typically copied into an unpredictably large number of amplicons in the sequencing library, and because each library member is often amplified or read redundantly as part of a sequencing technique, a number of raw sequence reads does not necessarily correlate to numbers of input molecules from the single cells. Nevertheless, one cell may include abundant transcripts that map to one gene.
  • compositions and methods of the invention give each cDNA at least one unique intrinsic identifier (and preferably dual-IMIs) that can be identified within, and used to deduplicate, sequence reads. After those sequence reads are identified by gene and deduplicated, then counts of those reads provide a quantitative measure of gene expression levels. Methods may include, prior to the deduplicating step, saving the sequence reads in memory, coupled to at least one processor in a computer system, as a FASTA or FASTQ file, wherein the deduplicating and mapping are performed by the computer system. As discussed, methods of the invention are useful for scRNA-Seq and specifically for expression analysis.
  • cells are isolated into, and lysed within, aqueous partitions with capture oligos.
  • the capture oligos anneal to RNAs released from the cells.
  • the capture oligos preferably include partition-specific barcodes and PCR handles. Once the capture oligos have hybridized to the RNAs, those duplexes may be released from partitions and pooled at any subsequent stage. Because capture oligos with partition-specific barcodes are used to capture and tag RNA from cells isolated in the partition, any arbitrary number of cells may be captured in parallel (simultaneously).
  • the cell barcodes in the sequencing data can be used to “bin” the sequence data by original cell, i.e., assign each sequence read (or assembled contigs or sequences therefrom) back to originating cells.
  • Read deduplication 131 and transcript counting are preferably performed by a computer system operably linked to a sequencing instrument and executing program instructions causing the computer system to perform those functions.
  • sequence reads may arrive at the computer system in FASTQ format.
  • Each entry (each "sequence read") in a FASTQ file (or FASTA) will include a segment of sequence information read from the target molecules.
  • Those sequence reads will also each include at least one IMI.
  • each target molecule will generate two reads, a forward read and a reverse read, and each read will have an IMI read from the target molecule.
  • sequence information read from the target molecule is used to identify the gene of origin for that molecule (e.g., transcript). But each IMI is also read from the target molecule.
  • the computer system can execute program software to deduplicate the reads (simply identifying all duplicates and saving only one, i.e., "collapsing" the reads to a single read; or by leaving the FASTQ files intact but only "counting" all duplicate reads as 1, e.g., in a count file).
  • the system may also identify gene information for each read, e.g., by mapping to a reference or by querying a transcript database.
  • Read mapping can proceed by known methods including, for example, by methods that involve pairwise alignment of each read to a reference, such as a published human genome such as the 36th build of the human genome, referred to in industry as HG36 or hg36. Comparison to references may also proceed by building hashes of k-mers in the reference and in the query (the sequence read) and looking up the hashed k-mers of the query in the target (the reference). Read-mapping may involve transforming each sequence in order or in characters via an informatic transform such as the Burrows-Wheeler transform (BWT) after which comparison of the BWT of the query to the target is trivial.
  • BWT Burrows-Wheeler transform
  • a gene (as it is found in the DNA within the genome of an organism that is being studied) is assigned to each sequence read. That gene was transcribed into an RNA transcript, which was copied with a randomer and amplified to generate amplicons with at least one IMI, and those amplicons were sequenced to generate sequence reads that include the IMI. By implication, the RNA transcript is identified as having been transcribed from that gene. For that gene, each unduplicated read is used to increment a count of transcripts by one.
  • the read counts are a measure of a number of actual transcript molecules that were present in the sample.
  • the gene identities and read counts are a measure of expression levels of the gene in that cell and are also thus a transcriptome or transcriptomic profile for the cell.
  • Each copy includes (i) a sample sequence copied from a sample polynucleotide of the plurality of sample polynucleotides, and (ii) a tag sequence copied from the sample polynucleotide, the tag sequence distinguishing said sample polynucleotide from other sample polynucleotides from the sample.
  • the tag sequence functions like an IMI discussed above.
  • the information of the tag sequence is information that was in the sample polynucleotides originally, i.e., it is genetic information of the organism being studied.
  • Generating the copies may include annealing, to the sample polynucleotide, a primer (e.g., randomer of a capture oligo) having a random sequence of bases at a 3' end and extending the primer to generate the tag sequence.
  • the random sequence may be about 8 bases in length, e.g., about 6 to 9.
  • the plurality of sample polynucleotides may be RNA and each of the plurality of copies of the sample polynucleotides may include a random tag sequence copied from the RNA.
  • the unique sequence in each of the copies is copied from a segment of a transcript transcribed by an organism.
  • the methods may include performing the recited steps for a plurality of cells to generate, for each cell of the plurality, a transcriptome profile based on, for that cell, the count of mapped unique reads.
  • the fluid may comprise reagents such as surfactants (e.g., octylphenol ethoxylate and/or octylphenoxypolyethoxyethanol), reducing agents (e.g., DTT, beta mercaptoethanol, or combinations thereof).
  • surfactants e.g., octylphenol ethoxylate and/or octylphenoxypolyethoxyethanol
  • reducing agents e.g., DTT, beta mercaptoethanol, or combinations thereof.
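
The k-mer hashing approach to read mapping mentioned in the definitions above (building hashes of k-mers in the reference and looking up the k-mers of each read) can be illustrated with a minimal Python sketch. The function names, the toy reference sequences, and the choice of k = 5 are assumptions made for illustration and do not come from the disclosure; a practical mapper would also handle mismatches, the reverse strand, and ambiguous hits.

```python
from collections import Counter, defaultdict


def build_kmer_index(references, k):
    """Map every k-mer in each reference sequence to the names that contain it."""
    index = defaultdict(set)
    for name, seq in references.items():
        for i in range(len(seq) - k + 1):
            index[seq[i:i + k]].add(name)
    return index


def map_read(read, index, k):
    """Return the reference sharing the most k-mers with the read, or None."""
    votes = Counter()
    for i in range(len(read) - k + 1):
        # .get() avoids creating empty entries in the defaultdict.
        for name in index.get(read[i:i + k], ()):
            votes[name] += 1
    if not votes:
        return None
    return votes.most_common(1)[0][0]


# Hypothetical toy references standing in for gene sequences.
K = 5
references = {
    "geneA": "ACGTACGGTCCATGCAAGTC",
    "geneB": "TTGACCGGATTACAGGCTAA",
}
index = build_kmer_index(references, K)

hit_a = map_read("GGTCCATGCA", index, K)   # substring of geneA
hit_b = map_read("GGATTACAGG", index, K)   # substring of geneB
no_hit = map_read("AAAAAAAAAA", index, K)  # shares no k-mer with either
```

Exact k-mer lookup is chosen here only because it is the simplest of the mapping strategies the text lists; pairwise alignment or a BWT-based index would serve the same role.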

Landscapes

  • Chemical & Material Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Organic Chemistry (AREA)
  • Genetics & Genomics (AREA)
  • Engineering & Computer Science (AREA)
  • Zoology (AREA)
  • Wood Science & Technology (AREA)
  • General Engineering & Computer Science (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Biotechnology (AREA)
  • Molecular Biology (AREA)
  • Biochemistry (AREA)
  • Proteomics, Peptides & Aminoacids (AREA)
  • Biomedical Technology (AREA)
  • Microbiology (AREA)
  • Physics & Mathematics (AREA)
  • Analytical Chemistry (AREA)
  • Biophysics (AREA)
  • General Health & Medical Sciences (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Crystallography & Structural Chemistry (AREA)
  • Chemical Kinetics & Catalysis (AREA)
  • Plant Pathology (AREA)
  • Immunology (AREA)
  • Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)

Abstract

The invention provides methods for sequencing (127) and counting nucleic acids. Methods may include making copies (113) of nucleic acid molecules from random start sites, sequencing (127) the copies to obtain sequence reads, deduplicating (131) the sequence reads to yield only unique reads, and mapping the unique reads to genes and, for each of the genes, providing a count of mapped unique reads.

Description

METHODS FOR SEQUENCING AND COUNTING NUCLEIC ACIDS
TECHNICAL FIELD
[0001] The disclosure relates to determining identity and quantity of nucleic acid molecules in a sample by sequencing.
BACKGROUND
[0002] Nucleic acid sequencing typically involves an amplification step that results in numerous copies of a target nucleic acid. Sequencing instruments use biochemical reactions to determine the sequence of bases in amplicons and write a character string, known as a sequence read, in computer memory, conventionally using the A, C, G, and T characters to represent base information in the target nucleic acid. Due to the amplification step, each molecule is likely to be sequenced redundantly.
[0003] Organisms respond to different environments, metabolic demands, and regulatory signals by expressing different genes at different levels. Measuring transcript levels of single cells, known as single-cell transcriptomics, offers the ability to understand complex cell populations and developmental pathways.
[0004] An approach to single-cell transcriptomics could involve sequencing all of the RNA from a single cell (referred to as single-cell RNA sequencing, or scRNA-Seq) and analyzing the resultant sequence reads to determine the identities and quantities of the transcripts that were being expressed in the single cell. Unfortunately, due to the amplification common in existing sequencing techniques, the molecules are sequenced redundantly, and the number of sequence reads is not informative of the number of transcripts that were sequenced.
SUMMARY
[0005] The present invention provides methods for nucleic acid sequencing in which the number of sequence reads is representative of the number of molecules that were present in a sample. According to the invention, nucleic acids in a sample are copied in a manner such that each copy has a segment of bases that uniquely identifies the original molecule. This can be done by capturing the target molecules with primer oligonucleotides that each have a random sequence at the 3' end. The primer oligonucleotides anneal to the target molecules at random locations, due to the random 3' ends. When the primer oligonucleotides are extended to copy the target molecules, the 5' ends of the resultant copies will include random stretches of bases. The random stretches of bases near the 5' ends of the copies serve as unique identifiers for each target molecule. After those copies are amplified and sequenced, the sequence reads will include the information from the random stretch of bases. That information is copied from, and is thus intrinsic to, the target molecules and the source genome. After sequencing, the random stretch of bases intrinsically identifies the molecule from which the sequence read was obtained. Sequence reads can be deduplicated using the intrinsic molecular identifiers, after which a count of unique sequence reads is representative of a count of molecules in the sample (or alternatively may be an exact match to the number of molecules). The sequence reads may also be mapped to genetic reference information, such as a reference genome, to identify the genes from which those identified molecules were transcribed.
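
The deduplicate-then-count step described in this paragraph can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the claimed implementation: it assumes each read has already been mapped to a gene and that its intrinsic molecular identifier has already been extracted as a short string. The function name, gene names, and identifier sequences are hypothetical.

```python
from collections import Counter


def count_unique_molecules(reads):
    """Collapse duplicate (gene, imi) reads and count unique molecules per gene.

    `reads` is an iterable of (gene, imi) tuples; both fields are assumed to
    have been derived upstream (mapping and identifier extraction).
    """
    unique = set(reads)  # reads with the same gene and same IMI collapse to one
    return Counter(gene for gene, _ in unique)


# Hypothetical reads: amplification produced redundant copies of some molecules.
reads = [
    ("GAPDH", "ACGTTGCA"),
    ("GAPDH", "ACGTTGCA"),  # duplicate of the read above
    ("GAPDH", "TTGACCAG"),  # same gene, different start site: a second molecule
    ("ACTB", "GGCATTCA"),
    ("ACTB", "GGCATTCA"),
    ("ACTB", "GGCATTCA"),
]
counts = count_unique_molecules(reads)
print(counts["GAPDH"], counts["ACTB"])  # prints: 2 1
```

Six raw reads collapse to three molecules, which shows why the raw read count alone does not reflect the number of input molecules.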
[0006] Not only do methods of the invention make use of an intrinsic molecular identifier, or IMI, in each sequence read, preferred embodiments also make use of random fragmentation of the molecules in the sample. Due to being randomly fragmented, the end of each fragment includes a second stretch of random bases. When those fragments are copied with the random primer oligonucleotides, the random primer stretch will create a first IMI and, when the first strand copy reaches the end of the fragment, the copy will include a second IMI — the information copied from a segment of bases at the 5' end of the fragment, i.e., copied into the 3' end of the copy. Because each copy of the target molecules will include two IMIs, the resulting amplicons will include dual IMIs, as will sequence reads. When the amplicons are sequenced by paired-end sequencing, paired reads will include dual IMIs — one per member of the read pair — that associate the reads to one fragment from the sample.
[0007] The dual IMIs are well suited to sample preparation workflows that highly fragment target nucleic acid molecules from the sample. For example, where the sample is a fixed or preserved sample, such as a formaldehyde fixed or formalin-fixed, paraffin embedded (FFPE) sample, and/or where the target nucleic acids are RNA, some nucleic acid extraction techniques and reagents may cause breakage of the molecules. Methods of the present invention exploit that breakage and may, in fact, use increased amounts of reagents such as Mg2+ that cause such breakage, to generate random broken ends among the target molecules. Thus where, for example, RNA transcripts are highly fragmented after extraction from FFPE, those molecules may be captured with primer oligos having random 3' sequences that are extended with reverse transcriptase to make a first strand copy. Additionally, or alternatively, nucleic acids may be fragmented enzymatically using, e.g., a transposase. Fragmentation of double stranded cDNA may optionally be performed after limited cycle amplification from a bead-bound primary cDNA strand.
[0008] Random priming places a first IMI in a 5' end of the copy. Where the reverse transcriptase encounters a broken end of, and stops copying, the fragment, copying to the broken end places a second IMI in a 3' end of the first strand copy. In such embodiments, the copies include first and second IMIs. Those copies are amplified and sequenced to generate sequence reads. The reads are mapped to a reference to identify which genes were being expressed as the RNA transcripts. The reads are de-duplicated using the dual-IMIs and a count of deduplicated reads is a measure of an expression level of each gene identified by read-mapping.
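The de-duplication and counting step described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the applicant's implementation: it assumes each read pair has already been mapped to a gene and that fixed-length 5' and 3' identifier strings have been extracted. The function name, the tuple layout, and the example gene names and sequences are hypothetical.

```python
from collections import defaultdict


def expression_from_dual_imis(read_pairs):
    """Count unique molecules per gene from (gene, imi_5prime, imi_3prime) tuples.

    Read pairs that share a gene and both identifiers are treated as
    amplification duplicates of one original fragment (an assumption of
    this sketch: exact string matching, no sequencing-error tolerance).
    """
    unique_ids = defaultdict(set)
    for gene, imi5, imi3 in read_pairs:
        unique_ids[gene].add((imi5, imi3))
    return {gene: len(ids) for gene, ids in unique_ids.items()}


# Hypothetical mapped read pairs.
reads = [
    ("GAPDH", "ACGTTGCA", "TTGACCGA"),
    ("GAPDH", "ACGTTGCA", "TTGACCGA"),  # duplicate of the first pair
    ("GAPDH", "ACGTTGCA", "GGCATTAC"),  # same 5' IMI, different 3' IMI
    ("ACTB", "CCATGGTA", "AATCGGCT"),
]
counts = expression_from_dual_imis(reads)
print(counts)
```

In this toy input the third pair shares its 5' identifier with the first two but is still counted as a separate molecule, because its 3' identifier differs.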
[0009] In certain aspects, the invention provides methods for nucleic acid analysis. Methods include fragmenting nucleic acids at random breakpoints to yield a plurality of fragments. A 3' end of a capture oligo is annealed to a binding site of one fragment of the plurality of fragments. The 3' end comprises a random base sequence. Methods include extending the annealed capture oligo to make a copy comprising: a 5' identifier copied from the binding site, and a 3' identifier copied from an end of the fragment. That is, the copies include dual IMIs. Methods include amplifying the copy to make amplicons that include copies of the 5' and 3' identifiers, wherein the 5' and 3' identifiers uniquely associate the amplicons with the one fragment.
[0010] The plurality of fragments may be amplified to yield corresponding sets of amplicons, wherein each set of amplicons shares a first and second identifier copied from a corresponding fragment, i.e., dual IMIs. The sets of amplicons may be sequenced to yield sequence reads that may be deduplicated based on the first and second identifiers to provide a set of unique sequence reads. Methods may include mapping the unique sequence reads to a reference to identify a gene for each fragment and providing an expression level of each identified gene based on a number of fragments mapped to that gene.
[0011] In some embodiments, the nucleic acids are obtained from a single cell, and the fragments are sequenced to obtain sequence reads comprising first and second identifiers, the reads are deduplicated and reference-mapped to identify genes. Methods may include providing gene expression levels from the identified genes and counts of the deduplicated reads.
[0012] In some embodiments, the capture oligo is one of a plurality of oligos attached to a bead, wherein each of the plurality of oligos has a copy of a bead-identifying barcode and a stretch of random bases at a 3' end. The bead may be one of a plurality of beads, and the method may include encapsulating the beads in aqueous compartments with sample polynucleotides.
[0013] Preferably the beads are substantially uniform in size and mass, the compartments are monodisperse droplets, and encapsulating is done by providing a mixture comprising the beads, an aqueous first liquid, and an immiscible second liquid and shearing the mixture to generate the monodisperse droplets simultaneously. At least one of the aqueous compartments may include a single cell, wherein the sample polynucleotides are RNA transcripts in the single cell, and the method includes capturing, amplifying, and sequencing the RNA transcripts to yield sequence reads, wherein each sequence read includes first and second identifier sequences with information that (i) uniquely associates that read with one transcript, and (ii) is obtained from that transcript.
[0014] Methods may include counting to obtain counts of unique or unduplicated reads and identifying a gene of origin for each read and providing gene expression levels for the single cell from the identified genes and read counts.
[0015] In related aspects, the invention provides methods for sequencing and counting nucleic acids. Methods may include making copies of nucleic acid molecules from random start sites, sequencing the copies to obtain sequence reads, deduplicating the sequence reads to yield only unique reads, mapping the reads to genes and, for each of the genes, providing a count of mapped unique reads. The nucleic acid molecules may be, for example, RNA transcripts from a single cell. Methods may include isolating the single cell in a compartment (such as an aqueous droplet or well) and releasing the RNA transcripts from the cell. The method may include copying the nucleic acid from the start sites using primers comprising 3' random oligomers, thereby providing the copies with the random start sites. The random start sites provide each of the copies of the nucleic acid molecules with a unique sequence. In some embodiments, the unique sequence in each of the copies is copied (directly or as, e.g., a second-strand copy) from a segment of a transcript transcribed by an organism.
[0016] In dual-IMI embodiments and fixed/preserved embodiments or in other suitable embodiments, methods may include fragmenting the nucleic acid molecules at random fragmentation sites, prior to making the copies. The nucleic acid molecules may be RNA (e.g., transcripts released from a single cell), and the fragmenting step may include incubating the RNA in the presence of Mg2+ or other tools or reagents to promote fragmentation. Preferably each of the first strand copies includes a first random sequence at a 5' end adjacent one of the random start sites and a second random sequence at a 3' end adjacent one of the random fragmentation sites.
[0017] The copies that include one or dual IMIs may be sequenced. Sequencing may include amplifying the copies to make amplicons and sequencing the amplicons (optionally PCR and/or optionally bridge amplification on a surface of a flow cell). In some embodiments a 5' end of each of the copies includes a 5' unique sequence adjacent to one of the random start sites, in which the 5' unique sequence is unique among the copies. The nucleic acid molecules may be mRNA transcripts from a single cell. Methods may include using the count of mapped unique reads as a measure of expression levels. Methods may include, prior to the deduplicating step, saving the sequence reads in memory, coupled to at least one processor in a computer system, as a FASTA or FASTQ file, wherein the deduplicating and mapping are performed by the computer system.
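As a rough illustration of de-duplicating reads saved in FASTQ format, the following Python sketch parses four-line FASTQ records and keeps one representative read per identifier. It makes several assumptions that are not taken from the disclosure: the identifier is taken to be the first 8 bases of each read, duplicates are detected by exact match only, and the records come from a single cell and a single gene. A practical pipeline would likely also key on a cell barcode and mapping position. All function names and the example reads are hypothetical.

```python
import io


def read_fastq(handle):
    """Yield (name, sequence, quality) from a four-line-per-record FASTQ stream."""
    while True:
        header = handle.readline().rstrip()
        if not header:
            return
        seq = handle.readline().rstrip()
        handle.readline()  # '+' separator line
        qual = handle.readline().rstrip()
        yield header[1:], seq, qual


def deduplicate(records, imi_len=8):
    """Keep the first read seen for each identifier (assumed: read prefix of imi_len bases)."""
    seen = set()
    unique = []
    for name, seq, qual in records:
        imi = seq[:imi_len]
        if imi not in seen:
            seen.add(imi)
            unique.append((name, seq, qual))
    return unique


# Hypothetical FASTQ content: r2 is a duplicate of r1; r3 starts elsewhere.
fastq_text = (
    "@r1\nACGTTGCAGGGTTTAAACCC\n+\nIIIIIIIIIIIIIIIIIIII\n"
    "@r2\nACGTTGCAGGGTTTAAACCC\n+\nIIIIIIIIIIIIIIIIIIII\n"
    "@r3\nTTGACCGAGGGTTTAAACCC\n+\nIIIIIIIIIIIIIIIIIIII\n"
)
unique_reads = deduplicate(read_fastq(io.StringIO(fastq_text)))
print([name for name, _, _ in unique_reads])
```

Reading from an `io.StringIO` object keeps the sketch self-contained; the same generator would accept an open file handle.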
[0018] In embodiments, methods are performed for a plurality of cells to generate, for each cell of the plurality, a transcriptome profile based on, for that cell, the count of mapped unique reads. A computer system may be used for generating a uniform manifold approximation and projection (UMAP) and creating from the UMAP a 2D plot showing the plurality of cells clustered by properties.
[0019] Aspects of the disclosure provide methods of preparing a sample for sequencing and counting molecules therein. The methods include providing a sample comprising a plurality of sample polynucleotides and generating a plurality of copies of the sample polynucleotides. A copy of the plurality of copies comprises (i) a sample sequence copied from a sample polynucleotide of the plurality of sample polynucleotides, and (ii) a tag sequence copied from the sample polynucleotide, the tag sequence distinguishing said sample polynucleotide from other sample polynucleotides from the sample. The generating step may include annealing to the sample polynucleotide, a primer having a random sequence of bases at a 3' end and extending the primer to generate the tag sequence. The plurality of sample polynucleotides may include RNA and each of the plurality of copies of the sample polynucleotides may include a random tag sequence copied from the RNA. Methods may include amplifying the plurality of copies of the sample polynucleotides to generate amplicons that comprise sequencing adaptors.
[0020] The amplicons may be sequenced to generate sequence reads that include sample sequence information and tag sequence information. Methods may include deduplicating the sequence reads to save only unique sequence reads and providing counts of the plurality of sample polynucleotides in the sample from numbers of the unique sequence reads.
[0021] The sequence reads or the unique sequence reads may be associated with genes by comparison to genetic reference information and may be used to provide expression levels for the genes from the counts of the plurality of sample polynucleotides. In some embodiments, the sample comprises a cell isolated in a compartment and the method includes releasing the plurality of sample polynucleotides from the cell into the compartment.
[0022] The generating step may include annealing, to the sample polynucleotide, a primer of a plurality of primers having at least one barcode and a random priming sequence. In certain embodiments, the plurality of primers are attached to a solid support such as a bead (e.g., a hydrogel bead). The plurality of primers may include sample barcodes that are specific to the solid support, e.g., "bead barcodes".
[0023] Dual IMI embodiments may include fragmenting the plurality of sample polynucleotides at a plurality of random break sites. Each copy may include the tag sequence copied from the sample polynucleotide at a 5' end and a second tag sequence at a 3' end, the second tag sequence copied from the sample polynucleotide from a segment adjacent one break site of the plurality of random break sites.
[0024] Methods may include amplifying the plurality of copies of the sample polynucleotides to generate amplicons that each include first and second identifier segments copied from the first and second tag sequence, respectively; sequencing the amplicons to generate sequence reads; and assigning the sequence reads to specific ones of the plurality of sample polynucleotides using data in the sequence reads from the first and second identifier segments.
[0025] In embodiments of the methods, unwanted or non-target nucleic acid may be depleted from the sample prior to the generating step. The plurality of sample polynucleotides may include mRNA or pre-mRNA and the unwanted or non-target nucleic acid may include one or more of ribosomal RNA, globin transcripts, mitochondrial RNA, or non-coding RNA.
BRIEF DESCRIPTION OF THE DRAWINGS
[0026] FIG. 1 diagrams a method for sequencing and counting nucleic acids.
[0027] FIG. 2 shows compositions used in methods of the invention.
[0028] FIG. 3 shows melting temperatures over [Mg++] for randomer capture sequences.
DETAILED DESCRIPTION
[0029] The invention provides methods and compositions that are useful for nucleic acid sequencing methods. In particular, claimed methods are useful with particle-templated instant partitions (PIPs) when sequencing target nucleic acids (PIPseq). Methods disclosed herein are optimized for preserved, fixed, or otherwise challenging samples such as formalin-fixed, paraffin embedded (FFPE) tissue samples. Methods and compositions of the invention are quantitative in that they are useful to determine identities and quantities of individual molecules present in samples. For example, methods herein are useful for transcriptomics, to measure and show what quantities of different RNA transcripts are being expressed in samples. Using, for example, aqueous partitions such as PIPs, methods may be performed for individual cells and even for multiple individual cells that are liberated from, or interrogated within, a single sample.
[0030] Methods and compositions of the invention provide a quantitative measure of individual nucleic acid molecules in a sample by strategically using segments of those nucleic acids as unique tags or barcodes, i.e., sequences from within those molecules that appear in sequence data after sequencing and can be used to associate each sequence read with a particular nucleic acid molecule from the sample. That tag or barcode sequence from each molecule functions as an identifier that is unique to the molecule from which the sequence information was read. That is, each sequence read has one or more segment(s) with a sequence of bases that is unique to, and intrinsic to, the nucleic acid molecule from which that sequence read was obtained. That sequence of bases is intrinsic to the nucleic acid molecule in the sense that the information describing the sequence originates in the genetic material of the organism; the information of that identifier sequence is not added into a sample mixture in the form of synthetic oligonucleotides or other common human-made reaction reagents. Because each sequence read includes an intrinsic sequence that identifies the molecule from which the read was obtained, i.e., an "intrinsic molecular identifier" (IMI), reads from the same transcript that contain the same IMI can be treated as duplicates that show the presence of a molecule in a sample. Such reads can be deduplicated (i.e., counted as 1 in a read-count variable or file or by editing sequence read files to remove all but one representative "unduplicated" or "de-duplicated" read). Sequence read deduplication may be referred to as "collapsing" reads or "collapsing duplicate reads" (or "duplicated reads" or similar). The individual reads may also be analyzed to assign each read to a gene or transcript from which that read originated. Such an identification process may be referred to as read-mapping and may include, e.g., alignment to a reference genome or transcript database. After mapping and de-duplication, a count of reads that map to each transcript (e.g., by gene) provides a count of how many transcripts from each gene are in a sample (e.g., single cell) and those counts can serve as expression levels, or an expression profile, or a transcriptome of the sample.
[0031] An IMI may be understood to be a transcript-informed sequence that is also used for transcript capture using a randomer capture sequence or that is copied from a random fragmentation position. With random priming (or random fragmentation) the identifying information at the end(s) of each molecule is generated by the transcript sequence. That identifying information is a function of gene identity and the position at which each transcript was annealed to the capture moiety. Randomer capture with IMI analysis has multiple potential uses including, for example, analysis of non-polyadenylated RNA; bacterial transcriptomics; viral gene transcription; analysis of non-coding RNA (lncRNA, etc.); analysis on sequencing platforms that only support single-end reads; and sequencing on other platforms such as Ultima sequencing.
[0032] Methods may use an IMI or may use dual-IMIs. In dual IMI embodiments, each molecule has randomized initiation positions at both 3' and 5' ends. Dual IMI analysis may have particular advantages in identification of very short transcript fragments. A specific application of dual-IMI approaches may include recovery of severely fragmented RNA from clinical formaldehyde fixed samples or forensic or archeological samples.
[0033] Methods of the present disclosure preferably obtain the unique IMIs by capturing nucleic acids from a sample by hybrid capture using capture oligos with random sequences on the 3' ends. Those capture oligos may be attached to beads (e.g., hydrogel beads) or other solid substrates and/or may have other functional sequences 5' of the 3' random ends (e.g., sequencing adaptors, cell barcodes, primer binding sites, etc.). Because the 3' ends of the capture oligos have effectively random sequences, the capture oligos anneal to the target nucleic acid molecules at effectively random locations. After those capture oligos are extended and amplified to generate amplicons that are sequenced, the resultant sequence reads will have an effectively random sequence of bases, or IMI, at the end corresponding to hybrid capture (typically, amplicons will be in both senses so an IMI will appear at a 5' end of about half the amplicons and a reverse complement of that IMI will appear at a 3' end of the other half).
[0034] Preferred embodiments of the disclosure subject the target nucleic acid to fragmentation, even to fragmentation conditions and chemistries that would be recognized as too promiscuous for conventional methods, causing too many random breaks, yielding many short fragments that terminate at essentially random points of breakage. The 3' random ends of the capture oligos anneal to those fragments. After a capture oligo anneals to a fragment, it is extended by a polymerase or reverse transcriptase, copying the fragment until the enzyme reaches the end of the fragment, i.e., the random point of breakage. The enzyme generates a copy of the fragment for which the 5' end starts with the capture oligo and the 3' end includes a copy of the broken end of the fragment (plus any additional sequence that may be added 3' of that copy). Some embodiments use template-switching oligos. Some embodiments use blunt-ending and adapter ligation. In any event, the 5' end of the copy has one IMI generated by the random sequence of the capture oligo. The 3' end of the copy has a second IMI generated by copying the segment of the fragment near the random point of breakage. When such embodiments are performed, each copy of one of the fragments includes first and second IMIs, or dual IMIs. In some embodiments, a transposase is used to randomly cut the cDNA. This may be performed using a transposase such as Tn5. See Lin, 2020, RNA sequencing by direct tagmentation of RNA/DNA hybrids, PNAS 117(6):2886-2893, incorporated by reference. In brief, the Tn5 transposase randomly binds and cuts double-stranded RNA/DNA and attaches its end sequence to the random cut site.
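To make the geometry of the two identifiers concrete, the following Python sketch models a first-strand copy of an RNA fragment. It is a simplified illustration under stated assumptions: the priming site and the broken 5' end are given explicitly rather than drawn at random, extension runs exactly to the broken end, and no adaptor, barcode, or template-switch sequence is added to the copy. The function names and the example fragment are hypothetical.

```python
# Maps each RNA template base to the DNA base that pairs with it.
RNA_TO_CDNA = str.maketrans("ACGU", "TGCA")


def first_strand_copy(rna_fragment, site_start, randomer_len):
    """Return the cDNA (written 5'->3') made by priming at the given site.

    rna_fragment is written 5'->3' and its first base is the broken end.
    The primer's random 3' end anneals to
    rna_fragment[site_start:site_start + randomer_len] and is extended
    toward the fragment's 5' end.
    """
    site_end = site_start + randomer_len
    template = rna_fragment[:site_end]
    # Reverse complement of the copied template region.
    return template.translate(RNA_TO_CDNA)[::-1]


def dual_imis(cdna, imi_len):
    """Return (5' identifier, 3' identifier) taken from the two ends of the copy."""
    return cdna[:imi_len], cdna[-imi_len:]


fragment = "GGAUCCAUGCUAGCUAACGU"  # hypothetical RNA fragment
copy = first_strand_copy(fragment, site_start=10, randomer_len=6)
imi_5prime, imi_3prime = dual_imis(copy, imi_len=6)
print(copy, imi_5prime, imi_3prime)
```

Here the 5' identifier is the reverse complement of the priming site and the 3' identifier is the reverse complement of the fragment's broken 5' end, so both are determined by the sample molecule rather than by an added barcode.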
[0035] Accordingly, some embodiments of the invention use Tn5 transposase to directly tagment RNA/DNA hybrids and form polynucleotide libraries with intrinsic molecular identifiers (essentially unique sequences of bases originating in genetic material of the organism or biological system being studied). In particular, Tn5, an RNase H superfamily member, binds to RNA/DNA hybrids much as it binds to dsDNA and effectively cuts randomly and then ligates a desired oligo onto the hybrid. The desired oligo is preferably a PCR handle (aka a universal primer binding site, a sequencing adaptor, a synthetic oligo of known sequence to which a PCR primer anneals, etc.). Methods of the invention may be used with various amounts of input sample, from single cells to large numbers of cells, with a dynamic range spanning numerous orders of magnitude.
[0036] In dual IMI embodiments, each copy of a fragment has a first IMI at one end and a second IMI at the other end. Those copies are preferably subject to amplification (either with adapter ligation or with tailed primers) to generate amplicons (e.g., with sequencing adaptors and/or primer binding sites at both ends). Whatever adaptors or primer binding sites may be added, all of the amplicons have first and second IMIs at their respective ends. To the extent that an IMI functions like a molecular barcode, it is not the case that dual IMIs are equivalent to a single, but double-length, barcode. Using dual IMIs in the manner described protects against the (albeit low) possibility of non-unique hybrid capture by the capture oligos. Even if the capture oligos anneal to the fragments in a manner that generates a non-unique IMI, when the capture oligos are extended, the copy must terminate at an end of each fragment that is, itself, a random breakpoint that generates a second IMI for that copy, where the probability of generating the same first and second IMI from different molecules is vanishingly small. In dual IMI embodiments, after sequencing, each sequence read (if paired-end sequencing, then each read pair) has a pair of IMIs that is unique to the original molecule.
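The scale of the protection that a second identifier offers can be estimated with a birthday-problem calculation. The Python sketch below is a back-of-the-envelope model and not a result from the disclosure. It assumes that transcripts of one gene share the same sequence, so an identifier is effectively determined by position; that priming sites and break sites are each chosen uniformly and independently along the transcript; and it uses a hypothetical transcript length and molecule count. Real priming and fragmentation are unlikely to be perfectly uniform.

```python
import math


def collision_probability(n_molecules, n_identifiers):
    """Probability that at least two of n molecules share an identifier,
    assuming identifiers are drawn uniformly and independently."""
    if n_molecules > n_identifiers:
        return 1.0
    log_p_all_unique = sum(
        math.log1p(-i / n_identifiers) for i in range(n_molecules)
    )
    return 1.0 - math.exp(log_p_all_unique)


transcript_len = 2000  # hypothetical transcript length in bases
n_copies = 20          # hypothetical number of molecules of that transcript

# Single IMI: roughly one identifier per possible start position.
p_single = collision_probability(n_copies, transcript_len)
# Dual IMIs: roughly one identifier per (break site, start site) pair.
n_pairs = transcript_len * (transcript_len - 1) // 2
p_dual = collision_probability(n_copies, n_pairs)

print(f"single IMI: {p_single:.3g}, dual IMI: {p_dual:.3g}")
```

Under these assumptions the number of distinguishable identifiers grows from about the transcript length to about its square, which is why a shared pair of identifiers between two different molecules becomes far less likely than a shared single identifier.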
[0037] The benefits of high levels of fragmentation here are well-suited to techniques for analyzing nucleic acid in challenging samples such as FFPE samples. Such techniques often use levels of heat or ion concentrations that promote nucleic acid fragmentation in a manner conventionally thought of as problematic. Here, the levels of nucleic acid fragmentation, e.g., of RNA transcripts, are beneficial (e.g., at least because they facilitate dual-IMI sequencing and analysis) and well-suited to PIPseq in FFPE. Using IMIs or dual-IMIs according to methods of the disclosure, one may sequence and count nucleic acids in the sample. IMIs have benefits over other barcoding strategies in that the sequence information that identifies the molecule is also a part of (arose from) the molecule, and thus there is no requirement to add an extrinsic, synthetic barcoding reagent (e.g., barcoded primer or adaptor) with a barcode segment that "uses up" sequencing real-estate. Of particular importance with short-read sequencing platforms such as various next-generation sequencing (NGS) instruments, using an IMI means that one does not have to have a separate barcode that effectively subtracts from a number of bases that are read from a target for each sequence read. With 8-base (for example) identifier sequences and, e.g., a 35-base read length, with IMIs, each read covers 35 bases of target, whereas with other barcoding oligonucleotides, each sequence read can only include 27 bases from the target, because 8 bases of the read have to be read from the barcode. Using dual-IMIs, methods provide additional assurance that each read can be mapped to one molecule (such that deduplicated reads are a tool for counting the original molecules) while still allowing all bases of a sequence read to be read from the sequence of the target nucleic acid. Thus, methods of the invention are useful for sequencing and counting nucleic acids.
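The read-length arithmetic in the preceding paragraph can be restated as a short Python calculation. The function name and parameter names are illustrative only; the 35-base read length and 8-base identifier length are the example values given above.

```python
def target_bases_per_read(read_length, extrinsic_barcode_length=0):
    """Bases of each read that come from the target molecule.

    An intrinsic identifier is itself target sequence, so it costs nothing;
    an extrinsic (synthetic) barcode occupies part of the read.
    """
    if not 0 <= extrinsic_barcode_length <= read_length:
        raise ValueError("barcode length must be between 0 and the read length")
    return read_length - extrinsic_barcode_length


with_imi = target_bases_per_read(35)                     # identifier is target sequence
with_extrinsic_barcode = target_bases_per_read(35, 8)    # 8 bases spent on the barcode
print(with_imi, with_extrinsic_barcode)
```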
[0038] FIG. 1 diagrams a method 101 for sequencing and counting nucleic acids. The method 101 preferably includes obtaining 105 nucleic acid molecules from a sample, optionally or preferably by a technique that includes fragmenting those molecules, e.g., at effectively random break sites. Those nucleic acid molecules are copied 109 from random start sites. The random start site of the copies is preferably provided by capturing those molecules with capture oligos that have 3' ends that anneal to the molecules at random sites. Those random 3' ends may include segments with sequences of random bases, degenerate bases, the random attachment of primer binding sites (e.g., by tagmentation with transposase or random fragmentation and adaptor ligations), other suitable methods, or combinations thereof. Due to the random start sites of the copies, the copies have sequences (e.g., proximal to the random start sites) that are effectively random for the purposes of deduplicating sequence reads and counting molecules. The method 101 preferably includes amplifying 113 the copies to make amplicons (e.g., the workflow may include making 109 a first-strand, cDNA copy of an RNA, copying the cDNA to make dsDNA with adaptors or primer binding sites at both ends, and amplifying the dsDNA by, e.g., polymerase chain reaction (PCR) to yield amplicons, which are, themselves, copies of the molecules from the sample). Each of the amplicons may include a first IMI due to the random start site when making the first strand copy. In preferred embodiments, the molecules in the sample are fragmented and each of the amplicons includes a second IMI due to the random sites of fragmentation. In such embodiments, the first and second IMIs are preferably located at or near opposed ends of the amplicons, especially where paired-end sequencing will be used. The amplicons may preferably include adaptors, such as the Y-adaptors that are known in the art and are compatible with NGS instruments sold by Illumina, Inc. A set of amplicons with at least one IMI in each and with sequencing adaptors may be referred to as a sequencing library.
[0039] In some embodiments of the method 101, a sequencing library is prepared and held in a suitable container such as a test tube or microcentrifuge tube. Such a tube may be stored, preferably frozen, for some arbitrary amount of time. In some embodiments, the method 101 includes sequencing 127 the amplicons (e.g., by loading the sequencing library onto a sequencing instrument or by shipping the sequencing library to a genomic sequencing facility). Sequencing generally produces a plurality of sequence reads, and methods may include analyzing the sequence reads. The method 101 preferably includes deduplicating 131 the sequence reads to yield only unique reads, and mapping the unique reads to genes and, for each of the genes, providing a count of mapped unique reads. Mapping to genes may involve mapping to a transcriptomic reference or a human genome or other such lookup.
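The mapping and per-gene counting of unique reads can be illustrated with a toy Python sketch. It is not a substitute for a real aligner: it assumes the reads have already been de-duplicated, uses exact substring matching against a small in-memory dictionary of reference sequences, ignores strand and sequencing errors, and discards reads that match zero or several genes. The reference sequences, gene names, and function names are all hypothetical.

```python
from collections import Counter


def map_read(read, reference):
    """Return the single gene whose reference sequence contains the read, else None."""
    hits = [gene for gene, seq in reference.items() if read in seq]
    return hits[0] if len(hits) == 1 else None


def count_by_gene(unique_reads, reference):
    """Count mapped unique reads per gene; unmapped or ambiguous reads are skipped."""
    counts = Counter()
    for read in unique_reads:
        gene = map_read(read, reference)
        if gene is not None:
            counts[gene] += 1
    return dict(counts)


# Hypothetical reference and de-duplicated reads.
reference = {
    "geneA": "ACGTACGTTAGGCTAACCGGTT",
    "geneB": "TTGACCATGGCATCGATCGGAA",
}
unique_reads = ["TAGGCTAA", "CGTTAGGC", "CATGGCAT", "GGGGGGGG"]
gene_counts = count_by_gene(unique_reads, reference)
print(gene_counts)
```

The two reads that start at different positions within geneA are counted as two molecules, while the read with no match in the reference is left out of the counts.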
[0040] As discussed, the nucleic acid molecules may be obtained from any suitable sample. Suitable samples include clinical samples including, e.g., bodily fluid such as blood draws, or tissue samples such as biopsies (e.g., needle biopsies) or fine needle aspirates. Samples may be research samples from agriculture, e.g., crops or livestock, or forensic samples, metagenomic samples, environmental samples such as wastewater or food or water supply samples, e.g., for pathogen or purity testing. Certain preferred embodiments involve obtaining a preserved or fixed sample, such as a slice of tissue such as a tumor slice that has been preserved, e.g., formalin-fixed and paraffin embedded (FFPE). Biological material such as cells or nucleic acid may be isolated or liberated from samples according to methods known in the art. For example, bodily fluid samples may be centrifuged followed by supernatant removal, wash, and pellet resuspension to enrich for cells. Nucleic acids may be liberated from fixed samples by treating with appropriate reagents (e.g., proteinases) and washing with appropriate solutions or through the use of commercially available kits such as the FFPE nucleic acid extraction kit sold under the trademark QIAMP by Qiagen. It is understood that techniques and reagents used with FFPE samples contribute to the generation of fragmented RNA (which conventionally compromises RNA capture and generates short RNA fragment length distributions).
[0041] Methods and compositions of the invention are compatible with cell preparation techniques that include aldehyde or formaldehyde fixation protocols. Cell fixation and cell preservation techniques incorporate programmable fixation times, reversible bond formation and cleavage, chemo-selective reactions, and analyte recovery using, e.g., materials and techniques such as those discussed in Gallion, 2021, Preserving single cells in space and time for analytical assays, Trends Anal Chem 122: 115723, incorporated by reference. Samples may be obtained from fixed cells or fixed tissue blocks. Samples that are formaldehyde-fixed, paraffin-embedded (FFPE) may be used. Working with FFPE samples may include deparaffinization protocols such as treatment with xylene, other solvents, heat, abrasion, or combinations thereof. Formalin fixed samples may be handled using reagents and techniques of the disclosure to recover nucleic acid suitable for sequencing and transcript counting.
[0042] Formaldehyde fixation can significantly complicate conventional RNA sequencing methods. Complications arise because formaldehyde fixation causes fragmentation of mRNA and can be difficult to reverse. Attempts to mitigate the effects of formaldehyde fixation have included proteolytic cleavage to release cross-linked transcripts. Here, methods of the disclosure may include steps for nuclei extraction from FFPE and formaldehyde fixed tissue. According to methods of the invention, fixed cells or nuclei may be captured as normal. Methods may include treatment with proteinase K, optionally using heat-triggered proteinase K activation, with added lysis buffer enzymes to efficiently dissolve and liberate cells and/or nuclei. Proteinase K-based lysis is performed at high temperature to promote digestion of amide cross-linking in the fixed samples. Methods may include delivery of chemical lysis reagents (including, e.g., proteinase K, detergents, solutions, or combinations thereof) to supplement release of mRNA. Compared to conventional techniques, methods may include use of proteinase K in a buffer, as well as removal of (or exclusion or minimization of) EDTA, and the inclusion of a high concentration of Mg2+. In a preferred embodiment, micelles are used to deliver lysis agents. Micellar delivery agents include block copolymer micelles, inverse micelles, spherical micelles, cylindrical micelles, and others. Typically, micelles are formed using one or more surfactants or an oil-in-water emulsion. Suitable lysis agents include Sarkosyl, SDS, and Triton X-100 among others. One or more surfactants is used to micellize lysis agents in an oil phase. Suitable surfactants for creating micelles include, for example, Ran or ionic Krytox. In addition, it may be useful to use a super-concentrated co-solvent to aid dissolution of the lysis agent. In some embodiments, fluoro-phase surfactant Krytox 157-FSH (acidic form) or its neutralized form (ammonium counter-ion, potassium counter-ion or sodium counter-ion) at 0.05% to 5% in Novec 7500 or 7100 or Fluorinert is used to form micelles that include a lysis agent, such as Sarkosyl or SDS (e.g., at 0.5% to 5%).
[0043] Certain embodiments use EDTA free beads. EDTA is included in some conventional approaches to chelate or sequester ions (e.g., Mg2+) that contribute to fragmentation. Here, RNA fragmentation is advantageous. In fact, some embodiments of the present disclosure use an FFPE nucleic acid extraction kit but omit the use of EDTA (or use less than the provided or recommended quantity). The resultant fragmentation of RNA provides molecules with ends (break points) at random locations, thereby imbuing each molecule, and its copies and sequence reads, with an associated IMI.
[0044] One issue that may arise in some embodiments is the presence of non-target nucleic acids in the sample. For example, where methods of the invention are used for single cell transcriptomics, the nucleic acids of interest typically include mRNA or precursor mRNA transcripts. Other RNAs, such as ribosomal RNA (rRNA) may not be of-interest to the assay being performed. Methods of the invention include a step for the depletion or removal of nontarget nucleic acids such as rRNA. Any suitable approach may be used to remove rRNA such as, for example, rRNA depletion probes that anneal to rRNA and promote RNAseH mediated digestion of the rRNA, transposase-based targeted depletion, depletion by engineered nucleases (such as Cas9, zinc fingers, transcription activator-like effector nucleases, or others), capture to solid supports (e.g., magnetic beads) using e.g., catalytically inactive Cas9 (dCas9) or a homolog thereof with guide RNAs specific to rRNA, or any other suitable rRNA depletion technique. Methods for depleting non-target nucleic acids from a sample may use techniques described in US 11,115,853; US 10,472,666; US 9,005,891; US 11,149,297; US 10,421,992; and US 2023/0279490 Al, the contents of each of which are incorporated by reference. Suitable approaches for the removal of unwanted, non-target nucleic acid include the use of gene editing system originally associated with clustered regularly interspaced short palindromic repeats (CRISPR), such as the CRISPR-associated (Cas) enzyme 9 (Cas9). Ribosomal RNA removal using CRISPR may generally be referred to as CRISPR-based depletion or CRISPR-depletion or CRISPR ribodepletion or similar when applied to ribosomal RNA. CRISPR-depletion technology harnesses the specificity of CRISPR to degrade abundant, uninformative sequences. CRISPR-depletion may be integrated into a stranded total RNA sequencing library prep protocol by adding Cas9/RNA complexes after the adapter ligation step. 
The Cas9/RNA complexes include a pool of guide RNAs that target and deplete unwanted sequences. CRISPR-depletion technologies are not specific to sequencing platforms. CRISPR-depletion or bulk ribodepletion reagents may be used to remove overabundant human, mouse, or rat rRNA from RNA-Seq libraries to improve sequencing sensitivity and performance. Kits are available commercially that provide a bulk post-library depletion reagent in a multiplexing format, designed to remove overabundant human 5S, 5.8S, 18S, 28S, 45S (precursor), and mitochondrial 12S and 16S ribosomal RNA. CRISPR-based depletion may include the methods and materials for CRISPR-depletion sold under the trademark CRISPRCLEAN by Jumpcode Genomics, Inc. (San Diego, CA). CRISPR-depletion may use materials or techniques described in US 2022/0145359 or US 2023/0265528, incorporated by reference. Alternatively, magnetic pull-down depletion is an option, using rRNA-specific probes linked to magnetic beads, for example.
[0045] After or with any optional removal of non-target nucleic acids, methods of the invention may include enhancing RNA fragmentation by adding divalent cations to the lysis buffer. Methods may include fragmenting the plurality of sample polynucleotides at a plurality of random break sites, wherein the copy comprises the tag sequence copied from the sample polynucleotide at a 5' end and a second tag sequence at a 3' end, the second tag sequence being copied from a segment of the sample polynucleotide adjacent one break site of the plurality of random break sites. In preferred embodiments, such as for single-cell transcriptomics, the nucleic acid molecules are RNA and the fragmenting step includes incubating the RNA in the presence of Mg2+. Fragmented RNA intrinsically generates a dual IMI counting system in which amplification products with the same start and end sequences must be replicates. A high concentration of Mg2+ not only promotes fragmentation (to yield random break sites at 5' ends of targets) but is also used to promote annealing to short randomer capture moieties (at random sites at the 3' end of the portion of the target molecule that will be copied). Methods described herein are compatible with any suitable sample collection and sample preparation front end. For example, some embodiments are applied to fixed samples such as FFPE tissue slices, where target nucleic acids are liberated into solution and subjected to target capture and library preparation as described herein. Methods may include the use of enzymes and reagents for cap repair after fragmentation of RNA. Reagents and kits known in the art may add, e.g., a 5' methyl cap after fragmentation to stabilize the fragment until a second IMI is copied into a copy. Methods of the disclosure are also applicable to "single cell" analysis with any of a variety of sample types including, for example, bodily fluid samples, environmental samples, samples from culture, others, or combinations thereof.
[0046] Single cell embodiments may involve isolating single cells into aqueous partitions or compartments to sequence and count nucleic acid molecules of a single cell. Isolation into partitions may be accomplished by any suitable mechanism and any suitable type of aqueous partition may be used. Exemplary suitable partitions include droplets, wells in a plate, or other fluid partitioning structures. For example, the partitions may be wells, cavities, pockets, or openings in a pico-, nano-, or microtiter plate or substrate, or fluidic harbors (see, e.g., US Pub 2010/0041046 A1, incorporated by reference). The partitions may be wells in a multi-well plate such as a 96-well plate, a 384-well plate, a 1536-well plate, a 3456-well plate, or a 9600-well plate. The partitions may be separate chambers (see, e.g., US 2021/0178395 A1, incorporated by reference). The partitions may be distinct regions defined within a fluidic device (see, e.g., US 2020/0269248 A1, incorporated by reference). In certain embodiments, the partitions are droplets of an emulsion such as a water-in-oil (W/O) emulsion or a water-in-oil-in-water (W/O/W) emulsion. In preferred embodiments, the partitions are a plurality of droplets that are formed essentially simultaneously by mixing together and shearing or vortexing an aqueous fluid that includes template particles with an oil. The template particles (typically small beads of a hydrogel such as polyacrylamide (PAA) or polyethylene glycol (PEG)) each cause the creation of one aqueous droplet when the mixture is sheared and, in that sense, each particle serves as a template for one droplet. However many template particles are in the mixture that is sheared or vortexed (e.g., dozens, hundreds, thousands, tens of thousands, or more), that is the number of droplets that are formed (along with some "satellite" droplets, or mere bubbles, of miscellaneous sizes and integrities that are not relevant to downstream analysis and can simply be ignored). 
When the template particles are monodisperse (e.g., same number of polymer subunits among the beads aka particles, and/or same mass each, and/or essentially same size/volume among the particles, e.g., same diameter when viewed under a microscope), the resultant droplets that each include a template particle are monodisperse (each the size of one template particle plus a small, thin shell of aqueous fluid around it). If a droplet forms that contains two or more template particles, then during shearing or vortexing, that droplet breaks into one droplet per each template particle. When the mixture is sheared or vortexed, the monodisperse droplets, aka aqueous partitions, form very fast (typically under 30 seconds) and effectively instantly as compared to microfluidics. All of the monodisperse droplets form essentially simultaneously during partitioning. That formation yields a monodisperse emulsion (the satellite bubbles are ignored as background noise) of uniform, stable partitions, formed instantly and by the templating action of the hydrogel beads, hence they may be referred to as particle templated instant partitions (PIPs). Methods of working with particle templated partitions are described in WO 2019/139650; US 2020/0261879 Al; US Pat. 11,773,452; US 2021/0214792 Al; US 2021/0215591 Al; US 2021/0340596 Al; US 2021/0381064 Al; and US 2021/0214721 Al, the contents of each of which are incorporated by reference.
[0047] By whatever partitioning mechanism, methods may include isolating a single cell into a compartment or partition and releasing RNA transcripts from the cell, e.g., by lysing the cell in the partition. Some embodiments use high temperature and proteinase K for lysis, optionally with the additional delivery of chemical lysis reagents such as detergents. Methods may include fragmenting the nucleic acid molecules at random fragmentation sites, e.g., by including Mg2+ in the reaction mixture. Fragmentation may be promoted for dual IMI embodiments because, when a target nucleic acid is copied, the copy ends at a random break point such that a 3' end of the copied sequence includes a segment of bases that is essentially random in sequence or information and is thus useful as a tag to identify the origin molecule or to distinguish one target nucleic acid molecule from another when the sequence data is analyzed. Preferred embodiments make use of randomer-based annealing to capture short, fragmented RNA.
[0048] Preferably, the fragments are captured with oligos that include 3' "randomers", i.e., segments with random sequences of bases or degenerate bases. In randomer embodiments, the capture moiety (e.g., 3' sequence of the hybrid capture oligo) is complementary to genomic information that is encoded within the sequence of the source gene for a molecule. Where the other end is generated by random fragmentation, both ends of each molecule are informed by the specific gene identity and the position of the molecule along that gene.
[0049] An IMI functions similar to a "unique molecular identifier" except that the sequence of an IMI is not provided by synthetic reagents or oligos. The information in the sequence of an IMI is intrinsic to the genomic information of the organism. Embodiments of this disclosure use a "dual IMI", a sequence of bases from both ends of a molecule. With paired-end sequencing, the dual IMI may be a concatenation of m and n terminal base sequences of paired forward & reverse reads. The sequence of bases in each case is random because the read begins at a random site (due to priming with a random N-mer and/or copying through to a random break site). In the event that one random start is duplicated among mRNA templates in a sample (two mRNA transcripts from the same gene are copied by priming at the same locus with two matching random 3' sequences), then having the copy terminate at a random break site will greatly increase the probability that a dual IMI in an amplicon (or even in a cDNA) is truly unique to the molecule. The dual IMI approach expands the number of theoretical unique identifiers for each gene significantly. For example, if a given target average fragment size only provides about 1000 unique 5' cut sites, using a dual IMI (dIMI) method may provide 1000x1000 combinations, or about 1 million unique dIMI per gene.
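The concatenation and the combinatorial estimate described in the preceding paragraph can be sketched as follows. This is a minimal illustration only, not the disclosed implementation: the function names `dual_imi` and `dual_imi_space`, the default lengths m = n = 8, and the assumption that each read has already been trimmed so that it begins at the first sample-derived base are all hypothetical choices made for the example.

```python
def dual_imi(forward_read: str, reverse_read: str, m: int = 8, n: int = 8) -> str:
    """Build a dual IMI key from one read pair.

    Assumes (hypothetically) that barcodes and adapters were already trimmed,
    so each read starts at the first base copied from the sample molecule.
    """
    if len(forward_read) < m or len(reverse_read) < n:
        raise ValueError("read is shorter than the requested IMI length")
    # Concatenate the m terminal bases of the forward read with the
    # n terminal bases of the reverse read.
    return forward_read[:m] + reverse_read[:n]


def dual_imi_space(start_sites: int, break_sites: int) -> int:
    """Number of distinct (start site, break site) combinations for one gene."""
    return start_sites * break_sites


key = dual_imi("ACGTACGTTTTT", "GGCCAATTCCCC")
print(key)                         # ACGTACGTGGCCAATT
print(dual_imi_space(1000, 1000))  # 1000000
```

Two read pairs that map to the same gene and share the same key would be treated as copies of one original molecule; pairs with different keys would be counted separately.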
[0050] FIG. 2 shows compositions used in methods of the invention. Methods and compositions of the invention preferably use a template particle 205, such as a hydrogel bead, linked to one or more capture oligos 207. A 3' end of the capture oligo 207 defines a primer 219 for hybrid capture of a target nucleic acid 213. The primer 219 preferably includes a stretch of random bases and may be referred to as a randomer or random priming sequence. Any suitable number of bases (e.g., from 3 to 30, or fewer, or more) may constitute the random priming sequence. Preferred embodiments use about 6 to about 10, e.g., 8, random bases to make up the primer 219. The capture oligo 207 may include any other functional sequence such as, for example, a bead barcode 223 (which becomes a cell barcode once one cell is isolated in a partition with one bead) and optionally a primer binding site 227, such as a universal primer binding site or a binding site for a first sequencing primer. The capture oligo may be covalently linked to the bead 205, e.g., by acrylamide chemistry. The capture oligo 207 may include a photolabile cleavage site or a restriction cleavage site, e.g., 5' of any functional sequences, to allow a first strand copy 235 and any functional segments of the capture oligo 207 to be released from the bead 205. Due to the randomer in the primer 219, the capture oligo hybridizes to the target nucleic acid 213 at a random location and primes the synthesis of a first strand copy 235, where a 5' end of the first strand copy 235 begins at a random start location 232 in the target nucleic acid 213. Where the target nucleic acid has been fragmented at a random location 231, the first strand copy has a 3' end that includes a copy of bases adjacent a second random location 231.
[0051] In transcriptome embodiments, the primer 219 annealed to the target nucleic acid 213 may be extended with Moloney murine leukemia virus (MMLV) reverse transcriptase (RT). The enzyme may add three untemplated cytosine residues at a 3' end of the first strand copy. That CCC segment may be annealed to a template-switching oligo 237 (TSO) such that the MMLV RT "switches templates" (from the target nucleic acid 213 to the TSO) and extends the first strand copy 235 to include any functional sequences in the TSO, such as barcodes, second sequencing primer binding sites, restriction cleavage sites, universal primer binding sites, others, or combinations thereof.
[0052] In embodiments where the capture oligo 207 and the TSO 237 include forward and reverse primer binding sites, respectively, the first strand copy 235 may be amplified by, e.g., polymerase chain reaction once the TSO is copied, optionally after release from the bead 205.
[0053] Where the nucleic acids are captured in aqueous partitions (such as the droplets of PIPs), one preferred workflow is to perform hybrid capture in partitions, then release the beads 205 from the partitions, the beads 205 having the various nucleic acids annealed to the capture oligos 207 linked to the beads. In single cell embodiments, because each bead 205 may have barcodes in all of its capture oligos 207 that are common across that bead but unique to that bead, each nucleic acid will end up labeled with a "cell barcode" (the bead barcode becomes a cell barcode once cells are associated with respective beads). In the preferred workflow, the bead-bound nucleic acids are released from partitions. For example, the emulsion may be broken chemically (e.g., with detergent), enzymatically (lipase), thermally (heat), mechanically (e.g., centrifugation and/or sonication), and preferably by a combination of the foregoing. RT may be introduced after bead-bound target nucleic acids are released from droplets, and RT extends the capture oligos to create the first strand copies 235 that include the cell barcodes 223 and the IMIs.
[0054] As stated, in various embodiments throughout, the nucleic acid molecules may be, for example, RNA transcripts from a single cell. The plurality of primers 219 on the capture oligos 207 preferably include sample barcodes 223 that are specific to the solid support, e.g., bead 205 ("solid support" is used because methods herein may be performed on spots on an array or with capture oligos 207 linked to the surface of a slide, well, NGS flow cell, or other similar structure). The described methods are well suited to PIP-Seq with formalin fixed samples (PIP-Seq FF) and preferably use randomer capture of short RNA fragments. The randomer in the primer 219 imbues the copy 235 with an IMI, which means that no extrinsic unique molecular identifier (UMI) barcode needs to be added.
[0055] In the partitions, methods and compositions of the invention use ramped cooling to facilitate randomer annealing. As stated, preferred embodiments use a randomer for primer 219. Any suitable length may be used for the randomer.
[0056] FIG. 3 shows melting temperatures over [Mg2+] for randomer capture sequences of 7, 8, and 9 bases in length. Methods herein select a randomer length that balances a tradeoff between annealing strength and specificity, on the one hand, and meltability, sequencing real estate, and uniqueness, on the other hand. Randomers between about 6 and 10 bases appear to be good candidates. Randomers with a length of about 2 or 3 bases, for example, are not particularly good at generating unique IMIs in amplicons and do not necessarily anneal with sufficient strength to hold the target during partition release and introduction of RT. Randomers with a length of 30 or more bases, to give an example, may have a melting temperature (Tm) high enough that copies 235 cannot be melted off of the randomers. It is preferable that the material captured to the beads be held strongly enough that it is not lost during washing. The melting temperature data of FIG. 3 show that 8 bases is a good length for the randomer, with 7 and 9 also being suitable. Results indicate that 8-mers are a good balance of specificity and stringency.
[0057] As noted above, high Mg2+ concentration is good for fragmentation but may also be beneficial for enhanced annealing and may promote binding diversity. Any suitable Mg2+ concentration ([Mg2+]) may be used including, for example, a [Mg2+] between about 5 and 50 mM. Conventionally, Mg2+ is avoided at high temperature to minimize RNA fragmentation, but it may be used beneficially in the methods of the disclosure. Other features and techniques may be used in any of the methods for sequencing and counting nucleic acids herein.
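The uniqueness side of the randomer-length tradeoff discussed in paragraph [0056] reduces to simple arithmetic: a randomer of k positions drawn from four bases has 4^k possible sequences. The sketch below only illustrates that count; the helper name `randomer_space` is hypothetical, and the calculation says nothing about melting temperature or annealing strength, which FIG. 3 addresses experimentally.

```python
def randomer_space(length: int) -> int:
    """Number of distinct sequences for a randomer of the given length (A, C, G, T)."""
    if length < 0:
        raise ValueError("length must be non-negative")
    return 4 ** length


for k in (3, 7, 8, 9):
    print(k, randomer_space(k))
# 3 64
# 7 16384
# 8 65536
# 9 262144
```

A 3-mer offers only 64 possible priming sequences, which is consistent with the statement that very short randomers are poor at generating unique IMIs.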
[0058] Methods include copying the nucleic acid molecules from random start sites using primers comprising 3' random oligomers, thereby providing the copies with the random start sites. Embodiments may use template switching oligos to add functional sequences (e.g., primer binding sites) at the 3' end of the first strand copies. The random start sites provide each of the copies of the nucleic acid molecules with a unique sequence, or a first intrinsic molecular identifier (IMI), at the random start site at the 5' end of the first strand copy, which IMI ends up in amplicons and sequence reads made from that molecule. Where the target nucleic acids are randomly fragmented, the 3' end of the first strand copy includes a second IMI. Thus, preferably each of the copies includes a first random sequence at a 5' end adjacent one of the random start sites and a second random sequence at a 3' end adjacent one of the random fragmentation sites.
[0059] Methods of the disclosure allow library preparation with a single PCR and with no requirement for a secondary fragmentation. Any suitable workflow may use the features shown herein. In some embodiments, methods use a primary amplification (e.g., PCR) followed by indexed amplification. In certain embodiments, the primary amplification uses indexed primers for direct indexed primer PCR, which has the advantages of a simplified workflow and reduced time. Methods may include amplifying the plurality of copies of the sample polynucleotides to generate amplicons that comprise sequencing adaptors. Methods may include amplifying the plurality of copies of the sample polynucleotides to generate amplicons that each include first and second identifier segments copied from the first and second tag sequence, respectively; sequencing the amplicons to generate sequence reads; and assigning the sequence reads to specific ones of the plurality of sample polynucleotides using data in the sequence reads from the first and second identifier segments.
[0060] Amplification may be performed to generate amplicons ready for sequencing, or sequencing libraries, from the target molecules. Each amplicon may have an IMI or dual IMIs. The disclosed system may use IMIs to deduplicate sequence reads and count template molecules in the sample. Preferred embodiments use dual IMIs. The dual IMI system involves randomer capture (by a segment of random bases on a 3' end of the capture oligo, e.g., that is attached to a bead) to read a random sequence from a 3' end of each molecule (e.g., template RNA). Libraries have randomized segments at both 5' and 3' ends of molecules. Either or both may be used as IMI(s) for sequence read deduplication.
[0061] Capture oligos with random 3' ends may be used for RNA capture and library preparation according to methods of the invention. A particle may be linked to a capture oligo. The capture oligo anneals to an mRNA. Particle-bound capture oligos in this application may comprise an acrydite linker, a PE1 priming sequence, a particle barcode, and a random N-base capture moiety. A reverse transcriptase may extend the capture oligo to form a cDNA. The cDNA and capture oligo in combination with the mRNA form a duplex. At that stage, it is suitable to break the droplets and pool their contents, wash in buffer, and proceed with library preparation.
[0062] A transposase complex (sometimes called a transposasome) may optionally be introduced. The transposase complex includes a dimer that includes two of a transposase and two transposon end sequences. Here, the transposon end sequences are depicted as both being paired-end 2 (PE2) sequences, which will cooperate with paired-end 1 (PE1) sequences in the capture oligo in subsequent amplification and sequencing steps. In the depicted method, the transposase randomly cuts the cDNA/mRNA duplex, thereby defining a random cut site. In a downstream step, read 2 of paired-end sequencing will include the first segment of bases in the cDNA adjacent the random cut site.
[0063] Attachment of the end sequence to the cDNA at the random cut site produces a construct, a contiguous DNA molecule that includes a first PCR handle (PE1), a cell barcode, a capture segment, a portion of the cDNA optionally terminating at the random cut site, and a second PCR handle (PE2). Amplification of the construct yields amplicons. In some embodiments, constructs are amplified with a P5-PE1 hybrid oligo and P7 index primer directly into a sequencing library. The library may be sequenced to assess RNA expression, for example, as described in Hrdlickova, 2017, RNA-Seq methods for transcriptome analysis, WIREs RNA 8(1): 10.1002, incorporated by reference.
[0064] Constructs or amplicons may include certain primer and index sequences or copies thereof, such as P5s and P7s. Those sequences may be any arbitrary sequence useful in downstream analysis. For example, they may be additional universal primer binding sites or sequencing adaptors. For example, either or both of the P5s and P7s may be arbitrary universal priming sequences (universal meaning that the sequence information is not specific to the naturally occurring genomic sequence being studied but is instead suited, by design, to being amplified using a pair of cognate universal primers). The index segment may be any suitable barcode or index such as may be useful in downstream information processing. It is contemplated that the P5 sequences, the P7 sequence, and the index segment may be the sequences used in NGS indexed sequencing such as performed on an NGS instrument sold under the trademark ILLUMINA, and as described in Bowman, 2013, Multiplexed Illumina sequencing libraries from picogram quantities of DNA, BMC Genomics 14:466 (esp. in Figure 2), incorporated by reference.
[0065] Libraries may be sequenced by any suitable method. Suitable methods include Sanger sequencing, Illumina or Ultima sequencing, Roche pyrosequencing, and single-molecule, long-read sequencing using platforms offered by Pacific Biosciences or Oxford Nanopore. An example of a sequencing technology that can be used is Illumina sequencing. Illumina sequencing is based on the amplification of DNA on a solid surface using fold-back PCR and anchored primers. Genomic DNA is fragmented and attached to the surface of flow cell channels. Four fluorophore-labeled, reversibly terminating nucleotides are used to perform sequencing. After nucleotide incorporation, a laser is used to excite the fluorophores, an image is captured, and the identity of the first base is recorded. Sequencing according to this technology is described in U.S. Pub. 2011/0009278, U.S. Pub. 2007/0114362, U.S. Pub. 2006/0024681, U.S. Pub. 2006/0292611, U.S. Pat. 7,960,120, U.S. Pat. 7,835,871, U.S. Pat. 7,232,656, U.S. Pat. 7,598,035, U.S. Pat. 6,306,597, U.S. Pat. 6,210,891, U.S. Pat. 6,828,100, U.S. Pat. 6,833,246, and U.S. Pat. 6,911,345, each incorporated by reference. In preferred embodiments, an Illumina MiSeq sequencer is used.
[0066] Sequencing 127 creates sequence reads, i.e., a record of a sequence of bases from at least a part of a nucleic acid. Sequencing the amplicons generates sequence reads that include sample sequence information and tag sequence information. Methods include deduplicating the sequence reads to save only unique sequence reads and providing counts of the plurality of sample polynucleotides in the sample from numbers of the unique sequence reads.
[0067] The sequence reads may be analyzed to determine expression of RNA associated with genes based on unique reads that correspond to those genes. Analyzing the sequence reads may be performed using known software and following multistep procedures that are known in the art. For example, first, the quality of each sequence read, i.e., FASTQ sequence, may be assessed using the software FASTQC. Next, the reads may be trimmed using, for example, Trimmomatic software. See Bolger, 2014, Trimmomatic: a flexible trimmer for Illumina sequence data, Bioinformatics 30(15):2114-2120, incorporated by reference. The trimmed sequence reads may then be mapped to a human genome using, for example, HISAT2 software. HISAT2 outputs files in SAM (sequence alignment map) format, which may be compressed to binary alignment map (BAM) files. Other methods useful for processing and analyzing sequence reads are discussed in U.S. Pat. No. 8,209,130, which is incorporated by reference. Determining gene expression generally involves counting numbers of unique sequence reads that uniquely map to a human reference genome. Mapping reads to a reference to identify genes may be performed using computer software packages known in the art.
[0068] An important benefit of the invention is that mapping reads to a reference and identifying genes gives a quantitative result when reads are deduplicated to yield one read per mRNA from which those reads originated. Because each mRNA is typically copied into cDNA and each cDNA is typically copied into an unpredictably large number of amplicons in the sequencing library, and because each library member is often amplified or read redundantly as part of a sequencing technique, a number of raw sequence reads does not necessarily correlate to numbers of input molecules from the single cells. Nevertheless, one cell may include abundant transcripts that map to one gene. Here, compositions and methods of the invention give each cDNA at least one unique intrinsic identifier (and preferably dual IMIs) that can be identified within, and used to deduplicate, sequence reads. After those sequence reads are identified by gene and deduplicated, counts of those reads provide a quantitative measure of gene expression levels. Methods may include, prior to the deduplicating step, saving the sequence reads in memory, coupled to at least one processor in a computer system, as a FASTA or FASTQ file, wherein the deduplicating and mapping are performed by the computer system.
[0069] As discussed, methods of the invention are useful for scRNA-Seq and specifically for expression analysis. In preferred embodiments, cells are isolated into, and lysed within, aqueous partitions with capture oligos. The capture oligos anneal to RNAs released from the cells. The capture oligos preferably include partition-specific barcodes and PCR handles. Once the capture oligos have hybridized to the RNAs, those duplexes may be released from partitions and pooled at any subsequent stage. Because capture oligos with partition-specific barcodes are used to capture and tag RNA from cells isolated in the partition, any arbitrary number of cells may be captured in parallel (simultaneously). 
Because the RNAs are tagged with a cell barcode during hybrid capture (e.g., aka a partition-specific barcode or a bead barcode), if those duplexes are pooled and ultimately sequenced, the cell barcodes in the sequencing data can be used to “bin” the sequence data by original cell, i.e., assign each sequence read (or assembled contigs or sequences therefrom) back to originating cells.
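The binning step described above can be sketched as follows. This is an illustrative sketch under stated assumptions, not the disclosed software: the function name `bin_by_cell`, the 12-base barcode length, and the read layout (cell barcode at the very start of each read, followed by sample-derived sequence) are hypothetical and would depend on the actual capture oligo design.

```python
from collections import defaultdict


def bin_by_cell(reads, barcode_len=12):
    """Group reads by the cell barcode assumed to occupy their first bases.

    reads: iterable of read sequences (strings).
    Returns a dict mapping each barcode to the list of barcode-trimmed inserts.
    """
    bins = defaultdict(list)
    for read in reads:
        if len(read) <= barcode_len:
            continue  # too short to carry both a barcode and an insert
        barcode, insert = read[:barcode_len], read[barcode_len:]
        bins[barcode].append(insert)
    return dict(bins)


reads = [
    "AAAACCCCGGGG" + "ACGTACGT",
    "AAAACCCCGGGG" + "TTGACCAA",
    "TTTTGGGGCCCC" + "ACGTACGT",
]
for barcode, inserts in bin_by_cell(reads).items():
    print(barcode, len(inserts))
# AAAACCCCGGGG 2
# TTTTGGGGCCCC 1
```

A production pipeline would typically also correct barcode sequencing errors against a list of known bead barcodes; that step is omitted here.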
[0070] Read deduplication 131 and transcript counting are preferably performed by a computer system operably linked to a sequencing instrument and executing program instructions causing the computer system to perform those functions. For example, sequence reads may arrive at the computer system in FASTQ format. Each entry, i.e., each "sequence read", in a FASTQ (or FASTA) file will include a segment of sequence information read from the target molecules. Those sequence reads will also each include at least one IMI. For dual IMI embodiments with paired-end sequencing, each target molecule will generate two reads, a forward read and a reverse read, and each read will have an IMI read from the target molecule. Note that the sequence information read from the target molecule is used to identify the gene of origin for that molecule (e.g., transcript). But each IMI is also read from the target molecule. The computer system can execute program software to deduplicate the reads (by simply identifying all duplicates and saving only one, i.e., "collapsing" the reads to a single read; or by leaving the FASTQ files intact but only "counting" all duplicate reads as 1, e.g., in a count file). The system may also identify gene information for each read, e.g., by mapping to a reference or by querying a transcript database. Read mapping can proceed by known methods including, for example, by methods that involve pairwise alignment of each read to a reference, such as a published human genome such as the 36th build of the human genome referred to in industry as HG36 or hg36. Comparison to references may also proceed by building hashes of k-mers in the reference and in the query (the sequence read) and looking up the hashed k-mers of the query in the target (the reference). 
Read mapping may involve transforming each sequence in order or in characters via an informatic transform such as the Burrows-Wheeler transform (BWT), after which comparison of the BWT of the query to the target is trivial. The foregoing approaches are implemented in software. For example, read mapping may be performed by a computer system operating one of the software programs such as BLAST (alignment-based), BLAT (alignment-based), Bowtie2 (implementing a BWT), Burrows-Wheeler Aligner aka BWA (implementing a BWT), FastHASH (mapping hashed k-mers), kallisto (mapping hashed k-mers), others, or combinations thereof. See Hatem, 2013, Benchmarking short sequence mapping tools, BMC Bioinformatics 14:184, incorporated by reference. Based on mapping information, each read may be assigned to a gene. Specifically, a gene (as it is found in the DNA within the genome of an organism that is being studied) that has been transcribed into an RNA transcript, that was copied with a randomer and amplified to generate amplicons with at least one IMI, which amplicons are sequenced to generate sequence reads that include the IMI, is assigned to the sequence read, and by implication, the RNA transcript is identified as having been transcribed from that gene. For that gene, each unduplicated read is used to increment a count of transcripts by one. By performing deduplication and read mapping for all transcripts from a sample, transcribed genes can be identified and given read counts for that sample. Notably, the read counts are a measure of a number of actual transcript molecules that were present in the sample. Where the sample is a single cell, the gene identities and read counts are a measure of expression levels of the gene in that cell and are also thus a transcriptome or transcriptomic profile for the cell.
[0071] Using the foregoing reagents and techniques, the invention provides methods of preparing a sample for sequencing and counting molecules therein. Such methods include providing a sample comprising a plurality of sample polynucleotides (such as a single cell comprising a plurality of mRNA transcripts) and generating a plurality of copies of the sample polynucleotides (e.g., by annealing capture oligos comprising randomers to the transcripts, optionally in aqueous partitions such as droplets, optionally releasing the capture oligo/transcript duplexes from the partitions, and extending the capture oligos, e.g., with reverse transcriptase, to make the copies). Each copy includes (i) a sample sequence copied from a sample polynucleotide of the plurality of sample polynucleotides, and (ii) a tag sequence copied from the sample polynucleotide, the tag sequence distinguishing said sample polynucleotide from other sample polynucleotides from the sample. The tag sequence functions like an IMI discussed above. Notably, the information of the tag sequence is information that was in the sample polynucleotides originally, i.e., it is genetic information of the organism being studied. Generating the copies may include annealing, to the sample polynucleotide, a primer (e.g., the randomer of a capture oligo) having a random sequence of bases at a 3' end and extending the primer to generate the tag sequence. The random sequence may be about 8 bases in length, e.g., about 6 to 9. The plurality of sample polynucleotides may be RNA and each of the plurality of copies of the sample polynucleotides may include a random tag sequence copied from the RNA. The unique sequence in each of the copies is copied from a segment of a transcript transcribed by an organism. The methods may include performing the recited steps for a plurality of cells to generate, for each cell of the plurality, a transcriptome profile based on, for that cell, the count of mapped unique reads.
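The "count all duplicate reads as 1" option and the per-gene increment described in paragraph [0070] can be sketched as below. This is a simplified illustration under assumptions: the function name `count_transcripts` and the input layout (one tuple of cell barcode, gene, and IMI per mapped read or read pair) are hypothetical, the gene assignment is taken as already produced by a read mapper, and only exact IMI matches are treated as duplicates.

```python
from collections import defaultdict


def count_transcripts(records):
    """Count distinct IMIs per (cell barcode, gene).

    records: iterable of (cell_barcode, gene, imi) tuples, one per mapped read.
    Reads sharing cell, gene, and IMI are treated as copies of one molecule.
    """
    seen = defaultdict(set)
    for cell, gene, imi in records:
        seen[(cell, gene)].add(imi)  # a set stores each duplicate IMI only once
    return {key: len(imis) for key, imis in seen.items()}


records = [
    ("cell1", "GAPDH", "ACGTACGTGGCCAATT"),
    ("cell1", "GAPDH", "ACGTACGTGGCCAATT"),  # PCR duplicate of the read above
    ("cell1", "GAPDH", "TTGACCAAGGCCAATT"),  # same gene, different molecule
    ("cell2", "GAPDH", "ACGTACGTGGCCAATT"),  # same IMI, different cell
]
print(count_transcripts(records))
# {('cell1', 'GAPDH'): 2, ('cell2', 'GAPDH'): 1}
```

Keying the sets by cell and gene means the same IMI sequence appearing in a different cell or gene is still counted as a separate molecule, which matches the idea that the IMI need only be unique among transcripts of one gene in one cell.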
[0072] In any of the suitable examples described above or herein, a computer system may generate a uniform manifold approximation and projection (UMAP) and create from the UMAP a 2D plot showing the plurality of cells clustered by properties.
[0073] Any one of the above-described strategies and methods, or combinations thereof, may be used in conjunction with particle-templated emulsions. For example, methods may be used for single cell expression profiling, which may include combining target cells with a plurality of template particles in a first fluid to provide a mixture in a reaction tube. The mixture may be incubated to allow association of the plurality of the template particles with target cells. A portion of the plurality of template particles may become associated with the target cells. The mixture is then combined with a second fluid which is immiscible with the first fluid. The second fluid and the mixture are then sheared so that a plurality of monodisperse droplets is generated within the reaction tube. The monodisperse droplets generated comprise (i) at least a portion of the mixture, (ii) a single template particle, and (iii) a single target particle. Of note, in practicing methods of the invention provided by this disclosure, a substantial number of the monodisperse droplets generated will comprise a single template particle and a single target particle; however, in some instances, a portion of the monodisperse droplets may comprise none or more than one template particle or target cell.
[0074] In some aspects, generating the template particle-based monodisperse droplets involves shearing two liquid phases. The mixture is the aqueous phase and, in some embodiments, comprises reagents selected from, for example, buffers, salts, lytic enzymes (e.g., proteinase K) and/or other lytic reagents (e.g., Triton X-100, Tween-20, IGEPAL, bm 135, or combinations thereof), nucleic acid synthesis reagents (e.g., nucleic acid amplification reagents or reverse transcription mix), or combinations thereof. The second fluid is the continuous phase and may be an immiscible oil such as a fluorocarbon oil, a silicone oil, or a hydrocarbon oil, or a combination thereof. In some embodiments, the second fluid may comprise reagents such as surfactants (e.g., octylphenol ethoxylate and/or octylphenoxypolyethoxyethanol) or reducing agents (e.g., DTT, beta mercaptoethanol, or combinations thereof).
[0075] Some methods of the disclosure use capture oligos. Oligos, sometimes referred to as oligonucleotides, are sequences of contiguous nucleotides of DNA, RNA, or a mixture thereof. Preferably, oligos comprise DNA. However, in certain embodiments, oligos may comprise RNA. In other embodiments, oligos may comprise a mixture of DNA and RNA. Oligos may comprise noncanonical nucleotides, such as synthetic nucleotides that have been modified to incorporate certain biomolecular properties. The length of the oligo is usually denoted by "-mer" and the length of a 3' randomer sequence may also be denoted using -mer. For example, a capture oligo (e.g., linked to a hydrogel template particle) may be 50 bases long and thus be a 50-mer and may have 8 random bases at a 3' end, and thus include an 8-mer randomer capture sequence useful to generate an IMI.
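The "-mer" arithmetic in the example above can be sketched in a few lines. This is an illustration under assumptions: the 42-base fixed 5' portion is a hypothetical placeholder rather than a sequence from this disclosure, and the helper name `make_capture_oligo` is invented for the example.

```python
import random

BASES = "ACGT"


def make_capture_oligo(fixed_5prime, randomer_len=8, rng=None):
    """Append a random 3' stretch of `randomer_len` bases to a fixed 5' part."""
    rng = rng or random.Random()
    randomer = "".join(rng.choice(BASES) for _ in range(randomer_len))
    return fixed_5prime + randomer


# Hypothetical 42-base fixed portion (placeholder sequence only).
fixed = "ACGT" * 10 + "AC"
oligo = make_capture_oligo(fixed, randomer_len=8, rng=random.Random(0))

# With four bases per position, an 8-mer has 4**8 possible sequences.
possible_8mers = len(BASES) ** 8

print(f"{len(oligo)}-mer capture oligo with 8-mer randomer {oligo[-8:]}")
print(f"possible 8-mer randomers: {possible_8mers}")
```

Here 42 fixed bases plus 8 random bases give the 50-mer of the example, and an 8-base randomer has 4^8 = 65,536 possible sequences.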

Claims

1. A method of nucleic acid analysis, the method comprising: randomly fragmenting nucleic acid to yield a plurality of fragments comprising random breakpoints; annealing a 3' end of a capture oligo to a binding site of a member of the plurality of fragments, wherein the 3' end comprises a random base sequence; extending the annealed capture oligo to make a copy comprising: a 5' identifier copied from the binding site, and a 3' identifier copied from an end of the fragment; and amplifying the copy to make amplicons that include copies of the 5' and 3' identifiers, wherein the 5' and 3' identifiers uniquely associate the amplicons with the one fragment.
2. The method of claim 1, further comprising amplifying the plurality of fragments to yield a plurality of sets of amplicons, wherein each set of amplicons shares a first and second identifier copied from a corresponding fragment.
3. The method of claim 2, further comprising sequencing the plurality of sets of amplicons to yield sequence reads and deduplicating the sequence reads based on the first and second identifiers to provide a set of unique sequence reads.
4. The method of claim 3, further comprising mapping the unique sequence reads to a reference to identify a gene for each fragment and providing an expression level of each identified gene based on a number of fragments mapped to that gene.
5. The method of claim 1, further comprising obtaining the nucleic acids from a single cell, sequencing the plurality of fragments to obtain sequence reads comprising first and second identifiers, deduplicating and reference-mapping the reads to identify genes, and providing gene expression levels from the identified genes and counts of the deduplicated reads.
6. The method of claim 1, wherein the capture oligo is one of a plurality of oligos attached to a bead, wherein each of the plurality of oligos has a copy of a bead-identifying barcode and a stretch of random nucleobases at a 3' end.
7. The method of claim 6, wherein the bead is one of a plurality of beads, and the method includes encapsulating the beads alone in aqueous compartments with sample polynucleotides.
8. The method of claim 7, wherein the beads of the plurality of beads are substantially uniform in size and mass, and wherein the compartments are monodisperse droplets, and the encapsulating includes providing a mixture comprising the beads, an aqueous first liquid, and an immiscible second liquid and shearing the mixture to generate the monodisperse droplets simultaneously.
9. The method of claim 7, wherein at least one of the aqueous compartments further comprises a single cell, wherein the sample polynucleotides are RNA transcripts in the single cell, and the method includes capturing, amplifying, and sequencing the RNA transcripts to yield sequence reads, wherein each sequence read includes first and second identifier sequences with information that (i) uniquely associates that read with one transcript, and (ii) is obtained from that transcript.
10. The method of claim 9, further comprising counting to obtain counts of unique or unduplicated reads and identifying a gene of origin for each read and providing gene expression levels for the single cell from the identified genes and read counts.
11. A method for sequencing and counting nucleic acids, the method comprising: making copies of nucleic acid molecules from random start sites; sequencing the copies to obtain sequence reads; deduplicating the sequence reads to yield only unique reads; and mapping the unique reads to genes and, for each of the genes, providing a count of mapped unique reads.
12. The method of claim 11, wherein the nucleic acid molecules are RNA transcripts from a single cell.
13. The method of claim 12, further comprising isolating the single cell in a compartment and releasing the RNA transcripts from the cell.
14. The method of claim 11, wherein the method includes copying the nucleic acid start sites using primers comprising 3' random oligomers, thereby providing the copies with the random start sites.
15. The method of claim 11, wherein the random start sites provide each of the copies of the nucleic acid molecules with a unique sequence.
16. The method of claim 15, wherein the unique sequence in each of the copies is copied from a segment of a transcript transcribed by an organism.
17. The method of claim 11, further comprising fragmenting the nucleic acid molecules at random fragmentation sites, prior to making the copies.
18. The method of claim 17, wherein the nucleic acid molecules are RNA and the fragmenting step includes incubating the RNA in the presence of Mg2+.
19. The method of claim 17, wherein each of the copies includes a first random sequence at a 5' end adjacent one of the random start sites and a second random sequence at a 3' end adjacent one of the random fragmentation sites.
20. The method of claim 11, wherein the sequencing step includes amplifying the copies to make amplicons and sequencing the amplicons.
21. The method of claim 11, wherein a 5' end of each of the copies includes a 5' sequence adjacent one of the random start sites, wherein the 5' sequence is unique to one of the nucleic acid molecules.
22. The method of claim 11, wherein the nucleic acid molecules comprise mRNA transcripts from a single cell, and the method includes using the count of mapped unique reads as a measure of expression levels of the mRNA transcripts from the single cell.
23. The method of claim 11, further comprising, prior to the deduplicating step, saving the sequence reads in memory, coupled to at least one processor in a computer system, as a FASTA or FASTQ file, wherein the deduplicating and mapping are performed by the computer system.
24. The method of claim 23, further comprising performing the recited steps for a plurality of cells to generate, for each cell of the plurality of cells, a transcriptome profile based on, for that cell, the count of mapped unique reads.
25. The method of claim 24, further comprising, by the computer system: generating a uniform manifold approximation and projection (UMAP) and creating from the UMAP a 2D plot showing the plurality of cells clustered by properties.
26. A method of preparing a sample for sequencing and counting molecules therein, the method comprising: providing a sample comprising a plurality of sample polynucleotides; and generating a plurality of copies of the sample polynucleotides, wherein a copy of the plurality of copies comprises
(i) a sample sequence copied from a sample polynucleotide of the plurality of sample polynucleotides, and
(ii) a tag sequence copied from the sample polynucleotide, the tag sequence distinguishing said sample polynucleotide from other sample polynucleotides from the sample.
27. The method of claim 26, wherein the generating step comprises annealing, to the sample polynucleotide, a primer having a random sequence of bases at a 3' end and extending the primer to generate the tag sequence.
28. The method of claim 26, wherein the plurality of sample polynucleotides comprise RNA and each of the plurality of copies of the sample polynucleotides comprises a random tag sequence copied from the RNA.
29. The method of claim 26, further comprising amplifying the plurality of copies of the sample polynucleotides to generate amplicons that comprise sequencing adaptors.
30. The method of claim 29, further comprising: sequencing the amplicons to generate sequence reads that include sample sequence information and tag sequence information; counting unduplicated reads among the sequence reads; and providing quantities of the plurality of sample polynucleotides in the sample from counts of the unduplicated reads.
31. The method of claim 30, further comprising associating the sequence reads or the unduplicated reads with genes by comparison to genetic reference information and providing expression levels for the genes from the quantities of the plurality of sample polynucleotides.
32. The method of claim 26, wherein the sample comprises a cell isolated in a compartment and the method includes releasing the plurality of sample polynucleotides from the cell into the compartment.
33. The method of claim 26, wherein the generating step comprises annealing, to the sample polynucleotide, a primer of a plurality of primers having at least one barcode and a random priming sequence.
34. The method of claim 33, wherein the plurality of primers are attached to a solid support.
35. The method of claim 24, wherein the solid support comprises a hydrogel bead.
36. The method of claim 34, wherein the plurality of primers comprise sample barcodes that are specific to the solid support.
37. The method of claim 26, further comprising fragmenting the plurality of sample polynucleotides at a plurality of random break sites, and further wherein the copy comprises the tag sequence copied from the sample polynucleotide at a 5' end and a second tag sequence at a 3' end, the second tag sequence copied from the sample polynucleotide from a segment adjacent one break site of the plurality of random break sites.
38. The method of claim 37, further comprising amplifying the plurality of copies of the sample polynucleotides to generate amplicons that each include first and second identifier segments copied from the first and second tag sequence, respectively; sequencing the amplicons to generate sequence reads; and assigning the sequence reads to specific ones of the plurality of sample polynucleotides using data in the sequence reads from the first and second identifier segments.
39. The method of claim 26, further comprising depleting the sample of unwanted nucleic acid prior to the generating step.
40. The method of claim 39, wherein the plurality of sample polynucleotides comprises mRNA or pre-mRNA and the unwanted nucleic acid includes one or more of ribosomal RNA, globin transcripts, mitochondrial RNA, or non-coding RNA.
41. A method of nucleic acid analysis, the method comprising: annealing a 3' end of a capture oligo to a binding site of a member of a plurality of sample nucleic acids, wherein the 3' end comprises a random base sequence; extending the annealed capture oligo to make a copy comprising: a 5' identifier copied from the binding site, and a 3' identifier copied from an end of the fragment; randomly fragmenting the nucleic acids to yield a plurality of fragments comprising random breakpoints; and amplifying the copy to make amplicons that include copies of the 5' and 3' identifiers, wherein the 5' and 3' identifiers uniquely associate the amplicons with the one fragment.
PCT/US2025/016475 2024-02-23 2025-02-19 Methods for sequencing and counting nucleic acids Pending WO2025178954A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US202463557130P 2024-02-23 2024-02-23
US63/557,130 2024-02-23

Publications (1)

Publication Number Publication Date
WO2025178954A1 true WO2025178954A1 (en) 2025-08-28

Family

ID=94974271

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2025/016475 Pending WO2025178954A1 (en) 2024-02-23 2025-02-19 Methods for sequencing and counting nucleic acids

Country Status (1)

Country Link
WO (1) WO2025178954A1 (en)

Citations (33)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6210891B1 (en) 1996-09-27 2001-04-03 Pyrosequencing Ab Method of sequencing DNA
US6306597B1 (en) 1995-04-17 2001-10-23 Lynx Therapeutics, Inc. DNA sequencing by parallel oligonucleotide extensions
US6828100B1 (en) 1999-01-22 2004-12-07 Biotage Ab Method of DNA sequencing
US6833246B2 (en) 1999-09-29 2004-12-21 Solexa, Ltd. Polynucleotide sequencing
US6911345B2 (en) 1999-06-28 2005-06-28 California Institute Of Technology Methods and apparatus for analyzing polynucleotide sequences
US20060024681A1 (en) 2003-10-31 2006-02-02 Agencourt Bioscience Corporation Methods for producing a paired tag from a nucleic acid sequence and methods of use thereof
US20060292611A1 (en) 2005-06-06 2006-12-28 Jan Berka Paired end sequencing
US20070114362A1 (en) 2005-11-23 2007-05-24 Illumina, Inc. Confocal imaging methods and apparatus
US7232656B2 (en) 1998-07-30 2007-06-19 Solexa Ltd. Arrayed biomolecules and their use in sequencing
US7598035B2 (en) 1998-02-23 2009-10-06 Solexa, Inc. Method and compositions for ordering restriction fragments
US20100041046A1 (en) 2008-08-15 2010-02-18 University Of Washington Method and apparatus for the discretization and manipulation of sample volumes
US7835871B2 (en) 2007-01-26 2010-11-16 Illumina, Inc. Nucleic acid sequencing system and method
US7960120B2 (en) 2006-10-06 2011-06-14 Illumina Cambridge Ltd. Method for pair-wise sequencing a plurality of double stranded target polynucleotides
US8209130B1 (en) 2012-04-04 2012-06-26 Good Start Genetics, Inc. Sequence assembly
US9005891B2 (en) 2009-11-10 2015-04-14 Genomic Health, Inc. Methods for depleting RNA from nucleic acid samples
WO2018136248A1 (en) * 2017-01-18 2018-07-26 Illuminia, Inc. Methods and systems for generation and error-correction of unique molecular index sets with heterogeneous molecular lengths
WO2019139650A2 (en) 2017-09-29 2019-07-18 The Regents Of The University Of California Method of generating monodisperse emulsions
US10421992B2 (en) 2012-09-13 2019-09-24 Takara Bio Usa, Inc. Methods of depleting a target nucleic acid in a sample and kits for practicing the same
US10472666B2 (en) 2016-02-15 2019-11-12 Roche Sequencing Solutions, Inc. System and method for targeted depletion of nucleic acids
US20200269248A1 (en) 2007-02-06 2020-08-27 Brandeis University Manipulation of fluids and reactions in microfluidic systems
US20210178395A1 (en) 2007-04-19 2021-06-17 President And Fellows Of Harvard College Manipulation of fluids, fluid components and reactions in microfluidic systems
US20210214721A1 (en) 2020-01-13 2021-07-15 Fluent Biosciences Inc. Reverse transcription during template emulsification
US20210215591A1 (en) 2020-01-13 2021-07-15 Fluent Biosciences Inc. Devices for generating monodisperse droplets from a bulk liquid
US20210214792A1 (en) 2020-01-13 2021-07-15 Fluent Biosciences Inc. Methods and systems for single cell gene profiling
US11115853B2 (en) 2009-01-07 2021-09-07 Yamaha Corporation Wireless network system and wireless communication method
US20210340596A1 (en) 2018-09-28 2021-11-04 Fluent Biosciences Inc. Target capture and barcoding in monodisperse droplets
US20210381064A1 (en) 2020-01-13 2021-12-09 Fluent Biosciences Inc. Single cell sequencing
WO2022098726A1 (en) * 2020-11-03 2022-05-12 Fluent Biosciences Inc. Systems and methods for making sequencing libraries
US20220145359A1 (en) 2019-02-12 2022-05-12 Jumpcode Genomics, Inc. Methods for targeted depletion of nucleic acids
US20230265528A1 (en) 2020-08-12 2023-08-24 Jumpcode Genomics, Inc. Methods for targeted depletion of nucleic acids
US20230279490A1 (en) 2022-03-01 2023-09-07 Watchmaker Genomics, Inc. Depletion probes
WO2024077114A1 (en) * 2022-10-06 2024-04-11 Fluent Biosciences Inc. Sequencing and quantitation
WO2024259274A2 (en) * 2023-06-16 2024-12-19 Illumina, Inc. Molecular deduplication analysis methods

Patent Citations (37)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6306597B1 (en) 1995-04-17 2001-10-23 Lynx Therapeutics, Inc. DNA sequencing by parallel oligonucleotide extensions
US6210891B1 (en) 1996-09-27 2001-04-03 Pyrosequencing Ab Method of sequencing DNA
US7598035B2 (en) 1998-02-23 2009-10-06 Solexa, Inc. Method and compositions for ordering restriction fragments
US7232656B2 (en) 1998-07-30 2007-06-19 Solexa Ltd. Arrayed biomolecules and their use in sequencing
US6828100B1 (en) 1999-01-22 2004-12-07 Biotage Ab Method of DNA sequencing
US6911345B2 (en) 1999-06-28 2005-06-28 California Institute Of Technology Methods and apparatus for analyzing polynucleotide sequences
US6833246B2 (en) 1999-09-29 2004-12-21 Solexa, Ltd. Polynucleotide sequencing
US20060024681A1 (en) 2003-10-31 2006-02-02 Agencourt Bioscience Corporation Methods for producing a paired tag from a nucleic acid sequence and methods of use thereof
US20060292611A1 (en) 2005-06-06 2006-12-28 Jan Berka Paired end sequencing
US20070114362A1 (en) 2005-11-23 2007-05-24 Illumina, Inc. Confocal imaging methods and apparatus
US7960120B2 (en) 2006-10-06 2011-06-14 Illumina Cambridge Ltd. Method for pair-wise sequencing a plurality of double stranded target polynucleotides
US7835871B2 (en) 2007-01-26 2010-11-16 Illumina, Inc. Nucleic acid sequencing system and method
US20110009278A1 (en) 2007-01-26 2011-01-13 Illumina, Inc. Nucleic acid sequencing system and method
US20200269248A1 (en) 2007-02-06 2020-08-27 Brandeis University Manipulation of fluids and reactions in microfluidic systems
US20210178395A1 (en) 2007-04-19 2021-06-17 President And Fellows Of Harvard College Manipulation of fluids, fluid components and reactions in microfluidic systems
US20100041046A1 (en) 2008-08-15 2010-02-18 University Of Washington Method and apparatus for the discretization and manipulation of sample volumes
US11115853B2 (en) 2009-01-07 2021-09-07 Yamaha Corporation Wireless network system and wireless communication method
US11149297B2 (en) 2009-11-10 2021-10-19 Genomic Health, Inc. Methods for depleting RNA from nucleic acid samples
US9005891B2 (en) 2009-11-10 2015-04-14 Genomic Health, Inc. Methods for depleting RNA from nucleic acid samples
US8209130B1 (en) 2012-04-04 2012-06-26 Good Start Genetics, Inc. Sequence assembly
US10421992B2 (en) 2012-09-13 2019-09-24 Takara Bio Usa, Inc. Methods of depleting a target nucleic acid in a sample and kits for practicing the same
US10472666B2 (en) 2016-02-15 2019-11-12 Roche Sequencing Solutions, Inc. System and method for targeted depletion of nucleic acids
WO2018136248A1 (en) * 2017-01-18 2018-07-26 Illuminia, Inc. Methods and systems for generation and error-correction of unique molecular index sets with heterogeneous molecular lengths
US20200261879A1 (en) 2017-09-29 2020-08-20 The Regents Of The University Of California Method of generating monodisperse emulsions
WO2019139650A2 (en) 2017-09-29 2019-07-18 The Regents Of The University Of California Method of generating monodisperse emulsions
US20210340596A1 (en) 2018-09-28 2021-11-04 Fluent Biosciences Inc. Target capture and barcoding in monodisperse droplets
US20220145359A1 (en) 2019-02-12 2022-05-12 Jumpcode Genomics, Inc. Methods for targeted depletion of nucleic acids
US20210381064A1 (en) 2020-01-13 2021-12-09 Fluent Biosciences Inc. Single cell sequencing
US20210214792A1 (en) 2020-01-13 2021-07-15 Fluent Biosciences Inc. Methods and systems for single cell gene profiling
US20210214721A1 (en) 2020-01-13 2021-07-15 Fluent Biosciences Inc. Reverse transcription during template emulsification
US20210215591A1 (en) 2020-01-13 2021-07-15 Fluent Biosciences Inc. Devices for generating monodisperse droplets from a bulk liquid
US11773452B2 (en) 2020-01-13 2023-10-03 Fluent Biosciences Inc. Single cell sequencing
US20230265528A1 (en) 2020-08-12 2023-08-24 Jumpcode Genomics, Inc. Methods for targeted depletion of nucleic acids
WO2022098726A1 (en) * 2020-11-03 2022-05-12 Fluent Biosciences Inc. Systems and methods for making sequencing libraries
US20230279490A1 (en) 2022-03-01 2023-09-07 Watchmaker Genomics, Inc. Depletion probes
WO2024077114A1 (en) * 2022-10-06 2024-04-11 Fluent Biosciences Inc. Sequencing and quantitation
WO2024259274A2 (en) * 2023-06-16 2024-12-19 Illumina, Inc. Molecular deduplication analysis methods

Non-Patent Citations (5)

* Cited by examiner, † Cited by third party
Title
BOLGER: "Trimmomatic: a flexible trimmer for Illumina sequence data", BIOINFORMATICS, vol. 30, no. 15, 2014, pages 2114 - 2120, XP055862121, DOI: 10.1093/bioinformatics/btu170
BOWMAN: "Multiplexed Illumina sequencing libraries from picogram quantities of DNA", BMC GENOMICS, vol. 14, 2013, pages 466, XP021156155, DOI: 10.1186/1471-2164-14-466
CONTE MATILDE I ET AL: "Opportunities and tradeoffs in single-cell transcriptomic technologies", TRENDS IN GENETICS, ELSEVIER SCIENCE PUBLISHERS B.V. AMSTERDAM, NL, vol. 40, no. 1, 10 November 2023 (2023-11-10), pages 83 - 93, XP087448218, ISSN: 0168-9525, Retrieved from the Internet <URL:10.1016/J.TIG.2023.10.003> [retrieved on 20231110], DOI: 10.1016/J.TIG.2023.10.003 *
GALLION: "Preserving single cells in space and time for analytical assays", TRENDS ANAL CHEM, vol. 122, 2021, pages 115723
LIN: "RNA sequencing by direct tagmentation of RNA/DNA hybrids", PNAS, vol. 117, no. 6, 2020, pages 2886 - 2893, XP055866200, DOI: 10.1073/pnas.1919800117

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 25711913

Country of ref document: EP

Kind code of ref document: A1