WO2025019319A1 - Preparation of molecular libraries by linked adapters - Google Patents
Preparation of molecular libraries by linked adapters
- Publication number
- WO2025019319A1 PCT/US2024/037811
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- nucleic acid
- linked
- acid molecule
- primer
- adapters
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12N—MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
- C12N15/00—Mutation or genetic engineering; DNA or RNA concerning genetic engineering, vectors, e.g. plasmids, or their isolation, preparation or purification; Use of hosts therefor
- C12N15/09—Recombinant DNA-technology
- C12N15/10—Processes for the isolation, preparation or purification of DNA or RNA
- C12N15/1034—Isolating an individual clone by screening libraries
- C12N15/1093—General methods of preparing gene libraries, not provided for in other subgroups
Definitions
- This disclosure relates to methods and systems for profiling single cells.
- the present invention provides methods, primers, and modified substrates that allow for the physical pairing of molecules together from a single source, for example a single cell. This allows for the sequencing of linked molecules simultaneously without needing to label each molecule as originating from that single source. This greatly reduces the data burden produced during sample preparation for later analysis, reducing the expense and time needed for single-cell analysis.
- aspects of the invention provide methods for preparing a molecular library.
- the methods comprise preparing a first mixture by segregating together at least two nucleic acid molecules from a single source, such as a single cell.
- the first mixture is then contacted with a plurality of linked adapters, each linked adapter comprising a first primer and a second primer that are physically linked.
- a first nucleic acid molecule from the single source is associated with the first primer of a linked adapter and a second nucleic acid molecule from the single source is associated with the second primer of the same linked adapter.
- Complementary strands are then generated from each of the first nucleic acid molecule and the second nucleic acid molecule without breaking the physical link of the linked adapters.
- aspects of the invention may further comprise the step of amplifying the first and second nucleic acid molecules after the associating step and/or after the step of generating complementary strands to each of the first and second nucleic acid molecules.
- any method for isolating molecules within a sample may be used together with the invention.
- the step of preparing the first mixture may comprise compartmentalizing the first mixture, for example through the use of droplet emulsification or microfluidics.
- the step of compartmentalizing the first mixture may comprise compartmentalizing a single cell from the mixture, thereby segregating the single cell from the mixture (e.g. from other single cells in the mixture) and thereby segregating the nucleic acid molecules from the single cell from the mixture.
- Linked adapters of the invention may also be physically linked by any method including covalent cross-linking and non-covalent linkage.
- the primers of the linked adapters may be physically linked by ligation, click chemistry, protein linkers, polymer linkers, an amplification reaction, or oligonucleotide chemical synthesis.
- aspects of the invention may comprise the step of breaking compartments formed in the preparing step into a single container, thereby pooling the contents of the compartment (e.g. microfluidic droplets) into a single container.
- the step of breaking compartments may be prior to the step of generating sequence reads.
- any nucleic acid molecule may be analyzed by methods of the invention.
- the at least two nucleic acid molecules may comprise nucleic acid molecules encoding a cell receptor protein.
- the nucleic acid molecules may encode a B-cell receptor or a T-cell receptor (TCR).
- the linked primers of the invention may be used to link first and second nucleic acid molecules that encode related molecules, such as subunits of a protein.
- the first nucleic acid molecule paired to the first primer of the linked adapter may encode a TCRα and the second nucleic acid molecule paired to the second primer of the linked adapter may encode a TCRβ of a TCR, thereby linking a nucleic acid molecule encoding each protein chain of a TCR heterodimer.
- the same principle may be used to link nucleic acid molecules encoding any related molecules.
- Nucleic acid molecules may also be associated with primers of the linked adapters by any method.
- associating a first and second nucleic acid molecule to the first and second primer of the linked adapter may comprise annealing the nucleic acid molecules to the primers.
- the associating step may comprise annealing the first and second nucleic acid molecules to the first and second primers.
- methods of the invention allow for the simultaneous sequencing of linked molecules without labeling each individual molecule as originating from that single source. Accordingly, methods of the invention may be completed with the proviso that the steps are completed without tagging the nucleic acid molecules with a cell specific barcode. Alternatively, the at least two nucleic acid molecules may also be tagged with a cell specific barcode.
- Methods for analyzing nucleic acid molecules may comprise sequencing nucleic acid molecules. Accordingly, aspects of the invention further comprise the step of generating sequence reads for the first nucleic acid molecule and the second nucleic acid molecule.
- the step of synthesizing complementary strands for each nucleic acid molecule may comprise seeding the first nucleic acid molecule and second nucleic acid molecule on the same cluster on a flow cell without breaking the physical link of the linked adapters.
- the linked nucleic acid molecules may be seeded as a single cluster on a flow cell, resulting in a cluster with the two linked nucleic acid molecules seeded together. This is contrasted to conventional methods for flow cell sequencing that require “pure” clusters that comprise only a single nucleic acid sequence.
- Generating sequence reads can be done by any known method, including using next generation sequencing and long read sequencing.
- adapters may be paired to a target DNA molecule to sequence each strand of the DNA molecule multiple times in one continuous long read.
- Several DNA fragments may be concatenated to produce a longer DNA amplicon for sequencing, which may be attached to adapters.
- individual genes may be amplified and the amplicons concatenated into a single fragment for sequencing.
- Seeding the first nucleic acid molecule and second nucleic acid molecule on the same cluster on the flow cell may comprise coupling each of the first nucleic acid molecule and second nucleic acid molecule to a substrate of the flow cell via binding primers that are physically bound to the substrate and coupled to the end of each first and second nucleic acid molecules by an end that is not linked by the physical link of the linked adapter.
- Binding primers are typically complementary to a sequence at a point on the substrate of the flow cell, for example on the surface of the substrate or within a well on the substrate.
- the binding primer added to each nucleic acid molecule may also comprise binding sites and adapters for sequencing and downstream processing.
- seeding the first nucleic acid molecule and second nucleic acid molecule on the same cluster on the flow cell may comprise coupling each of the first nucleic acid molecule and second nucleic acid molecule to a substrate of the flow cell via a binding primer physically bound to the substrate and coupled to the end of each nucleic acid molecule that is linked by the physical link of the linked adapter.
- aspects of the invention further provide a modified substrate comprising a plurality of discrete locations, each location comprising two nucleic acid molecules.
- Each of the two nucleic acid molecules is coupled at one end to the substrate via a primer that is physically bound to the substrate, and the two molecules are coupled to one another at their other ends by a physical link.
- Each of the two nucleic acid molecules may be distinct and different.
- linked adapters of the invention may be physically linked by any method including covalent cross-linking and non-covalent linkage.
- the primers of the linked adapters may be physically linked by ligation, click chemistry, an amplification reaction, or oligonucleotide chemical synthesis.
- the distinct nucleic acid molecules do not need to be and are not tagged with a cell specific barcode, although in some embodiments they may also be tagged with a barcode, such as a cell specific barcode or a unique molecule identifier (UMI).
- the modified substrate may be part of a flow cell.
- each discrete location is a cluster on the flow cell.
- the flow cell may be configured to allow for the synthesis of complementary strands to each nucleic acid without breaking the physical link between the nucleic acids at a discrete location.
- FIG. 1 is a diagram of mRNA and cDNA synthesis from cell samples.
- FIG. 2 is a diagram of an exemplary method of the invention for TCR analysis.
- FIG. 2 is a diagram of TCR capture and sequencing using adapters of the invention.
- FIG. 3 is an image of gel analysis of two genes.
- FIG. 4 is an image of gel analysis of two genes.
- FIG. 5 is a plot of sequence reads from two genes.
- FIG. 6 is a table of sequence reads from two genes.
- FIG. 7 is a diagram of linked adapter capture of the invention.
- FIG. 8 is a table of sample reads aligned to genes using adapters of the invention.
- FIG. 9 is a graph of the identification of T-cell receptor β (TRBV) variable regions.
- FIG. 10 is a graph of the identification of T-cell receptor α (TRAV) variable regions.
- FIG. 11 is a table identifying TCR variable (V), diversity (D), and joining (J) fragments using adapters of the invention.
- FIG. 12 is a heatmap of gene combinations detected by the invention.
- FIG. 13 is a diagram of multi-leaf adapter synthesis using click chemistry.
- FIG. 14 is a diagram of TCR capture and sequencing using 3-leaf antibody tagged adapters of the invention.
- FIG. 15 is an image of gel analysis of a 3-leaf clover library.
- FIG. 16 is a heatmap of TRAV-TRBV gene combinations.
- the present invention provides methods, primers, and modified substrates that allow for the physical pairing of molecules together from a single source, for example a single cell. This allows for the simultaneous sequencing of linked molecules without labeling each individual molecule as originating from that single source. This greatly reduces the data burden produced during sample preparation for later analysis, reducing the expense and time needed for single-cell analysis while improving the cell capture rate and throughput of single-cell analysis.
- the present invention may be applied to improve deep mining of immune receptor repertoires, direct cloning of sample regeneration, and the analysis of novel targets for therapeutics and novel biomarkers for patient stratification.
- An exemplary method for preparing a molecular library comprises preparing a first mixture by segregating together (for example, via microfluidic droplet formation) at least two nucleic acid molecules from a single source; contacting the first mixture with a plurality of linked adapters, each linked adapter comprising a first and second primer that are physically linked; associating a first nucleic acid molecule from the single source to the first primer of a linked adapter and a second nucleic acid molecule from the single source to the second primer of the linked adapter; and synthesizing a complementary strand to the first nucleic acid molecule and a complementary strand to the second nucleic acid molecule without breaking the physical link of the linked adapters.
- FIG. 1 is a diagram of mRNA and cDNA synthesis from cell samples.
- Magnetic beads in a lysis buffer and a cell sample e.g. comprising cells 101, 105, 109, and their components
- Oil in water suspensions are used to create droplets capturing the magnetic beads together with cells.
- Capture sequences on the surface of the beads are used to capture mRNA from the cells (e.g. cell 109 and corresponding mRNA as shown) segregated into the droplets. Droplets are broken and the contents pooled for bulk cDNA analysis with mRNA bound to the magnetic beads.
- FIG. 2 diagrams an exemplary method of the invention.
- Linked adapters of the invention are segregated under microfluidic control together with the magnetic beads carrying mRNA from cells now bound to the magnetic beads, for example mRNA encoding TCR sequences. It can be appreciated that cells may be directly introduced together with lytic agents and the linked adapters in the absence of magnetic beads.
- a TCRα and TCRβ from a TCR are each linked to a respective primer on a single linked adapter within the droplet.
- a binding sequence complementary to a position on the solid support of a flow cell is added to the ends of each nucleic acid molecule that are not associated with the linked adapter. Droplets are then broken and the contents of each droplet are pooled together for sequencing on a flow cell. By the binding sequence, the ends of each nucleic acid molecule that are not associated with the linked adapter form a binding point for the nucleic acid molecules to the flow cell.
- each cluster represents one linked adapter which represents molecules from one cell.
- a first read is then generated for the TCRα and a second read is generated for the TCRβ.
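The cluster-level pairing described for FIG. 2 can be illustrated with a minimal sketch. It assumes a hypothetical record format of `(cluster_id, read_number, gene_call)` tuples, which is not defined in the disclosure, and simply groups read 1 and read 2 by the cluster they came from, with no cell specific barcode involved. The gene names are taken from the figures; the cluster identifiers are invented for illustration.

```python
from collections import defaultdict


def pair_reads_by_cluster(reads):
    """Return {cluster_id: (read1_call, read2_call)} for clusters with both reads.

    `reads` is an iterable of (cluster_id, read_number, gene_call) tuples.
    This record layout is a hypothetical one chosen for the example.
    """
    by_cluster = defaultdict(dict)
    for cluster_id, read_number, gene_call in reads:
        by_cluster[cluster_id][read_number] = gene_call

    pairs = {}
    for cluster_id, calls in by_cluster.items():
        # Keep only clusters where both arms of the linked adapter gave a read.
        if 1 in calls and 2 in calls:
            pairs[cluster_id] = (calls[1], calls[2])
    return pairs


# Illustrative input only; cluster names are placeholders.
example_reads = [
    ("cluster_A", 1, "TRAV13-2"),
    ("cluster_A", 2, "TRBV2"),
    ("cluster_B", 1, "TRAV13-2"),
    ("cluster_B", 2, "TRBV19"),
    ("cluster_C", 1, "TRAV13-2"),  # read 2 missing, so no pair is reported
]

if __name__ == "__main__":
    for cluster, pair in pair_reads_by_cluster(example_reads).items():
        print(cluster, pair)
```

Because the pairing key is the cluster itself, the sketch needs no per-molecule label, which mirrors the point that each cluster represents one linked adapter.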
- FIG. 3 is an image of gel analysis of two genes. As shown, gene 1 and gene 2 are each captured in the multiplexed library using linked adapters of the invention.
- FIG. 4 is an image of gel analysis of two genes using forward and reverse primers. As shown, the forward primer (read 1) captures gene 1 located on leaf 1 (arm 1) of the linked adapter and the reverse primer (read 2) captures gene 2 located on leaf 2 (arm 2) of the linked adapter.
- FIG. 5 is a plot of sequence reads from the two genes.
- linked adapters of the invention (identified specifically as CLOVER adapters, a term used throughout the disclosure in reference to linked adapters of the invention) capture both gene 1 and gene 2.
- FIG. 6 is a table of sequence reads from the two genes. Clover adapters capture both gene 1 and gene 2 in 84.05% of reads, with only 8.22% of reads unmapped.
- FIG. 7 is a diagram of TCR capture and sequencing using adapters of the invention. A first TCRα and TCRβ from a TCR are each linked to a respective primer on a single linked adapter. Each of the TCRα and TCRβ provides a read for sequencing as separate “leafs” from the linked adapter.
- FIG. 8 is a table of sample reads aligned to genes using adapters of the invention. From 152,336 reads from the sample, 142,372 reads are aligned to a TCRα bound to leaf (arm) 1 of a linked adapter. 100,767 reads are aligned to the TCRβ, leaf (arm) 2 of the linked adapter.
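The read counts quoted for FIG. 8 can be turned into alignment fractions with a few lines of arithmetic. The sketch below uses only the three counts stated in the text; the function and variable names are illustrative.

```python
def aligned_fraction(aligned_reads, total_reads):
    """Return the fraction of reads that aligned to a given leaf (arm)."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return aligned_reads / total_reads


# Counts as stated in the description of FIG. 8.
TOTAL_READS = 152_336
LEAF1_ALIGNED = 142_372  # reads aligned on leaf (arm) 1
LEAF2_ALIGNED = 100_767  # reads aligned on leaf (arm) 2

leaf1_fraction = aligned_fraction(LEAF1_ALIGNED, TOTAL_READS)
leaf2_fraction = aligned_fraction(LEAF2_ALIGNED, TOTAL_READS)

if __name__ == "__main__":
    print(f"leaf 1: {leaf1_fraction:.2%}")
    print(f"leaf 2: {leaf2_fraction:.2%}")
```

On these counts, roughly 93% of reads align on leaf 1 and roughly 66% on leaf 2.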
- FIG. 9 is a graph of the identification of T-cell receptor β (TRBV) variable regions, identifying TRBV2 and TRBV19.
- FIG. 10 is a graph of the identification of T-cell receptor α (TRAV) variable region gene usage, identifying TRAV13-2.
- FIG. 11 is a table identifying TCR variable (V), diversity (D), and joining (J) fragments using adapters of the invention.
- FIG. 12 is a heatmap of TCR α and β variable regions detected using adapters of the invention.
- the linked adapters enable paired sequencing of full length TCR α and β chains, with the heat map showing identification of TRBV2 with TRAV13-2 and TRBV19 with TRAV13-2 from the sample.
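A heatmap of this kind is, in essence, a count of how often each TRAV/TRBV combination is observed among paired reads. The following is a minimal sketch of that tabulation, assuming the pairs are already available as `(TRAV, TRBV)` tuples. The counts in the example are invented for illustration and are not data from the disclosure.

```python
from collections import Counter


def count_combinations(pairs):
    """Count occurrences of each (TRAV, TRBV) combination."""
    return Counter(pairs)


def as_matrix(counts):
    """Lay the counts out as rows (TRAV) by columns (TRBV) for plotting."""
    travs = sorted({trav for trav, _ in counts})
    trbvs = sorted({trbv for _, trbv in counts})
    matrix = [[counts.get((trav, trbv), 0) for trbv in trbvs] for trav in travs]
    return travs, trbvs, matrix


# Hypothetical paired calls; gene names follow the figures, counts do not.
example_pairs = [
    ("TRAV13-2", "TRBV2"),
    ("TRAV13-2", "TRBV2"),
    ("TRAV13-2", "TRBV19"),
]

if __name__ == "__main__":
    rows, cols, matrix = as_matrix(count_combinations(example_pairs))
    print("columns:", cols)
    for row_label, row in zip(rows, matrix):
        print(row_label, row)
```

The resulting matrix can be passed to any plotting library to render the heatmap; the choice of library is left open here.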
- FIG. 13 is a diagram of multi-leaf (arm) adapter synthesis using click chemistry.
- a first leaf is bound to an azide and a second leaf bound to an alkyne.
- reaction conditions for example a copper catalyzed reaction
- a 5-member heteroatom ring is formed with leaf 1 and leaf 2.
- linked adapters of the invention may comprise 3-arms or 3-leafs, used interchangeably herein.
- FIG. 14 is a diagram showing TCR capture and sequencing using 3-leaf antibody tagged adapters of the invention. As shown, a first leaf in a linked adapter captures a TCRα, a second leaf captures the TCRβ, and a third leaf is represented by an antibody (Ab) tag.
- FIG. 15 is an image of a gel analysis of a 3-leaf clover library.
- FIG. 16 is a heatmap of TCR α and β variable regions detected using adapters of the invention.
- Linked adapters of the invention may be physically linked by any method including covalent linking and non-covalent linkage.
- the primers of the linked adapters may be physically linked by ligation, click chemistry, an amplification reaction, or oligonucleotide chemical synthesis.
- Linked adapters that may be used in the present invention may be as described in U.S. Patent No. 10,982,278.
- polymer, glycan (dextran), or protein linkers may form a scaffold onto which primers are grafted to form the linked adapters.
- the linked adapters may comprise two or more separate oligonucleotide arms that are linked to one another at the 5' end of each arm.
- the linked adapter has two or more free 3' ends, and each arm comprises at least a hybridization domain capable of binding to a complementary sequence on a target oligonucleotide.
- the linked adapters may be designed to bind to RNA, DNA, or a combination thereof. Amplification reactions utilizing the linked adapters then result in a single amplicon or molecule that incorporates the sequence of both target molecules. This single amplicon or molecule may then be used in further processing steps such as, but not limited to, sequencing.
- covalent linkage of polynucleotides is achieved using 5'-5' linked oligonucleotides, these linked adapters also being referred to as “chain-seq” or “crab-seq” oligonucleotides.
- a 5'-5' linked oligonucleotide comprises two or more “arms,” each comprising an oligonucleotide sequence that is linked at the 5' end via a covalent or non-covalent biocompatible linkage.
- the chain-seq oligonucleotide comprises two or more free 3' ends.
- Each arm of the 5'-5' linked oligonucleotide may comprise the same oligonucleotide sequence, or each arm may comprise a different oligonucleotide sequence.
- the oligonucleotide sequences of each arm may be of the same or different lengths. In certain embodiments, an individual arm may be from about 8 to about 1000 nucleotides in length.
- an individual arm may be 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 55, 60, 65, 70, 75, 80, 85, 90, 95, 100, 150, 200, 250, 300, 350, 400, 450, 500, 550, 600, 650, 700, 750, 800, 850, 900, 950, 1000 nucleotides, or the like.
- Linking together individual nucleic acid molecules as described herein may result in a nucleic acid construct of any size, containing any number of joined nucleic acid molecules appropriate in accordance with the invention.
- Each arm may be single stranded, double stranded, or a combination thereof.
- the oligonucleotide may comprise DNA, RNA, or a combination thereof.
- the arms may also comprise, in full or in part, nucleotide analogs such as peptide nucleic acids, morpholino and locked nucleic acids, glycol nucleic acid, and threose nucleic acids.
- a portion of each arm of the 5'-5' linked oligonucleotide may comprise a binding domain comprising a nucleic acid sequence that is complementary to and hybridizes with a target sequence.
- a target sequence may be a naturally occurring nucleic acid sequence or may be artificially introduced into a target polynucleotide as appropriate depending on the application.
- a 5'-5' linked oligonucleotide may comprise at least two, at least three, at least four, at least five, at least six, at least seven, at least eight arms, or any number of arms as appropriate for the number of target oligonucleotides to be linked via the methods disclosed herein.
- the arms of an oligonucleotide as described herein may be connected via a common linkage.
- each arm may recognize the same target sequence.
- each arm, or a subset of arms, may recognize a different target sequence. For example, given a four-arm 5'-5' linked oligonucleotide, the arms may together recognize up to four different target sequences. Alternatively, two arms could hybridize to a first target sequence and the remaining two arms could hybridize to a second target sequence. Other similar variations are contemplated and are within the scope of this invention.
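The arm and target arrangements described above can be modeled with a small data structure. This is a minimal sketch under stated assumptions: the class names `Arm` and `LinkedOligo`, the placeholder sequences, and the target labels are all hypothetical, and the 8 to 1000 nucleotide bound reflects only the range given for certain embodiments.

```python
from dataclasses import dataclass
from typing import Tuple

# Arm length bounds from the "about 8 to about 1000 nucleotides" embodiment.
MIN_ARM_LENGTH = 8
MAX_ARM_LENGTH = 1000


@dataclass(frozen=True)
class Arm:
    sequence: str  # written 5'->3'; the 5' end sits at the common linkage
    target: str    # label of the target sequence this arm recognizes


@dataclass(frozen=True)
class LinkedOligo:
    arms: Tuple[Arm, ...]

    def __post_init__(self):
        if len(self.arms) < 2:
            raise ValueError("a linked oligonucleotide needs at least two arms")
        for arm in self.arms:
            if not MIN_ARM_LENGTH <= len(arm.sequence) <= MAX_ARM_LENGTH:
                raise ValueError(f"arm length {len(arm.sequence)} is out of range")

    def free_three_prime_ends(self):
        """Each arm is joined at its 5' end, so it contributes one free 3' end."""
        return len(self.arms)

    def distinct_targets(self):
        return sorted({arm.target for arm in self.arms})


# Four-arm example: two arms per target (placeholder sequences).
four_arm = LinkedOligo(arms=(
    Arm("ACGTACGTAC", "target_1"),
    Arm("ACGTACGTAC", "target_1"),
    Arm("TTGCATTGCA", "target_2"),
    Arm("TTGCATTGCA", "target_2"),
))

if __name__ == "__main__":
    print(four_arm.free_three_prime_ends(), four_arm.distinct_targets())
```

Giving each of the four arms its own target label would model the other case mentioned in the text, in which up to four different target sequences are recognized.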
- each arm may be linked to each other using means known in the art for linking nucleic acids to each other.
- nucleic acids are linked together via a biocompatible reaction.
- a biocompatible reaction may comprise use of “click chemistry” (see, e.g., Rostovtsev et al., Angew Chem Int Ed 41:2596-2599, 2002; Himo et al., J Am Chem Soc 127:210-216, 2005; Boren et al., J Am Chem Soc 130:8923-8930, 2008).
- An example of a click chemistry reaction is the Huisgen 1,3-dipolar cycloaddition of alkynes to azides to form 1,4-disubstituted-1,2,3-triazoles.
- the copper(I)-catalyzed reaction is mild and very efficient, requiring no protecting groups, and requiring no purification, in many cases.
- the azide (AZ) and alkyne (AK) functional groups are largely inert towards biological molecules and aqueous environments, which allows the use of the Huisgen 1,3-dipolar cycloaddition in target-guided synthesis and activity-based protein profiling.
- a chain oligo is formed by linking the 5' end of one nucleic acid strand that includes an azide group to the 5' end of another nucleic acid strand that includes an alkyne group.
- Other exemplary biocompatible reactions include a copper(I)-catalyzed azide-alkyne cycloaddition (CuAAC) reaction, a copper-free strain-promoted azide-alkyne cycloaddition (SPAAC) reaction, and a thiol-ene reaction.
- each arm of the 5'-5' linked oligonucleotide may be connected indirectly via a binding or scaffolding molecule.
- indirect means for linking the arms of such an oligo may include use of binding or scaffolding molecules such as, but not limited to polymers, such as polyethylene glycol (PEG) and other polyethers.
- spacers may be employed, for example, to reduce steric hindrance between individual arms.
- the spacer may be an alkyne or an azide spacer.
- a spacer may be joined to an oligo of the invention using direct or indirect means, including, but not limited to polymers, such as polyethylene glycol (PEG) and other polyethers.
- a spacer may be 8 to 1000 nucleotides in length.
- a spacer may be 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 55, 60, 65, 70, 75, 80, 85, 90, 95, 100, 150, 200, 250, 300, 350, 400, 450, 500, 550, 600, 650, 700, 750, 800, 850, 900, 950, 1000 nucleotides, or the like.
- a nucleic acid is a polymer of nucleotides
- a nucleic acid is “single-stranded” if nucleotides that form the nucleic acid are unpaired. That is, nucleotides of a single-stranded nucleic acid are not base-paired (via Watson-Crick base pairs, e.g., guanine-cytosine and adenine-thymine/uracil) to nucleotides of another nucleic acid.
- a single-stranded nucleic acid may be contrasted with a double-stranded (paired) nucleic acid, a typical example of which is a DNA double helix.
- Single-stranded nucleic acids may include a contiguous (uninterrupted) sequence of nucleotides or, in some embodiments, a single-stranded nucleic acid may be a conjugate that includes two nucleic acid strands joined together, for example, through a chemical (covalent) linkage.
- a single strand of a nucleic acid (e.g., DNA or RNA) has a 5' end and a 3' end.
- the 5' end typically contains a phosphate group attached to the 5' carbon of the ribose ring of a nucleotide, and the 3' end is typically unmodified, retaining the ribose -OH substituent.
- Nucleic acids are synthesized in vivo in the 5' to 3' direction. Polymerase relies on the energy produced by breaking nucleoside triphosphate bonds to attach new nucleoside monophosphates to the 3'-hydroxyl (-OH) group via a phosphodiester bond.
- An engineered single-stranded nucleic acid of the present disclosure has two 3' ends (a chain oligo). Each terminus of the single-stranded nucleic acid includes a 3'-hydroxyl (-OH) group.
- a single-stranded chain oligo is formed by joining (linking) the 5' end of one single-stranded nucleic acid to the 5' end of another single-stranded nucleic acid.
- the linkage between two 5' ends is a covalent linkage. In other embodiments, the linkage is non-covalent.
- Each arm of a chain-oligo may comprise a hybridization domain.
- a “domain” refers to a discrete, contiguous sequence of nucleotides or nucleotide base pairs, depending on whether the domain is unpaired (single-stranded nucleotides) or paired (double-stranded nucleotide base pairs), respectively.
- a hybridization domain facilitates binding of the chain-oligo to a complementary sequence on a target oligonucleotide, i.e., the target sequence.
- a domain is “complementary to” a target sequence if the domain contains nucleotides that base pair (hybridize/bind through Watson-Crick nucleotide base pairing) with nucleotides of the target sequence such that a paired (double-stranded) or partially-paired molecular species/structure is formed.
- Complementary domains need not be perfectly (100%) complementary to form a paired structure, although perfect complementarity is provided, in some embodiments.
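The notion of perfect versus partial complementarity can be made concrete with a short calculation. The sketch below is a simplification under stated assumptions: it handles DNA bases only (no uracil or nucleotide analogs), compares a domain and a target of equal length position by position, and says nothing about whether a given percentage is sufficient for hybridization in practice. The function names and sequences are illustrative.

```python
# Watson-Crick pairs for DNA only; RNA and analogs are out of scope here.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(sequence):
    """Return the reverse complement of a DNA sequence written 5'->3'."""
    return "".join(COMPLEMENT[base] for base in reversed(sequence.upper()))


def percent_complementarity(domain, target):
    """Percentage of positions at which `domain` base-pairs with `target`.

    Both sequences are written 5'->3' and must have equal length; pairing is
    antiparallel, so the domain is compared with the target's reverse
    complement.
    """
    if not domain or len(domain) != len(target):
        raise ValueError("sequences must be non-empty and of equal length")
    perfect_partner = reverse_complement(target)
    matches = sum(d == p for d, p in zip(domain.upper(), perfect_partner))
    return 100.0 * matches / len(domain)


if __name__ == "__main__":
    target = "ATGCGT"  # placeholder target sequence
    print(percent_complementarity(reverse_complement(target), target))
    print(percent_complementarity("ACGCAA", target))  # one mismatched position
```

A domain equal to the target's reverse complement scores 100%, while a single mismatch in a six-nucleotide domain lowers the score to five matches out of six.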
- the length of a hybridization domain may vary. In some embodiments, a hybridization domain may have a length of 5-50 nucleotides.
- an anchor domain may have a length of 5-45, 5-40, 5-35, 5-30, 5-25, 5-20, 5-15, 5-10, 10-50, 10-45, 10-40, 10-35, 10-30, 10-25, 10-20, 10-15, 15-50, 15-45, 15-40, 15-35, 15-30, 15-25, 15-20, 20-50, 20-45, 20-40, 20-35, 20-30, 20-25, 25-40, 25-35, 25-30, 30-40, 30-35, or 35-40 nucleotides.
- a hybridization domain may have a length of 5, 10, 15, 20, 25, 30, 35, 40, 45, or 50 nucleotides.
- a hybridization domain may have a length of 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, or 50 nucleotides.
- a hybridization domain, in some embodiments, may be longer than 50 nucleotides, or shorter than 5 nucleotides.
- one or more chain-oligo arms may further comprise a primer domain.
- a primer domain is a domain to which a primer binds.
- a primer is a strand of short nucleotide sequence that serves as a starting point for nucleic acid (e.g., DNA) synthesis.
- chain oligos may comprise a pair of internal primer domains (e.g., near the linked 5' ends), which may be used for amplification of sequence-ready constructs produced using the methods of the present disclosure. The length of a primer domain may vary.
- a primer domain may have a length of 5-50 nucleotides, for example, a length of 5-45, 5-40, 5-35, 5-30, 5-25, 5-20, 5-15, 5-10, 10-50, 10-45, 10-40, 10-35, 10-30, 10-25, 10-20, 10-15, 15-50, 15-45, 15-40, 15-35, 15-30, 15-25, 15-20, 20-50, 20-45, 20-40, 20-35, 20-30, 20-25, 25-40, 25-35, 25-30, 30-40, 30-35, or 35-40 nucleotides.
- a primer domain has a length of 5, 10, 15, 20, 25, 30, 35, 40, 45, or 50 nucleotides.
- a primer domain has a length of 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, or 50 nucleotides.
- a primer domain, in some embodiments, may be longer than 50 nucleotides, or shorter than 5 nucleotides.
- one or more chain-oligo arms may further comprise a sequencing adapter.
- a sequencing adapter is a nucleotide sequence that facilitates binding of oligonucleotide sequences generated using the methods disclosed herein to complementary sequences used in certain next-generation sequencing technologies.
- a method of the invention may involve the use of a binding tag.
- a nucleic acid as described herein may be labeled with an affinity tag, for example an affinity pull-down functional group, on the first arm, or the second arm, or both.
- an affinity tag may be used to isolate a biomolecule of interest, for example an amplified nucleic acid, such as an amplified segment of a template DNA molecule or fragment or portion thereof.
- an amplified nucleic acid may contain one or more adapter molecules as described herein, which may serve as a means for isolation of the nucleic acid.
- a chain oligo may have more than two arms, and thus an affinity tag may be present on a chain oligo on only a single arm, on multiple arms, or on all arms of the chain oligo.
- an affinity tag may be used to isolate a biomolecule of interest, such as a nucleic acid, polynucleotide, protein, or the like. Affinity tags attached as described herein may be removed by chemical or enzymatic means.
- One of skill in the art would be able to identify appropriate methods and means for an affinity tag in accordance with the invention.
- affinity tags include an enzymatic modification such as biotin or desthiobiotin; a fluorescent tag, such as green fluorescent protein (GFP); or a solubilization tag, such as thioredoxin, maltose binding protein, glutathione-S-transferase, or poly(NANP).
- any binding tag appropriate for the specific application may be used to isolate or separate a biomolecule of interest as described herein.
- chain oligos as described herein may be used to link together two RNA molecules.
- a chain oligo having two arms may be bound to a first RNA molecule and a second RNA molecule such that the RNA molecules are linked together to form a single long RNA molecule, wherein the chain oligo is located between the first and second RNA molecules.
- Reverse transcription may then be performed to produce cDNA of both the first and the second RNA molecules, wherein both 3' ends of the chain oligo serve as primer molecules for first-strand cDNA synthesis according to methods known in the art.
- the newly produced cDNA may be dissociated from the template RNA molecules and a second, distinct chain oligo may hybridize to the 3' ends of the cDNA, then the first chain oligo and second strand synthesis as known in the art may be performed in order to produce a double-stranded DNA copy of the starting RNA molecule.
- second strand cDNA may be synthesized using additional conventional mRNA specific primers, or using a common template switching adapter and a primer priming a sequence in the template switching adapter.
- two or more chain oligos may be used to link together two or more RNA molecules.
- a first chain oligo may be hybridized to a first RNA molecule, wherein the first 3' end of the first chain oligo hybridizes to the first RNA molecule and the second 3' end of the first chain oligo hybridizes to a second RNA molecule;
- a second chain oligo may be hybridized to the second RNA molecule, wherein the first 3' end of the second chain oligo hybridizes to the second RNA molecule and the second 3' end of the second chain oligo hybridizes to a third RNA molecule;
- a third chain oligo may be hybridized to the third RNA molecule, wherein the first 3' end of the third chain oligo hybridizes to the third RNA molecule and the second 3' end of the third chain oligo hybridizes to the first RNA molecule, such that a circular nucleic acid molecule is formed by the hybridization of the
- the nucleic acid molecules linked by the 5'-5' linked oligonucleotides may be mRNA molecules encoding different transcripts.
- the nucleic acid molecules may encode immunoglobulin heavy and light chains, or T cell receptor α and T cell receptor β.
- the nucleic acid molecules linked by the 5'-5' linked oligonucleotides may be DNA molecules, for example, genomic DNAs harboring different mutations or polymorphisms.
- the nucleic acid molecules may be isolated from different cells, for example, immune cells including T cells, B cells, dendritic cells, macrophages, neutrophils, mast cells, eosinophils, basophils, and natural killer cells.
- the nucleic acid molecules may encode any cellular receptors or lectins.
- the 5'-5' linked oligonucleotides may be used in linking two ends of a nucleic acid molecule and downstream amplification and sequencing procedures.
- the nucleic acid molecule may be a DNA or RNA.
- the nucleic acid molecule may be at least 1 kb, at least 2 kb, at least 3 kb, at least 5 kb, at least 6 kb, at least 7 kb, at least 8 kb, at least 9 kb, or at least 10 kb in length.
- the use of the 5'-5' linked oligonucleotides in sequencing may be applied to archiving an immune repertoire by capturing all sequences in a Crab-Seq library derived from a human sample, such as a blood or urine sample.
- the 5'-5' linked oligonucleotides may be used to characterize cell type specific mRNAs and to identify or profile any cell types.
- aspects of the invention provide a modified substrate comprising a plurality of discrete locations, each location comprising two nucleic acid molecules, each of the two nucleic acid molecules coupled at one end to the substrate via a primer that is physically bound to the substrate and coupled to each other at another end by a physical linking.
- Substrates refer to physical structures/surfaces onto which molecules of interest may be immobilized in geographically distinct regions, thereby allowing molecules to be screened in parallel.
- Substrates may comprise glass, metal, metal oxide, and carbon, which materials may be selected based on wettability, charge, and reactivity of the material.
- the modified substrate may be a part of a flow cell.
- a flow cell is a channel for fixing mobile nucleic acid molecules on a flow cell surface, referred to as a substrate.
- As nucleic acid molecules flow through the channel and over the substrate, nucleic acid molecules are bound, e.g. by hybridization, to fixed points along the substrate. The bound nucleic acid molecules may then be amplified, e.g. by bridge amplification, to form a clonal cluster.
- nucleic acid molecules in the sequencing library will randomly attach to the lanes on the surface of the flow cell when they pass through it.
- each flow cell may have 8 lanes, each lane having a number of binding primers attached to the surface, which can match binding primers added at the ends of the nucleic acid molecules during sample processing.
- Binding primers may also be found on nanowells on the substrate that space out clusters and prevent overcrowding of clusters.
- Amplification is then performed using the adapters on the flow cell surface as a template. After continuous amplification and denaturation cycles, each nucleic acid molecule will eventually be clustered in a bundle at its respective location, each bundle containing many copies of a single nucleic acid template.
- two nucleic acid molecules linked by a linked adapter are seeded at a single location along the flow cell.
- the same binding primer may be added to each nucleic acid molecule, thereby resulting in the two nucleic acid molecules hybridizing to the same point on the flow cell.
- linked nucleic acid molecules bind to the same location on the flow cell allowing for complementary strands to be synthesized at the same location in the flow cell.
- Examples of flow cells that may be used together with the present invention include the Genome Analyzer (commercially available from Illumina, Inc.), a parallel, fluorescence-based readout of immobilized sequences that are iteratively sequenced. Sequencing according to this technology is described in U.S. Pat. Nos. 7,960,120; 7,835,871; 7,232,656; 7,598,035;
- Methods of the invention may also be used together with fluorescence-based approaches, for example fluorescence in situ hybridization (FISH).
- the methods may include microcontact printing to stamp molecules to a substrate for polymerase catalyzed extension and amplification. Polymerase DNA colonies (polonies) are then formed on the surface of the substrate.
- an elastomeric crosslinked polyacrylamide may be used to efficiently stamp molecules.
- Each polony facilitates the registration of fluorescence signals. Polony based assays are described, for example, in Fu et al. (2022) Cell 185:4621-4633, the entirety of the contents of which are incorporated herein.
- DNA templates can be amplified to polonies on a gel surface.
- a tissue cryosection may be placed onto a polony gel for mRNA capture and spatial indexing prior to library amplification.
- Intensity profiles for polonies stained by fluorescence can be generated for spatial index mapping.
- DNA copies per polony can be quantified by real-time image quantification.
- T cells T lymphocytes
- MHC major histocompatibility complex
- APCs antigen-presenting cells
- TCR-α or TCR-β subunit
- TCR-α+TCR-β TCR subunit variants
- Using linked adapters as described herein, these shortcomings can be overcome and throughput and efficiency can be significantly increased.
- specifically designed linked adapters can be used to amplify and clone the coding sequences of TCR-α and TCR-β. Once sequenced, the coding sequences of TCR-α and TCR-β can in turn be cloned into B-cells.
- Linked adapters may be designed that are specific to, for example, the constant regions in α-chain mRNAs, such as the 5' untranslated region (UTR) and “constant” (C) segment coding region, or to the constant regions in β-chain mRNAs. Following chain-Seq PCR as described in the Examples, the coding regions of the α- and β-chains that encode “variable” (V) and “joining” (J) segments may be linked using the methods described herein.
- two linked adapters can be used to isolate and/or amplify even longer sequences for sequencing. After a single linked adapter grabs a pair of particular DNA sequences, only around 300 bp of nucleic acid sequence can be sequenced. However, using two compatible linked adapters, two separate molecules that may be sequenced can be produced, allowing sequencing of twice the length of nucleic acid, i.e., 300+300 bp.
- the invention provides a diagnostic method to capture heavy and light chain transcripts of B-cells or TCR-α and TCR-β sequences wherein no adapter is required.
- one or more linked adapters may directly link the transcript pair via PCR.
- Such a method may only require isolation of a cell either in a container or spatially on a surface.
- each linked adapter contains a primer pair corresponding to a conserved framework in, for example, the heavy (H) or light (L) chains of an antibody sequence, which can then extend and capture the full-length chain information.
- a cell or container barcode may be added to identify single cells or samples.
- the B cell or T cell of interest can be isolated from a subject with a recent infection or with a vaccine administration.
- linked adapters as described herein may be used to produce an antibody of any combination of components, such as H and L chains. Any number of coding regions for antibody components may be joined together in any configuration using any number of linked adapters to link the nucleic acid sequences together.
- the linked adapters themselves may then be removed or excised from the joined complex using specific transposases, including, but not limited to, a piggyBac transposase, in order to remove the chain oligo such that there is no “scar” left in the joined nucleic acid complex.
- the resulting nucleic acid complex may then be introduced into a B-cell in order to produce a specific desired antibody.
- the present invention therefore enables the production of engineered antibodies having any desired sequence.
- Nucleic acid molecules may be acquired from a sample or a subject.
- Nucleic acid molecules include DNA and RNA.
- Methods of the invention are applicable to DNA from whole cells, portions of genetic or proteomic material obtained from one or more cells, artificial nucleic acid molecules, or viral DNA or RNA.
- Nucleic acid molecules may be extracted from a sample by any method.
- the nucleic acid molecule may include DNA, RNA, cDNA, PNA, LNA and others that are contained within a sample.
- Biological samples for use in the present invention include viral particles or preparations.
- Nucleic acid molecules can be obtained directly from an organism or from a biological sample obtained from an organism, e.g., from blood, urine, cerebrospinal fluid, seminal fluid, saliva, sputum, stool and tissue. Any tissue or body fluid specimen may be used as a source for nucleic acid for use in the invention.
- Nucleic acid molecules can also be isolated from cultured cells, such as a primary cell culture or a cell line. The cells or tissues from which template nucleic acids are obtained can be infected with a virus or other intracellular pathogen.
- nucleic acids can be obtained from non-cellular or non-tissue samples, such as viral samples, or environmental samples.
- the nucleic acid molecules are bound to other target molecules such as proteins, enzymes, substrates, antibodies, binding agents, beads, small molecules, peptides, or any other molecule and serve as a surrogate for quantifying and/or detecting the target molecule.
- Nucleic acid molecules may be single-stranded, double-stranded, or double-stranded with single-stranded regions (for example, stem- and loop-structures).
- Proteins or portions of proteins (amino acid polymers) that can bind to high affinity binding moieties, such as antibodies or aptamers, are target molecules for oligonucleotide labeling, for example, in droplets.
- molecules may be compartmentalized.
- a compartment may be an aqueous droplet in a water-in-oil emulsion.
- Said droplets may be formed using microfluidic devices according to known techniques in the art. Other methods for generating droplets as described herein may be used as appropriate, including, but not limited to, high speed vortex, ultrasonic waves, extrusion, filtering, microsieve chips, or the like.
- Individual oligonucleotides may be loaded into separate droplets according to known methods in the art.
- Nucleic acid templates can be obtained directly from an organism or from a biological sample obtained from an organism, e.g., from blood, urine, cerebrospinal fluid, seminal fluid, saliva, sputum, stool and tissue.
- a nucleic acid is obtained from fresh frozen plasma (FFP).
- a nucleic acid is obtained from formalin-fixed, paraffin-embedded (FFPE) tissues. Any tissue or body fluid specimen may be used as a source for a nucleic acid for use in the invention.
- Nucleic acid templates can also be isolated from cultured cells, such as a primary cell culture or a cell line. The cells or tissues from which template nucleic acids are obtained can be infected with a virus or other intracellular pathogen.
- a sample can also be total RNA extracted from a biological specimen, a cDNA library, viral, or genomic DNA.
- a biological sample may be homogenized or fractionated in the presence of a detergent or surfactant.
- Cells within a sample may also be lysed by a lysing agent to release nucleic acid molecules and cell components.
- Lysis or homogenization solutions may further contain other agents, such as reducing agents or lytic enzymes. Examples of such reducing agents include dithiothreitol (DTT), β-mercaptoethanol, DTE, GSH, cysteine, cysteamine, tris(2-carboxyethyl)phosphine (TCEP), or salts of sulfurous acid.
- the nucleic acid is denatured by any method known in the art to produce single stranded nucleic acid templates and a pair of first and second oligonucleotides is hybridized to the single stranded nucleic acid template such that the first and second oligonucleotides flank a target region on the template.
- aspects of the invention further comprise amplifying nucleic acid molecules.
- Amplification refers to production of additional copies of a nucleic acid sequence and is generally carried out using polymerase chain reaction (PCR) or isothermal amplification. Amplification aids in and may also be used to detect nucleic acid molecules.
- amplification reactions include, but are not limited to, quantitative PCR (qPCR), quantitative fluorescent PCR (QF-PCR), multiplex fluorescent PCR (MF-PCR), digital droplet PCR (ddPCR), single cell PCR, PCR-RFLP/real time-PCR-RFLP, hot start PCR, nested PCR, in situ polony PCR, in situ rolling circle amplification (RCA), bridge PCR, picotiter PCR, emulsion PCR and reverse transcriptase PCR (RT-PCR).
- LCR ligase chain reaction
- transcription amplification self-sustained sequence replication
- selective amplification of target polynucleotide sequences, consensus sequence primed polymerase chain reaction (CP-PCR), arbitrarily primed polymerase chain reaction (AP-PCR), degenerate oligonucleotide-primed PCR (DOP-PCR) and nucleic acid based sequence amplification (NABSA).
- Sequencing nucleic acid molecules may be performed by methods known in the art. For example, see, generally, Quail, et al., 2012, A tale of three next generation sequencing platforms: comparison of Ion Torrent, Pacific Biosciences and Illumina MiSeq sequencers, BMC Genomics 13:341. Nucleic acid molecule sequencing techniques include classic dideoxy sequencing reactions (Sanger method) using labeled terminators or primers and gel separation in slab or capillary, or next generation sequencing methods. For example, sequencing may be performed according to technologies described in U.S. Pub. 2011/0009278, U.S. Pub.
- FIG. 3 is an image of gel analysis of the two genes. As shown, gene 1 and gene 2 are each captured in the multiplexed library using linked adapters of the invention.
- FIG. 4 is an image of gel analysis of two genes using forward and reverse primers. As shown, the forward primer (read 1) captured gene 1 located on leaf 1 (arm 1) of the linked adapter and the reverse primer (read 2) captured gene 2 located on leaf 2 (arm 2) of the linked adapter.
- FIG. 5 is a plot of sequence reads from the two genes.
- FIG. 6 is a table of sequence reads from the two genes. As shown, paired reads were captured by CLOVER adapters throughout the plot, capturing both gene 1 and gene 2 in 84.05% of reads, with only 8.22% of reads unmapped.
- TCR capture using adapters of the invention was conducted on TCRs identified in FIG.
- FIG. 9 is a graph of the identification of T-cell receptor β (TRBV) variable regions from a TCR.
- CLOVER adapters identified TRBV2 and TRBV19 as predominant in the sample.
- FIG. 10 is a graph of the identification of T-cell receptor α (TRAV) variable region gene usage.
- CLOVER adapters identified TRAV13-2 as predominant in the sample.
- FIG. 12 is a heatmap of TCR α and β variable regions detected using adapters of the invention. The linked adapters enabled paired sequencing of full length TCR α and β chains.
- FIG. 16 is a heatmap of TCR α and β variable regions detected using adapters of the invention.
- the linked adapters of the invention were utilized to analyze cell line TCRs.
- the linked adapters of the invention enabled paired sequencing of full length TCR α and β chains, with the heat map notably showing identification of TRBV12-3 and TRBV12-4 with TRAV8-4 and TRBV13 with TRAV8-6. Overall, chain pairing was identified as shown below:
- Hut78-a — Hut78-b 8.9%
- a multi-leaf (arm) adapter was synthesized for TCR capture and sequencing using 3-leaf antibody tagged adapters.
- the first leaf in the linked adapter captures a TCRα
- the second leaf captures a TCRβ
- a third leaf provides an antibody (Ab) tag.
- FIG. 15 is an image of a gel analysis of a 3-leaf clover library. As shown, the three-leaf CLOVER adapters allow for TCRαβ+AbTag capture in bulk.
- a method for preparing a molecular library comprising: preparing a first mixture by segregating together at least two nucleic acid molecules comprising a first nucleic acid molecule and a second nucleic acid molecule from a single source.
- the method comprising adding to the first mixture a plurality of linked adapters, each linked adapter of the plurality of linked adapters.
- the linked adapter comprises a first primer and a second primer that are physically linked.
- the method comprises associating the first nucleic acid molecule from the single source to the first primer of a first linked adapter of the plurality of linked adapters and associating the second nucleic acid molecule from the single source to the second primer of the first linked adapter.
- the method comprising synthesizing a complementary strand to the first nucleic acid molecule and synthesizing a complementary strand to the second nucleic acid molecule without breaking the physical link of the linked adapters.
- the step of preparing the first mixture comprises compartmentalizing the first mixture into a plurality of compartments.
- compartmentalizing the first mixture comprises compartmentalizing a single cell or a single bead from the first mixture.
- the first primer and the second primer of the linked adapters are physically linked by ligation, click chemistry, an amplification reaction, or an oligonucleotide chemical synthesis.
- the at least two nucleic acid molecules encode a cell receptor protein.
- nucleic acid molecules encode a B-cell receptor or T-cell receptors.
- the associating step comprises annealing the first nucleic acid molecule and the second nucleic acid molecule to the first primer and the second primer, respectively.
- the at least two nucleic acid molecules are each tagged with a same cell specific barcode.
- the step of synthesizing complementary strands comprises seeding the first nucleic acid molecule and the second nucleic acid molecule on the same cluster on a flow cell without breaking the physical link of each of the linked adapters.
- seeding the first nucleic acid molecule and second nucleic acid molecule on the same cluster on the flow cell comprises coupling each of the first nucleic acid molecule and the second nucleic acid molecule to a substrate of the flow cell via primers that are physically bound to the substrate and coupled to the end of each of the first nucleic acid molecule and the second nucleic acid molecules by an end that is not linked by the physical link of the linked adapter.
- a modified substrate comprising a plurality of discrete locations, each location comprising two nucleic acid molecules, each of the two nucleic acid molecules coupled at one end to the modified substrate via a primer that is physically bound to the modified substrate and coupled to each other at another end by a physical linking.
- the modified substrate is part of a flow cell, and the flow cell is configured to allow for the synthesis of complementary strands to each nucleic acid molecule without breaking the physical linking between the nucleic acid molecules.
- nucleic acid molecules are physically linked by a non-covalent linkage.
- the two nucleic acid molecules are physically linked by ligation, click chemistry, an amplification reaction, or an oligonucleotide chemical synthesis.
Abstract
The present invention provides methods, primers, and modified substrates that allow for the physical pairing of nucleic acid molecules together from a single source.
Description
PREPARATION OF MOLECULAR LIBRARIES BY LINKED ADAPTERS
CROSS-REFERENCE TO RELATED APPLICATIONS
[1] This application claims priority to and the benefit of U.S. Provisional Application No. 63/526,746, filed July 14, 2023, the contents of which are incorporated herein in their entirety.
STATEMENT REGARDING FEDERALLY SPONSORED RESEARCH OR DEVELOPMENT
[2] This invention was made with government support, under Grant Nos. 2019-19081900002 awarded by the Intelligence Advanced Research Projects Activity, S4599-001 awarded by the Department of Defense, and AI142780 awarded by the National Institutes of Health. The government has certain rights in the invention.
TECHNICAL FIELD
[3] This disclosure relates to methods and systems for profiling single cells.
BACKGROUND
[4] Differences exist between even the smallest populations of cells within an organism. As a result, analysis of whole cellular populations within a sample provides only an average of the sample that may not be representative of any one cell. For example, cancer cells generally develop differences from one another over time, becoming more heterogeneous. This can result in the emergence of rare variants and differences between cells in responses to interventions. Failure to capture these differences can result in improper diagnostic and ameliorative efforts.
[5] Single-cell analysis seeks to analyze cells individually. However, in order to capture an entire population of cells at the single cell level, millions of cells may need to be individually processed and molecules within the cells labeled for later analysis. As the data produced increases, analysis of that data becomes a major challenge, increasing both the expense and time needed to analyze a sample.
SUMMARY
[6] The present invention provides methods, primers, and modified substrates that allow for the physical pairing of molecules together from a single source, for example a single cell. This allows for the sequencing of linked molecules simultaneously without needing to label each molecule as originating from that single source. This greatly reduces the data burden produced during sample preparation for later analysis, reducing the expense and time needed for single-cell analysis.
[7] Aspects of the invention provide methods for preparing a molecular library. The methods comprise preparing a first mixture by segregating together at least two nucleic acid molecules from a single source, such as a single cell. The first mixture is then contacted with a plurality of linked adapters, each linked adapter comprising a first primer and a second primer that are physically linked. In particular, a first nucleic acid molecule from the single source is associated with the first primer of a linked adapter and a second nucleic acid molecule from the single source is associated with the second primer of the same linked adapter. Complementary strands are then generated from each of the first nucleic acid molecule and the second nucleic acid molecule without breaking the physical link of the linked adapters. Aspects of the invention may further comprise the step of amplifying the first and second nucleic acid molecules after the associating step and/or after the step of generating complementary strands to each of the first and second nucleic acid molecules.
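The pairing logic of these steps can be illustrated with a minimal Python sketch. It is a toy model under stated assumptions, not the disclosed protocol: the names `Molecule`, `LinkedAdapter`, and `link_within_compartments`, the target labels, and the example sequences are all hypothetical, and "association" is reduced to matching a primer's target label.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Molecule:
    target: str    # hypothetical label for the region a primer anneals to
    sequence: str  # hypothetical nucleotide sequence


@dataclass(frozen=True)
class LinkedAdapter:
    first_target: str   # region recognized by the first primer
    second_target: str  # region recognized by the second primer


def link_within_compartments(
    compartments: Dict[str, List[Molecule]], adapter: LinkedAdapter
) -> List[Tuple[Molecule, Molecule]]:
    """Return one linked (first, second) pair per compartment holding both targets."""
    linked = []
    for molecules in compartments.values():
        first = next((m for m in molecules if m.target == adapter.first_target), None)
        second = next((m for m in molecules if m.target == adapter.second_target), None)
        if first is not None and second is not None:
            linked.append((first, second))
    return linked


# Hypothetical input: three segregated sources, one of them missing a target.
compartments = {
    "cell_1": [Molecule("TRA", "ACGTAA"), Molecule("TRB", "GGCATT")],
    "cell_2": [Molecule("TRA", "ACGTCC"), Molecule("TRB", "GGCAGG")],
    "cell_3": [Molecule("TRA", "ACGTTT")],
}
adapter = LinkedAdapter("TRA", "TRB")
pairs = link_within_compartments(compartments, adapter)

# The pooled output carries no source label; the pairing itself records co-origin.
pooled_sequences = sorted((a.sequence, b.sequence) for a, b in pairs)
```

In this sketch the compartment keys are used only to build the input. Each output pair keeps its two molecules together without any per-source label, which mirrors the idea that the physical link, rather than a tag, preserves the association.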
[8] Advantageously, any method for isolating molecules within a sample may be used together with the invention. For example, the step of preparing the first mixture may comprise compartmentalizing the first mixture, for example through the use of droplet emulsification or microfluidics. Accordingly, the step of compartmentalizing the first mixture may comprise compartmentalizing a single cell from the mixture, thereby segregating the single cell from the mixture (e.g. from other single cells in the mixture) and thereby segregating the nucleic acid molecules from the single cell from the mixture.
[9] Linked adapters of the invention may also be physically linked by any method including covalent cross-linking and non-covalent linkage. For example, the primers of the linked adapters may be physically linked by ligation, click chemistry, protein linkers, polymer linkers, an amplification reaction, or oligonucleotide chemical synthesis.
[10] Further advantageously, because nucleic acid molecules from a single source are linked by linked adapters, they do not need to remain segregated for sequencing. As a result, aspects of the invention may comprise the step of breaking compartments formed in the preparing step into a single container, thereby pooling the contents of the compartment (e.g. microfluidic droplets) into a single container. The step of breaking compartments may be prior to the step of generating sequence reads.
[11] Any nucleic acid molecule may be analyzed by methods of the invention. For example, the at least two nucleic acid molecules may comprise nucleic acid molecules encoding a cell receptor protein. The nucleic acid molecules may encode a B-cell receptor or a T-cell receptor (TCR). Advantageously, the linked primers of the invention may be used to link first and second nucleic acid molecules that encode related molecules, such as subunits of a protein. For example, the first nucleic acid molecule paired to the first primer of the linked adapter may encode a TCRα and the second nucleic acid molecule paired to the second primer of the linked adapter may encode a TCRβ of a TCR, thereby linking a nucleic acid molecule encoding each protein chain of a TCR heterodimer. The same principle may be used to link nucleic acid molecules encoding any related molecules.
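Once the two chains of a TCR are read as a linked pair, pairing frequencies of the kind summarized in a TRAV-TRBV heatmap can be tallied directly. The following Python sketch is illustrative only: the function name `pairing_frequencies` and the input counts are hypothetical, and the gene names are used purely as labels.

```python
from collections import Counter
from typing import Iterable, List, Tuple

Pair = Tuple[str, str]  # (alpha-chain V gene, beta-chain V gene)


def pairing_frequencies(pairs: Iterable[Pair]) -> List[Tuple[Pair, float]]:
    """Return each observed pair with its share of all linked pairs, in percent."""
    counts = Counter(pairs)
    total = sum(counts.values())
    if total == 0:
        return []
    return [(pair, 100.0 * n / total) for pair, n in counts.most_common()]


# Hypothetical V-gene calls, one tuple per linked read pair.
linked_calls = [
    ("TRAV8-4", "TRBV12-3"),
    ("TRAV8-4", "TRBV12-3"),
    ("TRAV8-6", "TRBV13"),
    ("TRAV8-4", "TRBV12-4"),
]
frequencies = pairing_frequencies(linked_calls)
```

Because each input tuple already holds both chains, no grouping by a cell identifier is needed before counting.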
[12] Nucleic acid molecules may also be associated with primers of the linked adapters by any method. For example, the associating step may comprise annealing the first and second nucleic acid molecules to the first and second primers of the linked adapter, respectively.
[13] Advantageously, methods of the invention allow for the simultaneous sequencing of linked molecules without the need to label individual molecules as originating from a single source, greatly reducing the cost and complexity of cell analysis. Accordingly, methods of the invention may be completed with the proviso that the steps are completed without tagging the nucleic acid molecules with a cell specific barcode. Alternatively, the at least two nucleic acid molecules may also be tagged with a cell specific barcode.
[14] Methods for analyzing nucleic acid molecules may comprise sequencing nucleic acid molecules. Accordingly, aspects of the invention further comprise the step of generating sequence reads for the first nucleic acid molecule and the second nucleic acid molecule.
[15] The step of synthesizing complementary strands for each nucleic acid molecule may comprise seeding the first nucleic acid molecule and second nucleic acid molecule on the same cluster on a flow cell without breaking the physical link of the linked adapters. For example, the linked nucleic acid molecules may be seeded as a single cluster on a flow cell, resulting in a cluster with the two linked nucleic acid molecules seeded together. This is contrasted to conventional methods for flow cell sequencing that require “pure” clusters that comprise only a single nucleic acid sequence.
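A cluster seeded with two linked molecules can yield a read pair in which each read derives from a different molecule. The Python sketch below shows one way such read pairs might be tallied. It is an assumption-laden illustration: exact substring matching stands in for a real aligner, read orientation is ignored, and the names `assign` and `classify_pairs`, the reference sequences, and the reads are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def assign(read: str, references: Dict[str, str]) -> str:
    """Return the first reference that contains the read, else 'unmapped'."""
    for name, reference in references.items():
        if read in reference:
            return name
    return "unmapped"


def classify_pairs(
    read_pairs: Iterable[Tuple[str, str]], references: Dict[str, str]
) -> Counter:
    """Tally read pairs by how many distinct references the two reads hit."""
    tally: Counter = Counter()
    for read1, read2 in read_pairs:
        genes = {assign(read1, references), assign(read2, references)} - {"unmapped"}
        if len(genes) == 2:
            tally["both genes"] += 1
        elif len(genes) == 1:
            tally["one gene"] += 1
        else:
            tally["unmapped"] += 1
    return tally


# Hypothetical references and read pairs (one pair per cluster).
references = {"gene1": "ATGGCCAAGTTGACC", "gene2": "TTGCAGGCTCCATAG"}
read_pairs = [
    ("GCCAAG", "CAGGCT"),  # read 1 from gene1, read 2 from gene2
    ("TGACC", "CCATAG"),   # read 1 from gene1, read 2 from gene2
    ("ATGGCC", "GTTGAC"),  # both reads from gene1
    ("NNNNNN", "NNNNNN"),  # neither read maps
]
tally = classify_pairs(read_pairs, references)
percent_both = 100.0 * tally["both genes"] / len(read_pairs)
```

A production analysis would replace `assign` with an established alignment tool and handle reverse-complemented reads; the classification step itself would be unchanged.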
[16] Generating sequence reads can be done by any known method, including using next generation sequencing and long read sequencing. For example, adapters may be paired to a target DNA molecule to sequence each strand of the DNA molecule multiple times in one continuous long read. Several DNA fragments may be concatenated to produce a longer DNA amplicon for sequencing, which may be attached to adapters. For example, individual genes may be amplified and the amplicons concatenated into a single fragment for sequencing. Methods of long-read sequencing are described, for example, in Kanwar et al. (2021) Scientific Reports 11:18065, the entirety of the contents of which are incorporated by reference herein.
[17] Seeding the first nucleic acid molecule and second nucleic acid molecule on the same cluster on the flow cell may comprise coupling each of the first nucleic acid molecule and second nucleic acid molecule to a substrate of the flow cell via binding primers that are physically bound to the substrate and coupled to the end of each first and second nucleic acid molecules by an end that is not linked by the physical link of the linked adapter. Binding primers are typically complementary to a sequence at a point on the substrate of the flow cell, for example on the surface of the substrate or within a well on the substrate. The binding primer added to each nucleic acid molecule may also comprise binding sites and adapters for sequencing and downstream processing. Alternatively, seeding the first nucleic acid molecule and second nucleic acid molecule on the same cluster on the flow cell may comprise coupling each of the first nucleic acid molecule and second nucleic acid molecule to a substrate of the flow cell via a binding primer physically bound to the substrate and coupled to the end of each nucleic acid molecule that is linked by the physical link of the linked adapter.
[18] Aspects of the invention further provide a modified substrate comprising a plurality of discrete locations, each location comprising two nucleic acid molecules. Each of the two nucleic acid molecules is coupled at one end to the substrate via a primer that is physically bound to the substrate, and the two molecules are coupled to one another at their other end by a physical linking. Each of the two nucleic acid molecules may be distinct and different.
[19] As described, linked adapters of the invention may be physically linked by any method including covalent cross-linking and non-covalent linkage. For example, the primers of the linked adapters may be physically linked by ligation, click chemistry, an amplification reaction, or oligonucleotide chemical synthesis.
[20] Advantageously, the distinct nucleic acid molecules do not need to be, and are not, tagged with a cell specific barcode. In some embodiments, however, they may also be tagged with a barcode, such as a cell specific barcode or a unique molecular identifier (UMI).
[21] The modified substrate may be part of a flow cell. For example, each discrete location is a cluster on the flow cell. The flow cell may be configured to allow for the synthesis of complementary strands to each nucleic acid without breaking the physical link between the nucleic acids at a discrete location.
BRIEF DESCRIPTION OF DRAWINGS
[22] FIG. 1 is a diagram of mRNA and cDNA synthesis from cell samples.
[23] FIG. 2 is a diagram of an exemplary method of the invention for TCR analysis.
[24] FIG. 2 is a diagram of TCR capture and sequencing using adapters of the invention.
[25] FIG. 3 is an image of gel analysis of two genes.
[26] FIG. 4 is an image of gel analysis of two genes.
[27] FIG. 5 is a plot of sequence reads from two genes.
[28] FIG. 6 is a table of sequence reads from two genes.
[29] FIG. 7 is a diagram of linked adapter capture of the invention.
[30] FIG. 8 is a table of sample reads aligned to genes using adapters of the invention.
[31] FIG. 9 is a graph of identification of T-cell receptor β (TRBV) variable regions.
[32] FIG. 10 is a graph of the identification of T-cell receptor α (TRAV) variable regions.
[33] FIG. 11 is a table identifying TCR variable (V), diversity (D), and joining (J) fragments using adapters of the invention.
[34] FIG. 12 is a heatmap of gene combinations detected by the invention.
[35] FIG. 13 is a diagram of multi-leaf adapter synthesis using click chemistry.
[36] FIG. 14 is a diagram of TCR capture and sequencing using 3-leaf antibody tagged adapters of the invention.
[37] FIG. 15 is an image of gel analysis of a 3-leaf clover library.
[38] FIG. 16 is a heatmap of TRAV-TRBV gene combinations.
DETAILED DESCRIPTION
[39] The present invention provides methods, primers, and modified substrates that allow for the physical pairing of molecules together from a single source, for example a single cell. This allows linked molecules to be sequenced simultaneously without each individual molecule needing to be labeled as originating from that single source. This greatly reduces the data burden produced during sample preparation for later analysis, reducing the expense and time needed for single-cell analysis while improving cell capture rate and throughput of single cell analysis.
[40] The present invention, without limitation, may be applied to improve deep mining of immune receptor repertoires, direct cloning of sample regeneration, and the analysis of novel targets for therapeutics and novel biomarkers for patient stratification.
[41] An exemplary method for preparing a molecular library comprises preparing a first mixture by segregating together (for example, via microfluidic droplet formation) at least two nucleic acid molecules from a single source; contacting the first mixture with a plurality of linked adapters, each linked adapter comprising a first and second primer that are physically linked; associating a first nucleic acid molecule from the single source to the first primer of a linked adapter and a second nucleic acid molecule from the single source to the second primer of the linked adapter; and synthesizing a complementary strand to the first nucleic acid molecule and a complementary strand to the second nucleic acid molecule without breaking the physical link of the linked adapters.
[42] FIG. 1 is a diagram of mRNA and cDNA synthesis from cell samples. Magnetic beads in a lysis buffer and a cell sample (e.g. comprising cells 101, 105, 109, and their components) are fed into separate fluidic channels, for example under microfluidic control. Oil in water suspensions are used to create droplets capturing the magnetic beads together with cells. Capture sequences on the surface of the beads are used to capture mRNA from the cells (e.g. cell 109 and corresponding mRNA as shown) segregated into the droplets. Droplets are broken and the contents pooled for bulk cDNA analysis with mRNA bound to the magnetic beads.
[43] FIG. 2 diagrams an exemplary method of the invention. Linked adapters of the invention are segregated under microfluidic control together with the magnetic beads carrying mRNA captured from cells, for example mRNA encoding TCR sequences. It can be appreciated that cells may be directly introduced together with lytic agents and the linked adapters in the absence of magnetic beads. A TCRα and TCRβ from a TCR are each linked to a respective primer on a single linked adapter within the droplet. A binding sequence complementary to a position on the solid support of a flow cell is added to the ends of each nucleic acid molecule that are not associated with the linked adapter. Droplets are then broken and the contents of each droplet are pooled together for sequencing on a flow cell. By the binding sequence, the ends of each nucleic acid molecule that are not associated with the linked adapter form a binding point for the nucleic acid molecules to the flow cell.
[44] Once bound, each cluster represents one linked adapter which represents molecules from one cell. A first read is then generated for the TCRα and a second read is generated for the TCRβ.
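The cluster-based pairing described above can be illustrated with a minimal sketch. It assumes, purely for illustration, that each sequencing read is reported together with an identifier of the cluster it came from and an indication of whether it is the first or second read. The record layout, the field names, and the placeholder sequences are hypothetical and are not part of the disclosure.

```python
from collections import defaultdict
from typing import NamedTuple


class Read(NamedTuple):
    cluster_id: str   # hypothetical identifier of the flow-cell cluster
    read_number: int  # 1 = first read (e.g. TCR alpha), 2 = second read (e.g. TCR beta)
    sequence: str


def pair_reads_by_cluster(reads):
    """Group reads by cluster and return {cluster_id: (read1_seq, read2_seq)}.

    Only clusters that yielded both a first and a second read are returned,
    so pairing relies on cluster identity rather than on a cell barcode.
    """
    by_cluster = defaultdict(dict)
    for read in reads:
        by_cluster[read.cluster_id][read.read_number] = read.sequence
    return {
        cid: (seqs[1], seqs[2])
        for cid, seqs in by_cluster.items()
        if 1 in seqs and 2 in seqs
    }


# Hypothetical reads; the sequences are placeholders, not real TCR sequences.
example_reads = [
    Read("cluster_A", 1, "ACGTAC"),
    Read("cluster_A", 2, "TTGCAA"),
    Read("cluster_B", 1, "GGCATC"),  # no second read, so it is left unpaired
]
pairs = pair_reads_by_cluster(example_reads)
```

In this sketch the two reads of a cluster are treated as originating from one linked adapter, and therefore from one cell, which mirrors the reasoning of the paragraph above.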
[45] FIG. 3 is an image of gel analysis of two genes. As shown, gene 1 and gene 2 are each captured in the multiplexed library using linked adapters of the invention.
[46] FIG. 4 is an image of gel analysis of two genes using forward and reverse primers. As shown, the forward primer (read 1) captures gene 1 located on leaf 1 (arm 1) of the linked adapter and the reverse primer (read 2) captures gene 2 located on leaf 2 (arm 2) of the linked adapter.
[47] FIG. 5 is a plot of sequence reads from the two genes. In the multiplexed library, linked adapters of the invention (identified specifically as CLOVER adapters, used throughout the disclosure in reference to linked adapters of the invention) capture both gene 1 and gene 2.
[48] FIG. 6 is a table of sequence reads from the two genes. Clover adapters capture both gene 1 and gene 2 in 84.05% of reads, with only 8.22% of reads unmapped.
[49] FIG. 7 is a diagram of TCR capture and sequencing using adapters of the invention. A first TCRα and TCRβ from a TCR are each linked to a respective primer on a single linked adapter. The TCRα and TCRβ each provide a read for sequencing as separate “leafs” from the linked adapter.
[50] FIG. 8 is a table of sample reads aligned to genes using adapters of the invention. From 152,336 reads from the sample, 142,372 reads are aligned to a TCRα bound to leaf (arm) 1 of a linked adapter. 100,767 reads are aligned to the TCRβ, leaf (arm) 2 of the linked adapter.
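As a worked illustration of the read counts reported for FIG. 8, the following sketch computes the fraction of sample reads aligned to each leaf. The counts are those stated above; the function and variable names are hypothetical. The fractions come to roughly 93% for leaf 1 and 66% for leaf 2.

```python
TOTAL_READS = 152_336
ALIGNED_LEAF_1 = 142_372  # reads aligned to TCR alpha, leaf (arm) 1
ALIGNED_LEAF_2 = 100_767  # reads aligned to TCR beta, leaf (arm) 2


def aligned_fraction(aligned: int, total: int) -> float:
    """Return the fraction of reads that aligned, as a value between 0 and 1."""
    if total <= 0:
        raise ValueError("total must be positive")
    return aligned / total


leaf_1_fraction = aligned_fraction(ALIGNED_LEAF_1, TOTAL_READS)
leaf_2_fraction = aligned_fraction(ALIGNED_LEAF_2, TOTAL_READS)

print(f"leaf 1: {leaf_1_fraction:.1%}")
print(f"leaf 2: {leaf_2_fraction:.1%}")
```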
[51] FIG. 9 is a graph of the identification of T-cell receptor β (TRBV) variable regions, identifying TRBV2 and TRBV19.
[52] FIG. 10 is a graph of the identification of T-cell receptor α (TRAV) variable region gene usage, identifying TRAV13-2.
[53] FIG. 11 is a table identifying TCR variable (V), diversity (D), and joining (J) fragments using adapters of the invention.
[54] FIG. 12 is a heatmap of TCR α and β variable regions detected using adapters of the invention. The linked adapters enable paired sequencing of full length TCR α and β chains, with the heat map showing identification of TRBV2 with TRAV13-2 and TRBV19 with TRAV13-2 from the sample.
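A heatmap of this kind can be built from a table of paired gene calls. The following is a minimal sketch under the assumption that each cluster has already been assigned one TRAV and one TRBV gene name. The gene names are those identified above, but the per-cluster calls and the resulting counts are invented for illustration and do not reproduce the data of FIG. 12.

```python
from collections import Counter


def count_v_gene_pairs(paired_calls):
    """Count how often each (TRAV, TRBV) combination occurs."""
    return Counter(paired_calls)


def to_matrix(counts):
    """Arrange pair counts as rows (TRAV genes) by columns (TRBV genes)."""
    trav_genes = sorted({trav for trav, _ in counts})
    trbv_genes = sorted({trbv for _, trbv in counts})
    matrix = [
        [counts.get((trav, trbv), 0) for trbv in trbv_genes]
        for trav in trav_genes
    ]
    return trav_genes, trbv_genes, matrix


# Hypothetical per-cluster calls; the counts are made up for illustration only.
calls = [
    ("TRAV13-2", "TRBV2"),
    ("TRAV13-2", "TRBV19"),
    ("TRAV13-2", "TRBV2"),
]
counts = count_v_gene_pairs(calls)
trav_genes, trbv_genes, matrix = to_matrix(counts)
```

The resulting matrix can be passed to any plotting tool to render the heatmap; gene names are sorted as plain strings here, which is a simplification.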
[55] FIG. 13 is a diagram of multi-leaf (arm) adapter synthesis using click chemistry. A first leaf is bound to an azide and a second leaf bound to an alkyne. Under reaction conditions (for example a copper catalyzed reaction), a 5-member heteroatom ring is formed with leaf 1 and leaf 2.
[56] Advantageously, linked adapters of the invention may comprise 3-arms or 3-leafs, used interchangeably herein.
[57] FIG. 14 is a diagram showing TCR capture and sequencing using 3-leaf antibody tagged adapters of the invention. As shown, a first leaf in a linked adapter captures a TCRα, a second leaf captures the TCRβ, and a third leaf is represented by an antibody (Ab) tag.
[58] FIG. 15 is an image of a gel analysis of a 3-leaf clover library.
[59] FIG. 16 is a heatmap of TCR α and β variable regions detected using adapters of the invention.
Linked Adapters
[60] Linked adapters of the invention may be physically linked by any method including covalent linking and non-covalent linkage. For example, the primers of the linked adapters may be physically linked by ligation, click chemistry, an amplification reaction, or oligonucleotide chemical synthesis. Linked adapters that may be used in the present invention may be as described in U.S. Patent No. 10,982,278. Alternatively, polymer, glycan (dextran), or protein linkers may form a scaffold onto which primers are grafted to form the linked adapters.
[61] In aspects of the invention, the linked adapters may comprise two or more separate oligonucleotide arms that are linked to one another at the 5' end of each arm. Thus, the linked adapter has two or more free 3' ends, and each arm comprises at least a hybridization domain capable of binding to a complementary sequence on a target oligonucleotide. The linked adapters may be designed to bind to RNA, DNA, or a combination thereof. Amplification reactions utilizing the linked adapters then result in a single amplicon or molecule that incorporates the sequence of both target molecules. This single amplicon or molecule may then be used in further processing steps such as, but not limited to, sequencing.
[62] In aspects of the invention, covalent linkage of polynucleotides is achieved using 5'-5' linked oligonucleotides; these linked adapters are also referred to as “chain-seq” or “crab-seq” oligonucleotides. A 5'-5' linked oligonucleotide comprises two or more “arms,” each comprising an oligonucleotide sequence that is linked at the 5' end via a covalent or non-covalent biocompatible linkage. Thus, the chain-seq oligonucleotide comprises two or more free 3' ends. Each arm of the 5'-5' linked oligonucleotide may comprise the same oligonucleotide sequence, or each arm may comprise a different oligonucleotide sequence. Likewise, the oligonucleotide sequences of each arm may be of the same or of different lengths. In certain embodiments, an individual arm may be from about 8 to about 1000 nucleotides in length. For example, an individual arm may be 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 55, 60, 65, 70, 75, 80, 85, 90, 95, 100, 150, 200, 250, 300, 350, 400, 450, 500, 550, 600, 650, 700, 750, 800, 850, 900, 950, 1000 nucleotides, or the like. Linking together individual nucleic acid molecules as described herein may result in a nucleic acid construct of any size, containing any number of joined nucleic acid molecules appropriate in accordance with the invention.
[63] Each arm may be single stranded, double stranded, or a combination thereof. The oligonucleotide may comprise DNA, RNA, or a combination thereof. The arms may also comprise, in full or in part, nucleotide analogs such as peptide nucleic acids, morpholino and locked nucleic acids, glycol nucleic acid, and threose nucleic acids.
[64] In some embodiments, a portion of each arm of the 5'-5' linked oligonucleotide may comprise a binding domain comprising a nucleic acid sequence that is complementary to and hybridizes with a target sequence. A target sequence may be a naturally occurring nucleic acid sequence or may be artificially introduced into a target polynucleotide as appropriate depending on the application.
[65] In certain embodiments, a 5'-5' linked oligonucleotide may comprise at least two, at least three, at least four, at least five, at least six, at least seven, at least eight arms, or any number of arms as appropriate for the number of target oligonucleotides to be linked via the methods disclosed herein. In some embodiments, the arms of an oligonucleotide as described herein may be connected via a common linkage. In certain embodiments, each arm may recognize the same target sequence. In other embodiments, each arm, or a subset of arms, may recognize a different target sequence. For example, given a four-arm 5'-5' linked oligonucleotide, the arms may together recognize up to four different target sequences. Alternatively, two arms could hybridize to a first target sequence and the remaining two arms could hybridize to a second target sequence. Other similar variations are contemplated and are within the scope of this invention.
[66] The 5' ends of each arm may be linked to each other using means known in the art for linking nucleic acids to each other. In some embodiments, nucleic acids are linked together via a biocompatible reaction. In certain embodiments, a biocompatible reaction may comprise use of “click chemistry” (see, e.g., Rostovtsev et al., Angew Chem Int Ed 41:2596-2599, 2002; Himo et al., J Am Chem Soc 127:210-216, 2005; Boren et al., J Am Chem Soc 130:8923-8930, 2008). An example of a click chemistry reaction is the Huisgen 1,3-dipolar cycloaddition of alkynes to azides to form 1,4-disubstituted-1,2,3-triazoles. The copper(I)-catalyzed reaction is mild and very efficient, requiring no protecting groups and, in many cases, no purification. The azide (AZ) and alkyne (AK) functional groups are largely inert towards biological molecules and aqueous environments, which allows the use of the Huisgen 1,3-dipolar cycloaddition in target-guided synthesis and activity-based protein profiling. Thus, in some embodiments, a chain oligo is formed by linking the 5' end of one nucleic acid strand that includes an azide group to the 5' end of another nucleic acid strand that includes an alkyne group. Other exemplary biocompatible reactions include a copper(I)-catalyzed azide-alkyne cycloaddition (CuAAC) reaction, a copper-free strain-promoted azide-alkyne cycloaddition (SPAAC) reaction, and a thiol-ene reaction.
[67] In certain example embodiments, each arm of the 5'-5' linked oligonucleotide may be connected indirectly via a binding or scaffolding molecule. In other embodiments, indirect means for linking the arms of such an oligo may include use of binding or scaffolding molecules such as, but not limited to, polymers, such as polyethylene glycol (PEG) and other polyethers.
[68] In certain example embodiments, spacers may be employed, for example, to reduce steric hindrance between individual arms. The spacer may be an alkyne or an azide spacer. In some embodiments, a spacer may be joined to an oligo of the invention using direct or indirect means, including, but not limited to polymers, such as polyethylene glycol (PEG) and other polyethers. In certain embodiments, a spacer may be 8 to 1000 nucleotides in length. For example, a spacer may be 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 55, 60, 65, 70, 75, 80, 85, 90, 95, 100, 150, 200, 250, 300, 350, 400, 450, 500, 550, 600, 650, 700, 750, 800, 850, 900, 950, 1000 nucleotides, or the like.
[69] As used herein, a nucleic acid (a polymer of nucleotides) is “single-stranded” if nucleotides that form the nucleic acid are unpaired. That is, nucleotides of a single-stranded nucleic acid are not base-paired (via Watson-Crick base pairs, e.g., guanine-cytosine and adenine-thymine/uracil) to nucleotides of another nucleic acid. A single-stranded nucleic acid may be contrasted with a double-stranded (paired) nucleic acid, a typical example of which is a DNA double helix. Single-stranded nucleic acids may include a contiguous (uninterrupted) sequence of nucleotides or, in some embodiments, a single-stranded nucleic acid may be a conjugate that includes two nucleic acid strands joined together, for example, through a chemical (covalent) linkage.
[70] In nature, a single strand of a nucleic acid (e.g., DNA or RNA) has a 5' end (five-prime end) and a 3' end (three-prime end). The 5' end typically carries a phosphate group attached to the 5' carbon of the ribose ring of a nucleotide, and the 3' end typically carries the unmodified ribose -OH substituent. Nucleic acids are synthesized in vivo in the 5' to 3' direction. Polymerase relies on the energy released by breaking nucleoside triphosphate bonds to attach new nucleoside monophosphates to the 3'-hydroxyl (-OH) group via a phosphodiester bond.
[71] An engineered single-stranded nucleic acid of the present disclosure (a chain oligo) has two 3' ends. Each terminus of the single-stranded nucleic acid includes a 3'-hydroxyl (-OH) group. In some embodiments, a single-stranded chain oligo is formed by joining (linking) the 5' end of one single-stranded nucleic acid to the 5' end of another single-stranded nucleic acid. In some embodiments, the linkage between two 5' ends is a covalent linkage. In other embodiments, the linkage is non-covalent.
[72] Each arm of a chain-oligo may comprise a hybridization domain. A “domain” refers to a discrete, contiguous sequence of nucleotides or nucleotide base pairs, depending on whether the domain is unpaired (single-stranded nucleotides) or paired (double-stranded nucleotide base pairs), respectively. A hybridization domain facilitates binding of the chain-oligo to a complementary sequence on a target oligonucleotide, i.e., the target sequence. A domain is “complementary to” a target sequence if the domain contains nucleotides that base pair (hybridize/bind through Watson-Crick nucleotide base pairing) with nucleotides of the target sequence such that a paired (double-stranded) or partially-paired molecular species/structure is formed. Complementary domains need not be perfectly (100%) complementary to form a paired structure, although perfect complementarity is provided, in some embodiments. The length of a hybridization domain may vary. In some embodiments, a hybridization domain may have a length of 5-50 nucleotides. For example, a hybridization domain may have a length of 5-45, 5-40, 5-35, 5-30, 5-25, 5-20, 5-15, 5-10, 10-50, 10-45, 10-40, 10-35, 10-30, 10-25, 10-20, 10-15, 15-50, 15-45, 15-40, 15-35, 15-30, 15-25, 15-20, 20-50, 20-45, 20-40, 20-35, 20-30, 20-25, 25-40, 25-35, 25-30, 30-40, 30-35, or 35-40 nucleotides. In other embodiments, a hybridization domain may have a length of 5, 10, 15, 20, 25, 30, 35, 40, 45, or 50 nucleotides. In some embodiments, a hybridization domain may have a length of 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, or 50 nucleotides. A hybridization domain, in some embodiments, may be longer than 50 nucleotides, or shorter than 5 nucleotides.
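The notion of a domain being complementary, but not necessarily perfectly complementary, to a target sequence can be sketched as follows. This is a simplified illustration that assumes DNA sequences of equal length written 5' to 3' and considers only Watson-Crick pairs. The function names and the example sequences are hypothetical.

```python
_COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence written 5' to 3'."""
    return "".join(_COMPLEMENT[base] for base in reversed(seq.upper()))


def fraction_complementary(domain: str, target_site: str) -> float:
    """Fraction of positions at which `domain` base-pairs with `target_site`.

    Both sequences are written 5' to 3' and must have the same, non-zero
    length. A value of 1.0 corresponds to perfect (100%) complementarity.
    """
    if not domain or len(domain) != len(target_site):
        raise ValueError("sequences must be non-empty and of equal length")
    perfect_partner = reverse_complement(target_site)
    matches = sum(d == p for d, p in zip(domain.upper(), perfect_partner))
    return matches / len(domain)


# Placeholder sequences chosen for illustration only.
target_site = "AAGGCTTC"
perfect_domain = reverse_complement(target_site)  # pairs at every position
mismatched_domain = perfect_domain[:-1] + "A"     # one mismatch at the 3' end

perfect_score = fraction_complementary(perfect_domain, target_site)
mismatch_score = fraction_complementary(mismatched_domain, target_site)
```

Whether a partially complementary domain actually hybridizes depends on conditions such as temperature and salt concentration, which this sketch does not model.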
[73] In certain example embodiments, one or more chain-oligo arms may further comprise a primer domain. A primer domain is a domain to which a primer binds. A primer is a short nucleic acid strand that serves as a starting point for nucleic acid (e.g., DNA) synthesis. In some embodiments, chain oligos may comprise a pair of internal primer domains (e.g., near the linked 5' ends), which may be used for amplification of sequence-ready constructs produced using the methods of the present disclosure. The length of a primer domain may vary. In some embodiments, a primer domain may have a length of 5-50 nucleotides, for example, a length of 5-45, 5-40, 5-35, 5-30, 5-25, 5-20, 5-15, 5-10, 10-50, 10-45, 10-40, 10-35, 10-30, 10-25, 10-20, 10-15, 15-50, 15-45, 15-40, 15-35, 15-30, 15-25, 15-20, 20-50, 20-45, 20-40, 20-35, 20-30, 20-25, 25-40, 25-35, 25-30, 30-40, 30-35, or 35-40 nucleotides. In some embodiments, a primer domain has a length of 5, 10, 15, 20, 25, 30, 35, 40, 45, or 50 nucleotides. In some embodiments, a primer domain has a length of 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, or 50 nucleotides. A primer domain, in some embodiments, may be longer than 50 nucleotides, or shorter than 5 nucleotides.
[74] In certain example embodiments, one or more chain-oligo arms may further comprise a sequencing adapter. A sequencing adapter is a nucleotide sequence that facilitates binding of oligonucleotide sequences generated using the methods disclosed herein to complementary sequences used in certain next-generation sequencing technologies.
[75] In some embodiments, a method of the invention may involve the use of a binding tag. For example, a nucleic acid as described herein may be labeled with an affinity tag, for example an affinity pull-down functional group, on the first arm, or the second arm, or both. In some embodiments, an affinity tag may be used to isolate a biomolecule of interest, for example an amplified nucleic acid, such as an amplified segment of a template DNA molecule or fragment or portion thereof. Such an amplified nucleic acid may contain one or more adapter molecules as described herein, which may serve as a means for isolation of the nucleic acid.
[76] As described herein, a chain oligo may have more than two arms, and thus an affinity tag may be present on a chain oligo on only a single arm, on multiple arms, or on all arms of the chain oligo. As used herein, an affinity tag may be used to isolate a biomolecule of interest, such as a nucleic acid, polynucleotide, protein, or the like. Affinity tags attached to a chain oligo as described herein may be removed by chemical or enzymatic means. One of skill in the art would be able to identify appropriate methods and means for an affinity tag in accordance with the invention. Non-limiting examples of affinity tags include an enzymatic modification such as biotin or desthiobiotin; a fluorescent tag, such as green fluorescent protein (GFP); or a solubilization tag, such as thioredoxin, maltose binding protein, glutathione-S-transferase, or poly(NANP). In accordance with the invention, any binding tag appropriate for the specific application may be used to isolate or separate a biomolecule of interest as described herein.
[77] In addition to linking together DNA molecules, some embodiments of the invention involve the use of chain oligos as described herein to link together two RNA molecules. For example, a chain oligo having two arms may be bound to a first RNA molecule and a second RNA molecule such that the RNA molecules are linked together to form a single long RNA molecule, wherein the chain oligo is located between the first and second RNA molecules. Reverse transcription may then be performed to produce cDNA of both the first and the second RNA molecules, wherein both 3' ends of the chain oligo serve as primer molecules for first-strand cDNA synthesis according to methods known in the art. In some embodiments, the newly produced cDNA may be dissociated from the template RNA molecules and a second, distinct chain oligo may hybridize to the 3' ends of the cDNA, then the first chain oligo and second strand synthesis as known in the art may be performed in order to produce a double-stranded DNA copy of the starting RNA molecule. In some embodiments, second strand cDNA may be synthesized using additional conventional mRNA specific primers, or using a common template switching adapter and a primer priming a sequence in the template switching adapter.
[78] In further embodiments, two or more chain oligos may be used to link together two or more RNA molecules. For example, a first chain oligo may be hybridized to a first RNA molecule, wherein the first 3' end of the first chain oligo hybridizes to the first RNA molecule and the second 3' end of the first chain oligo hybridizes to a second RNA molecule; a second chain oligo may be hybridized to the second RNA molecule, wherein the first 3' end of the second chain oligo hybridizes to the second RNA molecule and the second 3' end of the second chain oligo hybridizes to a third RNA molecule; a third chain oligo may be hybridized to the third RNA molecule, wherein the first 3' end of the third chain oligo hybridizes to the third RNA molecule and the second 3' end of the third chain oligo hybridizes to the first RNA molecule, such that a circular nucleic acid molecule is formed by the hybridization of the first, second, and third RNA molecules and the first, second, and third chain oligos. First strand cDNA synthesis may be performed as known in the art and as described herein.
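The circular arrangement described above can be viewed as a small graph in which each RNA molecule is a node and each two-armed chain oligo is an edge joining the two RNA molecules to which its 3' ends hybridize. The following sketch checks whether a given set of chain oligos closes into a single circle. The representation and the names are hypothetical and serve only to illustrate the topology.

```python
def forms_single_circle(links):
    """Return True if the chain oligos join the RNA molecules into one circle.

    `links` is a list of (rna_a, rna_b) tuples, one per two-armed chain
    oligo, naming the two RNA molecules bound by its two 3' ends.
    """
    if not links:
        return False
    neighbors = {}
    for rna_a, rna_b in links:
        if rna_a == rna_b:
            return False  # both arms on the same molecule: not a linkage
        neighbors.setdefault(rna_a, []).append(rna_b)
        neighbors.setdefault(rna_b, []).append(rna_a)
    # In a circle, every RNA molecule is bound by exactly two chain oligo arms.
    if any(len(adjacent) != 2 for adjacent in neighbors.values()):
        return False
    # All molecules must be reachable from one another (a single circle).
    start = links[0][0]
    seen = {start}
    stack = [start]
    while stack:
        node = stack.pop()
        for nxt in neighbors[node]:
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return len(seen) == len(neighbors)


# Three chain oligos linking three RNA molecules head to tail, as described.
circle = [("RNA1", "RNA2"), ("RNA2", "RNA3"), ("RNA3", "RNA1")]
# Omitting the third chain oligo leaves a linear, open arrangement.
open_chain = [("RNA1", "RNA2"), ("RNA2", "RNA3")]

is_circle = forms_single_circle(circle)
is_open_chain_circle = forms_single_circle(open_chain)
```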
[79] The nucleic acid molecules linked by the 5'-5' linked oligonucleotides may be mRNA molecules encoding different transcripts. The nucleic acid molecules may encode immunoglobulin heavy and light chains, or T cell receptor α and T cell receptor β. The nucleic acid molecules linked by the 5'-5' linked oligonucleotides may be DNA molecules, for example, genomic DNAs harboring different mutations or polymorphisms. The nucleic acid molecules may be isolated from different cells, for example, immune cells including T cells, B cells, dendritic cells, macrophages, neutrophils, mast cells, eosinophils, basophils, and natural killer cells. The nucleic acid molecules may encode any cellular receptors or lectins.
[80] In further embodiments, the 5'-5' linked oligonucleotides may be used in linking two ends of a nucleic acid molecule and downstream amplification and sequencing procedures. The nucleic acid molecule may be a DNA or RNA. The nucleic acid molecule may be at least 1 kb, at least 2 kb, at least 3 kb, at least 5 kb, at least 6 kb, at least 7 kb, at least 8 kb, at least 9 kb, or at least 10 kb in length. The use of the 5'-5' linked oligonucleotides in sequencing may be applied to archiving an immune repertoire by capturing all sequences in a Crab-Seq library derived from a human sample, such as a blood or urine sample. In further embodiments, the 5'-5' linked oligonucleotides may be used to characterize cell type specific mRNAs and to identify or profile any cell types.
Flow Cell
[81] Aspects of the invention provide a modified substrate comprising a plurality of discrete locations, each location comprising two nucleic acid molecules, each of the two nucleic acid molecules coupled at one end to the substrate via a primer that is physically bound to the substrate and coupled to each other at another end by a physical linking. Substrates refer to physical structures/surfaces onto which molecules of interest may be immobilized in spatially distinct regions, thereby allowing molecules to be screened in parallel. Substrates may comprise glass, metal, metal oxide, and carbon, which materials may be selected based on wettability, charge, and reactivity of the material.
[82] In aspects of the invention, the modified substrate may be a part of a flow cell.
[83] A flow cell is a channel for fixing mobile nucleic acid molecules on a flow cell surface, referred to as a substrate. As nucleic acid molecules flow through the channel and over the substrate, nucleic acid molecules are bound, e.g. by hybridization, to fixed points along the substrate. The bound nucleic acid molecules may then be amplified, e.g. by bridge amplification, to form a clonal cluster.
[84] In conventional flow cell methods, nucleic acid molecules in the sequencing library will randomly attach to the lanes on the surface of the flow cell when they pass through it. For example, each flow cell may have 8 lanes, each lane having a number of binding primers attached to the surface, which can match binding primers added at the ends of the nucleic acid molecules during sample processing. Binding primers may also be found on nanowells on the substrate that space out clusters and prevent overcrowding of clusters.
[85] Amplification is then performed using the adapters on the flow cell surface as a template. After repeated amplification and denaturation cycles, each nucleic acid molecule will eventually be clustered in bundles at their respective locations, each containing many copies of a single nucleic acid template.
[86] Advantageously, in methods of the invention, two nucleic acid molecules linked by a linked adapter are seeded at a single location along the flow cell. For example, after associating each nucleic acid molecule with a linked adapter, the same binding primer may be added to each nucleic acid molecule, thereby resulting in the two nucleic acid molecules hybridizing to the same point on the flow cell. As a result, rather than attaching individual nucleic acid molecules to lanes on the surface of the flow cell, linked nucleic acid molecules bind to the same location on the flow cell, allowing for complementary strands to be synthesized at the same location in the flow cell. Even after, for example, washing away the linked molecule, the synthesized complementary strands remain close to one another on the flow cell and form part of the same cluster following amplification.
[87] Examples of flow cells that may be used together with the present invention include the Genome Analyzer (commercially available from Illumina, Inc.), a parallel, fluorescence-based readout of immobilized sequences that are iteratively sequenced. Sequencing according to this technology is described in U.S. Pat. Nos. 7,960,120; 7,835,871; 7,232,656; 7,598,035; 6,911,345; 6,833,246; 6,828,100; 6,306,597; 6,210,891; U.S. Pub. 2011/0009278; U.S. Pub. 2007/0114362; U.S. Pub. 2006/0292611; and U.S. Pub. 2006/0024681, each of which is incorporated by reference in its entirety.
[88] Methods of the invention may also be used together with fluorescence-based approaches, for example fluorescence in situ hybridization (FISH). For example, the methods may include microcontact printing to stamp molecules to a substrate for polymerase catalyzed extension and amplification. Polymerase DNA colonies (polonies) are then formed on the surface of the substrate. For stamping, an elastomeric crosslinked polyacrylamide may be used to efficiently stamp molecules. Each polony facilitates the registration of fluorescence signals. Polony based assays are described, for example, in Fu et al. (2022) Cell 185:4621-4633, the entirety of the contents of which are incorporated herein.
[89] In aspects of the invention, DNA templates can be amplified to polonies on a gel surface. A tissue cryosection may be placed onto a polony gel for mRNA capture and spatial indexing prior to library amplification. Intensity profiles for polonies stained by fluorescence can be generated for spatial index mapping. DNA copies per polony can be quantified by real-time image quantification.
T-cell and B-cell Receptor Profiling
[90] In humans and closely related species, cellular immunity is mediated by T cells (or T lymphocytes), which participate directly in the detection and neutralization of pathogenic threats. Essential to T-cell function are highly specialized extracellular receptors (T-cell receptors or TCRs) that selectively bind specific antigens displayed by major histocompatibility complex (MHC) molecules on the surface of antigen-presenting cells (APCs). Antigen recognition by TCRs activates T cells, causing them to proliferate rapidly and mount immune responses through the release of cytokines.
[91] Given the relative specificity of TCR-antigen interactions, a tremendous diversity of TCRs is required to recognize the wide assortment of pathogenic agents one might encounter. To this end, the adaptive immune system has evolved a system for somatic diversification of TCRs that is unrivaled in all of biology. The vast majority of TCRs are heterodimers composed of two distinct subunit chains (α- and β-), which both contain variable domains and, in humans, are encoded by single-copy genes. The term “clonotype” is typically used to refer either to a particular TCR variant (TCR-α or TCR-β subunit), or to a particular pairing of TCR subunit variants (TCR-α+TCR-β) shared among a clonal population of T cells.
[92] The state of the art for T-cell clonotyping with pairing of TCR subunit variants (TCR-α+TCR-β) is the cell-based emulsion overlap-extension RT-PCR technique (Turchaninova et al., Eur J Immunol 43:2507-15, 2013). This method, however, has critical shortcomings. It relies on blocker oligos to inhibit amplification of unfused molecules, and so it can only use a DNA polymerase without 3' to 5' exonuclease activity, such as Taq polymerase, rather than a high fidelity polymerase, which would degrade the blocker oligos. Therefore, the final sequencing results will contain many artificial sequencing errors introduced by the low fidelity polymerase. Additionally, some of the unfused molecules from different cells may be fused and amplified with each other in the nested PCR step, after the emulsion is broken and the molecules are pooled in bulk.
[93] Using linked adapters as described herein, these shortcomings can be overcome and throughput and efficiency can be significantly increased. For example, for any specific clonotyped T-cell, specifically designed linked adapters can be used to amplify and clone the coding sequences of TCR-α and TCR-β. Once sequenced, the coding sequences of TCR-α and TCR-β can in turn be cloned into B-cells. Linked adapters may be designed that are specific to, for example, the constant regions in α-chain mRNAs, such as the 5' untranslated region (UTR) and “constant” (C) segment coding region, or to the constant regions in β-chain mRNAs. Following chain-Seq PCR as described in the Examples, the coding regions of the α- and β-chains that encode “variable” (V) and “joining” (J) segments may be linked using the methods described herein.
[94] Furthermore, two linked adapters can be used to isolate and/or amplify even longer sequences for sequencing. After a single linked adapter grabs a pair of particular DNA sequences, only around 300 bp of nucleic acid sequence can be sequenced. However, using two compatible linked adapters, two separate molecules that may be sequenced can be produced, allowing sequencing of twice the length of nucleic acid, i.e., 300+300 bp.
[95] In some embodiments, the invention provides a diagnostic method to capture heavy and light chain transcripts of B-cells or TCR-α and TCR-β sequences wherein no adapter is required. In such a case, one or more linked adapters may directly link the transcript pair via PCR. Such a method may only require isolation of a cell either in a container or spatially on a surface. In this case each linked adapter contains a primer pair corresponding to a conserved framework in, for example, the heavy (H) or light (L) chains of an antibody sequence, which can then extend and capture the full-length chain information. To ensure that each linked adapter contains H and L pairing, a cell or container barcode may be added to identify single cells or samples. In some embodiments, this could be performed in emulsion to ensure single-cell copies or in situ, such as within agarose, in a method similar to polymerase colonies (i.e., “polonies”). Such methods would enable very cost-effective preparation of single-cell resolution immune cell profiles, or the pairing of any transcript set of interest. In some embodiments, the B cell or T cell of interest can be isolated from a subject with a recent infection or a recent vaccine administration.
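The role of a cell or container barcode in confirming H and L pairing can be illustrated with a minimal Python sketch. It assumes that reads have already been reduced to hypothetical `(barcode, chain, sequence)` tuples; the function name `pair_chains_by_barcode` and the input layout are illustrative assumptions, not part of the disclosed method.

```python
from collections import defaultdict

def pair_chains_by_barcode(reads):
    """Group chain calls by barcode and keep barcodes with exactly one H and one L.

    `reads` is an iterable of (barcode, chain, sequence) tuples where chain is
    "H" or "L". This tuple layout is a hypothetical input format.
    """
    by_barcode = defaultdict(lambda: defaultdict(set))
    for barcode, chain, seq in reads:
        by_barcode[barcode][chain].add(seq)

    paired = {}
    for barcode, chains in by_barcode.items():
        # Ambiguous barcodes (missing chain or several distinct sequences)
        # are skipped rather than guessed at.
        if len(chains["H"]) == 1 and len(chains["L"]) == 1:
            paired[barcode] = (next(iter(chains["H"])), next(iter(chains["L"])))
    return paired
```

Barcodes that carry only one chain, or more than one distinct sequence for a chain, are dropped in this sketch; a real analysis might instead flag them for review.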
[96] In other embodiments, linked adapters as described herein may be used to produce an antibody of any combination of components, such as H and L chains. Any number of coding regions for antibody components may be joined together in any configuration using any number of linked adapters to link the nucleic acid sequences together. In a particular embodiment, once such nucleic acids are joined together using linked adapters, the linked adapters themselves may then be removed or excised from the joined complex using specific transposases, including, but not limited to, a piggyBac transposase, in order to remove the chain oligo such that there is no “scar” left in the joined nucleic acid complex. The resulting nucleic acid complex may then be introduced into a B-cell in order to produce a specific desired antibody. The present invention, therefore, enables the production of engineered antibodies having any desired sequence.
Sample Preparation
[97] Aspects of the invention allow for the improved analysis of nucleic acid molecules. Nucleic acid molecules may be acquired from a sample or a subject. Nucleic acid molecules include DNA and RNA. Methods of the invention are applicable to DNA from whole cells, portions of genetic or proteomic material obtained from one or more cells, artificial nucleic acid molecules, or viral DNA or RNA. Nucleic acid molecules may be extracted from a sample by any method. The nucleic acid molecule may include DNA, RNA, cDNA, PNA, LNA and others that are contained within a sample.
[98] Biological samples for use in the present invention include viral particles or preparations. Nucleic acid molecules can be obtained directly from an organism or from a biological sample obtained from an organism, e.g., from blood, urine, cerebrospinal fluid, seminal fluid, saliva, sputum, stool and tissue. Any tissue or body fluid specimen may be used as a source for nucleic acid for use in the invention. Nucleic acid molecules can also be isolated from cultured cells, such as a primary cell culture or a cell line. The cells or tissues from which template nucleic acids are obtained can be infected with a virus or other intracellular pathogen. In addition, nucleic acids can be obtained from non-cellular or non-tissue samples, such as viral samples, or environmental samples.
[99] In certain embodiments, the nucleic acid molecules are bound to other target molecules such as proteins, enzymes, substrates, antibodies, binding agents, beads, small molecules, peptides, or any other molecule and serve as a surrogate for quantifying and/or detecting the target molecule. Nucleic acid molecules may be single-stranded, double-stranded, or double-stranded with single-stranded regions (for example, stem- and loop-structures). Proteins or portions of proteins (amino acid polymers) that can bind to high affinity binding moieties, such as antibodies or aptamers, are target molecules for oligonucleotide labeling, for example, in droplets.
[100] In aspects of the invention, molecules may be compartmentalized. A compartment may be an aqueous droplet in a water-in-oil emulsion. Said droplets may be formed using microfluidic devices according to known techniques in the art. Other methods for generating droplets as described herein may be used as appropriate, including, but not limited to, high speed vortex, ultrasonic waves, extrusion, filtering, microsieve chips, or the like. Individual oligonucleotides may be loaded into separate droplets according to known methods in the art.
[101] Nucleic acid templates can be obtained directly from an organism or from a biological sample obtained from an organism, e.g., from blood, urine, cerebrospinal fluid, seminal fluid, saliva, sputum, stool and tissue. In a particular embodiment, a nucleic acid is obtained from fresh frozen plasma (FFP). In a particular embodiment, a nucleic acid is obtained from formalin-fixed, paraffin-embedded (FFPE) tissues. Any tissue or body fluid specimen may be used as a source for a nucleic acid for use in the invention. Nucleic acid templates can also be isolated from cultured cells, such as a primary cell culture or a cell line. The cells or tissues from which template nucleic acids are obtained can be infected with a virus or other intracellular pathogen. A sample can also be total RNA extracted from a biological specimen, a cDNA library, viral, or genomic DNA.
[102] A biological sample may be homogenized or fractionated in the presence of a detergent or surfactant. Cells within a sample may also be lysed by a lysing agent to release nucleic acid molecules and cell components. Lysis or homogenization solutions may further contain other agents, such as reducing agents or lytic enzymes. Examples of such reducing agents include dithiothreitol (DTT), β-mercaptoethanol, DTE, GSH, cysteine, cysteamine, tris(2-carboxyethyl)phosphine (TCEP), or salts of sulfurous acid. Once obtained, the nucleic acid is denatured by any method known in the art to produce single-stranded nucleic acid templates, and a pair of first and second oligonucleotides is hybridized to the single-stranded nucleic acid template such that the first and second oligonucleotides flank a target region on the template.
Amplification and sequencing
[103] Aspects of the invention further comprise amplifying nucleic acid molecules. Amplification refers to production of additional copies of a nucleic acid sequence and is generally carried out using polymerase chain reaction (PCR) or isothermal amplification. Amplification aids in, and may also be used for, detecting nucleic acid molecules. Examples of amplification reactions include, but are not limited to, quantitative PCR (qPCR), quantitative fluorescent PCR (QF-PCR), multiplex fluorescent PCR (MF-PCR), digital droplet PCR (ddPCR), single cell PCR, PCR-RFLP/real time-PCR-RFLP, hot start PCR, nested PCR, in situ polony PCR, in situ rolling circle amplification (RCA), bridge PCR, picotiter PCR, emulsion PCR and reverse transcriptase PCR (RT-PCR). Other suitable amplification methods include the ligase chain reaction (LCR), transcription amplification, self-sustained sequence replication, selective amplification of target polynucleotide sequences, consensus sequence primed polymerase chain reaction (CP-PCR), arbitrarily primed polymerase chain reaction (AP-PCR), degenerate oligonucleotide-primed PCR (DOP-PCR) and nucleic acid based sequence amplification (NABSA).
[104] Aspects of the invention further provide for sequencing of gene variants in the nucleic acid library. Sequencing nucleic acid molecules may be performed by methods known in the art. For example, see, generally, Quail, et al., 2012, A tale of three next generation sequencing platforms: comparison of Ion Torrent, Pacific Biosciences and Illumina MiSeq sequencers, BMC Genomics 13:341. Nucleic acid molecule sequencing techniques include classic dideoxy sequencing reactions (Sanger method) using labeled terminators or primers and gel separation in slab or capillary, or next generation sequencing methods. For example, sequencing may be performed according to technologies described in U.S. Pub. 2011/0009278, U.S. Pub. 2007/0114362, U.S. Pub. 2006/0024681, U.S. Pub. 2006/0292611, U.S. Pat. 7,960,120, U.S. Pat. 7,835,871, U.S. Pat. 7,232,656, U.S. Pat. 7,598,035, U.S. Pat. 6,306,597, U.S. Pat. 6,210,891, U.S. Pat. 6,828,100, U.S. Pat. 6,833,246, and U.S. Pat. 6,911,345, each incorporated by reference.
EXPERIMENTAL EXAMPLES
Paired gene capture
[105] Paired target capture using adapters of the invention, referred to as CLOVER adapters, was conducted. Gel electrophoresis was conducted to analyze capture of two genes, gene 1 and gene 2. FIG. 3 is an image of gel analysis of the two genes. As shown, gene 1 and gene 2 are each captured in the multiplexed library using linked adapters of the invention. FIG. 4 is an image of gel analysis of two genes using forward and reverse primers. As shown, the forward primer (read 1) captured gene 1 located on leaf 1 (arm 1) of the linked adapter and the reverse primer (read 2) captured gene 2 located on leaf 2 (arm 2) of the linked adapter.
[106] FIG. 5 is a plot of sequence reads from the two genes. FIG. 6 is a table of sequence reads from the two genes. As shown, paired reads were captured by CLOVER adapters throughout the plot, capturing both gene 1 and gene 2 in 84.05% of reads, with only 8.22% of reads unmapped.
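As a minimal sketch of how a paired-capture rate of this kind could be tallied, the following Python assumes each read pair has already been aligned and reduced to a pair of labels: the target hit by read 1 and the target hit by read 2, or `None` when a read did not map. The function name `summarize_pairs`, the label strings, and the rule that a pair counts as unmapped only when both reads fail to map are assumptions for illustration; this is not the analysis used to produce FIG. 5 or FIG. 6.

```python
from collections import Counter

def summarize_pairs(pairs, target1="gene1", target2="gene2"):
    """Classify read pairs by what read 1 and read 2 aligned to.

    `pairs` is an iterable of (read1_hit, read2_hit) tuples; a hit is a
    target name or None when the read did not map. Labels are hypothetical.
    Returns percentages for "paired", "unmapped" and "other".
    """
    counts = Counter()
    for r1, r2 in pairs:
        if r1 is None and r2 is None:
            counts["unmapped"] += 1   # assumption: both reads failed to map
        elif r1 == target1 and r2 == target2:
            counts["paired"] += 1     # each arm captured its expected target
        else:
            counts["other"] += 1      # partial or unexpected combination
    total = sum(counts.values())
    if total == 0:
        return {}
    return {k: 100.0 * counts[k] / total for k in ("paired", "unmapped", "other")}
```

Under these assumptions, the "paired" percentage plays the role of the fraction of reads capturing both gene 1 and gene 2, and "unmapped" the fraction of reads with no alignment.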
Full length TCR capture
[107] TCR capture using adapters of the invention was conducted on TCRs identified in FIG. 11. Each of the TCRα and TCRβ was analyzed as being able to provide a read from a separate “leaf” of the CLOVER adapter. FIG. 8 is a table of sample reads from TCRs using CLOVER adapters. Of 152,336 reads from the sample, 142,372 reads were aligned to a TCRα bound to leaf (arm) 1 of a linked adapter, and 100,767 reads were aligned to the TCRβ on leaf (arm) 2 of the linked adapter.
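The read counts above can be converted to alignment percentages with simple arithmetic. The short Python sketch below uses only the three counts stated in the text; the helper name `percent_aligned` is an illustrative choice.

```python
def percent_aligned(aligned, total):
    """Return the percentage of reads aligned, rounded to two decimals."""
    if total <= 0:
        raise ValueError("total must be positive")
    return round(100.0 * aligned / total, 2)

total_reads = 152_336   # reads from the sample (from the text)
tcr_alpha = 142_372     # reads aligned to TCR alpha on leaf (arm) 1
tcr_beta = 100_767      # reads aligned to TCR beta on leaf (arm) 2

alpha_pct = percent_aligned(tcr_alpha, total_reads)
beta_pct = percent_aligned(tcr_beta, total_reads)
```

With these inputs, roughly 93.5% of reads align to the TCRα arm and roughly 66.1% to the TCRβ arm.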
[108] FIG. 9 is a graph of the identification of T-cell receptor β (TRBV) variable regions from a TCR. CLOVER adapters identified TRBV2 and TRBV19 as predominant in the sample. FIG. 10 is a graph of the identification of T-cell receptor α (TRAV) variable regions gene usage. CLOVER adapters identified TRAV13-2 as predominant in the sample. FIG. 12 is a heatmap of TCR α and β variable regions detected using adapters of the invention. The linked adapters enabled paired sequencing of full length TCR α and β chains.
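A gene-usage summary such as the one plotted in FIG. 9 and FIG. 10 amounts to counting V-gene calls per read. The Python sketch below assumes an upstream annotation step (not shown, and not specified in this disclosure) has already assigned a V-gene name to each read; the function name `gene_usage` and the input format are hypothetical.

```python
from collections import Counter

def gene_usage(v_calls):
    """Return (gene, count, fraction) tuples sorted by descending count.

    `v_calls` is a list of V-gene names, one per read, or None where no
    V gene could be assigned. The input format is a hypothetical assumption.
    """
    counts = Counter(c for c in v_calls if c)   # skip reads with no V call
    total = sum(counts.values())
    return [(gene, n, n / total) for gene, n in counts.most_common()]
```

The first entries of the returned list correspond to the predominant V genes in the sample.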
[109] The linked adapters of the invention were further utilized to analyze cell line TCRs identified in the table below:
[110] Jurkat and Hut78 cells were prepared to have the TCR gene usage identified.
FIG. 16 is a heatmap of TCR α and β variable regions detected using adapters of the invention. The linked adapters of the invention were utilized to analyze cell line TCRs. The linked adapters of the invention enabled paired sequencing of full length TCR α and β chains, with the heat map notably showing identification of TRBV12-3 and TRBV12-4 with TRAV8-4 and TRBV13 with TRAV8-6. Overall, chain pairing was identified as shown below:
Chain pairing:
Jurkat-a — Jurkat-b : 88.1%
Hut78-a — Hut78-b: 8.9%
Mispairing (error)
Jurkat-a — Hut78-b: 1.1%
Hut78-a — Jurkat-b : 1.4%
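The pairing figures above can be summed into overall correct-pairing and mispairing rates. The Python sketch below is an illustration only: the function names and the `(alpha source, beta source)` key layout are assumptions, and the percentages are copied from the list above.

```python
from collections import Counter

def pairing_table(pairs):
    """Percentage of read pairs for each (alpha_source, beta_source) combination."""
    counts = Counter(pairs)
    total = sum(counts.values())
    return {combo: 100.0 * n / total for combo, n in counts.items()}

def correct_and_mispaired(table):
    """Split a pairing table into same-source and cross-source percentages."""
    correct = sum(p for (a, b), p in table.items() if a == b)
    mispaired = sum(p for (a, b), p in table.items() if a != b)
    return correct, mispaired

# Percentages reported in the text, keyed by (alpha chain source, beta chain source).
reported = {
    ("Jurkat", "Jurkat"): 88.1,
    ("Hut78", "Hut78"): 8.9,
    ("Jurkat", "Hut78"): 1.1,
    ("Hut78", "Jurkat"): 1.4,
}
correct, mispaired = correct_and_mispaired(reported)
```

With the reported values, correct pairs total about 97.0% and mispairs about 2.5%. The four figures sum to 99.5%; the text does not break down the remainder.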
Antibody tagged TCRαβ capture
[111] A multi-leaf (arm) adapter was synthesized for TCR capture and sequencing using 3-leaf antibody tagged adapters. The first leaf in the linked adapter captures a TCRα, the second leaf captures a TCRβ, and a third leaf provides an antibody (Ab) tag.
[112] FIG. 15 is an image of a gel analysis of a 3-leaf clover library. As shown, the three-leaf CLOVER adapters allow for TCRαβ+AbTag capture in bulk.
Non-limiting example combinations
[113] Features described above as well as those claimed below may be combined in various ways without departing from the scope thereof. The following examples illustrate some possible, non-limiting combinations:
[114] (A1) A method for preparing a molecular library, the method comprising preparing a first mixture by segregating together at least two nucleic acid molecules comprising a first nucleic acid molecule and a second nucleic acid molecule from a single source. The method comprises adding to the first mixture a plurality of linked adapters, each linked adapter of the plurality of linked adapters comprising a first primer and a second primer that are physically linked. The method comprises associating the first nucleic acid molecule from the single source to the first primer of a first linked adapter of the plurality of linked adapters and associating the second nucleic acid molecule from the single source to the second primer of the first linked adapter. The method comprises synthesizing a complementary strand to the first nucleic acid molecule and synthesizing a complementary strand to the second nucleic acid molecule without breaking the physical link of the linked adapters.
[115] (A2) For the method denoted as (A1), the step of preparing the first mixture comprises compartmentalizing the first mixture into a plurality of compartments.
[116] (A3) For the method denoted as any of (A1) through (A2), compartmentalizing the first mixture comprises compartmentalizing a single cell or a single bead from the first mixture.
[117] (A4) For the method denoted as any of (A1) through (A3), the first primer and the second primer of the plurality of linked adapters are physically linked by a covalent linkage.
[118] (A5) For the method denoted as any of (A1) through (A3), the first primer and the second primer of the plurality of linked adapters are physically linked by a non-covalent linkage.
[119] (A6) For the method denoted as any of (A1) through (A3), the first primer and the second primer of the linked adapters are physically linked by ligation, click chemistry, an amplification reaction, or an oligonucleotide chemical synthesis.
[120] (A7) For the method denoted as any of (A1) through (A6), the method further comprises breaking and pooling compartments formed in the preparing step into a single container.
[121] (A8) For the method denoted as any of (A1) through (A7), the method further comprises amplifying the first nucleic acid molecule and the second nucleic acid molecule after the associating step.
[122] (A9) For the method denoted as any of (A1) through (A8), the at least two nucleic acid molecules encode a cell receptor protein.
[123] (A10) For the method denoted as any of (A1) through (A9), the nucleic acid molecules encode a B-cell receptor or a T-cell receptor.
[124] (A11) For the method denoted as any of (A1) through (A10), the associating step comprises annealing the first nucleic acid molecule and the second nucleic acid molecule to the first primer and the second primer, respectively.
[125] (A12) For the method denoted as any of (A1) through (A11), the at least two nucleic acid molecules are each tagged with a same cell specific barcode.
[126] (A13) For the method denoted as any of (A1) through (A12), the method further comprises the step of generating sequence reads for the first nucleic acid molecule and the second nucleic acid molecule.
[127] (A14) For the method denoted as any of (A1) through (A13), the step of synthesizing complementary strands comprises seeding the first nucleic acid molecule and the second nucleic acid molecule on the same cluster on a flow cell without breaking the physical link of each of the linked adapters.
[128] (A15) For the method denoted as (A14), seeding the first nucleic acid molecule and second nucleic acid molecule on the same cluster on the flow cell comprises coupling each of the first nucleic acid molecule and the second nucleic acid molecule to a substrate of the flow cell via primers that are physically bound to the substrate and coupled to the end of each of the first nucleic acid molecule and the second nucleic acid molecules by an end that is not linked by the physical link of the linked adapter.
[129] (B1) A modified substrate comprising a plurality of discrete locations, each location comprising two nucleic acid molecules, each of the two nucleic acid molecules coupled at one end to the modified substrate via a primer that is physically bound to the modified substrate and coupled to each other at another end by a physical linking. The modified substrate is part of a flow cell, and the flow cell is configured to allow for the synthesis of complementary strands to each nucleic acid molecule without breaking the physical linking between the nucleic acid molecules.
[130] (B2) For the modified substrate denoted as (B1), the two nucleic acid molecules are physically linked by covalent cross-linking.
[131] (B3) For the modified substrate denoted as (B1), the nucleic acid molecules are physically linked by a non-covalent linkage.
[132] (B4) For the modified substrate denoted as (B1), the two nucleic acid molecules are physically linked by ligation, click chemistry, an amplification reaction, or an oligonucleotide chemical synthesis.
[133] (B5) For the modified substrate denoted as any of (B1) through (B4), the two nucleic acid molecules are each tagged with a same cell specific barcode.
INCORPORATION BY REFERENCE
[134] References and citations to other documents, such as patents, patent applications, patent publications, journals, books, papers, web contents, have been made throughout this disclosure. All such documents are hereby incorporated herein by reference in their entirety for all purposes.
EQUIVALENTS
[135] Various modifications of the invention and many further embodiments thereof, in addition to those shown and described herein, will become apparent to those skilled in the art from the full contents of this document, including references to the scientific and patent literature cited herein. The subject matter herein contains important information, exemplification and guidance that can be adapted to the practice of this invention in its various embodiments and equivalents thereof.
Claims
1. A method for preparing a molecular library, the method comprising: preparing a first mixture by segregating together at least two nucleic acid molecules comprising a first nucleic acid molecule and a second nucleic acid molecule from a single source; adding to the first mixture a plurality of linked adapters, each linked adapter of the plurality of linked adapters comprising: a first primer and a second primer that are physically linked; associating the first nucleic acid molecule from the single source to the first primer of a first linked adapter of the plurality of linked adapters and associating the second nucleic acid molecule from the single source to the second primer of the first linked adapter; synthesizing a complementary strand to the first nucleic acid molecule and synthesizing a complementary strand to the second nucleic acid molecule without breaking the physical link of the linked adapters.
2. The method of claim 1, wherein the step of preparing the first mixture comprises compartmentalizing the first mixture into a plurality of compartments.
3. The method of claim 2, wherein the step of compartmentalizing the first mixture comprises compartmentalizing a single cell or a single bead from the first mixture.
4. The method of claim 1, wherein the first primer and the second primer of the plurality of linked adapters are physically linked by a covalent linkage.
5. The method of claim 1, wherein the first primer and the second primer of the plurality of linked adapters are physically linked by a non-covalent linkage.
6. The method of claim 1, wherein the first primer and the second primer of the plurality of linked adapters are physically linked by ligation, click chemistry, an amplification reaction, or an oligonucleotide chemical synthesis.
7. The method of claim 2, further comprising the step of breaking and pooling compartments formed in the preparing step into a single container.
8. The method of claim 7, further comprising the step of amplifying the first nucleic acid molecule and the second nucleic acid molecule after the associating step.
9. The method of claim 1, wherein the at least two nucleic acid molecules encode a cell receptor protein.
10. The method of claim 9, wherein the nucleic acid molecules encode a B-cell receptor or a T-cell receptor.
11. The method of claim 1, wherein the associating step comprises annealing the first nucleic acid molecule and the second nucleic acid molecule to the first primer and the second primer, respectively.
12. The method of claim 1, wherein the at least two nucleic acid molecules are each tagged with a same cell specific barcode.
13. The method of claim 1, further comprising the step of generating sequence reads for the first nucleic acid molecule and the second nucleic acid molecule.
14. The method of claim 1, wherein the step of synthesizing complementary strands comprises seeding the first nucleic acid molecule and the second nucleic acid molecule on the same cluster on a flow cell without breaking the physical link of each of the linked adapters.
15. The method of claim 14, wherein seeding the first nucleic acid molecule and second nucleic acid molecule on the same cluster on the flow cell comprises: coupling each of the first nucleic acid molecule and the second nucleic acid molecule to a substrate of the flow cell via primers, wherein the primers are physically bound to the substrate, and wherein the primers are coupled to the end of each of the first nucleic acid molecule and the second nucleic acid molecules by an end of the first nucleic acid molecule and the second nucleic acid molecule that is not linked by the physical link of the linked adapter.
16. A modified substrate comprising a plurality of discrete locations, each location comprising two nucleic acid molecules, each of the two nucleic acid molecules coupled at one end to the modified substrate via a primer, wherein the primer is physically bound to the modified substrate, and wherein the two nucleic acid molecules are coupled to each other at another end by a physical linking, wherein the modified substrate is part of a flow cell, and wherein the flow cell is configured to allow for the synthesis of complementary strands to each nucleic acid molecule without breaking the physical linking between the nucleic acid molecules.
17. The modified substrate of claim 16, wherein the two nucleic acid molecules are physically linked by covalent cross-linking.
18. The modified substrate of claim 16, wherein the two nucleic acid molecules are physically linked by a non-covalent linkage.
19. The modified substrate of claim 16, wherein the two nucleic acid molecules are physically linked by ligation, click chemistry, an amplification reaction, or an oligonucleotide chemical synthesis.
20. The modified substrate of claim 16, wherein the two nucleic acid molecules are each tagged with a same cell specific barcode.
Applications Claiming Priority (2)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| US202363526746P | 2023-07-14 | 2023-07-14 | |
| US63/526,746 | 2023-07-14 | | |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| WO2025019319A1 (en) | 2025-01-23 |
Family
ID=92258701
Family Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| PCT/US2024/037811 Pending WO2025019319A1 (en) | 2023-07-14 | 2024-07-12 | Preparation of molecular libraries by linked adapters |
Country Status (1)
| Country | Link |
|---|---|
| WO (1) | WO2025019319A1 (en) |
Citations (16)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US6210891B1 (en) | 1996-09-27 | 2001-04-03 | Pyrosequencing Ab | Method of sequencing DNA |
| US6306597B1 (en) | 1995-04-17 | 2001-10-23 | Lynx Therapeutics, Inc. | DNA sequencing by parallel oligonucleotide extensions |
| US6828100B1 (en) | 1999-01-22 | 2004-12-07 | Biotage Ab | Method of DNA sequencing |
| US6833246B2 (en) | 1999-09-29 | 2004-12-21 | Solexa, Ltd. | Polynucleotide sequencing |
| US6911345B2 (en) | 1999-06-28 | 2005-06-28 | California Institute Of Technology | Methods and apparatus for analyzing polynucleotide sequences |
| US20060024681A1 (en) | 2003-10-31 | 2006-02-02 | Agencourt Bioscience Corporation | Methods for producing a paired tag from a nucleic acid sequence and methods of use thereof |
| US20060292611A1 (en) | 2005-06-06 | 2006-12-28 | Jan Berka | Paired end sequencing |
| US20070114362A1 (en) | 2005-11-23 | 2007-05-24 | Illumina, Inc. | Confocal imaging methods and apparatus |
| US7232656B2 (en) | 1998-07-30 | 2007-06-19 | Solexa Ltd. | Arrayed biomolecules and their use in sequencing |
| US7598035B2 (en) | 1998-02-23 | 2009-10-06 | Solexa, Inc. | Method and compositions for ordering restriction fragments |
| US7835871B2 (en) | 2007-01-26 | 2010-11-16 | Illumina, Inc. | Nucleic acid sequencing system and method |
| US7960120B2 (en) | 2006-10-06 | 2011-06-14 | Illumina Cambridge Ltd. | Method for pair-wise sequencing a plurality of double stranded target polynucleotides |
| US20160122814A1 (en) * | 2014-11-04 | 2016-05-05 | Boreal Genomics, Inc. | Methods of sequencing with linked fragments |
| WO2017168329A1 (en) * | 2016-03-28 | 2017-10-05 | Boreal Genomics, Inc. | Droplet-based linked-fragment sequencing |
| WO2018200884A1 (en) * | 2017-04-26 | 2018-11-01 | The Broad Institute, Inc. | Methods for linking polynucleotides |
| WO2020141464A1 (en) * | 2019-01-03 | 2020-07-09 | Boreal Genomics, Inc. | Linked target capture |
Patent Citations (18)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US6306597B1 (en) | 1995-04-17 | 2001-10-23 | Lynx Therapeutics, Inc. | DNA sequencing by parallel oligonucleotide extensions |
| US6210891B1 (en) | 1996-09-27 | 2001-04-03 | Pyrosequencing Ab | Method of sequencing DNA |
| US7598035B2 (en) | 1998-02-23 | 2009-10-06 | Solexa, Inc. | Method and compositions for ordering restriction fragments |
| US7232656B2 (en) | 1998-07-30 | 2007-06-19 | Solexa Ltd. | Arrayed biomolecules and their use in sequencing |
| US6828100B1 (en) | 1999-01-22 | 2004-12-07 | Biotage Ab | Method of DNA sequencing |
| US6911345B2 (en) | 1999-06-28 | 2005-06-28 | California Institute Of Technology | Methods and apparatus for analyzing polynucleotide sequences |
| US6833246B2 (en) | 1999-09-29 | 2004-12-21 | Solexa, Ltd. | Polynucleotide sequencing |
| US20060024681A1 (en) | 2003-10-31 | 2006-02-02 | Agencourt Bioscience Corporation | Methods for producing a paired tag from a nucleic acid sequence and methods of use thereof |
| US20060292611A1 (en) | 2005-06-06 | 2006-12-28 | Jan Berka | Paired end sequencing |
| US20070114362A1 (en) | 2005-11-23 | 2007-05-24 | Illumina, Inc. | Confocal imaging methods and apparatus |
| US7960120B2 (en) | 2006-10-06 | 2011-06-14 | Illumina Cambridge Ltd. | Method for pair-wise sequencing a plurality of double stranded target polynucleotides |
| US7835871B2 (en) | 2007-01-26 | 2010-11-16 | Illumina, Inc. | Nucleic acid sequencing system and method |
| US20110009278A1 (en) | 2007-01-26 | 2011-01-13 | Illumina, Inc. | Nucleic acid sequencing system and method |
| US20160122814A1 (en) * | 2014-11-04 | 2016-05-05 | Boreal Genomics, Inc. | Methods of sequencing with linked fragments |
| WO2017168329A1 (en) * | 2016-03-28 | 2017-10-05 | Boreal Genomics, Inc. | Droplet-based linked-fragment sequencing |
| WO2018200884A1 (en) * | 2017-04-26 | 2018-11-01 | The Broad Institute, Inc. | Methods for linking polynucleotides |
| US10982278B2 (en) | 2017-04-26 | 2021-04-20 | The Broad Institute, Inc. | Methods for linking polynucleotides |
| WO2020141464A1 (en) * | 2019-01-03 | 2020-07-09 | Boreal Genomics, Inc. | Linked target capture |
Non-Patent Citations (8)
| Title |
|---|
| BOREN ET AL., J AM CHEM SOC, vol. 130, 2008, pages 8923 - 8930 |
| BRENT C. SATTERFIELD: "Cooperative Primers", THE JOURNAL OF MOLECULAR DIAGNOSTICS, vol. 16, no. 2, 1 March 2014 (2014-03-01), pages 163 - 173, XP055417313, ISSN: 1525-1578, DOI: 10.1016/j.jmoldx.2013.10.004 * |
| FU ET AL., CELL, vol. 185, 2022, pages 4621 - 4633 |
| HIMO ET AL., J AM CHEM SOC, vol. 127, 2005, pages 210 - 216 |
| KANWAR ET AL., SCIENTIFIC REPORTS, vol. 11, 2021, pages 18065 |
| QUAIL ET AL.: "A tale of three next generation sequencing platforms: comparison of Ion Torrent, Pacific Biosciences and Illumina MiSeq sequencers", BMC GENOMICS, vol. 13, 2012, pages 341, XP021132475, DOI: 10.1186/1471-2164-13-341 |
| ROSTOVTSEV ET AL., ANGEW CHEM, vol. 41, 2002, pages 2596 - 2599 |
| TURCHANINOVA ET AL., EUR J IMMUNOL, vol. 43, 2013, pages 2507 - 15 |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 24752206; Country of ref document: EP; Kind code of ref document: A1 |