WO2025085821A1 - Methods, systems, and compositions for cell storage and analysis - Google Patents
Methods, systems, and compositions for cell storage and analysis
- Publication number
- WO2025085821A1 (PCT/US2024/052079)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- composition
- instances
- cell
- cells
- genomic
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12Q—MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
- C12Q1/00—Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
- C12Q1/68—Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
- C12Q1/6806—Preparing nucleic acids for analysis, e.g. for polymerase chain reaction [PCR] assay
-
- A—HUMAN NECESSITIES
- A01—AGRICULTURE; FORESTRY; ANIMAL HUSBANDRY; HUNTING; TRAPPING; FISHING
- A01N—PRESERVATION OF BODIES OF HUMANS OR ANIMALS OR PLANTS OR PARTS THEREOF; BIOCIDES, e.g. AS DISINFECTANTS, AS PESTICIDES OR AS HERBICIDES; PEST REPELLANTS OR ATTRACTANTS; PLANT GROWTH REGULATORS
- A01N1/00—Preservation of bodies of humans or animals, or parts thereof
- A01N1/10—Preservation of living parts
- A01N1/12—Chemical aspects of preservation
- A01N1/122—Preservation or perfusion media
- A01N1/126—Physiologically active agents, e.g. antioxidants or nutrients
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12N—MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
- C12N2500/00—Specific components of cell culture medium
- C12N2500/05—Inorganic components
- C12N2500/10—Metals; Metal chelators
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12N—MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
- C12N2500/00—Specific components of cell culture medium
- C12N2500/05—Inorganic components
- C12N2500/10—Metals; Metal chelators
- C12N2500/12—Light metals, i.e. alkali, alkaline earth, Be, Al, Mg
- C12N2500/14—Calcium; Ca chelators; Calcitonin
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12N—MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
- C12N2500/00—Specific components of cell culture medium
- C12N2500/05—Inorganic components
- C12N2500/10—Metals; Metal chelators
- C12N2500/12—Light metals, i.e. alkali, alkaline earth, Be, Al, Mg
- C12N2500/16—Magnesium; Mg chelators
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12N—MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
- C12N2500/00—Specific components of cell culture medium
- C12N2500/30—Organic components
- C12N2500/40—Nucleotides, nucleosides or bases
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12N—MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
- C12N2500/00—Specific components of cell culture medium
- C12N2500/60—Buffer, e.g. pH regulation, osmotic pressure
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12N—MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
- C12N2501/00—Active agents used in cell culture processes, e.g. differentation
- C12N2501/70—Enzymes
- C12N2501/73—Hydrolases (EC 3.)
- C12N2501/734—Proteases (EC 3.4.)
Definitions
- composition comprising: (a) a salt; (b) an ionic surfactant at a concentration from about 0.00001 to about 1 volume percent; (c) a protease at a concentration from about 0.01 units to about 5 units per milliliter; (d) a reducing agent; and (e) a chelator.
- the ionic surfactant comprises or is sodium dodecyl sulfate (SDS), sodium lauryl sulfate (SLS), sodium laureth sulfate (SLES), ammonium lauryl sulfate (ALS), ammonium laureth sulfate (ALES), sodium stearate, potassium cocoate, or any combination thereof.
- the protease is thermolabile. A thermolabile protease can be neutralized by heat, in some cases prior to whole genome amplification or sequencing.
- the protease comprises or is Proteinase K.
- the Proteinase K is thermolabile.
- the salt comprises or is Tris-HCl.
- the reducing agent comprises or is dithiothreitol (DTT).
- the chelator comprises or is ethylenediaminetetraacetic acid (EDTA).
- the composition is used for performing at least one of: storing the cell or genomic materials of the cell, and preparing the genomic materials of the cell for amplification. The genomic materials of the cells may be amplified downstream. In some embodiments, sequencing may be performed subsequently. In some embodiments, the composition is used for storing the cell or genomic materials of the cell and preparing the genomic materials of the cell for amplification. Amplification and sequencing may follow.
- storing comprises storing for at least about 1 hour, 4 hours, 8 hours, 12 hours, 24 hours, at least about 48 hours, at least about 72 hours or longer. In some cases, storage may be performed for at least 1 day, 2 days, 3 days, 4 days, 5 days, 6 days, 7 days, 8 days, 9 days, 10 days, 11 days, 12 days, 13 days, 14 days, 15 days, 16 days, 17 days, 18 days, 19 days, 20 days, or longer.
- the storage temperature is from about -20 to about 30 degrees Celsius (°C).
- the sample is shipped during storage. In some embodiments, the temperature changes or fluctuates during storage.
- the sample and/or components thereof remain substantially stable during storage despite shipment or changes or fluctuations in storage conditions such as temperature, humidity, and pressure.
- the composition is configured for storing a sample comprising one or more cells or constituents of one or more cells.
- the one or more cells comprise at most about 20, at most about 15, or at most about 10 cells.
- the one or more cells is a single cell.
- the one or more cells comprise live cells.
- the one or more cells comprise fixed cells.
- the composition further comprises a second salt different from the salt, wherein the second salt stimulates the activity of the protease, the ionic surfactant, or both.
- the composition may retain the stability of the nucleic acid molecules of the one or more cells.
- the composition may lyse the one or more cells and extract genomic contents of the one or more cells.
- the genomic contents of the one or more cells may remain stable in the composition during storage.
- the protease comprises Proteinase K, Proteocut K, or Nagarse, optionally wherein the protease is thermolabile.
- the salt comprises Tris-HCl, HEPES, or TES.
- the composition further comprises a salt selected from CaCl2, MgCl2, KCl, MgSO4, K2SO4, or NaHCO3.
- the reducing agent comprises dithiothreitol (DTT), tris(2-carboxyethyl)phosphine (TCEP), or β-mercaptoethanol.
- the chelator comprises ethylenediaminetetraacetic acid (EDTA) or ethylene glycol tetraacetic acid (EGTA).
- the salt has a concentration of 10-100 mM; the ionic surfactant has a concentration volume percent of 0.001 to 0.1; the protease has a concentration of about 0.01 units to about 5 units per milliliter; the reducing agent has a concentration of 0.1-10 mM; and the chelator has a concentration of 0.1 to 10 mM. In some embodiments, the ionic surfactant has a concentration volume percent of 0.01 to 0.5.
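The concentration ranges in the preceding embodiment can be checked mechanically. The following is a minimal Python sketch, not part of the disclosure: the dictionary keys, the function name `out_of_range`, and the example values are hypothetical, and the bounds that the source gives as "about" are treated here as hard limits for simplicity.

```python
# Ranges copied from the embodiment above. The key names are hypothetical,
# chosen only for this illustration.
RANGES = {
    "salt_mM": (10.0, 100.0),
    "ionic_surfactant_vol_pct": (0.001, 0.1),
    "protease_units_per_mL": (0.01, 5.0),  # "about" in the source, hard limit here
    "reducing_agent_mM": (0.1, 10.0),
    "chelator_mM": (0.1, 10.0),
}


def out_of_range(formulation):
    """Return the names of components that are missing or outside the ranges."""
    problems = []
    for name, (low, high) in RANGES.items():
        value = formulation.get(name)
        if value is None or not (low <= value <= high):
            problems.append(name)
    return problems


# Illustrative values only; this is not a formulation taken from the source.
example = {
    "salt_mM": 50.0,
    "ionic_surfactant_vol_pct": 0.05,
    "protease_units_per_mL": 1.0,
    "reducing_agent_mM": 1.0,
    "chelator_mM": 1.0,
}

print(out_of_range(example))  # empty list: every component is inside its range
```

The alternative surfactant range recited in the same embodiment (0.01 to 0.5 volume percent) could be checked by swapping the corresponding entry in `RANGES`.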
- kits comprising the composition of any one of the preceding embodiments.
- the kit comprises a second composition different from the composition.
- the second composition comprises a neutralizing buffer comprising a component that substantially neutralizes the ionic surfactant.
- the composition may be referred to as storage buffer, and the second composition may be referred to as neutralizing buffer.
- the kit further comprises instructions for storing a cell and preparing the genomic contents of the cell for amplification, using the composition and the second composition, wherein the composition is used to store the cell and the second composition is used to neutralize the composition, wherein neutralizing the composition improves the results of a downstream amplification performed on the cell.
- the second composition comprises a non-ionic surfactant, zwitterionic surfactant, charge-neutral surfactant, or any combination thereof.
- the non-ionic surfactant may comprise a polysorbate, a poly(ethylene glycol) derivative, or both.
- the non-ionic surfactant comprises Tween (e.g., Tween-20, Tween-40, Tween-60, or Tween-80) or Triton (e.g., Triton-X-100).
- a method of cell analysis comprising one or more steps of: (a) providing or obtaining a sample comprising one or more cells stored in a composition comprising one or more components of: (i) a salt; (ii) an ionic surfactant; (iii) a protease; (iv) a reducing agent; and (v) a chelator.
- the method may further comprise (b) amplifying genomic materials of the one or more cells; and (c) performing genomic analysis on the genomic materials of the one or more cells.
- a method of cell analysis comprising: (a) providing or obtaining a sample comprising one or more cells stored in a composition comprising: (i) a salt; (ii) an ionic surfactant; (iii) a protease; (iv) a reducing agent; and (v) a chelator.
- the method may further comprise (b) amplifying genomic materials of the one or more cells; and (c) performing genomic analysis on the genomic materials of the one or more cells.
- the one or more cells comprise from 1 to about 10 cells.
- the sample has a single cell.
- the sample comprises at most about 1000, at most about 800, at most about 600, at most about 400, at most about 200, at most about 100, at most about 80, at most about 60, at most about 40, at most about 20, at most about 10, at most about 8, at most about 6, at most about 4, a smaller number of cells, or a single cell.
- the method does not require column filtration and is amenable to retrieval of the genomic materials of the one or more cells from the sample.
- the sample loss caused by the methods and compositions may be substantially minimal. In some cases, certain washing protocols, or multiple washing steps may lead to sample loss or cell loss from samples. Samples containing small numbers of cells or single cells may be sensitive to filtration and/or washing steps.
- the one or more cells is a single cell, and genomic analysis is single cell genomics or multiomics, wherein the composition is configured for storing the single cell and preparing the genomic contents of the single cell for amplification.
- the method further comprises sorting the cells before storing the sample in the composition.
- the method further comprises neutralizing the composition with a second composition.
- the second composition comprises a neutralizing buffer capable of neutralizing the ionic surfactant.
- the composition may be referred to as storage buffer and the second composition may be referred to as neutralizing buffer.
- the non-ionic surfactant may comprise a polysorbate, a poly(ethylene glycol) derivative, or both.
- the neutralizing buffer comprises Tween (e.g., Tween-20, Tween-40, Tween-60, or Tween-80) or Triton (e.g., Triton-X-100).
- the ionic surfactant comprises from about 0.00001 to about 1 volume percent of the composition. In some embodiments, the ionic surfactant comprises from about 0.03 to about 0.09 volume percent of the composition. In some embodiments, the protease comprises from about 0.01 units to about 5 units per milliliter of the composition.
- the ionic surfactant comprises or is sodium dodecyl sulfate (SDS), sodium lauryl sulfate (SLS), sodium laureth sulfate (SLES), ammonium lauryl sulfate (ALS), ammonium laureth sulfate (ALES), sodium stearate, potassium cocoate, or any combination thereof.
- the protease comprises or is Proteinase K.
- the protease is thermolabile. A thermolabile protease will allow for neutralizing or halting the activity of the protease (e.g., proteinase K) when intended.
- the salt comprises or is Tris-HCl.
- the chelator comprises or is ethylenediaminetetraacetic acid (EDTA).
- the composition further comprises a second salt different from the salt, wherein the second salt is capable of stimulating the activity of the protease, the ionic surfactant, or both.
- the second salt comprises or is a divalent cation.
- the second salt comprises or is calcium chloride.
- the method further comprises lysing the one or more cells and extracting the genomic materials thereof. Genomic materials may comprise the whole genomes of the one or more cells.
- genomic materials comprise nucleic acid molecules.
- nucleic acid molecules comprise ribonucleic acid (RNA), deoxyribonucleic acid (DNA), or both.
- the method may further comprise performing genomic analysis on the one or more cells.
- Genomic analysis may comprise or be single cell genomics or multiomics.
- Genomic analysis may comprise amplification.
- Amplification may comprise or be isothermal amplification.
- genomic analysis comprises primary template amplification (PTA).
- the methods, systems, and compositions for cell storage are compatible with single cell samples or samples with small cell numbers, long term storage and shipment, fluctuation or changes in the storage conditions such as temperature, pressure, humidity, and other conditions, amplification (e.g., isothermal amplification) of genomic materials of the cells upon or after storage, and any combination thereof, such as integrated workflows comprising any combination or all of the aforementioned procedures.
- the protease comprises Proteinase K, Proteocut K, or Nagarse, optionally wherein the protease is thermolabile.
- the salt comprises Tris-HCl, HEPES, or TES.
- the composition further comprises a salt selected from CaCl2, MgCl2, KCl, MgSO4, K2SO4, or NaHCO3.
- the reducing agent comprises dithiothreitol (DTT), tris(2-carboxyethyl)phosphine (TCEP), or β-mercaptoethanol.
- the chelator comprises ethylenediaminetetraacetic acid (EDTA) or ethylene glycol tetraacetic acid (EGTA).
- the salt has a concentration of 10-100 mM; the ionic surfactant has a concentration volume percent of 0.001 to 0.1; the protease has a concentration of about 0.01 units to about 5 units per milliliter; the reducing agent has a concentration of 0.1-10 mM; and the chelator has a concentration of 0.1 to 10 mM. In some embodiments, the ionic surfactant has a concentration volume percent of 0.01 to 0.5.
- sequencing metrics comprising: providing a single cell stored in a buffer for a time period of at least 1 day; lysing the single cell; amplifying mRNA transcripts and genomic DNA from the cell to generate cDNA and genomic DNA libraries, respectively; and sequencing mRNA transcripts and genomic DNA from the cell to obtain one or more sequencing metrics, wherein one or more of: the yield of pre-amplification of cDNA or genomic DNA comprise values within 10x of values obtained when compared to storage conditions having a time period of less than 1 day; the average fragment size of pre-amplification of cDNA or genomic DNA comprise values within 10x of values obtained when compared to storage conditions having a time period of less than 1 day; and the sequencing metrics comprise values within 10x of values obtained when compared to storage conditions having a time period of less than 1 day.
- the sequencing metrics comprise one or more Picard Metrics.
- the sequencing metrics comprise one or more of PreSeq, protein coding transcripts, proportion exonic sequences, ratio transcript body, sequencing coverage, fold-80 base penalty, dropouts, percent chimera, and Gini index.
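Of the metrics listed above, the Gini index has a compact closed form that can be illustrated directly. The sketch below is a generic Python implementation of the standard Gini coefficient over a list of non-negative values (for example, per-bin coverage depths). The function name `gini` and the input values are hypothetical, and the source does not state which implementation or binning was used to obtain its reported values.

```python
def gini(values):
    """Gini coefficient of non-negative values: 0 for perfectly even values,
    approaching 1 as the total concentrates in a single entry."""
    xs = sorted(values)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total <= 0 or xs[0] < 0:
        raise ValueError("need non-negative values with a positive sum")
    # Standard rank-weighted form over values sorted in ascending order:
    # G = 2 * sum(i * x_i) / (n * sum(x)) - (n + 1) / n, with i = 1..n
    weighted = sum(rank * x for rank, x in enumerate(xs, start=1))
    return 2.0 * weighted / (n * total) - (n + 1.0) / n


# Hypothetical per-bin coverage depths, for illustration only.
print(gini([5, 5, 5, 5]))    # perfectly uniform coverage
print(gini([0, 0, 0, 10]))   # all coverage concentrated in one bin
```

Under this definition, lower values indicate more uniform coverage across the genome.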
- the single cell is an embryonic cell. In some instances, the single cell is a human embryonic cell. In some instances, providing comprises shipping or storing. In some instances, the time period is at least 5 days. In some instances, the time period is at least 10 days. In some instances, the time period is 3-15 days. In some instances, the single cell was stored at a storage temperature. In some instances, the storage temperature is from about -20 to about 30 degrees Celsius (°C), optionally with a variation of from about 0 to about 20%.
- the storage temperature changes or fluctuates.
- the yield of pre-amplification of cDNA or genomic DNA comprise values within 5x of values obtained when compared to storage conditions having a time period of less than 1 day.
- the average fragment size of pre-amplification of cDNA or genomic DNA comprise values within 5x of values obtained when compared to storage conditions having a time period of less than 1 day.
- the sequencing metrics comprise values within 5x of values obtained when compared to storage conditions having a time period of less than 1 day.
- the yield of pre-amplification of cDNA or genomic DNA comprise values within 2x of values obtained when compared to storage conditions having a time period of less than 1 day; the average fragment size of pre-amplification of cDNA or genomic DNA comprise values within 2x of values obtained when compared to storage conditions having a time period of less than 1 day; and the sequencing metrics comprise values within 2x of values obtained when compared to storage conditions having a time period of less than 1 day.
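The "within 10x", "within 5x", and "within 2x" comparisons in the preceding embodiments can be expressed as a ratio test. The Python sketch below assumes a symmetric reading, in which a stored-sample value counts as within N-fold if it is neither more than N times larger nor more than N times smaller than the reference obtained after less than 1 day of storage. That reading, the function name `within_fold`, and the example numbers are assumptions made for illustration; the source does not define the comparison more precisely.

```python
def within_fold(stored, reference, fold):
    """True if `stored` differs from `reference` by at most `fold`-fold,
    in either direction (symmetric interpretation, assumed here)."""
    if stored <= 0 or reference <= 0:
        raise ValueError("values must be positive")
    if fold < 1:
        raise ValueError("fold must be at least 1")
    ratio = stored / reference
    return 1.0 / fold <= ratio <= fold


# Hypothetical yields in ng: 40 after extended storage vs. 100 for the
# reference stored for less than 1 day.
for fold in (10, 5, 2):
    print(fold, within_fold(40.0, 100.0, fold))
```

With these example numbers the stored yield is 0.4 of the reference, so it satisfies the 10x and 5x criteria but not the 2x criterion.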
- the single cell is suspended in a composition provided herein.
- nucleic acid sample comprises ctDNA.
- nucleic acid sample comprises cfDNA.
- composition is present at 10X concentration.
- a total yield of amplicons is within 5, 10, 15, 20, 25, 30, 40, 50, or within 60 % compared to amplification without the composition.
- the non-ionic surfactant comprises Tween or Triton.
- the compositions are as described in Tables 1-8 or FIG. 5A.
- FIG. 1 presents amplification multicomponent plots for samples stored in storage buffer compositions (the composition) of the present disclosure with varying concentrations of an ionic surfactant; the x-axis is labeled cycle from 0 to 24 at 2 unit intervals, and the y-axis is labeled fluorescence from -10,000 to 50,000 at 5000 unit intervals;
- FIG. 2 presents amplification multicomponent plots for samples neutralized with different compositions of neutralizing buffer (the second composition); the x-axis is labeled cycle from 0 to 24 at 2 unit intervals, and the y-axis is labeled fluorescence from -10,000 to 50,000 at 5000 unit intervals;
- FIG. 3 presents broad first pass experiment pre-amplification quality control data in the form of electrophoresis gels for samples stored in different buffer compositions and neutralized with different neutralizing buffers;
- FIG. 4A presents real-time amplification multicomponent plots for samples stored in buffer compositions of the present disclosure with varying concentrations of an ionic surfactant and different compositions of a neutralizing buffer on which reverse transcription or preamplification has been performed; the x-axis is labeled cycle from 0 to 24 at 2 unit intervals, and the y-axis is labeled fluorescence from 0 to 600,000 at 50,000 unit intervals;
- FIG. 4B presents real-time amplification multicomponent plots for samples stored in buffer compositions of the present disclosure with varying concentrations of an ionic surfactant and different compositions of a neutralizing buffer which have been processed using V2 DNA Amplification Kit from Illumina; the x-axis is labeled cycle from 0 to 70 at 10 unit intervals, and the y-axis is labeled fluorescence from 0 to 700,000 at 50,000 unit intervals;
- FIG. 5A presents reverse transcription pre-amplification of RNA in samples stored in different storage buffer compositions and reverse transcribed with different reverse transcription buffers;
- the bar graph shows results (left to right for each of day 1, 2, and 3): SB4 5xRwa, SB4 5xRxa, SB5 5xRwa, and SB5 5xRxa;
- the x-axis is labeled storage condition (days 1, 2, 3) and the y-axis is labeled ng/microliter from 0 to 80 at 20 unit intervals;
- FIG. 5B presents data on primary template amplification (PTA) of DNA in samples stored in different storage buffer compositions and amplified using different transcription buffers;
- the bar graph shows results (left to right for each of day 1, 2, and 3): SB4 5xRwa, SB4 5xRxa, SB5 5xRwa, and SB5 5xRxa; each grouping of 3 bars on the left of each graph represents results for buffer SB4 5xRwa, and each grouping of 3 bars on the right of each graph represents results for buffer SB5 5xRwa;
- the x-axis is labeled storage condition (days 1, 2, 3) and the y-axis is labeled ng/microliter from 0 to 25 at 5 unit intervals;
- FIG. 6 presents multiomics (DNA and mRNA) metrics quantified for samples stored in different buffer compositions for predetermined durations of time; upper left: proportion of exonic sequences (y-axis labeled from 0.0 to 1.0 at 0.2 unit intervals); upper right: Gini coefficient (y-axis labeled from 0.00 to 0.05 at 0.01 unit intervals); lower left: protein coding transcripts (y-axis labeled protein coding transcripts from 0 to 15,000 at 5,000 unit intervals); lower right: PreSeq counts (y-axis labeled counts from 0 to 6x10^9 at 2x10^9 unit intervals);
- FIG. 7 presents RNA sequencing results for samples stored in different storage buffers for varying durations of time and amplified with different transcription buffers; the proportion of genomic origins mapped to exonic, intronic, or intergenic regions are shown on the right graph;
- FIG. 8 presents pre-amplification yields obtained from samples stored under different conditions using the methods and compositions of the present disclosure; the x-axis represents different samples, and the y-axis is pre-amp yield (ng) from 0 to 140 at 20 unit intervals;
- FIG. 9 presents transcriptomics metrics quantified for samples stored in different buffer compositions for varying durations of time and amplified with different reverse transcription buffers; buffer 5xRa is shown in the left bar and 5xRwa is the right bar for each condition; upper left: y-axis is labeled protein coding transcripts from 0 to 10,000 at 5,000 unit intervals; upper middle: y-axis is labeled proportion exonic sequences from 0.0 to 1.0 at 0.2 unit intervals; upper right: y-axis is labeled ratio transcript body from 0.0 to 1.5 at 0.5 unit intervals; lower left: y-axis is labeled protein coding transcripts from 0 to 10,000 at 5,000 unit intervals; lower middle: y-axis is labeled proportion exonic sequences from 0.0 to 1.0 at 0.2 unit intervals; lower right: y-axis is labeled ratio transcript body from 0.0 to 1.5 at 0.5 unit intervals; data in the top row of graphs was obtained by operator 1 and data in the
- FIG. 10 presents reverse transcription pre-amplification sequencing data performed on samples stored in different buffer conditions for varying durations of time and amplified with different reverse transcription buffers; the proportion of genomic origins mapped to exonic, intronic, or intergenic regions are shown on the right graph;
- FIG. 11 presents a transcript coverage rainbow plot resulting from genome-wide amplification and RNA sequencing using a conventional cell buffer (“CB”) and a buffer presented herein (“SB4”) demonstrating more uniform transcript coverage upon using the buffer of the present disclosure (SB4); the x-axis is labeled 5’-3’ body percentile from 0 to 100 at 20 unit intervals, and the y-axis is labeled coverage from 0.0 to 1.0 at 0.2 unit intervals;
- CB cell buffer
- SB4 buffer presented herein
- FIG. 12 presents a heat map of results of expression analysis for different storage conditions; the legend shows expression levels from -1 (purple) to 1 (yellow);
- FIG. 13A presents PTA DNA yield (ng) for experiments with different storage conditions at day 3 (left bar) and 11 days (right bar); the x-axis represents different samples, and the y-axis represents yield (ng) from 0 to 1000 at 200 unit intervals;
- FIG. 13B presents PreSeq counts for experiments with different storage conditions at day 3 (left bar) and 11 days (right bar); the x-axis represents different samples, and the y-axis represents counts from 0 to 6x10^9 at 2x10^9 unit intervals;
- FIG. 13C presents preamp cDNA yields for experiments with different storage conditions at day 3 (left bar) and 11 days (right bar); the x-axis represents different samples, and the y-axis represents yield (ng) from 0 to 500 at 100 unit intervals;
- FIG. 13D presents protein coding transcripts detected for experiments with different storage conditions at day 3 (left bar) and 11 days (right bar); the x-axis represents different samples, and the y-axis represents yield (ng) from 0 to 8000 at 1000 unit intervals;
- FIG. 13E presents the proportion of exonic sequences detected for experiments with different storage conditions at day 3 (left bar) and 11 days (right bar); the x-axis represents different samples, and the y-axis is proportion from 0.0 to 1.0 at 0.2 unit intervals; and
- FIG. 13F presents the ratio of transcript bodies detected for experiments with different storage conditions at day 3 (left bar) and 11 days (right bar); the x-axis represents different samples, and the y-axis is labeled yield (ng) from 0.0 to 1.5 at 0.5 unit intervals.
- FIG. 14A depicts a graph showing changes in DNA total yield vs. various analyte, buffer, and lot combinations.
- the x-axis represents different samples and the y-axis represents DNA total yield (ng) from 0 to 2500 at 500 unit intervals (the dotted line represents 300 ng).
- Each series represents a different storage condition (square: -20°C; circle: 6-10°C; triangle: room temperature, 1 day; inverted triangle: room temperature, 1 day).
- FIG. 14B depicts a graph showing changes in cDNA total yield vs. various analyte, buffer, and lot combinations.
- the x-axis represents different samples and the y-axis represents cDNA total yield (ng) from 0 to 1500 at 500 unit intervals (the dotted line represents 150 ng).
- Each series represents a different storage condition (square: -20°C; circle: 6-10°C; triangle: room temperature, 1 day; inverted triangle: room temperature, 1 day).
- FIG. 14C depicts a graph showing changes in PreSeq count vs. various analyte, buffer, and lot combinations.
- the x-axis represents different samples and the y-axis represents counts from 0 to 6x10^9 at 2x10^9 unit intervals (the dotted line represents 3.5x10^9).
- Each series represents a different storage condition (square: -20°C; circle: 6-10°C; triangle: room temperature, 1 day; inverted triangle: room temperature, 1 day).
- FIG. 14D depicts a graph showing changes in the proportion of exonic reads vs. various analyte, buffer, and lot combinations.
- the x-axis represents different samples and the y-axis represents counts from 0.0 to 1.5 at 0.5 unit intervals (the dotted line represents 0.6).
- Each series represents a different storage condition (square: -20°C; circle: 6-10°C; triangle: room temperature, 1 day; inverted triangle: room temperature, 1 day).
- FIG. 14E depicts a graph showing changes in Protein Coding Genes vs. various analyte, buffer, and lot combinations.
- the x-axis represents different samples and the y-axis represents the number of protein coding genes from 0 to 8000 at 2000 unit intervals (the dotted line represents 2000 genes).
- Each series represents a different storage condition (square: -20°C; circle: 6-10°C; triangle: room temperature, 1 day; inverted triangle: room temperature, 1 day).
- FIG. 15 depicts a series of graphs showing recovery of genes in B-cell line or clinical samples for conditions using no buffer (cell line, NA), PBS, or SB (stability buffer).
- the genes tested were: GAPDH, ACTB, ACTG1, RPL36, HINT1, TBP, PPIA, HPRT1, UBC, RPL13A, PGK1, EIF3K, RPLP0, GUSB, CLTC, HMBS, EEF1A1, and MALAT1.
- FIG. 16A depicts a heatmap of the top five mRNA expression levels for clinical samples stored in standard PBS buffer vs. SB. Expression levels are shown from -1 (purple) to 0 (black) to 1 (yellow). Genes listed on the left side (top to bottom) are ATP2B1, RABL6, LGMN, CREG1, PTMS, MTND2P28, MT-ND4L, MT-ATP8, MTCO1P12, MTND1P23, DNMT3L, FTH1, MT-ATP8, RPL26, and H4C3. Rows 1-6 are labeled (top to bottom): sample media, percent.rb, percent.mt, nCount_RNA, nFeature_RNA, and run_tag2.
- FIG. 16B depicts a heatmap of the top ten mRNA expression levels for clinical samples stored in standard PBS buffer vs. SB. Expression levels are shown from -1 (purple) to 0 (black) to 1 (yellow). Genes listed on the left side (top to bottom) are ATP2B1, WAC, RABL6, LGMN, CREG1, EMC2, PTMS, PRMT2, GTF3C3, ZNF888, MT-ND1, MTND2P28, MT-CYB, MT-ND4L, MT-ND6, MT-ATP8, MT-ND5, MTCO1P12, MTDATP6P1, MTND1P23, RPS5, RPS27, DNMT3L, MTATP6P1, DNAJC15, FTH1, MT-ATP8, RPL26, H4C3, and NDUFAB1. Rows 1-6 are labeled (top to bottom): sample media, percent.rb, percent.mt, nCount_RNA,
- FIG. 17 depicts a graph showing cfDNA amplification yields after PTA for samples stored in no stability buffer vs. stability buffer (SB).
- the x-axis depicts either the presence (+) or absence (-) of the non-ionic surfactant; the y-axis represents total yield (ng) from 0 to 1500 at 500 unit intervals.
- Columns 1 and 2 did not use stability buffer; columns 3 and 4 used 10X stability buffer.
- cell analysis methods and systems may comprise miniaturized and/or high-throughput cell analysis (e.g., single cell analysis).
- Such methods may comprise genomics, proteomics, and multiomics analysis which in some cases may be performed on samples comprising a relatively small number of cells, such as fewer than about 1000 cells, about 800 cells, about 600 cells, about 400 cells, about 200 cells, about 100 cells, about 80 cells, about 60 cells, about 40 cells, about 20 cells, about 10 cells, or a smaller number of cells, in some cases single cells.
- cells at a predetermined cell density and sample volume may be sorted prior to storage in the composition of the present disclosure. Sorting may comprise sorting the one or more cells (e.g., any cell number indicated above) in containers.
- the samples may comprise one or more cells and/or constituents of one or more cells such as genomic materials of cells.
- Genomic materials of cells may comprise cellular nucleic acid molecules, such as deoxyribonucleic acid (DNA) molecules, ribonucleic acid (RNA) molecules, or both.
- composition of the present disclosure may facilitate maintaining the stability of the genomic materials of cells during storage in the composition. Storage may comprise long-term storage and/or shipment.
- the composition may comprise or maintain cells that are fixed.
- the one or more cells may be fixed (e.g., chemically fixed using an alcohol, an aldehyde, another chemical, or any combination thereof).
- the one or more cells may be fixed to a substrate or a surface.
- the composition may further lyse the cells, extract, or preserve the genomic materials of the cells during storage.
- storage may comprise storage for at least 1 day, 2 days, 3 days, 4 days, 5 days, 6 days, 7 days, 8 days, 9 days, 10 days, or longer.
- storage may comprise storage in a temperature of at least about -20 °C, -10 °C, 0 °C, 5 °C, 10 °C, 20 °C, 25 °C, or higher. In some cases, temperature might change or fluctuate.
- storage methods and compositions that reduce the temperature-sensitivity of the samples or make them more stable under a wide range of temperature and storage conditions can be highly beneficial, and can reduce the risk of compromising the samples or adversely affecting the results of cell analysis performed on the samples.
- storage may comprise storage at a temperature of -40 to -20, -30 to -20, -30 to -10, -30 to 0, -20 to 0, -15 to 0, -10 to 0, 0 to 5, 0 to 10, 0 to 20, 5 to 10, 5 to 15, 5 to 20, or 10 to 30 °C.
- Samples containing such small numbers of cells or single cells may need to be handled with care and may need or benefit from cell handling, cell manipulation, or storage methods and compositions that can minimize sample loss and cell loss from samples during sample preparation, storage, potential shipment, and potential temperature variation, while maintaining the stability of the samples and components thereof such as genomics materials of cells.
- samples may comprise one or more cells (e.g., a small number of cells or single cells).
- the cells may be sorted in the samples prior to storage in the composition. Cell sorting may be done using any suitable cell sorting technique such as flow cytometry, fluorescence-activated cell sorting (FACS), or sorting using a miniaturized or microfluidic system/device.
- the composition may lyse the cells, extract, and preserve the genomic materials of the cells, after the cells are sorted in the composition.
- provided herein are methods, systems, and compositions that enable handling, storing, or optionally shipping samples comprising one or more cells (e.g., single cells) or constituents of one or more cells. Such methods, systems, or compositions can result in significantly improved stability of the samples and components thereof, such that once the sample is used for analysis upon or after storage and/or shipment, the results are found to be substantially uncompromised by storage and/or shipment.
- reagents and buffers for cell storage, and methods of use thereof such as to prepare, store, or preserve samples comprising a small number of cells or single cells. The samples may further be used for genomics, proteomics, or multiomics using a variety of techniques.
- Genomic analysis may comprise amplification, which may, in some cases, comprise isothermal amplification, or in some cases, comprise primary template amplification (PTA). Any combination of analyses may be performed on the samples stored and optionally shipped using the cell storage methods and compositions of the present disclosure.
- PTA primary template amplification
- nucleic acid molecules there is a need to keep nucleic acid molecules stable in the sample for long-term storage and potential shipment. Presence of enzymes that can degrade nucleic acid molecules such as DNA and RNA (e.g., DNAse and/or RNAse enzymes) may interfere with the stability of the samples.
- the buffers and compositions of the present disclosure may facilitate stabilizing nucleic acid molecules, in some cases by inhibiting the nucleic acid molecule degrading enzymes such as DNAse or RNAse.
- one function of the composition of the present disclosure may be RNAse inhibition, DNAse inhibition, or both. The composition and its contents are elaborated on in detail throughout the present disclosure.
- nucleic acid encompasses multi-stranded, as well as single-stranded molecules.
- nucleic acid strands need not be coextensive (i.e., a double-stranded nucleic acid need not be double-stranded along the entire length of both strands).
- Nucleic acid templates described herein may be any size depending on the sample (from small cell-free DNA fragments to entire genomes), including but not limited to 50-300 bases, 100-2000 bases, 100-750 bases, 170-500 bases, 100-5000 bases, 50-10,000 bases, or 50-2000 bases in length.
- templates are at least 50, 100, 200, 500, 1000, 2000, 5000, 10,000, 20,000, 50,000, 100,000, 200,000, 500,000, 1,000,000 or more than 1,000,000 bases in length.
- Methods described herein provide for the amplification of nucleic acids, such as nucleic acid templates.
- Methods described herein additionally provide for the generation of isolated and at least partially purified nucleic acids and libraries of nucleic acids.
- methods described herein provide for extracted nucleic acids (e.g., extracted from tissues, cells, or media).
- Nucleic acids include but are not limited to those comprising DNA, RNA, circular RNA, mtDNA (mitochondrial DNA), cfDNA (cell free DNA), cfRNA (cell free RNA), siRNA (small interfering RNA), cffDNA (cell free fetal DNA), circular DNA, extrachromosomal DNAs (ecDNAs), mRNA, tRNA, rRNA, miRNA (microRNA), synthetic polynucleotides, polynucleotide analogues, any other nucleic acid consistent with the specification, or any combinations thereof.
- nt nucleotides
- bp bases
- kb kilobases
- Gb gigabases
- a composition such as a storage buffer.
- one or more buffers, such as one buffer, two buffers, three buffers, or more, may be provided or used.
- one or more compositions or one or more buffers may be provided as part of a kit and be accompanied by instructions for use. Provided herein are also methods of cell handling, storage, and analysis using the compositions of the present disclosure.
- a composition e.g., a storage buffer
- a composition comprising: (a) a salt; (b) an ionic surfactant; (c) a protease; (d) a reducing agent; and (e) a chelator.
- the ionic surfactant in the composition may be at a concentration from about 0.00001 to about 1 volume percent.
- the ionic surfactant in the composition may be at a concentration of at least about 0.00001 %, at least about 0.0001 %, at least about 0.001%, at least about 0.01%, at least about 0.02%, at least about 0.03%, at least about 0.04%, at least about 0.05%, at least about 0.06%, at least about 0.07%, at least about 0.08%, at least about 0.09%, at least about 0.1%, at least about 0.2%, at least about 0.3%, at least about 0.4%, at least about 0.5%, at least about 1% by volume or greater.
- the concentration of the ionic surfactant in the composition is at most about 1%, at most about 0.5%, at most about 0.4%, at most about 0.3%, at most about 0.2%, at most about 0.1%, at most about 0.09%, at most about 0.08%, at most about 0.07%, at most about 0.06%, at most about 0.05%, at most about 0.04%, at most about 0.03%, at most about 0.02%, at most about 0.01% by volume, or less.
- the concentration of the ionic surfactant in the composition may be from about 0.01% to about 0.09% by volume.
- the concentration of the ionic surfactant in the composition may be from about 0.02% to about 0.08% by volume. In some examples, the concentration of the ionic surfactant in the composition may be from about 0.005% to about 0.02% by volume. In some examples, the concentration of the ionic surfactant in the composition may be from about 0.008% to about 0.015% by volume. In some examples, the concentration of the ionic surfactant in the composition may be about 0.01% by volume.
- the ionic surfactant comprises or is sodium dodecyl sulfate (SDS), sodium lauryl sulfate (SLS), sodium laureth sulfate (SLES), ammonium lauryl sulfate (ALS), ammonium laureth sulfate (ALES), sodium stearate, potassium cocoate, or any combination thereof.
- SDS sodium dodecyl sulfate
- SLS sodium lauryl sulfate
- SLES sodium laureth sulfate
- ALS ammonium lauryl sulfate
- ALES ammonium laureth sulfate
- the protease may be at a concentration from about 0.01 units to about 5 units per milliliter. In some examples, the protease may be at a concentration of at least about 0.01 units, at least about 0.02 units per milliliter, at least about 0.03 units per milliliter, at least about 0.04 units per milliliter, at least about 0.05 units per milliliter, at least about 0.06 units per milliliter, at least about 0.07 units per milliliter, at least about 0.08 units per milliliter, at least about 0.09 units per milliliter, at least about 0.1 units per milliliter, at least about 0.5 units per milliliter, at least about 1 unit per milliliter, at least about 2 units per milliliter, at least about 3 units per milliliter, at least about 4 units per milliliter or more. In some examples, the protease may be Proteinase K. The protease (e.g., proteinase K) may be thermolabile.
- the protease may be at a concentration of at most about 5 units per milliliter, at most about 4 units per milliliter, at most about 3 units per milliliter, at most about 2 units per milliliter, at most about 1 unit per milliliter, at most about 0.5 units per milliliter, at most about 0.1 units per milliliter, at most about 0.09 units per milliliter, at most about 0.08 units per milliliter, at most about 0.07 units per milliliter, at most about 0.06 units per milliliter, at most about 0.05 units per milliliter, at most about 0.04 units per milliliter, at most about 0.03 units per milliliter, at most about 0.02 units per milliliter or less.
- the protease may be Proteinase K.
- the protease (e.g., proteinase K) may be thermolabile.
- the concentration of the protease may be from about 50 micrograms per milliliter (µg/mL) to about 1000 µg/mL. In some examples, the concentration of the protease may be at least about 50 µg/mL, at least about 100 µg/mL, at least about 200 µg/mL, at least about 300 µg/mL, at least about 400 µg/mL, at least about 500 µg/mL, at least about 600 µg/mL, at least about 700 µg/mL, at least about 800 µg/mL or more.
- the concentration of the protease may be at most about 800 µg/mL, at most about 700 µg/mL, at most about 600 µg/mL, at most about 500 µg/mL, at most about 400 µg/mL, at most about 300 µg/mL, at most about 200 µg/mL, or less. In some examples, the concentration of the protease may be about 200 micrograms per milliliter (µg/mL). In some cases, the protease may be thermolabile. In some examples, the protease may be Proteinase K. In some cases, the Proteinase K may be thermolabile.
- the ionic surfactant comprises or is sodium dodecyl sulfate (SDS), sodium lauryl sulfate (SLS), sodium laureth sulfate (SLES), ammonium lauryl sulfate (ALS), ammonium laureth sulfate (ALES), sodium stearate, potassium cocoate, or any combination thereof.
- the protease comprises or is Proteinase K.
- the salt comprises or is Tris-HCl.
- the reducing agent comprises or is dithiothreitol (DTT).
- the chelator comprises or is Ethylenediaminetetraacetic acid (EDTA).
- the composition (e.g., storage buffer) comprises sodium dodecyl sulfate (SDS) at a concentration from about 0.005% to about 0.06% by volume, Proteinase K at a concentration from about 0.01 units to about 5 units per milliliter or from about 50 micrograms per milliliter (µg/mL) to about 1000 µg/mL, Tris-HCl, dithiothreitol (DTT), and EDTA.
- the concentration of Proteinase K may be about 200 micrograms per milliliter (µg/mL).
- the concentrations of Tris-HCl, dithiothreitol (DTT), and EDTA may be any suitable amount.
- the concentration of Tris-HCl may be about 50 millimolar (mM).
- the concentration of EDTA may be about 1 millimolar (mM).
- the pH of the buffer may be from about 7 to about 9. In an example, the pH of the buffer may be approximately 8.5. In an example, the pH of the buffer may be approximately 8. In an example, the pH of the buffer may be approximately 7.5.
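The example concentrations above (about 0.01% SDS, about 200 µg/mL Proteinase K, about 50 mM Tris-HCl, about 1 mM EDTA) can be turned into pipetting volumes with the usual dilution relation C1·V1 = C2·V2. The following is a minimal sketch only: the stock concentrations, the dictionary names, and the `recipe` helper are hypothetical and are not taken from the disclosure. DTT is left out because no concentration is stated for it, so in practice its volume would come out of the diluent.

```python
# Target working concentrations taken from the example values in the text.
TARGETS = {
    "SDS (% v/v)": 0.01,
    "Proteinase K (ug/mL)": 200.0,
    "Tris-HCl (mM)": 50.0,
    "EDTA (mM)": 1.0,
}

# Hypothetical stock concentrations (assumed for illustration), same units as TARGETS.
STOCKS = {
    "SDS (% v/v)": 10.0,
    "Proteinase K (ug/mL)": 20000.0,
    "Tris-HCl (mM)": 1000.0,
    "EDTA (mM)": 500.0,
}


def stock_volume_ul(target, stock, final_volume_ul):
    """Volume of stock (in uL) that gives `target` in `final_volume_ul` (C1*V1 = C2*V2)."""
    if stock <= 0 or not 0 <= target <= stock:
        raise ValueError("target must lie between 0 and the stock concentration")
    return target / stock * final_volume_ul


def recipe(final_volume_ul):
    """Per-component stock volumes plus the remaining diluent volume, all in uL."""
    volumes = {
        name: stock_volume_ul(TARGETS[name], STOCKS[name], final_volume_ul)
        for name in TARGETS
    }
    volumes["diluent"] = final_volume_ul - sum(volumes.values())
    return volumes


if __name__ == "__main__":
    for name, vol in recipe(1000.0).items():
        print(f"{name}: {vol:.1f} uL")
```

Swapping in the actual stock concentrations on hand is the only change needed to reuse the sketch for a different bench setup.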
- the composition is used for performing at least one of lysing the cell, storing the cell and/or genomic materials of the cell, and preparing the sample for downstream amplification of the genomic materials of the cell.
- amplification may comprise or be isothermal amplification.
- amplification may comprise primary template amplification (PTA).
- the composition e.g., the storage buffer
- the composition is used for lysing a cell, extracting the genomic materials of the cell, storing the cell or genomic materials of the cell, and preparing the sample for amplification of the genomic materials of the cell.
- a sample comprising one or more cells or a single cell is prepared and stored in the composition. In some cases, optionally, the sample may be shipped from one place to another during storage.
- storing comprises storing for at least about 1 hour, 4 hours, 8 hours, 12 hours, 24 hours, at least about 48 hours, at least about 72 hours or longer.
- storage may comprise storage for at least 1 day, 2 days, 3 days, 4 days, 5 days, 6 days, 7 days, 8 days, 9 days, 10 days, or longer.
- the cells are stored for about 1-5, 1-10, 1-12, 1-18, 1-24, 3-8, 3-12, 6-12, 6-24, 12-24, 12-36, 12-48, 12-72, 24-36, 24-48, 24-72, or 48-72 hours.
- the cells are stored for about 1-3, 1-7, 1-10, 2-5, 2-8, 3-5, 3-10, 5-7, 5-10, or 5-14 days. In some examples, the cells are stored for about one day or about seven days as provided for example in FIGs. 14A-14E. In some cases, storage may comprise storage at a temperature of at least about -20 °C, -10 °C, 0 °C, 5 °C, 10 °C, 20 °C, 25 °C, or higher. In some examples, the cells are stored at about -20 °C to 0 °C, -20 °C to 10 °C, 0 °C to 10 °C, or 0 °C to 25 °C.
- the cells are stored at about -20 °C, about 6-10 °C, or at room temperature, as provided for example in FIGs. 14A-14E.
- storage comprises long-term storage.
- sample may be shipped during storage.
- the temperature changes or fluctuates during storage.
- the sample and/or components thereof remain substantially stable during storage despite shipment or changes or fluctuations in storage conditions such as temperature, humidity, and pressure.
- the composition (e.g., storage buffer) is configured for storing a sample comprising one or more cells or constituents of one or more cells.
- the cells are stored using a stability buffer, as provided herein.
- the stability buffer stabilizes nucleic acids across varying cell amounts, temperatures, and time, as shown for example in FIGs. 14A-14E.
- the samples (e.g., cells or genomic material therein) in the stability buffer demonstrate improved recovery of genes compared to samples not stored in the stability buffer (e.g., FIG. 15).
- the samples (e.g., cells or genomic material therein) in the stability buffer demonstrate enhanced recovery of mRNA compared to samples not stored in the stability buffer (e.g., FIGs. 16A-B).
- the one or more cells comprise at most about 20, at most about 15, or at most about 10 cells. In some examples, the one or more cells is a single cell.
- a temperature variation of at least about 1%, at least about 5%, at least about 10%, at least about 20%, at least about 30%, at least about 40%, at least about 50%, at least about 60% or greater may occur during storage and/or shipment.
- Such temperature changes or variation may in some cases be non-ideal or unintentional.
- the sample stored in the composition of the present disclosure may be substantially insensitive and/or resilient to such change(s) in temperature.
- the properties of the sample, the stability of the cell constituents such as genomic materials of the cells, or other sample properties may not be compromised as a result of such temperature variation during storage.
- compositions may be better suited for cell storage or shipment than pre-existing formulations.
- Such compositions can significantly reduce waste and inefficiencies in the experimental workflows performed on the samples of the present disclosure, such as isothermal amplification.
- the compositions may also increase the duration for which the sample can be stored, the distances the sample can be shipped, and the range of conditions that the sample can sustain during storage and shipment.
- a stability buffer as described herein can stabilize nucleic acids across varying cell amount, temperatures, or time.
- the composition may lyse the cell and extract the genomic contents of the cell. In some instances, the composition may further preserve the genomic contents of the cells during storage. During storage (e.g., long term storage as described anywhere herein), the stability of the genomic materials of the cells may be preserved by the composition. For example, the chromatin, DNA, RNA, or any combination thereof of the one or more cells may remain stable during storage and/or potential shipment.
- the composition (e.g., storage buffer) further comprises a second salt different from the salt.
- the second salt stimulates the activity of the protease.
- the second salt may comprise or be a divalent cation such as calcium chloride (CaCl2), magnesium chloride (MnCl2), or cobalt chloride (CoCl2).
- the composition may comprise any suitable amount of the second salt, in some examples, at least about 0.1 mM, at least about 0.2 mM, at least about 0.3 mM, at least about 0.4 mM, at least about 0.5 mM, at least about 0.6 mM, at least about 0.7 mM or more.
- kits comprising the composition of any one of the preceding embodiments.
- the kit comprises a second composition (e.g., a neutralizing buffer) different from the composition (e.g., the storage buffer).
- the second composition comprises a neutralizing buffer comprising a component that substantially neutralizes the ionic surfactant.
- the kit further comprises instructions for storing a cell.
- the kit further comprises instructions for preparing the genomic contents of the cell for downstream amplification using the composition and the second composition.
- the composition is used to store the cell and the second composition is used to neutralize the composition.
- neutralizing the composition improves the results of a downstream amplification performed on the cell.
- the second composition comprises a non-ionic surfactant, a zwitterionic surfactant, a charge-neutral surfactant, or any combination thereof.
- non-ionic surfactant may comprise a polysorbate, a poly(ethylene glycol) derivative, or both.
- the non-ionic surfactant comprises Tween (e.g., Tween-20, Tween-40, Tween-60, Tween-80) or Triton (e.g., Triton-X-100).
- One function of the non-ionic surfactant may be neutralizing the ionic surfactant.
- the composition may be referred to as storage buffer and the second composition may be referred to as neutralizing buffer.
- the non-ionic surfactant is maintained in one or more downstream reactions.
- a presence of a non-ionic surfactant can be maintained when amplifying cfDNA.
- the cfDNA is fragmented or degraded.
- the cfDNA is amplified in the presence of a stability buffer and the non-ionic surfactant, for example, as shown in FIG. 17.
- the kit further comprises additional buffers.
- the kit may further comprise reagents for performing amplification, such as isothermal amplification or PTA.
- the kit may comprise reagents for performing genomics and/or multiomics analysis.
- the kit may comprise ResolveOME™ reagents from BioSkryb Genomics.
- the kit may comprise reverse transcription (RT) buffers.
- a reverse transcription buffer may comprise a salt (e.g., Tris-HCl), a divalent cation, DTT, Tween, Triton, dATP, dCTP, dTTP, dGTP, ammonium sulfate, or any combination thereof.
- the reverse transcription buffer may comprise a salt such as Tris-HCl at a concentration of from about 30 mM to about 80 mM, for example 60 mM.
- the RT buffer may comprise the divalent cation at a concentration from about 8 mM to about 20 mM, in an example 12 mM.
- the RT buffer may comprise DTT at a concentration from about 10 mM to about 50 mM, in an example 20 mM.
- the RT buffer may comprise a non-ionic surfactant, such as a polysorbate, such as Tween (e.g., Tween-20) at a concentration from about 1% to about 10%, in an example, about 5%.
- the RT buffer may further comprise Triton at a concentration from about 0% to about 5%, in an example 0.5%.
- the RT buffer may comprise dATP, dCTP, dTTP, dGTP, GTP, or any combination thereof, each at a concentration of from about 1 mM to about 10 mM, or from about 3 mM to about 6 mM, in an example, from about 4 mM to about 5 mM.
- the RT buffer may further comprise Ammonium Sulfate at a concentration from about 10 mM to 100 mM, at least about 20 mM, at least about 30 mM, at least about 40 mM, at least about 50 mM, at least about 60 mM, or higher, in an example, 75 mM. Examples of RT buffer recipes are provided in FIG. 5A.
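The RT buffer ranges listed above lend themselves to a simple sanity check before a recipe is prepared. The sketch below is illustrative only: the dictionary keys, the `out_of_range` helper, and the choice of 4.5 mM per dNTP (a midpoint of the stated 4 mM to 5 mM example) are assumptions, and the recipes in FIG. 5A are not reproduced here.

```python
# Concentration ranges stated in the text: (low, high), units given in each key.
RT_RANGES = {
    "Tris-HCl (mM)": (30.0, 80.0),
    "divalent cation (mM)": (8.0, 20.0),
    "DTT (mM)": (10.0, 50.0),
    "Tween-20 (%)": (1.0, 10.0),
    "Triton (%)": (0.0, 5.0),
    "each dNTP (mM)": (1.0, 10.0),
    "ammonium sulfate (mM)": (10.0, 100.0),
}

# Example values from the text; 4.5 mM per dNTP is an assumed midpoint.
EXAMPLE_RECIPE = {
    "Tris-HCl (mM)": 60.0,
    "divalent cation (mM)": 12.0,
    "DTT (mM)": 20.0,
    "Tween-20 (%)": 5.0,
    "Triton (%)": 0.5,
    "each dNTP (mM)": 4.5,
    "ammonium sulfate (mM)": 75.0,
}


def out_of_range(recipe):
    """Return the components of `recipe` whose value falls outside RT_RANGES."""
    flagged = {}
    for name, value in recipe.items():
        low, high = RT_RANGES[name]  # unknown component names raise KeyError
        if not low <= value <= high:
            flagged[name] = value
    return flagged


if __name__ == "__main__":
    print(out_of_range(EXAMPLE_RECIPE))      # no component flagged
    print(out_of_range({"DTT (mM)": 5.0}))   # DTT below the stated range
```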
- a method of cell analysis comprising: (a) providing or obtaining a sample comprising one or more cells stored in a composition comprising: (i) a salt; (ii) an ionic surfactant; (iii) a protease; (iv) a reducing agent; and (v) a chelator.
- the method may further comprise (b) amplifying genomic materials of the one or more cells; and (c) performing genomic analysis on the genomic materials of the one or more cells.
- the composition may be any composition described anywhere throughout the present disclosure. Genomic analysis may comprise performing isothermal amplification.
- the one or more cells comprises from 1 to about 10 cells.
- the sample has a single cell.
- the sample comprises at most about 1000, at most about 800, at most about 600, at most about 400, at most about 200, at most about 100, at most about 80, at most about 60, at most about 40, at most about 20, at most about 10, at most about 8, at most about 6, at most about 4, a smaller number of cells, or a single cell.
- the method does not require column filtration and is amenable to retrieval of the genomic materials of the one or more cells from the sample.
- the sample loss caused by the methods and compositions may be substantially minimal. In some cases, certain washing protocols, or multiple washing steps may lead to sample loss or cell loss from samples.
- Samples containing small numbers of cells or single cells may be sensitive to filtration and/or washing steps.
- the compositions of the present disclosure may make it possible to prepare and store the sample without column filtration. This may reduce or prevent sample loss or cell loss from the sample. This may improve the results of genomic analysis performed on the sample. This may enable sample storage for longer periods, retrieval of genomic contents from the sample with minimal to no compromise thereto, facilitate safe shipment of the sample, and reduce the sensitivity of the sample to storage conditions such as temperature, humidity, and pressure.
- the compositions may work particularly well for samples containing small numbers of cells or single cells as described elsewhere herein, filling an existing gap in prior technologies.
- the one or more cells is a single cell
- genomic analysis is single cell genomics or multiomics, wherein the composition is configured for storing the single cell and preparing the genomic contents of the single cell for downstream amplification.
- Amplification may be isothermal amplification.
- the method further comprises sorting the cells prior to storage.
- the method further comprises neutralizing the composition with a second composition.
- the second composition comprises a neutralizing buffer capable of neutralizing the ionic surfactant.
- non-ionic surfactant may comprise a polysorbate, a poly(ethylene glycol) derivative, or both.
- the neutralizing buffer comprises Tween (e.g., Tween-20, Tween-40, Tween-60, or Tween-80) or Triton (e.g., Triton-X-100).
- the composition may be referred to as storage buffer and the second composition may be referred to as neutralizing buffer.
- the non-ionic surfactant is maintained in one or more downstream reactions.
- a presence of a non-ionic surfactant can be maintained when amplifying cfDNA.
- the cfDNA is fragmented or degraded.
- the cfDNA is amplified in the presence of a stability buffer and the non-ionic surfactant, for example, as shown in FIG. 17.
- the ionic surfactant comprises from about 0.00001 to about 1 volume percent of the composition. In some examples, the ionic surfactant comprises from about 0.03 to about 0.09 volume percent of the composition. In some examples, the ionic surfactant comprises from about 0.01 to about 0.05 volume percent of the composition.
- the protease comprises from about 0.01 units to about 5 units per milliliter of the composition.
- the ionic surfactant comprises or is sodium dodecyl sulfate (SDS), sodium lauryl sulfate (SLS), sodium laureth sulfate (SLES), ammonium lauryl sulfate (ALS), ammonium laureth sulfate (ALES), sodium stearate, potassium cocoate, or any combination thereof.
- the protease comprises or is Proteinase K.
- the salt comprises or is Tris-HCl, TES, or HEPES.
- the chelator comprises or is Ethylenediaminetetraacetic acid (EDTA) or ethylene glycol tetraacetic acid (EGTA).
- the composition further comprises a second salt different from the salt, wherein the second salt is capable of stimulating the activity of the protease.
- the second salt comprises or is a divalent cation.
- the second salt comprises or is calcium chloride.
- the method further comprises lysing the one or more cells and extracting the genomic materials thereof.
- cells may be lysed after storage.
- Genomic materials may comprise the whole genomes of the one or more cells.
- genomic materials comprise nucleic acid molecules.
- nucleic acid molecules comprise ribonucleic acid (RNA), deoxyribonucleic acid (DNA), or both.
- the method may further comprise performing genomic analysis on the one or more cells.
- Genomic analysis may comprise or be single cell genomics or multiomics.
- genomic analysis comprises isothermal amplification, such as primary template amplification (PTA).
- the methods, systems, and compositions for cell storage are compatible with single cell samples or samples with small cell numbers, long term storage and shipment, fluctuation or changes in the storage conditions such as temperature, pressure, humidity, and other conditions, downstream amplification of genomic materials of the cells upon or after storage, and any combination thereof, such as integrated workflows comprising any combination or all of the aforementioned procedures.
- the methods may be performed using any composition, buffer recipe, and kit described anywhere herein.
- the cell storage and neutralization buffers provide sample quality and sequencing metrics after storage that are comparable to those of freshly harvested cells.
- methods of cell analysis comprising one or more steps of: providing a single cell stored in a buffer for a time period of at least 1 day; lysing the single cell; amplifying mRNA transcripts and genomic DNA from the cell to generate cDNA and genomic DNA libraries using a neutralization buffer provided herein, respectively; and sequencing mRNA transcripts and genomic DNA from the cell to obtain one or more sequencing metrics, wherein one or more of: the yield of pre-amplification of cDNA or genomic DNA comprise values within 10x of values obtained when compared to storage conditions having a time period of less than 1 day; the average fragment size of pre-amplification of cDNA or genomic DNA comprise values within 10x of values obtained when compared to storage conditions having a time period of less than 1 day; and the sequencing metrics comprise values within 10x of values obtained when compared to storage conditions having a time
- sequencing metrics comprises one or more Picard Metrics. In some instances, sequencing metrics comprises one or more of PreSeq, protein coding transcripts, proportion exonic sequences, ratio transcript body, sequencing coverage, fold-80 base penalty, dropouts, percent chimera, and Gini index.
- the yield of pre-amplification of cDNA or genomic DNA comprise values within 5x of values obtained when compared to storage conditions having a time period of less than 1 day; the average fragment size of pre-amplification of cDNA or genomic DNA comprise values within 5x of values obtained when compared to storage conditions having a time period of less than 1 day; and the sequencing metrics comprise values within 5x of values obtained when compared to storage conditions having a time period of less than 1 day.
- the yield of pre-amplification of cDNA or genomic DNA comprise values within 2x of values obtained when compared to storage conditions having a time period of less than 1 day; the average fragment size of pre-amplification of cDNA or genomic DNA comprise values within 2x of values obtained when compared to storage conditions having a time period of less than 1 day; and the sequencing metrics comprise values within 2x of values obtained when compared to storage conditions having a time period of less than 1 day.
- the single cell is suspended in a composition provided herein.
- the yield of pre-amplification of cDNA or genomic DNA comprise values about the same as values obtained when compared to storage conditions having a time period of less than 1 day; the average fragment size of pre-amplification of cDNA or genomic DNA comprise values about the same as values obtained when compared to storage conditions having a time period of less than 1 day; and the sequencing metrics comprise values about the same as values obtained when compared to storage conditions having a time period of less than 1 day.
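The "within 10x", "within 5x", and "within 2x" comparisons above can be read as a fold-change test between a stored sample and a reference stored for less than 1 day. The sketch below assumes one possible reading, namely that the stored value lies between reference/N and reference×N; the function name and the numbers in the usage example are hypothetical and are not data from the disclosure.

```python
def within_fold(stored, fresh, fold):
    """True if `stored` is within `fold`-x of `fresh`, in either direction.

    Assumed interpretation: fresh / fold <= stored <= fresh * fold.
    """
    if fold < 1 or stored <= 0 or fresh <= 0:
        raise ValueError("fold must be >= 1 and both values must be positive")
    ratio = stored / fresh
    return 1.0 / fold <= ratio <= fold


if __name__ == "__main__":
    # Hypothetical yields (ng): stored for more than 1 day vs. less than 1 day.
    stored_yield, fresh_yield = 450.0, 1000.0
    for fold in (10, 5, 2):
        print(fold, within_fold(stored_yield, fresh_yield, fold))
```

The same check applies unchanged to average fragment size or to any individual sequencing metric, provided both values are positive.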
- the time period is at least 10 days.
- the time period is 1-30, 1-20, 1-15, 1-10, 3-20, 3-15, 3-12, 3-10, 3-8, 3-5, 5-20, 5-15, 5-10, 8-30, 8-20, 8-15, 8-12, or 10-20 days.
- the single cell is an embryonic cell.
- the single cell is a human embryonic cell.
- providing comprises shipping or storing.
- the time period is at least 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, or at least 20 days.
- the single cell was stored at a storage temperature.
- the storage temperature is from about -20 to about 30 degrees Celsius (°C), optionally with a variation of from about 0 to about 20%.
- Methods provided herein in some instances result in improved downstream processing outcomes. In some instances, improvements are obtained vs. a standard cell buffer, such as PBS.
- use of storage buffers provided herein results in an increase of 5, 10, 15, 20, 25, 30, 50, or at least 75% in total yield of DNA obtained from a PTA reaction, relative to use without the storage buffer. In some instances, use of storage buffers provided herein results in an increase of 5, 10, 15, 20, 25, 30, 50, or at least 75% in total yield of cDNA obtained from a multiomics PTA workflow, relative to use without the storage buffer.
- use of storage buffers provided herein results in an increase of 5, 10, 15, 20, 25, 30, 50, or at least 75% in the number of detected protein coding genes obtained from a multiomics PTA workflow, relative to use without the storage buffer. In some instances, use of storage buffers provided herein results in an increase of 5, 10, 15, 20, 25, 30, 50, or at least 75% in the proportion of exonic genes from a multiomics PTA workflow, relative to use without the storage buffer.
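The percent increases above are relative to the same workflow run without the storage buffer. A minimal sketch of that arithmetic follows; the function names and the yield values in the usage example are hypothetical and do not come from the figures.

```python
def percent_increase(with_buffer, without_buffer):
    """Relative increase, in percent, of `with_buffer` over `without_buffer`."""
    if without_buffer <= 0:
        raise ValueError("the reference value must be positive")
    return (with_buffer - without_buffer) / without_buffer * 100.0


def meets_threshold(with_buffer, without_buffer, threshold_pct):
    """True if the increase is at least `threshold_pct` percent."""
    return percent_increase(with_buffer, without_buffer) >= threshold_pct


if __name__ == "__main__":
    # Hypothetical total yields (ng) with and without the storage buffer.
    print(percent_increase(1500.0, 1000.0))
    print(meets_threshold(1500.0, 1000.0, 75.0))
```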
- methods of amplifying a fragmented or degraded nucleic acid sample comprising one or more steps of: (a) providing or obtaining a sample comprising fragmented or degraded nucleic acids; (b) suspending the fragmented or degraded nucleic acids in a composition described herein; (c) amplifying genomic materials of the one or more cells, wherein a non-ionic surfactant is maintained during amplification; and (d) performing genomic analysis on the genomic materials of the one or more cells.
- the nucleic acid sample comprises ctDNA.
- the nucleic acid sample comprises cfDNA.
- the composition is present at 10X concentration.
- methods wherein a total yield of amplicons is within 5, 10, 15, 20, 25, 30, 40, 50, or within 60% compared to amplification without the composition.
- a method comprises one or more steps of isolating a single cell from a population of cells, wherein the single cell comprises RNA and genomic DNA; amplifying the RNA by RT-PCR to generate a cDNA library; isolating the cDNA from the genomic DNA; contacting the genomic DNA with at least one amplification primer, at least one nucleic acid polymerase, and a mixture of nucleotides; isolating the cDNA from a genomic library, and sequencing the cDNA library and the genomic DNA library.
- the mixture of nucleotides comprises at least one nucleotide configured for digestion (or removal, or reaction) by an enzyme or chemical process. In some instances, the mixture of nucleotides comprises dUTP. In some instances, the mixture of nucleotides comprises at least one terminator nucleotide which terminates nucleic acid replication by the polymerase to generate a genomic DNA library. In some instances, a terminator nucleotide comprises an irreversible terminator. In some instances, an irreversible terminator inhibits or is resistant to 3’ to 5’ exonuclease activity. In some instances, a multiomic experiment comprises measuring expression levels of a panel of mRNAs.
- PTA may be used as a replacement for any number of other known methods in the art which are used for single cell sequencing (multiomics or the like).
- PTA may substitute for genomic DNA sequencing methods such as MDA, PicoPlex, DOP-PCR, MALBAC, or target-specific amplifications.
- PTA replaces the standard genomic DNA sequencing method in a multiomics method including DR-seq (Dey et al., 2015), G&T seq (MacAulay et al., 2015), scMT-seq (Hu et al., 2016), sc-GEM (Cheow et al., 2016), scTrio-seq (Hou et al., 2016), simultaneous multiplexed measurement of RNA and proteins (Darmanis et al., 2016), scCOOL-seq (Guo et al., 2017), CITE-seq (Stoeckius et al., 2017), REAP-seq (Peterson et al., 2017), scNMT-seq (Clark et al., 2018), or SIDR-seq (Han et al., 2018).
- a method described herein comprises PTA and a method of polyadenylated mRNA transcripts. In some instances, a method described herein comprises PTA and a method of non-polyadenylated mRNA transcripts. In some instances, a method described herein comprises PTA and a method of total (polyadenylated and non-polyadenylated) mRNA transcripts.
- PTA is combined with a standard RNA sequencing method to obtain genome and transcriptome data.
- a multiomics method described herein comprises PTA and one of the following: Drop-seq (Macosko, et al.
- an RT reaction mix is used to generate a cDNA library.
- the RT reaction mixture comprises a crowding reagent, at least one primer, a template switching oligonucleotide (TSO), a reverse transcriptase, and a dNTP mix.
- an RT reaction mix comprises an RNAse inhibitor.
- an RT reaction mix comprises one or more surfactants.
- an RT reaction mix comprises Tween-20 and/or Triton-X.
- an RT reaction mix comprises Betaine.
- an RT reaction mix comprises one or more salts.
- an RT reaction mix comprises a magnesium salt (e.g., magnesium chloride) and/or tetramethylammonium chloride.
- an RT reaction mix comprises gelatin.
- an RT reaction mix comprises PEG (PEG1000, PEG2000, PEG4000, PEG6000, PEG8000, or PEG of other length).
- Multiomic methods described herein may provide both genomic and RNA transcript information from a single cell (e.g., a combined or dual protocol).
- genomic information from the single cell is obtained from the PTA method, and RNA transcript information is obtained from reverse transcription to generate a cDNA library.
- a whole transcript method is used to obtain the cDNA library.
- 3’ or 5’ end counting is used to obtain the cDNA library.
- cDNA libraries are not obtained using UMIs.
- a multiomic method provides RNA transcript information from the single cell for at least 500, 1000, 2000, 5000, 8000, 10,000, 12,000, or at least 15,000 genes.
- a multiomic method provides RNA transcript information from the single cell for about 500, 1000, 2000, 5000, 8000, 10,000, 12,000, or about 15,000 genes. In some instances, a multiomic method provides RNA transcript information from the single cell for 100-12,000, 1000-10,000, 2000-15,000, 5000-15,000, 10,000-20,000, 8000-15,000, or 10,000-15,000 genes.
- samples that are stored or analyzed according to the present disclosure express one or more genes.
- the one or more genes can comprise housekeeping genes that demonstrate a signature of a clinical sample.
- the one or more genes can comprise GAPDH, ACTB, ACTG1, RPL36, HINT1, TBP, PPIA, HPRT1, UBC, RPL13A, PGK1, EIF3K, RPLP0, GUSB, CLTC, HMBS, EEF1A1, MALAT1, or any combination thereof (e.g., FIG. 15).
- the one or more genes can comprise ATP2B1, RABL6, LGMN, CREG1, PTMS, MTND2P28, MT-ND4L, MT-ATP8, MTCO1P12, MTND1P23, DNMT3L, FTH1, MT-ATP8, RPL26, and H4C3, or any combination thereof (e.g., FIG. 16A).
- the one or more genes can comprise ATP2B1, WAC, RABL6, LGMN, CREG1, EMC2, PTMS, PRMT2, GTF3C3, ZNF888, MT-ND1, MTND2P28, MT-CYB, MT-ND4L, MT-ND6, MT-ATP8, MT-ND5, MTCO1P12, MTDATP6P1, MTND1P23, RPS5, RPS27, DNMT3L, MTATP6P1, DNAJC15, FTH1, MT-ATP8, RPL26, H4C3, NDUFAB1, or any combination thereof (e.g., FIG. 16B).
- recovery of the one or more genes is improved if the sample is stored or prepared using a stability buffer of the present disclosure.
- a multiomic method provides genomic sequence information for at least 80%, 90%, 92%, 95%, 97%, 98%, or at least 99% of the genome of the single cell. In some instances, a multiomic method provides genomic sequence information for about 80%, 90%, 92%, 95%, 97%, 98%, or about 99% of the genome of the single cell.
- RNA may be amplified in the multiomics methods described herein. In some instances, RNA is amplified to isolate mRNA transcripts. In some instances, template-switching polynucleotides are used. In some instances, amplification of RNA uses labeled primers. In some instances, a label comprises biotin.
- cDNA polynucleotides are isolated with affinity binding to the label.
- multiomics methods comprise amplification of RNA to generate a cDNA library.
- a cDNA library is generated having at least 10, 20, 30, 50, 75, 100, 125, 150, 175, 200, 225, 250, 300, 350, 400, 500, 600, 700, 800, 900 or at least 1,000 ng of DNA.
- a cDNA library is generated having 10-500, 20-500, 30-500, 50-500, 50-400, 50-300, 100-500, 100-400, 100-300, 100-200, 200-500, 300-500, 400-750, 500-1,000, 600-1,200, 800-1,500, or 1,000-1,500 ng of DNA.
- at least some polynucleotides in the cDNA library comprise a barcode.
- the cDNA comprises polynucleotides corresponding to at least 100, 500, 1000, 1500, 2000, 2500, 3000, 3500, or at least 4000 genes.
- the cDNA comprises a 5’ to 3’ transcript bias of 0.5-1.5, 0.6-1.5, 0.7-1.5, 0.8-1.5, 0.9-1.5, 1-1.5, 1-2.0, 1.2-2.0, or 0.5-2.0.
- Multiomic methods may comprise analysis of single cells from a population of cells. In some instances, at least 5, 10, 20, 50, 100, 200, 500, 1000, 2000, 5000, or at least 8000 cells are analyzed. In some instances, about 5, 10, 20, 50, 100, 200, 500, 1000, 2000, 5000, or about 8000 cells are analyzed. In some instances, 5-100, 10-100, 50-500, 100-500, 100-1000, 50-5000, 100-5000, 500-1000, 500-10000, 1000-10000, or 5000-20,000 cells are analyzed.
- Multiomic methods may generate yields of genomic DNA from the PTA reaction based on the type of single cell.
- the amount of DNA generated from a single cell is about 0.1, 1, 1.5, 2, 3, 5, or about 10 micrograms.
- the amount of DNA generated from a single cell is about 0.1, 1, 1.5, 2, 3, 5, or about 10 femtograms.
- the amount of DNA generated from a single cell is at least 0.1, 1, 1.5, 2, 3, 5, or at least 10 micrograms.
- the amount of DNA generated from a single cell is at least 0.1, 1, 1.5, 2, 3, 5, or at least 10 femtograms.
- the amount of DNA generated from a single cell is about 0.1-10, 1-10, 1.5-10, 2-20, 2-50, 1-3, or 0.5-3.5 micrograms. In some instances, the amount of DNA generated from a single cell is about 0.1-10, 1-10, 1.5-10, 2-20, 2-4, 1-3, or 0.5-4 femtograms. In some instances, the amount of DNA generated from a single cell is about 0.5-2.5, 0.5-3, 0.5-5, 0.2-5, 1-2.5, or 1-5 ng of DNA. In some instances, the amount of DNA generated from a single cell is at least 0.25, 0.5, 0.75, 1, 1.25, 1.5, 1.75, 2, 2.25, 2.5, 2.75, 3, 3.25, 3.5, 4, or at least 5 ng of DNA.
- DNA libraries may comprise an allelic balance.
- the allelic balance is 50-100, 60-100, 70-100, 80-100, 60-95, 70-95, 80-95, 85-95, 90-95, 90-98, 90-99, 85-99, or 95-99 percent.
- the allelic balance is at least 50, 60, 70, 80, 83, 85, 87, 90, 92, 95, 98, or at least 99 percent.
- DNA libraries may comprise a sensitivity for one or more SNVs.
- the sensitivity is 0.50-1, 0.60-1, 0.70-1, 0.80-1, 0.60-0.95, 0.70-0.95, 0.80-0.95, 0.85-0.95, 0.90-0.95, 0.90-0.98, 0.90-0.99, 0.85-0.99, or 0.95-0.99.
- the sensitivity is at least 0.50, 0.60, 0.70, 0.80, 0.83, 0.85, 0.87, 0.90, 0.92, 0.95, 0.98, or at least 0.99.
- DNA libraries may comprise a precision for one or more SNVs.
- the precision is 0.50-1, 0.60-1, 0.70-1, 0.80-1, 0.60-0.95, 0.70-0.95, 0.80-0.95, 0.85-0.95, 0.90-0.95, 0.90-0.98, 0.90-0.99, 0.85-0.99, or 0.95-0.99.
- the precision is at least 0.50, 0.60, 0.70, 0.80, 0.83, 0.85, 0.87, 0.90, 0.92, 0.95, 0.98, or at least 0.99.
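The SNV sensitivity and precision values above can be illustrated with a minimal sketch. It assumes the conventional definitions (sensitivity = TP / (TP + FN), precision = TP / (TP + FP)), which the source does not spell out; the function name `snv_metrics` and the example variant tuples are hypothetical.

```python
def snv_metrics(called, truth):
    """Return (sensitivity, precision) for a set of SNV calls.

    Assumed definitions (not stated in the source):
      sensitivity = TP / (TP + FN), precision = TP / (TP + FP).
    """
    called, truth = set(called), set(truth)
    tp = len(called & truth)   # variants called and present in the truth set
    fn = len(truth - called)   # true variants that were missed
    fp = len(called - truth)   # calls not supported by the truth set
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    return sensitivity, precision


# Hypothetical variants encoded as (chromosome, position, ref, alt).
truth = {("chr1", 100, "A", "G"), ("chr1", 250, "C", "T"),
         ("chr2", 40, "G", "A"), ("chr3", 7, "T", "C")}
called = {("chr1", 100, "A", "G"), ("chr1", 250, "C", "T"),
          ("chr2", 40, "G", "A"), ("chr5", 9, "A", "C")}

sensitivity, precision = snv_metrics(called, truth)
print(sensitivity, precision)
```

In this made-up example three of four true variants are recovered and one call is spurious, so both metrics fall below the "at least 0.80" values listed above.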
- methylome analysis comprises identifying the location of methylated bases (e.g., methylC, hydroxymethylC). In some instances, these methods further comprise parallel analysis of the transcriptome, methylome, and/or proteome of the same cell.
- Methods of detecting methylated genomic bases include selective restriction with methylation-sensitive restriction enzymes or endonucleases, followed by processing with the PTA method. Sites cut by such enzymes are determined from sequencing, and methylated bases are identified. In another instance, bisulfite treatment of genomic DNA libraries converts unmethylated cytosines to uracil.
- Libraries are then in some instances amplified with methylation-specific primers which selectively anneal to methylated sequences.
- non-methylation-specific PCR is conducted, followed by one or more methods to discriminate between bisulfite-reacted bases, including direct pyrosequencing, MS-SnuPE, HRM, COBRA, MS-SSCA, or base-specific cleavage/MALDI-TOF.
- genomic DNA samples are split for parallel analysis of the genome (or an enriched portion thereof) and methylome analysis.
- analysis of the genome and methylome comprises enrichment of genomic fragments (e.g., exome, or other targets) or whole genome sequencing.
- methylated bases in a genomic sample are identified by (a) conversion of a methylated base to a different base, or (b) conversion of a non-methylated base to a different base. Such conversions in some instances are performed on whole genomes or genomic fragments. The resulting sequences are then compared to a reference sequence (obtained without conversion/treatment) to identify which bases are methylated.
- a conversion method (or process) comprises treatment with a deamination reagent.
- a conversion method comprises treatment with bisulfite.
- one or more enzymes are used to selectively discriminate between methylated and unmethylated bases.
- enzymes comprise TET (ten eleven translocation) family enzymes.
- a TET family enzyme comprises TET2.
- enzymes comprise T4-BGT.
- a conversion method comprises treatment with a reagent to protect methylcytosines (e.g., TET2 for oxidation), followed by treatment with an enzyme to deaminate unprotected cytosines (e.g., APOBEC). Additional reagents which differentiate methylated and non-methylated bases are also consistent with the methods disclosed herein.
- unmethylated cytosines are converted to uracil.
- amplification of these uracil-containing modified genomes results in conversion of uracil to thymine.
- amplification comprises use of uracil tolerant polymerases described herein.
- adapters described herein are modified to replace cytosines with methylcytosines or other base which resists conversion.
- the methods may comprise single cell bisulfite sequencing and reduced representation bisulfite sequencing.
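The conversion-and-compare logic described above (unmethylated cytosines read as thymine after conversion and amplification, then comparison against an untreated reference) can be sketched as follows. This is a simplified illustration that assumes a single strand, complete conversion, and no sequencing errors; the function names and the example sequence are hypothetical.

```python
def bisulfite_convert(seq, methylated_positions):
    """Simulate conversion plus amplification on one strand.

    Unmethylated C is converted to U and read as T after amplification;
    methylated C (0-based indices in methylated_positions) is left as C.
    Assumes 100% conversion efficiency.
    """
    out = []
    for i, base in enumerate(seq):
        if base == "C" and i not in methylated_positions:
            out.append("T")
        else:
            out.append(base)
    return "".join(out)


def call_methylated(reference, converted):
    """Return positions where a reference C is still read as C."""
    if len(reference) != len(converted):
        raise ValueError("sequences must be the same length")
    return [i for i, (r, c) in enumerate(zip(reference, converted))
            if r == "C" and c == "C"]


reference = "ACGTCCGA"                       # untreated reference sequence
converted = bisulfite_convert(reference, {1, 5})
methylated = call_methylated(reference, converted)
print(converted, methylated)
```

Real data would also require handling the complementary strand (G-to-A changes) and incomplete conversion, which this sketch omits.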
- the data obtained from single-cell analysis methods such as whole genome amplification, in some cases isothermal amplification, and in some cases PTA described herein may be compiled into a database.
- Data from the proteome, genome, transcriptome, methylome or other data is in some instances combined/integrated into a database and analyzed.
- Bioinformatic data integration methods and systems in some instances comprise one or more of protein detection (FACS and/or NGS), mRNA detection, and/or genome variance detection. In some instances, this data is correlated with a disease state or condition.
- data from a plurality of single cells is compiled to describe properties of a larger cell population, such as cells from a specific sample, region, organism, or tissue.
- protein data is acquired from fluorescently labeled antibodies which selectively bind to proteins on a cell.
- a method of protein detection comprises grouping cells based on fluorescent markers and reporting sample location post-sorting.
- a method of protein detection comprises detecting sample barcodes, detecting protein barcodes, comparing to designed sequences, and grouping cells based on barcode and copy number.
- protein data is acquired from barcoded antibodies which selectively bind to proteins on a cell.
- transcriptome data is acquired from sample and RNA specific barcodes.
- a method of mRNA detection comprises detecting sample and RNA specific barcodes, aligning to genome, aligning to RefSeq/Encode, reporting Exon/Intron/Intergenic sequences, analyzing exon-exon junctions, grouping cells based on barcode and expression variance, and clustering analysis of variance and top variable genes.
- genomic data is acquired from sample and DNA specific barcodes.
- a method of genome variance detection comprises detecting sample and DNA specific barcodes, aligning to the genome, determining genome recovery and SNV mapping rate, filtering reads on exon-exon junctions, generating a variant call format (VCF) file, and clustering analysis of variance and top variable mutations.
- a mutation is a difference between an analyzed sequence (e.g., using the methods described herein) and a reference sequence.
- Reference sequences are in some instances obtained from other organisms, other individuals of the same or similar species, populations of organisms, or other areas of the same genome.
- mutations are identified on a plasmid or chromosome.
- a mutation is an SNV (single nucleotide variation), SNP (single nucleotide polymorphism), or CNV (copy number variation, or CNA/copy number aberration).
- a mutation is base substitution, insertion, or deletion. In some instances, a mutation is a transition, transversion, nonsense mutation, silent mutation, synonymous or non-synonymous mutation, non-pathogenic mutation, missense mutation, or frameshift mutation (deletion or insertion).
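As an illustration of the definition above (a mutation as a difference between an analyzed sequence and a reference sequence), the following minimal sketch lists base substitutions between two equal-length sequences and labels each as a transition or transversion. Insertions, deletions, and alignment are out of scope here; the function names and example sequences are hypothetical.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def classify_substitution(ref, alt):
    """Label a single-base substitution as a transition or transversion."""
    if ref == alt:
        raise ValueError("ref and alt are identical; not a substitution")
    if not {ref, alt} <= (PURINES | PYRIMIDINES):
        raise ValueError("only A, C, G, T are supported")
    # Purine<->purine or pyrimidine<->pyrimidine changes are transitions.
    if {ref, alt} <= PURINES or {ref, alt} <= PYRIMIDINES:
        return "transition"
    return "transversion"


def find_substitutions(reference, analyzed):
    """Return (position, ref, alt, class) for each mismatching base."""
    if len(reference) != len(analyzed):
        raise ValueError("this sketch only compares equal-length sequences")
    return [(i, r, a, classify_substitution(r, a))
            for i, (r, a) in enumerate(zip(reference, analyzed))
            if r != a]


reference = "ACGTAC"
analyzed = "ATGTCC"
substitutions = find_substitutions(reference, analyzed)
print(substitutions)
```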
- PTA results in higher detection sensitivity and/or lower rates of false positives for the detection of mutations when compared to methods such as in-silico prediction, ChIP-seq, GUIDE-seq, circle-seq, HTGTS (High-Throughput Genome-Wide Translocation Sequencing), IDLV (integration-deficient lentivirus), Digenome-seq, FISH (fluorescence in situ hybridization), or DISCOVER-seq.
- the methods and compositions of the present disclosure may be used for genomics screens.
- cells may be prepared and sorted at predetermined cell densities and sample quantities prior to storage in a composition of the present disclosure (e.g., a storage buffer).
- the sample may be stored, in some cases for the long term, and optionally shipped from one place to another.
- the composition may keep the sample, the cells, and genomic contents of the cells stable.
- a second composition e.g., a neutralizing buffer
- the composition e.g., the storage buffer
- a storage buffer may comprise an ionic surfactant which can help reduce DNAse and RNAse activity in the stored sample
- the neutralizing buffer may comprise a non-ionic surfactant which may neutralize the ionic surfactant in the storage buffer.
- the cells may then be prepared and amplified. Amplification may comprise or be isothermal amplification. In some cases, amplification may be primary template amplification.
- PTA (Primary Template-Directed Amplification)
- amplicons are preferentially generated from the primary template (“direct copies”) using a polymerase (e.g., a strand displacing polymerase). Consequently, errors are propagated at a lower rate from daughter amplicons during subsequent amplifications compared to MDA.
- this method can amplify low DNA input including the genomes of single cells with high coverage breadth and uniformity in an accurate and reproducible manner.
- PTA enables kinetic control of an amplification reaction.
- PTA results in a pseudo-linear amplification reaction (rather than exponential amplification).
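The contrast between exponential amplification and copying directly from the primary template can be illustrated with a toy model. This is an idealized sketch, not a description of the actual reaction kinetics: it assumes that an error is introduced in the very first copy, that exponential amplification doubles every molecule each round, and that direct copying produces exactly one new copy of the primary template per round with no copying of copies. The function names are hypothetical.

```python
def exponential_copying(rounds):
    """Toy exponential model: every molecule is copied in every round.

    Round 1 yields the template plus one copy that carries an error;
    all descendants of that copy inherit the error.
    Returns (total molecules, fraction carrying the error).
    """
    if rounds < 1:
        raise ValueError("rounds must be >= 1")
    error_free, with_error = 1, 1
    for _ in range(rounds - 1):
        error_free *= 2
        with_error *= 2
    total = error_free + with_error
    return total, with_error / total


def direct_copying(rounds):
    """Toy linear model: only the primary template is copied.

    One new direct copy per round; only the first copy carries the error.
    Returns (total molecules, fraction carrying the error).
    """
    if rounds < 1:
        raise ValueError("rounds must be >= 1")
    total = 1 + rounds
    return total, 1 / total


exp_total, exp_error_fraction = exponential_copying(10)
lin_total, lin_error_fraction = direct_copying(10)
print(exp_total, exp_error_fraction)
print(lin_total, lin_error_fraction)
```

Under these assumptions the early error persists in half of the exponentially amplified molecules, while its share among direct copies shrinks with every round, at the cost of a much lower total yield.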
- the terminated amplification products can undergo direct ligation after removal of the terminators, allowing for the attachment of a cell barcode to the amplification primers so that products from all cells can be pooled after undergoing parallel amplification reactions.
- template nucleic acids are not bound to a bead.
- direct copies of template nucleic acids are not bound to a bead.
- one or more primers are not bound to a bead.
- no primers are bound to a bead.
- a primer is attached to a first bead, and a template nucleic acid is attached to a second bead, wherein the first and the second bead are not the same.
- PTA is used to analyze single cells from a larger population of cells. In some instances, PTA is used to analyze more than one cell from a larger population of cells, or an entire population of cells.
- the methods and systems of the present disclosure can be used to co-encapsulate a cell with one or more beads delivering the components of the reaction such as a primer into a reaction chamber of the device.
- the device comprises a plurality of reaction chambers.
- a sub-population or each of the plurality of reaction chambers may encapsulate a single cell, a first bead, and in some cases a second bead, wherein each of the first bead and the second bead deliver a reagent (e.g., a primer, a probe, or another reagent) to the reaction chamber to participate in the reaction with the cell.
- nucleic acid polymerases with strand displacement activity for amplification.
- such polymerases comprise strand displacement activity and low error rate.
- such polymerases comprise strand displacement activity and proofreading exonuclease activity, such as 3’->5’ proofreading activity.
- nucleic acid polymerases are used in conjunction with other components such as reversible or irreversible terminators, or additional strand displacement factors.
- the polymerase has strand displacement activity, but does not have exonuclease proofreading activity.
- such polymerases include bacteriophage phi29 (Φ29) polymerase, which also has a very low error rate that is the result of the 3’->5’ proofreading exonuclease activity (see, e.g., U.S. Pat. Nos. 5,198,543 and 5,001,050).
- examples of strand displacing nucleic acid polymerases include, e.g., genetically modified phi29 (Φ29) DNA polymerase, Klenow Fragment of DNA polymerase I (Jacobsen et al., Eur. J. Biochem.
- phage M2 DNA polymerase (Matsumoto et al., Gene 84:247 (1989)), phage phiPRD1 DNA polymerase (Jung et al., Proc. Natl. Acad. Sci. USA 84:8287 (1987); Zhu and Ito, Biochim. Biophys. Acta. 1219:267-276 (1994)), Bst DNA polymerase (e.g., Bst large fragment DNA polymerase (Exo(-) Bst; Aliotta et al., Genet. Anal.
- T7 DNA polymerase (T7-Sequenase)
- T7 gp5 DNA polymerase; PRD1 DNA polymerase
- T4 DNA polymerase (Kaboord and Benkovic, Curr. Biol. 5:149-157 (1995))
- Additional strand displacing nucleic acid polymerases are also compatible with the methods described herein.
- the ability of a given polymerase to carry out strand displacement replication can be determined, for example, by using the polymerase in a strand displacement replication assay (e.g., as disclosed in U.S. Pat. No. 6,977,148).
- Such assays in some instances are performed at a temperature suitable for optimal activity for the enzyme being used, for example, 32°C for phi29 DNA polymerase, from 46°C to 64°C for exo(-) Bst DNA polymerase, or from about 60°C to 70°C for an enzyme from a hyperthermophilic organism.
- Another useful assay for selecting a polymerase is the primer-block assay described in Kong et al., J. Biol. Chem. 268:1965-1975 (1993).
- the assay consists of a primer extension assay using an M13 ssDNA template in the presence or absence of an oligonucleotide that is hybridized upstream of the extending primer to block its progress.
- polymerases incorporate dNTPs and terminators at approximately equal rates.
- the ratio of rates of incorporation for dNTPs and terminators for a polymerase described herein is about 1:1, about 1.5:1, about 2:1, about 3:1, about 4:1, about 5:1, about 10:1, about 20:1, about 50:1, about 100:1, about 200:1, about 500:1, or about 1000:1.
- the ratio of rates of incorporation for dNTPs and terminators for a polymerase described herein is 1:1 to 1000:1, 2:1 to 500:1, 5:1 to 100:1, 10:1 to 1000:1, 100:1 to 1000:1, 500:1 to 2000:1, 50:1 to 1500:1, or 25:1 to 1000:1.
- a polynucleotide mixture used herein for PTA may comprise dNTPs.
- dNTPs comprise one or more of dA, dT, dG, and dC.
- the concentration of dNTPs is no more than 10, 8, 7, 5, 4, 3, 2, 1, 0.5, 0.2, 0.1, 0.05, or no more than 0.01 mM.
- the concentration of dNTPs is 0.5-10, 0.5-5, 0.5-3, 0.5-2.5, 0.5-2, 0.5-1.5, 0.5-1, 0.1-5, 0.1-3, 0.1-3, 1-3, 0.5-2.5, or 1-2 mM.
- Such mixtures in some instances also comprise one or more terminators.
- a polynucleotide mixture used herein for PTA may comprise terminators.
- terminators comprise ddNTPs.
- terminators comprise irreversible terminators.
- irreversible terminators comprise alpha-thio dideoxynucleotides.
- the concentration of terminators is no more than 1, 0.8, 0.7, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05, 0.02, 0.01, 0.005, or no more than 0.001 mM.
- the concentration of terminators is 0.05-1, 0.05-0.5, 0.05-0.3, 0.05-0.25, 0.05-0.2, 0.05-0.15, 0.05-0.1, 0.01-0.5, 0.01-0.3, 0.01-0.3, 0.1-0.3, 0.05-0.25, or 0.1-0.2 mM.
- strand displacement factors such as, e.g., helicase.
- additional amplification components such as polymerases, terminators, or other component.
- a strand displacement factor is used with a polymerase that does not have strand displacement activity.
- a strand displacement factor is used with a polymerase having strand displacement activity.
- strand displacement factors may increase the rate that smaller, double stranded amplicons are reprimed.
- any DNA polymerase that can perform strand displacement replication in the presence of a strand displacement factor is suitable for use in the PTA method, even if the DNA polymerase does not perform strand displacement replication in the absence of such a factor.
- Strand displacement factors useful in strand displacement replication in some instances include (but are not limited to) BMRF1 polymerase accessory subunit (Tsurumi et al., J. Virology 67(12):7648-7653 (1993)), adenovirus DNA-binding protein (Zijderveld and van der Vliet, J. Virology 68(2): 1158-1164 (1994)), herpes simplex viral protein ICP8 (Boehmer and Lehman, J.
- bacterial SSB (e.g., E. coli SSB)
- Replication Protein A (RPA)
- human mitochondrial SSB (mtSSB)
- Recombinases, e.g., Recombinase A (RecA) family proteins, T4 UvsX, T4 UvsY, Sak4 of Phage HK620, Rad51, Dmc1, or RadB.
- the PTA method comprises use of a single-strand DNA binding protein (SSB, T4 gp32, or other single stranded DNA binding protein), a helicase, and a polymerase (e.g., Sau DNA polymerase, Bsu polymerase, Bst2.0, GspM, GspM2.0, GspSSD, or other suitable polymerase).
- reverse transcriptases are used in conjunction with the strand displacement factors described herein.
- amplification is conducted using a polymerase and a nicking enzyme (e.g., “NEAR”), such as those described in US 9,617,586.
- the nicking enzyme is Nt.BspQI, Nb.BbvCI, Nb.BsmI, Nb.BsrDI, Nb.BtsI, Nt.AlwI, Nt.BbvCI, Nt.BstNBI, Nt.CviPII, Nb.Bpu10I, or Nt.Bpu10I.
- amplification methods comprising use of terminator nucleotides, polymerases, and additional factors or conditions.
- factors are used in some instances to fragment the nucleic acid template(s) or amplicons during amplification.
- factors comprise endonucleases.
- factors comprise transposases.
- mechanical shearing is used to fragment nucleic acids during amplification.
- nucleotides are added during amplification that may be fragmented through the addition of additional proteins or conditions. For example, uracil is incorporated into amplicons; treatment with uracil D-glycosylase fragments nucleic acids at uracil-containing positions.
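The uracil-based fragmentation described above can be sketched as a simple string operation. The sketch assumes that every incorporated uracil leads to a strand break and that the uracil itself is lost from the resulting fragments; whether cleavage at the abasic site requires an additional enzyme or condition is not addressed here. The function name and the example sequence are hypothetical.

```python
def fragment_at_uracil(seq):
    """Split a single-stranded amplicon at every uracil position.

    Assumes each U is excised and the strand breaks at that site, so the
    U does not appear in any fragment. Empty fragments (from adjacent
    uracils or a terminal uracil) are discarded.
    """
    return [fragment for fragment in seq.split("U") if fragment]


amplicon = "ACGUTTGAUUCCA"   # hypothetical amplicon with incorporated uracil
fragments = fragment_at_uracil(amplicon)
print(fragments)
```

A higher proportion of dUTP in the nucleotide mixture would, under this model, produce more break points and therefore shorter fragments.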
- amplification methods comprising use of terminator nucleotides, which terminate nucleic acid replication thus decreasing the size of the amplification products.
- terminator nucleotides are in some instances used in conjunction with polymerases, strand displacement factors, or other amplification components described herein.
- terminator nucleotides reduce or lower the efficiency of nucleic acid replication.
- Such terminators in some instances reduce extension rates by at least 99.9%, 99%, 98%, 95%, 90%, 85%, 80%, 75%, 70%, or at least 65%.
- Such terminators reduce extension rates by 50%-90%, 60%-80%, 65%-90%, 70%-85%, 60%-90%, 70%-99%, 80%-99%, or 50%-80%.
- terminators reduce the average amplicon product length by at least 99.9%, 99%, 98%, 95%, 90%, 85%, 80%, 75%, 70%, or at least 65%. Terminators in some instances reduce the average amplicon length by 50%-90%, 60%-80%, 65%-90%, 70%-85%, 60%-90%, 70%-99%, 80%-99%, or 50%-80%. In some instances, amplicons comprising terminator nucleotides form loops or hairpins which reduce a polymerase’s ability to use such amplicons as templates.
- terminators slow the rate of amplification at initial amplification sites through the incorporation of terminator nucleotides (e.g., dideoxynucleotides that have been modified to make them exonuclease-resistant to terminate DNA extension), resulting in smaller amplification products.
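One way to see why terminator incorporation yields smaller amplification products is a simple chain-termination model. The sketch below is an assumption-laden illustration, not the disclosed method: it assumes that at each template position a terminator competes with ordinary nucleotides, that the per-position termination probability depends only on the non-terminator:terminator concentration ratio and the relative incorporation rate, and that product length is therefore geometrically distributed with mean 1/p. All names are hypothetical.

```python
def termination_probability(conc_ratio, rate_ratio=1.0):
    """Per-position probability that a terminator is incorporated.

    conc_ratio: non-terminator : terminator concentration ratio (e.g., 99 for 99:1).
    rate_ratio: dNTP : terminator incorporation rate ratio (1.0 = equal rates).
    Assumes simple competition between the two nucleotide pools.
    """
    if conc_ratio <= 0 or rate_ratio <= 0:
        raise ValueError("ratios must be positive")
    return 1.0 / (1.0 + conc_ratio * rate_ratio)


def expected_amplicon_length(p_terminate):
    """Mean length of a geometrically distributed product (in bases)."""
    if not 0 < p_terminate <= 1:
        raise ValueError("probability must be in (0, 1]")
    return 1.0 / p_terminate


p = termination_probability(conc_ratio=99, rate_ratio=1.0)
mean_length = expected_amplicon_length(p)
print(p, mean_length)
```

In this model, lowering the concentration ratio (more terminator) or using a polymerase that incorporates terminators more readily shortens the expected product.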
- PTA amplification products undergo direct ligation of adapters without the need for fragmentation, allowing for efficient incorporation of cell barcodes and unique molecular identifiers (UMI).
- Terminator nucleotides are present at various concentrations depending on factors such as polymerase, template, or other factors.
- the amount of terminator nucleotides in some instances is expressed as a ratio of non-terminator nucleotides to terminator nucleotides in a method described herein. Such concentrations in some instances allow control of amplicon lengths.
- the ratio of terminator to non-terminator nucleotides is modified for the amount of template present or the size of the template. In some instances, the ratio of terminator to non-terminator nucleotides is reduced for smaller sample sizes (e.g., femtogram to picogram range).
- the ratio of non-terminator to terminator nucleotides is about 2:1, 5:1, 7:1, 10:1, 20:1, 50:1, 100:1, 200:1, 500:1, 1000:1, 2000:1, or 5000:1. In some instances, the ratio of non-terminator to terminator nucleotides is 2:1-10:1, 5:1-20:1, 10:1-100:1, 20:1-200:1, 50:1-1000:1, 50:1-500:1, 75:1-150:1, or 100:1-500:1. In some instances, at least one of the nucleotides present during amplification using a method described herein is a terminator nucleotide.
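The relationship between a non-terminator:terminator ratio and the corresponding terminator concentration is simple arithmetic, sketched below. The sketch assumes the ratio applies to total pool concentrations in mM; the function name and the example values (1 mM dNTPs at a 20:1 ratio) are chosen only for illustration.

```python
def terminator_concentration_mM(dntp_mM, nonterminator_to_terminator):
    """Terminator concentration implied by a dNTP concentration and ratio.

    dntp_mM: total non-terminator nucleotide concentration in mM (assumed).
    nonterminator_to_terminator: e.g., 20 for a 20:1 ratio.
    """
    if dntp_mM <= 0 or nonterminator_to_terminator <= 0:
        raise ValueError("concentration and ratio must be positive")
    return dntp_mM / nonterminator_to_terminator


terminator_mM = terminator_concentration_mM(dntp_mM=1.0,
                                            nonterminator_to_terminator=20)
print(terminator_mM)
```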
- each terminator need not be present at approximately the same concentration; in some instances, ratios of each terminator present in a method described herein are optimized for a particular set of reaction conditions, sample type, or polymerase.
- each terminator may possess a different efficiency for incorporation into the growing polynucleotide chain of an amplicon, in response to pairing with the corresponding nucleotide on the template strand.
- a terminator pairing with cytosine is present at about 3%, 5%, 10%, 15%, 20%, 25%, or 50% higher concentration than the average terminator concentration.
- a terminator pairing with thymine is present at about 3%, 5%, 10%, 15%, 20%, 25%, or 50% higher concentration than the average terminator concentration.
- a terminator pairing with guanine is present at about 3%, 5%, 10%, 15%, 20%, 25%, or 50% higher concentration than the average terminator concentration. In some instances, a terminator pairing with adenine is present at about 3%, 5%, 10%, 15%, 20%, 25%, or 50% higher concentration than the average terminator concentration. In some instances, a terminator pairing with uracil is present at about 3%, 5%, 10%, 15%, 20%, 25%, or 50% higher concentration than the average terminator concentration. Any nucleotide capable of terminating nucleic acid extension by a nucleic acid polymerase in some instances is used as a terminator nucleotide in the methods described herein.
- a reversible terminator is used to terminate nucleic acid replication.
- a non-reversible terminator is used to terminate nucleic acid replication.
- non-limiting examples of terminators include reversible and non-reversible nucleic acids and nucleic acid analogs, such as, e.g., 3’ blocked reversible terminator comprising nucleotides, 3’ unblocked reversible terminator comprising nucleotides, terminators comprising 2’ modifications of deoxynucleotides, terminators comprising modifications to the nitrogenous base of deoxynucleotides, or any combination thereof.
- terminator nucleotides are dideoxynucleotides.
- nucleotide modifications that terminate nucleic acid replication and may be suitable for practicing the invention include, without limitation, any modifications of the R group of the 3’ carbon of the deoxyribose such as inverted dideoxynucleotides, 3' biotinylated nucleotides, 3' amino nucleotides, 3’-phosphorylated nucleotides, 3'-O-methyl nucleotides, 3' carbon spacer nucleotides including 3' C3 spacer nucleotides, 3' C18 nucleotides, 3' Hexanediol spacer nucleotides, acyclonucleotides, and combinations thereof.
- terminators are polynucleotides comprising 1, 2, 3, 4, or more bases in length.
- terminators do not comprise a detectable moiety or tag (e.g., mass tag, fluorescent tag, dye, radioactive atom, or other detectable moiety).
- terminators do not comprise a chemical moiety allowing for attachment of a detectable moiety or tag (e.g., “click” azide/alkyne, conjugate addition partner, or other chemical handle for attachment of a tag).
- all terminator nucleotides comprise the same modification that reduces amplification, made to a region (e.g., the sugar moiety, base moiety, or phosphate moiety) of the nucleotide.
- At least one terminator has a different modification that reduces amplification.
- all terminators have substantially similar fluorescent excitation or emission wavelengths.
- terminators without modification to the phosphate group are used with polymerases that do not have exonuclease proofreading activity. Terminators, when used with polymerases which have 3’->5’ proofreading exonuclease activity (such as, e.g., phi29) that can remove the terminator nucleotide, are in some instances further modified to make them exonuclease-resistant.
- dideoxynucleotides are modified with an alpha-thio group that creates a phosphorothioate linkage which makes these nucleotides resistant to the 3’->5’ proofreading exonuclease activity of nucleic acid polymerases.
- Such modifications in some instances reduce the exonuclease proofreading activity of polymerases by at least 99.5%, 99%, 98%, 95%, 90%, or at least 85%.
- Examples of other terminator nucleotide modifications providing resistance to the 3’->5’ exonuclease activity include in some instances: nucleotides with modification to the alpha group, such as alpha-thio dideoxynucleotides creating a phosphorothioate bond, C3 spacer nucleotides, locked nucleic acids (LNA), inverted nucleic acids, 2’ fluoro bases, 3’ phosphorylation, 2’-O-methyl modifications (or other 2’-O-alkyl modification), propyne-modified bases (e.g., deoxycytosine, deoxyuridine), L-DNA nucleotides, L-RNA nucleotides, nucleotides with inverted linkages (e.g., 5’-5’ or 3’-3’), 5’ inverted bases (e.g., 5’ inverted 2’,3’-dideoxy dT), methylphosphonate backbones, and trans nucleic acids.
- nucleotides with modification include base-modified nucleic acids comprising free 3’ OH groups (e.g., 2-nitrobenzyl alkylated HOMedU triphosphates, bases comprising modification with large chemical groups, such as beads or other large moiety).
- a polymerase with strand displacement activity but without 3’->5’ exonuclease proofreading activity is used with terminator nucleotides with or without modifications to make them exonuclease resistant.
- nucleic acid polymerases include, without limitation, Bst DNA polymerase, Bsu DNA polymerase, Deep Vent (exo-) DNA polymerase, Klenow Fragment (exo-) DNA polymerase, Therminator DNA polymerase, and VentR (exo-).
- amplicon libraries resulting from amplification of at least one target nucleic acid molecule are in some instances generated using the methods described herein, such as those using terminators. Such methods comprise use of strand displacement polymerases or factors, terminator nucleotides (reversible or irreversible), or other features and embodiments described herein.
- reversible terminators are capable of removal by an exonuclease (or by a polymerase having exonuclease activity).
- irreversible terminators are not capable of substantial removal by an exonuclease (or by a polymerase having exonuclease activity).
- amplicon libraries generated by use of terminators described herein are further amplified in a subsequent amplification reaction (e.g., PCR). In some instances, subsequent amplification reactions do not comprise terminators. In some instances, amplicon libraries comprise polynucleotides, wherein at least 50%, 60%, 70%, 80%, 90%, 95%, or at least 98% of the polynucleotides comprise at least one terminator nucleotide. In some instances, the amplicon library comprises the target nucleic acid molecule from which the amplicon library was derived.
- the amplicon library comprises a plurality of polynucleotides, wherein at least some of the polynucleotides are direct copies (e.g., replicated directly from a target nucleic acid molecule, such as genomic DNA, RNA, or other target nucleic acid). For example, at least 5%, 10%, 20%, 30%, 40%, 50%, 60%, 70%, 80%, 90%, 95% or more than 95% of the amplicon polynucleotides are direct copies of the at least one target nucleic acid molecule. In some instances, at least 5% of the amplicon polynucleotides are direct copies of the at least one target nucleic acid molecule.
- At least 10% of the amplicon polynucleotides are direct copies of the at least one target nucleic acid molecule. In some instances, at least 15% of the amplicon polynucleotides are direct copies of the at least one target nucleic acid molecule. In some instances, at least 20% of the amplicon polynucleotides are direct copies of the at least one target nucleic acid molecule. In some instances, at least 50% of the amplicon polynucleotides are direct copies of the at least one target nucleic acid molecule.
- 3%-5%, 3%-10%, 5%-10%, 10%-20%, 20%-30%, 30%-40%, 5%-30%, 10%-50%, or 15%-75% of the amplicon polynucleotides are direct copies of the at least one target nucleic acid molecule.
- at least some of the polynucleotides are direct copies of the target nucleic acid molecule, or daughter (a first copy of the target nucleic acid) progeny.
- at least 5%, 10%, 20%, 30%, 40%, 50%, 60%, 70%, 80%, 90%, 95% or more than 95% of the amplicon polynucleotides are direct copies of the at least one target nucleic acid molecule or daughter progeny.
- At least 5% of the amplicon polynucleotides are direct copies of the at least one target nucleic acid molecule or daughter progeny. In some instances, at least 10% of the amplicon polynucleotides are direct copies of the at least one target nucleic acid molecule or daughter progeny. In some instances, at least 20% of the amplicon polynucleotides are direct copies of the at least one target nucleic acid molecule or daughter progeny. In some instances, at least 30% of the amplicon polynucleotides are direct copies of the at least one target nucleic acid molecule or daughter progeny.
- 3%-5%, 3%-10%, 5%-10%, 10%-20%, 20%-30%, 30%-40%, 5%-30%, 10%-50%, or 15%-75% of the amplicon polynucleotides are direct copies of the at least one target nucleic acid molecule or daughter progeny.
- direct copies of the target nucleic acid are 50-2500, 75-2000, 50-2000, 25-1000, 50-1000, 500-2000, or 50-2000 bases in length.
- daughter progeny are 1000-5000, 2000-5000, 1000-10,000, 2000-5000, 1500-5000, 3000-7000, or 2000-7000 bases in length.
- the average length of PTA amplification products is 25-3000, 50-2500, 75-2000, 50-2000, 25-1000, 50-1000, 500-2000, or 50-2000 bases.
- amplicons generated from PTA are no more than 5000, 4000, 3000, 2000, 1700, 1500, 1200, 1000, 700, 500, or no more than 300 bases in length.
- amplicons generated from PTA are 1000-5000, 1000-3000, 200-2000, 200-4000, 500-2000, 750-2500, or 1000-2000 bases in length.
- Amplicon libraries generated using the methods described herein comprise at least 1000, 2000, 5000, 10,000, 100,000, 200,000, 500,000 or more than 500,000 amplicons comprising unique sequences.
- the library comprises at least 100, 200, 300, 400, 500, 600, 700, 800, 900, 1000, 1100, 1200, 1300, 1400, 1500, 2000, 2500, 3000, or at least 3500 amplicons.
- at least 5%, 10%, 15%, 20%, 25%, 30% or more than 30% of amplicon polynucleotides having a length of less than 1000 bases are direct copies of the at least one target nucleic acid molecule.
- At least 5%, 10%, 15%, 20%, 25%, 30% or more than 30% of amplicon polynucleotides having a length of no more than 2000 bases are direct copies of the at least one target nucleic acid molecule. In some instances, at least 5%, 10%, 15%, 20%, 25%, 30% or more than 30% of amplicon polynucleotides having a length of 3000-5000 bases are direct copies of the at least one target nucleic acid molecule. In some instances, the ratio of direct copy amplicons to target nucleic acid molecules is at least 10:1, 100:1, 1000:1, 10,000:1, 100,000:1, 1,000,000:1, 10,000,000:1, or more than 10,000,000:1.
- the ratio of direct copy amplicons to target nucleic acid molecules is at least 10:1, 100:1, 1000:1, 10,000:1, 100,000:1, 1,000,000:1, 10,000,000:1, or more than 10,000,000:1, wherein the direct copy amplicons are no more than 700-1200 bases in length. In some instances, the ratio of direct copy amplicons and daughter amplicons to target nucleic acid molecules is at least 10:1, 100:1, 1000:1, 10,000:1, 100,000:1, 1,000,000:1, 10,000,000:1, or more than 10,000,000:1.
- the ratio of direct copy amplicons and daughter amplicons to target nucleic acid molecules is at least 10:1, 100:1, 1000:1, 10,000:1, 100,000:1, 1,000,000:1, 10,000,000:1, or more than 10,000,000:1, wherein the direct copy amplicons are 700-1200 bases in length, and the daughter amplicons are 2500-6000 bases in length.
- the library comprises about 50-10,000, about 50-5,000, about 50-2500, about 50-1000, about 150-2000, about 250-3000, about 50-2000, about 500-2000, or about 500-1500 amplicons which are direct copies of the target nucleic acid molecule.
- the library comprises about 50-10,000, about 50-5,000, about 50-2500, about 50-1000, about 150-2000, about 250-3000, about 50-2000, about 500-2000, or about 500-1500 amplicons which are direct copies of the target nucleic acid molecule or daughter amplicons.
- the number of direct copies may be controlled in some instances by the number of amplification cycles. In some instances, no more than 30, 25, 20, 15, 13, 11, 10, 9, 8, 7, 6, 5, 4, or 3 cycles are used to generate copies of the target nucleic acid molecule. In some instances, about 30, 25, 20, 15, 13, 11, 10, 9, 8, 7, 6, 5, 4, or about 3 cycles are used to generate copies of the target nucleic acid molecule.
- 2-4, 2-5, 2-7, 2-8, 2-10, 2-15, 3-5, 3-10, 3-15, 4-10, 4-15, 5-10 or 5-15 cycles are used to generate copies of the target nucleic acid molecule.
- Amplicon libraries generated using the methods described herein are in some instances subjected to additional steps, such as adapter ligation and further amplification. In some instances, such additional steps precede a sequencing step.
- the cycles are PCR cycles.
- the cycles represent annealing, extension, and denaturation.
- the cycles represent annealing, extension, and denaturation which occur under isothermal or essentially isothermal conditions.
- Methods described herein may additionally comprise one or more enrichment or purification steps.
- one or more polynucleotides (such as cDNA, PTA amplicons, or other polynucleotide) are enriched during a method described herein.
- polynucleotide probes are used to capture one or more polynucleotides.
- probes are configured to capture one or more genomic exons.
- a library of probes comprises at least 1000, 2000, 5000, 10,000, 50,000, 100,000, 200,000, 500,000, or more than 1 million different sequences.
- a library of probes comprises sequences capable of binding to at least 10, 20, 50, 100, 200, 500, 1000, 2000, 5000, 10,000 or more than 10,000 genes.
- probes comprise a moiety for capture by a bead, such as biotin.
- an enrichment step occurs after a PTA step.
- an enrichment step occurs before a PTA step.
- probes are configured to bind genomic DNA libraries.
- probes are configured to bind cDNA libraries.
- Amplicon libraries of polynucleotides generated from the PTA methods and compositions (terminators, polymerases, etc.) described herein in some instances have increased uniformity. Uniformity, in some instances, is described using a Lorenz curve, or other such method. Such increases in some instances lead to fewer sequencing reads being needed for the desired coverage of a target nucleic acid molecule (e.g., genomic DNA, RNA, or other target nucleic acid molecule). For example, no more than 50% of a cumulative fraction of polynucleotides comprises sequences of at least 80% of a cumulative fraction of sequences of the target nucleic acid molecule.
- no more than 50% of a cumulative fraction of polynucleotides comprises sequences of at least 60% of a cumulative fraction of sequences of the target nucleic acid molecule. In some instances, no more than 50% of a cumulative fraction of polynucleotides comprises sequences of at least 70% of a cumulative fraction of sequences of the target nucleic acid molecule. In some instances, no more than 50% of a cumulative fraction of polynucleotides comprises sequences of at least 90% of a cumulative fraction of sequences of the target nucleic acid molecule. In some instances, uniformity is described using a Gini index (wherein an index of 0 represents perfect equality of the library and an index of 1 represents perfect inequality).
- amplicon libraries described herein have a Gini index of no more than 0.55, 0.50, 0.45, 0.40, or 0.30. In some instances, amplicon libraries described herein have a Gini index of no more than 0.50. In some instances, amplicon libraries described herein have a Gini index of no more than 0.40.
- Such uniformity metrics in some instances are dependent on the number of reads obtained. For example, no more than 100 million, 200 million, 300 million, 400 million, or no more than 500 million reads are obtained. In some instances, the read length is about 50, 75, 100, 125, 150, 175, 200, 225, or about 250 bases in length. In some instances, uniformity metrics are dependent on the depth of coverage of a target nucleic acid.
- the average depth of coverage is about 10X, 15X, 20X, 25X, or about 30X. In some instances, the average depth of coverage is 10-30X, 20-50X, 5-40X, 20-60X, 5-20X, or 10-20X.
- amplicon libraries described herein have a Gini index of no more than 0.55, wherein about 300 million reads were obtained. In some instances, amplicon libraries described herein have a Gini index of no more than 0.50, wherein about 300 million reads were obtained. In some instances, amplicon libraries described herein have a Gini index of no more than 0.45, wherein about 300 million reads were obtained.
- amplicon libraries described herein have a Gini index of no more than 0.55, wherein no more than 300 million reads were obtained. In some instances, amplicon libraries described herein have a Gini index of no more than 0.50, wherein no more than 300 million reads were obtained. In some instances, amplicon libraries described herein have a Gini index of no more than 0.45, wherein no more than 300 million reads were obtained. In some instances, amplicon libraries described herein have a Gini index of no more than 0.55, wherein the average depth of sequencing coverage is about 15X. In some instances, amplicon libraries described herein have a Gini index of no more than 0.50, wherein the average depth of sequencing coverage is about 15X.
- amplicon libraries described herein have a Gini index of no more than 0.45, wherein the average depth of sequencing coverage is about 15X. In some instances, amplicon libraries described herein have a Gini index of no more than 0.55, wherein the average depth of sequencing coverage is at least 15X. In some instances, amplicon libraries described herein have a Gini index of no more than 0.50, wherein the average depth of sequencing coverage is at least 15X. In some instances, amplicon libraries described herein have a Gini index of no more than 0.45, wherein the average depth of sequencing coverage is at least 15X. In some instances, amplicon libraries described herein have a Gini index of no more than 0.55, wherein the average depth of sequencing coverage is no more than 15X.
- amplicon libraries described herein have a Gini index of no more than 0.50, wherein the average depth of sequencing coverage is no more than 15X. In some instances, amplicon libraries described herein have a Gini index of no more than 0.45, wherein the average depth of sequencing coverage is no more than 15X. Uniform amplicon libraries generated using the methods described herein are in some instances subjected to additional steps, such as adapter ligation and further PCR amplification. In some instances, such additional steps precede a sequencing step.
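The Lorenz curve, Gini index, and depth-of-coverage figures used above can be illustrated with a short calculation. The following is a minimal sketch, not the method used to obtain any value in this description: it assumes coverage has already been summarized as one non-negative number per genomic bin, and the function names (`lorenz_curve`, `gini_index`, `mean_depth`) and the 3,000,000,000-base genome size in the usage note are illustrative assumptions.

```python
from typing import List, Sequence, Tuple


def lorenz_curve(coverage: Sequence[float]) -> List[Tuple[float, float]]:
    """Return Lorenz curve points as (cumulative fraction of bins,
    cumulative fraction of total coverage), with bins sorted ascending."""
    values = sorted(coverage)
    total = sum(values)
    if not values or total <= 0:
        raise ValueError("coverage must contain at least one positive value")
    points = [(0.0, 0.0)]
    running = 0.0
    for i, value in enumerate(values, start=1):
        running += value
        points.append((i / len(values), running / total))
    return points


def gini_index(coverage: Sequence[float]) -> float:
    """Gini index as 1 - 2 * (trapezoidal area under the Lorenz curve).

    0 means every bin has identical coverage. With n bins the maximum
    attainable value is (n - 1) / n, which approaches 1 for large n.
    """
    points = lorenz_curve(coverage)
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return 1.0 - 2.0 * area


def mean_depth(num_reads: int, read_length: int, genome_size: int) -> float:
    """Average depth of coverage = total sequenced bases / target size."""
    return num_reads * read_length / genome_size
```

Under these assumptions, `mean_depth(300_000_000, 150, 3_000_000_000)` evaluates to 15.0, which is in line with the pairing of about 300 million reads and about 15X coverage recited above for a target of roughly human-genome size.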
- Primers comprise nucleic acids used for priming the amplification reactions described herein.
- Such primers in some instances include, without limitation, random deoxynucleotides of any length with or without modifications to make them exonuclease resistant, random ribonucleotides of any length with or without modifications to make them exonuclease resistant, modified nucleic acids such as locked nucleic acids, DNA or RNA primers that are targeted to a specific genomic region, and reactions that are primed with enzymes such as primase.
- a set of primers having random or partially random nucleotide sequences is used.
- For a nucleic acid sample of significant complexity, specific nucleic acid sequences present in the sample need not be known and the primers need not be designed to be complementary to any particular sequence. Rather, the complexity of the nucleic acid sample results in a large number of different hybridization target sequences in the sample, which will be complementary to various primers of random or partially random sequence.
- the complementary portion of primers for use in PTA is in some instances fully randomized, comprises only a portion that is randomized, or is otherwise selectively randomized.
- the number of random base positions in the complementary portion of primers in some instances, for example, is from 20% to 100% of the total number of nucleotides in the complementary portion of the primers.
- the number of random base positions in the complementary portion of primers is 10%-90%, 15%-95%, 20%-100%, 30%-100%, 50%-100%, 75%-100% or 90%-95% of the total number of nucleotides in the complementary portion of the primers. In some instances, the number of random base positions in the complementary portion of primers is at least 10%, 20%, 30%, 40%, 50%, 60%, 70%, 80%, or at least 90% of the total number of nucleotides in the complementary portion of the primers.
- Sets of primers having random or partially random sequences are in some instances synthesized using standard techniques by allowing the addition of any nucleotide at each position to be randomized. In some instances, sets of primers are composed of primers of similar length and/or hybridization characteristics.
- the term “random primer” refers to a primer which can exhibit four-fold degeneracy at each position. In some instances, the term “random primer” refers to a primer which can exhibit three-fold degeneracy at each position.
- Random primers used in the methods described herein in some instances comprise a random sequence that is 3, 4, 5, 6, 7, 8, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, or more bases in length. In some instances, primers comprise random sequences that are 3-20, 5-15, 5-20, 6-12, or 4-10 bases in length. Primers may also comprise non-extendable elements that limit subsequent amplification of amplicons generated therefrom. For example, primers with non-extendable elements in some instances comprise terminators.
- primers comprise terminator nucleotides, such as 1, 2, 3, 4, 5, 10, or more than 10 terminator nucleotides. Primers need not be limited to components which are added externally to an amplification reaction. In some instances, primers are generated in-situ through the addition of nucleotides and proteins which promote priming. For example, primase-like enzymes in combination with nucleotides are in some instances used to generate random primers for the methods described herein. Primase-like enzymes in some instances are members of the DnaG or AEP enzyme superfamily. In some instances, a primase-like enzyme is TthPrimPol. In some instances, a primase-like enzyme is T7 gp4 helicase-primase.
- primases are in some instances used with the polymerases or strand displacement factors described herein. In some instances, primases initiate priming with deoxyribonucleotides. In some instances, primases initiate priming with ribonucleotides. In some instances, primers are irreversible primers. In some instances, irreversible primers comprise phosphonothioate linkages.
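The degeneracy of random primers described above determines how many distinct sequences a primer pool can contain. The sketch below is illustrative only: the helper names `primer_pool_size` and `random_primer` are hypothetical, and uniform sampling of each base is an assumption rather than a property of any particular synthesis method.

```python
import random
from typing import Optional


def primer_pool_size(length: int, degeneracy: int = 4) -> int:
    """Number of distinct sequences for a primer with the given number of
    random positions and per-position degeneracy (e.g., 4-fold or 3-fold)."""
    if length < 1:
        raise ValueError("length must be at least 1")
    if not 1 <= degeneracy <= 4:
        raise ValueError("degeneracy must be between 1 and 4")
    return degeneracy ** length


def random_primer(length: int, alphabet: str = "ACGT",
                  rng: Optional[random.Random] = None) -> str:
    """Draw one primer sequence, sampling each position uniformly
    (an assumption) from the allowed bases."""
    rng = rng or random.Random()
    return "".join(rng.choice(alphabet) for _ in range(length))
```

For example, a fully random hexamer with four-fold degeneracy gives `primer_pool_size(6) == 4096` possible sequences, while three-fold degeneracy over eight positions gives `primer_pool_size(8, 3) == 6561`.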
- the PTA amplification can be followed by selection for a specific subset of amplicons. Such selections are in some instances dependent on size, affinity, activity, hybridization to probes, or other selection factor known in the art. In some instances, selections precede or follow additional steps described herein, such as adapter ligation and/or library amplification. In some instances, selections are based on size (length) of the amplicons. In some instances, smaller amplicons are selected that are less likely to have undergone exponential amplification, which enriches for products that were derived from the primary template while further converting the amplification from an exponential into a quasi-linear amplification process.
- amplicons comprising 50-2000, 25-5000, 40-3000, 50-1000, 200-1000, 300-1000, 400-1000, 400-600, 600-2000, or 800-1000 bases in length are selected. Size selection in some instances occurs with the use of protocols, e.g., utilizing solid-phase reversible immobilization (SPRI) on carboxylated paramagnetic beads to enrich for nucleic acid fragments of specific sizes, or other protocol known by those skilled in the art.
- selection occurs through preferential ligation and amplification of smaller fragments during PCR while preparing sequencing libraries, as well as a result of the preferential formation of clusters from smaller sequencing library fragments during sequencing (e.g., sequencing by synthesis, nanopore sequencing, or other sequencing method).
- Other strategies to select for smaller fragments are also consistent with the methods described herein and include, without limitation, isolating nucleic acid fragments of specific sizes after gel electrophoresis, the use of silica columns that bind nucleic acid fragments of specific sizes, and the use of other PCR strategies that more strongly enrich for smaller fragments. Any number of library preparation protocols may be used with the PTA methods described herein.
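The size-based selections described above are physical steps (e.g., SPRI beads, gel electrophoresis, or columns). A comparable filter can also be applied computationally to sequences after sequencing. The following is a minimal sketch of such an in-silico analogue; the function name `select_by_length` is hypothetical, and the default 50-2000 base window is simply one of the ranges recited above.

```python
from typing import Iterable, List


def select_by_length(amplicons: Iterable[str],
                     min_len: int = 50,
                     max_len: int = 2000) -> List[str]:
    """Keep only sequences whose length falls within [min_len, max_len].

    This mimics, in software, enrichment for amplicons of a desired size;
    it is not a model of bead- or gel-based selection chemistry.
    """
    if min_len > max_len:
        raise ValueError("min_len must not exceed max_len")
    return [seq for seq in amplicons if min_len <= len(seq) <= max_len]
```

Narrower windows, such as 400-600 bases, can be selected by passing `min_len=400, max_len=600`.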
- Amplicons generated by PTA are in some instances ligated to adapters (optionally with removal of terminator nucleotides).
- amplicons generated by PTA comprise regions of homology generated from transposase-based fragmentation which are used as priming sites.
- libraries are prepared by fragmenting nucleic acids mechanically or enzymatically.
- libraries are prepared using tagmentation via transposomes.
- libraries are prepared via ligation of adapters, such as Y-adapters, universal adapters, or circular adapters.
- the non-complementary portion of a primer used in PTA can include sequences which can be used to further manipulate and/or analyze amplified sequences.
- Detection tags have sequences complementary to detection probes and are detected using their cognate detection probes. There may be one, two, three, four, or more than four detection tags on a primer. There is no fundamental limit to the number of detection tags that can be present on a primer except the size of the primer. In some instances, there is a single detection tag on a primer. In some instances, there are two detection tags on a primer. When there are multiple detection tags, they may have the same sequence or they may have different sequences, with each different sequence complementary to a different detection probe. In some instances, multiple detection tags have the same sequence. In some instances, multiple detection tags have a different sequence.
- a sequence that can be included in the non-complementary portion of a primer is an “address tag” that can encode other details of the amplicons, such as the location in a tissue section.
- a cell barcode comprises an address tag.
- An address tag has a sequence complementary to an address probe. Address tags become incorporated at the ends of amplified strands. If present, there may be one, or more than one, address tag on a primer. There is no fundamental limit to the number of address tags that can be present on a primer except the size of the primer. When there are multiple address tags, they may have the same sequence or they may have different sequences, with each different sequence complementary to a different address probe.
- the address tag portion can be any length that supports specific and stable hybridization between the address tag and the address probe.
- nucleic acids from more than one source can incorporate a variable tag sequence.
- This tag sequence can be up to 100 nucleotides in length, preferably 1 to 10 nucleotides in length, most preferably 4, 5 or 6 nucleotides in length and comprises combinations of nucleotides.
- a tag sequence is 1-20, 2-15, 3-13, 4-12, 5-12, or 1-10 nucleotides in length. For example, if six base-pairs are chosen to form the tag and a permutation of four different nucleotides is used, then a total of 4096 nucleic acid anchors (e.g.
- tags identify the source of a sample or analyte. In some instances, tags uniquely identify every molecule in a population.
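The count of 4096 given above for a six-base tag built from four nucleotides follows from 4^6 = 4096. A minimal sketch that enumerates such tags is shown below; the function name `enumerate_tags` is hypothetical and no filtering (for example, for homopolymers or sequencing-error tolerance) is applied, although practical tag sets are often filtered in such ways.

```python
from itertools import product
from typing import List


def enumerate_tags(length: int, alphabet: str = "ACGT") -> List[str]:
    """List every possible tag of the given length over the alphabet.

    The number of tags is len(alphabet) ** length, so this is only
    practical for short tags such as the 4-6 base examples above.
    """
    if length < 1:
        raise ValueError("length must be at least 1")
    return ["".join(bases) for bases in product(alphabet, repeat=length)]
```

Calling `len(enumerate_tags(6))` reproduces the 4096 figure, and `len(enumerate_tags(4))` gives 256.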
- Primers described herein may be present in solution or immobilized on a bead. In some instances, primers bearing sample barcodes and/or UMI sequences can be immobilized on a bead. In some instances, individual cells are contacted with one or more beads having a unique set of sample barcodes and/or UMI sequences in order to identify the individual cell. In some instances, lysates from individual cells are contacted with one or more beads having a unique set of sample barcodes and/or UMI sequences in order to identify the individual cell lysates.
- extracted nucleic acid from individual cells are contacted with one or more beads having a unique set of sample barcodes and/or UMI sequences in order to identify the extracted nucleic acid from the individual cell.
- the beads can be manipulated in any suitable manner as is known in the art.
- the beads may be any suitable size, including for example, microbeads, microparticles, nanobeads and nanoparticles.
- beads are magnetically responsive; in other embodiments beads are not significantly magnetically responsive.
- suitable beads include flow cytometry microbeads, polystyrene microparticles and nanoparticles, functionalized polystyrene microparticles and nanoparticles, coated polystyrene microparticles and nanoparticles, silica microbeads, fluorescent microspheres and nanospheres, functionalized fluorescent microspheres and nanospheres, coated fluorescent microspheres and nanospheres, color dyed microparticles and nanoparticles, magnetic microparticles and nanoparticles, superparamagnetic microparticles and nanoparticles (e.g., DYNABEADS® available from Invitrogen Group, Carlsbad, CA), fluorescent microparticles and nanoparticles, coated magnetic microparticles and nanoparticles, ferromagnetic microparticles and nanoparticles, coated ferromagnetic microparticles and nanoparticles, and those described in U.S.
- Beads may be pre-coupled with an antibody, protein or antigen, DNA/RNA probe or any other molecule with an affinity for a desired target.
- primers bearing sample barcodes and/or UMI sequences can be in solution.
- a plurality of partitions can be presented, wherein each partition in the plurality bears a sample barcode which is unique to a partition and a UMI which is unique to a molecule, such that the UMIs are repeated many times within a collection of partitions.
- individual cells are contacted with a partition having a unique set of sample barcodes and/or UMI sequences in order to identify the individual cell.
- lysates from individual cells are contacted with a partition having a unique set of sample barcodes and/or UMI sequences in order to identify the individual cell lysates.
- extracted nucleic acid from individual cells are contacted with a partition having a unique set of sample barcodes and/or UMI sequences in order to identify the extracted nucleic acid from the individual cell.
- PTA primers may comprise a sequence-specific or random primer, a cell barcode and/or a unique molecular identifier (UMI) (e.g., linear primer and/or hairpin primer).
- a primer comprises a sequence-specific primer.
- a primer comprises a random primer.
- a primer comprises a cell barcode.
- a primer comprises a sample barcode.
- a primer comprises a unique molecular identifier.
- primers comprise two or more cell barcodes. Such barcodes in some instances identify a unique sample source, or unique workflow.
- Such barcodes or UMIs are in some instances 5, 6, 7, 8, 9, 10, 11, 12, 15, 20, 25, 30, or more than 30 bases in length.
- Primers in some instances comprise at least 1000, 10,000, 50,000, 100,000, 250,000, 500,000, 10^6, 10^7, 10^8, 10^9, or at least 10^10 unique barcodes or UMIs.
- primers comprise at least 8, 16, 96, or 384 unique barcodes or UMIs.
- a standard adapter is then ligated onto the amplification products prior to sequencing; after sequencing, reads are first assigned to a specific cell based on the cell barcode.
- Suitable adapters that may be utilized with the PTA method include, e.g., xGen® Dual Index UMI adapters available from Integrated DNA Technologies (IDT). Reads from each cell are then grouped using the UMI, and reads with the same UMI may be collapsed into a consensus read.
- the use of a cell barcode allows all cells to be pooled prior to library preparation, as they can later be identified by the cell barcode.
- the use of the UMI to form a consensus read in some instances corrects for PCR bias, improving the copy number variation (CNV) detection.
- sequencing errors may be corrected by requiring that a fixed percentage of reads from the same molecule have the same base change detected at each position. This approach has been utilized to improve CNV detection and correct sequencing errors in bulk samples.
- UMIs are used with the methods described herein. For example, U.S. Pat. No. 8,835,358 discloses the principle of digital counting after attaching a random amplifiable barcode. Schmitt et al. and Fan et al. disclose similar methods of correcting sequencing errors.
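The read handling described above (assigning reads to a cell by cell barcode, grouping by UMI, and collapsing each group into a consensus read in which a base is accepted only when a fixed fraction of reads agree) can be sketched as follows. This is an illustrative sketch, not the procedure of any cited reference: it assumes reads arrive as already-parsed `(cell_barcode, umi, sequence)` tuples that are aligned to the same start position, and the names `group_reads`, `consensus`, and `collapse`, as well as the default agreement threshold of 0.7, are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, Tuple

Read = Tuple[str, str, str]  # (cell_barcode, umi, sequence)


def group_reads(reads: Iterable[Read]) -> Dict[Tuple[str, str], List[str]]:
    """Group sequences by (cell barcode, UMI)."""
    groups: Dict[Tuple[str, str], List[str]] = defaultdict(list)
    for cell, umi, seq in reads:
        groups[(cell, umi)].append(seq)
    return dict(groups)


def consensus(seqs: List[str], min_fraction: float = 0.7) -> str:
    """Call each position by its most common base; emit 'N' where the most
    common base is supported by less than min_fraction of the reads."""
    if not seqs:
        raise ValueError("at least one sequence is required")
    length = min(len(s) for s in seqs)  # truncate to the shortest read
    called = []
    for i in range(length):
        base, count = Counter(s[i] for s in seqs).most_common(1)[0]
        called.append(base if count / len(seqs) >= min_fraction else "N")
    return "".join(called)


def collapse(reads: Iterable[Read],
             min_fraction: float = 0.7) -> Dict[Tuple[str, str], str]:
    """Return one consensus read per (cell barcode, UMI) group."""
    return {key: consensus(seqs, min_fraction)
            for key, seqs in group_reads(reads).items()}
```

Because every read carries its cell barcode, reads from pooled cells can be passed to `collapse` together and separated afterwards by the first element of each key.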
- a library is generated for sequencing using primers.
- the library comprises fragments of 200-700 bases, 100-1000, 300-800, 300-550, 300-700, or 200-800 bases in length.
- the library comprises fragments of at least 50, 100, 150, 200, 300, 500, 600, 700, 800, or at least 1000 bases in length.
- the library comprises fragments of about 50, 100, 150, 200, 300, 500, 600, 700, 800, or about 1000 bases in length.
- the methods described herein may further comprise additional steps, including steps performed on the sample or template.
- samples or templates are in some cases subjected to one or more steps prior to PTA.
- samples comprising cells are subjected to a pre-treatment step.
- cells undergo lysis and proteolysis to increase chromatin accessibility using a combination of freeze-thawing, Triton X-100, Tween 20, and Proteinase K.
- Other lysis strategies may also be suitable for practicing the methods described herein. Such strategies include, without limitation, lysis using other combinations of detergent and/or lysozyme and/or protease treatment and/or physical disruption of cells such as sonication and/or alkaline lysis and/or hypotonic lysis.
- the primary template or target molecule(s) is subjected to a pre-treatment step.
- the primary template (or target) is denatured using sodium hydroxide, followed by neutralization of the solution.
- Other denaturing strategies may also be suitable for practicing the methods described herein. Such strategies may include, without limitation, combinations of alkaline lysis with other basic solutions, increasing the temperature of the sample and/or altering the salt concentration in the sample, addition of additives such as solvents or oils, other modification, or any combination thereof.
- additional steps include sorting, filtering, or isolating samples, templates, or amplicons by size.
- cells are lysed with mechanical (e.g., high pressure homogenizer, bead milling) or non-mechanical (physical, chemical, or biological) methods.
- physical lysis methods comprise heating, osmotic shock, and/or cavitation.
- chemical lysis comprises alkali and/or detergents.
- biological lysis comprises use of enzymes. Combinations of lysis methods are also compatible with the methods described herein. Non-limiting examples of lysis enzymes include recombinant lysozyme, serine proteases, and bacterial lysins.
- lysis with enzymes comprises use of lysozyme, lysostaphin, zymolase, cellulase, protease or glycanase.
- amplicon libraries are enriched for amplicons having a desired length.
- amplicon libraries are enriched for amplicons having a length of 50-2000, 25-1000, 50-1000, 75-2000, 100-3000, 150-500, 75-250, 170-500, 100-500, or 75-2000 bases.
- amplicon libraries are enriched for amplicons having a length no more than 75, 100, 150, 200, 500, 750, 1000, 2000, 5000, or no more than 10,000 bases.
- amplicon libraries are enriched for amplicons having a length of at least 25, 50, 75, 100, 150, 200, 500, 750, 1000, or at least 2000 bases.
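As an in silico analogue of the length-based enrichment described above, the following sketch filters a list of amplicon lengths against optional lower and upper bounds. The function name `enrich_by_length` and the example lengths are hypothetical; physical size selection is performed at the bench and is not modeled here.

```python
def enrich_by_length(lengths, min_len=None, max_len=None):
    """Keep amplicon lengths (in bases) within the inclusive bounds given.

    A bound left as None is not applied, so the same helper covers
    "at least", "no more than", and range-style selections.
    """
    return [
        n for n in lengths
        if (min_len is None or n >= min_len)
        and (max_len is None or n <= max_len)
    ]


lengths = [40, 120, 480, 510, 2500]
print(enrich_by_length(lengths, min_len=150, max_len=500))  # [480]
```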
- buffers or other formulations are in some instances used for PTA, RT, or other methods described herein.
- buffers in some instances comprise surfactants/detergent or denaturing agents (Tween-20, DMSO, DMF, pegylated polymers comprising a hydrophobic group, or other surfactant), salts (potassium or sodium phosphate (monobasic or dibasic), sodium chloride, potassium chloride, TrisHCl, magnesium chloride or sulfate, Ammonium salts such as phosphate, nitrate, or sulfate, EDTA), reducing agents (DTT, THP, DTE, beta-mercaptoethanol, TCEP, or other reducing agent) or other components (glycerol, hydrophilic polymers such as PEG).
- buffers are used in conjunction with components such as polymerases, strand displacement factors, terminators, or other reaction components described herein. Buffers may comprise one or more crowding agents. In some instances, crowding reagents include polymers. In some instances, crowding reagents comprise polymers such as polyols. In some instances, crowding reagents comprise polyethylene glycol polymers (PEG). In some instances, crowding reagents comprise polysaccharides.
- crowding reagents include ficoll (e.g., ficoll PM 400, ficoll PM 70, or other molecular weight ficoll), PEG (e.g., PEG1000, PEG 2000, PEG4000, PEG6000, PEG8000, or other molecular weight PEG), dextran (dextran 6, dextran 10, dextran 40, dextran 70, dextran 6000, dextran 138k, or other molecular weight dextran).
- the nucleic acid molecules amplified according to the methods described herein may be sequenced and analyzed using methods known to those of skill in the art.
- the sequencing methods which in some instances are used include, e.g., sequencing by hybridization (SBH), sequencing by ligation (SBL) (Shendure et al. (2005) Science 309: 1728), quantitative incremental fluorescent nucleotide addition sequencing (QIFNAS), stepwise ligation and cleavage, fluorescence resonance energy transfer (FRET), molecular beacons, TaqMan reporter probe digestion, pyrosequencing, fluorescent in situ sequencing (FISSEQ), FISSEQ beads (U.S. Pat. No. 7,425,431), wobble sequencing (Int. Pat. Appl. Pub. No.
- allele-specific oligo ligation assays e.g., oligo ligation assay (OLA), single template molecule OLA using a ligated linear probe and a rolling circle amplification (RCA) readout, ligated padlock probes, and/or single template molecule OLA using a ligated circular padlock probe and a rolling circle amplification (RCA) readout
- high-throughput sequencing methods such as, e.g., methods using Roche 454, Illumina Solexa, AB-SOLiD, Helicos, Polonator platforms and the like, and light-based sequencing technologies (Landegren et al. (1998) Genome Res.
- the amplified nucleic acid molecules are shotgun sequenced. Sequencing of the sequencing library is in some instances performed with any appropriate sequencing technology, including but not limited to single-molecule real-time (SMRT) sequencing, Polony sequencing, sequencing by ligation, reversible terminator sequencing, proton detection sequencing, ion semiconductor sequencing, nanopore sequencing, electronic sequencing, pyrosequencing, Maxam-Gilbert sequencing, chain termination (e.g., Sanger) sequencing, +S sequencing, or sequencing by synthesis (array/colony-based or nanoball-based).
- Sequencing libraries generated using the methods described herein may be sequenced to obtain a desired number of sequencing reads.
- libraries are generated from a single cell or sample comprising a single cell (alone or part of a multiomics workflow).
- libraries are sequenced to obtain at least 0.1, 0.2, 0.4, 0.5, 0.7, 0.8, 0.9, 1, 1.1, 1.2, 1.5, 2, 5, or at least 10 million reads.
- libraries are sequenced to obtain no more than 0.1, 0.2, 0.4, 0.5, 0.7, 0.8, 0.9, 1, 1.1, 1.2, 1.5, 2, 5, or no more than 10 million reads.
- libraries are sequenced to obtain about 0.1, 0.2, 0.4, 0.5, 0.7, 0.8, 0.9, 1, 1.1, 1.2, 1.5, 2, 5, or about 10 million reads. In some instances, libraries are sequenced to obtain 0.1-10, 0.1-5, 0.1-1, 0.2-1, 0.3-1.5, 0.5-1, 1-5, or 0.5-5 million reads per sample. In some instances, the number of reads is dependent on the size of the genome. In some instances, samples comprising bacterial genomes are sequenced to obtain 0.5-1 million reads. In some instances, libraries are sequenced to obtain at least 2, 4, 10, 20, 50, 100, 200, 300, 500, 700, or at least 900 million reads.
- libraries are sequenced to obtain no more than 2, 4, 10, 20, 50, 100, 200, 300, 500, 700, or no more than 900 million reads. In some instances, libraries are sequenced to obtain about 2, 4, 10, 20, 50, 100, 200, 300, 500, 700, or about 900 million reads. In some instances, samples comprising mammalian genomes are sequenced to obtain 500-600 million reads. In some instances, the type of sequencing library (cDNA library or genomic library) is identified during sequencing. In some instances, cDNA libraries and genomic libraries are identified during sequencing with unique barcodes.
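The dependence of read count on genome size can be illustrated with the standard mean-depth estimate, depth = reads × read length / genome size. The sketch below is illustrative only: the 150-base read length and the genome sizes (about 5 Mb for a bacterium, about 3.1 Gb for a human genome) are assumptions that do not appear in the text above.

```python
def mean_depth(n_reads, read_length, genome_size):
    """Estimate mean sequencing depth, assuming reads map uniformly."""
    return n_reads * read_length / genome_size


READ_LENGTH = 150  # bases per read (assumed)

# 1 million reads on an assumed 5 Mb bacterial genome
bacterial = mean_depth(1_000_000, READ_LENGTH, 5_000_000)

# 550 million reads on an assumed 3.1 Gb mammalian genome
mammalian = mean_depth(550_000_000, READ_LENGTH, 3_100_000_000)

print(f"bacterial: {bacterial:.1f}x, mammalian: {mammalian:.1f}x")
```

Under these assumptions both read budgets land at a similar order of depth, which is one way to read the statement that the number of reads scales with genome size.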
- cycle when used in reference to a polymerase-mediated amplification reaction is used herein to describe steps of dissociation of at least a portion of a double stranded nucleic acid (e.g., a template from an amplicon, or a double stranded template, denaturation), hybridization of at least a portion of a primer to a template (annealing), and extension of the primer to generate an amplicon.
- the temperature remains constant during a cycle of amplification (e.g., an isothermal reaction).
- the number of cycles is directly correlated with the number of amplicons produced.
- the number of cycles for an isothermal reaction is controlled by the amount of time the reaction is allowed to proceed.
- methods of genetic analysis using PTA determine if a cell (e.g., fetal cell) comprises a genetic abnormality.
- the methods described herein provide non-abnormal genetic information, such as the sex of the embryo.
- the methods described herein establish the presence or absence of sex chromosomes.
- the genetic abnormality includes aneuploidy, monogenic disorders, and structural rearrangements.
- genetic analysis is conducted on pre-implantation embryonic cells.
- genetic analysis comprises one or more of PGT-A, PGT-M, and PGT-SR genetic tests.
- the genetic abnormality comprises aneuploidy.
- aneuploidy comprises monosomy, trisomy, triploidy, deletions, duplications, or uniparental disomy. In some instances, aneuploidy occurs in at least 1, 2, 3, 4, 5, 6, 7, 8, 9, or at least 10 chromosomes. In some instances, aneuploidy occurs in about 1, 2, 3, 4, 5, 6, 7, 8, 9, or about 10 chromosomes. In some instances, aneuploidy occurs in no more than 1, 2, 3, 4, 5, 6, 7, 8, 9, or no more than 10 chromosomes. In some instances, aneuploidy occurs at one or more of chromosomes 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, or 23.
- aneuploidy occurs at one or more of chromosomes 13, 18, or 21. In some instances aneuploidy occurs at one or more of chromosomes 6, 7, 11, 14, or 15.
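A whole-chromosome aneuploidy call can be sketched as a lookup on integer copy number. This is a simplified illustration that assumes copy numbers per autosome have already been estimated; the function names and the example sample are hypothetical, and partial deletions, duplications, triploidy, and uniparental disomy are not handled.

```python
def classify_copy_number(copies):
    """Label an autosome by its integer copy number (two copies expected)."""
    labels = {1: "monosomy", 2: "disomy", 3: "trisomy"}
    return labels.get(copies, "other")


def flag_aneuploid(copy_numbers):
    """Return {chromosome: label} for every chromosome not present in two copies."""
    return {
        chrom: classify_copy_number(n)
        for chrom, n in copy_numbers.items()
        if n != 2
    }


sample = {"13": 2, "18": 2, "21": 3}
print(flag_aneuploid(sample))  # {'21': 'trisomy'}
```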
- the genetic abnormality comprises one or more of an insertion, deletion or duplication. In some instances, the insertion, deletion or duplication is at least 1%, 2%, 3%, 4%, 5%, 6%, 7%, 8%, 9%, 10%, 12%, 15%, or at least 20% of the total chromosome length.
- the insertion, deletion or duplication is about 1%, 2%, 3%, 4%, 5%, 6%, 7%, 8%, 9%, 10%, 12%, 15%, or about 20% of the total chromosome length. In some instances, the insertion, deletion or duplication is no more than 1%, 2%, 3%, 4%, 5%, 6%, 7%, 8%, 9%, 10%, 12%, 15%, or no more than 20% of the total chromosome length.
- the insertion, deletion or duplication is 1%-30%, 1%-20%, 1%-15%, 1%-10%, 1%-5%, 2%-20%, 3%-25%, 4%-20%, 5%-15%, 5%-30%, 5%-20%, 10%-30%, or 15%-30% of the total chromosome length.
- the methods e.g., PTA
- a mutation is a difference between an analyzed sequence (e.g., using the methods described herein) and a reference sequence.
- Reference sequences are in some instances obtained from other organisms, other individuals of the same or similar species, other cells in the same organism, populations of organisms, or other areas of the same genome.
- mutations are identified on a plasmid or chromosome.
- a mutation is an SNV (single nucleotide variation), SNP (single nucleotide polymorphism), or CNV (copy number variation, or CNA/copy number aberration).
- a mutation is base substitution, insertion, or deletion.
- a mutation is a transition, transversion, nonsense mutation, silent mutation, synonymous or non-synonymous mutation, non-pathogenic mutation, missense mutation, or frameshift mutation (deletion or insertion).
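The transition/transversion distinction for single-base substitutions can be sketched as below. The function name `classify_substitution` is hypothetical; the rule itself is the standard one (purine to purine or pyrimidine to pyrimidine is a transition, anything else is a transversion).

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def classify_substitution(ref, alt):
    """Classify a single-base substitution as a transition or a transversion."""
    ref, alt = ref.upper(), alt.upper()
    valid = PURINES | PYRIMIDINES
    if ref not in valid or alt not in valid:
        raise ValueError("ref and alt must each be one of A, C, G, T")
    if ref == alt:
        raise ValueError("ref and alt are identical; not a substitution")
    if {ref, alt} <= PURINES or {ref, alt} <= PYRIMIDINES:
        return "transition"
    return "transversion"


print(classify_substitution("A", "G"))  # transition
print(classify_substitution("A", "C"))  # transversion
```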
- PTA results in higher detection sensitivity and/or lower rates of false positives for the detection of mutations when compared to methods such as in-silico prediction, ChIP-seq, GUIDE-seq, circle-seq, HTGTS (High-Throughput Genome-Wide Translocation Sequencing), IDLV (integration-deficient lentivirus), Digenome-seq, FISH (fluorescence in situ hybridization), or DISCOVER-seq.
- a fetal genetic abnormality is detected with a sensitivity of at least 0.001%, 0.01%, 0.1%, 0.5%, 1%, 2%, 5%, 10%, or at least 20%.
- Genetic abnormalities may be linked to specific genetic diseases.
- methods described herein such as PTA are used to identify genetic diseases.
- the disease is caused by a chromosomal abnormality.
- the disease comprises Down syndrome, Patau syndrome, Klinefelter Syndrome, Turner Syndrome, or Edwards Syndrome.
- the disease is caused by a single gene defect.
- the disease comprises phenylketonuria (PKU), sickle-cell anemia, Beta Thalassemia, Tay-Sachs disease, Sandhoff disease, or cystic fibrosis (CF).
- the disease comprises achondroplasia, congenital adrenal hyperplasia, cystic fibrosis, Down syndrome, fragile X syndrome, hemophilia A, Huntington's disease, muscular dystrophy, polycystic kidney disease, sickle cell disease, Tay-Sachs disease, trisomy 21, trisomy 18, trisomy 13, Turner syndrome, spina bifida, anencephaly, or thalassemia.
- a single-well integration of single-cell transcriptome and genome amplification, in which a standard PTA reaction was modified to include a reverse transcription (RT) step prior to single-cell genome amplification, was designed and executed, and designated as multiomic enrichment (ResolveOME, Bioskryb Genomics, Inc.).
- PTA amplifies the genomes of single cells immediately after the RT reaction is concluded in a single-well reaction.
- barcoded first-strand cDNA molecules were created that were affinity purified and pre-amplified prior to RNA-Seq sequencing library creation.
- the net result from the combined amplification reaction was a biotin labeled cDNA pool derived primarily from the cytosolic transcripts, available for streptavidin purification, and a pool of amplified genomic material from the single cell.
- magnetic beads with attached RT primers can be used for direct removal of the cDNA amplicon library.
- the cDNA fraction is separated from the amplified genome material whereby libraries from each pool were created.
- the resulting sequencing data offered the ability to define both genomic and transcriptomic plasticity at single-cell resolution. Specifically, the delineation of isoform expression, combined with ability to annotate the underlying structural variation and single nucleotide changes from the genome of the same cell, allowed the assessment of genomic “penetrance”, and the definition of mechanisms that drive single-cell fate.
- definition of clonal evolution at the SNV/CNV level in a primary patient sample was accomplished utilizing G&T-seq, yet was limited to a candidate gene survey of exome-level data whereby clusters were defined by 59 oncogenes, and another study employing G&T-seq limited its analysis to the RNA workflow of the method to take advantage of the low input requirement, without assessment of genomic-level data.
- an RNA+DNA multiomics workflow is presented that highlights heterogeneous genomic variation and consequential phenotypic alterations in single cells, both of which are correlated with the development of resistance to a targeted therapeutic in a cell line model of acute myeloid leukemia and with oncogenic mechanisms in primary breast cancer cells, whereby the insights gained could not be inferred from a single dataset (genome or transcriptome) alone.
- Amplification product yield: the RNA and DNA arms of the protocol were first assessed using metrics from the template-switching RNA-Seq chemistry or PTA chemistry in isolation, for comparison with the metrics obtained when the chemistries were unified in the combined multiomics protocol.
- Multiomics data with FACS-sorted NA12878 single cells was generated with purified total NA12878 RNA or genomic DNA as amplification controls. Approximately 1-1.5 µg of DNA amplification product from single cell genomes and approximately 100-200 ng of cDNA product representing the single cell transcriptome was obtained. Importantly, no-template control (NTC) reactions showed a lack of detectable product, and there was negligible yield (less than 50 ng) in the DNA fraction from control RNA input, as measured using a Qubit fluorometer (ThermoFisher). Low-level background amplification of the genomic DNA control input in the cDNA fraction was observed, due to the known promiscuity of reverse transcriptase in the absence of mRNA template. By contrast, this background amplification does not occur in reactions with single cells, as the genome material is sequestered in the non-lysed nucleus during the reverse transcription workflow of multiomics.
- the PTA method was modified for use in a multiomics workflow. After reverse transcription has completed, dUTP was added to the normal nucleotide mix (dATP, dCTP, dGTP, dTTP) during phi29 amplification (red dot), resulting in PTA amplification products derived from the original single-cell or low-input template DNA being “marked” with dUTP.
- a UDG incubation step occurred on beads after affinity purification and washes of the cDNA, to digest the background dUTP-marked PTA product prior to preamplification of the cDNA (green dot).
- the cDNA libraries utilized a normal high-fidelity polymerase; however, the PTA-derived libraries representing the DNA arm of the multiomics workflow used a uracil-tolerant polymerase in order to amplify the library ligation products of uracil-containing PTA product (yellow dot).
- the number of expressed genes detected was reduced following UDG treatment, indicating that transcript counts in the absence of UDG treatment were likely compounded by DNA (PTA) background.
- IGV visualization of a 700 kb region harboring 3 genes showed removal of intergenic read background under the UDG scheme. Each row was a single-cell (NA12878) multiomic RNA fraction library.
- DNA background reads were seen in the top two control RNA libraries when PTA was performed without dUTP, and these background reads progressively diminished as more dUTP was included during PTA.
- the ratio of nucleotides was 1 : 1 dUTP:dTTP; PTA reactions containing dUTP exclusively with no dTTP were slower kinetically.
- the DNA background removal benefits of increased dUTP in the PTA reaction (C) did not adversely affect allelic balance and SNV calling precision and sensitivity metrics.
- Uracil tolerant polymerases may be used with the methods described herein to amplify uracil-containing templates (e.g., with PTA).
- a uracil tolerant polymerase maintains at least 50, 60, 70, 80, 85, 90, 95, 97, or 99% polymerase activity when amplifying a template comprising uracil as compared to a template without uracil.
- a uracil tolerant polymerase is derived from archaea, yeast, or bacterial species.
- a uracil tolerant polymerase comprises DNA polymerases α and δ from S. cerevisiae and E. coli DNA polymerase III, PolA-type polymerases such as Taq, KAPA HiFi Uracil+ DNA Polymerase (Kapa biosystems, Q5U), KOD Multi & Epi DNA Polymerase, FastStart Taq (Roche), Taq2000 (Agilent Technologies), FailSafe Enzyme (Epicentre) or Thermo PhusionU.
- a uracil tolerant polymerase comprises a sequence having at least 70%, 75%, 80%, 85%, 90%, 95%, 97%, or 99% identity with DNA polymerases α and δ from S.
- a uracil tolerant polymerase comprises a modification to one or more amino acid residues in the dUTP binding pocket.
- allelic balance (the ability to represent both alleles through enrichment, a strength of genomic PTA methodology) was reviewed.
- ADO allelic drop out
- allelic balance is the proportion of known heterozygous loci that are called heterozygous following sequencing. Variants within these loci have allele frequencies between 10% and 90% at each locus.
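The allelic balance definition above can be sketched as follows. This is one possible reading, not a confirmed implementation: it assumes the input is the observed alternate-allele fraction at each known heterozygous locus, treats the 10% and 90% bounds as inclusive, and counts loci with no coverage (`None`) as not called. The function name is hypothetical.

```python
def allelic_balance(allele_fractions, low=0.10, high=0.90):
    """Proportion of known heterozygous loci that are called heterozygous.

    `allele_fractions` holds one alternate-allele fraction per known
    heterozygous locus, or None where the locus was not covered.
    """
    if not allele_fractions:
        raise ValueError("at least one known heterozygous locus is required")
    called_het = sum(
        1 for f in allele_fractions
        if f is not None and low <= f <= high
    )
    return called_het / len(allele_fractions)


# Four known het loci: balanced, dropped out to 95%, skewed, and uncovered
print(allelic_balance([0.5, 0.95, 0.2, None]))  # 0.5
```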
- a review of allelic balance of the multiomics workflow showed 85.5% (+/-3.4%), which is closely comparable to the 88.2% (+/- 4%) for genomic DNA only workflow, across 10 replicates each.
- Genomic coverage at a range of depths did not significantly differ between the workflows. Lastly, it was critical to demonstrate that the allelic balance and coverage obtained from the multiomics workflow culminated in the ability to call SNVs with confidence. This example highlights individual multiomics NA12878 cells with an SNV calling sensitivity range of 0.90-0.95 and with precision >0.99, akin to genomic DNA-only data. Collectively, these data suggest that, despite the upstream reverse transcription chemistry modifications to generate transcriptome data, the amplification performance of PTA on single-cell genomes is maintained.
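Sensitivity and precision of SNV calling, as reported above, can be computed against a truth set in the usual way: sensitivity is true positives over all true variants, and precision is true positives over all calls. The sketch below assumes variants are represented as hashable tuples; the function name and the toy variants are hypothetical and are not data from this disclosure.

```python
def snv_metrics(called, truth):
    """Return (sensitivity, precision) of a call set against a truth set."""
    called, truth = set(called), set(truth)
    true_positives = len(called & truth)
    sensitivity = true_positives / len(truth) if truth else float("nan")
    precision = true_positives / len(called) if called else float("nan")
    return sensitivity, precision


truth = {
    ("chr1", 100, "A", "G"), ("chr1", 200, "C", "T"),
    ("chr2", 50, "G", "A"), ("chr2", 75, "T", "C"),
}
called = {
    ("chr1", 100, "A", "G"), ("chr1", 200, "C", "T"),
    ("chr2", 50, "G", "A"),
    ("chr3", 10, "A", "T"),  # not in the truth set: a false positive
}
print(snv_metrics(called, truth))  # (0.75, 0.75)
```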
- RNA arm workflow: in choosing a transcriptomic scheme to unite with PTA, one goal was to be as comprehensive as possible in capturing the diversity of RNA-based modes of oncogenic and drug resistance mechanisms and, equally importantly, to enable the ascertainment of genomic lesions manifesting at the RNA level.
- a template-switching reverse transcription scheme was designed for the multiomics workflow that captured full-transcript information, as opposed to either 5’ or 3’ end counting, to enhance the ability to detect isoforms and identify fusions. This chemistry enables even coverage across transcripts, where increased coverage of the 5’ region (top), which typically is affected by degradation (or reverse transcriptase performance) proportional to the distance from the 3’-poly A, is shown. This confirms the behavior of the template-switching chemistry in the RNA arm workflow.
- HBRR Human Brain Reference RNA
- UHRR Universal Human Reference RNA
- Read and genomic feature mapping percentages were identified, as well as total genes discovered, as criteria for evaluating sequencing quality.
- the dynamic range of expression and expression patterns in well-known housekeeping genes were also examined, and various markers of DNA contamination, sample degradation, and/or bias, such as the percentage of exonic mapping (more than 55%) and intergenic mapping (less than 5%), were computed as characteristics of the multiomics RNA fraction.
- Another important metric for measuring the quality of single cell experiments was the number of genes found (>0 counts) per cell.
- for NA12878 cells, there was an average of approximately 2500 genes, whereas the average number of HBRR and UHRR genes discovered was around 6 and 7 thousand, respectively.
- MAD median absolute deviation
- CV percent coefficient of variation
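The two dispersion statistics named above can be computed with the Python standard library. The sketch uses an unscaled MAD and the sample standard deviation for the CV; both conventions are assumptions, since the text does not specify them, and the example values are arbitrary.

```python
import statistics


def mad(values):
    """Median absolute deviation: median of |x - median(x)| (unscaled)."""
    med = statistics.median(values)
    return statistics.median([abs(v - med) for v in values])


def cv_percent(values):
    """Percent coefficient of variation: 100 * sample stdev / mean."""
    return 100.0 * statistics.stdev(values) / statistics.mean(values)


values = [2, 4, 4, 4, 5, 5, 7, 9]
print(mad(values))  # 0.5
print(round(cv_percent(values), 2))
```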
- the relative types of other RNA species detected with the multiomics chemistry, including lncRNAs, snRNAs, and pseudogenes, are shown. Relative proportions of features were concordant between the template-switching RT chemistry in isolation vs. in the combined RNA/DNA workflow in multiomics, and overall concordance was observed between purified RNA input template vs. single cells, with the exception that single cells revealed more intronic reads of protein coding genes than did the purified RNA input.
- mitochondrial read percentage was below 10%, with most cells averaging less than 5%, indicating that single-cell lysis was optimal for capturing mRNA and other polyadenylated transcripts and that the amplified cells were healthy.
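The RNA-fraction quality characteristics described above can be expressed as simple threshold checks. This sketch is illustrative only: it takes the exonic (more than 55%) and intergenic (less than 5%) figures from the text and treats the 10% mitochondrial figure as an upper bound, which is an assumption; the function and key names are hypothetical.

```python
def rna_qc_flags(exonic_pct, intergenic_pct, mito_pct):
    """Check per-cell mapping percentages against the thresholds in the text."""
    return {
        "exonic_ok": exonic_pct > 55.0,         # more than 55% exonic
        "intergenic_ok": intergenic_pct < 5.0,  # less than 5% intergenic
        "mito_ok": mito_pct < 10.0,             # assumed ceiling of 10%
    }


def passes_qc(exonic_pct, intergenic_pct, mito_pct):
    """A cell passes only if every individual check passes."""
    return all(rna_qc_flags(exonic_pct, intergenic_pct, mito_pct).values())


print(passes_qc(62.0, 3.1, 4.2))   # True
print(passes_qc(48.0, 3.1, 4.2))   # False: exonic fraction too low
```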
- EXAMPLE 2 Use of uracil tolerant polymerase for improved multiomics
- cDNA was generated from single cell RNA using reverse transcription. cDNA amplicons were generated using biotinylated poly dT primers.
- the PTA method was used to amplify genomic DNA from the cell, wherein the mixture of dNTPs comprises uracil.
- cDNA was then purified from the mixture using streptavidin, and further treated with uracil DNA glycosylase (UDG) and DNA glycosylase-lyase Endonuclease VIII to remove any residual genomic amplicons from the cDNA.
- the genomic fragments generated from PTA were then purified, and both cDNA and genomic DNA fractions were converted into sequencing-ready libraries using adapter ligation.
- a uracil-tolerant polymerase was used to amplify the PTA-generated genomic fragments.
- the effect of the concentration of the ionic surfactant (in this example, SDS) contained in the storage buffer on the PTA reaction (following the general methods of Examples 1 and 2), and on the results obtained from the sample after storage, was evaluated.
- cellular samples comprising one or more cells were sorted into containers containing storage buffer.
- the storage buffer used for each sample comprised 50 mM TrisHCl at pH 8.5, 1 mM EDTA, 200 µg/ml Proteinase K, 500 µM CaCl2, 1 mM DTT, and varying concentrations of SDS (ionic surfactant). The results are presented in FIG. 1.
- FIG. 1 presents amplification multicomponent plots of the samples.
- the sample of plot 101 comprised 0% SDS in its storage buffer.
- the sample of plot 102 comprised 0.005% SDS in its storage buffer.
- the sample of plot 103 comprised 0.01% SDS in its storage buffer.
- the sample of plot 104 comprised 0.05% SDS in its storage buffer.
- the sample of plot 105 comprised 0.1% SDS in its storage buffer.
- the sample of plot 106 comprised 0.5% SDS in its storage buffer.
- the best results were observed in plots 103 and 104, indicating that an SDS concentration of from about 0.01% to about 0.05% by volume in the storage buffer gave the best amplification results for samples stored in that buffer.
- the samples presented in plots 205 to 208 were neutralized with neutralizing buffers comprising varying concentrations of Tween (e.g., Tween-20).
- the concentrations of Tween-20 in the neutralizing buffers of plots 205 to 208 were 0.5%, 1%, 1.5%, and 2%, respectively.
- the best results were observed in plot 206 (Tween-20 concentration of about 1% in the neutralizing buffer).
- plots 209 to 212 present amplification results of samples stored in storage buffers with varying concentrations of ammonium sulfate.
- concentrations of ammonium sulfate in the storage buffers of plots 209 to 212 were 5 mM, 10 mM, 15 mM, and 20 mM respectively. The best results were observed in plot 211 (ammonium sulfate concentration of 15 mM).
- FIG. 3 presents broad first pass experiment preamplification quality control data for samples stored in different buffer compositions.
- “SB4” represents a storage buffer comprising 50 mM TrisHCl at pH 8.5, 1 mM EDTA, 200 µg/ml Proteinase K, 500 µM CaCl2, 1 mM DTT, and 0.05% SDS (ionic surfactant). The yield for the “SB4” sample condition was found to be 10.736 nanograms (ng).
- “SB4Tween1” represents a condition in which the storage buffer is SB4 and the neutralizing buffer comprises a Tween concentration of 0.5%. The yield for the “SB4Tween1” condition was found to be 242 ng.
- “SB4Tween2” represents a condition in which the storage buffer is SB4 and the neutralizing buffer comprises a Tween concentration of 1%. The yield for the “SB4Tween2” condition was found to be 277.2 ng.
- “SB4Tween3” represents a condition in which the storage buffer is SB4 and the neutralizing buffer comprises a Tween concentration of 1.5%. The yield for the “SB4Tween3” condition was found to be 129.58 ng.
- “SB4Tween4” represents a condition in which the storage buffer is SB4 and the neutralizing buffer comprises a Tween concentration of 2%. The yield for the “SB4Tween4” condition was found to be 11.066 ng.
- “SB4Triton1” represents a condition in which the storage buffer is SB4 and the neutralizing buffer comprises a Triton-X concentration of 0.2%. The yield for the “SB4Triton1” condition was found to be 9.636 ng.
- “SB4Triton2” represents a condition in which the storage buffer is SB4 and the neutralizing buffer comprises a Triton-X concentration of 0.5%. The yield for the “SB4Triton2” condition was found to be 177.98 ng.
- SB4Triton3 represents a condition in which the storage buffer is SB4 and the neutralizing buffer comprises a Triton-X concentration of 0.75%. The yield for the SB4Triton3 condition was found to be 77 ng.
- SB4Triton4 represents a condition in which the storage buffer is SB4 and the neutralizing buffer comprises a Triton-X concentration of 1%. The yield for the SB4Triton4 condition was found to be 83.6 ng.
- “SB4AS1” represents a condition in which the storage buffer is SB4 and also comprises an ammonium sulfate concentration of 5 mM. The yield for the “SB4AS1” condition was found to be 9.658 ng.
- “SB4AS2” represents a condition in which the storage buffer is SB4 and also comprises an ammonium sulfate concentration of 10 mM. The yield for the “SB4AS2” condition was found to be 6.578 ng.
- “SB4AS3” represents a condition in which the storage buffer is SB4 and also comprises an ammonium sulfate concentration of 15 mM. The yield for the “SB4AS3” condition was found to be 3.63 ng.
- “SB4AS4” represents a condition in which the storage buffer is SB4 and also comprises an ammonium sulfate concentration of 20 mM. The yield for the “SB4AS4” condition was found to be 8.074 ng.
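The pre-amplification yields listed for FIG. 3 can be compared programmatically. The sketch below tabulates the reported yields and picks the highest-yield condition within each additive family. It assumes each yield belongs to the condition defined in the same paragraph, and the grouping into families and the helper name `best_condition` are illustrative choices, not part of the disclosure.

```python
# Reported pre-amplification yields in ng, grouped by additive (FIG. 3)
YIELDS_NG = {
    "Tween": {
        "SB4Tween1": 242.0, "SB4Tween2": 277.2,
        "SB4Tween3": 129.58, "SB4Tween4": 11.066,
    },
    "Triton-X": {
        "SB4Triton1": 9.636, "SB4Triton2": 177.98,
        "SB4Triton3": 77.0, "SB4Triton4": 83.6,
    },
    "ammonium sulfate": {
        "SB4AS1": 9.658, "SB4AS2": 6.578,
        "SB4AS3": 3.63, "SB4AS4": 8.074,
    },
}
BASELINE_SB4_NG = 10.736  # SB4 alone


def best_condition(conditions):
    """Return (name, yield) of the highest-yield condition in one family."""
    name = max(conditions, key=conditions.get)
    return name, conditions[name]


best = {family: best_condition(conds) for family, conds in YIELDS_NG.items()}
for family, (name, ng) in best.items():
    print(f"{family}: {name} at {ng} ng ({ng / BASELINE_SB4_NG:.1f}x SB4 alone)")
```

Yield is only one readout; the plots discussed for the other figures may rank conditions differently.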
- FIG. 4A presents real-time amplification multicomponent plots for samples stored in buffer compositions of the present disclosure with varying concentrations of ionic surfactant and different compositions of neutralizing buffer on which reverse transcription or pre-amplification has been performed.
- the samples presented in plots 401 and 402 were stored in storage buffer SB4.
- “SB4” represents a storage buffer comprising 50 mM TrisHCl at pH 8.5, 1 mM EDTA, 200 µg/ml Proteinase K, 500 µM CaCl2, 1 mM DTT, and 0.05% SDS (ionic surfactant).
- the samples presented in plots 403 and 404 were stored in storage buffer SB5.
- “SB5” represents a storage buffer comprising 50 mM TrisHCl at pH 8.5, 1 mM EDTA, 200 µg/ml Proteinase K, 500 µM CaCl2, 1 mM DTT, and 0.1% SDS.
- the samples of plots 401 and 403 were then neutralized with a neutralizing buffer comprising Tween-20.
- the samples of plots 402 and 404 were neutralized with a neutralizing buffer comprising Triton-X. The most preferable results were observed in plot 401 (SB4 storage buffer (the composition) neutralized with a neutralizing buffer (the second composition) comprising Tween-20).
- FIG. 4B presents real-time amplification multicomponent plots for samples stored in buffer compositions of the present disclosure with varying concentrations of ionic surfactant and different compositions of neutralizing buffer, which have been processed using the V2DNA™ Amplification Kit from ILLUMINA™.
- the samples presented in plots 405 and 406 were stored in storage buffer SB4.
- “SB4” represents a storage buffer comprising 50 mM TrisHCl at pH 8.5, 1 mM EDTA, 200 µg/ml Proteinase K, 500 µM CaCl2, 1 mM DTT, and 0.05% SDS (ionic surfactant).
- the samples of plots 407 and 408 were stored in storage buffer SB5.
- “SB5” represents a storage buffer comprising 50 mM TrisHCl at pH 8.5, 1 mM EDTA, 200 µg/ml Proteinase K, 500 µM CaCl2, 1 mM DTT, and 0.1% SDS.
- the samples of plots 405 and 407 were then neutralized with a neutralizing buffer comprising Tween-20.
- the samples of plots 406 and 408 were neutralized with a neutralizing buffer comprising Triton-X. The most preferable results were observed in plot 405 (SB4 storage buffer (the composition) neutralized with a neutralizing buffer (the second composition) comprising Tween-20).
- FIG. 5A presents reverse transcription pre-amplification of RNA in samples stored in storage buffer compositions SB4 and SB5.
- the recipes of the storage buffers SB4 and SB5 are consistent with the rest of the figures and are also provided in this figure. Reverse transcription was performed using reverse transcription (RT) buffers “5xRwa” and “5xRxa”.
- FIG. 5B presents primary template amplification of DNA in samples stored in storage buffer compositions SB4 and SB5. Reverse transcription was performed using reverse transcription (RT) buffers 5xRwa and 5xRxa. Buffer recipes are provided in FIG. 5A.
- FIG. 6 presents genomics metrics quantified for samples stored in different buffer compositions for varying durations of storage time.
- Buffer recipes for storage buffers SB4 and SB5 and reverse transcription buffers 5xRwa and 5xRxa are consistent with the rest of the figures and provided in FIG. 5A.
- “PropExonic” is the relative ratio or proportion of reads that map to exons, relative to introns or intergenic regions.
- “Protein Coding Transcripts” is the number of transcripts in the subsampled reads that map to protein coding regions.
- PreSeq Counts is the expected number of bases covered greater than 1 for the theoretical larger experiment. “PreSeq Counts” may be used as a measure of unique reads that map to regions of the genome, estimating the total and uniform coverage of a single genome.
- “Gini Coefficient” ranges from 0 to 1, where 0 is perfect uniformity and 1 is perfect non-uniformity in genomic coverage.
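By way of non-limiting illustration, the Gini coefficient defined above can be computed from per-bin coverage depths as in the following minimal sketch. The function name, the rank-weighted formula chosen, and the example depth values are assumptions made for illustration only; the disclosure does not specify how the metric was calculated.

```python
def gini_coefficient(coverage):
    """Return the Gini coefficient of a list of non-negative coverage depths.

    0 means every bin has the same depth (perfect uniformity); values
    approaching 1 mean coverage is concentrated in very few bins.
    """
    values = sorted(coverage)
    n = len(values)
    total = sum(values)
    if n == 0 or total == 0:
        raise ValueError("coverage must contain at least one positive value")
    # Rank-weighted sum over the sorted depths (ranks start at 1).
    weighted = sum(rank * depth for rank, depth in enumerate(values, start=1))
    return (2.0 * weighted) / (n * total) - (n + 1.0) / n


# Hypothetical per-bin depths, not data from the disclosure.
uniform = [30, 30, 30, 30]  # every bin covered equally
skewed = [0, 0, 0, 120]     # all reads fall in a single bin

print(gini_coefficient(uniform))  # 0.0
print(gini_coefficient(skewed))   # 0.75
```

With only four bins the maximum attainable value is (n - 1)/n = 0.75; the upper bound approaches 1 as the number of bins grows, which is consistent with the 0 to 1 range stated above.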
- FIG. 7 presents RNA sequencing results for samples stored in different buffer conditions for varying durations of time. Buffer recipes for storage buffers SB4 and SB5 and reverse transcription buffers 5xRwa and 5xRxa are consistent with the previous figures and also provided in FIG. 5A. These results indicated that the storage buffer SB4 resulted in more consistent, less intronic results.
- FIG. 8 presents pre-amplification yields obtained from samples stored under different conditions using the methods and compositions of the present disclosure.
- Buffer recipes for storage buffers SB4 and SB5 and reverse transcription buffers 5xRwa and 5xRxa are consistent with the rest of the figures and also provided in FIG. 5A. The results indicated that, when SB4 was used as the storage buffer, Tween-20 was needed in the neutralizing buffer before pre-amplification in order to obtain accurate and uncompromised amplification and sequencing results.
- FIG. 9 presents genomics metrics quantified for samples stored in different buffer compositions and reverse transcription buffers.
- Buffer recipes for storage buffers SB4 and SB5 and reverse transcription buffers 5xRwa and 5xRxa are consistent with the rest of the figures and also provided in FIG. 5A.
- CB represents conventional cell buffer not containing ionic surfactant and protease.
- FT represents a sample that has experienced at least one round of freeze and thaw. The results indicated that the SB4 storage buffer can yield better recovery of transcripts compared to other storage buffers/conditions.
- FIG. 10 presents reverse transcription pre-amplification sequencing data performed on samples stored in different buffer conditions for varying durations of time.
- SB4 represents a storage buffer with the recipe disclosed in FIG. 5A.
- CB represents conventional “cell buffer”.
- Cell buffer does not contain ionic surfactant and protease.
- FT indicates the sample has experienced freeze and thaw.
- 5xRTw and 5xRTa represent reverse transcription buffers with recipes disclosed in FIG. 5A. The results indicated that, compared to conventional cell buffer (CB), samples stored in SB4 with or without experiencing freeze thaw (FT), resulted in better recovery of transcripts and genes.
- FIG. 11 presents genome-wide amplification results and coverage in different buffer conditions. Buffer recipes are consistent with the rest of the figures and also provided in FIG. 5A.
- the sample stored in SB4 storage buffer resulted in uniform genome-wide transcript, while samples stored in CB (e.g., conventional cell buffer not containing the ionic surfactant and protease) resulted in biased transcription curves.
- FIG. 12 presents results of expression analysis for different storage conditions. The results indicated that when stored in conventional buffers, the cells were stressed leading to compromised expression data. Cells stored in SB4 were found to be in better condition and resulted in expression data that were not deteriorated upon storage of the cell samples. For example, storage in SB4 resulted in cells that were not stressed and yielded uncompromised genomic analysis data.
- FIGs. 13A-13F present sequencing metrics quantified for different storage conditions.
- CB represents conventional cell buffer.
- SB4 represents storage buffer with the recipe disclosed in FIG. 5A.
- the definitions of the analytical metrics are consistent with those generally known in the art. The definitions of the metrics are consistent with the rest of the figures.
- Examples 1-3 Following the general procedures of Examples 1-3, hundreds or thousands of cells are harvested (picked, isolated by flow cytometry, etc.) and placed into individual wells with storage buffer SB4. The cells are then stored or shipped for 3-15 days to an analysis facility, in some cases with a cold plate to maintain a lower than ambient temperature. The temperature of the cells may fluctuate during shipping, and may include exposure to temperatures of -20°C to 30°C. The cells are then subjected to a PTA genomic analysis (e.g., ResolveDNA from Bioskryb Genomics), expression/RNA analysis, or a combined DNA/RNA multiomics workflow (e.g., ResolveOME from Bioskryb Genomics) using a neutralization buffer described in Example 1.
- Use of the storage buffer is expected to result in less stress to cells (fewer expression artifacts from cell processing/storage), higher quality pre-amplification DNA/RNA, and improved or comparable sequencing metrics (PreSeq, protein coding transcripts, proportion exonic sequences, ratio transcript body, sequencing coverage, fold-80 base penalty, dropouts, percent chimera, and Gini index) relative to cell buffer CB1.
- cells treated with storage buffer SB4 and stored are expected to result in similar outcome metrics (pre-amplification quality) and sequencing metrics as cells which were harvested and processed the same day.
- nucleic acid samples from single cells were amplified using either conventional cell buffer (CB) or storage buffer (SB, “SB5”, modified to contain 0.2% ionic surfactant).
- CB cell buffer
- SB storage buffer
- NTC no template control
- HG001 B cells stored at varying temperatures (-20°C, 6-10°C, and RT (room temperature)) and durations (1 versus 7 days at RT) display higher and more consistent ResolveOME DNA and cDNA yields in SB compared to CB (cell buffer). Improvements from SB are robust and consistent across three different lots (1, 2, and 3). SB and CB differed by a significant improvement in the total number of expressed protein coding genes detected.
- Stability buffer also provided insights with clinically derived biopsies using housekeeping gene analysis. Housekeeping gene analysis demonstrated that a unique signature for clinical samples could be identified when clinical samples are sorted into stability buffer. Comparing clinical samples in stability buffer to the same sample without stability buffer demonstrated the improved recovery of genes. Additionally, clinical samples not in stability buffer generally looked more like cell lines than clinical samples in stability buffer (FIG. 15).
- Gene signatures were also analyzed against both buffer systems. The enhanced recovery of mRNA from clinical samples stored in stability buffer extended beyond housekeeping genes. The top five (FIG. 16A) and top ten (FIG. 16B) differentially expressed genes can be seen in a subpopulation of the clinical samples in PBS. When clinical samples are sorted into stability buffer (SB), this signature can be seen for most clinical samples.
- cfDNA was amplified using PTA with or without the stability buffer, and with or without a non-ionic surfactant (NIS).
- NIS non-ionic surfactant
- the fragmented/degraded cell free DNA was amplified in the presence of stability buffer so long as presence of the non-ionic surfactant (NIS) was maintained in the downstream reaction.
- the stability buffer could be added to the reaction at concentrations 10x those previously demonstrated.
- Example 3 The general procedures of Example 3 are followed with modification: additional storage buffer formulations shown in Table 1 are evaluated.
Abstract
Provided herein are methods, systems, and compositions for cell storage and cell analysis. In some embodiments, compositions such as buffers are provided that enhance the stability of samples comprising one or more cells and/or constituents of one or more cells under storage conditions. The methods and compositions are compatible with samples comprising small numbers of cells and single cells and reduce or prevent sample loss, cell loss from samples, and adverse effect on screening results caused by storage, temperature fluctuations, and sample shipment. Methods and compositions may enable sample storage for genomics and multiomics analysis and primary template amplification.
Description
METHODS, SYSTEMS, AND COMPOSITIONS FOR CELL STORAGE AND ANALYSIS
CROSS REFERENCE
[0001] This application claims the benefit of U.S. Provisional No. 63/591,715, filed on October 19, 2023, which is incorporated herein by reference in its entirety.
BACKGROUND
[0002] Cellular analysis, and particularly, population-based analysis of cells in small clusters or single cell analysis has vast applications across the life sciences field, diagnostics, and drug discovery. Such analyses may include genomics and multiomics analysis. In many cases, preexisting methods, systems, and reagents for cell storage and analysis are inefficient such that they may limit the applications of the cell analysis methods or adversely affect the accuracy of the results.
SUMMARY
[0003] There is an unmet need for methods, systems, and compositions for cell storage and analysis to increase the stability of the samples and components thereof and preserve the quality of the stored samples over a prolonged period and a broad range of storage conditions. This can improve the results of cell analysis, such as genomics and multiomics analysis performed on the stored samples which may optionally be shipped during storage as well. Provided herein are methods and compositions such as buffers and reagents for storing samples comprising one or more cells, in some cases, a small number of cells or single cells. In some cases, the one or more cells may be sorted into predetermined sample volumes with predetermined cell densities ahead of storage.
[0004] In an aspect, provided herein is a composition comprising: (a) a salt; (b) an ionic surfactant at a concentration from about 0.00001 to about 1 volume percent; (c) a protease at a concentration from about 0.01 units to about 5 units per milliliter; (d) a reducing agent; and (e) a chelator. In some embodiments, the ionic surfactant comprises or is sodium dodecyl sulfate (SDS), sodium lauryl sulfate (SLS), sodium laureth sulfate (SLES), ammonium lauryl sulfate (ALS), ammonium laureth sulfate (ALES), sodium stearate, potassium cocoate, or any combination thereof. In some cases, the protease is thermolabile. A thermolabile protease can be neutralized by heat, in some cases prior to whole genome amplification or sequencing. In some embodiments, the protease comprises or is Proteinase K. In some cases, the Proteinase K is thermolabile. In some embodiments, the salt comprises or is Tris-HCl. In some embodiments, the reducing agent comprises or is dithiothreitol (DTT). In some embodiments, the chelator
comprises or is ethylenediaminetetraacetic acid (EDTA). In some embodiments, the composition is used for performing at least one of: storing the cell or genomic materials of the cell, and preparing the genomic materials of the cell for amplification. The genomic materials of the cells may be amplified in the downstream. In some embodiments, sequencing may be performed subsequently. In some embodiments, the composition is used for storing the cell or genomic materials of the cell and preparing the genomic materials of the cell for amplification. Amplification and sequencing may follow. In some embodiments, storing comprises storing for at least about 1 hour, 4 hours, 8 hours, 12 hours, 24 hours, at least about 48 hours, at least about 72 hours or longer. In some cases, storage may be performed for at least 1 day, 2 days, 3 days, 4 days, 5 days, 6 days, 7 days, 8 days, 9 days, 10 days, 11 days, 12 days, 13 days, 14 days, 15 days, 16 days, 17 days, 18 days, 19 days, 20 days, or longer. In some embodiments, the storage temperature is from about -20 to about 30 degrees Celsius (C). In some embodiments, the sample is shipped during storage. In some embodiments, the temperature changes or fluctuates during storage. In some embodiments, the sample and/or components thereof remain substantially stable during storage despite shipment or changes or fluctuations in storage conditions such as temperature, humidity, and pressure. In some embodiments, the composition is configured for storing a sample comprising one or more cells or constituents of one or more cells. In some embodiments, the one or more cells comprise at most about 20, at most about 15, or at most about 10 cells. In some embodiments, the one or more cells is a single cell. In some embodiments, the one or more cells comprise live cells. In some embodiments, the one or more cells comprise fixed cells. 
In some embodiments, the composition further comprises a second salt different from the salt, wherein the second salt stimulates the activity of the protease, the ionic surfactant, or both. The composition may retain the stability of the nucleic acid molecules of the one or more cells. In some embodiments, the composition may lyse the one or more cells and extract genomic contents of the one or more cells. The genomic contents of the one or more cells (e.g., lysed cells) may remain stable in the composition during storage. In some embodiments, the protease comprises Proteinase K, Proteocut K, or Nargarse, optionally wherein the protease is thermolabile. In some embodiments, the salt comprises Tris-HCl, HEPES, or TES. In some embodiments, the composition further comprises a salt selected from CaCl2, MgCl2, KCl, MgSO4, K2SO4, or NaHCO3. In some embodiments, the reducing agent comprises dithiothreitol (DTT), tris(2-carboxyethyl)phosphine (TCEP), or β-mercaptoethanol. In some embodiments, the chelator comprises ethylenediaminetetraacetic acid (EDTA) or ethylene glycol tetraacetic acid (EGTA). In some embodiments, one or more of the following is true: the salt has a concentration of 10-100 mM; the ionic surfactant has a concentration volume percent of 0.001 to 0.1; the protease has a concentration of about 0.01 units to about 5 units per milliliter; the reducing agent has a concentration of 0.1-10 mM; and the chelator has a concentration of 0.1 to 10 mM. In some embodiments, the ionic surfactant has a concentration volume percent of 0.01 to 0.5.
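By way of non-limiting illustration, the concentration ranges recited above can be checked against a candidate recipe as in the following minimal sketch. The dictionary keys, the function name, and the 0.2% variant are hypothetical and introduced only for this example. The SB4 values are those recited for the figures (50 mM Tris-HCl, 0.05% SDS, 1 mM DTT, 1 mM EDTA), and treating the SDS percentage as a volume percent is an assumption. The protease is omitted because the figures give its amount as a mass concentration while the range above is stated in units per milliliter.

```python
# Concentration ranges quoted in the paragraph above: (low, high), inclusive.
RANGES = {
    "salt_mM": (10.0, 100.0),
    "ionic_surfactant_pct": (0.001, 0.1),
    "reducing_agent_mM": (0.1, 10.0),
    "chelator_mM": (0.1, 10.0),
}

# SB4 as recited for the figures: Tris-HCl, SDS, DTT, and EDTA respectively.
SB4 = {
    "salt_mM": 50.0,
    "ionic_surfactant_pct": 0.05,
    "reducing_agent_mM": 1.0,
    "chelator_mM": 1.0,
}


def out_of_range(recipe, ranges=RANGES):
    """Return the names of components whose concentration is outside its range."""
    return [
        name
        for name, (low, high) in ranges.items()
        if not low <= recipe[name] <= high
    ]


# Hypothetical variant with more surfactant, used only to show a failing check.
high_sds = dict(SB4, ionic_surfactant_pct=0.2)

print(out_of_range(SB4))       # []
print(out_of_range(high_sds))  # ['ionic_surfactant_pct']
```

The 0.2% variant falls outside the 0.001 to 0.1 range but inside the alternative 0.01 to 0.5 range recited in the last sentence above, so the outcome depends on which embodiment's range is supplied as `ranges`.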
[0005] In an aspect, provided herein is a kit comprising the composition of any one of the preceding embodiments. In some embodiments, the kit comprises a second composition different from the composition. In some embodiments, the second composition comprises a neutralizing buffer comprising a component that substantially neutralizes the ionic surfactant. The composition may be referred to as storage buffer, and the second composition may be referred to as neutralizing buffer. In some embodiments, the kit further comprises instructions for storing a cell and preparing the genomic contents of the cell for amplification, using the composition and the second composition, wherein the composition is used to store the cell and the second composition is used to neutralize the composition, wherein neutralizing the composition improves the results of a downstream amplification performed on the cell. In some embodiments, the second composition comprises a non-ionic surfactant, zwitterionic surfactant, charge-neutral surfactant, or any combination thereof. In some embodiments, the non-ionic surfactant may comprise a polysorbate, a poly(ethylene glycol) derivative, or both. In some embodiments, the non-ionic surfactant comprises Tween (e.g., Tween-20, Tween-40, Tween-60, or Tween-80) or Triton (e.g., Triton-X-100).
[0006] In an aspect provided herein is a method of cell analysis, comprising one or more steps of: (a) providing or obtaining a sample comprising one or more cells stored in a composition comprising one or more components of: (i) a salt; (ii) an ionic surfactant; (iii) a protease; (iv) a reducing agent; and (v) a chelator. The method may further comprise (b) amplifying genomic materials of the one or more cells; and (c) performing genomic analysis on the genomic materials of the one or more cells. In an aspect provided herein is a method of cell analysis, comprising: (a) providing or obtaining a sample comprising one or more cells stored in a composition comprising: (i) a salt; (ii) an ionic surfactant; (iii) a protease; (iv) a reducing agent; and (v) a chelator. The method may further comprise (b) amplifying genomic materials of the one or more cells; and (c) performing genomic analysis on the genomic materials of the one or more cells. In some embodiments, the one or more cells comprises from 1 to about 10 cells. In some embodiments, the sample has a single cell. In some embodiments, the sample comprises at most about 1000, at most about 800, at most about 600, at most about 400, at most about 200, at most about 100, at most about 80, at most about 60, at most about 40, at most about 20, at most about 10, at most about 8, at most about 6, at most about 4, a smaller number of cells, or a single cell. In some embodiments, the method does not require column filtration and is amenable to retrieval of the genomic materials of the one or more cells from the sample. The sample loss
caused by the methods and compositions may be substantially minimal. In some cases, certain washing protocols, or multiple washing steps may lead to sample loss or cell loss from samples. Samples containing small numbers of cells or single cells may be sensitive to filtration and/or washing steps. In some embodiments, the one or more cells is a single cell, and genomic analysis is single cell genomics or multiomics, wherein the composition is configured for storing the single cell and preparing the genomic contents of the single cell for amplification. In some embodiments, the method further comprises sorting the cells before storing the sample in the composition. In some embodiments, the method further comprises neutralizing the composition with a second composition. In some embodiments, the second composition comprises a neutralizing buffer capable of neutralizing the ionic surfactant. The composition may be referred to as storage buffer and the second composition may be referred to as neutralizing buffer. In some cases, non-ionic surfactant may comprise a polysorbate, a poly(ethylene glycol) derivative, or both. In some embodiments, the neutralizing buffer comprises Tween (e.g., Tween-20, Tween-40, Tween-60, or Tween-80) or Triton (e.g., Triton-X-100). In some embodiments, the ionic surfactant comprises from about 0.00001 to about 1 volume percent of the composition. In some embodiments, the ionic surfactant comprises from about 0.03 to about 0.09 volume percent of the composition. In some embodiments, the protease comprises from about 0.01 units to about 5 units per milliliter of the composition. In some embodiments, the ionic surfactant comprises or is sodium dodecyl sulfate (SDS), sodium lauryl sulfate (SLS), sodium laureth sulfate (SLES), ammonium lauryl sulfate (ALS), ammonium laureth sulfate (ALES), sodium stearate, potassium cocoate, or any combination thereof. In some embodiments, the protease comprises or is Proteinase K. 
In some cases, the protease is thermolabile. A thermolabile protease will allow for neutralizing or halting the activity of the protease (e.g., proteinase K) when intended. For example, this can be performed before whole genome amplification or sequencing (e.g., RNA sequencing). In some embodiments, the salt comprises or is Tris-HCl. In some embodiments, the chelator comprises or is ethylenediaminetetraacetic acid (EDTA). In some embodiments, the composition further comprises a second salt different from the salt, wherein the second salt is capable of stimulating the activity of the protease, the ionic surfactant, or both. In some embodiments, the second salt comprises or is a divalent cation. In some embodiments, the second salt comprises or is calcium chloride. In some embodiments, the method further comprises lysing the one or more cells and extracting the genomic materials thereof. Genomic materials may comprise the whole genomes of the one or more cells. In some embodiments, genomic materials comprise nucleic acid molecules. In some embodiments, nucleic acid molecules comprise ribonucleic acid (RNA), deoxyribonucleic acid (DNA), or both. The method may further comprise performing genomic analysis on the one or more cells.
Genomic analysis may comprise or be single cell genomics or multiomics. Genomic analysis may comprise amplification. Amplification may comprise or be isothermal amplification. In some embodiments, genomic analysis comprises primary template amplification (PTA). In some embodiments, the methods, systems, and compositions for cell storage are compatible with single cell samples or samples with small cell numbers, long term storage and shipment, fluctuation or changes in the storage conditions such as temperature, pressure, humidity, and other conditions, amplification (e.g., isothermal amplification) of genomic materials of the cells upon or after storage, and any combination thereof, such as integrated workflows comprising any combination or all of the aforementioned procedures. In some embodiments, the protease comprises Proteinase K, Proteocut K, or Nargarse, optionally wherein the protease is thermolabile. In some embodiments, the salt comprises Tris-HCl, HEPES, or TES. In some embodiments, the composition further comprises a salt selected from CaCl2, MgCl2, KCl, MgSO4, K2SO4, or NaHCO3. In some embodiments, the reducing agent comprises dithiothreitol (DTT), tris(2-carboxyethyl)phosphine (TCEP), or β-mercaptoethanol. In some embodiments, the chelator comprises ethylenediaminetetraacetic acid (EDTA) or ethylene glycol tetraacetic acid (EGTA). In some embodiments, one or more of the following is true: the salt has a concentration of 10-100 mM; the ionic surfactant has a concentration volume percent of 0.001 to 0.1; the protease has a concentration of about 0.01 units to about 5 units per milliliter; the reducing agent has a concentration of 0.1-10 mM; and the chelator has a concentration of 0.1 to 10 mM. In some embodiments, the ionic surfactant has a concentration volume percent of 0.01 to 0.5.
[0007] Provided herein are methods of cell analysis, comprising: providing a single cell stored in a buffer for a time period of at least 1 day; lysing the single cell; amplifying mRNA transcripts and genomic DNA from the cell to generate cDNA and genomic DNA libraries, respectively; and sequencing mRNA transcripts and genomic DNA from the cell to obtain one or more sequencing metrics, wherein one or more of: the yield of pre-amplification of cDNA or genomic DNA comprise values within 10x of values obtained when compared to storage conditions having a time period of less than 1 day; the average fragment size of pre-amplification of cDNA or genomic DNA comprise values within 10x of values obtained when compared to storage conditions having a time period of less than 1 day; and the sequencing metrics comprise values within 10x of values obtained when compared to storage conditions having a time period of less than 1 day. In some instances, sequencing metrics comprises one or more Picard Metrics.
[0008] In some instances, sequencing metrics comprises one or more of PreSeq, protein coding transcripts, proportion exonic sequences, ratio transcript body, sequencing coverage, fold-80 base penalty, dropouts, percent chimera, and Gini index. In some instances, the single cell is an
embryonic cell. In some instances, the single cell is a human embryonic cell. In some instances, providing comprises shipping or storing. In some instances, the time period is at least 5 days. In some instances, the time period is at least 10 days. In some instances, the time period is 3-15 days. In some instances, the single cell was stored at a storage temperature. In some instances, the storage temperature is from about -20 to about 30 degrees Celsius (°C), optionally with a variation of from about 0 to about 20%. In some instances, the storage temperature changes or fluctuates. In some instances, one or more of the following is true: the yield of pre-amplification of cDNA or genomic DNA comprise values within 5x of values obtained when compared to storage conditions having a time period of less than 1 day; the average fragment size of pre-amplification of cDNA or genomic DNA comprise values within 5x of values obtained when compared to storage conditions having a time period of less than 1 day; and the sequencing metrics comprise values within 5x of values obtained when compared to storage conditions having a time period of less than 1 day. In some instances, one or more of the following is true: the yield of pre-amplification of cDNA or genomic DNA comprise values within 2x of values obtained when compared to storage conditions having a time period of less than 1 day; the average fragment size of pre-amplification of cDNA or genomic DNA comprise values within 2x of values obtained when compared to storage conditions having a time period of less than 1 day; and the sequencing metrics comprise values within 2x of values obtained when compared to storage conditions having a time period of less than 1 day. In some instances, the single cell is suspended in a composition provided herein.
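By way of non-limiting illustration, the fold comparisons recited above (values for a stored sample within 2x, 5x, or 10x of values for a sample stored less than 1 day) can be evaluated as in the following minimal sketch. The function names and the numeric yields are hypothetical, and reading "within Nx" as symmetric (neither more than N times higher nor more than N times lower) is an assumption; the disclosure does not state how the comparison is computed.

```python
def within_fold(stored, fresh, fold):
    """Return True if `stored` is within `fold`-times of `fresh`, in either direction."""
    if stored <= 0 or fresh <= 0:
        raise ValueError("both values must be positive")
    if fold < 1:
        raise ValueError("fold must be at least 1")
    ratio = stored / fresh
    return 1.0 / fold <= ratio <= fold


def tightest_fold(stored, fresh, folds=(2, 5, 10)):
    """Return the smallest listed fold that the pair satisfies, or None if none does."""
    for fold in sorted(folds):
        if within_fold(stored, fresh, fold):
            return fold
    return None


# Hypothetical pre-amplification yields in ng, not data from the disclosure.
print(tightest_fold(stored=40.0, fresh=60.0))  # 2
print(tightest_fold(stored=10.0, fresh=60.0))  # 10
print(tightest_fold(stored=1.0, fresh=60.0))   # None
```

The same check applies unchanged to average fragment size or to any single sequencing metric, provided the metric is positive.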
[0009] Provided herein are methods of amplifying a fragmented or degraded nucleic acid sample comprising: (a) providing or obtaining a sample comprising fragmented or degraded nucleic acids; (b) suspending the fragmented or degraded nucleic acids in a composition described herein; (c) amplifying genomic materials of the one or more cells, wherein a non-ionic surfactant is maintained during amplification; and (d) performing genomic analysis on the genomic materials of the one or more cells. Further provided herein are methods wherein the nucleic acid sample comprises ctDNA. Further provided herein are methods wherein the nucleic acid sample comprises cfDNA. Further provided herein are methods wherein the composition is present at 10X concentration. Further provided herein are methods wherein a total yield of amplicons is within 5, 10, 15, 20, 25, 30, 40, 50, or within 60 % compared to amplification without the composition. Further provided herein are methods wherein the non-ionic surfactant comprises Tween or Triton.
[0010] Provided herein are compositions as described in Tables 1-8 or FIG. 5A.
INCORPORATION BY REFERENCE
[0011] All publications, patents, and patent applications mentioned in this specification are herein incorporated by reference to the same extent as if each individual publication, patent, or patent application was specifically and individually indicated to be incorporated by reference.
BRIEF DESCRIPTION OF THE DRAWINGS
[0012] The novel features of the invention are set forth with particularity in the appended claims. A better understanding of the features and advantages of the present invention will be obtained by reference to the following detailed description that sets forth illustrative embodiments, in which the principles of the invention are utilized, and the accompanying drawings of which:
[0013] FIG. 1 presents amplification multicomponent plots for samples stored in storage buffer compositions (the composition) of the present disclosure with varying concentrations of an ionic surfactant; the x-axis is labeled cycle from 0 to 24 at 2 unit intervals, and the y-axis is labeled fluorescence from -10,000 to 50,000 at 5000 unit intervals;
[0014] FIG. 2 presents amplification multicomponent plots for samples neutralized with different compositions of neutralizing buffer (the second composition); the x-axis is labeled cycle from 0 to 24 at 2 unit intervals, and the y-axis is labeled fluorescence from -10,000 to 50,000 at 5000 unit intervals;
[0015] FIG. 3 presents broad first pass experiment pre-amplification quality control data in the form of electrophoresis gels for samples stored in different buffer compositions and neutralized with different neutralizing buffers;
[0016] FIG. 4A presents real-time amplification multicomponent plots for samples stored in buffer compositions of the present disclosure with varying concentrations of an ionic surfactant and different compositions of a neutralizing buffer on which reverse transcription or preamplification has been performed; the x-axis is labeled cycle from 0 to 24 at 2 unit intervals, and the y-axis is labeled fluorescence from 0 to 600,000 at 50,000 unit intervals;
[0017] FIG. 4B presents real-time amplification multicomponent plots for samples stored in buffer compositions of the present disclosure with varying concentrations of an ionic surfactant and different compositions of a neutralizing buffer which have been processed using V2 DNA Amplification Kit from Illumina; the x-axis is labeled cycle from 0 to 70 at 10 unit intervals, and the y-axis is labeled fluorescence from 0 to 700,000 at 50,000 unit intervals;
[0018] FIG. 5A presents reverse transcription pre-amplification of RNA in samples stored in different storage buffer compositions and reverse transcribed with different reverse transcription buffers; the bar graph shows results (left to right for each of day 1, 2, and 3): SB4 5xRwa, SB4 5xRxa, SB5 5xRwa, and SB5 5xRxa; the x-axis is labeled storage condition (days 1, 2, 3) and the y-axis is labeled ng/microliter from 0 to 80 at 20 unit intervals;
[0019] FIG. 5B presents data on primary template amplification (PTA) of DNA in samples stored in different storage buffer compositions and amplified using different transcription buffers; the bar graph shows results (left to right for each of day 1, 2, and 3): SB4 5xRwa, SB4 5xRxa, SB 5 5xRwa, and SB 5 5xRxa; each grouping of 3 bars on the left of each graph represent results for buffer SB4 5xRwa, and each grouping of 3 bars on the right of each graph represent results for buffer SB5 5xRwa; the x-axis is labeled storage condition (days 1, 2, 3) and the y-axis is labeled ng/microliter from 0 to 25 at 5 unit intervals;
[0020] FIG. 6 presents multiomics (DNA and mRNA) metrics quantified for samples stored in different buffer compositions for predetermined durations of time; upper left: proportion of exonic sequences (y-axis labeled from 0.0 to 1.0 at 0.2 unit intervals); upper right: Gini coefficient (y-axis labeled from 0.00 to 0.05 at 0.01 unit intervals); lower left: protein coding transcripts (y-axis labeled protein coding transcripts from 0 to 15,000 at 5,000 unit intervals); lower right: PreSeq counts (y-axis labeled counts from 0 to 6×10^9 at 2×10^9 unit intervals);
[0021] FIG. 7 presents RNA sequencing results for samples stored in different storage buffers for varying durations of time and amplified with different transcription buffers; the proportion of genomic origins mapped to exonic, intronic, or intergenic regions are shown on the right graph;
[0022] FIG. 8 presents pre-amplification yields obtained from samples stored under different conditions using the methods and compositions of the present disclosure; the x-axis represents different samples, and the y-axis is pre-amp yield (ng) from 0 to 140 at 20 unit intervals;
[0023] FIG. 9 presents transcriptomics metrics quantified for samples stored in different buffer compositions for varying durations of time and amplified with different reverse transcription buffers; buffer 5xRa is shown in the left bar and 5xRwa is the right bar for each condition; upper left: y-axis is labeled protein coding transcripts from 0 to 10,000 at 5,000 unit intervals; upper middle: y-axis is labeled proportion exonic sequences from 0.0 to 1.0 at 0.2 unit intervals; upper right: y-axis is labeled ratio transcript body from 0.0 to 1.5 at 0.5 unit intervals; lower left: y- axis is labeled protein coding transcripts from 0 to 10,000 at 5,000 unit intervals; lower middle: y-axis is labeled proportion exonic sequences from 0.0 to 1.0 at 0.2 unit intervals; lower right: y- axis is labeled ratio transcript body from 0.0 to 1.5 at 0.5 unit intervals; data in the top row of graphs was obtained by operator 1 and data in the bottom row of graphs was obtained by operator 2;
[0024] FIG. 10 presents reverse transcription pre-amplification sequencing data performed on samples stored in different buffer conditions for varying durations of time and amplified with
different reverse transcription buffers; the proportion of genomic origins mapped to exonic, intronic, or intergenic regions are shown on the right graph;
[0025] FIG. 11 presents a transcript coverage rainbow plot resulting from genome-wide amplification and RNA sequencing using a conventional cell buffer (“CB”) and a buffer presented herein (“SB4”) demonstrating more uniform transcript coverage upon using the buffer of the present disclosure (SB4); the x-axis is labeled 5’-3’ body percentile from 0 to 100 at 20 unit intervals, and the y-axis is labeled coverage from 0.0 to 1.0 at 0.2 unit intervals;
[0026] FIG. 12 presents a heat map of results of expression analysis for different storage conditions; the legend shows expression levels from -1 (purple) to 1 (yellow);
[0027] FIG. 13A presents PTA DNA yield (ng) for experiments with different storage conditions at day 3 (left bar) and 11 days (right bar); the x-axis represents different samples, and the y-axis represents yield (ng) from 0 to 1000 at 200 unit intervals;
[0028] FIG. 13B presents PreSeq counts for experiments with different storage conditions at day 3 (left bar) and 11 days (right bar); the x-axis represents different samples, and the y-axis represents counts from 0 to 6×10^9 at 2×10^9 unit intervals;
[0029] FIG. 13C presents preamp cDNA yields for experiments with different storage conditions at day 3 (left bar) and 11 days (right bar); the x-axis represents different samples, and the y-axis represents yield (ng) from 0 to 500 at 100 unit intervals;
[0030] FIG. 13D presents protein coding transcripts detected for experiments with different storage conditions at day 3 (left bar) and 11 days (right bar); the x-axis represents different samples, and the y-axis is labeled yield (ng) from 0 to 8000 at 1000 unit intervals;
[0031] FIG. 13E presents the proportion of exonic sequences detected for experiments with different storage conditions at day 3 (left bar) and 11 days (right bar); the x-axis represents different samples, and the y-axis is proportion from 0.0 to 1.0 at 0.2 unit intervals; and
[0032] FIG. 13F presents the ratio of transcript bodies detected for experiments with different storage conditions at day 3 (left bar) and 11 days (right bar); the x-axis represents different samples, and the y-axis is labeled yield (ng) from 0.0 to 1.5 at 0.5 unit intervals.
[0033] FIG. 14A depicts a graph showing changes in DNA total yield vs. various analyte, buffer, and lot combinations. The x-axis represents different samples and the y-axis represents DNA total yield (ng) from 0 to 2500 at 500 unit intervals (the dotted line represents 300 ng). Each series represents a different storage condition (square: -20°C; circle: 6-10°C; triangle: room temperature, 1 day; inverted triangle: room temperature, 1 day).
[0034] FIG. 14B depicts a graph showing changes in cDNA total yield vs. various analyte, buffer, and lot combinations. The x-axis represents different samples and the y-axis represents cDNA total yield (ng) from 0 to 1500 at 500 unit intervals (the dotted line represents 150 ng).
Each series represents a different storage condition (square: -20°C; circle: 6-10°C; triangle: room temperature, 1 day; inverted triangle: room temperature, 1 day).
[0035] FIG. 14C depicts a graph showing changes in PreSeq count vs. various analyte, buffer, and lot combinations. The x-axis represents different samples and the y-axis represents counts from 0 to 6×10^9 at 2×10^9 unit intervals (the dotted line represents 3.5×10^9). Each series represents a different storage condition (square: -20°C; circle: 6-10°C; triangle: room temperature, 1 day; inverted triangle: room temperature, 1 day).
[0036] FIG. 14D depicts a graph showing changes in the proportion of exonic reads vs. various analyte, buffer, and lot combinations. The x-axis represents different samples and the y-axis represents counts from 0.0 to 1.5 at 0.5 unit intervals (the dotted line represents 0.6). Each series represents a different storage condition (square: -20°C; circle: 6-10°C; triangle: room temperature, 1 day; inverted triangle: room temperature, 1 day).
[0037] FIG. 14E depicts a graph showing changes in Protein Coding Genes vs. various analyte, buffer, and lot combinations. The x-axis represents different samples and the y-axis represents the number of protein coding genes from 0 to 8000 at 2000 unit intervals (the dotted line represents 2000 genes). Each series represents a different storage condition (square: -20°C; circle: 6-10°C; triangle: room temperature, 1 day; inverted triangle: room temperature, 1 day).
[0038] FIG. 15 depicts a series of graphs showing recovery of genes in B-cell line or clinical samples for conditions using no buffer (cell line, NA), PBS, or SB (stability buffer). The genes tested (left to right, top to bottom) were: GAPDH, ACTB, ACTG1, RPL36, HINT1, TBP, PPIA, HPRT1, UBC, RPL13A, PGK1, EIF3K, RPLP0, GUSB, CLTC, HMBS, EEF1A1, and MALAT1.
[0039] FIG. 16A depicts a heatmap of the top five mRNA expression levels for clinical samples stored in standard PBS buffer vs. SB. Expression levels are shown from -1 (purple) to 0 (black) to 1 (yellow). Genes listed on the left side (top to bottom) are ATP2B1, RABL6, LGMN, CREG1, PTMS, MTND2P28, MT-ND4L, MT-ATP8, MTCO1P12, MTND1P23, DNMT3L, FTH1, MT-ATP8, RPL26, and H4C3. Rows 1-6 are labeled (top to bottom): sample media, percent.rb, percent.mt, nCount_RNA, nFeature_RNA, and run_tag2.
[0040] FIG. 16B depicts a heatmap of the top ten mRNA expression levels for clinical samples stored in standard PBS buffer vs. SB. Expression levels are shown from -1 (purple) to 0 (black) to 1 (yellow). Genes listed on the left side (top to bottom) are ATP2B1, WAC, RABL6, LGMN, CREG1, EMC2, PTMS, PRMT2, GTF3C3, ZNF888, MT-ND1, MTND2P28, MT-CYB, MT-ND4L, MT-ND6, MT-ATP8, MT-ND5, MTCO1P12, MTDATP6P1, MTND1P23, RPS5, RPS27, DNMT3L, MTATP6P1, DNAJC15, FTH1, MT-ATP8, RPL26, H4C3, and NDUFAB1. Rows 1-6 are labeled (top to bottom): sample media, percent.rb, percent.mt, nCount_RNA, nFeature_RNA, and run_tag2.
[0041] FIG. 17 depicts a graph showing cfDNA amplification yields after PTA for samples stored in no stability buffer vs. stability buffer (SB). The x-axis depicts either the presence (+) or absence (-) of the non-ionic surfactant; the y-axis represents total yield (ng) from 0 to 1500 at 500 unit intervals. Columns 1 and 2 did not use stability buffer; columns 3 and 4 used 10X stability buffer.
DETAILED DESCRIPTION OF THE INVENTION
[0042] There is an unmet need for methods, systems, and compositions for cell storage and analysis. In some cases, cell analysis methods and systems may comprise miniaturized and/or high-throughput cell analysis (e.g., single cell analysis). Such methods may comprise genomics, proteomics, and multiomics analysis which in some cases may be performed on samples comprising a relatively small number of cells, such as less than about 1000 cells, about 800 cells, about 600 cells, about 400 cells, about 200 cells, about 100 cells, about 80 cells, about 60 cells, about 40 cells, about 20 cells, about 10 cells or a smaller number of cells, in some cases single cells. In some cases, cells at a predetermined cell density and sample volume may be sorted prior to storage in the composition of the present disclosure. Sorting may comprise sorting the one or more cells (e.g., any cell number indicated above) in containers.
[0043] In some cases, the samples may comprise one or more cells and/or constituents of one or more cells such as genomic materials of cells. Genomic materials of cells may comprise cellular nucleic acid molecules, such as deoxyribonucleic acid (DNA) molecules, ribonucleic acid (RNA) molecules, or both. In some instances, the composition of the present disclosure may facilitate maintaining the stability of the genomic materials of cells during storage in the composition. Storage may comprise long-term storage and/or shipment. Alternatively or in addition, the composition may comprise or maintain cells that are fixed. For example, in some cases, the one or more cells may be fixed (e.g., chemically fixed using an alcohol, an aldehyde, another chemical, or any combination thereof). The one or more cells may be fixed to a substrate or a surface. In some cases, the composition may further lyse the cells, extract, or preserve the genomic materials of the cells during storage. In some cases, storage may comprise storage for at least 1 day, 2 days, 3 days, 4 days, 5 days, 6 days, 7 days, 8 days, 9 days, 10 days, or longer. In some cases, storage may comprise storage at a temperature of at least about -20 °C, -10 °C, 0 °C, 5 °C, 10 °C, 20 °C, 25 °C, or higher. In some cases, temperature might change or fluctuate. Storage methods and compositions that reduce the temperature-sensitivity of the samples or make them more stable under a wide range of temperature and storage conditions can be highly beneficial, and can reduce the risk of compromising the samples or adversely affecting the results of cell analysis performed on the samples. In some cases, storage may comprise storage at a temperature of -40 to -20, -30 to -20, -30 to -10, -30 to 0, -20 to 0, -15 to 0, -10 to 0, 0 to 5, 0 to 10, 0 to 20, 5 to 10, 5 to 15, 5 to 20, or 10 to 30 degrees C.
[0044] Samples containing such small numbers of cells or single cells may need to be handled with care and may need or benefit from cell handling, cell manipulation, or storage methods and compositions that can minimize sample loss and cell loss from samples during sample preparation, storage, potential shipment, and potential temperature variation, while maintaining the stability of the samples and components thereof such as genomic materials of cells. In some cases, such samples may comprise one or more cells (e.g., a small number of cells or single cells). The cells may be sorted in the samples prior to storage in the composition. Cell sorting may be done using any suitable cell sorting technique such as flow cytometry, fluorescence-activated cell sorting (FACS), or sorting using a miniaturized or microfluidic system/device. In some cases, the composition may lyse the cells, extract, and preserve the genomic materials of the cells, after the cells are sorted in the composition.
[0045] In some aspects, provided herein are methods, systems, and compositions that enable handling, storing, or optionally shipping samples comprising one or more cells (e.g., single cells) or constituents of one or more cells. Such methods, systems, or compositions can result in significantly improved stability of the samples and components thereof, such that once the sample is used for analysis upon or after storage and/or shipment, the results are found to be substantially uncompromised by storage and/or shipment. In some aspects, provided herein are reagents and buffers for cell storage, and methods of use thereof such as to prepare, store, or preserve samples comprising a small number of cells or single cells. The samples may further be used for genomics, proteomics, or multiomics using a variety of techniques. Genomic analysis may comprise amplification, which may, in some cases, comprise isothermal amplification, or in some cases, comprise primary template amplification (PTA). Any combination of analyses may be performed on the samples stored and optionally shipped using the cell storage methods and compositions of the present disclosure.
[0046] In some cases, there is a need to keep nucleic acid molecules stable in the sample for long-term storage and potential shipment. Presence of enzymes that can degrade nucleic acid molecules such as DNA and RNA (e.g., DNAse and/or RNAse enzymes) may interfere with the stability of the samples. The buffers and compositions of the present disclosure may facilitate stabilizing nucleic acid molecules, in some cases by inhibiting the nucleic acid molecule degrading enzymes such as DNAse or RNAse. In some cases, one function of the composition
of the present disclosure may be RNAse inhibition, DNAse inhibition, or both. The composition and its contents are elaborated on in detail throughout the present disclosure.
[0047] The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of any embodiment. As used herein, the singular forms “a,” “an” and “the” are intended to include the plural forms as well, unless the context clearly indicates otherwise. It will be further understood that the terms “comprises” and/or “comprising,” when used in this specification, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof. As used herein, the term “and/or” includes any and all combinations of one or more of the associated listed items.
[0048] Unless specifically stated or obvious from context, as used herein, the term “about” in reference to a number or range of numbers is understood to mean the stated number and numbers +/- 10% thereof, or 10% below the lower listed limit and 10% above the higher listed limit for the values listed for a range.
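By way of illustration only, the interpretation of "about" given above can be sketched as a small calculation. The function names `about` and `about_range` are hypothetical and are not part of the disclosure; the sketch simply applies the +/- 10% rule to a single value or to the limits of a range.

```python
def about(value, tolerance=0.10):
    """Interval covered by 'about value': the value +/- 10% of itself."""
    delta = abs(value) * tolerance
    return (value - delta, value + delta)


def about_range(lower, upper, tolerance=0.10):
    """Interval covered by 'about lower to about upper':
    10% below the lower limit and 10% above the upper limit."""
    return (lower - abs(lower) * tolerance, upper + abs(upper) * tolerance)


if __name__ == "__main__":
    # "about 200" micrograms per milliliter
    print(about(200))
    # "from about 0.01% to about 0.09%" by volume
    print(about_range(0.01, 0.09))
```

The absolute value is used so that a negative limit (for example, a temperature of -20 °C) is still widened outward rather than narrowed; this handling of negative numbers is an assumption of the sketch.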
[0049] The term “nucleic acid” encompasses multi-stranded, as well as single-stranded molecules. In double- or triple-stranded nucleic acids, the nucleic acid strands need not be coextensive (i.e., a double-stranded nucleic acid need not be double-stranded along the entire length of both strands). Nucleic acid templates described herein may be any size depending on the sample (from small cell-free DNA fragments to entire genomes), including but not limited to 50-300 bases, 100-2000 bases, 100-750 bases, 170-500 bases, 100-5000 bases, 50-10,000 bases, or 50-2000 bases in length. In some instances, templates are at least 50, 100, 200, 500, 1000, 2000, 5000, 10,000, 20,000, 50,000, 100,000, 200,000, 500,000, 1,000,000 or more than 1,000,000 bases in length. Methods described herein provide for the amplification of nucleic acids, such as nucleic acid templates. Methods described herein additionally provide for the generation of isolated and at least partially purified nucleic acids and libraries of nucleic acids. In some instances, methods described herein provide for extracted nucleic acids (e.g., extracted from tissues, cells, or media). Nucleic acids include but are not limited to those comprising DNA, RNA, circular RNA, mtDNA (mitochondrial DNA), cfDNA (cell free DNA), cfRNA (cell free RNA), siRNA (small interfering RNA), cffDNA (cell free fetal DNA), circular DNA, extrachromosomal DNAs (ecDNAs), mRNA, tRNA, rRNA, miRNA (microRNA), synthetic polynucleotides, polynucleotide analogues, any other nucleic acid consistent with the specification, or any combinations thereof. The length of polynucleotides, when provided, is described as the number of bases and abbreviated, such as nt (nucleotides), bp (base pairs), kb (kilobases), or Gb (gigabases).
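As a non-limiting illustration of the length abbreviations above, the following sketch converts a length expressed in nt, bp, kb, or Gb to a number of bases. The names `UNIT_TO_BASES` and `to_bases` are hypothetical, and counting one base pair as one base position is an assumption made for simplicity.

```python
# Multipliers for the abbreviations named in the text.
# Assumption: one bp is counted as one base position.
UNIT_TO_BASES = {
    "nt": 1,
    "bp": 1,
    "kb": 1_000,
    "Gb": 1_000_000_000,
}


def to_bases(length, unit):
    """Convert a length in the given unit to a whole number of bases."""
    if unit not in UNIT_TO_BASES:
        raise ValueError(f"unknown unit: {unit!r}")
    return int(round(length * UNIT_TO_BASES[unit]))


if __name__ == "__main__":
    print(to_bases(170, "bp"))   # a short cell-free DNA fragment
    print(to_bases(2, "kb"))
    print(to_bases(3.1, "Gb"))   # genome-scale template
```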
[0050] In some aspects, provided herein is a composition, such as a storage buffer. In some cases, one or more buffers, such as one buffer, two buffers, three buffers, or more may be provided or used. In some cases, one or more compositions or one or more buffers may be provided as part of a kit and be accompanied by instructions for use. Provided herein are also methods of cell handling, storage, and analysis using the compositions of the present disclosure.
[0051] In an aspect, provided herein is a composition (e.g., a storage buffer) comprising: (a) a salt; (b) an ionic surfactant; (c) a protease; (d) a reducing agent; and (e) a chelator. In some cases, the ionic surfactant in the composition may be at a concentration from about 0.00001 to about 1 volume percent. In some cases, the ionic surfactant in the composition may be at a concentration of at least about 0.00001%, at least about 0.0001%, at least about 0.001%, at least about 0.01%, at least about 0.02%, at least about 0.03%, at least about 0.04%, at least about 0.05%, at least about 0.06%, at least about 0.07%, at least about 0.08%, at least about 0.09%, at least about 0.1%, at least about 0.2%, at least about 0.3%, at least about 0.4%, at least about 0.5%, at least about 1% by volume or greater. In some cases, the concentration of the ionic surfactant in the composition is at most about 1%, at most about 0.5%, at most about 0.4%, at most about 0.3%, at most about 0.2%, at most about 0.1%, at most about 0.09%, at most about 0.08%, at most about 0.07%, at most about 0.06%, at most about 0.05%, at most about 0.04%, at most about 0.03%, at most about 0.02%, at most about 0.01% by volume, or less. In some examples, the concentration of the ionic surfactant in the composition may be from about 0.01% to about 0.09% by volume. In some examples, the concentration of the ionic surfactant in the composition may be from about 0.02% to about 0.08% by volume. 
In some examples, the concentration of the ionic surfactant in the composition may be from about 0.005% to about 0.02% by volume. In some examples, the concentration of the ionic surfactant in the composition may be from about 0.008% to about 0.015% by volume. In some examples, the concentration of the ionic surfactant in the composition may be about 0.01% by volume. In some examples, the ionic surfactant comprises or is sodium dodecyl sulfate (SDS), sodium lauryl sulfate (SLS), sodium laureth sulfate (SLES), ammonium lauryl sulfate (ALS), ammonium laureth sulfate (ALES), sodium stearate, potassium cocoate, or any combination thereof. In some instances, the amount of ionic surfactant is represented as a range of any of the values above.
[0052] In some examples, the protease may be at a concentration from about 0.01 units to about 5 units per milliliter. In some examples, the protease may be at a concentration of at least about 0.01 units per milliliter, at least about 0.02 units per milliliter, at least about 0.03 units per milliliter, at least about 0.04 units per milliliter, at least about 0.05 units per milliliter, at least about 0.06 units per milliliter, at least about 0.07 units per milliliter, at least about 0.08 units per milliliter, at least about 0.09 units per milliliter, at least about 0.1 units per milliliter, at least about 0.5 units per milliliter, at least about 1 unit per milliliter, at least about 2 units per milliliter, at least about 3 units per milliliter, at least about 4 units per milliliter or more. In some examples, the protease may be Proteinase K. The protease (e.g., proteinase K) may be thermolabile.
[0053] In some examples, the protease may be at a concentration of at most about 5 units per milliliter, at most about 4 units per milliliter, at most about 3 units per milliliter, at most about 2 units per milliliter, at most about 1 unit per milliliter, at most about 0.5 units per milliliter, at most about 0.1 units per milliliter, at most about 0.09 units per milliliter, at most about 0.08 units per milliliter, at most about 0.07 units per milliliter, at most about 0.06 units per milliliter, at most about 0.05 units per milliliter, at most about 0.04 units per milliliter, at most about 0.03 units per milliliter, at most about 0.02 units per milliliter or less. In some examples, the protease may be Proteinase K. The protease (e.g., proteinase K) may be thermolabile.
[0054] In some examples, the concentration of the protease may be from about 50 micrograms per milliliter (µg/mL) to about 1000 µg/mL. In some examples, the concentration of the protease may be at least about 50 µg/mL, at least about 100 µg/mL, at least about 200 µg/mL, at least about 300 µg/mL, at least about 400 µg/mL, at least about 500 µg/mL, at least about 600 µg/mL, at least about 700 µg/mL, at least about 800 µg/mL or more. In some examples, the concentration of the protease may be at most about 800 µg/mL, at most about 700 µg/mL, at most about 600 µg/mL, at most about 500 µg/mL, at most about 400 µg/mL, at most about 300 µg/mL, at most about 200 µg/mL, or less. In some examples, the concentration of the protease may be about 200 micrograms per milliliter (µg/mL). In some cases, the protease may be thermolabile. In some examples, the protease may be Proteinase K. In some cases, the Proteinase K may be thermolabile.
[0055] In some examples, the ionic surfactant comprises or is sodium dodecyl sulfate (SDS), sodium lauryl sulfate (SLS), sodium laureth sulfate (SLES), ammonium lauryl sulfate (ALS), ammonium laureth sulfate (ALES), sodium stearate, potassium cocoate, or any combination thereof. In some examples, the protease comprises or is Proteinase K. In some examples, the salt comprises or is Tris-HCL. In some examples, the reducing agent comprises or is
Dithiothreitol (DTT). In some examples, the chelator comprises or is Ethylenediaminetetraacetic acid (EDTA).
[0056] In an example, the composition (e.g., storage buffer) comprises sodium dodecyl sulfate (SDS) at a concentration from about 0.005% to about 0.06% by volume, Proteinase K at a concentration from about 0.01 units to about 5 units per milliliter or from about 50 micrograms per milliliter (µg/mL) to about 1000 µg/mL, Tris-HCL, Dithiothreitol (DTT), and EDTA. In an example, the concentration of Proteinase K may be about 200 micrograms per milliliter (µg/mL). The concentrations of Tris-HCL, Dithiothreitol (DTT), and EDTA may be any suitable amount. In an example, the concentration of Tris-HCL may be about 50 millimolar (mM). In an example, the concentration of EDTA may be about 1 millimolar (mM). In some examples, the pH of the buffer may be from about 7 to about 9. In an example, the pH of the buffer may be approximately 8.5. In an example, the pH of the buffer may be approximately 8. In an example, the pH of the buffer may be approximately 7.5.
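For illustration only, the example concentrations above can be turned into pipetting volumes with the standard dilution relation C1·V1 = C2·V2. In the sketch below, the target values (0.01% SDS, 200 µg/mL Proteinase K, 50 mM Tris-HCL, 1 mM EDTA) are taken from the examples in this disclosure, whereas every stock concentration and all names (`TARGETS`, `STOCKS`, `stock_volume_ul`, `recipe`) are hypothetical assumptions. DTT is omitted because no example concentration is given for it in the storage buffer.

```python
# Target concentrations drawn from the examples in the text.
TARGETS = {
    "SDS_percent": 0.01,
    "proteinase_K_ug_per_mL": 200.0,
    "tris_hcl_mM": 50.0,
    "edta_mM": 1.0,
}

# Stock concentrations: assumptions for illustration only (same units as TARGETS).
STOCKS = {
    "SDS_percent": 10.0,
    "proteinase_K_ug_per_mL": 20_000.0,
    "tris_hcl_mM": 1_000.0,
    "edta_mM": 500.0,
}


def stock_volume_ul(target_conc, stock_conc, final_volume_ul):
    """Volume of stock needed, from C1*V1 = C2*V2."""
    if stock_conc <= 0 or target_conc > stock_conc:
        raise ValueError("stock must be positive and at least as concentrated as the target")
    return target_conc / stock_conc * final_volume_ul


def recipe(final_volume_ul):
    """Per-component stock volumes plus the water needed to reach the final volume."""
    volumes = {
        name: stock_volume_ul(TARGETS[name], STOCKS[name], final_volume_ul)
        for name in TARGETS
    }
    volumes["water"] = final_volume_ul - sum(volumes.values())
    return volumes


if __name__ == "__main__":
    for component, volume in recipe(1000.0).items():
        print(f"{component}: {volume:.1f} uL")
```

The sketch does not adjust pH; the pH range of about 7 to about 9 described above would need to be set and verified separately.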
[0057] In some examples, the composition is used for performing at least one of lysing the cell, storing the cell and/or genomic materials of the cell, and preparing the sample for downstream amplification of the genomic materials of the cell. In some cases, amplification may comprise or be isothermal amplification. In some cases, amplification may comprise primary template amplification (PTA). In some examples, the composition (e.g., the storage buffer) is used for lysing a cell, extracting the genomic materials of the cell, storing the cell or genomic materials of the cell, and preparing the sample for amplification of the genomic materials of the cell. In some examples, a sample comprising one or more cells or a single cell is prepared and stored in the composition. In some cases, optionally, the sample may be shipped from one place to another during storage.
[0058] In some examples, storing comprises storing for at least about 1 hour, 4 hours, 8 hours, 12 hours, 24 hours, at least about 48 hours, at least about 72 hours or longer. In some cases, storage may comprise storage for at least 1 day, 2 days, 3 days, 4 days, 5 days, 6 days, 7 days, 8 days, 9 days, 10 days, or longer. In some examples, the cells are stored for about 1-5, 1-10, 1-12, 1-18, 1-24, 3-8, 3-12, 6-12, 6-24, 12-24, 12-36, 12-48, 12-72, 24-36, 24-48, 24-72, or 48-72 hours. In some examples, the cells are stored for about 1-3, 1-7, 1-10, 2-5, 2-8, 3-5, 3-10, 5-7, 5-10, or 5-14 days. In some examples, the cells are stored for about one day or about seven days as provided for example in FIGs. 14A-14E. In some cases, storage may comprise storage at a temperature of at least about -20 °C, -10 °C, 0 °C, 5 °C, 10 °C, 20 °C, 25 °C, or higher. In some examples, the cells are stored at about -20 °C to 0 °C, -20 °C to 10 °C, 0 °C to 10 °C, or 0 °C to 25 °C. In some examples, the cells are stored at about -20 °C, about 6-10 °C, or at room temperature, as provided for example in FIGs. 14A-14E. In some cases, storage comprises long-term storage. In some cases, the sample may be shipped during storage. In some examples, the temperature changes or fluctuates during storage. In some examples, the sample and/or components thereof remain substantially stable during storage despite shipment or changes or fluctuations in storage conditions such as temperature, humidity, and pressure.
[0059] In some examples, the composition (e.g., storage buffer) is configured for storing a sample comprising one or more cells or constituents of one or more cells. In some examples, the cells are stored using a stability buffer, as provided herein. In some examples, the stability buffer stabilizes nucleic acids across varying cell amounts, temperatures, and time, as shown for example in FIGs. 14A-14E. In some examples, the samples (e.g., cells or genomic material therein) in the stability buffer demonstrate improved recovery of genes compared to samples not stored in the stability buffer (e.g., FIG. 15). In some examples, the samples (e.g., cells or genomic material therein) in the stability buffer demonstrate enhanced recovery of mRNA compared to samples not stored in the stability buffer (e.g., FIGs. 16A-B). In some examples, the one or more cells comprise at most about 20, at most about 15, or at most about 10 cells. In some examples, the one or more cells is a single cell.
[0060] In some cases, a temperature variation of at least about 1%, at least about 5%, at least about 10%, at least about 20%, at least about 30%, at least about 40%, at least about 50%, at least about 60% or greater may occur during storage and/or shipment. Such temperature changes or variation may in some cases be non-ideal or unintentional. The sample stored in the composition of the present disclosure may be substantially insensitive and/or resilient to such change(s) in temperature. For example, the properties of the sample, the stability of the cell constituents such as genomic materials of the cells, or other sample properties may not be compromised as a result of such temperature variation during storage. For example, cellular nucleic acids such as DNA and RNA may remain stable, such that upon genomics, transcriptomics, or multiomics analysis, accurate results can be obtained. For example, the accuracy of genomics, transcriptomics, and/or multiomics data are not affected by storage. As such, the composition provided herein may be more optimal for cell storage or shipment compared to pre-existing formulations. Such compositions can significantly reduce waste and inefficiencies in the experimental workflows performed on the samples of the present disclosure, such as isothermal amplification. The compositions may also increase the duration for which the sample can be stored, the distances the sample can be shipped, and the range of conditions that the sample can sustain during storage and shipment. For example, a stability buffer as described herein can stabilize nucleic acids across varying cell amount, temperatures, or time. The composition may lyse the cell and extract the genomic contents of the cell. In some instances, the composition may further preserve the genomic contents of the cells during storage. 
During storage (e.g., long term storage as described anywhere herein), the stability of the genomic materials of the cells may be preserved by the composition. For example, the chromatin, DNA, RNA, or any combination thereof of the one or more cells may remain stable during storage and/or potential shipment.
[0061] In some examples, the composition (e.g., storage buffer) further comprises a second salt different from the salt. In some examples, the second salt stimulates the activity of the protease.
In some examples, the second salt may comprise or be a divalent cation such as calcium chloride (CaCl2), magnesium chloride (MnCl2), or cobalt chloride (CoCl2). The composition may comprise any suitable amount of the second salt, in some examples, at least about 0.1 mM, at least about 0.2 mM, at least about 0.3 mM, at least about 0.4 mM, at least about 0.5 mM, at least about 0.6 mM, at least about 0.7 mM or more.
[0062] In an aspect, provided herein is a kit comprising the composition of any one of the preceding embodiments. In some examples, the kit comprises a second composition (e.g., a neutralizing buffer) different from the composition (e.g., the storage buffer). In some examples, the second composition comprises a neutralizing buffer comprising a component that substantially neutralizes the ionic surfactant. In some examples, the kit further comprises instructions for storing a cell. In some examples, the kit further comprises instructions for preparing the genomic contents of the cell for downstream amplification using the composition and the second composition. In some examples, the composition is used to store the cell and the second composition is used to neutralize the composition. In some examples, neutralizing the composition improves the results of a downstream amplification performed on the cell. In some examples, the second composition comprises a non-ionic surfactant, a zwitterionic surfactant, a charge-neutral surfactant, or any combination thereof. In some cases, the non-ionic surfactant may comprise a polysorbate, a poly(ethylene glycol) derivative, or both. In some examples, the non-ionic surfactant comprises Tween (e.g., Tween-20, Tween-40, Tween-60, Tween-80) or Triton (e.g., Triton-X-100). One function of the non-ionic surfactant may be neutralizing the ionic surfactant. The composition may be referred to as storage buffer and the second composition may be referred to as neutralizing buffer. In some embodiments, the non-ionic surfactant is maintained in one or more downstream reactions. For example, a presence of a non-ionic surfactant can be maintained when amplifying cfDNA. In some examples, the cfDNA is fragmented or degraded. 
In some examples, the cfDNA is amplified in the presence of a stability buffer and the non-ionic surfactant, for example, as shown in FIG. 17.
[0063] In some embodiments, the kit further comprises additional buffers. For example, the kit may further comprise reagents for performing amplification, such as isothermal amplification or PTA. The kit may comprise reagents for performing genomics and/or multiomics analysis. The kit may comprise ResolveOME™ reagents from BioSkryb Genomics. The kit may comprise reverse transcription (RT) buffers. A reverse transcription buffer may comprise a salt (e.g., Tris-HCl), a divalent cation, DTT, Tween, Triton, dATP, dCTP, dTTP, dGTP, ammonium sulfate, or any combination thereof. In an example, the reverse transcription buffer may comprise a salt such as Tris-HCl at a concentration of from about 30 mM to about 80 mM, for example 60 mM. The RT buffer may comprise the divalent cation at a concentration from about 8 mM to about 20 mM, in an example 12 mM. The RT buffer may comprise DTT at a concentration from about 10 mM to about 50 mM, in an example 20 mM. The RT buffer may comprise a non-ionic surfactant, such as a polysorbate, such as Tween (e.g., Tween-20) at a concentration from about 1% to about 10%, in an example, about 5%. The RT buffer may further comprise Triton at a concentration from about 0% to about 5%, in an example 0.5%. The RT buffer may comprise dATP, dCTP, dTTP, dGTP, GTP, or any combination thereof, each at a concentration of from about 1 mM to about 10 mM, or from about 3 mM to about 6 mM, in an example, from about 4 mM to about 5 mM. The RT buffer may further comprise ammonium sulfate at a concentration from about 10 mM to about 100 mM, at least about 20 mM, at least about 30 mM, at least about 40 mM, at least about 50 mM, at least about 60 mM, or higher, in an example, 75 mM. Examples of RT buffer recipes are provided in FIG. 5A.
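The concentration ranges recited above for the RT buffer can be captured in a small lookup table for checking a candidate recipe. The following Python sketch is illustrative only: the dictionary keys, the function name, and the treatment of each "about" bound as a hard limit are assumptions made for the example, and the recipe shown simply collects the example values stated in the paragraph above rather than reproducing FIG. 5A.

```python
# Illustrative sketch: ranges transcribed from the paragraph above.
# Key names and the hard-limit treatment of "about" are assumptions.
RT_BUFFER_RANGES = {
    "tris_hcl_mM": (30.0, 80.0),
    "divalent_cation_mM": (8.0, 20.0),
    "dtt_mM": (10.0, 50.0),
    "tween20_pct": (1.0, 10.0),
    "triton_pct": (0.0, 5.0),
    "ammonium_sulfate_mM": (10.0, 100.0),
}

# Example values stated in the text (60 mM, 12 mM, 20 mM, 5%, 0.5%, 75 mM).
EXAMPLE_RECIPE = {
    "tris_hcl_mM": 60.0,
    "divalent_cation_mM": 12.0,
    "dtt_mM": 20.0,
    "tween20_pct": 5.0,
    "triton_pct": 0.5,
    "ammonium_sulfate_mM": 75.0,
}


def out_of_range(recipe, ranges=RT_BUFFER_RANGES):
    """Return {component: value} for each component missing or outside its range."""
    problems = {}
    for name, (low, high) in ranges.items():
        value = recipe.get(name)
        if value is None or not (low <= value <= high):
            problems[name] = value
    return problems
```

An empty result from `out_of_range(EXAMPLE_RECIPE)` indicates that every listed component falls inside its recited range.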
[0064] In an aspect provided herein is a method of cell analysis, comprising: (a) providing or obtaining a sample comprising one or more cells stored in a composition comprising: (i) a salt; (ii) an ionic surfactant; (iii) a protease; (iv) a reducing agent; and (v) a chelator. The method may further comprise (b) amplifying genomic materials of the one or more cells; and (c) performing genomic analysis on the genomic materials of the one or more cells. The composition may be any composition described anywhere throughout the present disclosure. Genomic analysis may comprise performing isothermal amplification.
[0065] In some examples, the one or more cells comprises from 1 to about 10 cells. In some examples, the sample has a single cell. In some examples, the sample comprises at most about 1000, at most about 800, at most about 600, at most about 400, at most about 200, at most about 100, at most about 80, at most about 60, at most about 40, at most about 20, at most about 10, at most about 8, at most about 6, at most about 4, a smaller number of cells, or a single cell. In some examples, the method does not require column filtration and is amenable to retrieval of the genomic materials of the one or more cells from the sample. The sample loss caused by the methods and compositions may be minimal. In some cases, certain washing protocols or multiple washing steps may lead to sample loss or cell loss from samples. Samples containing small numbers of cells or single cells may be sensitive to filtration and/or washing steps. The compositions of the present disclosure may make it possible to prepare and store the sample without column filtration. This may reduce or prevent sample loss or cell loss from the sample. This may improve the results of genomic analysis performed on the sample. This may enable sample storage for longer periods, enable retrieval of genomic contents from the sample with minimal to no compromise thereto, facilitate safe shipment of the sample, and reduce the sensitivity of the sample to storage conditions such as temperature, humidity, and pressure. The compositions may work particularly well for samples containing small numbers of cells or single cells as described elsewhere herein, filling an existing gap in prior technologies.
[0066] In some examples, the one or more cells is a single cell, and genomic analysis is single cell genomics or multiomics, wherein the composition is configured for storing the single cell and preparing the genomic contents of the single cell for downstream amplification. Amplification may be isothermal amplification. In some examples, the method further comprises sorting the cells prior to storage. In some examples, the method further comprises neutralizing the composition with a second composition. In some examples, the second composition comprises a neutralizing buffer capable of neutralizing the ionic surfactant. In some cases, the non-ionic surfactant may comprise a polysorbate, a poly(ethylene glycol) derivative, or both. In some examples, the neutralizing buffer comprises Tween (e.g., Tween-20, Tween-40, Tween-60, or Tween-80) or Triton (e.g., Triton-X-100). The composition may be referred to as storage buffer and the second composition may be referred to as neutralizing buffer. In some embodiments, the non-ionic surfactant is maintained in one or more downstream reactions. For example, a presence of a non-ionic surfactant can be maintained when amplifying cfDNA. In some examples, the cfDNA is fragmented or degraded. In some examples, the cfDNA is amplified in the presence of a stability buffer and the non-ionic surfactant, for example, as shown in FIG. 17.
[0067] In some examples, the ionic surfactant comprises from about 0.00001 to about 1 volume percent of the composition. In some examples, the ionic surfactant comprises from about 0.03 to about 0.09 volume percent of the composition. In some examples, the ionic surfactant comprises from about 0.01 to about 0.05 volume percent of the composition. In some examples, the protease comprises from about 0.01 units to about 5 units per milliliter of the composition. In some examples, the ionic surfactant comprises or is sodium dodecyl sulfate (SDS), sodium lauryl sulfate (SLS), sodium laureth sulfate (SLES), ammonium lauryl sulfate (ALS), ammonium laureth sulfate (ALES), sodium stearate, potassium cocoate, or any combination thereof. In some examples, the protease comprises or is Proteinase K. In some examples, the salt comprises or is Tris-HCl, TES, or HEPES. In some examples, the chelator comprises or is ethylenediaminetetraacetic acid (EDTA) or ethylene glycol tetraacetic acid (EGTA). In some examples, the composition further comprises a second salt different from the salt, wherein the second salt is capable of stimulating the activity of the ionic surfactant. In some examples, the second salt comprises or is a divalent cation. In some examples, the second salt comprises or is calcium chloride.
[0068] In some examples, the method further comprises lysing the one or more cells and extracting the genomic materials thereof. In some cases, cells may be lysed after storage. Genomic materials may comprise the whole genomes of the one or more cells. In some examples, genomic materials comprise nucleic acid molecules. In some examples, nucleic acid molecules comprise ribonucleic acid (RNA), deoxyribonucleic acid (DNA), or both. The method may further comprise performing genomic analysis on the one or more cells. Genomic analysis may comprise or be single cell genomics or multiomics. In some examples, genomic analysis comprises isothermal amplification, such as primary template amplification (PTA).
[0069] In some examples, the methods, systems, and compositions for cell storage are compatible with single cell samples or samples with small cell numbers, long term storage and shipment, fluctuation or changes in the storage conditions such as temperature, pressure, humidity, and other conditions, downstream amplification of genomic materials of the cells upon or after storage, and any combination thereof, such as integrated workflows comprising any combination or all of the aforementioned procedures. The methods may be performed using any composition, buffer recipe, and kit described anywhere herein.
[0070] Use of the cell storage and neutralization buffers provided herein in some instances provides sample quality and sequencing metrics after storage that are comparable to those of freshly harvested cells. Provided herein are methods of cell analysis, comprising one or more steps of: providing a single cell stored in a buffer for a time period of at least 1 day; lysing the single cell; amplifying mRNA transcripts and genomic DNA from the cell to generate cDNA and genomic DNA libraries, respectively, using a neutralization buffer provided herein; and sequencing mRNA transcripts and genomic DNA from the cell to obtain one or more sequencing metrics, wherein one or more of: the yield of pre-amplification of cDNA or genomic DNA comprises values within 10x of values obtained when compared to storage conditions having a time period of less than 1 day; the average fragment size of pre-amplification of cDNA or genomic DNA comprises values within 10x of values obtained when compared to storage conditions having a time period of less than 1 day; and the sequencing metrics comprise values within 10x of values obtained when compared to storage conditions having a time period of less than 1 day. In some instances, the storage temperature changes or fluctuates. In some instances, sequencing metrics comprise one or more Picard Metrics. In some instances, sequencing metrics comprise one or more of PreSeq, protein coding transcripts, proportion exonic sequences, ratio transcript body, sequencing coverage, fold-80 base penalty, dropouts, percent chimera, and Gini index.
In some instances, one or more of the following is true: the yield of pre-amplification of cDNA or genomic DNA comprises values within 5x of values obtained when compared to storage conditions having a time period of less than 1 day; the average fragment size of pre-amplification of cDNA or genomic DNA comprises values within 5x of values obtained when compared to storage conditions having a time period of less than 1 day; and the sequencing metrics comprise values within 5x of values obtained when compared to storage conditions having a time period of less than 1 day. In some instances, one or more of the following is true: the yield of pre-amplification of cDNA or genomic DNA comprises values within 2x of values obtained when compared to storage conditions having a time period of less than 1 day; the average fragment size of pre-amplification of cDNA or genomic DNA comprises values within 2x of values obtained when compared to storage conditions having a time period of less than 1 day; and the sequencing metrics comprise values within 2x of values obtained when compared to storage conditions having a time period of less than 1 day. In some instances, the single cell is suspended in a composition provided herein. In some instances, one or more of the following is true: the yield of pre-amplification of cDNA or genomic DNA comprises values about the same as values obtained when compared to storage conditions having a time period of less than 1 day; the average fragment size of pre-amplification of cDNA or genomic DNA comprises values about the same as values obtained when compared to storage conditions having a time period of less than 1 day; and the sequencing metrics comprise values about the same as values obtained when compared to storage conditions having a time period of less than 1 day. In some instances, the time period is at least 10 days. In some instances, the time period is 1-30, 1-20, 1-15, 1-10, 3-20, 3-15, 3-12, 3-10, 3-8, 3-5, 5-20, 5-15, 5-10, 8-30, 8-20, 8-15, 8-12, or 10-20 days. In some instances, the single cell is an embryonic cell. In some instances, the single cell is a human embryonic cell. In some instances, providing comprises shipping or storing. In some instances, the time period is at least 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, or at least 20 days. In some instances, the single cell was stored at a storage temperature. In some instances, the storage temperature is from about -20 to about 30 degrees Celsius (°C), optionally with a variation of from about 0 to about 20%.
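The "within 10x", "within 5x", and "within 2x" comparisons recited in paragraph [0070] can be illustrated with a short Python sketch. The disclosure does not define how "within Nx" is computed; the sketch assumes it means that the ratio of the stored-sample value to the fresh-sample value lies between 1/N and N. The function names and the metric names in the usage note are hypothetical.

```python
def within_fold(stored, fresh, fold):
    """True when stored/fresh lies between 1/fold and fold.

    Assumed reading of "within Nx"; both values must be positive.
    """
    if stored <= 0 or fresh <= 0 or fold < 1:
        raise ValueError("values must be positive and fold must be >= 1")
    ratio = stored / fresh
    return 1.0 / fold <= ratio <= fold


def compare_metrics(stored, fresh, fold):
    """Apply within_fold to every metric present in both dictionaries."""
    return {
        name: within_fold(stored[name], fresh[name], fold)
        for name in stored
        if name in fresh
    }
```

For example, with hypothetical values `stored = {"yield_ng": 30.0, "fragment_bp": 900.0}` and `fresh = {"yield_ng": 100.0, "fragment_bp": 1000.0}`, both metrics pass at a fold of 5, while the yield fails at a fold of 2.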
[0071] Methods provided herein in some instances result in improved downstream processing outcomes. In some instances, improvements are obtained relative to a standard cell buffer, such as PBS. In some instances, use of storage buffers provided herein results in an increase of 5, 10, 15, 20, 25, 30, 50, or at least 75% in total yield of DNA obtained from a PTA reaction, relative to use without the storage buffer. In some instances, use of storage buffers provided herein results in an increase of 5, 10, 15, 20, 25, 30, 50, or at least 75% in total yield of cDNA obtained from a multiomics PTA workflow, relative to use without the storage buffer. In some instances, use of storage buffers provided herein results in an increase of 5, 10, 15, 20, 25, 30, 50, or at least 75% in the number of detected protein coding genes obtained from a multiomics PTA workflow, relative to use without the storage buffer. In some instances, use of storage buffers provided herein results in an increase of 5, 10, 15, 20, 25, 30, 50, or at least 75% in the proportion of exonic genes from a multiomics PTA workflow, relative to use without the storage buffer.
[0072] Provided herein are methods of amplifying a fragmented or degraded nucleic acid sample comprising one or more steps of: (a) providing or obtaining a sample comprising fragmented or degraded nucleic acids; (b) suspending the fragmented or degraded nucleic acids in a composition described herein; (c) amplifying genomic materials of the one or more cells, wherein a non-ionic surfactant is maintained during amplification; and (d) performing genomic analysis on the genomic materials of the one or more cells. Further provided herein are methods wherein the nucleic acid sample comprises ctDNA. Further provided herein are methods wherein the nucleic acid sample comprises cfDNA. Further provided herein are methods wherein the composition is present at 10X concentration. Further provided herein are methods wherein a total yield of amplicons is within 5, 10, 15, 20, 25, 30, 40, 50, or within 60 % compared to amplification without the composition.
Multiomics
[0073] Provided herein are methods for multiomics sample preparation and/or analysis. In some instances, a method comprises one or more steps of isolating a single cell from a population of cells, wherein the single cell comprises RNA and genomic DNA; amplifying the RNA by RT-PCR to generate a cDNA library; isolating the cDNA from the genomic DNA; contacting the genomic DNA with at least one amplification primer, at least one nucleic acid polymerase, and a mixture of nucleotides; isolating the cDNA from a genomic library; and sequencing the cDNA library and the genomic DNA library. In some instances, the mixture of nucleotides comprises at least one nucleotide configured for digestion (or removal, or reaction) by an enzyme or chemical process. In some instances, the mixture of nucleotides comprises dUTP. In some instances, the mixture of nucleotides comprises at least one terminator nucleotide which terminates nucleic acid replication by the polymerase to generate a genomic DNA library. In some instances, a terminator nucleotide comprises an irreversible terminator. In some instances, an irreversible terminator inhibits or is resistant to 3’ to 5’ exonuclease activity. In some instances, a multiomic experiment comprises measuring expression levels of a panel of mRNAs.
[0074] Methods described herein (e.g., PTA) may be used as a replacement for any number of other methods known in the art for single cell sequencing (multiomics or the like). PTA may substitute for genomic DNA sequencing methods such as MDA, PicoPlex, DOP-PCR, MALBAC, or target-specific amplifications. In some instances, PTA replaces the standard genomic DNA sequencing method in a multiomics method including DR-seq (Dey et al., 2015), G&T seq (MacAulay et al., 2015), scMT-seq (Hu et al., 2016), sc-GEM (Cheow et al., 2016), scTrio-seq (Hou et al., 2016), simultaneous multiplexed measurement of RNA and proteins (Darmanis et al., 2016), scCOOL-seq (Guo et al., 2017), CITE-seq (Stoeckius et al., 2017), REAP-seq (Peterson et al., 2017), scNMT-seq (Clark et al., 2018), or SIDR-seq (Han et al., 2018). In some instances, a method described herein comprises PTA and a method of analyzing polyadenylated mRNA transcripts. In some instances, a method described herein comprises PTA and a method of analyzing non-polyadenylated mRNA transcripts. In some instances, a method described herein comprises PTA and a method of analyzing total (polyadenylated and non-polyadenylated) mRNA transcripts.
[0075] In some instances, PTA is combined with a standard RNA sequencing method to obtain genome and transcriptome data. In some instances, a multiomics method described herein comprises PTA and one of the following: Drop-seq (Macosko, et al. 2015), mRNA-seq (Tang et al., 2009), InDrop (Klein et al., 2015), MARS-seq (Jaitin et al., 2014), Smart-seq2 (Hashimshony, et al., 2012; Fish et al., 2016), CEL-seq (Jaitin et al., 2014), STRT-seq (Islam, et al., 2011), Quartz-seq (Sasagawa et al., 2013), CEL-seq2 (Hashimshony, et al. 2016), cytoSeq (Fan et al., 2015), SuPeR-seq (Fan et al., 2011), RamDA-seq (Hayashi, et al. 2018), MATQ-seq (Sheng et al., 2017), or SMARTer (Verboom et al., 2019).
[0076] Various reaction conditions and mixes may be used for generating cDNA libraries for transcriptome analysis. In some instances, an RT reaction mix is used to generate a cDNA library. In some instances, the RT reaction mixture comprises a crowding reagent, at least one primer, a template switching oligonucleotide (TSO), a reverse transcriptase, and a dNTP mix. In some instances, an RT reaction mix comprises an RNAse inhibitor. In some instances, an RT reaction mix comprises one or more surfactants. In some instances, an RT reaction mix comprises Tween-20 and/or Triton-X. In some instances, an RT reaction mix comprises Betaine. In some instances, an RT reaction mix comprises one or more salts. In some instances, an RT reaction mix comprises a magnesium salt (e.g., magnesium chloride) and/or tetramethylammonium chloride. In some instances, an RT reaction mix comprises gelatin. In some instances, an RT reaction mix comprises PEG (PEG1000, PEG2000, PEG4000, PEG6000, PEG8000, or PEG of other length).
[0077] Multiomic methods described herein may provide both genomic and RNA transcript information from a single cell (e.g., a combined or dual protocol). In some instances, genomic information from the single cell is obtained from the PTA method, and RNA transcript information is obtained from reverse transcription to generate a cDNA library. In some instances, a whole transcript method is used to obtain the cDNA library. In some instances, 3’ or 5’ end counting is used to obtain the cDNA library. In some instances, cDNA libraries are not obtained using UMIs. In some instances, a multiomic method provides RNA transcript information from the single cell for at least 500, 1000, 2000, 5000, 8000, 10,000, 12,000, or at least 15,000 genes. In some instances, a multiomic method provides RNA transcript information from the single cell for about 500, 1000, 2000, 5000, 8000, 10,000, 12,000, or about 15,000 genes. In some instances, a multiomic method provides RNA transcript information from the single cell for 100-12,000, 1000-10,000, 2000-15,000, 5000-15,000, 10,000-20,000, 8000-15,000, or 10,000-15,000 genes.
[0078] In some examples, samples (e.g., cells) that are stored or analyzed according to the present disclosure express one or more genes. The one or more genes can comprise housekeeping genes that demonstrate a signature of a clinical sample. In some examples, the one or more genes can comprise GAPDH, ACTB, ACTG1, RPL36, HINT1, TBP, PPIA, HPRT1, UBC, RPL13A, PGK1, EIF3K, RPLP0, GUSB, CLTC, HMBS, EEF1A1, MALAT1, or any combination thereof (e.g., FIG. 15). In some examples, the one or more genes can comprise ATP2B1, RABL6, LGMN, CREG1, PTMS, MTND2P28, MT-ND4L, MT-ATP8, MTCO1P12, MTND1P23, DNMT3L, FTH1, MT-ATP8, RPL26, H4C3, or any combination thereof (e.g., FIG. 16A). In some examples, the one or more genes can comprise ATP2B1, WAC, RABL6, LGMN, CREG1, EMC2, PTMS, PRMT2, GTF3C3, ZNF888, MT-ND1, MTND2P28, MT-CYB, MT-ND4L, MT-ND6, MT-ATP8, MT-ND5, MTCO1P12, MTDATP6P1, MTND1P23, RPS5, RPS27, DNMT3L, MTATP6P1, DNAJC15, FTH1, MT-ATP8, RPL26, H4C3, NDUFAB1, or any combination thereof (e.g., FIG. 16B). In some instances, recovery of the one or more genes is improved if the sample is stored or prepared using a stability buffer of the present disclosure.
[0079] In some instances, a multiomic method provides genomic sequence information for at least 80%, 90%, 92%, 95%, 97%, 98%, or at least 99% of the genome of the single cell. In some instances, a multiomic method provides genomic sequence information for about 80%, 90%, 92%, 95%, 97%, 98%, or about 99% of the genome of the single cell. RNA may be amplified in the multiomics methods described herein. In some instances, RNA is amplified to isolate mRNA transcripts. In some instances, template-switching polynucleotides are used. In some instances, amplification of RNA uses labeled primers. In some instances, a label comprises biotin. In some instances, at least some of the cDNA polynucleotides are isolated with affinity binding to the label. In some instances, multiomics methods comprise amplification of RNA to generate a cDNA library. In some instances, a cDNA library is generated having at least 10, 20, 30, 50, 75, 100, 125, 150, 175, 200, 225, 250, 300, 350, 400, 500, 600, 700, 800, 900 or at least 1,000 ng of DNA. In some instances, a cDNA library is generated having 10-500, 20-500, 30-500, 50-500, 50-400, 50-300, 100-500, 100-400, 100-300, 100-200, 200-500, 300-500, 400-750, 500-1,000, 600-1,200, 800-1,500, or 1,000-1,500 ng of DNA. In some instances, at least some polynucleotides in the cDNA library comprise a barcode. In some instances, the cDNA comprises polynucleotides corresponding to at least 100, 500, 1000, 1500, 2000, 2500, 3000, 3500, or at least 4000 genes. In some instances, the cDNA comprises a 5’ to 3’ transcript bias of 0.5-1.5, 0.6-1.5, 0.7-1.5, 0.8-1.5, 0.9-1.5, 0.8-1.5, 1-1.5, 1-2.0, 1.2-2.0, or 0.5-2.0.
[0080] Multiomic methods may comprise analysis of single cells from a population of cells. In some instances, at least 5, 10, 20, 50, 100, 200, 500, 1000, 2000, 5000, or at least 8000 cells are analyzed. In some instances, about 5, 10, 20, 50, 100, 200, 500, 1000, 2000, 5000, or about 8000 cells are analyzed. In some instances, 5-100, 10-100, 50-500, 100-500, 100-1000, 50-5000, 100-5000, 500-1000, 500-10000, 1000-10000, or 5000-20,000 cells are analyzed.
[0081] Multiomic methods may generate yields of genomic DNA from the PTA reaction based on the type of single cell. In some instances, the amount of DNA generated from a single cell is about 0.1, 1, 1.5, 2, 3, 5, or about 10 micrograms. In some instances, the amount of DNA generated from a single cell is about 0.1, 1, 1.5, 2, 3, 5, or about 10 femtograms. In some instances, the amount of DNA generated from a single cell is at least 0.1, 1, 1.5, 2, 3, 5, or at least 10 micrograms. In some instances, the amount of DNA generated from a single cell is at least 0.1, 1, 1.5, 2, 3, 5, or at least 10 femtograms. In some instances, the amount of DNA generated from a single cell is about 0.1-10, 1-10, 1.5-10, 2-20, 2-50, 1-3, or 0.5-3.5 micrograms. In some instances, the amount of DNA generated from a single cell is about 0.1-10, 1-10, 1.5-10, 2-20, 2-4, 1-3, or 0.5-4 femtograms. In some instances, the amount of DNA generated from a single cell is about 0.5-2.5, 0.5-3, 0.5-5, 0.2-5, 1-2.5, or 1-5 ng of DNA. In some instances, the amount of DNA generated from a single cell is at least 0.25, 0.5, 0.75, 1, 1.25, 1.5, 1.75, 2, 2.25, 2.5, 2.75, 3, 3.25, 3.5, 4, or at least 5 ng of DNA.
[0082] DNA libraries may comprise an allelic balance. In some instances, the allelic balance is 50-100, 60-100, 70-100, 80-100, 60-95, 70-95, 80-95, 85-95, 90-95, 90-98, 90-99, 85-99, or 95-99 percent. In some instances, the allelic balance is at least 50, 60, 70, 80, 83, 85, 87, 90, 92, 95, 98, or at least 99 percent.
[0083] DNA libraries may comprise a sensitivity for one or more SNVs. In some instances, the sensitivity is 0.50-1, 0.60-1, 0.70-1, 0.80-1, 0.60-0.95, 0.70-0.95, 0.80-0.95, 0.85-0.95, 0.90-0.95, 0.90-0.98, 0.90-0.99, 0.85-0.99, or 0.95-0.99. In some instances, the sensitivity is at least 0.50, 0.60, 0.70, 0.80, 0.83, 0.85, 0.87, 0.90, 0.92, 0.95, 0.98, or at least 0.99.
[0084] DNA libraries may comprise a precision for one or more SNVs. In some instances, the precision is 0.50-1, 0.60-1, 0.70-1, 0.80-1, 0.60-0.95, 0.70-0.95, 0.80-0.95, 0.85-0.95, 0.90-0.95, 0.90-0.98, 0.90-0.99, 0.85-0.99, or 0.95-0.99. In some instances, the precision is at least 0.50, 0.60, 0.70, 0.80, 0.83, 0.85, 0.87, 0.90, 0.92, 0.95, 0.98, or at least 0.99.
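The sensitivity and precision values recited in paragraphs [0083] and [0084] are not given a formula in the text. The following Python sketch assumes the conventional definitions, sensitivity = TP / (TP + FN) and precision = TP / (TP + FP), computed by comparing a set of called SNVs against a set of known SNVs. The function name and the tuple layout (chromosome, position, reference base, alternate base) are assumptions for illustration.

```python
def snv_sensitivity_precision(called, truth):
    """Return (sensitivity, precision) for called SNVs against a truth set.

    Each SNV is any hashable record, e.g. (chrom, pos, ref, alt).
    Conventional definitions are assumed; the text does not specify them.
    """
    called, truth = set(called), set(truth)
    true_pos = len(called & truth)    # called and real
    false_pos = len(called - truth)   # called but not real
    false_neg = len(truth - called)   # real but missed
    sensitivity = true_pos / (true_pos + false_neg) if truth else float("nan")
    precision = true_pos / (true_pos + false_pos) if called else float("nan")
    return sensitivity, precision
```

With four known SNVs, of which three are called along with one spurious call, both values are 0.75.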
Methylome analysis
[0085] Described herein are methods comprising PTA, wherein sites of methylated DNA in single cells are determined using the PTA method. In some instances, methylome analysis comprises identifying the location of methylated bases (e.g., methylC, hydroxymethylC). In some instances, these methods further comprise parallel analysis of the transcriptome, methylome, and/or proteome of the same cell. Methods of detecting methylated genomic bases include selective restriction with methylation-sensitive restriction enzymes or endonucleases, followed by processing with the PTA method. Sites cut by such enzymes are determined from sequencing, and methylated bases are identified. In another instance, bisulfite treatment of genomic DNA libraries converts unmethylated cytosines to uracil. Libraries are then in some instances amplified with methylation-specific primers which selectively anneal to methylated sequences. Alternatively, non-methylation-specific PCR is conducted, followed by one or more methods to discriminate between bisulfite-reacted bases, including direct pyrosequencing, MS-SnuPE, HRM, COBRA, MS-SSCA, or base-specific cleavage/MALDI-TOF. In some instances, genomic DNA samples are split for parallel analysis of the genome (or an enriched portion thereof) and methylome analysis. In some instances, analysis of the genome and methylome comprises enrichment of genomic fragments (e.g., exome, or other targets) or whole genome sequencing. In some instances, methylated bases in a genomic sample are identified by (a) conversion of a methylated base to a different base, or (b) conversion of a non-methylated base to a different base. Such conversions in some instances are performed on whole genomes or genomic fragments. The resulting sequences are then compared to a reference sequence (obtained without conversion/treatment) to identify which bases are methylated. In some instances, a conversion method (or process) comprises treatment with a deamination reagent.
In some instances, a conversion method comprises treatment with bisulfite. In some instances, one or more enzymes are used to selectively discriminate between methylated and unmethylated bases. In some instances, enzymes comprise TET (ten eleven translocation) family enzymes. In some instances, a TET family enzyme comprises TET2. In some instances, enzymes comprise T4-BGT. In some instances, a conversion method comprises treatment with a reagent to protect methylcytosines (e.g., TET2 for oxidation), followed by treatment with an enzyme to deaminate unprotected cytosines (e.g., APOBEC). Additional reagents which differentiate methylated and non-methylated bases are also consistent with the methods disclosed herein. In some instances, unmethylated cytosines are converted to uracil. In some instances, amplification of these uracil-containing modified genomes results in conversion of uracil to thymine. In some instances, amplification comprises use of uracil tolerant polymerases described herein. In some instances, adapters described herein are modified to replace cytosines with methylcytosines or another base which resists conversion. In some instances, the methods may comprise single cell bisulfite sequencing and reduced representation bisulfite sequencing.
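The conversion-and-compare logic described above (unmethylated cytosines read as thymine after conversion and amplification, while methylated cytosines remain cytosine, followed by comparison to an untreated reference) can be sketched in Python as follows. This is a simplified single-strand illustration that assumes complete conversion and no sequencing errors; the function names and the toy sequence are hypothetical.

```python
def simulate_conversion(sequence, methylated_positions):
    """Simulate conversion plus amplification on one strand.

    Unmethylated C is read as T; methylated C (0-based positions given)
    is protected and still read as C. Complete conversion is assumed.
    """
    converted = []
    for index, base in enumerate(sequence):
        if base == "C" and index not in methylated_positions:
            converted.append("T")
        else:
            converted.append(base)
    return "".join(converted)


def call_methylated(reference, converted):
    """Return 0-based positions where a reference C is still read as C."""
    return [
        index
        for index, (ref_base, conv_base) in enumerate(zip(reference, converted))
        if ref_base == "C" and conv_base == "C"
    ]
```

For the toy reference `ACGTCCGA` with a methylated cytosine at position 1, the converted read is `ACGTTTGA`, and comparison against the reference recovers position 1 as the only methylated site.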
Bioinformatics
[0086] The data obtained from single-cell analysis methods such as whole genome amplification, in some cases isothermal amplification, and in some cases PTA described herein may be compiled into a database. Described herein are methods and systems of bioinformatic data integration. Data from the proteome, genome, transcriptome, methylome or other data is in some instances combined/integrated into a database and analyzed. Bioinformatic data integration methods and systems in some instances comprise one or more of protein detection (FACS and/or NGS), mRNA detection, and/or genome variance detection. In some instances, this data is correlated with a disease state or condition. In some instances, data from a plurality of single cells is compiled to describe properties of a larger cell population, such as cells from a specific sample, region, organism, or tissue. In some instances, protein data is acquired from fluorescently labeled antibodies which selectively bind to proteins on a cell. In some instances, a method of protein detection comprises grouping cells based on fluorescent markers and reporting sample location post-sorting. In some instances, a method of protein detection comprises detecting sample barcodes, detecting protein barcodes, comparing to designed sequences, and grouping cells based on barcode and copy number. In some instances, protein data is acquired from barcoded antibodies which selectively bind to proteins on a cell. In some instances, transcriptome data is acquired from sample and RNA specific barcodes. In some instances, a method of mRNA detection comprises detecting sample and RNA specific barcodes, aligning to genome, aligning to RefSeq/Encode, reporting Exon/Intron/Intergenic sequences, analyzing exon-exon junctions, grouping cells based on barcode and expression variance, and clustering analysis of variance and top variable genes. In some instances, genomic data is acquired from sample and DNA specific barcodes. In some instances, a method of genome variance detection comprises detecting sample and DNA specific barcodes, aligning to the genome, determining genome recovery and SNV mapping rate, filtering reads on exon-exon junctions, generating a variant call file (VCF), and clustering analysis of variance and top variable mutations.
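The barcode-based protein detection steps described above (detecting sample barcodes, detecting protein barcodes, comparing to designed sequences, and grouping by barcode and copy number) can be sketched in Python. The read layout (sample barcode followed directly by protein barcode), the 4-base barcode length, exact-match comparison, and all barcode sequences and marker names in the usage note are assumptions chosen for illustration, not details from the disclosure.

```python
from collections import Counter, defaultdict


def count_protein_barcodes(reads, sample_barcodes, protein_barcodes, sample_len=4):
    """Count protein barcodes per cell.

    reads: strings laid out as <sample barcode><protein barcode> (assumed layout).
    sample_barcodes / protein_barcodes: designed sequence -> label.
    Reads whose barcodes do not exactly match a designed sequence are skipped.
    """
    counts = defaultdict(Counter)
    for read in reads:
        sample_seq, protein_seq = read[:sample_len], read[sample_len:]
        if sample_seq in sample_barcodes and protein_seq in protein_barcodes:
            cell = sample_barcodes[sample_seq]
            counts[cell][protein_barcodes[protein_seq]] += 1
    return {cell: dict(per_protein) for cell, per_protein in counts.items()}
```

For instance, with hypothetical designed barcodes `{"AAAA": "cell_1", "CCCC": "cell_2"}` and `{"GGTT": "CD3", "TTGG": "CD19"}`, reads that carry an undesigned sample or protein barcode are dropped, and the remaining reads are tallied per cell and per protein.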
Mutations
[0087] In some instances, the methods (e.g., multiomic PTA) described herein result in higher detection sensitivity and/or lower rates of false positives for the detection of mutations. In some instances, a mutation is a difference between an analyzed sequence (e.g., using the methods described herein) and a reference sequence. Reference sequences are in some instances obtained from other organisms, other individuals of the same or similar species, populations of organisms, or other areas of the same genome. In some instances, mutations are identified on a plasmid or chromosome. In some instances, a mutation is an SNV (single nucleotide variation), SNP (single nucleotide polymorphism), or CNV (copy number variation, or CNA/copy number aberration). In some instances, a mutation is a base substitution, insertion, or deletion. In some instances, a mutation is a transition, transversion, nonsense mutation, silent mutation, synonymous or non-synonymous mutation, non-pathogenic mutation, missense mutation, or frameshift mutation (deletion or insertion). In some instances, PTA results in higher detection sensitivity and/or lower rates of false positives for the detection of mutations when compared to methods such as in-silico prediction, ChIP-seq, GUIDE-seq, circle-seq, HTGTS (High-Throughput Genome-Wide Translocation Sequencing), IDLV (integration-deficient lentivirus), Digenome-seq, FISH (fluorescence in situ hybridization), or DISCOVER-seq.
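Among the mutation types listed above, transitions and transversions can be distinguished directly from the reference and alternate bases: a transition exchanges a purine for a purine (A and G) or a pyrimidine for a pyrimidine (C and T), and any other single-base substitution is a transversion. A minimal Python sketch follows; the function name is an assumption for illustration.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def classify_substitution(ref, alt):
    """Classify a single-base DNA substitution as a transition or transversion."""
    ref, alt = ref.upper(), alt.upper()
    valid = PURINES | PYRIMIDINES
    if ref not in valid or alt not in valid or ref == alt:
        raise ValueError("ref and alt must be different bases from A, C, G, T")
    # Same chemical class on both sides means a transition.
    same_class = {ref, alt} <= PURINES or {ref, alt} <= PYRIMIDINES
    return "transition" if same_class else "transversion"
```

Under this rule, A to G and C to T are transitions, while A to C and G to T are transversions.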
Primary Template Amplification (PTA) and Genomics Analysis
[0088] The methods and compositions of the present disclosure may be used for genomics screens. For example, cells may be prepared and sorted at predetermined cell densities and sample quantities prior to storage in a composition of the present disclosure (e.g., a storage buffer). The sample may be stored, in some cases for the long term, and optionally shipped from one place to another. The composition may keep the sample, the cells, and genomic contents of the cells stable. A second composition (e.g., a neutralizing buffer) different from the composition (e.g., the storage buffer), may be used to neutralize the composition. For example, a storage buffer may comprise an ionic surfactant which can help reduce DNAse and RNAse activity in the stored sample, and the neutralizing buffer may comprise a non-ionic surfactant which may neutralize the ionic surfactant in the storage buffer. The cells may then be prepared and amplified. Amplification may comprise or be isothermal amplification. In some cases, amplification may be primary template amplification.
[0089] Described herein are nucleic acid amplification methods, such as “Primary Template-Directed Amplification (PTA).” In some instances, PTA is combined with other analysis workflows for genomic and/or multi-omic analysis. With the PTA method, amplicons are preferentially generated from the primary template (“direct copies”) using a polymerase (e.g., a strand displacing polymerase). Consequently, errors are propagated at a lower rate from daughter amplicons during subsequent amplifications compared to multiple displacement amplification (MDA). In some examples, this method can amplify low DNA input including the genomes of single cells with high coverage breadth and uniformity in an accurate and reproducible manner. In some instances, PTA enables kinetic control of an amplification reaction. In some instances, PTA results in a pseudo-linear amplification reaction (rather than exponential amplification). Moreover, the terminated amplification products can undergo direct ligation after removal of the terminators, allowing for the attachment of a cell barcode to the amplification primers so that products from all cells can be pooled after undergoing parallel amplification reactions. In some instances, template nucleic acids are not bound to a bead. In some instances, direct copies of template nucleic acids are not bound to a bead. In some instances, one or more primers are not bound to a bead. In some instances, no primers are bound to a bead. In some instances, a primer is attached to a first bead, and a template nucleic acid is attached to a second bead, wherein the first and the second bead are not the same. In some instances, PTA is used to analyze single cells from a larger population of cells. In some instances, PTA is used to analyze more than one cell from a larger population of cells, or an entire population of cells. The methods and systems of the present disclosure can be used to co-encapsulate a cell with one or more beads delivering the components of the reaction, such as a primer, into a reaction chamber of a device. The device comprises a plurality of reaction chambers. A sub-population or each of the plurality of reaction chambers may encapsulate a single cell, a first bead, and in some cases a second bead, wherein each of the first bead and the second bead deliver a reagent (e.g., a primer, a probe, or another reagent) to the reaction chamber to participate in the reaction with the cell.
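The contrast drawn above between pseudo-linear and exponential amplification can be made concrete with a toy calculation. The Python sketch below is an idealized model and not a description of actual PTA or MDA kinetics: it assumes a single template strand, that in the exponential case every existing strand is copied once per round, and that in the idealized linear case only the primary template is ever copied. The function names are hypothetical.

```python
def direct_copy_fraction_exponential(rounds: int) -> float:
    """Fraction of products copied directly from the primary template when
    every strand (template and copies) is copied once per round.

    After n rounds there are 2**n - 1 products, of which exactly n were
    made from the primary template (one per round)."""
    if rounds < 1:
        raise ValueError("rounds must be at least 1")
    direct = rounds
    total = 2 ** rounds - 1
    return direct / total


def direct_copy_fraction_linear(rounds: int) -> float:
    """Idealized linear case: only the primary template is copied,
    so every product is a direct copy."""
    if rounds < 1:
        raise ValueError("rounds must be at least 1")
    return 1.0
```

Under these assumptions the direct-copy fraction in the exponential case falls off quickly with the number of rounds, which is one way to see why an error made in an early copy is inherited by a growing share of later products, while the linear case keeps every product one step from the template.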
[0090] Described herein are methods employing nucleic acid polymerases with strand displacement activity for amplification. In some instances, such polymerases comprise strand displacement activity and low error rate. In some instances, such polymerases comprise strand displacement activity and proofreading exonuclease activity, such as 3’->5’ proofreading activity. In some instances, nucleic acid polymerases are used in conjunction with other components such as reversible or irreversible terminators, or additional strand displacement factors. In some instances, the polymerase has strand displacement activity, but does not have exonuclease proofreading activity. For example, in some instances such polymerases include bacteriophage phi29 (Φ29) polymerase, which also has very low error rate that is the result of the 3’->5’ proofreading exonuclease activity (see, e.g., U.S. Pat. Nos. 5,198,543 and 5,001,050). In some instances, examples of strand displacing nucleic acid polymerases include, e.g., genetically modified phi29 (Φ29) DNA polymerase, Klenow Fragment of DNA polymerase I (Jacobsen et al., Eur. J. Biochem. 45:623-627 (1974)), phage M2 DNA polymerase (Matsumoto et al., Gene 84:247 (1989)), phage phiPRD1 DNA polymerase (Jung et al., Proc. Natl. Acad. Sci. USA 84:8287 (1987); Zhu and Ito, Biochim. Biophys. Acta. 1219:267-276 (1994)), Bst DNA polymerase (e.g., Bst large fragment DNA polymerase (Exo(-) Bst; Aliotta et al., Genet. Anal. (Netherlands) 12:185-195 (1996))), exo(-)Bca DNA polymerase (Walker and Linn, Clinical Chemistry 42:1604-1608 (1996)), Bsu DNA polymerase, VentR DNA polymerase including VentR(exo-) DNA polymerase (Kong et al., J. Biol. Chem. 268:1965-1975 (1993)),
Deep Vent DNA polymerase including Deep Vent (exo-) DNA polymerase, IsoPol DNA polymerase, DNA polymerase I, Therminator DNA polymerase, T5 DNA polymerase (Chatterjee et al., Gene 97:13-19 (1991)), Sequenase (U.S. Biochemicals), T7 DNA polymerase, T7-Sequenase, T7 gp5 DNA polymerase, PRD1 DNA polymerase, and T4 DNA polymerase (Kaboord and Benkovic, Curr. Biol. 5:149-157 (1995)). Additional strand displacing nucleic acid polymerases are also compatible with the methods described herein. The ability of a given polymerase to carry out strand displacement replication can be determined, for example, by using the polymerase in a strand displacement replication assay (e.g., as disclosed in U.S. Pat. No. 6,977,148). Such assays in some instances are performed at a temperature suitable for optimal activity for the enzyme being used, for example, 32°C for phi29 DNA polymerase, from 46°C to 64°C for exo(-) Bst DNA polymerase, or from about 60°C to 70°C for an enzyme from a hyperthermophilic organism. Another useful assay for selecting a polymerase is the primer-block assay described in Kong et al., J. Biol. Chem. 268:1965-1975 (1993). The assay consists of a primer extension assay using an M13 ssDNA template in the presence or absence of an oligonucleotide that is hybridized upstream of the extending primer to block its progress. Other enzymes capable of displacing the blocking primer in this assay are in some instances useful for the disclosed method. In some instances, polymerases incorporate dNTPs and terminators at approximately equal rates. In some instances, the ratio of rates of incorporation for dNTPs and terminators for a polymerase described herein is about 1:1, about 1.5:1, about 2:1, about 3:1, about 4:1, about 5:1, about 10:1, about 20:1, about 50:1, about 100:1, about 200:1, about 500:1, or about 1000:1.
In some instances, the ratio of rates of incorporation for dNTPs and terminators for a polymerase described herein is 1:1 to 1000:1, 2:1 to 500:1, 5:1 to 100:1, 10:1 to 1000:1, 100:1 to 1000:1, 500:1 to 2000:1, 50:1 to 1500:1, or 25:1 to 1000:1.
[0091] A polynucleotide mixture used herein for PTA may comprise dNTPs. In some instances, dNTPs comprise one or more of dA, dT, dG, and dC. In some instances, the concentration of dNTPs is no more than 10, 8, 7, 5, 4, 3, 2, 1, 0.5, 0.2, 0.1, 0.05, or no more than 0.01 mM. In some instances, the concentration of dNTPs is 0.5-10, 0.5-5, 0.5-3, 0.5-2.5, 0.5-2, 0.5-1.5, 0.5-1, 0.1-5, 0.1-3, 0.1-3, 1-3, 0.5-2.5, or 1-2 mM. Such mixtures in some instances also comprise one or more terminators.
[0092] A polynucleotide mixture used herein for PTA may comprise terminators. In some instances, terminators comprise ddNTPs. In some instances, terminators comprise irreversible terminators. In some instances, irreversible terminators comprise alpha-thio dideoxynucleotides. In some instances, the concentration of terminators is no more than 1, 0.8, 0.7, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05, 0.02, 0.01, 0.005, or no more than 0.001 mM. In some instances, the concentration of terminators is 0.05-1, 0.05-0.5, 0.05-0.3, 0.05-0.25, 0.05-0.2, 0.05-0.15, 0.05-0.1, 0.01-0.5, 0.01-0.3, 0.01-0.3, 0.1-0.3, 0.05-0.25, or 0.1-0.2 mM.
[0093] Described herein are methods of amplification wherein strand displacement can be facilitated through the use of a strand displacement factor, such as, e.g., helicase. Such factors are in some instances used in conjunction with additional amplification components, such as polymerases, terminators, or other components. In some instances, a strand displacement factor is used with a polymerase that does not have strand displacement activity. In some instances, a strand displacement factor is used with a polymerase having strand displacement activity. Without being bound by theory, strand displacement factors may increase the rate that smaller, double stranded amplicons are reprimed. In some instances, any DNA polymerase that can perform strand displacement replication in the presence of a strand displacement factor is suitable for use in the PTA method, even if the DNA polymerase does not perform strand displacement replication in the absence of such a factor. Strand displacement factors useful in strand displacement replication in some instances include (but are not limited to) BMRF1 polymerase accessory subunit (Tsurumi et al., J. Virology 67(12):7648-7653 (1993)), adenovirus DNA-binding protein (Zijderveld and van der Vliet, J. Virology 68(2):1158-1164 (1994)), herpes simplex viral protein ICP8 (Boehmer and Lehman, J. Virology 67(2):711-715 (1993); Skaliter and Lehman, Proc. Natl. Acad. Sci. USA 91(22):10665-10669 (1994)); single-stranded DNA binding proteins (SSB; Rigler and Romano, J. Biol. Chem. 270:8910-8919 (1995)); phage T4 gene 32 protein (Villemain and Giedroc, Biochemistry 35:14395-14404 (1996)); T7 helicase-primase; T7 gp2.5 SSB protein; Tte-UvrD (from Thermoanaerobacter tengcongensis); calf thymus helicase (Siegel et al., J. Biol. Chem. 267:13629-13635 (1992)); bacterial SSB (e.g., E. coli SSB); Replication Protein A (RPA) in eukaryotes; human mitochondrial SSB (mtSSB); and recombinases (e.g., Recombinase A (RecA) family proteins, T4 UvsX, T4 UvsY, Sak4 of Phage HK620, Rad51, Dmc1, or RadB). Combinations of factors that facilitate strand displacement and priming are also consistent with the methods described herein. For example, a helicase is used in conjunction with a polymerase. In some instances, the PTA method comprises use of a single-strand DNA binding protein (SSB, T4 gp32, or other single stranded DNA binding protein), a helicase, and a polymerase (e.g., Sau DNA polymerase, Bsu polymerase, Bst2.0, GspM, GspM2.0, GspSSD, or other suitable polymerase). In some instances, reverse transcriptases are used in conjunction with the strand displacement factors described herein. In some instances, amplification is conducted using a polymerase and a nicking enzyme (e.g., “NEAR”), such as those described in US 9,617,586. In some instances, the nicking enzyme is Nt.BspQI, Nb.BbvCI, Nb.BsmI, Nb.BsrDI, Nb.BtsI, Nt.AlwI, Nt.BbvCI, Nt.BstNBI, Nt.CviPII, Nb.Bpu10I, or Nt.Bpu10I.
[0094] Described herein are amplification methods comprising use of terminator nucleotides, polymerases, and additional factors or conditions. For example, such factors are used in some instances to fragment the nucleic acid template(s) or amplicons during amplification. In some instances, such factors comprise endonucleases. In some instances, factors comprise transposases. In some instances, mechanical shearing is used to fragment nucleic acids during amplification. In some instances, nucleotides are added during amplification that may be fragmented through the addition of additional proteins or conditions. For example, uracil is incorporated into amplicons; treatment with uracil-DNA glycosylase fragments nucleic acids at uracil-containing positions. Additional systems for selective nucleic acid fragmentation are also in some instances employed, for example an engineered DNA glycosylase that cleaves modified cytosine-pyrene base pairs (Kwon, et al. Chem Biol. 2003, 10(4), 351).
[0095] Described herein are amplification methods comprising use of terminator nucleotides, which terminate nucleic acid replication thus decreasing the size of the amplification products. Such terminators are in some instances used in conjunction with polymerases, strand displacement factors, or other amplification components described herein. In some instances, terminator nucleotides reduce or lower the efficiency of nucleic acid replication. Such terminators in some instances reduce extension rates by at least 99.9%, 99%, 98%, 95%, 90%, 85%, 80%, 75%, 70%, or at least 65%. Such terminators in some instances reduce extension rates by 50%-90%, 60%-80%, 65%-90%, 70%-85%, 60%-90%, 70%-99%, 80%-99%, or 50%- 80%. In some instances, terminators reduce the average amplicon product length by at least 99.9%, 99%, 98%, 95%, 90%, 85%, 80%, 75%, 70%, or at least 65%. Terminators in some instances reduce the average amplicon length by 50%-90%, 60%-80%, 65%-90%, 70%-85%, 60%-90%, 70%-99%, 80%-99%, or 50%-80%. In some instances, amplicons comprising terminator nucleotides form loops or hairpins which reduce a polymerase’s ability to use such amplicons as templates. Use of terminators in some instances slows the rate of amplification at initial amplification sites through the incorporation of terminator nucleotides (e.g., dideoxynucleotides that have been modified to make them exonuclease-resistant to terminate DNA extension), resulting in smaller amplification products. By producing smaller amplification products than the currently used methods (e.g., average length of 50-2000 nucleotides in length for PTA methods as compared to an average product length of >10,000 nucleotides for MDA methods) PTA amplification products in some instances undergo direct ligation of adapters without the need for fragmentation, allowing for efficient incorporation of cell barcodes and unique molecular identifiers (UMI).
[0096] Terminator nucleotides are present at various concentrations depending on factors such as polymerase, template, or other factors. For example, the amount of terminator nucleotides in some instances is expressed as a ratio of non-terminator nucleotides to terminator nucleotides in a method described herein. Such concentrations in some instances allow control of amplicon lengths. In some instances, the ratio of terminator to non-terminator nucleotides is modified for the amount of template present or the size of the template. In some instances, the ratio of terminator to non-terminator nucleotides is reduced for smaller sample sizes (e.g., femtogram to picogram range). In some instances, the ratio of non-terminator to terminator nucleotides is about 2:1, 5:1, 7:1, 10:1, 20:1, 50:1, 100:1, 200:1, 500:1, 1000:1, 2000:1, or 5000:1. In some instances, the ratio of non-terminator to terminator nucleotides is 2:1-10:1, 5:1-20:1, 10:1-100:1, 20:1-200:1, 50:1-1000:1, 50:1-500:1, 75:1-150:1, or 100:1-500:1. In some instances, at least one of the nucleotides present during amplification using a method described herein is a terminator nucleotide. Each terminator need not be present at approximately the same concentration; in some instances, ratios of each terminator present in a method described herein are optimized for a particular set of reaction conditions, sample type, or polymerase. Without being bound by theory, each terminator may possess a different efficiency for incorporation into the growing polynucleotide chain of an amplicon, in response to pairing with the corresponding nucleotide on the template strand. For example, in some instances a terminator pairing with cytosine is present at about 3%, 5%, 10%, 15%, 20%, 25%, or 50% higher concentration than the average terminator concentration.
In some instances, a terminator pairing with thymine is present at about 3%, 5%, 10%, 15%, 20%, 25%, or 50% higher concentration than the average terminator concentration. In some instances, a terminator pairing with guanine is present at about 3%, 5%, 10%, 15%, 20%, 25%, or 50% higher concentration than the average terminator concentration. In some instances, a terminator pairing with adenine is present at about 3%, 5%, 10%, 15%, 20%, 25%, or 50% higher concentration than the average terminator concentration. In some instances, a terminator pairing with uracil is present at about 3%, 5%, 10%, 15%, 20%, 25%, or 50% higher concentration than the average terminator concentration. Any nucleotide capable of terminating nucleic acid extension by a nucleic acid polymerase in some instances is used as a terminator nucleotide in the methods described herein. In some instances, a reversible terminator is used to terminate nucleic acid replication. In some instances, a non-reversible terminator is used to terminate nucleic acid replication. In some instances, non-limiting examples of terminators include reversible and non-reversible nucleic acids and nucleic acid analogs, such as, e.g., 3’ blocked reversible terminator comprising nucleotides, 3’ unblocked reversible terminator comprising nucleotides, terminators comprising 2’ modifications of deoxynucleotides, terminators comprising modifications to the nitrogenous base of
deoxynucleotides, or any combination thereof. In one embodiment, terminator nucleotides are dideoxynucleotides. Other nucleotide modifications that terminate nucleic acid replication and may be suitable for practicing the invention include, without limitation, any modifications of the r group of the 3’ carbon of the deoxyribose such as inverted dideoxynucleotides, 3' biotinylated nucleotides, 3' amino nucleotides, 3 ’-phosphorylated nucleotides, 3'-O-methyl nucleotides, 3' carbon spacer nucleotides including 3' C3 spacer nucleotides, 3' C18 nucleotides, 3' Hexanediol spacer nucleotides, acyclonucleotides, and combinations thereof. In some instances, terminators are polynucleotides comprising 1, 2, 3, 4, or more bases in length. In some instances, terminators do not comprise a detectable moiety or tag (e.g., mass tag, fluorescent tag, dye, radioactive atom, or other detectable moiety). In some instances, terminators do not comprise a chemical moiety allowing for attachment of a detectable moiety or tag (e.g., “click” azide/alkyne, conjugate addition partner, or other chemical handle for attachment of a tag). In some instances, all terminator nucleotides comprise the same modification that reduces amplification to at region (e.g., the sugar moiety, base moiety, or phosphate moiety) of the nucleotide. In some instances, at least one terminator has a different modification that reduces amplification. In some instances, all terminators have a substantially similar fluorescent excitation or emission wavelengths. In some instances, terminators without modification to the phosphate group are used with polymerases that do not have exonuclease proofreading activity. Terminators, when used with polymerases which have 3 ’->5’ proofreading exonuclease activity (such as, e.g., phi29) that can remove the terminator nucleotide, are in some instances further modified to make them exonuclease-resistant. 
For example, dideoxynucleotides are modified with an alpha-thio group that creates a phosphorothioate linkage which makes these nucleotides resistant to the 3’->5’ proofreading exonuclease activity of nucleic acid polymerases. Such modifications in some instances reduce the exonuclease proofreading activity of polymerases by at least 99.5%, 99%, 98%, 95%, 90%, or at least 85%. Examples of other terminator nucleotide modifications providing resistance to the 3’->5’ exonuclease activity include in some instances: nucleotides with modification to the alpha group, such as alpha-thio dideoxynucleotides creating a phosphorothioate bond, C3 spacer nucleotides, locked nucleic acids (LNA), inverted nucleic acids, 2' Fluoro bases, 3' phosphorylation, 2'-O-Methyl modifications (or other 2’-O-alkyl modification), propyne-modified bases (e.g., deoxycytosine, deoxyuridine), L-DNA nucleotides, L-RNA nucleotides, nucleotides with inverted linkages (e.g., 5’-5’ or 3’-3’), 5’ inverted bases (e.g., 5’ inverted 2’,3’-dideoxy dT), methylphosphonate backbones, and trans nucleic acids. In some instances, nucleotides with modification include base-modified nucleic acids comprising free 3’ OH groups (e.g., 2-nitrobenzyl alkylated HOMedU triphosphates, bases comprising modification with large chemical groups, such as beads or other large moiety). In some
instances, a polymerase with strand displacement activity but without 3’->5’ exonuclease proofreading activity is used with terminator nucleotides with or without modifications to make them exonuclease resistant. Such nucleic acid polymerases include, without limitation, Bst DNA polymerase, Bsu DNA polymerase, Deep Vent (exo-) DNA polymerase, Klenow Fragment (exo-) DNA polymerase, Therminator DNA polymerase, and VentR (exo-).
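The relationship between nucleotide concentrations, incorporation rates, and product length discussed above can be illustrated with a simple competitive-incorporation model. The Python sketch below is a toy model that is not part of the disclosure: it assumes that at each position the polymerase incorporates either a dNTP or a terminator, with odds set by the product of the concentration ratio and the incorporation rate ratio, that incorporation of a terminator ends extension, and that exonuclease removal of terminators, processivity limits, and template length are ignored. All function and parameter names are hypothetical.

```python
def concentration_ratio(dntp_mM: float, terminator_mM: float) -> float:
    """Ratio of non-terminator to terminator nucleotides from concentrations."""
    if dntp_mM <= 0 or terminator_mM <= 0:
        raise ValueError("concentrations must be positive")
    return dntp_mM / terminator_mM


def termination_probability(conc_ratio: float,
                            incorporation_rate_ratio: float = 1.0) -> float:
    """Probability that the next incorporated nucleotide is a terminator.

    conc_ratio: non-terminator:terminator concentration ratio.
    incorporation_rate_ratio: dNTP:terminator incorporation rate ratio.
    Assumes simple competition between the two substrates."""
    if conc_ratio <= 0 or incorporation_rate_ratio <= 0:
        raise ValueError("ratios must be positive")
    return 1.0 / (1.0 + conc_ratio * incorporation_rate_ratio)


def expected_extension_length(conc_ratio: float,
                              incorporation_rate_ratio: float = 1.0) -> float:
    """Mean number of nucleotides added before termination, counting the
    terminator itself (mean of a geometric distribution, 1/p)."""
    return 1.0 / termination_probability(conc_ratio, incorporation_rate_ratio)
```

For instance, 1 mM dNTPs with 0.1 mM terminators corresponds to a 10:1 concentration ratio; combined with an assumed 10:1 incorporation rate ratio, this toy model gives a mean extension of about 101 nucleotides. The model is meant only to show the direction of the effect, namely that raising either ratio lengthens products, and should not be read as a prediction of measured amplicon sizes.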
[0097] Described herein are amplicon libraries resulting from amplification of at least one target nucleic acid molecule. Such libraries are in some instances generated using the methods described herein, such as those using terminators. Such methods comprise use of strand displacement polymerases or factors, terminator nucleotides (reversible or irreversible), or other features and embodiments described herein. In some instances, reversible terminators are capable of removal by an exonuclease (e.g., or polymerase having exonuclease activity). In some instances, irreversible terminators are not capable of substantial removal by an exonuclease (e.g., or polymerase having exonuclease activity). In some instances, amplicon libraries generated by use of terminators described herein are further amplified in a subsequent amplification reaction (e.g., PCR). In some instances, subsequent amplification reactions do not comprise terminators. In some instances, amplicon libraries comprise polynucleotides, wherein at least 50%, 60%, 70%, 80%, 90%, 95%, or at least 98% of the polynucleotides comprise at least one terminator nucleotide. In some instances, the amplicon library comprises the target nucleic acid molecule from which the amplicon library was derived. The amplicon library comprises a plurality of polynucleotides, wherein at least some of the polynucleotides are direct copies (e.g., replicated directly from a target nucleic acid molecule, such as genomic DNA, RNA, or other target nucleic acid). For example, at least 5%, 10%, 20%, 30%, 40%, 50%, 60%, 70%, 80%, 90%, 95% or more than 95% of the amplicon polynucleotides are direct copies of the at least one target nucleic acid molecule. In some instances, at least 5% of the amplicon polynucleotides are direct copies of the at least one target nucleic acid molecule. In some instances, at least 10% of the amplicon polynucleotides are direct copies of the at least one target nucleic acid molecule. 
In some instances, at least 15% of the amplicon polynucleotides are direct copies of the at least one target nucleic acid molecule. In some instances, at least 20% of the amplicon polynucleotides are direct copies of the at least one target nucleic acid molecule. In some instances, at least 50% of the amplicon polynucleotides are direct copies of the at least one target nucleic acid molecule. In some instances, 3%-5%, 3-10%, 5%-10%, 10%-20%, 20%-30%, 30%-40%, 5%-30%, 10%-50%, or 15%-75% of the amplicon polynucleotides are direct copies of the at least one target nucleic acid molecule. In some instances, at least some of the polynucleotides are direct copies of the target nucleic acid molecule, or daughter (a first copy of the target nucleic acid) progeny. For example, at least 5%, 10%, 20%, 30%, 40%, 50%, 60%,
70%, 80%, 90%, 95% or more than 95% of the amplicon polynucleotides are direct copies of the at least one target nucleic acid molecule or daughter progeny. In some instances, at least 5% of the amplicon polynucleotides are direct copies of the at least one target nucleic acid molecule or daughter progeny. In some instances, at least 10% of the amplicon polynucleotides are direct copies of the at least one target nucleic acid molecule or daughter progeny. In some instances, at least 20% of the amplicon polynucleotides are direct copies of the at least one target nucleic acid molecule or daughter progeny. In some instances, at least 30% of the amplicon polynucleotides are direct copies of the at least one target nucleic acid molecule or daughter progeny. In some instances, 3%-5%, 3%-10%, 5%-10%, 10%-20%, 20%-30%, 30%-40%, 5%-30%, 10%-50%, or 15%-75% of the amplicon polynucleotides are direct copies of the at least one target nucleic acid molecule or daughter progeny. In some instances, direct copies of the target nucleic acid are 50-2500, 75-2000, 50-2000, 25-1000, 50-1000, 500-2000, or 50-2000 bases in length. In some instances, daughter progeny are 1000-5000, 2000-5000, 1000-10,000, 2000-5000, 1500-5000, 3000-7000, or 2000-7000 bases in length. In some instances, the average length of PTA amplification products is 25-3000 nucleotides in length, 50-2500, 75-2000, 50-2000, 25-1000, 50-1000, 500-2000, or 50-2000 bases in length. In some instances, amplicons generated from PTA are no more than 5000, 4000, 3000, 2000, 1700, 1500, 1200, 1000, 700, 500, or no more than 300 bases in length. In some instances, amplicons generated from PTA are 1000-5000, 1000-3000, 200-2000, 200-4000, 500-2000, 750-2500, or 1000-2000 bases in length. Amplicon libraries generated using the methods described herein in some instances comprise at least 1000, 2000, 5000, 10,000, 100,000, 200,000, 500,000 or more than 500,000 amplicons comprising unique sequences.
In some instances, the library comprises at least 100, 200, 300, 400, 500, 600, 700, 800, 900, 1000, 1100, 1200, 1300, 1400, 1500, 2000, 2500, 3000, or at least 3500 amplicons. In some instances, at least 5%, 10%, 15%, 20%, 25%, 30% or more than 30% of amplicon polynucleotides having a length of less than 1000 bases are direct copies of the at least one target nucleic acid molecule. In some instances, at least 5%, 10%, 15%, 20%, 25%, 30% or more than 30% of amplicon polynucleotides having a length of no more than 2000 bases are direct copies of the at least one target nucleic acid molecule. In some instances, at least 5%, 10%, 15%, 20%, 25%, 30% or more than 30% of amplicon polynucleotides having a length of 3000-5000 bases are direct copies of the at least one target nucleic acid molecule. In some instances, the ratio of direct copy amplicons to target nucleic acid molecules is at least 10: 1, 100: 1, 1000: 1, 10,000: 1, 100,000:1, 1,000,000: 1, 10,000,000: 1, or more than 10,000,000: 1. In some instances, the ratio of direct copy amplicons to target nucleic acid molecules is at least 10: 1, 100: 1, 1000: 1, 10,000: 1, 100,000: 1, 1,000,000: 1, 10,000,000: 1, or more than 10,000,000: 1, wherein the direct copy amplicons are no more than 700-1200 bases in length. In
some instances, the ratio of direct copy amplicons and daughter amplicons to target nucleic acid molecules is at least 10: 1, 100: 1, 1000: 1, 10,000: 1, 100,000: 1, 1,000,000: 1, 10,000,000: 1, or more than 10,000,000: 1. In some instances, the ratio of direct copy amplicons and daughter amplicons to target nucleic acid molecules is at least 10: 1, 100: 1, 1000: 1, 10,000: 1, 100,000: 1, 1,000,000: 1, 10,000,000: 1, or more than 10,000,000: 1, wherein the direct copy amplicons are 700-1200 bases in length, and the daughter amplicons are 2500-6000 bases in length. In some instances, the library comprises about 50-10,000, about 50-5,000, about 50-2500, about 50- 1000, about 150-2000, about 250-3000, about 50-2000, about 500-2000, or about 500-1500 amplicons which are direct copies of the target nucleic acid molecule. In some instances, the library comprises about 50-10,000, about 50-5,000, about 50-2500, about 50-1000, about 150- 2000, about 250-3000, about 50-2000, about 500-2000, or about 500-1500 amplicons which are direct copies of the target nucleic acid molecule or daughter amplicons. The number of direct copies may be controlled in some instances by the number of amplification cycles. In some instances, no more than 30, 25, 20, 15, 13, 11, 10, 9, 8, 7, 6, 5, 4, or 3 cycles are used to generate copies of the target nucleic acid molecule. In some instances, about 30, 25, 20, 15, 13, 11, 10, 9, 8, 7, 6, 5, 4, or about 3 cycles are used to generate copies of the target nucleic acid molecule. In some instances, 3, 4, 5, 6, 7, or 8 cycles are used to generate copies of the target nucleic acid molecule. In some instances, 2-4, 2-5, 2-7, 2-8, 2-10, 2-15, 3-5, 3-10, 3-15, 4-10, 4-15, 5-10 or 5-15 cycles are used to generate copies of the target nucleic acid molecule. Amplicon libraries generated using the methods described herein are in some instances subjected to additional steps, such as adapter ligation and further amplification. 
In some instances, such additional steps precede a sequencing step. In some instances, the cycles are PCR cycles. In some instances, the cycles represent annealing, extension, and denaturation. In some instances, the cycles represent annealing, extension, and denaturation which occur under isothermal or essentially isothermal conditions.
[0098] Methods described herein may additionally comprise one or more enrichment or purification steps. In some instances, one or more polynucleotides (such as cDNA, PTA amplicons, or other polynucleotide) are enriched during a method described herein. In some instances, polynucleotide probes are used to capture one or more polynucleotides. In some instances, probes are configured to capture one or more genomic exons. In some instances, a library of probes comprises at least 1000, 2000, 5000, 10,000, 50,000, 100,000, 200,000, 500,000, or more than 1 million different sequences. In some instances, a library of probes comprises sequences capable of binding to at least 10, 20, 50, 100, 200, 500, 1000, 2000, 5000, 10,000 or more than 10,000 genes. In some instances, probes comprise a moiety for capture by a bead, such as biotin. In some instances, an enrichment step occurs after a PTA step. In some
instances, an enrichment step occurs before a PTA step. In some instances, probes are configured to bind genomic DNA libraries. In some instances, probes are configured to bind cDNA libraries.
[0099] Amplicon libraries of polynucleotides generated from the PTA methods and compositions (terminators, polymerases, etc.) described herein in some instances have increased uniformity. Uniformity, in some instances, is described using a Lorenz curve, or other such method. Such increases in some instances lead to lower sequencing reads needed for the desired coverage of a target nucleic acid molecule (e.g., genomic DNA, RNA, or other target nucleic acid molecule). For example, no more than 50% of a cumulative fraction of polynucleotides comprises sequences of at least 80% of a cumulative fraction of sequences of the target nucleic acid molecule. In some instances, no more than 50% of a cumulative fraction of polynucleotides comprises sequences of at least 60% of a cumulative fraction of sequences of the target nucleic acid molecule. In some instances, no more than 50% of a cumulative fraction of polynucleotides comprises sequences of at least 70% of a cumulative fraction of sequences of the target nucleic acid molecule. In some instances, no more than 50% of a cumulative fraction of polynucleotides comprises sequences of at least 90% of a cumulative fraction of sequences of the target nucleic acid molecule. In some instances, uniformity is described using a Gini index (wherein an index of 0 represents perfect equality of the library and an index of 1 represents perfect inequality). In some instances, amplicon libraries described herein have a Gini index of no more than 0.55, 0.50, 0.45, 0.40, or 0.30. In some instances, amplicon libraries described herein have a Gini index of no more than 0.50. In some instances, amplicon libraries described herein have a Gini index of no more than 0.40. Such uniformity metrics in some instances are dependent on the number of reads obtained. For example, no more than 100 million, 200 million, 300 million, 400 million, or no more than 500 million reads are obtained. 
In some instances, the read length is about 50, 75, 100, 125, 150, 175, 200, 225, or about 250 bases. In some instances, uniformity metrics are dependent on the depth of coverage of a target nucleic acid. For example, the average depth of coverage is about 10X, 15X, 20X, 25X, or about 30X. In some instances, the average depth of coverage is 10-30X, 20-50X, 5-40X, 20-60X, 5-20X, or 10-20X. In some instances, amplicon libraries described herein have a Gini index of no more than 0.55, wherein about 300 million reads were obtained. In some instances, amplicon libraries described herein have a Gini index of no more than 0.50, wherein about 300 million reads were obtained. In some instances, amplicon libraries described herein have a Gini index of no more than 0.45, wherein about 300 million reads were obtained. In some instances, amplicon libraries described herein have a Gini index of no more than 0.55, wherein no more than 300 million reads were obtained. In some instances, amplicon libraries described herein have a Gini index of no more than 0.50,
wherein no more than 300 million reads were obtained. In some instances, amplicon libraries described herein have a Gini index of no more than 0.45, wherein no more than 300 million reads were obtained. In some instances, amplicon libraries described herein have a Gini index of no more than 0.55, wherein the average depth of sequencing coverage is about 15X. In some instances, amplicon libraries described herein have a Gini index of no more than 0.50, wherein the average depth of sequencing coverage is about 15X. In some instances, amplicon libraries described herein have a Gini index of no more than 0.45, wherein the average depth of sequencing coverage is about 15X. In some instances, amplicon libraries described herein have a Gini index of no more than 0.55, wherein the average depth of sequencing coverage is at least 15X. In some instances, amplicon libraries described herein have a Gini index of no more than 0.50, wherein the average depth of sequencing coverage is at least 15X. In some instances, amplicon libraries described herein have a Gini index of no more than 0.45, wherein the average depth of sequencing coverage is at least 15X. In some instances, amplicon libraries described herein have a Gini index of no more than 0.55, wherein the average depth of sequencing coverage is no more than 15X. In some instances, amplicon libraries described herein have a Gini index of no more than 0.50, wherein the average depth of sequencing coverage is no more than 15X. In some instances, amplicon libraries described herein have a Gini index of no more than 0.45, wherein the average depth of sequencing coverage is no more than 15X. Uniform amplicon libraries generated using the methods described herein are in some instances subjected to additional steps, such as adapter ligation and further PCR amplification. In some instances, such additional steps precede a sequencing step.
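By way of a non-limiting illustration, the Lorenz curve and Gini index referred to above may be computed from per-position (or per-bin) read depth as sketched below. The function names, the use of equal-width bins, and the trapezoid-rule integration are assumptions made for illustration only and are not taken from the methods described herein.

```python
def lorenz_curve(coverage):
    """Return (xs, ys): cumulative fraction of bins vs. cumulative fraction of reads.

    `coverage` is a sequence of non-negative read depths, one per genomic bin.
    Bins are sorted from lowest to highest depth, as in a standard Lorenz curve.
    """
    values = sorted(coverage)
    total = sum(values)
    n = len(values)
    if n == 0 or total == 0:
        raise ValueError("coverage must be non-empty with a positive total")
    xs, ys = [0.0], [0.0]
    running = 0
    for i, depth in enumerate(values, start=1):
        running += depth
        xs.append(i / n)
        ys.append(running / total)
    return xs, ys


def gini_index(coverage):
    """Gini index = 1 - 2 * (area under the Lorenz curve), by the trapezoid rule.

    0 indicates perfectly even coverage; values approaching 1 indicate that
    nearly all reads fall in a small fraction of bins.
    """
    xs, ys = lorenz_curve(coverage)
    area = sum(
        (xs[i] - xs[i - 1]) * (ys[i] + ys[i - 1]) / 2
        for i in range(1, len(xs))
    )
    return 1 - 2 * area
```

In this sketch, a library with identical depth in every bin gives an index of 0, and a finite sample of n bins with all reads in a single bin gives 1 - 1/n rather than exactly 1.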
[0100] Primers comprise nucleic acids used for priming the amplification reactions described herein. Such primers in some instances include, without limitation, random deoxynucleotides of any length with or without modifications to make them exonuclease resistant, random ribonucleotides of any length with or without modifications to make them exonuclease resistant, modified nucleic acids such as locked nucleic acids, DNA or RNA primers that are targeted to a specific genomic region, and reactions that are primed with enzymes such as primase. In the case of whole genome PTA, it is preferred that a set of primers having random or partially random nucleotide sequences be used. In a nucleic acid sample of significant complexity, specific nucleic acid sequences present in the sample need not be known and the primers need not be designed to be complementary to any particular sequence. Rather, the complexity of the nucleic acid sample results in a large number of different hybridization target sequences in the sample, which will be complementary to various primers of random or partially random sequence. The complementary portion of primers for use in PTA are in some instances fully randomized, comprise only a portion that is randomized, or be otherwise selectively randomized.
The number of random base positions in the complementary portion of primers in some instances, for example, is from 20% to 100% of the total number of nucleotides in the complementary portion of the primers. In some instances, the number of random base positions in the complementary portion of primers is 10% to 90%, 15-95%, 20%-100%, 30%-100%, 50%- 100%, 75-100% or 90-95% of the total number of nucleotides in the complementary portion of the primers. In some instances, the number of random base positions in the complementary portion of primers is at least 10%, 20%, 30%, 40%, 50%, 60%, 70%, 80%, or at least 90% of the total number of nucleotides in the complementary portion of the primers. Sets of primers having random or partially random sequences are in some instances synthesized using standard techniques by allowing the addition of any nucleotide at each position to be randomized. In some instances, sets of primers are composed of primers of similar length and/or hybridization characteristics. In some instances, the term "random primer” refers to a primer which can exhibit four-fold degeneracy at each position. In some instances, the term "random primer” refers to a primer which can exhibit three-fold degeneracy at each position. Random primers used in the methods described herein in some instances comprise a random sequence that is 3, 4, 5, 6, 7, 8, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, or more bases in length. In some instances, primers comprise random sequences that are 3-20, 5-15, 5-20, 6-12, or 4-10 bases in length. Primers may also comprise non-extendable elements that limit subsequent amplification of amplicons generated thereof. For example, primers with non-extendable elements in some instances comprise terminators. In some instances, primers comprise terminator nucleotides, such as 1, 2, 3, 4, 5, 10, or more than 10 terminator nucleotides. Primers need not be limited to components which are added externally to an amplification reaction. 
In some instances, primers are generated in situ through the addition of nucleotides and proteins which promote priming. For example, primase-like enzymes in combination with nucleotides are in some instances used to generate random primers for the methods described herein. Primase-like enzymes in some instances are members of the DnaG or AEP enzyme superfamily. In some instances, a primase-like enzyme is TthPrimPol. In some instances, a primase-like enzyme is T7 gp4 helicase-primase. Such primases are in some instances used with the polymerases or strand displacement factors described herein. In some instances, primases initiate priming with deoxyribonucleotides. In some instances, primases initiate priming with ribonucleotides. In some instances, primers are irreversible primers. In some instances, irreversible primers comprise phosphonothioate linkages.
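As a non-limiting illustration of the degeneracy of random primers discussed above, the number of distinct sequences in a fully randomized primer set grows as the degeneracy per position raised to the power of the random length. The helper names below are hypothetical and serve only to make the arithmetic concrete.

```python
from itertools import product


def count_random_primers(length, degeneracy=4):
    """Number of distinct sequences when each of `length` positions
    can take `degeneracy` different nucleotides."""
    return degeneracy ** length


def enumerate_random_primers(length, alphabet="ACGT"):
    """List every sequence of the given length over `alphabet`.

    Practical only for short random regions, since the list grows exponentially.
    """
    return ["".join(bases) for bases in product(alphabet, repeat=length)]
```

For example, a random hexamer with four-fold degeneracy at each position corresponds to 4^6 = 4096 sequences, and the same hexamer with three-fold degeneracy to 3^6 = 729 sequences.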
[0101] The PTA amplification can be followed by selection for a specific subset of amplicons. Such selections are in some instances dependent on size, affinity, activity, hybridization to probes, or other known selection factor in the art. In some instances, selections precede or
follow additional steps described herein, such as adapter ligation and/or library amplification. In some instances, selections are based on size (length) of the amplicons. In some instances, smaller amplicons are selected that are less likely to have undergone exponential amplification, which enriches for products that were derived from the primary template while further converting the amplification from an exponential into a quasi-linear amplification process. In some instances, amplicons comprising 50-2000, 25-5000, 40-3000, 50-1000, 200-1000, 300- 1000, 400-1000, 400-600, 600-2000, or 800-1000 bases in length are selected. Size selection in some instances occurs with the use of protocols, e.g., utilizing solid-phase reversible immobilization (SPRI) on carboxylated paramagnetic beads to enrich for nucleic acid fragments of specific sizes, or other protocol known by those skilled in the art. Optionally or in combination, selection occurs through preferential ligation and amplification of smaller fragments during PCR while preparing sequencing libraries, as well as a result of the preferential formation of clusters from smaller sequencing library fragments during sequencing (e.g., sequencing by synthesis, nanopore sequencing, or other sequencing method). Other strategies to select for smaller fragments are also consistent with the methods described herein and include, without limitation, isolating nucleic acid fragments of specific sizes after gel electrophoresis, the use of silica columns that bind nucleic acid fragments of specific sizes, and the use of other PCR strategies that more strongly enrich for smaller fragments. Any number of library preparation protocols may be used with the PTA methods described herein. Amplicons generated by PTA are in some instances ligated to adapters (optionally with removal of terminator nucleotides). 
In some instances, amplicons generated by PTA comprise regions of homology generated from transposase-based fragmentation which are used as priming sites. In some instances, libraries are prepared by fragmenting nucleic acids mechanically or enzymatically. In some instances, libraries are prepared using tagmentation via transposomes. In some instances, libraries are prepared via ligation of adapters, such as Y-adapters, universal adapters, or circular adapters.
[0102] The non-complementary portion of a primer used in PTA can include sequences which can be used to further manipulate and/or analyze amplified sequences. An example of such a sequence is a “detection tag”. Detection tags have sequences complementary to detection probes and are detected using their cognate detection probes. There may be one, two, three, four, or more than four detection tags on a primer. There is no fundamental limit to the number of detection tags that can be present on a primer except the size of the primer. In some instances, there is a single detection tag on a primer. In some instances, there are two detection tags on a primer. When there are multiple detection tags, they may have the same sequence or they may have different sequences, with each different sequence complementary to a different detection
probe. In some instances, multiple detection tags have the same sequence. In some instances, multiple detection tags have a different sequence.
[0103] Another example of a sequence that can be included in the non-complementary portion of a primer is an “address tag” that can encode other details of the amplicons, such as the location in a tissue section. In some instances, a cell barcode comprises an address tag. An address tag has a sequence complementary to an address probe. Address tags become incorporated at the ends of amplified strands. If present, there may be one, or more than one, address tag on a primer. There is no fundamental limit to the number of address tags that can be present on a primer except the size of the primer. When there are multiple address tags, they may have the same sequence or they may have different sequences, with each different sequence complementary to a different address probe. The address tag portion can be any length that supports specific and stable hybridization between the address tag and the address probe. In some instances, nucleic acids from more than one source can incorporate a variable tag sequence. This tag sequence can be up to 100 nucleotides in length, preferably 1 to 10 nucleotides in length, most preferably 4, 5 or 6 nucleotides in length and comprises combinations of nucleotides. In some instances, a tag sequence is 1-20, 2-15, 3-13, 4-12, 5-12, or 1-10 nucleotides in length. For example, if six bases are chosen to form the tag and each position can be any of four different nucleotides, then a total of 4096 nucleic acid anchors (e.g., hairpins), each with a unique 6-base tag, can be made. In some instances, tags identify the source of a sample or analyte. In some instances, tags uniquely identify every molecule in a population.
[0104] Primers described herein may be present in solution or immobilized on a bead. In some instances, primers bearing sample barcodes and/or UMI sequences can be immobilized on a bead. 
In some instances, individual cells are contacted with one or more beads having a unique set of sample barcodes and/or UMI sequences in order to identify the individual cell. In some instances, lysates from individual cells are contacted with one or more beads having a unique set of sample barcodes and/or UMI sequences in order to identify the individual cell lysates. In some instances, extracted nucleic acid from individual cells are contacted with one or more beads having a unique set of sample barcodes and/or UMI sequences in order to identify the extracted nucleic acid from the individual cell. The beads can be manipulated in any suitable manner as is known in the art. The beads may be any suitable size, including for example, microbeads, microparticles, nanobeads and nanoparticles. In some embodiments, beads are magnetically responsive; in other embodiments beads are not significantly magnetically responsive. Examples of suitable beads include flow cytometry microbeads, polystyrene microparticles and nanoparticles, functionalized polystyrene microparticles and nanoparticles, coated polystyrene microparticles and nanoparticles, silica microbeads, fluorescent microspheres
and nanospheres, functionalized fluorescent microspheres and nanospheres, coated fluorescent microspheres and nanospheres, color dyed microparticles and nanoparticles, magnetic microparticles and nanoparticles, superparamagnetic microparticles and nanoparticles (e.g., DYNABEADS® available from Invitrogen Group, Carlsbad, CA), fluorescent microparticles and nanoparticles, coated magnetic microparticles and nanoparticles, ferromagnetic microparticles and nanoparticles, coated ferromagnetic microparticles and nanoparticles, and those described in U.S. Pat. Appl. Pub. No. US20050260686, US20030132538, US20050118574, 20050277197, 20060159962. Beads may be pre-coupled with an antibody, protein or antigen, DNA/RNA probe or any other molecule with an affinity for a desired target. In some embodiments, primers bearing sample barcodes and/or UMI sequences can be in solution. In certain embodiments, a plurality of partitions can be presented, wherein each partition in the plurality bears a sample barcode which is unique to a partition and a UMI which is unique to a molecule, such that the UMIs are repeated many times within a collection of partitions. In some embodiments, individual cells are contacted with a partition having a unique set of sample barcodes and/or UMI sequences in order to identify the individual cell. In some embodiments, lysates from individual cells are contacted with a partition having a unique set of sample barcodes and/or UMI sequences in order to identify the individual cell lysates. In some embodiments, extracted nucleic acid from individual cells is contacted with a partition having a unique set of sample barcodes and/or UMI sequences in order to identify the extracted nucleic acid from the individual cell.
[0105] PTA primers may comprise a sequence-specific or random primer, a cell barcode and/or a unique molecular identifier (UMI) (e.g., linear primer and/or hairpin primer). In some instances, a primer comprises a sequence-specific primer. In some instances, a primer comprises a random primer. In some instances, a primer comprises a cell barcode. In some instances, a primer comprises a sample barcode. In some instances, a primer comprises a unique molecular identifier. In some instances, primers comprise two or more cell barcodes. Such barcodes in some instances identify a unique sample source, or unique workflow. Such barcodes or UMIs are in some instances 5, 6, 7, 8, 9, 10, 11, 12, 15, 20, 25, 30, or more than 30 bases in length. Primers in some instances comprise at least 1000, 10,000, 50,000, 100,000, 250,000, 500,000, 10^6, 10^7, 10^8, 10^9, or at least 10^10 unique barcodes or UMIs. In some instances, primers comprise at least 8, 16, 96, or 384 unique barcodes or UMIs. In some instances, a standard adapter is then ligated onto the amplification products prior to sequencing; after sequencing, reads are first assigned to a specific cell based on the cell barcode. Suitable adapters that may be utilized with the PTA method include, e.g., xGen® Dual Index UMI adapters available from Integrated DNA Technologies (IDT). Reads from each cell are then grouped using the UMI and reads with the
same UMI may be collapsed into a consensus read. The use of a cell barcode allows all cells to be pooled prior to library preparation, as they can later be identified by the cell barcode. The use of the UMI to form a consensus read in some instances corrects for PCR bias, improving the copy number variation (CNV) detection. In addition, sequencing errors may be corrected by requiring that a fixed percentage of reads from the same molecule have the same base change detected at each position. This approach has been utilized to improve CNV detection and correct sequencing errors in bulk samples. In some instances, UMIs are used with the methods described herein, for example, U.S Pat. No. 8,835,358 discloses the principle of digital counting after attaching a random amplifiable barcode. Schmitt, et al and Fan et al. disclose similar methods of correcting sequencing errors. In some instances, a library is generated for sequencing using primers. In some instances, the library comprises fragments of 200-700 bases, 100-1000, 300-800, 300-550, 300-700, or 200-800 bases in length. In some instances, the library comprises fragments of at least 50, 100, 150, 200, 300, 500, 600, 700, 800, or at least 1000 bases in length. In some instances, the library comprises fragments of about 50, 100, 150, 200, 300, 500, 600, 700, 800, or about 1000 bases in length.
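As a non-limiting sketch of the read handling described above, reads may first be grouped by cell barcode and UMI and then collapsed into a consensus read, with a base called at a position only when a fixed fraction of reads in the group agree. The function name, the tuple layout of the input, the default threshold of 0.7, and the use of "N" for unresolved positions are all assumptions made for illustration; the description above does not specify them.

```python
from collections import Counter, defaultdict


def collapse_by_umi(reads, min_fraction=0.7):
    """Collapse reads sharing a (cell barcode, UMI) pair into one consensus read.

    `reads` is an iterable of (cell_barcode, umi, sequence) tuples.
    `min_fraction` is a hypothetical agreement threshold: a base is called
    only if at least this fraction of reads in the group carry it.
    """
    groups = defaultdict(list)
    for cell_barcode, umi, sequence in reads:
        groups[(cell_barcode, umi)].append(sequence)

    consensus = {}
    for key, sequences in groups.items():
        # Compare only positions covered by every read in the group.
        length = min(len(s) for s in sequences)
        bases = []
        for pos in range(length):
            base, count = Counter(s[pos] for s in sequences).most_common(1)[0]
            bases.append(base if count / len(sequences) >= min_fraction else "N")
        consensus[key] = "".join(bases)
    return consensus
```

Because the cell barcode is part of the grouping key, reads from pooled cells remain separable after sequencing, and the same UMI appearing in two different cells is treated as two distinct molecules.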
[0106] The methods described herein may further comprise additional steps, including steps performed on the sample or template. Such samples or templates are in some cases subjected to one or more steps prior to PTA. In some instances, samples comprising cells are subjected to a pre-treatment step. For example, cells undergo lysis and proteolysis to increase chromatin accessibility using a combination of freeze-thawing, Triton X-100, Tween 20, and Proteinase K. Other lysis strategies may also be suitable for practicing the methods described herein. Such strategies include, without limitation, lysis using other combinations of detergent and/or lysozyme and/or protease treatment and/or physical disruption of cells such as sonication and/or alkaline lysis and/or hypotonic lysis. In some instances, the primary template or target molecule(s) is subjected to a pre-treatment step. In some instances, the primary template (or target) is denatured using sodium hydroxide, followed by neutralization of the solution. Other denaturing strategies may also be suitable for practicing the methods described herein. Such strategies may include, without limitation, combinations of alkaline lysis with other basic solutions, increasing the temperature of the sample and/or altering the salt concentration in the sample, addition of additives such as solvents or oils, other modification, or any combination thereof. In some instances, additional steps include sorting, filtering, or isolating samples, templates, or amplicons by size. In some instances, cells are lysed with mechanical (e.g., high pressure homogenizer, bead milling) or non-mechanical (physical, chemical, or biological). In some instances, physical lysis methods comprise heating, osmotic shock, and/or cavitation. In some instances, chemical lysis comprises alkali and/or detergents. In some instances, biological
lysis comprises use of enzymes. Combinations of lysis methods are also compatible with the methods described herein. Non-limiting examples of lysis enzymes include recombinant lysozyme, serine proteases, and bacterial lysins. In some instances, lysis with enzymes comprises use of lysozyme, lysostaphin, zymolyase, cellulase, protease or glycanase. For example, after amplification with the methods described herein, amplicon libraries are enriched for amplicons having a desired length. In some instances, amplicon libraries are enriched for amplicons having a length of 50-2000, 25-1000, 50-1000, 75-2000, 100-3000, 150-500, 75-250, 170-500, 100-500, or 75-2000 bases. In some instances, amplicon libraries are enriched for amplicons having a length no more than 75, 100, 150, 200, 500, 750, 1000, 2000, 5000, or no more than 10,000 bases. In some instances, amplicon libraries are enriched for amplicons having a length of at least 25, 50, 75, 100, 150, 200, 500, 750, 1000, or at least 2000 bases.
[0107] Methods and compositions described herein may comprise buffers or other formulations. Such buffers are in some instances used for PTA, RT, or other method described herein. Such buffers in some instances comprise surfactants/detergents or denaturing agents (Tween-20, DMSO, DMF, pegylated polymers comprising a hydrophobic group, or other surfactant), salts (potassium or sodium phosphate (monobasic or dibasic), sodium chloride, potassium chloride, Tris-HCl, magnesium chloride or sulfate, ammonium salts such as phosphate, nitrate, or sulfate, EDTA), reducing agents (DTT, THP, DTE, beta-mercaptoethanol, TCEP, or other reducing agent) or other components (glycerol, hydrophilic polymers such as PEG). In some instances, buffers are used in conjunction with components such as polymerases, strand displacement factors, terminators, or other reaction component described herein. Buffers may comprise one or more crowding agents. In some instances, crowding reagents include polymers. In some instances, crowding reagents comprise polymers such as polyols. In some instances, crowding reagents comprise polyethylene glycol polymers (PEG). In some instances, crowding reagents comprise polysaccharides. Without limitation, examples of crowding reagents include ficoll (e.g., ficoll PM 400, ficoll PM 70, or other molecular weight ficoll), PEG (e.g., PEG1000, PEG2000, PEG4000, PEG6000, PEG8000, or other molecular weight PEG), dextran (dextran 6, dextran 10, dextran 40, dextran 70, dextran 6000, dextran 138k, or other molecular weight dextran).
[0108] The nucleic acid molecules amplified according to the methods described herein may be sequenced and analyzed using methods known to those of skill in the art. Examples of the sequencing methods which in some instances are used include, e.g., sequencing by hybridization (SBH), sequencing by ligation (SBL) (Shendure et al. (2005) Science 309: 1728), quantitative incremental fluorescent nucleotide addition sequencing (QIFNAS), stepwise ligation and
cleavage, fluorescence resonance energy transfer (FRET), molecular beacons, TaqMan reporter probe digestion, pyrosequencing, fluorescent in situ sequencing (FISSEQ), FISSEQ beads (U.S. Pat. No. 7,425,431), wobble sequencing (Int. Pat. Appl. Pub. No. WO2006/073504), multiplex sequencing (U.S. Pat. Appl. Pub. No. US2008/0269068; Porreca et al., 2007, Nat. Methods 4:931), polymerized colony (POLONY) sequencing (U.S. Patent Nos. 6,432,360, 6,485,944 and 6,511,803, and Int. Pat. Appl. Pub. No. WO2005/082098), nanogrid rolling circle sequencing (ROLONY) (U.S. Pat. No. 9,624,538), allele-specific oligo ligation assays (e.g., oligo ligation assay (OLA), single template molecule OLA using a ligated linear probe and a rolling circle amplification (RCA) readout, ligated padlock probes, and/or single template molecule OLA using a ligated circular padlock probe and a rolling circle amplification (RCA) readout), high-throughput sequencing methods such as, e.g., methods using Roche 454, Illumina Solexa, AB-SOLiD, Helicos, Polonator platforms and the like, and light-based sequencing technologies (Landegren et al. (1998) Genome Res. 8:769-76; Kwok (2000) Pharmacogenomics 1:95-100; and Shi (2001) Clin. Chem. 47:164-172). In some instances, the amplified nucleic acid molecules are shotgun sequenced. Sequencing of the sequencing library is in some instances performed with any appropriate sequencing technology, including but not limited to single-molecule real-time (SMRT) sequencing, Polony sequencing, sequencing by ligation, reversible terminator sequencing, proton detection sequencing, ion semiconductor sequencing, nanopore sequencing, electronic sequencing, pyrosequencing, Maxam-Gilbert sequencing, chain termination (e.g., Sanger) sequencing, +S sequencing, or sequencing by synthesis (array/colony-based or nanoball based).
[0109] Sequencing libraries generated using the methods described herein (e.g., PTA or RNAseq) may be sequenced to obtain a desired number of sequencing reads. In some instances, libraries are generated from a single cell or sample comprising a single cell (alone or part of a multiomics workflow). In some instances, libraries are sequenced to obtain at least 0.1, 0.2, 0.4, 0.5, 0.7, 0.8, 0.9, 1, 1.1, 1.2, 1.5, 2, 5, or at least 10 million reads. In some instances, libraries are sequenced to obtain no more than 0.1, 0.2, 0.4, 0.5, 0.7, 0.8, 0.9, 1, 1.1, 1.2, 1.5, 2, 5, or no more than 10 million reads. In some instances, libraries are sequenced to obtain about 0.1, 0.2, 0.4, 0.5, 0.7, 0.8, 0.9, 1, 1.1, 1.2, 1.5, 2, 5, or about 10 million reads. In some instances, libraries are sequenced to obtain 0.1-10, 0.1-5, 0.1-1, 0.2-1, 0.3-1.5, 0.5-1, 1-5, or 0.5-5 million reads per sample. In some instances, the number of reads is dependent on the size of the genome. In some instances, samples comprising bacterial genomes are sequenced to obtain 0.5-1 million reads. In some instances, libraries are sequenced to obtain at least 2, 4, 10, 20, 50, 100, 200, 300, 500, 700, or at least 900 million reads. In some instances, libraries are sequenced to obtain no more than 2, 4, 10, 20, 50, 100, 200, 300, 500, 700, or no more than 900 million reads. In some
instances, libraries are sequenced to obtain about 2, 4, 10, 20, 50, 100, 200, 300, 500, 700, or about 900 million reads. In some instances, samples comprising mammalian genomes are sequenced to obtain 500-600 million reads. In some instances, the type of sequencing library (cDNA library or genomic library) is identified during sequencing. In some instances, cDNA libraries and genomic libraries are identified during sequencing with unique barcodes.
[0110] The term “cycle” when used in reference to a polymerase-mediated amplification reaction is used herein to describe steps of dissociation of at least a portion of a double stranded nucleic acid (e.g., a template from an amplicon, or a double stranded template, denaturation), hybridization of at least a portion of a primer to a template (annealing), and extension of the primer to generate an amplicon. In some instances, the temperature remains constant during a cycle of amplification (e.g., an isothermal reaction). In some instances, the number of cycles is directly correlated with the number of amplicons produced. In some instances, the number of cycles for an isothermal reaction is controlled by the amount of time the reaction is allowed to proceed.
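Returning to the read counts discussed in paragraph [0109], the dependence of the number of reads on genome size can be illustrated with the usual approximation that average depth equals (number of reads x read length) / genome size. The sketch below is non-limiting; the function names are hypothetical, and the calculation assumes single-end reads that all map uniquely, ignoring duplicates and unmapped reads.

```python
import math


def reads_for_depth(genome_size_bp, depth, read_length_bp):
    """Approximate number of reads needed to reach an average depth of coverage."""
    return math.ceil(genome_size_bp * depth / read_length_bp)


def mean_depth(n_reads, read_length_bp, genome_size_bp):
    """Approximate average depth of coverage obtained from a given number of reads."""
    return n_reads * read_length_bp / genome_size_bp
```

For example, assuming a genome of roughly 3.1 billion bases (an approximate figure for a human genome, used here only as an assumption), 150-base reads, and a target depth of 15X, `reads_for_depth(3_100_000_000, 15, 150)` gives 310 million reads, which is of the same order as the approximately 300 million reads recited elsewhere herein for about 15X coverage.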
[0111] Provided herein are methods of genetic analysis using PTA. In some instances, methods determine if a cell (e.g., fetal cell) comprises a genetic abnormality. In some instances, the methods described herein provide non-abnormal genetic information, such as the sex of the embryo. In some instances, the methods described herein establish the presence or absence of sex chromosomes. In some instances, the genetic abnormality includes aneuploidy, monogenic disorders, and structural rearrangements. In some instances, genetic analysis is conducted on pre-implantation embryonic cells. In some instances, genetic analysis comprises one or more of PGT-A, PGT-M, and PGT-SR genetic tests. In some instances, the genetic abnormality comprises aneuploidy. In some instances, aneuploidy comprises monosomy, trisomy, triploidy, deletions, duplications, or uniparental disomy. In some instances, aneuploidy occurs in at least 1, 2, 3, 4, 5, 6, 7, 8, 9, or at least 10 chromosomes. In some instances, aneuploidy occurs in about 1, 2, 3, 4, 5, 6, 7, 8, 9, or about 10 chromosomes. In some instances, aneuploidy occurs in no more than 1, 2, 3, 4, 5, 6, 7, 8, 9, or no more than 10 chromosomes. In some instances, aneuploidy occurs at one or more of chromosomes 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, or 23. In some instances, aneuploidy occurs at one or more of chromosomes 13, 18, or 21. In some instances, aneuploidy occurs at one or more of chromosomes 6, 7, 11, 14, or 15. In some instances, the genetic abnormality comprises one or more of an insertion, deletion or duplication. In some instances, the insertion, deletion or duplication is at least 1%, 2%, 3%, 4%, 5%, 6%, 7%, 8%, 9%, 10%, 12%, 15%, or at least 20% of the total chromosome length. In some instances, the insertion, deletion or duplication is about 1%, 2%, 3%, 4%, 5%, 6%, 7%, 8%, 9%, 10%, 12%, 15%, or about 20% of the total chromosome length. In some instances, the insertion,
deletion or duplication is no more than 1%, 2%, 3%, 4%, 5%, 6%, 7%, 8%, 9%, 10%, 12%, 15%, or no more than 20% of the total chromosome length. In some instances, the insertion, deletion or duplication is 1%-30%, 1%-20%, 1%-15%, 1%-10%, 1%-5%, 2%-20%, 3%-25%, 4%-20%, 5%-15%, 5%-30%, 5%-20%, 10%-30%, or 15%-30% of the total chromosome length.
[0112] In some instances, the methods (e.g., PTA) described herein result in higher detection sensitivity and/or lower rates of false positives for the detection of fetal genetic abnormalities. In some instances, a mutation is a difference between an analyzed sequence (e.g., using the methods described herein) and a reference sequence. Reference sequences are in some instances obtained from other organisms, other individuals of the same or similar species, other cells in the same organism, populations of organisms, or other areas of the same genome. In some instances, mutations are identified on a plasmid or chromosome. In some instances, a mutation is an SNV (single nucleotide variation), SNP (single nucleotide polymorphism), or CNV (copy number variation, or CNA/copy number aberration). In some instances, a mutation is a base substitution, insertion, or deletion. In some instances, a mutation is a transition, transversion, nonsense mutation, silent mutation, synonymous or non-synonymous mutation, non-pathogenic mutation, missense mutation, or frameshift mutation (deletion or insertion). In some instances, PTA results in higher detection sensitivity and/or lower rates of false positives for the detection of mutations when compared to methods such as in-silico prediction, ChIP-seq, GUIDE-seq, circle-seq, HTGTS (High-Throughput Genome-Wide Translocation Sequencing), IDLV (integration-deficient lentivirus), Digenome-seq, FISH (fluorescence in situ hybridization), or DISCOVER-seq. In some instances, a fetal genetic abnormality is detected with a sensitivity of at least 0.001%, 0.01%, 0.1%, 0.5%, 1%, 2%, 5%, 10%, or at least 20%.
[0113] Genetic abnormalities may be linked to specific genetic diseases. In some instances, methods described herein such as PTA are used to identify genetic diseases. In some instances, the disease is caused by a chromosomal abnormality. In some instances, the disease comprises Down syndrome, Patau syndrome, Klinefelter syndrome, Turner syndrome, or Edwards syndrome. In some instances, the disease is caused by a single gene defect. In some instances, the disease comprises phenylketonuria (PKU), sickle-cell anemia, beta thalassemia, Tay-Sachs disease, Sandhoff disease, or cystic fibrosis (CF). In some instances, the disease comprises achondroplasia, congenital adrenal hyperplasia, cystic fibrosis, Down syndrome, fragile X syndrome, hemophilia A, Huntington's disease, muscular dystrophy, polycystic kidney disease, sickle cell disease, Tay-Sachs disease, trisomy 21, trisomy 18, trisomy 13, Turner syndrome, spina bifida, anencephaly, or thalassemia.
EXAMPLES
EXAMPLE 1: Design and execution of a multiomics workflow
Overview
[0114] Discovering genomic variation in the absence of information about the transcriptional consequence of that variation or, conversely, discovering a transcriptional signature without understanding the underlying genomic contributions, hinders understanding of molecular mechanisms of disease. To assess this genomic and transcriptomic coordination, a multiomics method was developed to extract this information from the individual cell. The workflow unifies template-switching full-transcript RNA-Seq chemistry and whole genome amplification (WGA), followed by affinity purification of first-strand cDNA and subsequent separation of the RNA/DNA fractions for sequencing library preparation. In the multiomics methodology, the attributes of primary template-directed amplification (PTA) are leveraged to enable accurate assessment of single-nucleotide variation as a DNA feature, which is not achieved with other workflows that assess DNA + RNA information in the same cell.
[0115] A single-well integration of single-cell transcriptome and genome amplification, in which a standard PTA reaction was modified to include a reverse transcription (RT) step prior to single-cell genome amplification, was designed and executed, and designated as multiomic enrichment (ResolveOME, Bioskryb Genomics, Inc.). In this workflow, PTA amplifies the genomes of single cells immediately after the RT reaction is concluded in a single-well reaction. Using template switch-based reverse transcription, barcoded first-strand cDNA molecules were created that were affinity purified and pre-amplified prior to RNA-Seq sequencing library creation. The net result from the combined amplification reaction was a biotin-labeled cDNA pool derived primarily from the cytosolic transcripts, available for streptavidin purification, and a pool of amplified genomic material from the single cell. In alternative embodiments, magnetic beads with attached RT primers can be used for direct removal of the cDNA amplicon library. At the conclusion of the genome amplification reaction, the cDNA fraction was separated from the amplified genome material, and libraries were created from each pool. The resulting sequencing data offered the ability to define both genomic and transcriptomic plasticity at single-cell resolution. Specifically, the delineation of isoform expression, combined with the ability to annotate the underlying structural variation and single nucleotide changes from the genome of the same cell, allowed the assessment of genomic “penetrance”, and the definition of mechanisms that drive single-cell fate.
[0116] Prior multiomic efforts pioneered the pairing of genomic and transcriptomic information from the same single cell but have the primary shortcoming of incomplete genome coverage and associated non-uniformity of coverage — leaving uncovered genomic valleys that may harbor
deleterious single nucleotide variants that would remain undetected. Indeed, multiple displacement amplification (MDA) drives the genomic amplification of G&T-seq, and DR-Seq has genomic amplification uniformity comparable to that of MALBAC, both of which are outperformed by PTA in terms of genomic coverage, allelic balance and SNV calling metrics. In one example, definition of clonal evolution at the SNV/CNV level in a primary patient sample was accomplished utilizing G&T-seq, yet was limited to a candidate gene survey of exome-level data whereby clusters were defined by 59 oncogenes, and another study employing G&T-seq limited its analysis to the RNA workflow of the method to take advantage of the low input requirement, without assessment of genomic-level data. Thus, addressed herein is an unmet need to add genome-wide, high-sensitivity and high-precision SNV calling capability to a joint DNA/RNA single-cell methodology. Further, the importance of these measurements is demonstrated, whereby single nucleotide variation fundamentally affects cell state and tumor progression.
[0117] Provided herein is the utility of these unified “-omic” layers, highlighting heterogeneous genomic variation and consequential phenotypic alterations in single cells that are correlated both with the development of resistance to a targeted therapeutic in a cell line model of acute myeloid leukemia and with oncogenic mechanisms in primary breast cancer cells, whereby the insights gained could not be inferred from a single dataset (genome or transcriptome) alone.
Amplification product yield of RNA+DNA multiomics workflow
[0118] Prior to demonstrating the biological utility of the multiomics method described herein in a cell line drug resistance model and in a primary patient sample, the technical performance of the methodology was examined using a benchmark 1000 Genomes cell line, NA12878. The RNA and DNA arms of the protocol were first assessed using metrics from the template-switching RNA-Seq chemistry or PTA chemistry in isolation, for comparison with the metrics obtained when the chemistries were unified in the combined multiomics protocol.
[0119] Multiomics data were generated with FACS-sorted NA12878 single cells, with purified total NA12878 RNA or genomic DNA as amplification controls. Approximately 1-1.5 µg of DNA amplification product from single cell genomes and approximately 100-200 ng of cDNA product representing the single cell transcriptome were obtained. Importantly, no-template control (NTC) reactions showed a lack of detectable product, and there was negligible (<50 ng) yield in the DNA fraction from control RNA input as measured with a Qubit fluorometer (ThermoFisher). Low-level background amplification of the genomic DNA control input in the cDNA fraction was observed, due to known promiscuity of reverse transcriptase in the absence of mRNA template. By contrast, this background amplification does not occur in reactions with single cells, as the genome material is sequestered in the non-lysed nucleus during the reverse transcription workflow of multiomics.
PTA modifications
[0120] The PTA method was modified for use in a multiomics workflow. After reverse transcription was completed, dUTP was added to the normal nucleotide mix (dATP, dCTP, dGTP, dTTP) during phi29 amplification (red dot), resulting in PTA amplification products derived from the original single-cell or low-input template DNA being “marked” with dUTP. A UDG incubation step occurred on beads after affinity purification and washes of the cDNA, to digest the background dUTP-marked PTA product prior to preamplification of the cDNA (green dot). For library preparation, the cDNA libraries utilized a normal high-fidelity polymerase; however, the PTA-derived libraries representing the DNA arm of the multiomics workflow used a uracil tolerant polymerase in order to amplify the library ligation products of uracil-containing PTA product (yellow dot). The number of expressed genes detected was reduced following UDG treatment, indicating that transcript counts in the absence of UDG treatment were likely confounded by DNA (PTA) background. IGV visualization (700 kb region, harboring 3 genes) showed removal of intergenic read background under the UDG scheme. Each row was a single-cell (NA12878) multiomic RNA fraction library. DNA background reads were seen in the top two control RNA libraries when PTA was performed without dUTP, and these background reads progressively diminished as more dUTP was included during PTA. The ratio of nucleotides was 1:1 dUTP:dTTP; PTA reactions containing dUTP exclusively with no dTTP were slower kinetically. The DNA background removal benefits of increased dUTP in the PTA reaction (C) did not adversely affect allelic balance or SNV calling precision and sensitivity metrics.
[0121] Some polymerases stall or have reduced efficiency when amplifying templates comprising uracil. Uracil tolerant polymerases may be used with the methods described herein to amplify uracil-containing templates (e.g., with PTA). In some instances, a uracil tolerant polymerase maintains at least 50, 60, 70, 80, 85, 90, 95, 97, or 99% polymerase activity when amplifying a template comprising uracil as compared to a template without uracil. In some instances a uracil tolerant polymerase is derived from archaea, yeast, or bacterial species. In some instances a uracil tolerant polymerase comprises DNA polymerases α and δ from S. cerevisiae, E. coli DNA polymerase III, PolA-type polymerases such as Taq, KAPA HiFi Uracil+ DNA Polymerase (Kapa biosystems, Q5U), KOD Multi & Epi DNA Polymerase, FastStart Taq (Roche), Taq2000 (Agilent Technologies), FailSafe Enzyme (Epicentre) or Thermo PhusionU. In some instances, a uracil tolerant polymerase comprises a sequence having at least 70%, 75%, 80%, 85%, 90%, 95%, 97%, or 99% identity with DNA polymerases α and δ from S. cerevisiae, E. coli DNA polymerase III, PolA-type polymerases such as Taq, KAPA HiFi Uracil+ DNA Polymerase (Kapa biosystems, Q5U), KOD Multi & Epi DNA Polymerase, FastStart Taq (Roche), Taq2000 (Agilent Technologies), FailSafe Enzyme (Epicentre) or Thermo PhusionU. In some instances a uracil tolerant polymerase comprises a modification to one or more amino acid residues in the dUTP binding pocket.
Comparative genomic performance of Multiomics Workflow
[0122] As default practice, prior to passing single cell samples to deep sequencing for SNV analysis, low-pass QC sequencing was performed and, as part of the analysis pipeline, an estimation of library complexity was determined with the PreSeq count algorithm. The QC standard set for genomic DNA only (product solution for PTA) is a PreSeq count value of >3.0E9 upon low-pass sequencing, an empirically defined proxy for genomic coverage and uniformity that predicts high-depth sequencing will yield strong allelic balance and high sensitivity and precision of single nucleotide variant calling. The average PreSeq count of single cells was 3.76E9 with a standard deviation of +/- 2.27E8. The overall robust performance of single cells and genomic DNA controls warranted subsequent deep sequencing for metric comparison of classical PTA to PTA from the multiomic workflow.
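The low-pass QC gate described above can be sketched in a few lines of Python. This is a minimal illustration only, assuming the >3.0E9 PreSeq count threshold stated in the text; the function names and the example per-cell counts are hypothetical and are not taken from the disclosure.

```python
from statistics import mean, stdev

# Threshold stated in the text for genomic-DNA-only PTA libraries.
PRESEQ_MIN = 3.0e9


def passes_preseq_qc(preseq_count, threshold=PRESEQ_MIN):
    """Return True when a low-pass library exceeds the PreSeq count threshold."""
    return preseq_count > threshold


def summarize(counts):
    """Mean, sample standard deviation, and pass fraction for a batch of cells."""
    passed = [c for c in counts if passes_preseq_qc(c)]
    return {
        "mean": mean(counts),
        "sd": stdev(counts) if len(counts) > 1 else 0.0,
        "pass_fraction": len(passed) / len(counts),
    }


# Hypothetical per-cell PreSeq counts, for illustration only.
example_counts = [3.9e9, 3.6e9, 2.8e9, 3.8e9]
summary = summarize(example_counts)
```

Cells that fail the gate would be excluded before high-depth sequencing; cells that pass proceed to SNV analysis.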
[0123] Upon high-depth sequencing (2×150 bp, down-sampling to 4.5E8 total reads, ~20x genome depth) and processing through our pipeline, allelic balance (the ability to represent both alleles through enrichment, and a strength of genomic PTA methodology) was reviewed. The inverse of allelic drop out (ADO) is allelic balance, which is the proportion of known heterozygous loci that are called heterozygous following sequencing. Variants within these loci have allele frequencies between 10% and 90% at each locus. A review of allelic balance of the multiomics workflow showed 85.5% (+/- 3.4%), which is closely comparable to the 88.2% (+/- 4%) for the genomic DNA only workflow, across 10 replicates each. Genomic coverage at a range of depths did not significantly differ between the workflows. Lastly, it was critical to demonstrate that the allelic balance and coverage obtained from the multiomics workflow culminated in the ability to call SNVs with confidence. This example highlights individual multiomics NA12878 cells with an SNV calling sensitivity range of 0.90-0.95 and with precision >0.99, akin to genomic DNA-only data. Collectively, these data suggest that, despite the upstream reverse transcription chemistry modifications to generate transcriptome data, the amplification performance of single-cell genomes by PTA is preserved.
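The allelic balance, sensitivity, and precision metrics discussed above can be illustrated with a short Python sketch. It assumes one reading of the text, namely that a known heterozygous locus counts as called heterozygous when its variant allele frequency falls between 10% and 90%; the function names and the toy inputs are hypothetical.

```python
def allelic_balance(vaf_at_known_het_loci, low=0.10, high=0.90):
    """Fraction of known heterozygous loci whose allele frequency is in [low, high]."""
    if not vaf_at_known_het_loci:
        raise ValueError("no known heterozygous loci supplied")
    het_called = sum(1 for vaf in vaf_at_known_het_loci if low <= vaf <= high)
    return het_called / len(vaf_at_known_het_loci)


def snv_metrics(called, truth):
    """Sensitivity and precision of a set of SNV calls against a truth set."""
    true_pos = len(called & truth)
    false_pos = len(called - truth)
    false_neg = len(truth - called)
    sensitivity = true_pos / (true_pos + false_neg) if truth else 0.0
    precision = true_pos / (true_pos + false_pos) if called else 0.0
    return sensitivity, precision


# Toy example: two of four known heterozygous loci retain both alleles.
balance = allelic_balance([0.50, 0.45, 0.0, 1.0])

# Toy example: SNVs keyed by (chromosome, position, alternate allele).
truth_set = {("chr1", 100, "A"), ("chr1", 200, "T"), ("chr2", 50, "G")}
call_set = {("chr1", 100, "A"), ("chr1", 200, "T"), ("chr3", 10, "C")}
sens, prec = snv_metrics(call_set, truth_set)
```

Under this definition, allelic drop out is simply one minus the returned allelic balance.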
Comparative transcriptomic performance of multiomics workflow
[0124] In choosing a transcriptomic scheme to unite with PTA, one goal was to be as comprehensive as possible in capturing the diversity of RNA-based modes of oncogenic and drug resistance mechanisms and, equally importantly, to enable the ascertainment of genomic lesions manifesting at the RNA level. A template-switching reverse transcription scheme was designed for the multiomics workflow that captured full-transcript information, as opposed to either 5’ or 3’ end counting, to enhance the ability to detect isoforms and identify fusions. This chemistry enables even coverage across transcripts, as shown by increased coverage of the 5’ region (top), which is typically affected by degradation (or reverse transcriptase performance) in proportion to the distance from the 3’-poly A. This confirms the behavior of the template-switching chemistry in the RNA arm of the workflow. Read depth was distributed across the gene bodies of a set of housekeeping genes, with all exons equally represented. Feature quantification of the defined transcriptome was obtained, highlighting the ability to identify a variety of transcript bodies. Progression of the performance is shown in this figure from what is observed in a bulk dataset (bar 1, aggregated datasets) vs. features such as bulk isolation (bars 2 and 4) against library prep methods: standalone mRNA-stranded (bars 2 and 3) and multiomics combined library prep (bars 4 and 5). Most notably, increased coverage of 5’ coding and intronic regions was observed overall in the multiomics chemistry, with intergenic background routinely below 5% of aligned reads, providing a broader space for isoform detection.
[0125] As further performance benchmarking of cell quality after mapping to the reference transcriptome, performance patterns of common metrics were established with well-characterized Human Brain Reference RNA (HBRR) and Universal Human Reference RNA (UHRR) as additions to the NA12878 cell line, and composite features were displayed. Read and genomic feature mapping percentages were identified, as well as total genes discovered, as criteria for evaluating sequencing quality. The dynamic range of expression and expression patterns in well-known housekeeping genes were also examined, and various markers of DNA contamination, sample degradation, and/or bias, expressed as percentages of exonic mapping (more than 55%) and intergenic mapping (less than 5%), were computed as characteristics of the multiomics RNA fraction. Another important metric for measuring the quality of single cell experiments was the number of genes found (>0 counts) per cell. For NA12878 cells there was an average of approximately 2500, whereas the average number of HBRR and UHRR genes discovered was around 6 and 7 thousand, respectively. Lastly, median absolute deviation (MAD) and percent coefficient of variation (CV) scores were calculated on normalized CPM values for general use housekeeping genes for cross-tissue studies. These metrics measure reproducibility and are robust approaches to measuring sample variability. Overall, comparably uniform expression across the housekeeping genes examined, as well as MAD values ranging from 0.25 to 1 for our HBRR and UHRR benchmarks, were observed, suggesting these genes exhibit little variability in expression across cells. NA12878 demonstrated slightly more irregularity, which without being bound by theory may imply higher variability or unsuitable housekeeping genes. Correspondingly, CV rates varied from 14 to 30 percent, despite NA12878 exhibiting more
variation. For each cell, the dynamic range of expressed genes was around 1300 (HBRR), 1400 (UHRR), and 1900 (NA12878) CPM.
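The reproducibility metrics named above (CPM normalization, median absolute deviation, and percent coefficient of variation) can be sketched in Python as follows. This is an illustrative sketch only: the text does not specify the exact normalization or any log transform, so plain counts per million is an assumption, and the function names and inputs are hypothetical.

```python
from statistics import mean, median, stdev


def cpm(counts):
    """Convert raw per-gene counts for one cell to counts per million."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("cell has no counts")
    return {gene: count * 1e6 / total for gene, count in counts.items()}


def mad(values):
    """Median absolute deviation of a list of expression values."""
    center = median(values)
    return median([abs(v - center) for v in values])


def percent_cv(values):
    """Percent coefficient of variation (sample SD divided by mean, times 100)."""
    return 100.0 * stdev(values) / mean(values)


# Hypothetical expression values of one housekeeping gene across five cells.
expression = [1.0, 2.0, 3.0, 4.0, 100.0]
example_mad = mad(expression)
example_cpm = cpm({"GENE_A": 1, "GENE_B": 3})
```

MAD is insensitive to the single outlying cell in the example, which is why it is often reported alongside CV for single-cell data.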
[0126] Multiomics full-transcript performance vs. an amalgam of publicly available bulk RNA-Seq and 3’ end-counting datasets (See Methods) was evaluated, highlighting the increased 5’ UTR and gene body coverage that occurs by definition relative to 3’ end-counting. The relative types of other RNA species detected with the multiomics chemistry, including lncRNAs, snRNAs, and pseudogenes, are shown. Relative proportions of features were concordant between the template-switching RT chemistry in isolation vs. in the combined RNA/DNA workflow in multiomics, and overall concordance was observed between purified RNA input template vs. single cells, with the exception that single cells revealed more intronic reads of protein coding genes than did the purified RNA input. In all single cells analyzed, mitochondrial read percentage was <10%, with most cells averaging less than 5%, indicating that single-cell lysis was optimal for capturing mRNA and other polyadenylated transcripts and that the amplified cells were healthy.
EXAMPLE 2: Use of uracil tolerant polymerase for improved multiomics
[0127] Following the general methods of Example 1, cDNA was generated from single cell RNA using reverse transcription. cDNA amplicons were generated using biotinylated poly dT primers. Next, the PTA method was used to amplify genomic DNA from the cell, wherein the mixture of dNTPs comprised uracil. cDNA was then purified from the mixture using streptavidin, and further treated with uracil DNA glycosylase (UDG) and DNA glycosylase-lyase Endonuclease VIII to remove any residual genomic amplicons from the cDNA. The genomic fragments generated from PTA were then purified, and both cDNA and genomic DNA fractions were converted into sequencing-ready libraries using adapter ligation. A uracil-tolerant polymerase was used to amplify the PTA-generated genomic fragments.
EXAMPLE 3: Storage and Neutralization Buffer
[0128] The effect of the concentration of the ionic surfactant (in this example, SDS) contained in the storage buffer on the PTA reaction (following the general methods of Examples 1 and 2) and on the results obtained from the sample after storage was evaluated. In this example, cellular samples comprising one or more cells were sorted into containers containing storage buffer. The storage buffer used for each sample comprised 50 mM TrisHCl at pH 8.5, 1 mM EDTA, 200 µg/ml Proteinase K, 500 µM CaCl2, 1 mM DTT, and varying concentrations of SDS (ionic surfactant). The results are presented in FIG. 1.
[0129] FIG. 1 presents amplification multicomponent plots of the samples. The sample of plot 101 comprised 0% SDS in its storage buffer. The sample of plot 102 comprised 0.005% SDS in its storage buffer. The sample of plot 103 comprised 0.01% SDS in its storage buffer. The
sample of plot 104 comprised 0.05% SDS in its storage buffer. The sample of plot 105 comprised 0.1% SDS in its storage buffer. The sample of plot 106 comprised 0.5% SDS in its storage buffer. The best results were observed in plots 103 and 104, indicating that an SDS concentration of from about 0.01% to about 0.05% by volume in the storage buffer gave the best amplification results for samples stored in that buffer.
[0130] The effect of the second composition (neutralizing buffer) on amplification results was tested, and the results are shown in FIG. 2. Amplification multicomponent plots for samples stored in buffer compositions of the present disclosure with varying concentrations of ionic surfactant and different compositions of neutralizing buffer were evaluated. The concentration of Triton-X in the neutralizing buffer (the second composition) was varied among plots 201 to 204. The samples in plots 201 to 204 were neutralized with second compositions/neutralizing buffers comprising 0.2%, 0.5%, 0.75%, and 1% Triton-X respectively. The best results were obtained in plot 202 (Triton-X concentration of about 0.5% in the neutralizing buffer).
[0131] With continued reference to FIG. 2, the samples presented in plots 205 to 208 were neutralized with neutralizing buffers comprising varying concentrations of Tween (e.g., Tween-20). The concentrations of Tween-20 in the neutralizing buffers of plots 205 to 208 were 0.5%, 1%, 1.5%, and 2% respectively. The best results were observed in plot 206 (Tween-20 concentration of about 1% in the neutralizing buffer).
[0132] With continued reference to FIG. 2, plots 209 to 212 present amplification results of samples stored in storage buffers with varying concentrations of ammonium sulfate. The concentrations of ammonium sulfate in the storage buffers of plots 209 to 212 were 5 mM, 10 mM, 15 mM, and 20 mM respectively. The best results were observed in plot 211 (ammonium sulfate concentration of 15 mM).
[0133] FIG. 3 presents broad first pass experiment preamplification quality control data for samples stored in different buffer compositions. “SB4” represents a storage buffer comprising 50 mM TrisHCl at pH 8.5, 1 mM EDTA, 200 µg/ml Proteinase K, 500 µM CaCl2, 1 mM DTT, and 0.05% SDS (ionic surfactant). The yield for the SB4 sample condition was found to be 10.736 nanograms (ng). “SB4Tween1” represents a condition in which the storage buffer is SB4 and the neutralizing buffer comprises a Tween concentration of 0.5%. The yield for the “SB4Tween1” condition was found to be 242 ng. “SB4Tween2” represents a condition in which the storage buffer is SB4 and the neutralizing buffer comprises a Tween concentration of 1%. The yield for the “SB4Tween2” condition was found to be 277.2 ng. “SB4Tween3” represents a condition in which the storage buffer is SB4 and the neutralizing buffer comprises a Tween concentration of 1.5%. The yield for the “SB4Tween3” condition was found to be 129.58 ng. “SB4Tween4” represents a condition in which the storage buffer is SB4 and the neutralizing buffer comprises a Tween concentration of 2%. The yield for the “SB4Tween4” condition was found to be 11.066 ng.
[0134] “SB4Triton1” represents a condition in which the storage buffer is SB4 and the neutralizing buffer comprises a Triton-X concentration of 0.2%. The yield for the “SB4Triton1” condition was found to be 9.636 ng. “SB4Triton2” represents a condition in which the storage buffer is SB4 and the neutralizing buffer comprises a Triton-X concentration of 0.5%. The yield for the SB4Triton2 condition was found to be 177.98 ng. “SB4Triton3” represents a condition in which the storage buffer is SB4 and the neutralizing buffer comprises a Triton-X concentration of 0.75%. The yield for the SB4Triton3 condition was found to be 77 ng. “SB4Triton4” represents a condition in which the storage buffer is SB4 and the neutralizing buffer comprises a Triton-X concentration of 1%. The yield for the SB4Triton4 condition was found to be 83.6 ng.
[0135] “SB4AS1” represents a condition in which the storage buffer is SB4 and also comprises an Ammonium Sulfate concentration of 5 mM. The yield for the SB4AS1 condition was found to be 9.658 ng. “SB4AS2” represents a condition in which the storage buffer is SB4 and also comprises an Ammonium Sulfate concentration of 10 mM. The yield for the SB4AS2 condition was found to be 6.578 ng. “SB4AS3” represents a condition in which the storage buffer is SB4 and also comprises an Ammonium Sulfate concentration of 15 mM. The yield for the SB4AS3 condition was found to be 3.63 ng. “SB4AS4” represents a condition in which the storage buffer is SB4 and also comprises an Ammonium Sulfate concentration of 20 mM. The yield for the SB4AS4 condition was found to be 8.074 ng.
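For convenience, the FIG. 3 yields reported above can be tabulated and compared programmatically. The Python sketch below is illustrative only: it assigns each reported yield to the condition defined immediately before it in the text, and the data structure and function names are hypothetical.

```python
# (condition label, additive, concentration, unit, yield in ng), as reported for FIG. 3.
YIELDS = [
    ("SB4", None, None, None, 10.736),
    ("SB4Tween1", "Tween", 0.5, "%", 242.0),
    ("SB4Tween2", "Tween", 1.0, "%", 277.2),
    ("SB4Tween3", "Tween", 1.5, "%", 129.58),
    ("SB4Tween4", "Tween", 2.0, "%", 11.066),
    ("SB4Triton1", "Triton-X", 0.2, "%", 9.636),
    ("SB4Triton2", "Triton-X", 0.5, "%", 177.98),
    ("SB4Triton3", "Triton-X", 0.75, "%", 77.0),
    ("SB4Triton4", "Triton-X", 1.0, "%", 83.6),
    ("SB4AS1", "ammonium sulfate", 5.0, "mM", 9.658),
    ("SB4AS2", "ammonium sulfate", 10.0, "mM", 6.578),
    ("SB4AS3", "ammonium sulfate", 15.0, "mM", 3.63),
    ("SB4AS4", "ammonium sulfate", 20.0, "mM", 8.074),
]


def best_per_additive(rows):
    """Return {additive: (label, yield_ng)} for the highest-yield condition."""
    best = {}
    for label, additive, _conc, _unit, yield_ng in rows:
        if additive is None:
            continue  # skip the SB4-only baseline
        if additive not in best or yield_ng > best[additive][1]:
            best[additive] = (label, yield_ng)
    return best


def fold_over_baseline(rows, label, baseline="SB4"):
    """Yield of a condition divided by the yield of the baseline condition."""
    by_label = {row[0]: row[4] for row in rows}
    return by_label[label] / by_label[baseline]


best = best_per_additive(YIELDS)
tween2_fold = fold_over_baseline(YIELDS, "SB4Tween2")
```

By yield alone, the 1% Tween and 0.5% Triton-X conditions rank highest within their series, in line with the plot-based observations for FIG. 2.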
[0136] FIG. 4A presents real-time amplification multicomponent plots for samples stored in buffer compositions of the present disclosure with varying concentrations of ionic surfactant and different compositions of neutralizing buffer on which reverse transcription or pre-amplification has been performed. The samples presented in plots 401 and 402 were stored in storage buffer SB4. Consistent with the preceding figures, “SB4” represents a storage buffer comprising 50 mM TrisHCl at pH 8.5, 1 mM EDTA, 200 µg/ml Proteinase K, 500 µM CaCl2, 1 mM DTT, and 0.05% SDS (ionic surfactant). The samples presented in plots 403 and 404 were stored in storage buffer SB5. “SB5” represents a storage buffer comprising 50 mM TrisHCl at pH 8.5, 1 mM EDTA, 200 µg/ml Proteinase K, 500 µM CaCl2, 1 mM DTT, and 0.1% SDS. The samples of plots 401 and 403 were then neutralized with a neutralizing buffer comprising Tween-20. The samples of plots 402 and 404 were neutralized with a neutralizing buffer comprising Triton-X. The most preferable results were observed in plot 401 (SB4 storage buffer (the composition) neutralized with a neutralizing buffer (the second composition) comprising Tween-20).
[0137] FIG. 4B presents real-time amplification multicomponent plots for samples stored in buffer compositions of the present disclosure with varying concentrations of ionic surfactant and
different compositions of neutralizing buffer which have been processed using V2DNA™ Amplification Kit from ILLUMINA™. The samples presented in plots 405 and 406 were stored in storage buffer SB4. Consistent with the preceding figures, “SB4” represents a storage buffer comprising 50 mM TrisHCl at pH 8.5, 1 mM EDTA, 200 µg/ml Proteinase K, 500 µM CaCl2, 1 mM DTT, and 0.05% SDS (ionic surfactant). The samples of plots 407 and 408 were stored in storage buffer SB5. SB5 represents a storage buffer comprising 50 mM TrisHCl at pH 8.5, 1 mM EDTA, 200 µg/ml Proteinase K, 500 µM CaCl2, 1 mM DTT, and 0.1% SDS. The samples of plots 405 and 407 were then neutralized with a neutralizing buffer comprising Tween-20. The samples of plots 406 and 408 were neutralized with a neutralizing buffer comprising Triton-X. The most preferable results were observed in plot 405 (SB4 storage buffer (the composition) neutralized with a neutralizing buffer (the second composition) comprising Tween-20).
[0138] FIG. 5A presents reverse transcription pre-amplification of RNA in samples stored in storage buffer compositions SB4 and SB5. The recipes of the storage buffers SB4 and SB5 are consistent with the rest of the figures and are also provided in this figure. Reverse transcription was performed using reverse transcription (RT) buffers “5xRwa” and “5xRxa”. FIG. 5B presents primary template amplification of DNA in samples stored in storage buffer compositions SB4 and SB5. Reverse transcription was performed using reverse transcription (RT) buffers 5xRwa and 5xRxa. Buffer recipes are provided in FIG. 5A.
[0139] FIG. 6 presents genomics metrics quantified for samples stored in different buffer compositions for varying durations of storage time. Buffer recipes for storage buffers SB4 and SB5 and reverse transcription buffers 5xRwa and 5xRxa are consistent with the rest of the figures and provided in FIG. 5A.
[0140] Unless defined otherwise, the definitions of the analytical metrics are consistent with those generally known in the art. “PropExonic” is the relative ratio or proportion of reads that map to exons, relative to introns or intergenic regions. “Protein Coding Transcripts” is the number of transcripts in the subsampled reads that map to protein coding regions. “PreSeq Counts” is the expected number of bases covered greater than 1 for the theoretical larger experiment. “PreSeq Counts” may be used as a measure of unique reads that map to regions of the genome, estimating the total and uniform coverage of a single genome. “Gini Coefficient” ranges from 0 to 1, where 0 is perfect uniformity and 1 is perfect non-uniformity in genomic coverage. “% Chimera” is the percentage of reads with two ends mapping to different chromosomes or with too long an insert size. “Ratio Transcript Body” is a measure of balanced representation from the 3’ to the 5’ end of an expected mRNA transcript. Values of 0.8 to 1.2 are considered ideal, with 1 being the most balanced ratio.
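As an illustration of the “Gini Coefficient” metric defined above, the following Python sketch computes a Gini coefficient over per-bin coverage values using the standard sorted-rank formula. How the genome is binned and which tool computed the metric are not stated in the text, so the per-bin input and the function name are assumptions.

```python
def gini(coverage):
    """Gini coefficient of non-negative coverage values (0 = perfectly uniform)."""
    values = sorted(coverage)
    n = len(values)
    total = sum(values)
    if n == 0 or total == 0:
        raise ValueError("coverage must be non-empty with a positive sum")
    if values[0] < 0:
        raise ValueError("coverage values must be non-negative")
    # Standard formula on ascending values x_1..x_n:
    # G = 2 * sum(i * x_i) / (n * sum(x)) - (n + 1) / n
    weighted = sum(rank * value for rank, value in enumerate(values, start=1))
    return 2 * weighted / (n * total) - (n + 1) / n


# Hypothetical per-bin read depths, for illustration only.
uniform_bins = [5, 5, 5, 5]
skewed_bins = [0, 0, 0, 10]
```

With a finite number of bins the maximum attainable value is (n - 1)/n, which approaches 1 as the number of bins grows.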
[0141] FIG. 7 presents RNA sequencing results for samples stored in different buffer conditions for varying durations of time. Buffer recipes for storage buffers SB4 and SB5 and reverse transcription buffers 5xRwa and 5xRxa are consistent with the previous figures and also provided in FIG. 5A. These results indicated that the storage buffer SB4 resulted in more consistent, less intronic results.
[0142] FIG. 8 presents pre-amplification yields obtained from samples stored under different conditions using the methods and compositions of the present disclosure. Buffer recipes for storage buffers SB4 and SB5 and reverse transcription buffers 5xRwa and 5xRxa are consistent with the rest of the figures and also provided in FIG. 5A. The results indicated that, when SB4 was used as the storage buffer, Tween-20 was needed in the neutralizing buffer before pre-amplification in order to obtain accurate and uncompromised amplification and sequencing results.
[0143] FIG. 9 presents genomics metrics quantified for samples stored in different buffer compositions and reverse transcription buffers. Buffer recipes for storage buffers SB4 and SB5 and reverse transcription buffers 5xRwa and 5xRxa are consistent with the rest of the figures and also provided in FIG. 5A. “CB” represents conventional cell buffer not containing ionic surfactant and protease. “FT” represents a sample that has experienced at least one round of freeze and thaw. The results indicated that the SB4 storage buffer can yield better recovery of transcripts compared to other storage buffers/conditions.
[0144] FIG. 10 presents reverse transcription pre-amplification sequencing data performed on samples stored in different buffer conditions for varying durations of time. “SB4” represents a storage buffer with the recipe disclosed in FIG. 5A. “CB” represents conventional “cell buffer”. Cell buffer does not contain ionic surfactant and protease. “FT” indicates the sample has experienced freeze and thaw. 5xRTw and 5xRTa represent reverse transcription buffers with recipes disclosed in FIG. 5A. The results indicated that, compared to conventional cell buffer (CB), samples stored in SB4, with or without experiencing freeze-thaw (FT), resulted in better recovery of transcripts and genes.
[0145] FIG. 11 presents genome-wide amplification results and coverage in different buffer conditions. Buffer recipes are consistent with the rest of the figures and also provided in FIG. 5A. The sample stored in SB4 storage buffer resulted in uniform genome-wide transcript, while samples stored in CB (e.g., conventional cell buffer not containing the ionic surfactant and protease) resulted in biased transcription curves.
[0146] FIG. 12 presents results of expression analysis for different storage conditions. The results indicated that when stored in conventional buffers, the cells were stressed leading to compromised expression data. Cells stored in SB4 were found to be in better condition and
resulted in expression data that were not deteriorated upon storage of the cell samples. For example, storage in SB4 resulted in cells that were not stressed and yielded uncompromised genomic analysis data.
[0147] FIGs. 13A-13F present sequencing metrics quantified for different storage conditions. “CB” represents conventional cell buffer. SB4 represents storage buffer with the recipe disclosed in FIG. 5A. Unless defined otherwise, the definitions of the analytical metrics are consistent with those generally known in the art and with the rest of the figures.
EXAMPLE 4: High-throughput cell analysis
[0148] Following the general procedures of Examples 1-3, hundreds or thousands of cells are harvested (picked, isolated by flow cytometry, etc.) and placed into individual wells with storage buffer SB4. The cells are then stored or shipped for 3-15 days to an analysis facility, in some cases with a cold plate to maintain a lower than ambient temperature. The temperature of the cells may fluctuate during shipping, and may include exposure to temperatures of -20°C to 30°C. The cells are then subjected to a PTA genomic analysis (e.g., ResolveDNA from Bioskryb Genomics), expression/RNA analysis, or a combined DNA/RNA multiomics workflow (e.g., ResolveOME from Bioskryb Genomics) using a neutralization buffer described in Example 1. Use of the storage buffer is expected to result in less stress to cells (fewer expression artifacts from cell processing/storage), higher quality pre-amplification DNA/RNA, and improved or comparable sequencing metrics (PreSeq, protein coding transcripts, proportion exonic sequences, ratio transcript body, sequencing coverage, fold-80 base penalty, dropouts, percent chimera, and Gini index) relative to cell buffer CB1. Moreover, cells treated with storage buffer SB4 and stored are expected to result in similar outcome metrics (pre-amplification quality) and sequencing metrics as cells which were harvested and processed the same day.
EXAMPLE 5: Embryonic cell analysis
[0149] Following the general procedures of Examples 1-4, embryonic cells grown in vitro are suspended in storage buffer SB4, packaged, and shipped to an analysis facility for preimplantation genetic testing (PGT). The cells are analyzed for common fetal genetic abnormalities using the PTA method.
EXAMPLE 6: Comparison of Cell Buffer (CB) and Storage Buffer (SB) using B-cells or clinical samples
[0150] Following the general procedures of Examples 1-4, nucleic acid samples from single cells were amplified using either conventional cell buffer (CB) or storage buffer (SB, “SB5”, modified to contain 0.2% ionic surfactant). A no template control (NTC), DNA, and RNA were also used as controls. HG001 B cells stored at varying temperatures (-20°C, 6-10°C, and room temperature (RT)) and durations (1 versus 7 days at RT) displayed higher and more consistent ResolveOME DNA and cDNA yields in SB than in CB. Improvements from SB were robust and consistent across three different lots (1, 2, and 3). SB also significantly improved the total number of expressed protein coding genes detected relative to CB (FIGS. 14A-14E).
[0151] Stability buffer also provided insights with clinically derived biopsies using housekeeping gene analysis. Housekeeping gene analysis demonstrated that a unique signature for clinical samples could be identified when clinical samples are sorted into stability buffer. Comparing clinical samples in stability buffer to the same samples without stability buffer demonstrated improved recovery of genes. Additionally, clinical samples not in stability buffer generally looked more like cell lines than did clinical samples in stability buffer (FIG. 15).
[0152] Gene signatures were also analyzed against both buffer systems. The enhanced recovery of mRNA from clinical samples stored in stability buffer extended beyond housekeeping genes. The top five (FIG. 16A) and top ten (FIG. 16B) differentially expressed genes can be seen in only a subpopulation of the clinical samples in PBS. When clinical samples are sorted into stability buffer (SB), this signature can be seen for most clinical samples.
EXAMPLE 7: Amplification of cell-free DNA
[0153] Following the general procedures of Example 6, cfDNA was amplified using PTA with or without the stability buffer, and with or without a non-ionic surfactant (NIS). The fragmented/degraded cell-free DNA was amplified in the presence of stability buffer so long as the non-ionic surfactant (NIS) was maintained in the downstream reaction. The stability buffer could be added to the reaction at concentrations 10x those previously demonstrated (FIG. 17).
EXAMPLE 8: Storage buffer compositions
[0154] The general procedures of Example 3 are followed with modification: additional storage buffer formulations shown in Table 1 are evaluated.
[0155] In a further embodiment, storage buffers in Tables 2A-8 are utilized with the general methods of Example 3.
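When screening candidate formulations such as those referenced above, it can be convenient to check each component against a set of concentration windows. The sketch below uses the windows recited in claim 18 of this document purely as an illustration. It is written under stated assumptions: the dictionary keys, the function name `out_of_range`, and the example formulation are hypothetical, the example values are not a recipe from FIG. 5A or any table, and the bounds are treated as hard limits even though the claim language uses "about" for some of them.

```python
# Concentration windows as recited in claim 18 (low, high), inclusive.
# Key names are hypothetical labels chosen for this sketch.
CLAIM_18_RANGES = {
    "salt_mM": (10.0, 100.0),
    "ionic_surfactant_vol_pct": (0.001, 0.1),
    "protease_U_per_mL": (0.01, 5.0),
    "reducing_agent_mM": (0.1, 10.0),
    "chelator_mM": (0.1, 10.0),
}


def out_of_range(formulation, ranges=CLAIM_18_RANGES):
    """Return {component: value} for each component missing or outside its window."""
    problems = {}
    for name, (low, high) in ranges.items():
        value = formulation.get(name)
        if value is None or not (low <= value <= high):
            problems[name] = value
    return problems


# Hypothetical candidate formulation (illustrative numbers only).
candidate = {
    "salt_mM": 50.0,
    "ionic_surfactant_vol_pct": 0.2,  # above the 0.1 upper bound used here
    "protease_U_per_mL": 1.0,
    "reducing_agent_mM": 1.0,
    "chelator_mM": 1.0,
}
print(out_of_range(candidate))
```

Returning the offending components rather than a single pass/fail flag makes it easier to see which concentration would need to be adjusted.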
[0156] While preferred embodiments of the present invention have been shown and described herein, it will be obvious to those skilled in the art that such embodiments are provided by way of example only. Numerous variations, changes, and substitutions will now occur to those skilled in the art without departing from the invention. It should be understood that various alternatives to the embodiments of the invention described herein may be employed in practicing the invention. It is intended that the following claims define the scope of the invention and that methods and structures within the scope of these claims and their equivalents be covered thereby.
Claims
1. A composition comprising:
(a) a salt;
(b) an ionic surfactant at a concentration from about 0.00001 to about 1 volume percent;
(c) a protease at a concentration from about 0.01 units to about 5 units per milliliter;
(d) a reducing agent; and
(e) a chelator.
2. The composition of claim 1, wherein the ionic surfactant comprises sodium dodecyl sulfate (SDS), sodium lauryl sulfate (SLS), sodium laureth sulfate (SLES), ammonium lauryl sulfate (ALS), ammonium laureth sulfate (ALES), sodium stearate, potassium cocoate, or any combination thereof.
3. The composition of claim 1, wherein the protease comprises Proteinase K, Proteocut K, or Nagarse, optionally wherein the protease is thermolabile.
4. The composition of claim 1, wherein the salt comprises Tris-HCl, HEPES, or TES.
5. The composition of claim 1, wherein the composition further comprises a salt selected from CaCl2, MgCl2, KCl, MgSO4, K2SO4, or NaHCO3.
6. The composition of claim 1, wherein the reducing agent comprises dithiothreitol (DTT), tris(2-carboxyethyl)phosphine (TCEP), or β-mercaptoethanol.
7. The composition of claim 1, wherein the chelator comprises ethylenediaminetetraacetic acid (EDTA) or ethylene glycol tetraacetic acid (EGTA).
8. The composition of any one of the claims 1-7 for performing at least one of: lysing a cell, extracting the genomic materials of the cell, storing the cell or genomic materials of the cell, and preparing the genomic materials of the cell for amplification.
9. The composition of claim 8 for performing lysing a cell, extracting the genomic materials of the cell, storing the cell or genomic materials of the cell, and preparing the genomic materials of the cell for amplification.
10. The composition of claim 8 or 9, wherein storing comprises storing for at least about 1 hour, at least about 12 hours, at least about 24 hours, at least about 48 hours, at least about 72 hours or longer.
11. The composition of any one of claims 1-10, further comprising one or more cells stored therein.
12. The composition of claim 11, wherein the one or more cells comprise at most about 20, at most about 15, at most about 10 cells, or a single cell.
13. The composition of claim 11, wherein the one or more cells comprises a human embryonic cell.
14. The composition of claim 11, wherein storage temperature is from about -20 to about 30 degrees Celsius (°C), optionally with a variation of from about 0 to about 20%.
15. The composition of claim 14, wherein the storage temperature changes or fluctuates.
16. The composition of any one of preceding claims, wherein the sample remains substantially stable during storage despite changes to storage conditions such as temperature, humidity, and/or pressure.
17. The composition of any one of claims 1-16, wherein the one or more cells comprise live cells or fixed cells.
18. The composition of any one of claims 1-17, wherein one or more of:
(a) the salt has a concentration of 10-100 mM;
(b) the ionic surfactant has a concentration volume percent of 0.001 to 0.1;
(c) the protease has a concentration of about 0.01 units to about 5 units per milliliter;
(d) the reducing agent has a concentration of 0.1-10 mM; and
(e) the chelator has a concentration of 0.1 to 10 mM.
19. The composition of claim 18, wherein the ionic surfactant has a concentration volume percent of 0.01 to 0.5.
20. The composition of any one of claims 1-19, further comprising a second salt different from the salt, wherein the second salt stimulates the activity of the protease.
21. A kit comprising the composition of any one of the claims 1-19.
22. The kit of claim 21, further comprising a second composition different from the composition.
23. The kit of claim 22, wherein the second composition comprises a neutralizing buffer comprising a component that substantially neutralizes the ionic surfactant.
24. The kit of any one of claims 21-23, further comprising instructions for lysing a cell, storing a cell, and amplifying the genomic contents of the cell using the composition and the second composition, wherein the composition is used to store the cell and the second composition is used to neutralize the composition, wherein neutralizing the composition improves the results of a downstream amplification performed on the cell.
25. The kit of any one of claims 21-24, wherein the second composition comprises a non-ionic surfactant.
26. The kit of claim 25, wherein the non-ionic surfactant comprises Tween or Triton.
27. A method of cell analysis, comprising:
(a) providing or obtaining a sample comprising one or more cells, wherein the sample is stored in a composition comprising:
(i) a salt;
(ii) an ionic surfactant;
(iii) a protease;
(iv) a reducing agent; and
(v) a chelator;
(b) amplifying genomic materials of the one or more cells; and
(c) performing genomic analysis on the genomic materials of the one or more cells.
28. The method of claim 27, wherein the one or more cells comprises from 1 to about 10 cells.
29. The method of claim 28, wherein the method does not require column filtration and is amenable to retrieval of the genomic materials of the one or more cells.
30. The method of any one of claims 27-29, wherein the one or more cells is a single cell, and genomic analysis is single cell genomics or multiomics, and wherein the composition is configured for storing the single cell and preparing the sample for amplifying the genomic contents of the single cell.
31. The method of any one of claims 27-30, wherein the method further comprises preparing the sample for genomic analysis before (b).
32. The method of any one of claims 27-31, further comprising neutralizing the composition with a second composition.
33. The method of claim 32, wherein the second composition comprises a neutralizing buffer capable of neutralizing the ionic surfactant.
34. The method of claim 33, wherein the neutralizing buffer comprises a non-ionic surfactant, optionally wherein the non-ionic surfactant is Tween or Triton.
35. The method of any one of claims 27-34, wherein the ionic surfactant comprises from about 0.00001 to about 1 volume percent of the composition.
36. The method of claim 35, wherein the ionic surfactant comprises from about 0.03 to about 0.09 volume percent of the composition.
37. The method of any one of claims 27-36, wherein the protease comprises from about 0.01 units to about 5 units per milliliter of the composition.
38. The method of any one of claims 27-37, wherein the ionic surfactant comprises sodium dodecyl sulfate (SDS), sodium lauryl sulfate (SLS), sodium laureth sulfate (SLES), ammonium lauryl sulfate (ALS), ammonium laureth sulfate (ALES), sodium stearate, potassium cocoate, or any combination thereof.
39. The method of any one of claims 27-38, wherein the protease comprises Proteinase K, Proteocut K, or Nagarse, optionally wherein the protease is thermolabile.
40. The method of any one of claims 27-39, wherein the composition further comprises a salt selected from CaCl2, MgCl2, KCl, MgSO4, K2SO4, or NaHCO3.
41. The method of any one of claims 27-40, wherein the salt comprises Tris-HCl, HEPES, or TES.
42. The method of any one of claims 27-41, wherein the chelator comprises Ethylenediaminetetraacetic acid (EDTA) or ethylene glycol tetraacetic acid (EGTA).
43. The method of any one of claims 27-42, wherein the composition further comprises a second salt different from the salt, wherein the second salt is capable of stimulating the activity of the protease.
44. The method of claim 43, wherein the second salt comprises a divalent cation.
45. The method of any one of claims 27-44, further comprising lysing the one or more cells and extracting the genomic materials thereof.
46. The composition of any one of claims 1-45, wherein one or more of:
(a) the salt has a concentration of 10-100 mM;
(b) the ionic surfactant has a concentration volume percent of 0.001 to 0.1;
(c) the protease has a concentration of about 0.01 units to about 5 units per milliliter;
(d) the reducing agent has a concentration of 0.1-10 mM; and
(e) the chelator has a concentration of 0.1 to 10 mM.
47. The composition of claim 46, wherein the ionic surfactant has a concentration volume percent of 0.01 to 0.5.
48. The method of any one of claims 27-47, wherein genomic analysis comprises primary template amplification (PTA).
49. The method of any one of claims 27-48, wherein genomic analysis comprises single cell genomics or single cell multiomics.
50. The method of any one of claims 27-49, wherein genomic materials comprise nucleic acid molecules.
51. The method of any one of claims 27-50, wherein nucleic acid molecules comprise ribonucleic acid (RNA), deoxyribonucleic acid (DNA), or both.
52. The method of any one of claims 27-51, further comprising fixating the one or more cells on a substrate prior to storage.
53. A method of cell analysis, comprising:
(a) providing a single cell stored in a buffer for a time period of at least 1 day;
(b) lysing the single cell;
(c) amplifying mRNA transcripts and genomic DNA from the cell to generate cDNA and genomic DNA libraries, respectively; and
(d) sequencing mRNA transcripts and genomic DNA from the cell to obtain one or more sequencing metrics, wherein one or more of:
(i) the yield of pre-amplification of cDNA or genomic DNA comprise values within 10x of values obtained when compared to storage conditions having a time period of less than 1 day;
(ii) the average fragment size of pre-amplification of cDNA or genomic DNA comprise values within 10x of values obtained when compared to storage conditions having a time period of less than 1 day; and
(iii) the sequencing metrics comprise values within 10x of values obtained when compared to storage conditions having a time period of less than 1 day.
54. The method of claim 53, wherein sequencing metrics comprises one or more Picard Metrics.
55. The method of claim 53, wherein sequencing metrics comprises one or more of PreSeq, protein coding transcripts, proportion exonic sequences, ratio transcript body, sequencing coverage, fold-80 base penalty, dropouts, percent chimera, and Gini index.
56. The method of any one of claims 53-55, wherein the single cell is an embryonic cell.
57. The method of claim 50, wherein the single cell is a human embryonic cell.
58. The method of any one of claims 53-57, wherein providing comprises shipping.
59. The method of any one of claims 53-58, wherein the time period is at least 5 days.
60. The method of claim 53, wherein the time period is at least 10 days.
61. The method of any one of claims 53-58, wherein the time period is 3-15 days.
62. The method of any one of claims 53-61, wherein the single cell was stored at a storage temperature.
63. The method of claim 62, wherein the storage temperature is from about -20 to about 30 degrees Celsius (°C), optionally with a variation of from about 0 to about 20%.
64. The method of claim 62 or 63, wherein the storage temperature changes or fluctuates.
65. The method of any one of claims 53-64, wherein
(a) the yield of pre-amplification of cDNA or genomic DNA comprise values within 5x of values obtained when compared to storage conditions having a time period of less than 1 day;
(b) the average fragment size of pre-amplification of cDNA or genomic DNA comprise values within 5x of values obtained when compared to storage conditions having a time period of less than 1 day; and
(c) the sequencing metrics comprise values within 5x of values obtained when compared to storage conditions having a time period of less than 1 day.
66. The method of any one of claims 53-64, wherein any one of:
(a) the yield of pre-amplification of cDNA or genomic DNA comprise values within 2x of values obtained when compared to storage conditions having a time period of less than 1 day;
(b) the average fragment size of pre-amplification of cDNA or genomic DNA comprise values within 2x of values obtained when compared to storage conditions having a time period of less than 1 day; and
(c) the sequencing metrics comprise values within 2x of values obtained when compared to storage conditions having a time period of less than 1 day.
67. The method of any one of claims 53-66, wherein the single cell is suspended in a composition of any one of claims 1-20.
68. A method of amplifying a fragmented or degraded nucleic acid sample comprising:
(a) providing or obtaining a sample comprising fragmented or degraded nucleic acids;
(b) suspending the fragmented or degraded nucleic acids in a composition of any one of claims 1-20;
(c) amplifying genomic materials of the one or more cells, wherein a non-ionic surfactant is maintained during amplification; and
(d) performing genomic analysis on the genomic materials of the one or more cells.
69. The method of claim 68, wherein the nucleic acid sample comprises ctDNA.
70. The method of claim 68, wherein the nucleic acid sample comprises cfDNA.
71. The method of claim 68, wherein the composition is present at 10X concentration.
72. The method of claim 68, wherein a total yield of amplicons is within 25% compared to amplification without the composition.
73. The method of claim 68, wherein the non-ionic surfactant comprises Tween or Triton.
74. A composition as described in Tables 1-8 or FIG. 5A.
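The fold-based comparisons recited in claims 53, 65, and 66 (values within 10x, 5x, or 2x of those obtained with less than 1 day of storage) can be expressed as a simple ratio test. The sketch below is one possible reading, under the assumption that "within Nx" means the stored-to-reference ratio lies between 1/N and N. That interpretation, the function names `within_fold` and `tightest_fold`, and the example numbers are hypothetical and are not taken from this disclosure.

```python
def within_fold(stored, reference, fold):
    """True if stored/reference lies in [1/fold, fold] (assumed reading of 'within Nx')."""
    if stored <= 0 or reference <= 0 or fold < 1:
        raise ValueError("values must be positive and fold must be >= 1")
    ratio = stored / reference
    return 1.0 / fold <= ratio <= fold


def tightest_fold(stored, reference, folds=(2, 5, 10)):
    """Smallest fold threshold the pair satisfies, or None if it satisfies none."""
    for fold in sorted(folds):
        if within_fold(stored, reference, fold):
            return fold
    return None


# Hypothetical pre-amplification yields in arbitrary units:
# a stored sample versus a same-day reference.
print(tightest_fold(30.0, 100.0))
```

The same check applies to any positive-valued metric, such as average fragment size, provided the stored and reference values are measured in the same units.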
Applications Claiming Priority (2)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| US202363591715P | 2023-10-19 | 2023-10-19 | |
| US63/591,715 | 2023-10-19 |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| WO2025085821A1 (en) | 2025-04-24 |
Family
ID=95449289
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| PCT/US2024/052079 Pending WO2025085821A1 (en) | 2023-10-19 | 2024-10-18 | Methods, systems, and compositions for cell storage and analysis |
Country Status (1)
| Country | Link |
|---|---|
| WO (1) | WO2025085821A1 (en) |
Citations (2)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US20130053254A1 (en) * | 2010-02-26 | 2013-02-28 | Qiagen Gmbh | Process for parallel isolation and/or purification of rna and dna |
| US20170327815A1 (en) * | 2016-05-13 | 2017-11-16 | Roche Molecular Systems, Inc. | Protein-based sample collection matrices and devices |
- 2024-10-18 WO PCT/US2024/052079 patent/WO2025085821A1/en active Pending
Non-Patent Citations (2)
| Title |
|---|
| NELSON JAMES STUART, KHIONG CHAN WOON, MING CHOU LOKE, PHANG VIOLET PAN ENG: "Immediate Digestion of Fish Muscle Following Field Collections Yields DNA Suitable for RAPD Fingerprinting", BIOTECHNIQUES, INFORMA HEALTHCARE, US, vol. 23, no. 2, 1 August 1997 (1997-08-01), US , pages 224 - 226, XP093307470, ISSN: 0736-6205, DOI: 10.2144/97232bm09 * |
| SHAFI NUZHAT, RAUF ABDUL, AKHTAR TASLEEM, MINHAS RIAZ AZIZ, BIBI SIDRA, MINHAS AZIZ: "Comparative isolation and amplification of Cytochrome oxidase 1 DNA from Oncorhynchus mykiss (Rainbow Trout) of Azad Jammu & Kashmir", INTERNATIONAL JOURNAL OF FISHERIES AND AQUATIC STUDIES IJFAS, vol. 4, no. 6, 1 January 2016 (2016-01-01), pages 196 - 199, XP093307471, ISSN: 2347-5129 * |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| US11643682B2 (en) | Method for nucleic acid amplification | |
| CN108431233B (en) | Efficient construction of DNA libraries | |
| JP7379418B2 (en) | Deep sequencing profiling of tumors | |
| JP7691975B2 (en) | Single cell analysis | |
| CN110536967A (en) | Reagents and methods for analyzing associated nucleic acids | |
| WO2011049955A1 (en) | Deducing exon connectivity by rna-templated dna ligation/sequencing | |
| JP2013544498A (en) | Direct capture, amplification, and sequencing of target DNA using immobilized primers | |
| EP3775274B1 (en) | Detection method of somatic genetic anomalies, combination of capture probes and kit of detection | |
| US20250230493A1 (en) | Method for combined genome methylation and variation analyses | |
| US20240368695A1 (en) | Embryonic nucleic acid analysis | |
| WO2025085821A1 (en) | Methods, systems, and compositions for cell storage and analysis | |
| WO2025072326A1 (en) | Methods, systems, and compositions for cfdna analysis | |
| WO2023215524A2 (en) | Primary template-directed amplification and methods thereof | |
| EP4655437A2 (en) | Fine needle aspiration methods | |
| EP4514961A1 (en) | Single cell multiomics | |
| HK40042337A (en) | Method for nucleic acid amplification | |
| HK40042337B (en) | Method for nucleic acid amplification |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 24880699; Country of ref document: EP; Kind code of ref document: A1 |