WO2025240918A1 - Systems and methods for codebook generation - Google Patents
Systems and methods for codebook generation
- Publication number
- WO2025240918A1 (PCT/US2025/029852)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- code words
- observed
- codebook
- code word
- images
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B30/00—ICT specially adapted for sequence analysis involving nucleotides or amino acids
-
- G—PHYSICS
- G01—MEASURING; TESTING
- G01N—INVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
- G01N21/00—Investigating or analysing materials by the use of optical means, i.e. using sub-millimetre waves, infrared, visible or ultraviolet light
- G01N21/62—Systems in which the material investigated is excited whereby it emits light or causes a change in wavelength of the incident light
- G01N21/63—Systems in which the material investigated is excited whereby it emits light or causes a change in wavelength of the incident light optically excited
- G01N21/64—Fluorescence; Phosphorescence
- G01N21/645—Specially adapted constructive features of fluorimeters
- G01N21/6456—Spatial resolved fluorescence measurements; Imaging
- G01N21/6458—Fluorescence microscopy
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B40/00—ICT specially adapted for biostatistics; ICT specially adapted for bioinformatics-related machine learning or data mining, e.g. knowledge discovery or pattern finding
- G16B40/10—Signal processing, e.g. from mass spectrometry [MS] or from PCR
-
- G—PHYSICS
- G01—MEASURING; TESTING
- G01N—INVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
- G01N21/00—Investigating or analysing materials by the use of optical means, i.e. using sub-millimetre waves, infrared, visible or ultraviolet light
- G01N21/62—Systems in which the material investigated is excited whereby it emits light or causes a change in wavelength of the incident light
- G01N21/63—Systems in which the material investigated is excited whereby it emits light or causes a change in wavelength of the incident light optically excited
- G01N21/64—Fluorescence; Phosphorescence
- G01N21/6428—Measuring fluorescence of fluorescent products of reactions or of fluorochrome labelled reactive substances, e.g. measuring quenching effects, using measuring "optrodes"
- G01N2021/6439—Measuring fluorescence of fluorescent products of reactions or of fluorochrome labelled reactive substances, e.g. measuring quenching effects, using measuring "optrodes" with indicators, stains, dyes, tags, labels, marks
- G01N2021/6441—Measuring fluorescence of fluorescent products of reactions or of fluorochrome labelled reactive substances, e.g. measuring quenching effects, using measuring "optrodes" with indicators, stains, dyes, tags, labels, marks with two or more labels
Definitions
- the present disclosure generally relates to methods and systems for imaging-based in situ analysis of target analytes in biological samples and, more specifically, to methods for designing codebooks having a set of code words that are assigned to barcoded target analytes in a multiplexed assay.
- the codebook is designed to reduce (e.g., minimize) the impact of spatial crowding on accurate target analyte detection.
- each target molecule to be detected in a multiplexed assay is assigned a unique codeword from a codebook of valid code words.
- Some codebooks e.g., binary codebooks
- nucleic acid probes with target-specific barcodes corresponding to the designed code words are introduced to the tissue specimen, attached to the target molecules within the sample (the target molecules may have a generally stochastic distribution throughout the tissue specimen volume), and then typically amplified to create features (e.g., rolling circle amplification products (RCPs)) comprising multiple copies of the target-specific barcodes assigned to the target molecules.
- RCPs: rolling circle amplification products
- Different target molecules may be close to one another within the three-dimensional volume of the tissue specimen.
- if the distance between two target molecules, or representative features thereof, approaches the "localization precision" (i.e., the accuracy with which the center of each representative feature can be measured in an optical image of the tissue specimen) of the optical imaging technique utilized for detection, then the observed optical signal in the image of that region of tissue will contain optical signals (e.g., "ON" signals in one or more optical detection channels) arising from the representative features of both target molecules, and the estimated center positions of the two representative target molecule features will partially or completely overlap.
- a decoding algorithm used to decode the target-specific barcodes corresponding to code words may not be able to determine which optical signal arose from each target molecule’s representative feature.
- Disclosed herein are methods comprising: receiving a codebook comprising a plurality of valid code words, wherein, for all valid code words in the codebook, a first Hamming distance between a first logical bitwise OR combination of any pair of valid code words and a second logical bitwise OR combination of any other pair of valid code words is greater than or equal to a predetermined number; receiving a plurality of images of a biological sample acquired over a plurality of sequencing or probing cycles, the plurality of images comprising a plurality of observed optical signals; decoding the plurality of observed optical signals in the plurality of images to obtain a plurality of observed code words; and for each observed code word: determining an assignment of the observed code word to one of the plurality of valid code words, or determining that the observed code word is not a valid code word of the plurality of valid code words.
- the method further comprises comparing the at least one intensity value representing an intensity of each observed optical signal to a predetermined intensity threshold to determine a binary value representing the intensity of each observed optical signal.
- each binary value comprises a 1 or a 0, wherein 1 represents an observed optical signal for which intensity is greater than or equal to the predetermined intensity threshold and 0 represents an observed optical signal for which intensity is less than the predetermined intensity threshold.
- decoding the plurality of observed optical signals in the plurality of images comprises obtaining the plurality of observed code words based on a series of binary values determined for each location.
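The thresholding step described above can be sketched as follows. This is a minimal illustration, assuming the intensities for one location have already been extracted into a flat list ordered by cycle and then by channel; the function name `binarize`, the threshold, and the sample values are hypothetical and not taken from the disclosure.

```python
def binarize(intensities, threshold):
    """Map each intensity to 1 if it is >= threshold, otherwise 0."""
    return [1 if value >= threshold else 0 for value in intensities]


# Hypothetical intensities for one location: 2 cycles x 4 channels.
location_intensities = [812.0, 40.5, 12.0, 33.1, 25.0, 18.2, 640.0, 100.0]

# The series of binary values at this location forms the observed code word.
observed_code_word = binarize(location_intensities, threshold=100.0)
print(observed_code_word)  # [1, 0, 0, 0, 0, 0, 1, 1]
```

Note that an intensity exactly equal to the threshold maps to 1, matching the "greater than or equal to" wording.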
- each observed code word of the plurality of code words comprises a plurality of code word segments, and wherein each code word segment comprises a specified string of binary values that corresponds to one of a specified set of observed optical signal states.
- each code word segment comprises a four-bit string of binary values such that: a code word segment of 1 0 0 0 corresponds to a first optical signal state, A, in which an optical signal is detected in a first detection channel of a four-channel optical imaging instrument, and no optical signal is detected in a second, third, or fourth detection channel of the four-channel optical imaging instrument; a code word segment of 0 1 0 0 corresponds to a second optical signal state, B, in which an optical signal is detected in the second detection channel of the four-channel optical imaging instrument, and no optical signal is detected in the first, third, or fourth detection channel of the four-channel optical imaging instrument; a code word segment of 0 0 1 0 corresponds to a third optical signal state, C, in which an optical signal is detected in the third detection
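As a rough sketch of the segment-to-state mapping, the snippet below splits a flat code word into four-bit segments and looks up each one. Only the three states spelled out in the text (A, B, C) are included; the function name `segments_to_states` and the sample code word are hypothetical.

```python
# Only the states stated in the text are listed here; any other segment,
# including ones the passage may define later, is reported as None.
SEGMENT_TO_STATE = {
    (1, 0, 0, 0): "A",  # signal in the first detection channel only
    (0, 1, 0, 0): "B",  # signal in the second detection channel only
    (0, 0, 1, 0): "C",  # signal in the third detection channel only
}


def segments_to_states(code_word, segment_length=4):
    """Split a flat bit list into segments and look up each signal state."""
    if len(code_word) % segment_length != 0:
        raise ValueError("code word length must be a multiple of segment_length")
    states = []
    for start in range(0, len(code_word), segment_length):
        segment = tuple(code_word[start:start + segment_length])
        states.append(SEGMENT_TO_STATE.get(segment))
    return states


print(segments_to_states([1, 0, 0, 0, 0, 0, 1, 0, 0, 1, 1, 0]))
# ['A', 'C', None]
```

The last segment has two ON bits, so it matches no single-channel state and is returned as `None`.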
- determining the assignment of the observed code word to one of the plurality of valid code words comprises determining a plurality of scores based on comparison of the observed code word to all or a portion of the plurality of valid code words. In some embodiments, the method further comprises selecting one of the plurality of valid code words having a highest score to assign as a replacement for the observed code word.
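One way to picture the scoring step is sketched below. The disclosure does not specify the score in this passage, so the sketch assumes a simple score equal to the number of matching bit positions; `match_score`, `assign`, and the `min_score` cutoff are hypothetical names introduced for illustration.

```python
def match_score(observed, valid):
    """Assumed score: number of bit positions where the two words agree."""
    return sum(1 for a, b in zip(observed, valid) if a == b)


def assign(observed, valid_code_words, min_score):
    """Return the highest-scoring valid code word, or None if it scores too low."""
    best = max(valid_code_words, key=lambda word: match_score(observed, word))
    if match_score(observed, best) < min_score:
        return None  # treated as "not a valid code word"
    return best


codebook = [(1, 1, 0, 0, 0, 0), (0, 0, 1, 1, 0, 0), (0, 0, 0, 0, 1, 1)]
print(assign((1, 0, 0, 0, 0, 0), codebook, min_score=5))  # (1, 1, 0, 0, 0, 0)
print(assign((1, 0, 1, 0, 1, 0), codebook, min_score=5))  # None
```

The first observed word differs from a valid word by a single dropped bit and is replaced by it; the second is equally far from every valid word and is rejected.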
- the method further comprises identifying a target analyte in the biological sample based on the determined assignment of the observed code word to a valid code word and the codebook.
- the identified target analyte comprises a messenger RNA (mRNA) molecule or protein molecule.
- mRNA: messenger RNA
- each valid code word of the plurality of valid code words has a second Hamming distance of greater than or equal to 4 from every other valid code word.
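A small check of the pairwise distance property might look like the following. The code words are hypothetical eight-bit strings chosen for illustration, and the helper names are not from the disclosure.

```python
from itertools import combinations


def hamming(a, b):
    """Number of positions at which two equal-length words differ."""
    return sum(x != y for x, y in zip(a, b))


def min_pairwise_distance(code_words):
    """Smallest Hamming distance over all pairs of distinct code words."""
    return min(hamming(a, b) for a, b in combinations(code_words, 2))


codebook = ["11110000", "00001111", "11001100"]
print(min_pairwise_distance(codebook))  # 4
```

A codebook satisfies the stated property when this minimum is at least 4.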
- the codebook comprises at least 50 valid code words.
- the codebook comprises up to 200,000 valid code words.
- systems comprising: a codebook database; a computing system comprising at least one computer-readable storage medium having program instructions stored thereon, the program instructions executable by at least one processor of the computing system to cause the at least one processor to perform a method comprising: receiving a codebook from the codebook database comprising a plurality of valid code words, wherein, for all valid code words in the codebook, a first Hamming distance between a first logical bitwise OR combination of any pair of valid code words and a second logical bitwise OR combination of any other pair of valid code words is greater than or equal to a predetermined number; receiving a plurality of images of a biological sample acquired over a plurality of sequencing or probing cycles, the plurality of images comprising a plurality of observed optical signals; decoding the plurality of observed optical signals in the plurality of images to obtain a plurality of observed code words; and for each observed code word: determining an assignment of the observed code word to one of the plurality of valid code words, or determining
- a codebook comprising a plurality of valid code words, wherein, for at least a first portion of the valid code words in the codebook, a first Hamming distance between a first logical bitwise OR combination of any pair of valid code words and a second logical bitwise OR combination of any other pair of valid code words is greater than or equal to a predetermined number; receiving a plurality of images of a biological sample acquired over a plurality of sequencing or probing cycles, the plurality of images comprising a plurality of observed optical signals; decoding the plurality of observed optical signals in the plurality of images to obtain a plurality of observed code words; and for each observed code word: determining an assignment of the observed code word to one of the plurality of valid code words, or determining that the observed code word is not a valid code word of the plurality of valid code words.
- each binary value comprises a 1 or a 0, wherein 1 represents an observed optical signal for which intensity is greater than or equal to the predetermined intensity threshold and 0 represents an observed optical signal for which intensity is less than the predetermined intensity threshold.
- decoding the plurality of observed optical signals in the plurality of images comprises obtaining the plurality of observed code words based on a series of binary values determined for each location.
- each observed code word of the plurality of code words comprises a plurality of code word segments, and wherein each code word segment comprises a specified string of binary values that corresponds to one of a specified set of observed optical signal states.
- the method further comprises selecting one of the plurality of valid code words having a highest score to assign as a replacement for the observed code word.
- the plurality of images comprises a plurality of images comprising different fields-of-view of the biological sample.
- the plurality of images comprises a plurality of z-stack images of the biological sample.
- the plurality of observed optical signals represents light emitted from a plurality of fluorophores.
- the method further comprises identifying a target analyte in the biological sample based on the determined assignment of the observed code word to a valid code word and the codebook.
- Disclosed herein are computer program products comprising a computer readable storage medium having program instructions embodied therewith, the program instructions executable by a processor to cause the processor to perform any of the methods described herein.
- databases comprising: a codebook comprising a plurality of valid code words, wherein, for all valid code words in the codebook, a first Hamming distance between a first logical bitwise OR combination of any pair of valid code words and a second logical bitwise OR combination of any other pair of valid code words is greater than or equal to 1.
- Disclosed herein are methods comprising: receiving a codebook comprising a plurality of valid code words, wherein, for all valid code words in the codebook, a first Hamming distance between a first logical bitwise OR combination of any pair of valid code words and a second logical bitwise OR combination of any other pair of valid code words is greater than or equal to 1; receiving a plurality of locations for a plurality of observed optical signals, wherein the plurality of observed optical signals are obtained from a plurality of images of a biological sample acquired over a plurality of sequencing or probing cycles; decoding the plurality of observed optical signals to obtain a plurality of observed code words at the plurality of locations; and for each observed code word: determining an assignment of the observed code word to one of the plurality of valid code words, or determining that the observed code word is not a valid code word of the plurality of valid code words.
- Disclosed herein are methods comprising: receiving a codebook having a plurality of code words, wherein, for all valid code words, a first Hamming distance between a first logical bitwise OR combination of any pair of valid code words and a second logical bitwise OR combination of any other pair of valid code words is greater than or equal to 1; receiving an analyte-index assignment for a plurality of target analytes; and using the analyte-index assignment to assign each target analyte of the plurality of target analytes to at least one of the plurality of code words such that each code word has at most one target analyte assignment, thereby generating an analyte-codeword assignment matrix.
- each code word of the plurality of codewords has an associated index.
- receiving the analyte-index assignment comprises receiving an analyte-index matrix.
- assigning each target analyte of the plurality of target analytes to at least one of the plurality of codewords comprises linking the plurality of target analytes and plurality of codewords based on the same indices.
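The index-based linking can be sketched with two lookups joined on the shared index. This is an illustration only: it uses dictionaries rather than matrices, assigns exactly one code word per analyte, and the analyte names, indices, and code words are hypothetical.

```python
def link_by_index(analyte_to_index, index_to_code_word):
    """Join analytes to code words through their shared indices."""
    indices = list(analyte_to_index.values())
    if len(set(indices)) != len(indices):
        # Each code word may carry at most one target analyte assignment.
        raise ValueError("two analytes share the same index")
    return {
        analyte: index_to_code_word[index]
        for analyte, index in analyte_to_index.items()
    }


analyte_to_index = {"GENE_A": 0, "GENE_B": 2}
index_to_code_word = {0: "110000", 1: "001100", 2: "000011"}
print(link_by_index(analyte_to_index, index_to_code_word))
# {'GENE_A': '110000', 'GENE_B': '000011'}
```

Code word 1 stays unassigned here, which is consistent with "at most one" analyte per code word.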
- the plurality of target analytes comprises a plurality of nucleic acids.
- the plurality of nucleic acids comprises a plurality of genes.
- the plurality of nucleic acids comprises a plurality of RNA transcripts.
- the plurality of target analytes comprises a plurality of proteins.
- Disclosed herein are methods for performing in situ decoding comprising: receiving a plurality of images of a biological sample, wherein the plurality of images comprises images acquired in a plurality of sequencing or probing cycles; detecting, based on the plurality of images, a series of optical signals at one or more locations in the biological sample corresponding to one or more barcoded target analytes; determining, based on the series of optical signals detected in the plurality of images, a code word comprising a series of ON and OFF bits that corresponds to a barcode for one of the one or more barcoded target analytes; and identifying the barcoded target analyte based on a comparison of the determined code word to a codebook, wherein the code word corresponds to a member of a codebook comprising a plurality of code words for which: Hamming Distance (Wi ∨ Wj, Wm ∨ Wn) > K
- each series of optical signals detected in the plurality of images at the one or more locations comprises a series of ON signals and OFF signals.
- the plurality of code words in the codebook further satisfy a property that: Hamming Distance (Wi, Wj) > Q for any two pairwise combination of code words Wi and Wj, wherein Q is an integer value greater than or equal to 3.
- two or more code words are determined that correspond to two or more barcoded target analytes for which the corresponding series of optical signals partially overlap within the plurality of images, and wherein an error rate for correctly identifying the two or more barcoded target analytes is reduced compared to that when the plurality of code words in the codebook do not satisfy the relationship: Hamming Distance (Wi ∨ Wj, Wm ∨ Wn) > K.
- the value of K is selectable by a user during design of the codebook.
- a first portion of the plurality of code words in the codebook satisfies a relationship: Hamming Distance (Wi ∨ Wj, Wm ∨ Wn) > K1
- a second portion of the plurality of code words in the codebook satisfies a relationship: Hamming Distance (Wi ∨ Wj, Wm ∨ Wn) > K2
- the values of K1 and K2 are selectable by a user during design of the codebook.
- a code word from the code book is randomly assigned to each of the one or more barcoded target analytes. In some embodiments, a code word from the code book is assigned to each of the one or more barcoded target analytes based on a decision rule designed to ensure that a total number of ON signals detected in a given image of the plurality of images is within ±10% of a mean number of ON signals detected per image for the plurality of images. In some embodiments, a code word from the code book is assigned to each of the one or more barcoded target analytes based on a decision rule designed to minimize a maximum predicted density of ON signals detected in images of the plurality of images.
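The balance criterion can be illustrated with a short check. The sketch assumes that each bit position of a code word corresponds to one image (one cycle and channel combination) and that a predicted abundance is available per analyte; both assumptions, along with the function names and numbers, are hypothetical.

```python
def predicted_on_per_image(assignment, abundance):
    """Sum predicted ON signals per bit position (one position per image)."""
    n_bits = len(next(iter(assignment.values())))
    totals = [0.0] * n_bits
    for analyte, word in assignment.items():
        for position, bit in enumerate(word):
            totals[position] += bit * abundance[analyte]
    return totals


def is_balanced(totals, tolerance=0.10):
    """True if every image is within +/- tolerance of the mean ON count."""
    mean = sum(totals) / len(totals)
    return all(abs(total - mean) <= tolerance * mean for total in totals)


assignment = {"g1": (1, 1, 0, 0), "g2": (0, 0, 1, 1)}
print(is_balanced(predicted_on_per_image(assignment, {"g1": 100, "g2": 95})))  # True
print(is_balanced(predicted_on_per_image(assignment, {"g1": 100, "g2": 50})))  # False
```

In the second case the images carrying `g1` hold twice as many predicted ON signals as those carrying `g2`, so the assignment falls outside the ±10% band.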
- a code word from the code book is assigned to each of two or more barcoded target analytes based on expression data for the two or more barcoded target analytes in clustered cell types, and wherein the clustered cell types represent a distribution of cell types found in the biological sample.
- the expression data for the two or more barcoded target analytes comprises bulk gene expression data, bulk protein expression data, spatial gene expression data, spatial protein expression data, single cell gene expression data, single cell protein expression data, or any combination thereof.
- the two or more barcoded target analytes are rank-ordered according to a maximum expression level across all clustered cell types, and the two or more code words are assigned to the two or more rank-ordered barcoded target analytes using an iterative process repeated for each of the two or more barcoded target analytes in decreasing order of maximum expression level, the iterative process comprising: computing a predicted density of ON signals for every combination of remaining, unassigned code words and the barcoded target analyte across the plurality of images; selecting a code word from the remaining, unassigned code words that minimizes the predicted density of ON signals across the plurality of images; and assigning the selected code word to the barcoded target analyte.
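The iterative process above can be sketched as a greedy loop. This is an illustration under stated assumptions, not the disclosed implementation: expression is given as one value per clustered cell type, each bit position stands for one image, and "predicted density" is taken to be the peak accumulated ON signal over all cell types and images. The names `greedy_assign` and `peak_if_assigned` and all sample data are hypothetical.

```python
def greedy_assign(expression, code_words):
    """Assign code words to analytes in decreasing order of max expression.

    expression: analyte -> list of expression levels, one per cell type.
    code_words: list of equal-length bit tuples (at least one per analyte).
    """
    if len(code_words) < len(expression):
        raise ValueError("not enough code words for all analytes")
    n_bits = len(code_words[0])
    n_types = len(next(iter(expression.values())))
    # density[t][b]: accumulated predicted ON signal for cell type t, image b.
    density = [[0.0] * n_bits for _ in range(n_types)]
    remaining = list(code_words)
    assignment = {}
    ranked = sorted(expression, key=lambda a: max(expression[a]), reverse=True)
    for analyte in ranked:
        levels = expression[analyte]

        def peak_if_assigned(word):
            # Assumed objective: peak density after adding this analyte.
            return max(
                density[t][b] + word[b] * levels[t]
                for t in range(n_types)
                for b in range(n_bits)
            )

        best = min(remaining, key=peak_if_assigned)
        remaining.remove(best)
        assignment[analyte] = best
        for t in range(n_types):
            for b in range(n_bits):
                density[t][b] += best[b] * levels[t]
    return assignment


expression = {"hi": [10, 0], "mid": [5, 5], "lo": [1, 0]}
code_words = [(1, 1, 0, 0), (0, 0, 1, 1), (1, 0, 1, 0)]
print(greedy_assign(expression, code_words))
# {'hi': (1, 1, 0, 0), 'mid': (0, 0, 1, 1), 'lo': (1, 0, 1, 0)}
```

The second analyte avoids the code word that shares an image with the most highly expressed analyte, which is the behavior the ranking is meant to encourage.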
- K is equal to 3, 4, or 5.
- Q is equal to 4, 5, 6, 7, or 8.
- the plurality of code words comprise code words of at least 60 bits, 80 bits, 100 bits, 120 bits, 140 bits, 160 bits, or 180 bits in length.
- the plurality of code words in the codebook comprises at least 1,000, 5,000, 10,000, 20,000, 40,000, 60,000, 80,000, 100,000, 120,000, 140,000, 160,000, 180,000, or 200,000 unique code words.
- the series of optical signals comprise fluorescence signals.
- each code word of the plurality comprises M x N bits, where M is a number of sequencing or probing cycles and N is a number of optical detection channels in an instrument configured to perform the in situ decoding.
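To make the M x N layout concrete, the sketch below builds a flat code word from (cycle, channel) positions. The cycle-major bit ordering and the function name `build_code_word` are assumptions made for illustration; the passage does not fix an ordering.

```python
def build_code_word(on_positions, n_cycles, n_channels):
    """Return a flat list of n_cycles * n_channels bits.

    on_positions: iterable of zero-based (cycle, channel) pairs set to ON.
    Bits are laid out cycle by cycle (assumed ordering).
    """
    bits = [0] * (n_cycles * n_channels)
    for cycle, channel in on_positions:
        if not (0 <= cycle < n_cycles and 0 <= channel < n_channels):
            raise ValueError("position out of range")
        bits[cycle * n_channels + channel] = 1
    return bits


# M = 3 cycles, N = 4 channels gives a 12-bit code word.
word = build_code_word([(0, 1), (2, 3)], n_cycles=3, n_channels=4)
print(word)  # [0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1]
```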
- the one or more barcoded target analytes comprise barcoded gene sequences, barcoded gene transcripts, barcoded proteins, or any combination thereof.
- databases comprising: one or more non-transitory computer-readable storage medium components, the one or more non-transitory computer-readable storage medium components individually or collectively storing a codebook comprising a plurality of code words for which: Hamming Distance (Wi ∨ Wj, Wm ∨ Wn) > K
- the plurality of code words in the codebook further satisfy a property that: Hamming Distance (Wi, Wj) > Q for any two pairwise combination of code words Wi and Wj, and wherein Q is an integer value greater than or equal to 3.
- the value of K is selectable by a user during design of the codebook.
- a first portion of the plurality of code words in the codebook satisfies a relationship: Hamming Distance (Wi ∨ Wj, Wm ∨ Wn) > K1
- a second portion of the plurality of code words in the codebook satisfies a relationship: Hamming Distance (Wi ∨ Wj, Wm ∨ Wn) > K2
- the values of K1 and K2 are selectable by a user during design of the codebook.
- K is equal to 3, 4, or 5.
- Q is equal to 4, 5, 6, 7, or 8.
- the plurality of code words comprise code words of at least 60 bits, 80 bits, 100 bits, 120 bits, 140 bits, 160 bits, or 180 bits in length.
- the plurality of code words in the codebook comprises at least 100, 500, 1,000, 5,000, 10,000, 20,000, 40,000, 60,000, 80,000, or 100,000 unique code words.
- each code word of the plurality comprises M x N bits, where M is a number of sequencing or probing cycles and N is a number of optical detection channels in an instrument configured to perform the in situ decoding.
- each code word in the codebook has at least 2 ON bits. In some embodiments, each code word in the codebook has no more than 4, 5, or 6 ON bits.
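As a simple illustration of the ON-bit constraint, the generator below enumerates every word of a given length whose ON-bit count lies within a range. It only produces candidates; it does not enforce any of the Hamming distance properties. The function name and the small length used in the example are hypothetical.

```python
from itertools import combinations


def words_with_on_bits(length, min_on, max_on):
    """Yield every bit tuple of the given length with min_on..max_on ON bits."""
    for on_count in range(min_on, max_on + 1):
        for positions in combinations(range(length), on_count):
            word = [0] * length
            for position in positions:
                word[position] = 1
            yield tuple(word)


candidates = list(words_with_on_bits(length=5, min_on=2, max_on=3))
print(len(candidates))  # 20
print(candidates[0])    # (1, 1, 0, 0, 0)
```

For realistic code word lengths (60 bits or more) the number of candidates is very large, so a full enumeration like this is only practical for small examples.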
- systems comprising: one or more processors; and a memory communicatively coupled to the one or more processors and configured to store instructions that, when executed by the one or more processors, cause the system to: receive a plurality of images of a biological sample, wherein the plurality of images comprises images acquired in a plurality of sequencing or probing cycles; detect, based on the plurality of images, a series of optical signals at one or more locations in the biological sample corresponding to one or more barcoded target analytes; determine, based on the series of optical signals detected in the plurality of images, a code word comprising a series of ON and OFF bits that corresponds to a barcode for one of the one or more barcoded target analytes; and identify the barcoded target analyte based on a comparison of the determined code word to a codebook, wherein the code word corresponds to a member of a codebook comprising a plurality of code words for which: Hamming Distance (Wi ∨ Wj, Wm ∨ Wn) > K
- each series of optical signals detected in the plurality of images at the one or more locations comprises a series of ON signals and OFF signals.
- the plurality of code words in the codebook further satisfy a property that: Hamming Distance (Wi, Wj) > Q for any two pairwise combination of code words Wi and Wj, wherein Q is an integer value greater than or equal to 3.
- two or more code words are determined that correspond to two or more barcoded target analytes for which the corresponding series of optical signals partially overlap within the plurality of images, and wherein an error rate for correctly identifying the two or more barcoded target analytes is reduced compared to that when the plurality of code words in the codebook do not satisfy the relationship: Hamming Distance (Wi ∨ Wj, Wm ∨ Wn) > K.
- the value of K is selectable by a user during design of the codebook.
- a first portion of the plurality of code words in the codebook satisfies a relationship: Hamming Distance (Wi ∨ Wj, Wm ∨ Wn) > K1
- a second portion of the plurality of code words in the codebook satisfies a relationship: Hamming Distance (Wi ∨ Wj, Wm ∨ Wn) > K2
- the values of K1 and K2 are selectable by a user during design of the codebook.
- a code word from the code book is randomly assigned to each of the one or more barcoded target analytes.
- a code word from the code book is assigned to each of the one or more barcoded target analytes based on a decision rule designed to ensure that a total number of ON signals detected in a given image of the plurality of images is within ±10% of a mean number of ON signals detected per image for the plurality of images. In some embodiments, a code word from the code book is assigned to each of the one or more barcoded target analytes based on a decision rule designed to minimize a maximum predicted density of ON signals detected in images of the plurality of images.
- a code word from the code book is assigned to each of two or more barcoded target analytes based on expression data for the two or more barcoded target analytes in clustered cell types, and wherein the clustered cell types represent a distribution of cell types found in the biological sample.
- the expression data for the two or more barcoded target analytes comprises bulk gene expression data, bulk protein expression data, spatial gene expression data, spatial protein expression data, single cell gene expression data, single cell protein expression data, or any combination thereof.
- K is equal to 3, 4, or 5.
- Q is equal to 4, 5, 6, 7, or 8.
- the plurality of code words comprise code words of at least 60 bits, 80 bits, 100 bits, 120 bits, 140 bits, 160 bits, or 180 bits in length.
- the plurality of code words in the codebook comprises at least 1,000, 5,000, 10,000, 20,000, 40,000, 60,000, 80,000, or 100,000 unique code words.
- the series of optical signals comprise fluorescence signals.
- each code word of the plurality comprises M x N bits, where M is a number of sequencing or probing cycles and N is a number of optical detection channels in an instrument configured to perform the in situ decoding.
- the one or more barcoded target analytes comprise barcoded gene sequences, barcoded gene transcripts, barcoded proteins, or any combination thereof.
- FIGS. 1A-1B provide a non-limiting example of a process flowchart for generating an OR-robust codebook, in accordance with one implementation of the methods described herein.
- FIG. 2 provides a non-limiting example of a process flowchart for assigning the code words in an OR-robust codebook to a corresponding list of target analytes, in accordance with one implementation of the methods described herein.
- FIG. 3 provides a non-limiting example of a process flowchart for decoding optical signals derived from images of a biological sample to identify barcoded target analytes, in accordance with one implementation of the methods described herein.
- FIG. 4A depicts a non-limiting example of the structure of a binary code word for use with hybridization probe-based in situ detection of barcoded target analytes, in accordance with some implementations of the methods described herein.
- FIG. 6 provides a non-limiting schematic illustration of sequencing-based in situ detection of barcoded target analytes, in accordance with some implementations of the methods described herein.
- FIG. 7 depicts an overview of a volumetric sample imaging system and illustrates a Field of View (FOV) grid bounding the sample (e.g., hydrogel, tissue section, one or more cells, etc.) as projected onto the surface of a solid substrate supporting the sample.
- FOV: Field of View
- FIG. 8 depicts the XZ cross-sectional view and illustrates tissue non-uniformity in the Z dimension, where the full (non-reduced) imaging volume is oversampled in the Z dimension.
- the objective lens focal point is positioned to acquire an image at every Z-slice in a Z-stack.
- An XZ image of signal distribution (bottom) demonstrates a non-uniform distribution of detected signal within the imaging volume.
- FIG. 9 depicts a system for performing an in situ detection or sequencing assay, in accordance with some implementations of the methods described herein.
- FIGS. 10A-10B illustrate cross-sectional views of an optics module in an imaging system, according to some embodiments.
- FIG. 11 depicts a computer system or computer network, in accordance with some instances of the systems described herein.
- the codebook designs described herein provide robust protections against codebook calling errors, for example, calling of a valid codeword in the codebook based on a detected codeword that has one or more detection errors.
- Detection errors may occur due to crosstalk in imaging channels (e.g., where two dyes have overlapping excitation spectra) or due to autofluorescence. Detection errors may also occur, for example, due to the close proximity of two or more target analytes having fluorescent oligonucleotides configured to emit fluorescence during the same imaging cycle.
- the codebook designs described herein reduce (e.g., minimize) the potential for errors (e.g., calling an incorrect transcript) during decoding.
- the codebook designs described herein reduce (e.g., minimize) the impact of spatial crowding of target molecules within a biological sample (e.g., a tissue specimen) when performing decoding of detected signals from a plurality of imaging rounds.
- the codebooks described herein are referred to as "OR-robust" or "spatial collision robust" codebooks.
- In an OR-robust codebook, all or a portion of the valid code words satisfy the property that the Hamming distance between a first logical bitwise OR combination of any pair of valid code words and a second logical bitwise OR combination of any other pair of valid code words is greater than or equal to an OR-robust radius (i.e., a specified integer value greater than zero).
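The OR-robust property can be checked by brute force on a small codebook, as sketched below. Code words are represented as Python integers used as bitmasks, and pairs that share a common code word are included in the comparison. The function name `or_robust_radius` and both example codebooks are hypothetical; the quadratic growth in the number of pair-of-pair comparisons makes this approach suitable only for small illustrations.

```python
from itertools import combinations


def or_robust_radius(code_words):
    """Smallest Hamming distance between the bitwise ORs of two distinct pairs."""
    pair_ors = [a | b for a, b in combinations(code_words, 2)]
    return min(bin(x ^ y).count("1") for x, y in combinations(pair_ors, 2))


# Every pair ORs to 0b111, so overlapping signals are indistinguishable.
print(or_robust_radius([0b011, 0b101, 0b110]))  # 0

# Here no two pairs produce the same OR; the closest ORs differ in 2 bits.
print(or_robust_radius([0b000011, 0b001100, 0b110000, 0b010101]))  # 2
```

A codebook is OR-robust for a chosen radius when the returned value is at least that radius; a result of 0 means two different pairs of analytes could yield the same combined signal.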
- an OR-robust codebook reduces the chance that light signals from any two different analytes in close proximity to one another combine into the same observed codeword (after all imaging cycles are completed) that is ultimately decoded to a valid codeword (with one or more errors in the observed codeword and/or a low quality score) or discarded entirely.
- methods for decoding optical signals can leverage an OR-robust codebook to enable accurate decoding of barcoded target molecules in multiplexed in situ assays even under conditions where the spatial densities of target molecules are high.
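One hedged illustration of how a decoder might exploit this property: when an observed word is not itself valid, look for a pair of valid code words whose bitwise OR reproduces it exactly. With an OR-robust radius of at least 1, at most one pair can match. The function `resolve_collision` and the example codebook are hypothetical, and the sketch ignores detection errors and quality scores.

```python
from itertools import combinations


def resolve_collision(observed, code_words):
    """Explain an observed bitmask as one valid word or one overlapping pair.

    Returns a 1-tuple, a 2-tuple, or None if no unique explanation exists.
    """
    if observed in code_words:
        return (observed,)
    matches = [
        (a, b) for a, b in combinations(code_words, 2) if a | b == observed
    ]
    return matches[0] if len(matches) == 1 else None


codebook = [0b000011, 0b001100, 0b110000, 0b010101]
print(resolve_collision(0b110000, codebook))  # (48,)
print(resolve_collision(0b001111, codebook))  # (3, 12)
print(resolve_collision(0b111111, codebook))  # None
```

The second call recovers both overlapping code words from a single combined observation, which would not be possible if another pair produced the same OR.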
- the disclosed methods may comprise: receiving a codebook comprising a plurality of valid code words, where, for all valid code words in the codebook, a first Hamming distance between a first logical bitwise OR combination of any pair of valid code words and a second logical bitwise OR combination of any other pair of valid code words (including the case where the first logical bitwise OR combination and the second logical bitwise OR combination include a common code word) is greater than or equal to a predetermined number (e.g., where the predetermined number is 1, 2, 3, 4, 5, 6, 7, or 8); receiving a plurality of images of a biological sample acquired over a plurality of sequencing or probing cycles, each image of the plurality of images comprising a plurality of observed optical signals; decoding the plurality of observed optical signals in the plurality of images to obtain a plurality of observed code words; and for each observed code word: determining an assignment of the observed code word to one of the plurality of valid code words, or determining that the observed code word is
- the disclosed methods may comprise: receiving a codebook comprising a plurality of valid code words, where, for at least a first portion of the valid code words in the codebook, a first Hamming distance between a first logical bitwise OR combination of any pair of valid code words and a second logical bitwise OR combination of any other pair of valid code words (including the case where the first logical bitwise OR combination and the second logical bitwise OR combination include a common code word) is greater than or equal to 1; receiving a plurality of images of a biological sample acquired over a plurality of sequencing or probing cycles, each image of the plurality of images comprising a plurality of observed optical signals; decoding the plurality of observed optical signals in the plurality of images to obtain a plurality of observed code words; and for each observed code word: determining an assignment of the observed code word to one of the plurality of valid code words, or determining that the observed code word is not a valid code word of the plurality of valid code words.
- the disclosed methods may comprise: receiving a codebook comprising a plurality of valid code words, where, for all valid code words in the codebook, a first Hamming distance between a first logical bitwise OR combination of any pair of valid code words and a second logical bitwise OR combination of any other pair of valid code words is greater than or equal to 1 ; receiving a plurality of locations for a plurality of observed optical signals, wherein the plurality of observed optical signals are obtained from a plurality of images of a biological sample acquired over a plurality of sequencing or probing cycles; decoding the plurality of observed optical signals to obtain a plurality of observed code words at the plurality of locations; and for each observed code word: determining an assignment of the observed code word to one of the plurality of valid code words, or determining that the observed code word is not a valid code word of the plurality of valid code words.
- methods of generating a codebook for in situ decoding comprising: receiving a plurality of code words, where, for all valid code words, a first Hamming distance between a first logical bitwise OR combination of any pair of valid code words and a second logical bitwise OR combination of any other pair of valid code words (including the case where the first logical bitwise OR combination and the second logical bitwise OR combination include a common code word) is greater than or equal to 1 ; receiving a list of a plurality of target analytes; and for each target analyte on the list of the plurality of target analytes: assigning the target analyte to at least one of the plurality of code words such that each code word has at most one target analyte assignment, thereby generating the codebook.
- methods for performing in situ decoding comprising: receiving a plurality of images of a biological sample, where the plurality of images comprises images acquired in a plurality of sequencing or probing cycles; detecting, based on the plurality of images, a series of optical signals at one or more locations in the biological sample corresponding to one or more barcoded target analytes; determining, based on the series of optical signals detected in the plurality of images, a code word comprising a series of ON and OFF bits that corresponds to a barcode for one of the one or more barcoded target analytes; and identifying the barcoded target analyte based on a comparison of the determined code word to a codebook, wherein the code word corresponds to a member of a codebook comprising a plurality of code words for which:
- each OR combination is based on two different codewords.
- a pair of OR-combinations can share at most one codeword in the comparison.
- i cannot be equal to m and j cannot be equal to n.
- the codebooks described herein comprise binary codebooks, i.e., codebooks comprising binary code words having a plurality of binary segments (e.g., 4-bit binary segments of the form “bit 1, bit 2, bit 3, bit 4, etc.”) where each ON bit (“1”) indicates that a signal was detected in one of the plurality of optical detection channels of an imaging instrument used to perform the decoding in a given decoding cycle, and each OFF bit (“0”) indicates that no signal was detected in the particular optical detection channel in the given decoding cycle.
- each subsegment (and ultimately, each observed full codeword) is associated with a specific set of X, Y, Z coordinates within an imaged 3D volume.
- Each binary segment may represent an individual imaging cycle of a plurality of imaging cycles. For example, where the color channels associated with the binary segments are “red channel, yellow channel, green channel, blue channel”, a binary segment of “1 0 0 0” indicates that a signal was detected in the red channel and no signal was detected in the yellow, green, or blue channels. When all binary segments are appended together, the resulting string of 1’s and 0’s represents a full binary code word.
- the disclosed OR-robust codebooks satisfy the property that Hamming Distance (CWA | CWB, CWC | CWD) ≥ K for all possible pairwise combinations (or a portion of all possible pairwise combinations) of a list of valid code words (e.g., CWA, CWB, CWC, CWD, etc.), where the notation CWX | CWY denotes a code word derived from the logical bitwise OR combination of code words CWX and CWY, and where K is an integer value greater than zero.
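- as a non-limiting illustration, the property can be checked by brute force on a small codebook. The Python sketch below assumes code words are stored as integers used as bit masks; the function names `hamming` and `or_robust_radius` and the toy 8-bit code words are hypothetical and are not taken from the disclosure.

```python
from itertools import combinations


def hamming(a: int, b: int) -> int:
    """Number of bit positions in which a and b differ."""
    return bin(a ^ b).count("1")


def or_robust_radius(codebook: list[int]) -> int:
    """Smallest Hamming distance between the ORs of any two distinct pairs.

    Pairs that share one code word are included in the comparison.
    Requires at least three code words so that two distinct pairs exist.
    """
    ors = [a | b for a, b in combinations(codebook, 2)]
    return min(hamming(x, y) for x, y in combinations(ors, 2))


# Toy 8-bit codebook (hypothetical values, for illustration only).
toy = [0b00000011, 0b00001100, 0b00110000, 0b11000000]
print(or_robust_radius(toy))  # 4

# A codebook that is not OR-robust: 001 | 010 equals 001 | 011.
print(or_robust_radius([0b001, 0b010, 0b011]))  # 0
```

A codebook is OR-robust with radius K under this sketch when `or_robust_radius(codebook) >= K`.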
- code words within the code book are represented in illumination state space. For example, each state may be represented as a letter in the alphabet (e.g., red is state A, yellow is state B, green is state C, blue is state D, and empty is state E).
- each state corresponds to a binary string.
- the state A may be represented as 1000
- the state B may be represented as 0100
- the state C may be represented as 0010
- the state D may be represented as 0001
- the empty (no emission) state may be represented as 0000 (where each bit in the binary string corresponds to a color channel, similar to the binary segments described above).
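- as a non-limiting illustration, the mapping between the illumination state space and the binary representation can be sketched in Python as follows. The state letters and 4-bit strings are those listed above; the names `STATE_BITS`, `states_to_bits`, and `bits_to_states` are hypothetical.

```python
# State letter -> 4-bit segment (red, yellow, green, blue), as listed above.
STATE_BITS = {"A": "1000", "B": "0100", "C": "0010", "D": "0001", "E": "0000"}
BITS_STATE = {bits: state for state, bits in STATE_BITS.items()}


def states_to_bits(states: str) -> str:
    """Convert a state-space code word (one letter per cycle) to a bit string."""
    return "".join(STATE_BITS[s] for s in states)


def bits_to_states(bits: str) -> str:
    """Convert a bit string back to state letters, 4 bits per cycle."""
    if len(bits) % 4 != 0:
        raise ValueError("bit string length must be a multiple of 4")
    return "".join(BITS_STATE[bits[i:i + 4]] for i in range(0, len(bits), 4))


# Three cycles: red, empty, green.
print(states_to_bits("AEC"))           # 100000000010
print(bits_to_states("100000000010"))  # AEC
```

This sketch only covers segments with at most one ON bit; a segment with two ON bits has no letter in the five-state alphabet above.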
- pairs of codewords in the codebook may satisfy Hamming Distance (CWA | CWB, CWC | CWD) = 0 for many combinations of two pairs of the code words, which means that if one observes a signal corresponding to CWA | CWB (e.g., if a first target molecule, such as a first rolling circle product (RCP), labeled with a barcode corresponding to code word CWA in a given decoding cycle is very close to a second target molecule, such as a second RCP, labeled with a barcode corresponding to code word CWB in that decoding cycle), then the signal from the first and second target molecules may be indistinguishable from a signal corresponding to CWC | CWD, and therefore may not be accurately decoded with suitable confidence.
- an OR-robust codebook can be designed for use with high-plexy analysis (e.g., a 2k gene panel, a 5k gene panel, or a whole transcriptome panel), gene panels where at least one gene in the panel is a highly expressed gene (signals from highly expressed genes can cause spatial crowding or overpower signals from lesser expressed genes), and/or protein panels (protein can appear diffuse thus causing spatial crowding).
- code words CWA and CWB are two different code words in the codebook (e.g., code words CWA and CWB are not the same string of letters or bits) and code words CWC and CWD are two different code words in the codebook (e.g., code words CWC and CWD are not the same string of letters or bits).
- code words CWA and CWB are a first pair of code words and code words CWC and CWD are a second pair of code words that is different from the first pair of code words.
- code words CWA and CWC may be the same code word, but code words CWB and CWD are different code words such that the first pair of code words is different from the second pair of code words.
- code word CWA is not equal to code words CWB, CWC, and CWD;
- code word CWB is not equal to code words CWA, CWC, and CWD; and
- code word CWC is not equal to code words CWA, CWB, and CWD (thus, code word CWD is not equal to code words CWA, CWB, and CWC).
- an OR-robust codebook can be generated by starting with at least one arbitrary code word (e.g., one, two, or three code words).
- the at least one arbitrary starting code word passes at least one validation check.
- a validation check may include a check that the code word has at least a specific number of ON bits or exactly a specific number of ON bits.
- the at least two starting code words may be separated by at least a predetermined Hamming distance (e.g., a HD of at least 6).
- two code words are arbitrarily selected having at least a predetermined edit distance (e.g., Hamming distance) from each other.
- for two code words having 60 total bits (15 segments of 4 bits) and a maximum of five ON bits in any given code word, the maximum possible Hamming distance between the two code words is 10 (i.e., none of the five ON bits in the first code word overlaps with any of the five ON bits in the second code word, meaning that 10 edits are needed to transform one code word into the other).
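- the arithmetic can be verified with a short, non-limiting Python sketch. The helper names `hamming` and `codeword` and the particular ON-bit positions are hypothetical choices made only for illustration.

```python
def hamming(a: str, b: str) -> int:
    """Hamming distance between two equal-length bit strings."""
    if len(a) != len(b):
        raise ValueError("code words must have the same length")
    return sum(x != y for x, y in zip(a, b))


def codeword(on_positions: set[int], n_bits: int = 60) -> str:
    """Bit string of length n_bits with '1' at the given positions."""
    return "".join("1" if i in on_positions else "0" for i in range(n_bits))


# Five ON bits each, one per 4-bit segment, with no shared ON positions.
first = codeword({0, 4, 8, 12, 16})
second = codeword({20, 24, 28, 32, 36})
print(hamming(first, second))  # 10

# Sharing one ON position removes two differing bits.
third = codeword({0, 24, 28, 32, 36})
print(hamming(first, third))  # 8
```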
- two code words are selected having at least a predetermined edit distance (e.g., Hamming distance) from each other.
- any codewords beyond the second are selected such that the codeword has a predetermined edit distance (e.g., Hamming distance) from all other codewords, and the logical bitwise OR between any two pairs of codewords has a predetermined edit distance (e.g., Hamming distance) from one another.
- two arbitrary codewords are selected (with sufficiently high edit distance from each other) to start, but the third codeword is no longer arbitrary and is selected so that the OR-robust property is not violated.
- pairs of pairs can be created: HD(CWA | CWB, CWA | CWC) ≥ K; HD(CWA | CWB, CWB | CWC) ≥ K; HD(CWA | CWC, CWB | CWC) ≥ K.
- new codewords are generated using a random generator.
- new codewords are generated using a deterministic generator having one or more properties (i.e., generating codewords in a specific, useful order).
- new codewords are generated to have a predetermined number of overlapping bits with codewords already present in the codebook (e.g., new codewords have a same number of overlapping bits or states with codewords already present in the codebook, new codewords have a maximum number of overlapping bits or states with codewords already present in the codebook).
- some methods for generating new codewords (e.g., as described above) will allow for larger OR-robust codebooks to be generated.
- a new candidate code word is generated (e.g., randomly generated) and tested to determine whether adding the new candidate code word to the codebook satisfies the Hamming Distance property and the OR-robust property. If adding the new candidate code word violates the Hamming Distance property (i.e., the candidate code word is less than a specific HD away from at least one other valid code word in the codebook), then the candidate code word can be discarded. If adding the new code word to the codebook satisfies the OR-robust property, the new code word is added to the codebook as a valid code word; otherwise the new code word is discarded, and another new code word is randomly generated and tested.
- This process may be repeated until a codebook having a predetermined number of valid code words is generated or no new valid codewords can be found after a predetermined number of attempts (e.g., 1 million trials) (such that the codebook satisfies the OR-robust property for all pairs of valid codewords).
- the process is repeated until no more codewords can be added to the codebook without violating the OR-robust property (or any other suitable constraint, such as a predetermined number of OR-robust codewords has been achieved).
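- the generate-and-test loop described above can be sketched, in a non-limiting way, as follows. The sketch assumes integer bit masks, a deliberately small 5-cycle, 4-channel layout with three ON bits per code word, and a cached list of pair ORs; all function names, default parameters, and the caching strategy are hypothetical and chosen so the example runs quickly.

```python
import random
from itertools import combinations


def hd(a: int, b: int) -> int:
    """Hamming distance between two bit masks."""
    return bin(a ^ b).count("1")


def random_codeword(rng, cycles=5, channels=4, weight=3) -> int:
    """Random code word with `weight` ON bits, at most one per cycle segment."""
    word = 0
    for cycle in rng.sample(range(cycles), weight):
        word |= 1 << (cycle * channels + rng.randrange(channels))
    return word


def try_add(codebook, pair_ors, cand, min_hd, k) -> bool:
    """Add cand only if the Hamming and OR-robust properties still hold."""
    if any(hd(cand, cw) < min_hd for cw in codebook):
        return False  # too close to an existing code word
    new_ors = [cand | cw for cw in codebook]
    if any(hd(x, y) < k for x in new_ors for y in pair_ors):
        return False  # a new OR collides with an existing pair OR
    if any(hd(x, y) < k for x, y in combinations(new_ors, 2)):
        return False  # two new ORs collide with each other
    codebook.append(cand)
    pair_ors.extend(new_ors)
    return True


def build_codebook(target=12, max_attempts=300, min_hd=4, k=1, seed=0):
    """Stop at `target` code words or after `max_attempts` candidates."""
    rng = random.Random(seed)
    codebook, pair_ors = [], []
    for _ in range(max_attempts):
        if len(codebook) >= target:
            break
        try_add(codebook, pair_ors, random_codeword(rng), min_hd, k)
    return codebook


codebook = build_codebook()
print(len(codebook), [format(cw, "020b") for cw in codebook[:3]])
```

Both stopping conditions named above appear here: a target codebook size and a cap on the number of attempts.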
- the process for generating new code words is generalized as a search (e.g., a depth-first search, breadth-first search, or informed search).
- the search algorithm implements back-tracking (e.g., where a codeword, such as a newly added codeword, is removed and another new codeword is generated that allows for a larger final OR-robust codebook).
- the new codeword generation algorithm finds a codeword CW_a, that can be added to the codebook without violating any constraints, but then the algorithm determines that no more codewords can be added beyond that codeword CW_a, giving a final OR-robust codebook size of 100.
- the algorithm may perform backtracking by removing at least codeword CW_a from the codebook.
- the algorithm determines a new codeword CW_b can be added to the codebook (after removing at least codeword CW_a), and that codeword CW_c can also be added with codeword CW_b resulting in a codebook with at least 101 codewords, as opposed to 100 codewords if CW_a was chosen.
- backtracking allows for the new codeword generation algorithm to go back arbitrarily far (i.e., remove arbitrarily up to a predetermined number of codewords from the codebook) to explore if a denser packing of OR-robust codewords is possible, thereby generating a larger OR-robust codebook.
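- a minimal, non-limiting sketch of such a back-tracking search is given below as a depth-first search over a fixed pool of candidate code words. The exhaustive search, the pruning rule, and the names `is_valid` and `largest_codebook` are assumptions made for illustration; the search is exponential in the pool size and is practical only for tiny pools.

```python
from itertools import combinations


def hd(a: int, b: int) -> int:
    """Hamming distance between two bit masks."""
    return bin(a ^ b).count("1")


def is_valid(codebook, min_hd, k) -> bool:
    """Check the pairwise Hamming and OR-robust constraints from scratch."""
    if any(hd(a, b) < min_hd for a, b in combinations(codebook, 2)):
        return False
    ors = [a | b for a, b in combinations(codebook, 2)]
    return all(hd(x, y) >= k for x, y in combinations(ors, 2))


def largest_codebook(pool, min_hd, k):
    """Depth-first search with back-tracking for the largest valid subset."""
    best = []

    def search(start, current):
        nonlocal best
        if len(current) > len(best):
            best = list(current)
        for i in range(start, len(pool)):
            # Prune: even adding every remaining candidate cannot beat best.
            if len(current) + (len(pool) - i) <= len(best):
                break
            current.append(pool[i])
            if is_valid(current, min_hd, k):
                search(i + 1, current)
            current.pop()  # back-track: remove the code word and try another

    search(0, [])
    return best


# Choosing 0b0110 first blocks both other candidates (distance 2 < 3);
# removing it allows a two-word codebook instead of a one-word codebook.
print(largest_codebook([0b0110, 0b0011, 0b1100], min_hd=3, k=1))  # [3, 12]
```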
- the OR-robust property may be imposed in addition to a conventional edit distance criterion, e.g., that the Hamming Distance for any pairwise combination of the base code words CWA and CWB is at least a predetermined distance parameter Q.
- the predetermined distance parameter Q is two times the number of single errors that can be detected and corrected plus 1.
- This edit distance criterion is defined as follows: Hamming Distance (CWA, CWB) ≥ Q (where Q is an integer value of greater than or equal to 1, for example, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, etc.).
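- the relationship between the distance parameter Q and the number of correctable single-bit errors follows the standard coding-theory bound and can be sketched as follows. The function names are hypothetical; the example value of 6 is the example minimum Hamming distance mentioned for the starting code words.

```python
def min_distance_for(correctable_errors: int) -> int:
    """Minimum pairwise Hamming distance Q needed to correct this many errors."""
    return 2 * correctable_errors + 1


def correctable_errors_for(q: int) -> int:
    """Number of single-bit errors correctable at minimum distance q."""
    return (q - 1) // 2


print(min_distance_for(2))        # 5
print(correctable_errors_for(6))  # 2
```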
- a hamming weight (HW) of a codeword is equal to the number of ON bits in that codeword.
- all codewords in the codebook have the same Hamming Weight.
- at least some codewords in the codebook have a different hamming weight (i.e., not all codewords in the codebook have the same Hamming Weight).
- a codebook may have 100 codewords having a HW of 5 and 50 codewords having a HW of 6.
- Additional constraint criteria include, but are not limited to, the conventional Hamming distance criterion (Hamming Distance (CWA, CWB) ≥ Q), a maximum number of ON bits allowed per decoding cycle or code word segment (e.g., 1 ON bit for a 4-bit segment), a maximum number of ON bits allowed per code word (e.g., 4 ON bits per a 60-bit code word corresponding to a 15 cycle 4 color imaging decoding process, 5 ON bits per a 60-bit code word corresponding to a 15 cycle 4 color imaging decoding process, 6 ON bits per a 60-bit code word corresponding to a 15 cycle 4 color imaging decoding process, or 7 ON bits per a 60-bit code word corresponding to a 15 cycle 4 color imaging decoding process), exclusion of code words from a predetermined list of selected code words, etc.
- a constraint imposed is that all codewords have at least one bit on in each of the optical channels (e.g., one of the four optical channels).
- a constraint imposed is that ON-bits from adjacent cycles cannot occupy the same color channel (e.g., if codeword CWA has an ON-bit in the red color channel in cycle 1 or has a state space assigned to the “red” state, then codeword CWA will not have an ON-bit in cycle 2 in the red color channel or have a state space assigned to the “red” color channel in cycle 2).
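- the single-code-word constraints described above can be sketched, in a non-limiting way, as simple predicates over a bit string. The function names, the default of four channels, and the example code words are hypothetical.

```python
def segments(word: str, channels: int = 4) -> list[str]:
    """Split a bit string into per-cycle segments of `channels` bits."""
    return [word[i:i + channels] for i in range(0, len(word), channels)]


def max_on_per_segment(word: str, limit: int = 1, channels: int = 4) -> bool:
    """True if no cycle segment has more than `limit` ON bits."""
    return all(seg.count("1") <= limit for seg in segments(word, channels))


def uses_every_channel(word: str, channels: int = 4) -> bool:
    """True if each optical channel has an ON bit in at least one cycle."""
    segs = segments(word, channels)
    return all(any(seg[c] == "1" for seg in segs) for c in range(channels))


def no_repeat_in_adjacent_cycles(word: str, channels: int = 4) -> bool:
    """True if no channel is ON in two consecutive cycles."""
    segs = segments(word, channels)
    return not any(
        a[c] == "1" and b[c] == "1"
        for a, b in zip(segs, segs[1:])
        for c in range(channels)
    )


good = "1000" + "0100" + "0010" + "0001"  # red, yellow, green, blue
bad = "1000" + "1000" + "0000" + "0000"   # red in cycle 1 and again in cycle 2
print(max_on_per_segment(good), uses_every_channel(good),
      no_repeat_in_adjacent_cycles(good))  # True True True
print(no_repeat_in_adjacent_cycles(bad))   # False
```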
- only valid candidate codewords are generated.
- a first candidate code word is selected and a second candidate code word is selected according to the methods described herein.
- the candidate code words can be checked to determine if the specified set of constraint criteria (e.g., the OR-robust criterion, etc.) are met and, if so, an additional (e.g., third, fourth, etc.) candidate code word can be selected and checked against the first two or more code words to see if the specified set of constraint criteria are still met for each pair, etc.
- the first layer includes constraints imposed on any single codeword, e.g., a minimum hamming weight, minimum number of channels, maximum number of channels, etc.
- the third codeword (CW_3) is chosen such that CW_3 satisfies all single codeword constraints, in addition to being sufficiently far from CW_1 and CW_2 in Hamming space, and it has to satisfy the third layer of constraints, i.e., the “pairs of pairs of codewords” constraints.
- a first random starting codeword is selected having exactly a specific number of ON bits (e.g., five ON bits).
- testing of candidate codewords against all valid codewords is a computationally expensive process.
- testing candidate codewords for inclusion in the list of valid codewords can be sped up so that a candidate codeword does not need to be tested against all valid codewords to ensure that adding the candidate codeword does not cause the codebook to violate the OR-robust property.
- a test is performed to determine if adding the candidate codeword to the codebook will violate the OR-robust constraint.
- the test is to bitwise OR the candidate codeword with all valid codewords in the codebook and compare the OR-ed candidates against bitwise ORs of all pairs of valid codewords.
- the time complexity of this test is quadratic time and, thus, computationally expensive.
- the maximum OR robust radius for a codebook of codewords having five ON bits per codeword is 20.
- an OR-robust radius is selected for a codebook that is 2, 3, 4, 5, 6, 7, 8, 9, or 10.
- a constraint of a higher OR-robust radius will generate a codebook that does not have enough codewords for a given purpose, for example, for a whole transcriptome analysis.
- an OR-robust radius of about 2 to about 10 will generate codebooks having a suitable number of valid codewords for in situ analysis.
- a process is used to efficiently check whether adding a candidate codeword to the codebook would violate the or_robust_radius constraint. To solve this problem in less than quadratic time, we can take advantage of the special structure of our codewords and our codebook.
- a property of the codebook is that all codewords in the codebook are at least Hamming distance of 6 apart. In various embodiments, all codewords have exactly a specific number of ON bits set (e.g., five ON bits).
- a faster algorithm using knowledge of the codebook properties described above is as follows:
- this algorithm reduces the time complexity of OR-robust validation of candidate codewords because the minimum Hamming distance between a first pair of ORed codeword pairs is a function of the number of set bits that each codeword in an ORed second pair has with the ORed first pair.
- FIGS. 1A-1B provide a non-limiting example of a flowchart for a process 100 for generating an OR-robust codebook.
- Process 100 can be performed, for example, using one or more electronic devices implementing software configured to perform the process 100.
- process 100 is performed using a client-server system, and the blocks of process 100 are divided up between the server and multiple client devices.
- although portions of process 100 are described herein as being performed by particular devices of a client-server system, it will be appreciated that process 100 is not so limited.
- process 100 is performed using only a client device or only multiple client devices.
- some blocks are, optionally, combined, the order of some blocks is, optionally, changed, and some blocks are, optionally omitted.
- additional steps may be performed in combination with the process 100. Accordingly, the operations as illustrated (and described in greater detail below) are exemplary by nature and, as such, should not be viewed as limiting.
- a plurality of candidate code words are generated randomly (e.g., by one or more processors of a system configured to perform the process illustrated in FIG. 1A).
- the randomly generated set of candidate code words may comprise at least 1,000, 5,000, 10,000, 20,000, 40,000, 60,000, 80,000, 100,000, or more than 100,000 unique candidate code words.
- the candidate code words (and the selected set of filtered code words) may comprise binary code words, e.g., code words comprising a series (or string) of binary values (i.e., “1” or “0”).
- the candidate binary code words may comprise code words of at least 20 bits, 40 bits, 60 bits, 80 bits, 100 bits, 120 bits, 140 bits, 160 bits, 180 bits, or more than 180 bits in length.
- a codebook comprising binary code words of length 20 bits may include up to 1,048,576 unique code words
- the candidate code words (and the selected set of filtered code words) may comprise a series of code word segments, e.g., 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, or more than 20 code word segments, where each code word segment comprises, e.g., 2, 3, 4, or more than 4 bits.
- Each code word segment may represent a unique imaging cycle of a plurality of imaging cycles where a sample is imaged in a plurality of color channels (e.g., red, yellow, green, and blue color channels).
- a binary code word of total length 60 bits includes 15 code word segments of 4 bits each.
- a binary code word of total length 80 bits includes 20 code word segments of 4 bits each.
- more than 100 imaging cycles are supported (e.g., enough imaging cycles to analyze a full transcriptome).
- the candidate code words may optionally be filtered to remove code words that don’t conform to, e.g., a constraint on a maximum number of ON bits per code word segment.
- a maximum number of ON bits per code word segment may be 1 bit, 2 bits, 3 bits, 4 bits, 5 bits, 6 bits, 7 bits, 8 bits, or more than 8 bits depending on the length of the code word segment.
- the candidate code words may optionally be filtered to remove code words that don’t conform to one or more additional constraints, e.g., a maximum number of ON bits allowed per code word (e.g., 5, 6, 7, 8, 9, 10, or more than 10 ON bits depending on the length of the code word), exclusion of code words from a predetermined list of selected code words, etc.
- the plurality of candidate code words may optionally be filtered to remove candidate code words that don’t conform to a specified edit distance criterion, e.g., a criterion that Hamming Distance (CWA, CWB) ≥ Q for all pairwise combinations with other candidate code words of the plurality, where CWA and CWB are candidate code words, and predetermined distance parameter Q is an integer having a value of greater than or equal to 1.
- Q is 2k+1, where k is an integer having a value greater than or equal to 1.
- Hamming distance is a special case of an edit distance, a class of metrics used to compare and evaluate distances between two character strings, which allow for three kinds of edit operations to be performed on the characters of one string to transform it into the other string (e.g., substitution, insertion, or deletion of a single character).
- Other examples of edit distances include the Longest Common Subsequence Distance (LCSD) and the Levenshtein distance (LevD).
- the Levenshtein distance allows for deletion, insertion and substitution.
- the Longest Common Subsequence Distance allows for insertion and deletion, but not substitution.
- the Hamming distance allows only substitution, and hence only applies to strings of the same length.
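- the three edit distances can be compared with the following non-limiting Python sketch, which uses textbook dynamic-programming formulations. The function names and the example strings are hypothetical.

```python
def hamming(a: str, b: str) -> int:
    """Substitutions only; defined only for strings of equal length."""
    if len(a) != len(b):
        raise ValueError("Hamming distance requires equal-length strings")
    return sum(x != y for x, y in zip(a, b))


def levenshtein(a: str, b: str) -> int:
    """Insertions, deletions, and substitutions, each with cost 1."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(prev[j] + 1,              # delete x
                           cur[j - 1] + 1,           # insert y
                           prev[j - 1] + (x != y)))  # substitute or match
        prev = cur
    return prev[-1]


def lcs_distance(a: str, b: str) -> int:
    """Insertions and deletions only: len(a) + len(b) - 2 * LCS length."""
    prev = [0] * (len(b) + 1)
    for x in a:
        cur = [0]
        for j, y in enumerate(b, 1):
            cur.append(prev[j - 1] + 1 if x == y else max(prev[j], cur[j - 1]))
        prev = cur
    return len(a) + len(b) - 2 * prev[-1]


# A one-position shift: four substitutions, or one deletion plus one insertion.
print(hamming("1010", "0101"), levenshtein("1010", "0101"),
      lcs_distance("1010", "0101"))  # 4 2 2

# A single flipped bit: one substitution, or one deletion plus one insertion.
print(hamming("1000", "0000"), levenshtein("1000", "0000"),
      lcs_distance("1000", "0000"))  # 1 1 2
```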
- the use of a higher value of k provides greater error detection and correction capability for overcoming noisy signal detection during image-based decoding.
- the minimum acceptable value of k is determined based on the observed signal detection error rate for a given instrument used to perform decoding.
- the remaining candidate code words may be filtered to remove candidate code words that don’t conform to another specified edit distance criterion, e.g., a criterion that Hamming Distance (CWA | CWB, CWC | CWD) ≥ K for all logical bitwise OR combinations of any two candidate code words in a plurality of remaining candidate code words, where CWA | CWB indicates the logical bitwise OR combination of code words CWA and CWB, CWC | CWD indicates the logical bitwise OR combination of code words CWC and CWD, and K is an integer having a value greater than or equal to 1. In some instances, K may be equal to 1, 2, 3, 4, 5, 6, 7, 8, etc.
- the value of K is selectable by a user during design of the codebook.
- the hamming distance between two pairs of pairs of codewords is between 0 and the sum of their hamming weights (inclusive).
- the maximum OR-robust Hamming distance between pairs of pairs of codewords is 20 (i.e., none of the four codewords share any bit with any other codeword).
- the maximum OR-robust Hamming distance between pairs of pairs of codewords is 24.
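- the upper bound equal to the sum of the Hamming weights can be illustrated with a non-limiting Python sketch. Four code words with non-overlapping ON bits reach the bound: 20 when each has five ON bits and 24 when each has six. The helper `disjoint_codewords` is hypothetical and, for brevity, packs ON bits contiguously rather than honoring a one-ON-bit-per-segment constraint.

```python
def hd(a: int, b: int) -> int:
    """Hamming distance between two bit masks."""
    return bin(a ^ b).count("1")


def disjoint_codewords(weight: int, count: int = 4) -> list[int]:
    """`count` code words of the given weight with non-overlapping ON bits."""
    return [((1 << weight) - 1) << (i * weight) for i in range(count)]


a, b, c, d = disjoint_codewords(5)
print(hd(a | b, c | d))  # 20: sum of the four Hamming weights

# When the two pairs share a code word, only the unshared words contribute.
print(hd(a | b, a | c))  # 10

a6, b6, c6, d6 = disjoint_codewords(6)
print(hd(a6 | b6, c6 | d6))  # 24
```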
- a first portion of the plurality of code words in the codebook satisfies a constraint that the Hamming Distance (CWA | CWB, CWC | CWD) ≥ K1 for all logical bitwise OR combinations of any two candidate code words in the first portion
- a second portion of the plurality of code words in the codebook satisfies a constraint that the Hamming Distance (CWA | CWB, CWC | CWD) ≥ K2 for all logical bitwise OR combinations of any two candidate code words in the second portion, where K1 ≠ K2.
- the values of K1 and K2 are selectable by a user during design of the codebook.
- Such codebooks comprising a first portion of code words and a second portion of code words that satisfy different OR-robust constraints may be useful, for example, in situations where it is desirable to decode a first set of genes/transcripts with higher accuracy than a second set, so may use OR-robust code words that have a higher value of K (i.e., a stronger OR-robust criterion) for the first portion of code words than the remaining code words.
- the remaining candidate code words may optionally be filtered to remove candidate code words that don’t conform to, e.g., a constraint on a maximum number of ON bits per code word segment.
- a maximum number of ON bits per code word segment may be 1 bit, 2 bits, 3 bits, 4 bits, 5 bits, 6 bits, 7 bits, 8 bits, or more than 8 bits depending on the length of the code word segment.
- the remaining candidate code words may optionally be filtered to remove candidate code words that don’t conform to one or more additional constraints, e.g., a maximum number of ON bits allowed per code word (e.g., 3, 4, 5, 6, 7, 8, 9, 10, or more than 10 ON bits depending on the length of the code word), exclusion of code words from a predetermined list of selected code words, etc.
- an OR-robust codebook is output that comprises a plurality of selected code words that meet the specified list of constraints.
- each code word of the plurality of selected code words comprises M x N bits, where M is a number of sequencing or probing cycles and N is a number of optical detection channels in an instrument configured to perform the in situ decoding.
- the OR-robust codebook may be tailored for a specific in situ detection or sequencing application by assigning one or more code words contained therein to each of a plurality of barcoded target molecules (or target analytes) of interest. In some instances, more than one code word may be assigned to a single barcoded target molecule.
- the code words thus correspond to and represent the physical barcodes (e.g., oligonucleotide barcode sequences) attached to the target molecules in a multiplexed in situ assay, where the relationship between the structure of the code words and the structure of the physical barcodes depends on the read-out method (e.g., hybridization probe-based detection or nucleic acid sequencing) used in the in situ assay.
- the Hamming distance between the logical bitwise OR combinations of any two pairs of codewords will be equal to 1 or 2. In various embodiments, the Hamming distance between the logical bitwise OR combinations of at least one pair of pairs of codewords is 0.
- FIG. 2 provides a non-limiting example of a flowchart for a process 200 for assigning the code words in an OR-robust codebook to a corresponding list of target analytes.
- Process 200 can be performed, for example, using one or more electronic devices implementing a software platform.
- process 200 is performed using a client-server system, and the blocks of process 200 are divided up between the server and multiple client devices.
- although portions of process 200 are described herein as being performed by particular devices of a client-server system, it will be appreciated that process 200 is not so limited. In other examples, process 200 is performed using only a client device or only multiple client devices.
- in process 200, some blocks are, optionally, combined, the order of some blocks is, optionally, changed, and some blocks are, optionally, omitted. In some examples, additional steps may be performed in combination with the process 200. Accordingly, the operations as illustrated (and described in greater detail below) are exemplary by nature and, as such, should not be viewed as limiting.
- a list of code words from an OR-robust codebook is received (e.g., by one or more processors of a system configured to perform the process illustrated in FIG. 2).
- the OR-robust codebook may be, for example, a codebook generated using the process illustrated in FIG. 1 for which valid code words comply with the “OR-robust” constraint that Hamming Distance (CWA | CWB, CWC | CWD) ≥ K for all logical bitwise OR combinations of any two candidate code words in a plurality of remaining candidate code words, where K is an integer having a value of greater than or equal to 1.
- a list of a plurality of target molecules (or target analytes) that are of interest in a particular experiment is received.
- the list may comprise a plurality of nucleic acids.
- the plurality of nucleic acids may comprise a plurality of genes.
- the plurality of nucleic acids may comprise a plurality of RNA transcripts.
- the plurality of target analytes may comprise a plurality of proteins.
- the plurality of target analytes may comprise a combination of nucleic acids (e.g., genes or transcripts) and proteins.
- a target analyte from the list is assigned to at least one code word from the plurality of code words. Step 206 is repeated until all target analytes on the list have been assigned at least one code word from the OR-robust codebook.
- a code word from the codebook may be randomly assigned to each of the one or more barcoded target analytes.
- specific code words from the codebook e.g., those with the largest OR-robust distances
- a code word from the code book may be assigned to each of the one or more barcoded target analytes based on a decision rule designed to ensure that a total number of ON signals detected in a given image of the plurality of images is within ±5%, ±10%, ±15%, or ±20% of a mean number of ON signals detected per image for the plurality of images.
- a code word from the code book may be assigned to each of the one or more barcoded target analytes based on a decision rule designed to minimize a maximum predicted density of ON signals detected in images of the plurality of images.
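- By way of non-limiting illustration, a balance rule of the kind described above can be evaluated as follows. The sketch makes simplifying assumptions that are not part of the disclosure: each bit position of a code word is treated as one image, every assigned analyte contributes exactly one ON signal per "1" bit, and the names `on_counts_per_image` and `is_balanced` are hypothetical.

```python
def on_counts_per_image(code_words):
    """Count ON bits at each bit position (one position = one image, by assumption)."""
    n_bits = len(code_words[0])
    return [sum(cw[i] == "1" for cw in code_words) for i in range(n_bits)]


def is_balanced(code_words, tolerance=0.10):
    """True if every per-image ON count is within +/- tolerance of the mean count."""
    counts = on_counts_per_image(code_words)
    mean = sum(counts) / len(counts)
    return all(abs(c - mean) <= tolerance * mean for c in counts)


balanced = ["1100", "0011", "1010", "0101"]   # two ON signals in every image
skewed = ["1100", "1010", "1001"]             # first image carries three ON signals

print(on_counts_per_image(balanced), is_balanced(balanced))
print(on_counts_per_image(skewed), is_balanced(skewed))
```

In practice the counts would be weighted by expected analyte abundance rather than by one signal per code word.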
- a code word from the code book may be assigned to each of two or more barcoded target analytes based on expression data for the two or more barcoded target analytes in clustered cell types (e.g., where code words with the largest OR-robust distances are assigned to genes/transcripts with the highest expression levels), and where the clustered cell types represent a distribution of cell types found in the biological sample.
- the expression data for the two or more barcoded target analytes may comprise bulk gene expression data, bulk protein expression data, spatial gene expression data, spatial protein expression data, single cell gene expression data, single cell protein expression data, or any combination thereof.
- the two or more assigned code words are rank-ordered according to code word weight
- the two or more barcoded target analytes are rank-ordered according to a maximum expression level across all clustered cell types
- the two or more rank-ordered code words are assigned to the two or more rank-ordered barcoded target analytes using an iterative process repeated for each of the two or more barcoded target analytes in decreasing order of maximum expression level, the iterative process comprising: computing a predicted density of ON signals for every combination of remaining, unassigned code words and the barcoded target analyte across the plurality of images; selecting a code word from the remaining, unassigned code words that minimizes the predicted density of ON signals across the plurality of images; and assigning the selected code word to the barcoded target analyte.
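- By way of non-limiting illustration, the iterative assignment described above can be sketched as a greedy loop. The sketch assumes that the predicted density of ON signals in an image is the sum of the maximum expression levels of all analytes whose code word has a "1" at that bit position, and that "minimizes the predicted density" means minimizing the peak density over all images; both are assumptions, as are the function name `assign_code_words` and the example gene names and values.

```python
def assign_code_words(code_words, expression):
    """Greedily assign code words to analytes in decreasing order of expression.

    code_words: list of equal-length bit strings (one bit per image, by assumption).
    expression: dict mapping analyte name -> maximum expression level.
    Returns (assignment dict, predicted per-image density list).
    """
    load = [0.0] * len(code_words[0])   # predicted ON-signal density per image
    remaining = list(code_words)
    assignment = {}
    for analyte in sorted(expression, key=expression.get, reverse=True):
        level = expression[analyte]

        def peak_density(cw):
            # Peak per-image density if this code word were given to the analyte.
            return max(l + level * (bit == "1") for l, bit in zip(load, cw))

        best = min(remaining, key=peak_density)
        remaining.remove(best)
        assignment[analyte] = best
        load = [l + level * (bit == "1") for l, bit in zip(load, best)]
    return assignment, load


code_words = ["1100", "0011", "1010", "0101"]
expression = {"geneA": 100, "geneB": 80, "geneC": 5}  # hypothetical values
assignment, load = assign_code_words(code_words, expression)
print(assignment)
print(load)
```

Ties are broken by the order of the input list; a fuller implementation might instead break ties by code word weight or OR-robust distance.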
- the updated OR-robust codebook that includes code word - target analyte assignments is output.
- FIG. 3 provides a non-limiting example of a flowchart for a process 300 for decoding optical signals derived from images of a biological sample to identify barcoded target analytes.
- Process 300 can be performed, for example, using one or more electronic devices implementing a software platform.
- In some examples, process 300 is performed using a client-server system, and the blocks of process 300 are divided up between the server and multiple client devices.
- While portions of process 300 are described herein as being performed by particular devices of a client-server system, it will be appreciated that process 300 is not so limited. In other examples, process 300 is performed using only a client device or only multiple client devices.
- In process 300, some blocks are, optionally, combined, the order of some blocks is, optionally, changed, and some blocks are, optionally, omitted. In some examples, additional steps may be performed in combination with the process 300. Accordingly, the operations as illustrated (and described in greater detail below) are exemplary by nature and, as such, should not be viewed as limiting.
- an OR-robust codebook comprising a plurality of valid code words and their corresponding target analytes is received (e.g., by one or more processors of a system configured to perform the process illustrated in FIG. 3).
- the OR-robust codebook may be, for example, a codebook generated using the process illustrated in FIG. 1 for which valid code words comply with the “OR-robust” constraint that Hamming Distance (CWA | CWB, CWC | CWD) ≥ K for all logical bitwise OR combinations of any two candidate code words in a plurality of remaining candidate code words, where K is an integer having a value of greater than or equal to 1.
- all valid code words in the OR-robust codebook may comply with the “OR-robust” constraint that Hamming Distance (CWA | CWB, CWC | CWD) ≥ K for all logical bitwise OR combinations of any two candidate code words in a plurality of remaining candidate code words, where K is an integer having a value of greater than or equal to 1.
- At least a first portion of the valid code words in the OR-robust codebook may comply with the “OR-robust” constraint that Hamming Distance (CWA | CWB, CWC | CWD) ≥ K for all logical bitwise OR combinations of any two candidate code words in a plurality of remaining candidate code words, where K is an integer having a value of greater than or equal to 1.
- the first portion of the valid code words may comprise, for example, at least 10%, 20%, 30%, 40%, 50%, 60%, 70%, 80%, or 90% of the valid code words in the OR-robust codebook, and the remaining portion of the valid code words may not comply with the OR-robust property.
- the codebook may comprise at least 50 valid code words (e.g., at least 50, 75, 100, 200, 300, 400, 500, 600, 700, 800, or 1,000 code words). In some instances, the codebook may comprise up to 300,000 valid code words (e.g., up to 10,000, 20,000, 30,000, 40,000, 50,000, 60,000, 70,000, 80,000, 90,000, 100,000, 120,000, 140,000, 160,000, 180,000, 200,000, 220,000, 240,000, 260,000, 280,000, or 300,000 valid code words).
- a plurality of images of a biological sample is received, where the plurality of images was acquired over a plurality of decoding (sequencing or probing) cycles, and where each image comprises a plurality of observed optical signals.
- the biological sample may comprise a tissue sample.
- the biological sample may comprise cells, e.g., cells derived from a cell culture, a tissue sample, or cells deposited on a surface.
- the biological sample may comprise, e.g., a tissue specimen that has been fixed, embedded, and/or cleared as described elsewhere herein.
- the plurality of images may comprise a plurality of images comprising different fields-of-view of the biological sample.
- one or more images e.g., 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, or more than 10 images
- each decoding (probing or sequencing) cycle as necessary to image the entire cross-sectional area of the biological sample.
- the plurality of images may comprise a plurality of z-stack images of the biological sample.
- a z-stack of images i.e., a series of images acquired at each of two or more focal planes within the thickness of the biological sample
- each z-stack of images may comprise 2, 3, 4, 5, 6, 7, 8, 9, 10, or more than 10 images acquired at different focal planes within the thickness of the biological sample.
- the plurality of observed optical signals may comprise signal intensity measurements based on the plurality of images. In some instances, the plurality of optical signals may represent light emitted from a plurality of fluorophores.
- the plurality of observed optical signals is decoded to obtain a plurality of observed code words.
- decoding the plurality of observed optical signals in the plurality of images to obtain a plurality of observed code words may comprise: determining a location of each observed optical signal in a first image of the plurality of images; aligning the location of each observed optical signal (or corresponding feature, e.g., an RCP derived from a target analyte) in the first image to a corresponding location of each observed optical signal (or corresponding feature, e.g., an RCP derived from a target analyte) in the remaining images of the plurality of images to obtain a series of observed optical signals at each location; and obtaining the plurality of observed code words based on the series of observed optical signals at each location.
- aligning the locations of each optical signal (or corresponding feature, e.g., an RCP derived from a target analyte) in the first image to corresponding locations in the remaining images of the plurality of images may comprise registering the plurality of images acquired over the plurality of decoding (sequencing or probing) cycles.
- the alignment may comprise determining that optical signals derived from features (e.g., RCPs derived from target analytes) in different images arise from the same feature if the features in different images are within about 5, 10, 15, 20, 40, or 50 nm of each other.
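- By way of non-limiting illustration, the distance-based matching of features across images can be sketched as a nearest-neighbor search with a tolerance. The sketch assumes the images have already been registered, that feature locations are (x, y) tuples in the same length unit as the tolerance, and that the nearest feature within the tolerance is taken as the match; the function name `match_spots` and the coordinates are hypothetical.

```python
import math


def match_spots(reference, other, tol):
    """For each reference (x, y), return the index of the nearest spot in `other`
    that lies within `tol`, or None if no spot is close enough."""
    matches = []
    for rx, ry in reference:
        best, best_d = None, tol
        for i, (ox, oy) in enumerate(other):
            d = math.hypot(ox - rx, oy - ry)
            if d <= best_d:
                best, best_d = i, d
        matches.append(best)
    return matches


# Hypothetical spot locations in a first-cycle image and a later-cycle image.
cycle_1 = [(0.0, 0.0), (100.0, 100.0), (200.0, 200.0)]
cycle_2 = [(100.01, 100.0), (0.02, 0.0), (50.0, 50.0)]

print(match_spots(cycle_1, cycle_2, tol=0.05))
```

Repeating the match for every cycle yields, at each location, the series of observed optical signals from which an observed code word is built. The quadratic search is for clarity; a spatial index would be preferable for large numbers of features.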
- each observed optical signal of the plurality of observed optical signals in each image has associated therewith at least one value.
- the at least one value includes an intensity value of the observed optical signal for each color channel.
- the at least one value includes one or more statistical parameters, such as, for example, mean brightness, median brightness, variance, or standard deviation.
- the at least one intensity value may comprise an analog intensity value.
- an analog intensity value is determined for each light signal (or lack of light signal) in each color channel for each cycle of the plurality of cycles. For example, an analog intensity value may have a range of 0 (no intensity observed) to 12,000 (e.g., a full well capacity of each pixel in the sensor array).
- Full well capacity is defined as the amount of charge that can be stored within an individual pixel without the pixel becoming saturated (when an individual pixel can no longer accept any more photoelectrons). Full well capacity is dependent on the pixel size of the sensor and the camera operating voltages.
- the analog intensity value is a sum of intensities from multiple pixels (e.g., two or more adjacent pixels). In some instances, the analog intensity value is an area under the curve. In some instances, an amplitude of an observed optical signal is a value of the peak of the spot, constrained by the well depth. In some instances, an analog intensity value is determined at a specific position for a detected RCP in each color channel. In some instances, presence of signal is detected separately in each color channel.
- the separately detected signals are combined into a single vector or array.
- an intensity measurement in that channel will be missing or assigned a value of zero or null.
- An exemplary set of analog intensity values for a single RCP detected during a single imaging cycle may be {11000, 100, 0, 0}, indicating that a high intensity was detected in the first color channel (e.g., red color channel), a small amount of intensity was detected in a second color channel (e.g., yellow color channel), and no intensity was detected in the third (e.g., green color channel) and fourth color channels (e.g., blue color channel).
- the intensity values are binned into a single bin of a plurality of bins, where each bin represents a range of intensity values.
- the plurality of bins includes more than 2 bins.
- the plurality of bins includes 3 bins, 4 bins, 5 bins, 6 bins, 7 bins, 8 bins, 9 bins, 10 bins, 11 bins, 12 bins, 13 bins, 14 bins, 15 bins, 16 bins, 17 bins, 18 bins, 19 bins, 20 bins, 21 bins, 22 bins, 23 bins, 24 bins, 25 bins, etc.
- the plurality of bins includes more than 25 bins.
- the plurality of bins includes up to 100 bins.
- the plurality of bins may include 4 bins as follows: Bin 0 is 0 to 2999; Bin 1 is 3000 to 5999; Bin 2 is 6000 to 8999; Bin 3 is 9000 to 12000.
- intensity values from 0 to 2999 are binned into Bin 0, intensity values from 3000 to 5999 are binned into Bin 1, intensity values from 6000 to 8999 are binned into Bin 2, and intensity values from 9000 to 12000 are binned into Bin 3.
- each of the plurality of bins has approximately equal size (as per the example above).
- the plurality of bins has different sizes, for example, where Bin 0 is 0 to 1999; Bin 1 is 2000 to 4999; Bin 2 is 5000 to 8999; Bin 3 is 9000 to 12000.
- each intensity value in a set of intensity values is represented by the number of the bin into which the intensity value is placed.
- An exemplary set of binned intensity values for a single RCP detected during a single imaging cycle may be {3, 1, 0, 0}, indicating that a high intensity was detected in the first color channel (e.g., red color channel), a small amount of intensity was detected in a second color channel (e.g., yellow color channel), and no intensity was detected in the third (e.g., green color channel) and fourth color channels (e.g., blue color channel).
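- By way of non-limiting illustration, the binning described above can be sketched using the lower edges of Bins 1 to 3 as cut points. The bin ranges are those given in the examples above; the function name `bin_intensities`, the constant names, and the sample intensity values are hypothetical.

```python
from bisect import bisect_right

# Lower edges of Bin 1, Bin 2, and Bin 3 for the two example layouts.
EQUAL_EDGES = [3000, 6000, 9000]     # Bin 0: 0-2999, ..., Bin 3: 9000-12000
UNEQUAL_EDGES = [2000, 5000, 9000]   # Bin 0: 0-1999, ..., Bin 3: 9000-12000


def bin_intensities(values, edges=EQUAL_EDGES):
    """Replace each analog intensity by the number of the bin it falls into."""
    return [bisect_right(edges, v) for v in values]


print(bin_intensities([11000, 3500, 0, 0]))
print(bin_intensities([2500, 2500, 0, 0], UNEQUAL_EDGES))
```

Because `bisect_right` counts the edges that are less than or equal to the value, an intensity exactly on a lower edge (e.g., 3000) falls into the higher bin, consistent with the ranges listed above.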
- the at least one intensity value may comprise a raw intensity value, a normalized intensity value, or a calculated intensity value calculated based on at least one of: a size of a feature corresponding to the observed optical signal (e.g., the radius of an imaged RCP), a circularity of a feature corresponding to the observed optical signal (e.g., the circularity of an imaged RCP), or one or more Gaussian statistical parameters (e.g., mean, standard deviation, variance, etc.) characterizing a feature corresponding to the observed optical signal (e.g., an imaged RCP).
- pixel intensity values in an image are normalized based on pixel intensity of background signals and pixel intensity of puncta detected within the image.
- pixel values of an image are scaled using a background measurement (e.g., a mean or median of background intensities) as a floor and a predetermined intensity percentile (e.g., 99th intensity percentile) of the detected puncta as a ceiling.
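- By way of non-limiting illustration, floor-and-ceiling scaling of this kind can be sketched as follows. The sketch assumes the median of the background intensities as the floor, a nearest-rank definition of the percentile for the ceiling, and clipping of scaled values to the range 0 to 1; the clipping, the percentile definition, and the names `percentile` and `normalize` are assumptions rather than details taken from the disclosure.

```python
from statistics import median


def percentile(values, pct):
    """Nearest-rank percentile (one of several common definitions)."""
    ordered = sorted(values)
    rank = max(1, -(-pct * len(ordered) // 100))  # ceiling of pct * n / 100
    return ordered[rank - 1]


def normalize(pixels, background, puncta, pct=99):
    """Scale pixel values so the background floor maps to 0 and the puncta ceiling to 1."""
    floor = median(background)
    ceiling = percentile(puncta, pct)
    span = ceiling - floor
    if span <= 0:
        raise ValueError("ceiling must exceed floor")
    # Clip to [0, 1] so outliers do not exceed the chosen range (an assumption).
    return [min(1.0, max(0.0, (p - floor) / span)) for p in pixels]


background = [90, 100, 110]      # hypothetical background intensities
puncta = [500, 1100, 900]        # hypothetical puncta intensities
print(normalize([100, 600, 1100, 50, 5000], background, puncta))
```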
- intensity values are normalized.
- the intensity values of puncta e.g., observed optical signals
- a high percentile value e.g., 99th percentile
- the values are scaled by the median raw intensity over all images to bring the values back into an intensity range similar to the original observed values.
- in a third step, for every decoding neighborhood of puncta (e.g., a predetermined radius around a specific punctum), the intensity values of all puncta are divided by the intensity value of the central punctum of the neighborhood, so that systematically dimmer puncta may still decode while variance in the brightness values is penalized.
- the first and/or second steps may be omitted.
- the third step reduces FOV-to-FOV decoding variability ("global decoding").
- the process may further comprise comparing the at least one intensity value representing an intensity of each observed optical signal to a predetermined intensity threshold to determine a binary value representing the intensity of each observed optical signal.
- each binary value comprises a 1 or a 0, wherein 1 represents an observed optical signal for which intensity is greater than or equal to the predetermined intensity threshold (e.g., an ON signal), and 0 represents an observed optical signal for which intensity is less than the predetermined intensity threshold (e.g., an OFF signal).
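- By way of non-limiting illustration, the thresholding described above reduces to a single comparison per intensity value. The threshold of 6000 and the function name `to_binary` are hypothetical; the disclosure does not specify a particular threshold value.

```python
def to_binary(intensities, threshold):
    """Map each intensity to 1 (ON) if it meets the threshold, else 0 (OFF)."""
    return [1 if value >= threshold else 0 for value in intensities]


# Analog intensities for one feature in four color channels (example values).
print(to_binary([11000, 100, 0, 0], threshold=6000))
```

An intensity exactly equal to the threshold is treated as an ON signal, matching the "greater than or equal to" wording above.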
- decoding the plurality of observed optical signals in the plurality of images may comprise obtaining the plurality of observed code words based on a series of binary values determined for each location.
- each observed code word of the plurality of code words may comprise a plurality of code word segments, and each code word segment may comprise a specified string of binary values that corresponds to one of a specified set of observed optical signal states.
- each code word segment comprises, for example, a four bit string of binary values such that:
- a code word segment of 1000 corresponds to a first optical signal state, A, in which an optical signal is detected in a first detection channel of a four-channel optical imaging instrument, and no optical signal is detected in a second, third, or fourth detection channel of the four-channel optical imaging instrument;
- a code word segment of 0100 corresponds to a second optical signal state, B, in which an optical signal is detected in the second detection channel of the four-channel optical imaging instrument, and no optical signal is detected in the first, third, or fourth detection channel of the four-channel optical imaging instrument;
- a code word segment of 0010 corresponds to a third optical signal state, C, in which an optical signal is detected in the third detection channel of the four-channel optical imaging instrument, and no optical signal is detected in the first, second, or fourth detection channel of the four-channel optical imaging instrument;
- a code word segment of 0001 corresponds to a fourth optical signal state, D, in which an optical signal is detected in the fourth detection channel of the four-channel optical imaging instrument, and no optical signal is detected in the first, second, or third detection channel of the four-channel optical imaging instrument;
- a code word segment of 0000 corresponds to a fifth optical signal state, E, in which no optical signal is detected in any of the first, second, third, or fourth detection channels of the four-channel optical imaging instrument.
- the valid code words in the codebook are stored in a database using the optical signal state format.
- a valid codeword for a 15 cycle run may be AEEEDEEBEEEAECE.
- the observed optical signal is converted into the optical signal state format before assigning the observed optical signal to a valid code word.
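- By way of non-limiting illustration, conversion between the optical signal state format and the binary format can be sketched with a lookup table built from the five states A to E described above. The names `STATE_TO_BITS`, `BITS_TO_STATE`, `states_to_binary`, and `binary_to_states` are hypothetical.

```python
# Four-bit segment for each optical signal state, as listed above.
STATE_TO_BITS = {"A": "1000", "B": "0100", "C": "0010", "D": "0001", "E": "0000"}
BITS_TO_STATE = {bits: state for state, bits in STATE_TO_BITS.items()}


def states_to_binary(states):
    """Expand a state string (one letter per cycle) into a binary code word."""
    return "".join(STATE_TO_BITS[s] for s in states)


def binary_to_states(bits, segment_length=4):
    """Collapse a binary code word back into one state letter per segment."""
    segments = [bits[i:i + segment_length] for i in range(0, len(bits), segment_length)]
    return "".join(BITS_TO_STATE[segment] for segment in segments)


word = "AEEEDEEBEEEAECE"          # the 15-cycle example given above
binary = states_to_binary(word)
print(len(binary), binary.count("1"))
print(binary_to_states(binary) == word)
```

A segment with more than one ON bit has no entry in the table and raises a `KeyError` in this sketch; how such segments are handled in practice is not specified here.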
- a probabilistic method of decoding is used to map a set of intensity values (e.g., binary intensity values, analog intensity values, binned intensity values, etc.) from each cycle of the plurality of cycles to an observed optical signal state.
- a frequency table is generated using all possible combinations of intensity values for the color channels (e.g., four color channels) such that the frequency table maps each unique set of four intensity values to a most-likely optical signal state (e.g., A, B, C, D, or E).
- the frequency table is generated from previous runs of an opto-fluidic instrument.
- the frequency table is generated using a control sample.
- the frequency table is updated during a run (e.g., after each cycle) or after each run is complete.
- an observed set of binned intensity values {3, 1, 0, 0} may be most likely to map to state A based on the frequency table.
- an observed set of binned intensity values {3, 0, 0, 0} may be most likely to map to state A based on the frequency table.
- an observed set of binned intensity values {0, 1, 3, 1} may be most likely to map to state C based on the frequency table (e.g., the low intensities may be caused by autofluorescence or spectral crosstalk).
- an observed set of binned intensity values {0, 0, 2, 0} may be most likely to map to state C based on the frequency table.
- an observed set of binned intensity values {1, 1, 1, 1} may be most likely to map to state E based on the frequency table.
- an observed set of binned intensity values {0, 0, 0, 0} may be most likely to map to state E based on the frequency table.
- an observed set of binned intensity values {0, 0, 1, 1} may be most likely to map to state E based on the frequency table.
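- By way of non-limiting illustration, a frequency table of this kind can be sketched as a count of how often each set of binned intensity values was observed together with each known optical signal state, for example in a control sample. The labeled observations below are invented for illustration, and the names `build_frequency_table` and `decode_state` are hypothetical.

```python
from collections import Counter, defaultdict


def build_frequency_table(labeled_observations):
    """Map each tuple of binned intensities to its most frequently observed state."""
    counts = defaultdict(Counter)
    for binned, state in labeled_observations:
        counts[tuple(binned)][state] += 1
    return {binned: c.most_common(1)[0][0] for binned, c in counts.items()}


def decode_state(table, binned, default=None):
    """Look up the most likely state; return `default` for unseen combinations."""
    return table.get(tuple(binned), default)


# Invented training data: (binned intensities, known state).
observations = (
    [((3, 1, 0, 0), "A")] * 8 + [((3, 1, 0, 0), "B")] * 1
    + [((0, 1, 3, 1), "C")] * 5
    + [((1, 1, 1, 1), "E")] * 4 + [((1, 1, 1, 1), "A")] * 1
)
table = build_frequency_table(observations)
print(decode_state(table, (3, 1, 0, 0)), decode_state(table, (0, 1, 3, 1)))
```

Updating the table during or after a run amounts to adding new labeled observations to the counts and recomputing the most frequent state for the affected entries.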
- the optical signal state can be converted into a binary format as described above with respect to the code word segments.
- each observed code word is analyzed to determine if an assignment of the observed code word to one of the plurality of valid code words from the OR-robust codebook can be made, or alternatively, to determine that the observed code word is not a valid code word.
- Step 308 is repeated until all of the observed code words have been processed and either assigned to valid code words or classified as artifacts resulting from, e.g., non-specific hybridization of a labeled detection probe used for decoding or a sequencing error.
- determining the assignment of the observed code word to one of the plurality of valid code words may comprise identifying a valid code word of the plurality of valid code words that is identical to the observed code word.
- determining the assignment of the observed code word to one of the plurality of valid code words may comprise changing at least one of the binary values in the series of binary values corresponding to the observed code word to thereby assign the observed code word to a valid code word of the plurality of valid code words.
- determining the assignment of the observed code word to one of the plurality of valid code words may comprise determining a plurality of scores based on comparison of the observed code word to all or a portion of the plurality of valid code words.
- determining the assignment of the observed code word to one of the plurality of valid code words may further comprise selecting one of the plurality of valid code words having a highest score to assign as a replacement for the observed code word.
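- By way of non-limiting illustration, the assignment options described above (exact match, correction of one or more binary values, and score-based selection) can be sketched together. The sketch assumes a score equal to the negative Hamming distance, rejects an observed code word when the best candidate is more than `max_distance` bits away or when the best score is tied, and uses the hypothetical names `hamming` and `assign`; none of these choices is specified by the disclosure.

```python
def hamming(a, b):
    """Number of positions at which two equal-length bit strings differ."""
    return sum(x != y for x, y in zip(a, b))


def assign(observed, valid_code_words, max_distance=1):
    """Return the valid code word assigned to `observed`, or None if not valid."""
    if observed in valid_code_words:
        return observed                      # identical valid code word exists
    # Score every valid code word; a higher score means a closer match.
    scores = {cw: -hamming(observed, cw) for cw in valid_code_words}
    best = max(scores, key=scores.get)
    tied = [cw for cw, score in scores.items() if score == scores[best]]
    if -scores[best] > max_distance or len(tied) > 1:
        return None                          # too far away or ambiguous
    return best


valid = ["10000100", "00100001", "01001000"]   # hypothetical codebook
print(assign("10000100", valid))   # exact match
print(assign("10000110", valid))   # one bit away from the first code word
print(assign("11111111", valid))   # no sufficiently close code word
```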
- the presence (and location) of a barcoded target analyte within the biological sample is identified for each valid code word detected in the plurality of images.
- the identified target analyte may comprise a messenger RNA (mRNA) molecule or protein molecule.
- the process illustrated in FIG. 3 may comprise post-processing of a plurality of stored optical images to obtain the plurality of optical signals (and their respective locations) for subsequent use in determining observed code words and determining their assignment to valid code words in a codebook.
- all or a portion of the process illustrated in FIG. 3 may be performed in the cloud, e.g., using a received plurality of images or received optical signal data (and corresponding location data) previously derived from the plurality of images.
- the process may comprise: receiving a codebook comprising a plurality of valid code words, where, for all or a first portion of valid code words in the codebook, a first Hamming distance between a first logical bitwise OR combination of any pair of valid code words and a second logical bitwise OR combination of any other pair of valid code words is greater than or equal to 1; receiving a plurality of locations for a plurality of observed optical signals, where the plurality of observed optical signals are obtained from a plurality of images of a biological sample acquired over a plurality of sequencing or probing cycles; decoding the plurality of observed optical signals to obtain a plurality of observed code words at the plurality of locations; and for each observed code word: (i) determining an assignment of the observed code word to one of the plurality of valid code words, or (ii) determining that the observed code word is not a valid code word of the plurality of valid code words.
- FIGS. 4A-4B illustrate the structure of binary code words, in accordance with some implementations of the methods described herein.
- the code words correspond to and represent the physical barcodes (e.g., oligonucleotide barcode sequences) attached to the target molecules in a multiplexed in situ assay, where the relationship between the structure of the code words and the structure of the physical barcodes depends on the read-out method (e.g., hybridization probe-based detection or nucleic acid sequencing) used in the in situ assay.
- FIG. 4A depicts a non-limiting example of the structure of a binary code word for use with hybridization probe-based in situ detection of barcoded target analytes (described in more detail below).
- Each code word comprises a series of code word segments (e.g., 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, or more than 15 segments), where each code word segment comprises a series of bits (e.g., 2, 3, 4, or more than 4 bits), and where each bit in a given code word segment corresponds to the detection of an ON signal (“1”) or an OFF signal (“0”) in a given optical detection channel in a given decoding cycle.
- the number of bits in each code word segment corresponds to the number of optical detection channels (e.g., different fluorescence emission detection channels or different color detection channels) in an imaging instrument used to perform a cyclical decoding process comprising, e.g., 4, 5,
- each code word segment corresponds to optical signals detected in images acquired in a given decoding cycle after contacting the biological sample (e.g., a tissue specimen) with a set of detectably-labeled hybridization probes designed to hybridize to a segment of the physical barcode (e.g., a segment of the oligonucleotide barcode sequence).
- FIG. 4B depicts a non-limiting example of the structure of a binary code word for use with sequencing-based in situ detection of barcoded target analytes (described in more detail below).
- each code word comprises a series of code word segments (e.g., 4, 5, 6,
- each code word segment comprises a series of bits (e.g., 2, 3, 4, or more than 4 bits), and where each bit in a given code word segment corresponds to the detection of an ON signal (“1”) or an OFF signal (“0”) in a given optical detection channel in a given sequencing cycle.
- the number of bits in each code word segment corresponds to the number of optical detection channels (e.g., different fluorescence emission detection channels or different color detection channels) in an imaging instrument used to perform a cyclical sequencing process comprising, e.g., 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, or more than 15 sequencing cycles.
- the total number of bits in the binary code word is given by:
- each code word segment corresponds to optical signals detected in images acquired in a given sequencing cycle to determine the identity of a single nucleotide in the physical barcode (e.g., the oligonucleotide barcode sequence).
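- By way of non-limiting illustration, the segment structure of a binary code word can be sketched as follows. The sketch assumes one segment per cycle and one bit per optical detection channel, so that the code word length equals the number of cycles multiplied by the number of channels; this product is an inference from the structure described above rather than an expression reproduced from the disclosure, and the names `split_into_segments` and `on_channels` are hypothetical.

```python
def split_into_segments(code_word, n_channels):
    """Split a binary code word into one segment per cycle."""
    if len(code_word) % n_channels:
        raise ValueError("code word length must be a multiple of the channel count")
    return [code_word[i:i + n_channels] for i in range(0, len(code_word), n_channels)]


def on_channels(segment):
    """1-based indices of the detection channels with an ON signal in a segment."""
    return [channel for channel, bit in enumerate(segment, start=1) if bit == "1"]


code_word = "100000010100"        # hypothetical 3-cycle, 4-channel code word
segments = split_into_segments(code_word, n_channels=4)
print(segments)
print([on_channels(s) for s in segments])
print(len(segments) * 4 == len(code_word))
```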
- FIG. 5 provides a non-limiting schematic illustration of hybridization probe-based in situ detection of barcoded target analytes (or amplified representations, e.g., RCPs, thereof), where the barcodes comprise, e.g., oligonucleotide barcode sequences that have been assigned to a corresponding code word from an OR-robust codebook.
- the physical barcode sequences each comprise a series of short barcode (BC) segments (e.g., BC segment 1, BC segment 2, ..., BC segment M) with one barcode segment for each cycle in a cyclical decoding (probing) process (comprising M cycles in total) that is used to decode a set of optical signals associated with each barcode as detected in a plurality of images acquired of a biological sample during the cyclical decoding (probing) process.
- in each decoding (probing) cycle, a set of detectably-labeled hybridization probes (e.g., fluorescently-labeled hybridization probes) that are designed to hybridize to specific barcode segments is introduced into a biological sample (e.g., a tissue specimen that has been fixed, embedded, and/or cleared as described elsewhere herein) and allowed to hybridize to a corresponding barcode segment.
- the number of unique hybridization probes in the set is typically the same as the number of unique barcode segments to be probed in a given decoding (probing) cycle.
- all of the unique hybridization probes in the set may be labeled with a detectable label, e.g., a fluorescent label, where different hybridization probes in the set are labeled with a different fluorophore.
- only a subset of the unique hybridization probes in the set may be labeled with a detectable label, e.g., a fluorophore, where different hybridization probes in the subset are labeled with a different fluorophore.
- the number of different detectable labels, e.g., fluorophores, used in each set of detectably-labeled hybridization probes may be the same for sets used in different cycles of the hybridization probe-based decoding process. In some instances, the number of different detectable labels, e.g., fluorophores, used in each set of detectably-labeled hybridization probes may be different for sets used in different cycles of the hybridization probe-based decoding process.
- the number of different detectable labels, e.g., fluorophores, used in each set of detectably-labeled hybridization probes will depend on factors such as the number of different optical detection channels (e.g., one color, two color, three color, or four color detection) in the instrument used to perform decoding, and the design of the code words used in the multiplexed in situ assay (e.g., in some cases, an absence of signal in a given decoding cycle (i.e., an OFF signal) may be used as part of code word design).
- the total number of unique barcode segments to be probed in a given decoding (probing) cycle may be, e.g., 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, or more than 10 unique barcode segments (or unique hybridization probes).
- the biological sample is then imaged, and the image(s) (e.g., fluorescence image(s)) are processed using any of a variety of image processing techniques known to those of skill in the art to measure signal intensities at the locations of a plurality of barcoded target molecules (or amplified representations, e.g., RCPs, thereof).
- the plurality of barcoded target molecules may comprise, e.g., at least 10, 20, 30, 40, 50, 100, 200, 300, 400, 500, 600, 700, 800, 900, 1,000, 5,000, 10,000, 20,000, 30,000, 40,000, 50,000, 60,000, 70,000, 80,000, 90,000, or 100,000 barcoded target molecules (or amplified representations, e.g., RCPs, thereof).
- One or more images comprising different fields-of-view of the biological sample may be acquired in each cycle as necessary to image the entire cross-sectional area of the biological sample.
- a z-stack of images i.e., a series of images acquired at each of two or more focal planes within the thickness of the biological sample
- each z-stack of images may comprise 2, 3, 4, 5, 6, 7, 8, 9, 10, 20, 25, 30, 35, 40, 45, 50, or more than 50 images acquired at different focal planes within the thickness of the biological sample.
- the hybridized probes are stripped from the biological sample and the process is repeated for a specified number of cycles, M.
- FIG. 6 provides a non-limiting schematic illustration of sequencing-based in situ detection of barcoded target analytes (or amplified representations, e.g., RCPs, thereof), where the barcodes comprise, e.g., oligonucleotide barcode sequences that have been assigned to a corresponding code word from an OR-robust codebook.
- the physical barcode sequences each comprise an oligonucleotide sequence of M nucleotides in length, where one nucleotide is to be identified in each cycle in a cyclical decoding (base-by-base nucleic acid sequencing) process (comprising M cycles in total) that is used to decode a set of optical signals associated with each barcode as detected in a plurality of images acquired of a biological sample during the cyclical decoding (sequencing) process.
- Any of a variety of base-by-base sequencing techniques known to those of skill in the art may be used to determine barcode sequences in multiplexed in situ assays that utilize the disclosed codebook design methods.
- a set of detectably-labeled nucleotides e.g., fluorescently-labeled, 3’ reversibly terminated nucleotides
- a biological sample e.g., a tissue specimen that has been fixed, embedded, and/or cleared as described elsewhere herein
- a polymerase e.g., a reversed barcode sequence
- the number of unique 3’ reversibly terminated nucleotides in the set is typically the same as the number of unique nucleotide residues (typically four) that are potentially present at a given position in the barcode sequence to be probed in a given decoding (sequencing) cycle.
- all of the unique 3’ reversibly terminated nucleotides in the set may be labeled with a detectable label, e.g., a fluorescent label, where different nucleotides in the set are labeled with a different fluorophore.
- only a subset of the unique 3’ reversibly terminated nucleotides in the set may be labeled with a detectable label, e.g., a fluorophore, where different nucleotides in the subset are labeled with a different fluorophore.
- the number of different detectable labels, e.g., fluorophores, used in each set of detectably-labeled 3’ reversibly terminated nucleotides may be the same for sets used in different cycles of the sequencing-based decoding process. In some instances, the number of different detectable labels, e.g., fluorophores, used in each set of detectably-labeled 3’ reversibly terminated nucleotides may be different for sets used in different cycles of the sequencing-based decoding process.
- the number of different detectable labels, e.g., fluorophores, used in each set of detectably-labeled 3’ reversibly terminated nucleotides will depend on factors such as the number of different optical detection channels (e.g., one color, two color, three color, or four color detection) in the instrument used to perform decoding, and the design of the code words used in the multiplexed in situ assay (e.g., in some cases, an absence of signal in a given decoding cycle (i.e., an OFF signal) may be used as part of code word design).
- the total number of unique nucleotide residues to be probed in a given decoding (sequencing) cycle may be, e.g., 2, 3, or 4 (or more than 4 if non-natural nucleotides that obey similar base-pairing rules are included).
- the biological sample is then imaged, and the image(s) (e.g., fluorescence image(s)) are processed using any of a variety of image processing techniques known to those of skill in the art to measure signal intensities at the locations of a plurality of barcoded target molecules (or amplified representations, e.g., RCPs, thereof).
- the plurality of barcoded target molecules may comprise, e.g., at least 10, 20, 30, 40, 50, 100, 200, 300, 400, 500, 600, 700, 800, 900, 1,000, 5,000, 10,000, 20,000, 30,000, 40,000, 50,000, 60,000, 70,000, 80,000, 90,000, or 100,000 barcoded target molecules (or amplified representations, e.g., RCPs, thereof).
- One or more images comprising different fields-of-view of the biological sample may be acquired in each cycle as necessary to image the entire cross-sectional area of the biological sample.
- a z-stack of images, i.e., a series of images acquired at each of two or more focal planes within the thickness of the biological sample
- each z-stack of images may comprise 2, 3, 4, 5, 6, 7, 8, 9, 10, or more than 10 images acquired at different focal planes within the thickness of the biological sample.
- the 3’ reversibly terminated nucleotide that has been incorporated into the priming strand is deprotected and the process is repeated for a specified number of cycles, M.
- the processing of images acquired during the in situ decoding schemes illustrated in the flowcharts of FIG. 5 and FIG. 6 is performed in similar fashion.
- the images may be processed in real-time immediately following acquisition.
- the images may be post-processed, i.e., they may be stored in computer memory and processed at a later time.
- Processing of the image(s) acquired in each decoding (probing or sequencing) cycle results in the generation of a fluorescence data set for each decoding (probing or sequencing) cycle (e.g., fluorescence data set 1, fluorescence data set 2, …, fluorescence data set M), each comprising measured fluorescence signal intensities (in the case that the detectable labels comprise fluorophores) for each of the plurality of locations at which target molecules (or amplified representations, e.g., RCPs, thereof) are detected.
- the fluorescence data sets comprise measured fluorescence signal intensities for each of a plurality of target molecule locations (or the locations of amplified representations, e.g., RCPs, thereof) in two dimensions.
- the fluorescence data sets comprise measured fluorescence signal intensities for each of a plurality of target molecule locations (or the locations of amplified representations, e.g., RCPs, thereof) in three dimensions.
- the compiled set of fluorescence data sets (e.g., fluorescence data set 1, fluorescence data set 2, …, fluorescence data set M) may then be processed to identify a series of fluorescence signals at each of a plurality of target molecule locations (or the locations of amplified representations, e.g., RCPs, thereof) detected in the images acquired over the course of performing the M decoding (probing or sequencing) cycles.
- the fluorescence signals may comprise analog signals (i.e., continuous, real-valued fluorescence intensity signals, such as those obtained when using photomultipliers or photomultiplier arrays).
- the fluorescence signals may comprise digital signals (i.e., digitized renditions of continuous, real-valued fluorescence intensity signals, such as those obtained when using CMOS or CCD image sensors).
- the fluorescence signals may be processed, e.g., to perform one or more of background subtraction, normalization, fitting to a Gaussian or other line shape function, determination of a centroid position, etc. Any of a variety of image processing methods known to those of skill in the art may be used for image processing / pre-processing.
- Examples include, but are not limited to, Canny edge detection methods, Canny-Deriche edge detection methods, first-order gradient edge detection methods (e.g., the Sobel operator), second order differential edge detection methods, phase congruency (phase coherence) edge detection methods, other image segmentation algorithms (e.g., intensity thresholding, intensity clustering methods, intensity histogram-based methods, etc.), feature and pattern recognition algorithms (e.g., the generalized Hough transform for detecting arbitrary shapes, the circular Hough transform, etc.), and mathematical analysis algorithms (e.g., Fourier transform, fast Fourier transform, wavelet analysis, auto-correlation, etc.), or any combination thereof.
- the fluorescence signals may be processed and/or compared to a predetermined fluorescence intensity threshold to generate corresponding binary signal values (e.g., ON signals (“1”) or OFF signals (“0”)) that indicate whether or not a fluorescence signal of intensity greater than or equal to the predetermined fluorescence intensity threshold was detected in a given optical detection channel (e.g., a given fluorescence emission detection channel or a given color detection channel) for a given decoding (probing or sequencing) cycle.
- the series of binary signal values determined for each target molecule location (or the location of the amplified representation, e.g., RCP, thereof) in the series of M decoding (probing or sequencing) cycles may then be used, in combination with prior knowledge of the optical detection channels for which signals were detected in each decoding (probing or sequencing) cycle, to identify a plurality of observed code words corresponding to the plurality of barcoded target molecules.
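The thresholding and code word assembly described above can be sketched as follows. This is a minimal illustration, not the disclosed implementation: the nested `intensities[location][cycle][channel]` layout, the function names, the single global threshold, and the example values are all assumptions made for the sketch.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical data layout: intensities[location][cycle][channel] -> float
Intensities = Dict[str, Sequence[Sequence[float]]]


def binarize(value: float, threshold: float) -> int:
    """Return 1 (ON) if the intensity is >= the threshold, else 0 (OFF)."""
    return 1 if value >= threshold else 0


def observed_code_words(
    intensities: Intensities, threshold: float
) -> Dict[str, Tuple[int, ...]]:
    """Concatenate per-cycle, per-channel binary values into one word per location."""
    words: Dict[str, Tuple[int, ...]] = {}
    for location, cycles in intensities.items():
        bits: List[int] = []
        for channels in cycles:        # cycles 1..M, in acquisition order
            for value in channels:     # fixed channel order within each cycle
                bits.append(binarize(value, threshold))
        words[location] = tuple(bits)
    return words


# Illustrative values only: two locations, three cycles, two channels per cycle.
example = {
    "spot_a": [[850.0, 40.0], [30.0, 910.0], [15.0, 22.0]],
    "spot_b": [[100.0, 99.9], [700.0, 650.0], [5.0, 100.0]],
}
words = observed_code_words(example, threshold=100.0)
```

In practice a threshold would more plausibly be set per channel and per cycle after background subtraction and normalization; a single value is used here only to keep the sketch short.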
- an observed code word may be identical to one of the valid code words from the OR-robust codebook and the identity of the corresponding target molecule can be determined directly from the OR-robust codebook assignments.
- an observed code word may correspond closely to one of the valid code words from the OR-robust codebook, but may not be an identical series of binary values.
- the properties of the OR-robust codebook may be used to detect and/or correct errors arising from, e.g., non-specific hybridization of detectably-labeled probes, or sequencing errors, and thereby assign the observed code word to a valid code word from the OR-robust codebook.
- an observed code word may be assigned to a valid code word if changing one or more of the binary values (e.g., bits) in the series of binary values corresponding to the observed code word results in the observed code word being identical to a valid code word of the plurality of valid code words in the OR-robust codebook.
- an observed code word may be assigned to (e.g., replaced by) a valid code word based on determining a plurality of scores (e.g., pairwise edit distances, Hamming distances, and/or Hamming distances between logical bitwise OR code word combinations) based on comparison of the observed code word to all or a portion of the plurality of valid code words in the OR-robust codebook.
- the observed code word may be assigned to (e.g., replaced by) a valid code word that exhibits the highest score (e.g., the minimum edit distance, Hamming distance, and/or Hamming distance between logical bitwise OR code word combinations).
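As a rough illustration of distance-based assignment, the sketch below ranks valid code words by Hamming distance to an observed code word and also computes the distance to bitwise OR combinations of code word pairs. The toy codebook, the names `assign` and `nearest_or_pair`, the tie handling, and the `max_distance` cutoff are assumptions; the text does not specify how ties or OR-combination distances are resolved.

```python
from itertools import combinations
from typing import Dict, Optional, Tuple

Word = Tuple[int, ...]


def hamming(a: Word, b: Word) -> int:
    """Number of positions at which two equal-length code words differ."""
    if len(a) != len(b):
        raise ValueError("code words must have equal length")
    return sum(x != y for x, y in zip(a, b))


def bitwise_or(a: Word, b: Word) -> Word:
    """Logical bitwise OR of two code words (e.g., two overlapping signals)."""
    return tuple(x | y for x, y in zip(a, b))


def assign(observed: Word, codebook: Dict[str, Word], max_distance: int) -> Optional[str]:
    """Return the name of the nearest valid code word, or None if too far or tied."""
    ranked = sorted((hamming(observed, w), name) for name, w in codebook.items())
    best_distance, best_name = ranked[0]
    if best_distance > max_distance:
        return None
    if len(ranked) > 1 and ranked[1][0] == best_distance:
        return None  # ambiguous: two valid code words are equally close
    return best_name


def nearest_or_pair(observed: Word, codebook: Dict[str, Word]) -> Tuple[int, Tuple[str, str]]:
    """Return (distance, pair) for the pair whose bitwise OR is nearest to observed."""
    return min(
        (hamming(observed, bitwise_or(codebook[a], codebook[b])), (a, b))
        for a, b in combinations(sorted(codebook), 2)
    )


# Hypothetical 6-bit codebook, for illustration only.
codebook = {
    "GENE_A": (1, 1, 0, 0, 0, 0),
    "GENE_B": (0, 0, 1, 1, 0, 0),
    "GENE_C": (0, 0, 0, 0, 1, 1),
}
single = assign((1, 1, 0, 0, 1, 0), codebook, max_distance=1)
overlap = nearest_or_pair((1, 1, 0, 0, 1, 1), codebook)
```

Here `single` resolves to the one code word within distance 1, while `overlap` shows an observed word that matches no single code word but exactly matches the OR of two of them.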
- each score in the plurality of scores is a probability (e.g., 0 to 1). In some instances, the highest score is the highest probability. In some instances, each score in the plurality of scores is a log-likelihood. In some instances, the highest score is the highest log-likelihood.
- one or more observed code words may be assigned to (e.g., replaced by) valid code words based on replacement with a corresponding valid code word in the OR-robust codebook that has a maximum likelihood as computed from the log likelihood (or negative log likelihood) of a probability distribution generated by a probabilistic model that provides probabilities for detecting a given code word, or code word segment, at a given location in a given decoding (probing or sequencing) cycle based on a set of detected optical signals (e.g., fluorescence signals) associated with a set of hybridization probes or nucleotides used to detect the barcode sequences.
- one or more observed code words may be assigned to (e.g., replaced by) valid code words based on replacement with a corresponding valid code word in the OR-robust codebook that: (i) is within a predetermined pairwise edit distance (e.g., a predetermined Hamming distance and/or a predetermined Hamming distance between logical bitwise OR combinations of valid code words) from the observed code word, and (ii) has a maximum likelihood as computed from the log likelihood (or negative log likelihood) for a probability distribution generated by a probabilistic model that provides probabilities for detecting a given code word, or code word segment, at a given location in a given decoding (probing or sequencing) cycle based on a set of detected optical signals associated with a set of hybridization probes or nucleotides used to detect the barcode sequences.
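The two-condition assignment above (a distance cutoff followed by a maximum-likelihood choice) might look like the following sketch. The probabilistic model used here, independent per-bit errors with one false-ON rate and one false-OFF rate, is a deliberately simple stand-in chosen for illustration; the text does not specify the model, and the codebook, rates, and function names are hypothetical.

```python
import math
from typing import Dict, Optional, Tuple

Word = Tuple[int, ...]


def hamming(a: Word, b: Word) -> int:
    return sum(x != y for x, y in zip(a, b))


def log_likelihood(observed: Word, valid: Word, p_false_on: float, p_false_off: float) -> float:
    """Log-probability of `observed` given `valid` under independent bit errors.

    p_false_on:  assumed probability that a true 0 is read as 1.
    p_false_off: assumed probability that a true 1 is read as 0.
    """
    if not (0.0 < p_false_on < 1.0 and 0.0 < p_false_off < 1.0):
        raise ValueError("error rates must lie strictly between 0 and 1")
    total = 0.0
    for obs, true in zip(observed, valid):
        if true == 1:
            p = p_false_off if obs == 0 else 1.0 - p_false_off
        else:
            p = p_false_on if obs == 1 else 1.0 - p_false_on
        total += math.log(p)
    return total


def decode(
    observed: Word,
    codebook: Dict[str, Word],
    max_distance: int,
    p_false_on: float,
    p_false_off: float,
) -> Optional[str]:
    """(i) keep valid words within max_distance; (ii) pick the most likely one."""
    candidates = {n: w for n, w in codebook.items() if hamming(observed, w) <= max_distance}
    if not candidates:
        return None
    return max(
        candidates,
        key=lambda n: log_likelihood(observed, candidates[n], p_false_on, p_false_off),
    )


# Hypothetical codebook: the observed word is one bit away from both X and Y.
codebook = {"GENE_X": (1, 0, 0, 1), "GENE_Y": (1, 1, 1, 1), "GENE_Z": (0, 1, 1, 0)}
observed = (1, 1, 0, 1)
call = decode(observed, codebook, max_distance=1, p_false_on=0.01, p_false_off=0.10)
```

Both `GENE_X` and `GENE_Y` pass the distance filter, so the likelihood breaks the tie: with dropouts assumed to be ten times more common than spurious signals, the word that explains the observation by a single dropout is preferred.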
- one or more observed code words may be assigned to (e.g., replaced by) valid code words based on an iterative process comprising correcting the one or more observed code words by replacement with one of the valid code words that: (i) is within a predetermined pairwise edit distance (e.g., a predetermined Hamming distance and/or a predetermined Hamming distance between logical bitwise OR combinations of valid code words) from the observed code word (determined, for example, by rank-ordering the set of valid code words according to their pairwise edit distance from the observed code word), and (ii) has a maximum likelihood as computed from a log likelihood (or negative log likelihood) for a probability distribution generated by a probabilistic model that provides probabilities for detecting a given code word, or code word segment thereof, at a given location in a given decoding (probing or sequencing) cycle based on a set of detected optical signals, and updating the probabilistic model using the corrected code words, where the process is repeated until a fully corrected set of validate
- each previously corrected code word is replaced with one of the valid code words that: (iii) is within a predetermined pairwise edit distance (e.g., a predetermined Hamming distance and/or a predetermined Hamming distance between logical bitwise OR combinations of valid code words) of the previously corrected code word, and (iv) has a maximum likelihood as computed from the log likelihood (or negative log likelihood) for a probability distribution generated by the updated probabilistic model.
- rates of various error modes, such as optical cross-talk or stripping errors, are estimated by comparing the observed code word to the best-matching valid code word, and a probabilistic decoding model can be updated based on the estimated error rates.
- parameters of a maximum likelihood model are updated according to the empirical rates of those errors.
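One way to picture the parameter update is to count bit-level disagreements between each observed code word and its best-matching valid code word. The sketch below estimates only two generic rates (false ON and false OFF); mapping these onto specific error modes such as optical cross-talk or stripping errors, the Laplace-style pseudocount, and the function name are all assumptions made for illustration.

```python
from typing import Iterable, Tuple

Word = Tuple[int, ...]


def estimate_flip_rates(
    pairs: Iterable[Tuple[Word, Word]], pseudocount: float = 1.0
) -> Tuple[float, float]:
    """Estimate (false_on_rate, false_off_rate) from (observed, valid) pairs.

    A positive pseudocount keeps the estimates strictly between 0 and 1,
    so they can be fed into a log-likelihood without taking log(0).
    """
    if pseudocount <= 0.0:
        raise ValueError("pseudocount must be positive")
    false_on = false_off = zeros = ones = 0
    for observed, valid in pairs:
        for obs, true in zip(observed, valid):
            if true == 1:
                ones += 1
                if obs == 0:
                    false_off += 1   # expected ON, read as OFF
            else:
                zeros += 1
                if obs == 1:
                    false_on += 1    # expected OFF, read as ON
    rate_on = (false_on + pseudocount) / (zeros + 2.0 * pseudocount)
    rate_off = (false_off + pseudocount) / (ones + 2.0 * pseudocount)
    return rate_on, rate_off


# Illustrative (observed, best-matching valid) pairs.
pairs = [
    ((1, 1, 0, 1), (1, 1, 1, 1)),
    ((1, 0, 0, 1), (1, 0, 0, 1)),
    ((0, 0, 1, 1), (0, 1, 1, 0)),
]
rate_on, rate_off = estimate_flip_rates(pairs)
```

The resulting rates could then replace the previous error parameters before the code words are decoded again, which is the loop the iterative process describes.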
- each previously corrected code word is replaced with one of the valid code words that: (iii) is within a predetermined pairwise edit distance (e.g., a predetermined Hamming distance and/or a predetermined Hamming distance between logical bitwise OR combinations of valid code words) of the previously corrected code word, and (iv) has a maximum likelihood as computed from the truncated log likelihood (or negative truncated log likelihood) for a probability distribution generated by the updated probabilistic model.
- the provided methods involve analyzing, e.g., detecting or determining, one or more sequences present in the probes or probe sets or products thereof (e.g., rolling circle amplification products thereof).
- the detecting is performed at one or more locations in the biological sample.
- the locations are the locations of RNA transcripts in the biological sample.
- the locations are the locations at which the probes or probe sets hybridize to the RNA transcripts in the biological sample, and are optionally ligated and amplified by rolling circle amplification.
- detecting the one or more sequences present in the probes or probe sets in the biological sample is performed, and the detected sequences are compared to an expected set of detected sequences.
- the expected set of sequences is based on the barcode sequences of the panels of probes or probe sets in the probe mixture and the known expression levels of the RNA transcripts of the first, second, and/or third sets of genes in the first and second cell populations.
- the one or more sequences are one or more barcode sequences or complements thereof.
- the expected set of detected sequences include sequences expected to be detected at a high expression level (e.g., more than 20 counts of the detected sequence per cell) in one or both of the first and second cell populations.
- the expected set of detected sequences include sequences expected to be detected at a medium expression level (e.g., 5-20 counts of the detected sequence per cell) in one or both of the first and second cell populations.
- the expected set of detected sequences include sequences expected to be detected at a low expression level (e.g., 1-5 counts of the detected sequence per cell) in one or both of the first and second cell populations.
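The example expression ranges above can be turned into a simple comparison of detected against expected levels. In this sketch the handling of the shared boundaries is an assumption (a count of exactly 5 is treated as medium and exactly 20 as medium, since the stated ranges overlap at 5), and the barcode names, the `below_low` label, and the function names are hypothetical.

```python
from typing import Dict


def expression_level(counts_per_cell: float) -> str:
    """Bin a per-cell count using the example ranges (boundary handling assumed)."""
    if counts_per_cell > 20:
        return "high"
    if counts_per_cell >= 5:
        return "medium"
    if counts_per_cell >= 1:
        return "low"
    return "below_low"


def level_mismatches(
    observed_counts: Dict[str, float], expected_levels: Dict[str, str]
) -> Dict[str, str]:
    """Return {barcode: observed_level} where the observed level differs from expected."""
    result: Dict[str, str] = {}
    for barcode, expected in expected_levels.items():
        observed = expression_level(observed_counts.get(barcode, 0.0))
        if observed != expected:
            result[barcode] = observed
    return result


# Illustrative values only.
report = level_mismatches(
    {"bc1": 30.0, "bc2": 2.0},
    {"bc1": "high", "bc2": "medium", "bc3": "low"},
)
```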
- the detecting comprises a plurality of repeated cycles of hybridization and removal of probes (e.g., detectably labeled probes, or intermediate probes that bind to detectably labeled probes) to the primary probe or probe set hybridized to the target nucleic acid, or to a rolling circle amplification product generated from the probe or probe set hybridized to the target nucleic acid.
- Detectably-labeled probes can be useful for detecting multiple target nucleic acids and be detected in one or more hybridization cycles (e.g., sequential hybridization assays, or sequencing by hybridization).
- the detecting can comprise binding an intermediate probe directly or indirectly to the primary probe or probe set, binding a detectably labeled probe directly or indirectly to a detection region of the intermediate probe, and detecting a signal associated with the detectably labeled probe.
- the method comprises detecting a rolling circle amplification product (RCP) generated using a circular or circularized primary probe or probe set as a template.
- the method comprises detecting a rolling circle amplification product (RCP) generated using a circular or circularized probe or probe set that binds to a primary probe or probe set as a template.
- detecting the RCP comprises binding an intermediate probe directly or indirectly to the RCP, binding a detectably labeled probe directly or indirectly to a detection region of the intermediate probe, and detecting a signal associated with the detectably labeled probe.
- the method can comprise performing one or more wash steps to remove unbound and/or nonspecifically bound intermediate probe molecules from the primary probes or the products of the primary probes.
- the detecting can comprise: detecting signals associated with detectably labeled probes that are hybridized to barcode regions or complements thereof in the primary probe or probe set or a product thereof (e.g., an RCP); and/or detecting signals associated with detectably labeled probes that are hybridized to intermediate probes which are in turn hybridized to the barcode regions or complements thereof.
- the detectably labeled probes can be fluorescently labeled.
- the methods comprise detecting the sequence in all or a portion of a primary probe or probe set or an RCP, or detecting a sequence of the primary probe or probe set or RCP, such as one or more barcode sequences present in the primary probe or probe set or RCP.
- the sequence of the RCP, or barcode thereof, is indicative of a sequence of the target nucleic acid to which the RCP is hybridized.
- the analysis and/or sequence determination comprises detecting a sequence in all or a portion of the nucleic acid concatemer and/or in situ hybridization to the RCP.
- the detection step is by sequential fluorescent in situ hybridization (e.g., for combinatorial decoding of the barcode sequence or complement thereof).
- the detection or determination comprises hybridizing to a probe directly or indirectly a detection oligonucleotide labeled with a fluorophore, an isotope, a mass tag, or a combination thereof.
- the detection or determination comprises imaging the probe hybridized to the target nucleic acid (e.g., imaging one or more detectably labeled probes hybridized thereto).
- the target nucleic acid is an mRNA in a tissue sample, and the detection or determination is performed when the target nucleic acid and/or the amplification product is in situ in the tissue sample.
- the target nucleic acid is an amplification product (e.g., a rolling circle amplification product).
- sequencing can be performed by sequencing-by-synthesis (SBS).
- a sequencing primer is complementary to primer binding sequences located at or near the one or more barcode sequence(s).
- sequencing-by-synthesis can comprise reverse transcription and/or amplification in order to generate a template sequence to which a primer sequence can bind.
- Exemplary SBS methods include, but are not limited to, those described in US 2007/0166705, US 2006/0188901, US 7,057,026, US 2006/0240439, US 2006/0281109, US 2011/0059865, US 2005/0100900, US 9,217,178, US 2009/0118128, US 2012/0270305, US 2013/0260372, and US 2013/0079232, all of which are herein incorporated by reference in their entireties.
- Accurate decoding of single-stranded template (barcode) sequences relies on successfully classifying signals that arise from the stepwise addition of A, G, C, and T nucleotides by a polymerase to a complementary primer extension strand.
- these methods typically include modifying the template sequences with a known adapter sequence used to tether the template sequences to a solid support (e.g., the interior surface(s) of a flow cell) in a random or patterned array by hybridization to a complementary adapter sequence attached to the support surface, where the adapter sequences typically also include primer binding sites used for clonal amplification and/or sequencing.
- the template sequences may be designed to include both the barcode sequences and amplification and/or sequencing primer binding sites, where the template sequences may be attached to target analytes (for nucleic acid analytes) using, e.g., a padlock or other circularizable probe, and amplified using, e.g., rolling circle amplification.
- the amplified template sequences (comprising barcode sequences) are then probed through a cyclic series of single-base addition primer extension reactions that use detectably-labeled, e.g., fluorescently-labeled, nucleotides to identify the sequence of bases in the template sequences, where the fluorescently-labeled nucleotides are typically blocked at the 3’-OH group with a reversible terminator moiety.
- the cyclical sequencing process thus comprises repeating the steps of (i) contacting a primed template sequence (i.e., a template sequence comprising a bound primer strand having a free 3’-OH group) with a mixture of fluorescently-labeled, 3’-OH reversibly-terminated nucleotides and a polymerase to enable incorporation of a nucleotide that is complementary to a nucleotide in the template sequence into an extended primer strand, (ii) washing away any unbound nucleotides and polymerase molecules, (iii) imaging the sample (e.g., the surface of a flow cell to which the amplified template sequences are attached, or a tissue sample within which the amplified template sequences are distributed), and (iv) deprotecting the 3’ end of the extended primer strand to remove the reversible terminator moiety and cleaving off the fluorophore, thereby enabling initiation of the next cycle.
- the mixture of nucleotides (e.g., fluorescently-labeled, 3’-OH reversibly-terminated nucleotides) used in each cycle may be the same. In some instances, the mixture of nucleotides (e.g., fluorescently-labeled, 3’-OH reversibly-terminated nucleotides) used in one or more cycles may be different from that used in one or more different cycles.
- all of the nucleotides (e.g., detectably-labeled, 3’-OH reversibly-terminated nucleotides) in the mixture of nucleotides may be labeled with a detectable label (e.g., a fluorophore), where different nucleotides in the mixture are labeled with different detectable labels.
- only a subset of the nucleotides (e.g., detectably-labeled, 3’-OH reversibly-terminated nucleotides) in the mixture of nucleotides may be labeled with a detectable label (e.g., a fluorophore), where different nucleotides in the subset are labeled with different detectable labels.
- the subset of nucleotides (e.g., detectably-labeled, 3’-OH reversibly-terminated nucleotides) may comprise, e.g., one, two, or three of A, T/U, G, and C.
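When only a subset of the nucleotides is labeled, the unlabeled nucleotide can in principle be inferred from the absence of signal in a cycle. The sketch below illustrates that idea for per-cycle base calling; the channel-to-base mapping, the single threshold, the use of "N" for ambiguous cycles, and the function name are assumptions, and the calls refer to the incorporated nucleotide rather than the template base.

```python
from typing import Dict, List, Optional, Sequence


def call_bases(
    cycles: Sequence[Sequence[float]],
    channel_to_base: Dict[int, str],
    dark_base: Optional[str],
    threshold: float,
) -> str:
    """Call one incorporated base per cycle from per-channel intensities.

    channel_to_base: hypothetical mapping from channel index to labeled base.
    dark_base: the unlabeled base inferred from an all-OFF cycle, or None.
    """
    calls: List[str] = []
    for channels in cycles:
        on = [i for i, value in enumerate(channels) if value >= threshold]
        if not on:
            calls.append(dark_base if dark_base is not None else "N")
        elif len(on) == 1:
            calls.append(channel_to_base[on[0]])
        else:
            calls.append("N")  # more than one channel ON: ambiguous in this scheme
    return "".join(calls)


# Illustrative three-channel setup in which G carries no label.
mapping = {0: "A", 1: "C", 2: "T"}
read = call_bases(
    [[900.0, 20.0, 30.0], [10.0, 15.0, 12.0], [25.0, 800.0, 40.0]],
    mapping,
    dark_base="G",
    threshold=100.0,
)
```

A limitation of this scheme is that an all-OFF cycle cannot be distinguished from a failed incorporation, which is one reason a codebook design might treat OFF signals with care.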
- the “sequencing-by-ligation” (SBL) approach uses a DNA ligase to identify the nucleotide present at a given position in a template sequence. Unlike sequencing-by-synthesis approaches, this method does not use a DNA polymerase to perform primer extension. Instead, the mismatch sensitivity of a DNA ligase enzyme is used to determine the underlying sequence of the template nucleic acid molecule (see, e.g., EP0703991).
- the "sequencing-by-binding" (SBB) approach is based on performing repetitive cycles of detecting a stabilized complex that forms at each position along the template sequence (e.g., a ternary complex that includes the primed template, a polymerase, and a cognate nucleotide for the position), under conditions that prevent covalent incorporation of the cognate nucleotide into the primer, and then extending the primer to allow detection of the next position along the template (see, e.g., U.S. Pat. Nos. 9,951,385 and 10,655,176).
- detection of the nucleotide at each position of the template occurs prior to extension of the primer to the next position.
- the methodology is used to distinguish the four different nucleotide types that can be present at positions along a nucleic acid template by uniquely labelling each type of ternary complex (i.e., different types of ternary complexes differing in the type of nucleotide it contains) or by separately delivering the reagents needed to form each type of ternary complex.
- the labeling may comprise fluorescence labeling of, e.g., the cognate nucleotide or the polymerase that participates in the ternary complex.
- the "sequencing-by-avidity" (or SBA) approach relies on the increased avidity (or "functional affinity") derived from forming a complex comprising a plurality of individual non-covalent binding interactions (see, e.g., U.S. Pat. Nos. 10,768,173 and 10,982,280).
- the sequencing-by-avidity approach is based on the detection of a multivalent binding complex formed between a fluorescently-labeled polymer-nucleotide conjugate, a polymerase, and a plurality of primed target nucleic acid molecules, which allows the detection/base calling step to be separated from the nucleotide incorporation step. Fluorescence imaging is used to detect the bound complex and thereby determine the identity of the N + 1 nucleotide in the target nucleic acid sequence (where the primer extension strand is N nucleotides in length).
- the disclosed methods may comprise using one or more nucleotides or analogs thereof, including a native nucleotide or a nucleotide analog or modified nucleotide (e.g., labeled with one or more detectable labels).
- a nucleotide analog comprises a nitrogenous base, five-carbon sugar, and phosphate group, wherein any component of the nucleotide may be modified and/or replaced.
- a method disclosed herein may comprise using one or more non-incorporable nucleotides. Non-incorporable nucleotides may be modified to become incorporable at any point during the sequencing method.
- Nucleotide analogs include, but are not limited to, alpha-phosphate modified nucleotides, alpha-beta nucleotide analogs, beta-phosphate modified nucleotides, beta-gamma nucleotide analogs, gamma-phosphate modified nucleotides, caged nucleotides, or ddNTPs. Examples of nucleotide analogs are described in U.S. Patent No. 8,071,755, which is incorporated by reference herein in its entirety.
- a method disclosed herein may comprise using terminators that reversibly prevent nucleotide incorporation at the 3'-end of the primer.
- One type of reversible terminator is a 3'-O-blocked reversible terminator.
- the terminator moiety is linked to the oxygen atom of the 3'-OH end of the 5-carbon sugar of a nucleotide.
- U.S. Patent Nos. 7,544,794 and 8,034,923 (the disclosures of these patents are incorporated by reference) describe reversible terminator dNTPs having the 3'-OH group replaced by a 3'-ONH2 group.
- Another type of reversible terminator is a 3'-unblocked reversible terminator, wherein the terminator moiety is linked to the nitrogenous base of a nucleotide.
- U.S. Patent No. 8,808,989 discloses particular examples of base-modified reversible terminator nucleotides that may be used in connection with the methods described herein.
- Other reversible terminators that similarly can be used in connection with the methods described herein include those described in U.S. Patent Nos. 7,956,171, 8,071,755, and 9,399,798, herein incorporated by reference.
- a method disclosed herein may comprise using nucleotide analogs having terminator moieties that irreversibly prevent nucleotide incorporation at the 3'-end of the primer.
- Irreversible nucleotide analogs include 2',3'-dideoxynucleotides, ddNTPs (ddGTP, ddATP, ddTTP, ddCTP). Dideoxynucleotides lack the 3'-OH group of dNTPs that is essential for polymerase-mediated synthesis.
- a method disclosed herein may comprise using non-incorporable nucleotides comprising a blocking moiety that inhibits or prevents the nucleotide from forming a covalent linkage to a second nucleotide (3'-OH of a primer) during the incorporation step of a nucleic acid polymerization reaction.
- the blocking moiety can be removed from the nucleotide, allowing for nucleotide incorporation.
- a method disclosed herein may comprise using 1, 2, 3, 4 or more nucleotide analogs.
- a nucleotide analog is replaced, diluted, or sequestered during an incorporation step.
- a nucleotide analog is replaced with a native nucleotide.
- a nucleotide analog is modified during an incorporation step. The modified nucleotide analog can be similar to or the same as a native nucleotide.
- a method disclosed herein may comprise using a nucleotide analog having a different binding affinity for a polymerase than a native nucleotide.
- a nucleotide analog has a different interaction with a next base than a native nucleotide.
- Nucleotide analogs and/or non-incorporable nucleotides may base-pair with a complementary base of a template nucleic acid.
- Any suitable enzyme having a polymerase activity can be used in the sequencing reactions described herein, and exemplary polymerases include, but are not limited to, bacterial DNA polymerases, eukaryotic DNA polymerases, archaeal DNA polymerases, viral DNA polymerases and phage DNA polymerases.
- Bacterial DNA polymerases include E. coli DNA polymerases I, II, III, IV and V, the Klenow fragment of E. coli DNA polymerase, Clostridium stercorarium (Cst) DNA polymerase, Clostridium thermocellum (Cth) DNA polymerase and Sulfolobus solfataricus (Sso) DNA polymerase.
- Eukaryotic DNA polymerases include DNA polymerases a, P, y, 5, e, q, , c. p, and K, as well as the Revl polymerase (terminal deoxycytidyl transferase) and terminal deoxynucleotidyl transferase (TdT).
- Viral DNA polymerases include T4 DNA polymerase, phi-29 DNA polymerase, GA-1, phi-29-like DNA polymerases, PZA DNA polymerase, phi- 15 DNA polymerase, Cpl DNA polymerase, Cp7 DNA polymerase, T7 DNA polymerase, and T4 polymerase.
- DNA polymerases include thermostable and/or thermophilic DNA polymerases such as Thermus aquaticus (Taq) DNA polymerase, Thermus filiformis (Tfi) DNA polymerase, Thermococcus zilligi (Tzi) DNA polymerase, Thermus thermophilus (Tth) DNA polymerase, Thermus flavus (Tfl) DNA polymerase, Pyrococcus woesei (Pwo) DNA polymerase, Pyrococcus furiosus (Pfu) DNA polymerase and Turbo Pfu DNA polymerase, Thermococcus litoralis (Tli) DNA polymerase, Pyrococcus sp. GB-D polymerase, Thermotoga maritima (Tma) DNA polymerase, Bacillus stearothermophilus (Bst) DNA polymerase, Pyrococcus kodakaraensis (KOD) DNA polymerase, Pfx DNA polymerase, Thermococcus sp. JDF-3 (JDF-3) DNA polymerase, Thermococcus gorgonarius (Tgo) DNA polymerase, Thermococcus acidophilium DNA polymerase; Sulfolobus acidocaldarius DNA polymerase; Thermococcus sp.
- modified versions of the extremely thermophilic marine archaea Thermococcus species 9° N can be used.
- Still other useful DNA polymerases, including the 3PDX polymerase are disclosed in U.S. Patent No. 8,703,461, the disclosure of which is incorporated by reference in its entirety.
- RNA polymerases such as T7 RNA polymerase, T3 polymerase, SP6 polymerase, and K11 polymerase
- Eukaryotic RNA polymerases such as RNA polymerase I, RNA polymerase II, RNA polymerase III, RNA polymerase IV, and RNA polymerase V
- Archaeal RNA polymerase, and reverse transcriptases such as HIV-1 reverse transcriptase from human immunodeficiency virus type 1 (PDB 1HMV), HIV-2 reverse transcriptase from human immunodeficiency virus type 2, M-MLV reverse transcriptase from the Moloney murine leukemia virus, AMV reverse transcriptase from the avian myeloblastosis virus, and telomerase reverse transcriptase, which maintains the telomeres of eukaryotic chromosomes.
- one or more nucleotides can be labeled with distinguishing and/or detectable tags or labels.
- the tags may be distinguishable by means of their differences in fluorescence, Raman spectrum, charge, mass, refractive index, luminescence, length, or any other measurable property.
- the tag may be attached to one or more different positions on the nucleotide, so long as the fidelity of binding to the polymerase-nucleic acid complex is sufficiently maintained to enable identification of the complementary base on the template nucleic acid correctly.
- the tag is attached to the nucleobase of the nucleotide.
- a tag is attached to the gamma phosphate position of the nucleotide.
- Detectable labels can be suitable for small scale detection and/or suitable for high- throughput screening.
- suitable detectable labels include, but are not limited to, radioisotopes, fluorophores, chemiluminescent compounds, bioluminescent compounds, and dyes.
- the detectable label can be qualitatively detected (e.g., optically or spectrally), or it can be quantified.
- Qualitative detection generally includes a detection method in which the existence or presence of the detectable label is confirmed, whereas quantifiable detection generally includes a detection method having a quantifiable (e.g., numerically reportable) value such as an intensity, duration, polarization, and/or other properties.
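By way of non-limiting illustration, the distinction between qualitative and quantifiable detection can be sketched as follows. The function name `detect_label`, the background-based threshold, and the default of three standard deviations are assumptions made for this sketch and are not taken from the disclosure.

```python
import numpy as np

def detect_label(intensities, background, threshold_sigma=3.0):
    """Return (present, signal) for one region of an image.

    present: qualitative call (is the label there at all?).
    signal:  quantifiable value (background-subtracted mean intensity).
    The threshold rule is a hypothetical choice for illustration.
    """
    intensities = np.asarray(intensities, dtype=float)
    background = np.asarray(background, dtype=float)
    bg_mean = background.mean()
    bg_std = background.std(ddof=1)  # sample standard deviation
    signal = intensities.mean() - bg_mean
    present = signal > threshold_sigma * bg_std
    return bool(present), float(signal)
```

The same measurement thus yields both a presence/absence call and a numerically reportable intensity; other quantifiable properties (e.g., duration or polarization) would need their own measurements.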
- the detectable label is bound to another moiety, for example, a nucleotide or nucleotide analog, and can include a fluorescent, a colorimetric, or a chemiluminescent label.
- a detectable label can be attached to another moiety, for example, a nucleotide or nucleotide analog.
- one or more nucleotides can be labeled with a cleavable detectable tag or label.
- the non-terminating fluorescently labeled nucleotides can include a DBCO-nucleotide conjugated to a fluorescent compound via a disulfide linker.
- a non-terminating fluorescently labeled nucleotide is incorporated into the strand without termination, and after imaging, the linker can be cleaved to remove the fluorescent label.
- a DBCO-nucleotide (e.g., 5-DBCO-PEG4-UTP) can undergo a click reaction with the cleavable linker conjugated to a fluorescent label (e.g., cleavable linker-ATTO647N), and the disulfide group can be cleaved by tris(2-carboxyethyl)phosphine (TCEP) reduction together with 3’-O-azidomethyl-dNTP.
- the detectable label is a fluorophore.
- the fluorophore can be from a group that includes: 7-AAD (7-Aminoactinomycin D), Acridine Orange (+DNA), Acridine Orange (+RNA), Alexa Fluor® 350, Alexa Fluor® 430, Alexa Fluor® 488, Alexa Fluor® 532, Alexa Fluor® 546, Alexa Fluor® 555, Alexa Fluor® 568, Alexa Fluor® 594, Alexa Fluor® 633, Alexa Fluor® 647, Alexa Fluor® 660, Alexa Fluor® 680, Alexa Fluor® 700, Alexa Fluor® 750, Allophycocyanin (APC), AMCA / AMCA-X, 7-Aminoactinomycin D (7-AAD), 7-Amino-4-methylcoumarin, 6-Aminoquinoline, Aniline Blue, ANS, APC-Cy7, ATTO-TAG™ CBQC
- the detectable label can be directly detectable by itself (e.g., radioisotope labels or fluorescent labels) or, in the case of an enzymatic label, can be indirectly detectable, e.g., by catalyzing chemical alterations of a substrate compound or composition, which substrate compound or composition is directly detectable.
- the label can emit a signal or alter a signal delivered to the label so that the presence or absence of the label can be detected.
- coupling may be via a linker, which may be cleavable, such as photo-cleavable (e.g., cleavable under ultra-violet light), chemically-cleavable (e.g., via a reducing agent, such as dithiothreitol (DTT), tris(2-carboxyethyl)phosphine (TCEP)) or enzymatically cleavable (e.g., via an esterase, lipase, peptidase, or protease).
- Fluorescence detection in tissue samples can often be hindered by the presence of strong background fluorescence.
- “Autofluorescence” is the general term used to distinguish background fluorescence (that can arise from a variety of sources, including aldehyde fixation, extracellular matrix components, red blood cells, lipofuscin, and the like) from the desired immunofluorescence from the fluorescently labeled antibodies or probes. Tissue autofluorescence can lead to difficulties in distinguishing the signals due to fluorescent antibodies or probes from the general background.
- a method disclosed herein utilizes one or more agents to reduce tissue autofluorescence, for example, Autofluorescence Eliminator (Sigma/EMD Millipore), TrueBlack Lipofuscin Autofluorescence Quencher (Biotium), MaxBlock Autofluorescence Reducing Reagent Kit (MaxVision Biosciences), and/or a very intense black dye (e.g., Sudan Black, or comparable dark chromophore).
- fluorescent labels and nucleotides and/or polynucleotides conjugated to such fluorescent labels comprise those described in, for example, Haugland, Handbook of Fluorescent Probes and Research Chemicals, Ninth Edition (Molecular Probes, Inc., Eugene, 2002); Keller and Manak, DNA Probes, 2nd Edition (Stockton Press, New York, 1993); Eckstein, editor, Oligonucleotides and Analogues: A Practical Approach (IRL Press, Oxford, 1991); and Wetmur, Critical Reviews in Biochemistry and Molecular Biology, 26:227-259 (1991).
- exemplary techniques and methodologies applicable to the provided embodiments comprise those described in, for example, US 4,757,141, US 5,151,507 and US 5,091,519.
- one or more fluorescent dyes are used as labels for labeled target sequences, for example, as described in US 5,188,934 (4,7-dichlorofluorescein dyes); US 5,366,860 (spectrally resolvable rhodamine dyes); US 5,847,162 (4,7-dichlororhodamine dyes); US 4,318,846 (ether-substituted fluorescein dyes); US 5,800,996 (energy transfer dyes); US 5,066,580 (xanthene dyes); and US 5,688,648 (energy transfer dyes).
- a fluorescent label comprises a signaling moiety that conveys information through the fluorescent absorption and/or emission properties of one or more molecules.
- Exemplary fluorescent properties comprise fluorescence intensity, fluorescence lifetime, emission spectrum characteristics and energy transfer.
- the detection is carried out using any of a number of different types of microscopy, e.g., confocal microscopy, two-photon microscopy, light-field microscopy, intact tissue expansion microscopy, and/or CLARITY™-optimized light sheet microscopy (COLM).
- fluorescence microscopy is used for detection and imaging of the sample.
- a fluorescence microscope is an optical microscope that uses fluorescence and phosphorescence instead of, or in addition to, reflection and absorption to study properties of organic or inorganic substances.
- in fluorescence microscopy, a sample is illuminated with light of a wavelength which excites fluorescence in the sample. The fluoresced light, which is usually at a longer wavelength than the illumination, is then imaged through a microscope objective.
- Two filters may be used in this technique: an illumination (or excitation) filter, which ensures the illumination is near monochromatic and at the correct wavelength, and a second emission (or barrier) filter, which ensures that none of the excitation light reaches the detector.
- the "fluorescence microscope" comprises any microscope that uses fluorescence to generate an image, whether it is a simpler setup such as an epifluorescence microscope or a more complicated design such as a confocal microscope, which uses optical sectioning to get better resolution of the fluorescent image.
- confocal microscopy is used for detection and imaging of the sample.
- Confocal microscopy uses point illumination and a pinhole in an optically conjugate plane in front of the detector to eliminate out-of-focus signal.
- the image's optical resolution is much better than that of wide-field microscopes.
- this increased resolution comes at the cost of decreased signal intensity, so long exposures are often required.
- CLARITY™-optimized light sheet microscopy (COLM) provides an alternative microscopy for fast 3D imaging of large clarified samples. COLM interrogates large immunostained tissues, permits increased speed of acquisition and results in a higher quality of generated data.
- Other types of microscopy that can be employed comprise bright field microscopy, oblique illumination microscopy, dark field microscopy, phase contrast, differential interference contrast (DIC) microscopy, interference reflection microscopy (also known as reflected interference contrast, or RIC), single plane illumination microscopy (SPIM), super-resolution microscopy, laser microscopy, electron microscopy (EM), transmission electron microscopy (TEM), scanning electron microscopy (SEM), reflection electron microscopy (REM), scanning transmission electron microscopy (STEM) and low-voltage electron microscopy (LVEM), scanning probe microscopy (SPM), atomic force microscopy (AFM), ballistic electron emission microscopy (BEEM), chemical force microscopy (CFM), conductive atomic force microscopy (C-AFM), electrochemical scanning tunneling microscope (ECSTM), electrostatic force microscopy (EFM), fluidic force microscope (FluidFM), force modulation microscopy (FMM), feature-oriented scanning probe microscopy (FOSPM),
- a method herein comprises subjecting the sample to expansion microscopy methods and techniques. Expansion allows individual targets (e.g., mRNA or RNA transcripts) which are densely packed within a cell to be resolved spatially in a high-throughput manner. Expansion microscopy techniques are known in the art and can be performed as described in US 2016/0116384 and Chen et al., Science, 347, 543 (2015), each of which is incorporated herein by reference in its entirety.
- the method does not comprise subjecting the sample to expansion microscopy. In some embodiments, the method does not comprise dissociating a cell from the sample such as a tissue or the cellular microenvironment. In some embodiments, the method does not comprise lysing the sample or cells therein. In some embodiments, the method does not comprise embedding the sample or molecules from the sample in an exogenous matrix.
- analysis is performed on one or more images captured, and may comprise processing the image(s) and/or quantifying signals observed.
- images of signals from different fluorescent channels and/or nucleotide incorporation cycles can be compared and analyzed.
- images of signals (or absence thereof) at a particular location in a sample from different fluorescent channels and/or sequential incorporation cycles can be aligned to analyze an analyte at the location. For instance, a particular location in a sample can be tracked and signal spots from sequential incorporation cycles can be analyzed to detect a target polynucleotide sequence (e.g., a barcode sequence or subsequence thereof) in an analyte at the location.
- the analysis may comprise processing information of one or more cell types, one or more types of analytes, a number or level of analyte, and/or a number or level of cells detected in a particular region of the sample.
- the analysis comprises detecting a sequence (e.g., a barcode sequence) present in an amplification product at a location in the sample.
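The per-location analysis described above, in which signals from sequential incorporation cycles and fluorescent channels are aligned and read out as a barcode sequence, can be sketched as follows. This is a minimal illustration under stated assumptions, not the claimed method: the channel-to-base mapping `"ACGT"`, the function names `call_bases` and `decode`, the target names, and the one-mismatch tolerance are all hypothetical, and the images are assumed to be already registered so that one location maps to one row of intensities per cycle.

```python
import numpy as np

def call_bases(intensity, channels="ACGT"):
    """Turn an (n_cycles, n_channels) intensity array for one registered
    location into an observed code word, taking the brightest channel
    in each cycle. The channel order is an assumption of this sketch."""
    intensity = np.asarray(intensity, dtype=float)
    return "".join(channels[i] for i in intensity.argmax(axis=1))

def decode(observed, codebook, max_mismatches=1):
    """Return the name of the unique closest code word in the codebook
    (Hamming distance), or None if the best match is too far or tied."""
    best_name, best_dist, tie = None, None, False
    for name, word in codebook.items():
        if len(word) != len(observed):
            continue  # only compare code words of equal length
        dist = sum(a != b for a, b in zip(word, observed))
        if best_dist is None or dist < best_dist:
            best_name, best_dist, tie = name, dist, False
        elif dist == best_dist:
            tie = True
    if best_dist is None or best_dist > max_mismatches or tie:
        return None
    return best_name
```

For example, with a hypothetical codebook `{"GENE_A": "ACGT", "GENE_B": "TTGA"}`, an observed code word `"ACGA"` would be assigned to `GENE_A` (one mismatch), while `"GGGG"` would be left unassigned.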
- the number of signals detected in a unit area in the biological sample is quantified.
- the signals detected at a corresponding position in the biological sample in a plurality of images taken at different z positions are quantified and analyzed.
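As a non-limiting sketch of the two quantifications just described, the following counts signals per unit area within a rectangular region and reads out the intensity profile at one position across a z-stack. The function names, the rectangular region, the half-open bounds, and the `(z, row, column)` array layout are assumptions made for illustration.

```python
import numpy as np

def spot_density(spot_xy, x_min, y_min, x_max, y_max):
    """Number of detected signal spots per unit area inside a rectangle.
    spot_xy holds (x, y) coordinates in the same length unit as the bounds."""
    spot_xy = np.asarray(spot_xy, dtype=float).reshape(-1, 2)
    inside = (
        (spot_xy[:, 0] >= x_min) & (spot_xy[:, 0] < x_max)
        & (spot_xy[:, 1] >= y_min) & (spot_xy[:, 1] < y_max)
    )
    area = (x_max - x_min) * (y_max - y_min)
    return float(inside.sum() / area)

def z_profile_peak(stack, row, col):
    """For a stack shaped (z, row, column), return the z index and value
    of the brightest plane at one (row, col) position."""
    profile = np.asarray(stack, dtype=float)[:, row, col]
    return int(profile.argmax()), float(profile.max())
```

Dividing the count by the area makes regions of different size comparable; the z-profile peak is one simple summary, and a sum or fitted curve over z could be used instead.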
- Methods and compositions disclosed herein may be used for analyzing a biological sample, which may be obtained from a subject using any of a variety of techniques including, but not limited to, biopsy, surgery, and laser capture microscopy (LCM), and generally includes cells and/or other biological material from the subject.
- a biological sample can also be obtained from a eukaryote, such as a tissue sample, a patient derived organoid (PDO) or patient derived xenograft (PDX).
- a biological sample from an organism may comprise one or more other organisms or components therefrom.
- a mammalian tissue section may comprise a prion, a viroid, a virus, a bacterium, a fungus, or components from other organisms, in addition to mammalian cells and non-cellular tissue components.
- Subjects from which biological samples can be obtained can be healthy or asymptomatic individuals, individuals that have or are suspected of having a disease (e.g., a patient with a disease such as cancer) or a pre-disposition to a disease, and/or individuals in need of therapy or suspected of needing therapy.
- the biological sample corresponds to cells (e.g., derived from a cell culture, a tissue sample, or cells deposited on a surface).
- individual cells can be naturally unaggregated.
- the cells can be derived from a suspension of cells (e.g., a body fluid such as blood) and/or disassociated or disaggregated cells from a tissue or tissue section.
- the number of cells in the biological sample can vary.
- Some biological samples comprise large numbers of cells, e.g., blood samples, while other biological samples comprise smaller or only a small number of cells or may only be suspected of containing cells, e.g., plasma, serum, urine, saliva, synovial fluids, amniotic fluid, lachrymal fluid, lymphatic fluid, liquor, cerebrospinal fluid and the like.
- a cell-containing biological sample can comprise a body fluid or a cell-containing sample derived from the body fluid, e.g., whole blood, samples derived from blood such as plasma or serum, buffy coat, urine, sputum, lachrymal fluid, lymphatic fluid, sweat, liquor, cerebrospinal fluid, ascites, milk, stool, bronchial lavage, saliva, amniotic fluid, nasal secretions, vaginal secretions, semen/seminal fluid, wound secretions, cell culture and swab samples, or any cell-containing sample derived from the aforementioned samples.
- a cell-containing biological sample can be a body fluid, a body secretion or body excretion, e.g., lymphatic fluid, blood, buffy coat, plasma or serum.
- a cell-containing biological sample can be a circulating body fluid such as blood or lymphatic fluid, e.g., peripheral blood obtained from a mammal such as human.
- the biological sample can include any number of macromolecules, for example, cellular macromolecules and organelles (e.g., mitochondria and nuclei).
- the biological sample can be obtained as a tissue sample, such as a tissue section, a cell pellet, a cell block, a biopsy, a core biopsy, needle aspirate, or fine needle aspirate.
- the sample can be a fluid sample, such as a blood sample, urine sample, or saliva sample.
- the sample can be a skin sample, a colon sample, a cheek swab, a histology sample, a histopathology sample, a plasma or serum sample, a tumor sample, living cells, cultured cells, a clinical sample such as, for example, whole blood or blood-derived products, blood cells, or cultured tissues or cells, including cell suspensions.
- the biological sample may comprise cells which are deposited on a surface.
- the biological sample may comprise transcripts of antigen receptor molecules.
- the biological sample comprises analytes from any of the sources described herein deposited on a surface.
- Biological samples can be derived from a homogeneous culture or population of the subjects or organisms mentioned herein or alternatively from a collection of several different organisms, for example, in a community or ecosystem.
- Biological samples can include one or more diseased cells.
- a diseased cell can have altered metabolic properties, gene expression, protein expression, and/or morphologic features. Examples of diseases include inflammatory disorders, metabolic disorders, nervous system disorders, and cancer. Cancer cells can be derived from solid tumors, hematological malignancies, cell lines, or obtained as circulating tumor cells. Biological samples can also include fetal cells and immune cells.
- Biological samples can include analytes (e.g., protein, RNA, and/or DNA) embedded in a 3D matrix.
- amplicons e.g., rolling circle amplification products
- a 3D matrix may comprise a network of natural molecules and/or synthetic molecules that are chemically and/or enzymatically linked, e.g., by crosslinking.
- a 3D matrix may comprise a synthetic polymer.
- a 3D matrix comprises a hydrogel.
- a substrate herein can be any support that is insoluble in aqueous liquid and which allows for positioning of biological samples, analytes, features, and/or reagents on the support.
- a biological sample can be attached to a substrate. Attachment of the biological sample can be irreversible or reversible, depending upon the nature of the sample and subsequent steps in the analytical method.
- the sample can be attached to the substrate reversibly by applying a suitable polymer coating to the substrate, and contacting the sample to the polymer coating. The sample can then be detached from the substrate, e.g., using an organic solvent that at least partially dissolves the polymer coating. Hydrogels are examples of polymers that are suitable for this purpose.
- the substrate can be coated or functionalized with one or more substances to facilitate attachment of the sample to the substrate.
- Suitable substances that can be used to coat or functionalize the substrate include, but are not limited to, lectins, poly-lysine, antibodies, and polysaccharides.
- a biological sample can be harvested from a subject (e.g., via surgical biopsy, whole subject sectioning) or grown in vitro on a growth substrate or culture dish as a population of cells, and prepared for analysis as a tissue slice or tissue section. Grown samples may be sufficiently thin for analysis without further processing steps. Alternatively, grown samples, and samples obtained via biopsy or sectioning, can be prepared as thin tissue sections using a mechanical cutting apparatus such as a vibrating blade microtome. As another alternative, in some embodiments, a thin tissue section can be prepared by applying a touch imprint of a biological sample to a suitable substrate material.
- the thickness of the tissue section can be a fraction of (e.g., less than 0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, or 0.1) the maximum cross-sectional dimension of a cell.
- tissue sections having a thickness that is larger than the maximum cross-section cell dimension can also be used.
- cryostat sections can be used, which can be, e.g., 10-20 µm thick.
- the thickness of a tissue section typically depends on the method used to prepare the section and the physical characteristics of the tissue, and therefore sections having a wide variety of different thicknesses can be prepared and used.
- the thickness of the tissue section can be at least 0.1, 0.2, 0.3, 0.4, 0.5, 0.7, 1.0, 1.5, 2, 3, 4, 5, 6, 7, 8, 9, 10, 12, 13, 14, 15, 20, 30, 40, or 50 µm.
- Thicker sections can also be used if desired or convenient, e.g., at least 70, 80, 90, or 100 µm or more.
- the thickness of a tissue section is between 1-100 µm, 1-50 µm, 1-30 µm, 1-25 µm, 1-20 µm, 1-15 µm, 1-10 µm, 2-8 µm, 3-7 µm, or 4-6 µm, but as mentioned above, sections with thicknesses larger or smaller than these ranges can also be analyzed.
- Multiple sections can also be obtained from a single biological sample.
- multiple tissue sections can be obtained from a surgical biopsy sample by performing serial sectioning of the biopsy sample using a sectioning blade. Spatial information among the serial sections can be preserved in this manner, and the sections can be analyzed successively to obtain three-dimensional information about the biological sample.
- the biological sample (e.g., a tissue section as described above) can be prepared by deep freezing at a temperature suitable to maintain or preserve the integrity (e.g., the physical characteristics) of the tissue structure.
- the frozen tissue sample can be sectioned, e.g., thinly sliced, onto a substrate surface using any number of suitable methods.
- a tissue sample can be prepared using a chilled microtome (e.g., a cryostat) set at a temperature suitable to maintain both the structural integrity of the tissue sample and the chemical properties of the nucleic acids in the sample.
- a temperature can be, e.g., less than -15°C, less than -20°C, or less than -25°C.
- the biological sample can be prepared using formalin-fixation and paraffin-embedding (FFPE), which are established methods.
- cell suspensions and other non-tissue samples can be prepared using formalin-fixation and paraffin-embedding.
- the sample can be sectioned as described above.
- the paraffin-embedding material can be removed from the tissue section (e.g., deparaffinization) by incubating the tissue section in an appropriate solvent (e.g., xylene) followed by a rinse (e.g., 99.5% ethanol for 2 minutes, 96% ethanol for 2 minutes, and 70% ethanol for 2 minutes).
- a biological sample can be fixed in any of a variety of other fixatives to preserve the biological structure of the sample prior to analysis.
- a sample can be fixed via immersion in ethanol, methanol, acetone, paraformaldehyde (PFA)-Triton, and combinations thereof.
- acetone fixation is used with fresh frozen samples, which can include, but are not limited to, cortex tissue, mouse olfactory bulb, human brain tumor, human post-mortem brain, and breast cancer samples.
- pre-permeabilization steps may not be performed.
- acetone fixation can be performed in conjunction with permeabilization steps.
- the methods provided herein comprise one or more post-fixing (also referred to as postfixation) steps.
- one or more post-fixing steps are performed after contacting a sample with a polynucleotide disclosed herein, e.g., one or more probes such as a circular or padlock probe.
- one or more post-fixing steps are performed after a hybridization complex comprising a probe and a target is formed in a sample.
- one or more post-fixing steps are performed prior to a ligation reaction disclosed herein, such as the ligation to circularize a padlock probe.
- one or more post-fixing steps are performed after contacting a sample with a binding or labelling agent (e.g., an antibody or antigen binding fragment thereof) for a non-nucleic acid analyte such as a protein analyte.
- the labelling agent can comprise a nucleic acid molecule (e.g., reporter oligonucleotide) comprising a sequence that corresponds to the labelling agent and therefore corresponds to (e.g., uniquely identifies) the analyte.
- the labelling agent can comprise a reporter oligonucleotide comprising one or more barcode sequences.
- a post-fixing step may be performed using any suitable fixation reagent disclosed herein, for example, 3% (w/v) paraformaldehyde in DEPC-PBS.
(iv) Embedding
- a biological sample can be embedded in any of a variety of other embedding materials to provide structural substrate to the sample prior to sectioning and other handling steps.
- the embedding material can be removed, e.g., prior to analysis of tissue sections obtained from the sample.
- suitable embedding materials include, but are not limited to, waxes, resins (e.g., methacrylate resins), epoxies, and agar.
- the biological sample can be embedded in a matrix (e.g., a hydrogel matrix). Embedding the sample in this manner typically involves contacting the biological sample with a hydrogel such that the biological sample becomes surrounded by the hydrogel.
- the sample can be embedded by contacting the sample with a suitable polymer material, and activating the polymer material to form a hydrogel.
- the hydrogel is formed such that the hydrogel is internalized within the biological sample.
- the biological sample is immobilized in the hydrogel via cross-linking of the polymer material that forms the hydrogel.
- Cross-linking can be performed chemically and/or photochemically, or alternatively by any other hydrogel-formation method.
- composition and application of the hydrogel-matrix to a biological sample typically depends on the nature and preparation of the biological sample (e.g., sectioned, non-sectioned, type of fixation).
- the hydrogel-matrix can include a monomer solution and an ammonium persulfate (APS) initiator/tetramethylethylenediamine (TEMED) accelerator solution.
- the biological sample consists of cells (e.g., cultured cells or cells disassociated from a tissue sample).
- the cells can be incubated with the monomer solution and APS/TEMED solutions.
- hydrogel-matrix gels are formed in compartments, including but not limited to devices used to culture, maintain, or transport the cells.
- hydrogel-matrices can be formed with monomer solution plus APS/TEMED added to the compartment to a depth ranging from about 0.1 µm to about 2 mm.
- biological samples can be stained using a wide variety of stains and staining techniques.
- a sample can be stained using any number of stains and/or immunohistochemical reagents.
- One or more staining steps may be performed to prepare or process a biological sample for an assay described herein or may be performed during and/or after an assay.
- the sample can be contacted with one or more nucleic acid stains, membrane stains (e.g., cellular or nuclear membrane), cytological stains, or combinations thereof.
- the stain may be specific to proteins, phospholipids, DNA (e.g., dsDNA, ssDNA), RNA, an organelle or compartment of the cell.
- the sample may be contacted with one or more labeled antibodies (e.g., a primary antibody specific for the analyte of interest and a labeled secondary antibody specific for the primary antibody).
- cells in the sample can be segmented using one or more images taken of the stained sample.
- the stain is performed using a lipophilic dye.
- the staining is performed with a lipophilic carbocyanine or aminostyryl dye, or analogs thereof (e.g., DiI, DiO, DiR, DiD).
- Other cell membrane stains may include FM and RH dyes or immunohistochemical reagents specific for cell membrane proteins.
- the stain may include, but is not limited to, acridine orange, acid fuchsin, Bismarck brown, carmine, coomassie blue, cresyl violet, DAPI, eosin, ethidium bromide, haematoxylin, Hoechst stains, iodine, methyl green, methylene blue, neutral red, Nile blue, Nile red, osmium tetroxide, ruthenium red, propidium iodide, rhodamine (e.g., rhodamine B), or safranine, or derivatives thereof.
- the sample may be stained with haematoxylin and eosin (H&E).
- the sample can be stained using hematoxylin and eosin (H&E) staining techniques, using Papanicolaou staining techniques, Masson’s trichrome staining techniques, silver staining techniques, Sudan staining techniques, and/or using Periodic Acid Schiff (PAS) staining techniques.
- PAS staining is typically performed after formalin or acetone fixation.
- the sample can be stained using Romanowsky stain, including Wright’s stain, Jenner’s stain, May-Grunwald stain, Leishman stain, and Giemsa stain.
- biological samples can be destained. Methods of destaining or discoloring a biological sample generally depend on the nature of the stain(s) applied to the sample. For example, in some embodiments, one or more immunofluorescent stains are applied to the sample via antibody coupling. Such stains can be removed using techniques such as cleavage of disulfide linkages via treatment with a reducing agent and detergent washing, chaotropic salt treatment, treatment with antigen retrieval solution, and treatment with an acidic glycine buffer. Methods for multiplexed staining and destaining are described, for example, in Bolognesi et al., J. Histochem. Cytochem.
- a biological sample embedded in a matrix can be isometrically expanded.
- Isometric expansion methods that can be used include hydration, a preparative step in expansion microscopy, as described in Chen et al., Science 347(6221):543-548, 2015.
- Isometric expansion can be performed by anchoring one or more components of a biological sample to a gel, followed by gel formation, proteolysis, and swelling.
- analytes in the sample, products of the analytes, and/or probes associated with analytes in the sample can be anchored to the matrix (e.g., hydrogel).
- Isometric expansion of the biological sample can occur prior to immobilization of the biological sample on a substrate, or after the biological sample is immobilized to a substrate.
- the isometrically expanded biological sample can be removed from the substrate prior to contacting the substrate with probes disclosed herein.
- the steps used to perform isometric expansion of the biological sample can depend on the characteristics of the sample (e.g., thickness of tissue section, fixation, cross-linking), and/or the analyte of interest (e.g., different conditions to anchor RNA, DNA, and protein to a gel).
- proteins in the biological sample are anchored to a swellable gel such as a polyelectrolyte gel.
- An antibody can be directed to the protein before, after, or in conjunction with being anchored to the swellable gel.
- DNA and/or RNA in a biological sample can also be anchored to the swellable gel via a suitable linker.
- linkers include, but are not limited to, 6-((Acryloyl)amino) hexanoic acid (Acryloyl-X SE) (available from ThermoFisher, Waltham, MA), Label-IT Amine (available from MirusBio, Madison, WI) and Label X (described for example in Chen et al., Nat. Methods 13:679-684, 2016, the entire contents of which are incorporated herein by reference).
- Isometric expansion of the sample can increase the spatial resolution of the subsequent analysis of the sample.
- the increased resolution in spatial profiling can be determined by comparison of an isometrically expanded sample with a sample that has not been isometrically expanded.
- a biological sample is isometrically expanded to a size at least 2x, 2.1x, 2.2x, 2.3x, 2.4x, 2.5x, 2.6x, 2.7x, 2.8x, 2.9x, 3x, 3.1x, 3.2x, 3.3x, 3.4x, 3.5x, 3.6x, 3.7x, 3.8x, 3.9x, 4x, 4.1x, 4.2x, 4.3x, 4.4x, 4.5x, 4.6x, 4.7x, 4.8x, or 4.9x its non-expanded size.
- the sample is isometrically expanded to at least 2x and less than 20x of its non-expanded size.
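The effect of expansion on resolution can be illustrated with a small calculation. This is a minimal sketch, assuming (the text above does not state this) that the effective resolution in pre-expansion coordinates is the optical resolution divided by the linear expansion factor. The 300 nm optical resolution and the function names are hypothetical values chosen for illustration.

```python
def effective_resolution_nm(optical_resolution_nm: float, expansion_factor: float) -> float:
    """Effective resolution in pre-expansion sample coordinates.

    Assumption for this sketch: resolution improves linearly with the
    linear expansion factor.
    """
    if expansion_factor <= 0:
        raise ValueError("expansion_factor must be positive")
    return optical_resolution_nm / expansion_factor


def in_stated_range(expansion_factor: float) -> bool:
    """True if the factor is at least 2x and less than 20x of the non-expanded size."""
    return 2.0 <= expansion_factor < 20.0


# Hypothetical 300 nm optical resolution, evaluated at a few expansion factors.
for factor in (2.0, 3.5, 4.9):
    print(factor, in_stated_range(factor), round(effective_resolution_nm(300.0, factor), 1))
```

Under this assumption a 2x expansion halves the smallest resolvable distance in the original sample; the real gain depends on how isotropic the expansion is.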
- the biological sample is reversibly cross-linked prior to or during an in situ assay.
- the analytes, polynucleotides and/or amplification product (e.g., amplicon) of an analyte or a probe bound thereto can be anchored to a polymer matrix.
- the polymer matrix can be a hydrogel.
- one or more of the polynucleotide probe(s) and/or amplification product (e.g., amplicon) thereof can be modified to contain functional groups that can be used as an anchoring site to attach the polynucleotide probes and/or amplification product to a polymer matrix.
- a modified probe comprising oligo dT may be used to bind to mRNA molecules of interest, followed by reversible crosslinking of the mRNA molecules.
- a hydrogel may include a macromolecular polymer gel including a network. Within the network, some polymer chains can optionally be cross-linked, although crosslinking does not always occur.
- a hydrogel can include hydrogel subunits, such as, but not limited to, acrylamide, bis-acrylamide, polyacrylamide and derivatives thereof, poly(ethylene glycol) and derivatives thereof (e.g., PEG-acrylate (PEG-DA), PEG-RGD), gelatin-methacryloyl (GelMA), methacrylated hyaluronic acid (MeHA), polyaliphatic polyurethanes, polyether polyurethanes, polyester polyurethanes, polyethylene copolymers, polyamides, polyvinyl alcohols, polypropylene glycol, polytetramethylene oxide, polyvinyl pyrrolidone, polyacrylamide, poly(hydroxy ethyl acrylate), and poly (hydroxy ethyl meth
- a hydrogel includes a hybrid material, e.g., the hydrogel material includes elements of both synthetic and natural polymers.
- Examples of suitable hydrogels are described, for example, in U.S. Patent Nos. 6,391,937, 9,512,422, and 9,889,422, and in U.S. Patent Application Publication Nos. 2017/0253918, 2018/0052081 and 2010/0055733, the entire contents of each of which are incorporated herein by reference.
- the hydrogel can form the substrate.
- the substrate includes a hydrogel and one or more second materials.
- the hydrogel is placed on top of one or more second materials.
- the hydrogel can be pre-formed and then placed on top of, underneath, or in any other configuration with one or more second materials.
- hydrogel formation occurs after contacting one or more second materials during formation of the substrate. Hydrogel formation can also occur within a structure (e.g., wells, ridges, projections, and/or markings) located on a substrate.
- hydrogel formation on a substrate occurs before, contemporaneously with, or after probes are provided to the sample.
- hydrogel formation can be performed on the substrate already containing the probes.
- hydrogel formation occurs within a biological sample.
- hydrogel subunits are infused into the biological sample, and polymerization of the hydrogel is initiated by an external or internal stimulus.
- in embodiments in which a hydrogel is formed within a biological sample, functionalization chemistry can be used.
- functionalization chemistry includes hydrogel-tissue chemistry (HTC).
- Any hydrogel-tissue backbone (e.g., synthetic or native) suitable for HTC can be used for anchoring biological macromolecules and modulating functionalization.
- Non-limiting examples of methods using HTC backbone variants include CLARITY, PACT, ExM, SWITCH and ePACT.
- hydrogel formation within a biological sample is permanent.
- biological macromolecules can permanently adhere to the hydrogel allowing multiple rounds of interrogation.
- hydrogel formation within a biological sample is reversible.
- additional reagents are added to the hydrogel subunits before, contemporaneously with, and/or after polymerization.
- additional reagents can include but are not limited to oligonucleotides (e.g., probes), endonucleases to fragment DNA, fragmentation buffer for DNA, DNA polymerase enzymes, dNTPs used to amplify the nucleic acid and to attach the barcode to the amplified fragments.
- Other enzymes can be used, including without limitation, RNA polymerase, ligase, proteinase K, and DNAse.
- Additional reagents can also include reverse transcriptase enzymes, including enzymes with terminal transferase activity, primers, and switch oligonucleotides.
- optical labels are added to the hydrogel subunits before, contemporaneously with, and/or after polymerization.
- HTC reagents are added to the hydrogel before, contemporaneously with, and/or after polymerization.
- a cell labelling agent is added to the hydrogel before, contemporaneously with, and/or after polymerization.
- a cell-penetrating agent is added to the hydrogel before, contemporaneously with, and/or after polymerization.
- Hydrogels embedded within biological samples can be cleared using any suitable method.
- electrophoretic tissue clearing methods can be used to remove biological macromolecules from the hydrogel-embedded sample.
- a hydrogel-embedded sample is stored before or after clearing of hydrogel, in a medium (e.g., a mounting medium, methylcellulose, or other semi-solid mediums).
- a method disclosed herein comprises de-crosslinking the reversibly cross-linked biological sample.
- the de-crosslinking does not need to be complete.
- only a portion of crosslinked molecules in the reversibly cross-linked biological sample are de-crosslinked and allowed to migrate.
- a biological sample can be permeabilized to facilitate transfer of species (such as probes) into the sample. If a sample is not permeabilized sufficiently, the amount of species (such as probes) in the sample may be too low to enable adequate analysis. Conversely, if the tissue sample is too permeable, the relative spatial relationship of the analytes within the tissue sample can be lost. Hence, a balance between permeabilizing the tissue sample enough to obtain good signal intensity while still maintaining the spatial resolution of the analyte distribution in the sample is desirable.
- a biological sample can be permeabilized by exposing the sample to one or more permeabilizing agents.
- Suitable agents for this purpose include, but are not limited to, organic solvents (e.g., acetone, ethanol, and methanol), cross-linking agents (e.g., paraformaldehyde), detergents (e.g., saponin, Triton X-100™ or Tween-20™), and enzymes (e.g., trypsin, proteases).
- the biological sample can be incubated with a cellular permeabilizing agent to facilitate permeabilization of the sample. Additional methods for sample permeabilization are described, for example, in Jamur et al., Methods Mol. Biol. 588:63-66, 2010, the entire contents of which are incorporated herein by reference. Any suitable method for sample permeabilization can generally be used in connection with the samples described herein.
- the biological sample can be permeabilized by adding one or more lysis reagents to the sample.
- suitable lysis agents include, but are not limited to, bioactive reagents such as lysis enzymes that are used for lysis of different cell types, e.g., gram positive or negative bacteria, plants, yeast, mammalian, such as lysozymes, achromopeptidase, lysostaphin, labiase, kitalase, lyticase, and a variety of other commercially available lysis enzymes.
- lysis agents can additionally or alternatively be added to the biological sample to facilitate permeabilization.
- surfactant-based lysis solutions can be used to lyse sample cells. Lysis solutions can include ionic surfactants such as, for example, sarcosyl and sodium dodecyl sulfate (SDS). More generally, chemical lysis agents can include, without limitation, organic solvents, chelating agents, detergents, surfactants, and chaotropic agents.
- the biological sample can be permeabilized by non-chemical permeabilization methods.
- Non-chemical permeabilization methods that can be used include, but are not limited to, physical lysis techniques such as electroporation, mechanical permeabilization methods (e.g., bead beating using a homogenizer and grinding balls to mechanically disrupt sample tissue structures), acoustic permeabilization (e.g., sonication), and thermal lysis techniques such as heating to induce thermal permeabilization of the sample.
- Additional reagents can be added to a biological sample to perform various functions prior to analysis of the sample.
- DNase and RNase inactivating agents or inhibitors such as proteinase K, and/or chelating agents such as EDTA, can be added to the sample.
- a method disclosed herein may comprise a step for increasing accessibility of a nucleic acid for binding, e.g., a denaturation step to open up DNA in a cell for hybridization by a probe.
- proteinase K treatment may be used to free up DNA with proteins bound thereto.
- where RNA or cDNA is the analyte, one or more RNA or cDNA analyte species of interest can be selectively enriched.
- one or more species of RNA or cDNA of interest can be selected by addition of one or more oligonucleotides to the sample.
- the additional oligonucleotide is a sequence used for priming a reaction by an enzyme (e.g., a polymerase).
- one or more primer sequences with sequence complementarity to one or more RNAs or cDNAs of interest can be used to amplify the one or more RNAs or cDNAs of interest, thereby selectively enriching these RNAs or cDNAs.
- a first and second probe that is specific for (e.g., specifically hybridizes to) each RNA or cDNA analyte are used.
- templated ligation is used to detect gene expression in a biological sample.
- An analyte of interest (such as a protein) bound by a labelling agent or binding agent (e.g., an antibody or epitope binding fragment thereof), wherein the binding agent is conjugated or otherwise associated with a reporter oligonucleotide comprising a reporter sequence that identifies the binding agent, can be targeted for analysis.
- Probes may be hybridized to the reporter oligonucleotide and ligated in a templated ligation reaction to generate a product for analysis.
- gaps between the probe oligonucleotides may first be filled prior to ligation, using, for example, Mu polymerase, DNA polymerase, RNA polymerase, reverse transcriptase, VENT polymerase, Taq polymerase, and/or any combinations, derivatives, and variants (e.g., engineered mutants) thereof.
- the assay can further include amplification of templated ligation products (e.g., by multiplex PCR).
- the analytes may be further enriched for in situ readout by immobilization at a location in the biological sample.
- the analytes may comprise one or more fragments that are specific to a location in the biological sample.
- RNA can be down-selected (e.g., removed) using any of a variety of methods.
- probes can be administered to a sample that selectively hybridize to ribosomal RNA (rRNA), thereby reducing the pool and concentration of rRNA in the sample.
- duplex-specific nuclease (DSN) treatment can remove rRNA (see, e.g., Archer, et al., Selective and flexible depletion of problematic sequences from RNA-seq libraries at the cDNA stage, BMC Genomics, 15 401, (2014), the entire contents of which are incorporated herein by reference).
- hydroxyapatite chromatography can remove abundant species (e.g., rRNA) (see, e.g., Vandernoot, V.A., cDNA normalization by hydroxyapatite chromatography to enrich transcriptome diversity in RNA-seq applications, Biotechniques, 53(6) 373-80, (2012), the entire contents of which are incorporated herein by reference).
- a biological sample may comprise one or a plurality of analytes of interest. Methods for performing multiplexed assays to analyze two or more different analytes in a single biological sample are provided.
- compositions and kits comprising any of the reagents for sequencing nucleic acids according to any of the embodiments described herein.
- Such compositions can comprise, but are not limited to, nucleic acid molecules, nucleotides conjugated to reversible labels such as fluorophores, nucleotides comprising reversible terminators, polymerases, chelators (e.g. EDTA), and salts and buffer solutions.
- kits for analyzing an analyte in a biological sample according to any of the methods described herein.
- kits may comprise, e.g., one or more reagents for detecting one or more target analytes, and instructions for performing one or more steps of the methods provided herein.
- the one or more reagents for performing the methods provided herein may include, e.g., nucleotides, modified nucleotides, polymerases and/or other enzymes, hybridization probes for detection, circularizable probes for amplification, nucleic acid primers, buffers, etc.
- kits may comprise one or more nucleotide mixtures comprising any combination of reversibly-terminated (e.g., 3’-OH reversibly terminated) and/or non-terminated nucleotides selected from A, T/U, G, and C.
- each terminated or non-terminated (e.g., 3’-OH reversibly terminated) nucleotide of a different base can be labeled with a different detectable label (e.g., a different fluorophore).
- kits may further comprise one or more reagents required for one or more steps comprising hybridization, ligation, extension, amplification, detection, and/or sample preparation as described herein, including, for example, wash buffers and/or ligation buffers.
- the kit further comprises an enzyme such as a ligase and/or a polymerase described herein.
- the kit comprises a polymerase, for instance for performing extension of the primers and to incorporate nucleotides.
- kits contain reagents for fixing, embedding, and/or permeabilizing the biological sample.
- kits may contain reagents for forming a functionalized matrix (e.g., a hydrogel) and/or for functionalizing a matrix (e.g., a hydrogel) with any suitable functional moieties.
- kits may contain buffers and reagents for tethering the probes and products (e.g., RCA products).
- the various components of the kit may be present in separate containers or certain compatible components may be pre-combined into a single container.
- the kits further contain instructions for using the components of the kit to practice the provided methods.
- instrument systems configured to perform any of the methods or processes described herein, and databases storing codebooks generated using the disclosed methods or processes.
- the disclosed systems may comprise, for example, one or more processors; and a memory communicatively coupled to the one or more processors and configured to store instructions that, when executed by the one or more processors, cause the system to: receive a plurality of images of a biological sample, wherein the plurality of images comprises images acquired in a plurality of sequencing or probing cycles; detect, based on the plurality of images, a series of optical signals at one or more locations in the biological sample corresponding to one or more barcoded target analytes; determine, based on the series of optical signals detected in the plurality of images, a code word comprising a series of ON and OFF bits that corresponds to a barcode for one of the one or more barcoded target analytes; and identify the barcoded target analyte based on a comparison of the determined code word to a codebook, wherein the code word corresponds to a member of a codebook comprising a plurality of code words for which the Hamming distance satisfies $d_H(W_i \mid W_j,\; W_m \mid W_n) > K$ for all possible combinations of code words $W_i$, $W_j$, $W_m$, $W_n$, wherein $W_i \mid W_j$ is a logical bitwise OR combination of any two code words $W_i$ and $W_j$, wherein $W_m \mid W_n$ is a logical bitwise OR combination of any two code words $W_m$ and $W_n$, wherein $K$ is an integer value greater than or equal to 1, wherein the codebook comprises $L$ code words, and wherein $i$, $j$, $m$, and $n$ are integers ranging in value from 0 to $L - 1$ and represent indices of the code words in the codebook.
- the disclosed databases for storing codebooks may comprise, for example, one or more non-transitory computer-readable storage medium components, the one or more non-transitory computer-readable storage medium components individually or collectively storing a codebook comprising a plurality of code words for which the Hamming distance satisfies $d_H(W_i \mid W_j,\; W_m \mid W_n) > K$ for all possible combinations of code words $W_i$, $W_j$, $W_m$, $W_n$, wherein $W_i \mid W_j$ is a logical bitwise OR combination of any two code words $W_i$ and $W_j$, wherein $W_m \mid W_n$ is a logical bitwise OR combination of any two code words $W_m$ and $W_n$, wherein $K$ is an integer value greater than or equal to 1, wherein the codebook comprises $L$ code words, and wherein $i$, $j$, $m$, and $n$ are integers ranging in value from 0 to $L - 1$ and represent indices of the code words in the codebook.
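The codebook condition can be checked numerically. The sketch below reads the condition as a Hamming-distance constraint between bitwise OR combinations of code word pairs, which is an interpretation adopted for illustration. It also assumes that the two pairs being compared are distinct and that each pair consists of two different code words. The function names and the example code words are hypothetical.

```python
from itertools import combinations
from typing import Sequence


def hamming(a: int, b: int) -> int:
    """Number of bit positions at which two code words differ."""
    return bin(a ^ b).count("1")


def min_or_pair_distance(codebook: Sequence[int]) -> int:
    """Smallest Hamming distance between the OR combinations of two distinct pairs.

    Assumption of this sketch: each pair (i, j) uses two different code words,
    and a pair is never compared with itself.
    """
    if len(codebook) < 3:
        raise ValueError("need at least three code words to form two distinct pairs")
    # OR-combine every unordered pair of distinct code words.
    or_words = [codebook[i] | codebook[j]
                for i, j in combinations(range(len(codebook)), 2)]
    return min(hamming(x, y) for x, y in combinations(or_words, 2))


# Hypothetical 6-bit code words, written as integers.
example = [0b000011, 0b001100, 0b110000, 0b010101]
print(min_or_pair_distance(example))
```

The returned minimum can then be compared against a chosen K. A minimum of zero means two different pairs of code words produce the same combined bit pattern and could not be told apart when their signals overlap.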
- the disclosed instrument systems may comprise instruments having integrated optics and fluidics modules (e.g., “opto-fluidic instruments” or “opto-fluidic systems”) for detecting target molecules (e.g., nucleic acids, proteins, antibodies, etc.) in biological samples (e.g., one or more cells or a tissue sample) as described herein.
- the fluidics module is configured to deliver one or more reagents (e.g., detectably labeled nucleotides, polymerases, or conjugates) to the biological sample and/or remove spent reagents therefrom.
- the optics module is configured to illuminate the biological sample with light having one or more spectral emission curves (over a range of wavelengths) and subsequently capture one or more images of emitted light signals from the biological sample during one or more sequencing cycles (e.g., as described in Section III).
- the captured images may be processed in real time and/or at a later time to determine the presence of the one or more target molecules in the biological sample, as well as three-dimensional position information associated with each detected target molecule.
- the opto-fluidics instrument includes a sample module configured to receive (and, optionally, secure) one or more biological samples.
- the sample module includes an X-Y stage configured to move the biological sample along an X-Y plane (e.g., perpendicular to an objective lens of the optics module).
- the opto-fluidic instrument is configured to analyze one or more target molecules in their naturally occurring place (i.e., in situ) within the biological sample.
- an opto-fluidic instrument may be an in-situ analysis system used to analyze a biological sample and detect target molecules (e.g., analytes) including but not limited to DNA, RNA, proteins, antibodies, and/or the like.
- an opto-fluidic instrument that can be used for in situ target molecule detection via base-by-base sequencing (e.g., sequencing of an identifier sequence such as a barcode sequence) and/or other imaging or target molecule detection technique.
- an opto-fluidic instrument may include a fluidics module that includes fluids needed for establishing the experimental conditions required for the probing of target molecules in the sample.
- an opto-fluidic instrument may also include a sample module configured to receive the sample, and an optics module including an imaging system for illuminating (e.g., exciting one or more fluorescently labeled nucleotides within the sample) and/or imaging light signals received from the sample.
- the in situ analysis system may also include other ancillary modules configured to facilitate the operation of the opto-fluidic instrument, such as, but not limited to, cooling systems, motion calibration systems, etc.
- in volumetric sample imaging systems (e.g., an opto-fluidic instrument), a z-stack of images is obtained for each Field of View (FOV) of the objective (FIG. 7).
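Acquiring a z-stack for each FOV amounts to visiting a grid of stage positions and stepping through focal planes at each one. The following is a minimal sketch of such an acquisition plan. The tiling scheme, the overlap fraction, the units, and all function names are assumptions made for illustration and are not taken from the disclosure.

```python
import math
from typing import List, Tuple


def fov_grid(region_w_um: float, region_h_um: float,
             fov_w_um: float, fov_h_um: float,
             overlap_frac: float = 0.1) -> List[Tuple[float, float]]:
    """Stage offsets (in micrometres) of FOV origins that tile a rectangular region."""
    if not 0.0 <= overlap_frac < 1.0:
        raise ValueError("overlap_frac must be in [0, 1)")
    step_x = fov_w_um * (1.0 - overlap_frac)
    step_y = fov_h_um * (1.0 - overlap_frac)
    # Number of tiles needed so the last tile reaches the far edge of the region.
    nx = max(1, math.ceil((region_w_um - fov_w_um) / step_x) + 1)
    ny = max(1, math.ceil((region_h_um - fov_h_um) / step_y) + 1)
    return [(ix * step_x, iy * step_y) for iy in range(ny) for ix in range(nx)]


def z_positions(z_start_um: float, z_step_um: float, n_planes: int) -> List[float]:
    """Focal plane positions that make up one z-stack."""
    return [z_start_um + k * z_step_um for k in range(n_planes)]


def acquisition_plan(fovs: List[Tuple[float, float]],
                     planes: List[float]) -> List[Tuple[float, float, float]]:
    """One (x, y, z) entry per image: every focal plane at every FOV."""
    return [(x, y, z) for (x, y) in fovs for z in planes]


fovs = fov_grid(1000.0, 500.0, 250.0, 250.0, overlap_frac=0.0)
plan = acquisition_plan(fovs, z_positions(0.0, 0.5, 5))
print(len(fovs), len(plan))
```

Ordering the plan by FOV first keeps all planes of a z-stack adjacent, so the stage only moves in X-Y once per stack.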
- in tissue imaging applications, automatically identifying relevant regions (those regions that contain target molecules such as nucleic acids or proteins) can be challenging, as the distribution of tissue is non-uniform in many biological samples (FIG. 8).
- the data extracted from the detection and analysis methods disclosed herein include the relative coordinates within a field of view (FOV) and provide intricate information regarding tissue organization.
- the systems and methods described herein use any suitable method to generate contrast of a sample against a background (e.g., illumination of a sample via bright field imaging, illumination of a sample via fluorescent imaging, inducing autofluorescence within the sample, adding contrast to the sample with one or more stains, etc.).
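Once a sample has contrast against the background, one simple way to flag relevant regions is to keep only those FOVs in which enough pixels exceed an intensity threshold. The sketch below illustrates that idea only. The thresholding rule, the minimum tissue fraction, and the function names are assumptions made for illustration, and the disclosure does not state that regions are selected this way.

```python
from typing import Dict, List, Sequence


def tissue_fraction(image: Sequence[Sequence[float]], threshold: float) -> float:
    """Fraction of pixels whose intensity exceeds the threshold."""
    pixels = [p for row in image for p in row]
    if not pixels:
        return 0.0
    return sum(1 for p in pixels if p > threshold) / len(pixels)


def select_fovs(fov_images: Dict[str, Sequence[Sequence[float]]],
                threshold: float,
                min_fraction: float = 0.05) -> List[str]:
    """Identifiers of FOVs whose above-threshold fraction meets min_fraction."""
    return [fov_id for fov_id, image in fov_images.items()
            if tissue_fraction(image, threshold) >= min_fraction]


# Hypothetical 2x2 low-resolution overview images keyed by FOV identifier.
overview = {
    "fov_A": [[0, 0], [0, 0]],        # empty background
    "fov_B": [[10, 200], [220, 5]],   # half of the pixels show signal
}
print(select_fovs(overview, threshold=100, min_fraction=0.25))
```

A fixed threshold is the simplest choice; an adaptive threshold would cope better with uneven illumination, at the cost of more tuning.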
- FIG. 9 shows an example workflow of analysis of a biological sample 910 (e.g., cell or tissue sample) using an opto-fluidic instrument 900, according to various embodiments.
- the sample 910 can be a biological sample (e.g., a tissue) that includes molecules such as DNA, RNA, proteins, antibodies, etc.
- the sample 910 can be a sectioned tissue that is treated to access the RNA thereof for labeling with circularizable DNA probes. Ligation of the probes may generate a circular DNA probe which can be enzymatically amplified and bound with fluorescent oligonucleotides, which can create bright signal that is convenient to image and has a high signal-to-noise ratio.
- the sample 910 may be placed in the opto-fluidic instrument 900 for analysis and detection of the molecules in the sample 910.
- the opto-fluidic instrument 900 can be a system configured to facilitate the experimental conditions conducive for the detection of the target molecules.
- the opto-fluidic instrument 900 can include a fluidics module 930, an optics module 940, a sample module 950, and an ancillary module 960, and these modules may be operated by a system controller 920 to create the experimental conditions for the probing of the molecules in the sample 910 by selected probes (e.g., circularizable DNA probes), as well as to facilitate the imaging of the probed sample (e.g., by an imaging system of the optics module 940).
- the various modules of the opto-fluidic instrument 900 may be separate components in communication with each other, or at least some of them may be integrated together.
- the sample module 950 may be configured to receive the sample 910 into the opto-fluidic instrument 900.
- the sample module 950 may include a sample interface module (SIM) that is configured to receive a sample device (e.g., cassette) onto which the sample 910 can be deposited. That is, the sample 910 may be placed in the opto-fluidic instrument 900 by depositing the sample 910 (e.g., the sectioned tissue) on a sample device that is then inserted into the SIM of the sample module 950.
- the sample module 950 may also include an X-Y stage onto which the SIM is mounted.
- the X-Y stage may be configured to move the SIM mounted thereon (e.g., and as such the sample device containing the sample 910 inserted therein) in perpendicular directions along the two-dimensional (2D) plane of the opto-fluidic instrument 900.
- the experimental conditions that are conducive for the detection of the molecules in the sample 910 may depend on the target molecule detection technique that is employed by the opto-fluidic instrument 900.
- the opto-fluidic instrument 900 can be a system that is configured to detect molecules in the sample 910 via hybridization of probes.
- the experimental conditions can include molecule hybridization conditions that result in the intensity of hybridization of the target molecule (e.g., nucleic acid) to a probe (e.g., oligonucleotide) being significantly higher when the probe sequence is complementary to the target molecule than when there is a single-base mismatch.
- the hybridization conditions include the preparation of the sample 910 using reagents such as washing/stripping reagents, hybridizing reagents, etc., and such reagents may be provided by the fluidics module 930.
- the fluidics module 930 may include one or more components that may be used for storing the reagents, as well as for transporting said reagents to and from the sample device containing the sample 910.
- the fluidics module 930 may include reservoirs configured to store the reagents, as well as a waste container configured for collecting the reagents (e.g., and other waste) after use by the opto-fluidic instrument 900 to analyze and detect the molecules of the sample 910.
- the fluidics module 930 may also include pumps, tubes, pipettes, etc., that are configured to facilitate the transport of the reagent to the sample device (e.g., and as such the sample 910).
- the fluidics module 930 may include pumps (“reagent pumps”) that are configured to pump washing/stripping reagents to the sample device for use in washing/stripping the sample 910 (e.g., as well as other washing functions such as washing an objective lens of the imaging system of the optics module 940).
- the ancillary module 960 can be a cooling system of the opto-fluidic instrument 900, and the cooling system may include a network of coolant-carrying tubes that are configured to transport coolants to various modules of the opto-fluidic instrument 900 for regulating the temperatures thereof.
- the fluidics module 930 may include coolant reservoirs for storing the coolants and pumps (e.g., “coolant pumps”) for generating a pressure differential, thereby forcing the coolants to flow from the reservoirs to the various modules of the opto-fluidic instrument 900 via the coolant-carrying tubes.
- the fluidics module 930 may include returning coolant reservoirs that may be configured to receive and store returning coolants, i.e., heated coolants flowing back into the returning coolant reservoirs after absorbing heat discharged by the various modules of the opto-fluidic instrument 900.
- the fluidics module 930 may also include cooling fans that are configured to force air (e.g., cool and/or ambient air) into the returning coolant reservoirs to cool the heated coolants stored therein.
- the fluidics module 930 may also include cooling fans that are configured to force air directly into a component of the opto-fluidic instrument 900 so as to cool said component.
- the fluidics module 930 may include cooling fans that are configured to direct cool or ambient air into the system controller 920 to cool the same.
- the opto-fluidic instrument 900 may include an optics module 940 which includes the various optical components of the opto-fluidic instrument 900, such as but not limited to a camera, an illumination module (e.g., light source such as LEDs), an objective lens, and/or the like.
- the optics module 940 may include a fluorescence imaging system that is configured to image the fluorescence emitted by the probes (e.g., oligonucleotides) in the sample 910 after the probes are excited by light from the illumination module of the optics module 940.
- the optics module 940 may also include an optical frame onto which the camera, the illumination module, and/or the X-Y stage of the sample module 950 may be mounted.
- the system controller 920 may be configured to control the operations of the opto-fluidic instrument 900 (e.g., and the operations of one or more modules thereof).
- the system controller 920 may take various forms, including a processor, a single computer (or computer system), or multiple computers in communication with each other.
- the system controller 920 may be communicatively coupled with data storage, a set of input devices, a display system, or a combination thereof. In some cases, some or all of these components may be considered to be part of or otherwise integrated with the system controller 920, may be separate components in communication with each other, or may be integrated together.
- the system controller 920 can be, or may be in communication with, a cloud computing platform.
- the opto-fluidic instrument 900 may analyze the sample 910 and may generate the output 970 that includes indications of the presence of the target molecules in the sample 910. For instance, with respect to the example embodiment discussed above where the opto-fluidic instrument 900 employs a hybridization technique for detecting molecules, the opto-fluidic instrument 900 may cause the sample 910 to undergo successive rounds of fluorescent probe hybridization (using two or more sets of fluorescent probes, where each set of fluorescent probes is excited by a different color channel) and be imaged to detect target molecules in the probed sample 910. In such cases, the output 970 may include optical signatures (e.g., a code word) specific to each gene, which allow the identification of the target molecules.
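The step from per-round signals to an identified target can be sketched as follows: threshold the intensity observed in each round to obtain ON and OFF bits, then look up the resulting code word in the codebook. This is a minimal sketch under stated assumptions. The fixed threshold, the nearest-match rule with a maximum allowed number of bit errors, the gene names, and the function names are all hypothetical and are not taken from the disclosure.

```python
from typing import Dict, Optional, Sequence, Tuple


def call_bits(intensities: Sequence[float], threshold: float) -> Tuple[int, ...]:
    """Convert one location's per-round intensities into ON (1) and OFF (0) bits."""
    return tuple(1 if value >= threshold else 0 for value in intensities)


def decode(bits: Sequence[int],
           codebook: Dict[str, Sequence[int]],
           max_distance: int = 0) -> Optional[str]:
    """Name of the unique closest code word, or None if there is no acceptable match."""
    best_name, best_dist, tie = None, None, False
    for name, word in codebook.items():
        if len(word) != len(bits):
            raise ValueError("code word and observation lengths differ")
        dist = sum(1 for a, b in zip(bits, word) if a != b)
        if best_dist is None or dist < best_dist:
            best_name, best_dist, tie = name, dist, False
        elif dist == best_dist:
            tie = True  # ambiguous: two code words are equally close
    if best_dist is None or best_dist > max_distance or tie:
        return None
    return best_name


# Hypothetical two-entry codebook with 6-bit code words.
codebook = {
    "GENE_A": (1, 0, 1, 0, 0, 1),
    "GENE_B": (0, 1, 0, 1, 1, 0),
}
observed = call_bits([900, 40, 850, 60, 30, 700], threshold=500)
print(decode(observed, codebook))
```

Setting `max_distance` above zero tolerates bit errors, which is only safe when the code words in the codebook are far enough apart that a corrupted observation cannot land closer to the wrong entry.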
- an assembly for transilluminating a substrate can include a sample carrier device (e.g., a microfluidic chip or glass slide), a thermal control module configured to control the temperature of the sample carrier device (e.g., a thermoelectric module), and a light source configured to illuminate the sample carrier device.
- the assembly includes a heat exchanger (e.g., a fluid block having a cooling fluid flowing therethrough).
- an assembly for transilluminating can include a sample carrier device (e.g., a sample substrate), an optically transparent substrate, a light source configured to illuminate the optically transparent substrate, a light scattering layer configured to scatter light from the light source, and/or a thermal control module configured to control the temperature of the sample carrier device and/or optically transparent substrate.
- the sample carrier device (e.g., a cassette) can be configured to receive a sample.
- the sample carrier device can include one or more microfluidic channels, e.g., sample chambers or microfluidic channels etched into a planar substrate or chambers within a flow cell or microfluidic device.
- a sample carrier device for the systems disclosed herein can include, but is not limited to, a substrate configured to receive a sample, a microscope slide and/or an adapter configured to mount microscope slides (with or without coverslips) on a microscope stage or automated stage (e.g., an automated translation or rotational stage), a substrate, and/or an adapter configured to mount slides on a microscope stage or automated stage, a substrate comprising etched sample containment chambers (e.g., chambers open to the environment) and/or an adapter configured to mount such substrates on a microscope stage or automated stage, a flow cell and/or an adapter configured to mount flow cells on a microscope stage or automated stage, or a microfluidic device and/or an adapter configured to mount microfluidic devices on a microscope stage or automated stage.
- the sample carrier device further includes a cassette configured to secure a substrate (e.g., a glass slide).
- the cassette includes two or more components (e.g., a top half and a bottom half) into which the substrate is secured.
- the one or more sample carrier devices can be designed for performing a variety of chemical analysis, biochemical analysis, nucleic acid analysis, cell analysis, or tissue analysis applications.
- the sample carrier device may comprise a sample, e.g., a tissue sample.
- sample carrier devices for the disclosed systems can be fabricated from any of a variety of materials known to those of skill in the art including, but not limited to, glass (e.g., borosilicate glass, soda lime glass, etc.), fused silica (quartz), silicon, polymer (e.g., polystyrene (PS), macroporous polystyrene (MPPS), polymethylmethacrylate (PMMA), polycarbonate (PC), polypropylene (PP), polyethylene (PE), high density polyethylene (HDPE), cyclic olefin polymers (COP), cyclic olefin copolymers (COC), polyethylene terephthalate (PET), poly dimethylsiloxane (PDMS), etc.), polyetherimide (PEI) and perfluoroelasto
- the one or more materials used to fabricate sample carrier devices for the disclosed systems can be optically transparent to facilitate use with spectroscopic or imaging-based detection techniques.
- the entire sample carrier device can be optically transparent.
- only a portion of the sample carrier device (e.g., an optically transparent “window”) can be optically transparent.
- sample carrier devices for the disclosed systems can be fabricated using any of a variety of techniques known to those of skill in the art, where the choice of fabrication technique is often dependent on the choice of material used, and vice versa.
- sample carrier device fabrication techniques include, but are not limited to, extrusion, drawing, precision computer numerical control (CNC) machining and boring, laser photoablation, photolithography in combination with wet chemical etching, deep reactive ion etching (DRIE), micro-molding, embossing, 3D-printing, thermal bonding, adhesive bonding, anodic bonding, and the like (see, e.g., Gale, et al. (2016), “A Review of Current Methods in Microfluidic Device Fabrication and Future Commercialization Prospects”, Inventions 3, 60, 1 - 25, which is hereby incorporated by reference in its entirety).
- FIG. 10A illustrates a cross-sectional view of an optics module 1000 in an imaging system.
- One or more illumination sources 1010 (e.g., one or more light emitting diodes (LEDs))
- the optical components include a collimator 1011.
- the optical components include a field stop 1012.
- the optical components include one or more excitation filters 1013.
- the one or more excitation filters 1013 are configured to filter light from the illumination source(s) 1010 for a predetermined range of wavelengths (e.g., each filter has one or more blocking band(s) and/or transmission band(s) that may be different or may overlap at least in part) and each excitation filter 1013 is aligned with appropriate illumination sources (e.g., blue LEDs, green LEDs, yellow LEDs, red LEDs, ultraviolet LEDs, etc.).
- the optical components include a condenser 1014.
- the optical components include a beam splitter 1015.
- An optical axis 1051 is illustrated extending through the center of the optical surfaces in the objective lens 1020 and its path includes an image plane, a focal plane, and input/output pupils (illustrated in FIG. 10B).
- a sensor array 1060 receives light signals from the sample 1050.
- the optical components include one or more emission filters 1065.
- the one or more emission filters 1065 are configured to filter light from the sample (e.g., emitted from one or more fluorophores, autofluorescence, etc.) for a predetermined range of wavelengths (e.g., each filter has one or more blocking band(s) and/or transmission band(s) that may be different or may overlap at least in part).
- the emission filters 1065 align (e.g., via motorized translation) with optics and/or the sensor array.
- the sample 1050 is probed with fluorescent probes configured to bind to a target (e.g., DNA or RNA) that, when illuminated with a particular wavelength (or range of wavelengths) of light, emit light signals that can be detected by the sensor array 1060.
- the sample 1050 is repeatedly probed with two or more (e.g., two, three, four, five, six, etc.) different sets of probes.
- each set of probes corresponds to a specific color (e.g., blue, green, yellow, or red) such that, when illuminated by that color, probes bound to a target emit light signals.
- the sensor array 1060 is aligned with the optical axis 1051 of the objective lens 1020 (i.e., the optical axis of the camera is coincident with and parallel to the optical axis of the objective lens 1020). In various embodiments, the sensor array 1060 is positioned perpendicularly to the objective lens 1020 (i.e., the optical axis of the camera is perpendicular to and intersects the optical axis of the objective lens 1020). In various embodiments, a tube lens 1061 is mounted in the optical path to focus light on the sensor array 1060 thereby allowing for image formation with infinity-corrected objectives. Descriptions of optical modules and illumination assemblies for use in opto-fluidic instruments can be found in U.S.
- the sample is illuminated with one or more wavelengths configured to induce fluorescence in the sample.
- the sample is probed during one or more probing cycles with one or more fluorescent probes configured to bind to one or more target analytes.
- the one or more wavelengths are selected to induce fluorescence in a subset of the one or more fluorescent probes.
- each probing cycle includes illumination with two or more (e.g., four) colors of light.
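As a rough illustration of how per-cycle, per-color detections could be collected into a binary code word, the following minimal Python sketch assigns one bit to each (probing cycle, color channel) image. The bit layout (`cycle * n_colors + color`) and the function name `build_code_word` are assumptions made for illustration only; the source does not specify how bits are ordered.

```python
from typing import Iterable, Tuple


def build_code_word(
    detections: Iterable[Tuple[int, int]], n_cycles: int, n_colors: int
) -> Tuple[int, ...]:
    """Return a binary code word with one bit per (cycle, color) image.

    `detections` lists the (cycle, color) index pairs in which a signal was
    seen at one location. The bit ordering is a hypothetical convention.
    """
    bits = [0] * (n_cycles * n_colors)
    for cycle, color in detections:
        if not (0 <= cycle < n_cycles and 0 <= color < n_colors):
            raise ValueError(f"detection out of range: {(cycle, color)}")
        # Assumed layout: all colors of cycle 0 first, then cycle 1, and so on.
        bits[cycle * n_colors + color] = 1
    return tuple(bits)


# Two probing cycles with four colors each give an 8-bit code word.
word = build_code_word([(0, 1), (1, 3)], n_cycles=2, n_colors=4)
print(word)  # (0, 1, 0, 0, 0, 0, 0, 1)
```

With four colors per cycle, the code word length grows by four bits for each additional probing cycle under this assumed layout.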
- the sample is treated with a fluorescent stain configured to illuminate one or more structures within the sample.
- the sample is contacted with a nuclear stain.
- the sample is contacted with 4',6-diamidino-2-phenylindole (“DAPI”) configured to bind to adenine-thymine-rich regions in DNA.
- illumination of the sample causes autofluorescence of the sample.
- autofluorescence is the natural emission of light by biological structures when they have absorbed light, and may be used to distinguish the light originating from artificially added fluorescent markers.
- fluorescence of the sample through fluorescent probes, autofluorescence, and/or a fluorescent stain can be used with the methods described herein to determine one or more focus metrics of a tissue sample.
- the sample is illuminated via edge lighting or transillumination along one or more edges of the sample and/or sample substrate.
- the edge lighting provides dark-field illumination of the sample.
- edge lighting is provided by one or more light sources positioned to provide light substantially perpendicular to a normal of the substrate surface on which the sample is disposed.
- the substrate is a glass slide.
- the substrate is configured as a wave guide to thereby guide light emitted from the edge lighting towards the sample.
- illumination of the sample via edge lighting can be used with the methods described herein to determine one or more focus metrics of a tissue sample.
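The text does not say which focus metric is computed from the fluorescence or edge-lit images. One commonly used choice is the variance of a discrete Laplacian, which tends to be larger for sharper images. The sketch below is an assumption-laden illustration in plain Python (the function name `laplacian_variance` and the list-of-lists image format are hypothetical), not the instrument's method.

```python
from statistics import pvariance
from typing import Sequence


def laplacian_variance(image: Sequence[Sequence[float]]) -> float:
    """Variance of the 4-neighbor Laplacian over the interior pixels.

    Higher values indicate more high-frequency detail, which is used here
    as a stand-in focus metric (an assumption, not taken from the source).
    """
    rows, cols = len(image), len(image[0])
    if rows < 3 or cols < 3:
        raise ValueError("image must be at least 3x3")
    responses = []
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            lap = (
                image[r - 1][c] + image[r + 1][c]
                + image[r][c - 1] + image[r][c + 1]
                - 4 * image[r][c]
            )
            responses.append(lap)
    return float(pvariance(responses))


flat = [[5.0] * 4 for _ in range(4)]                            # no detail
checker = [[(r + c) % 2 for c in range(4)] for r in range(4)]   # sharp detail
print(laplacian_variance(flat), laplacian_variance(checker))
```

In practice such a metric would be evaluated across a stack of images taken at different focal positions, and the position with the largest value taken as best focus.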
- Example: A mouse brain tissue sample is provided (fresh frozen or FFPE).
- the tissue sample can optionally be permeabilized (FFPE is already permeabilized).
- the tissue sample is contacted with a plurality of barcoded probes.
- the tissue sample is positioned in an optofluidic instrument having an OR-robust codebook stored thereon and, in each probing cycle of a plurality of probing cycles, the tissue sample is contacted with fluorescent tags. Fluorescent blobs from the tissue sample are detected by the optofluidic instrument in each probing cycle and the blobs are registered and/or aligned across all cycles.
- the optical signals are converted into an observed codeword, for example, using a probabilistic-based decoder. The resulting observed codewords from the observed optical signals are decoded against an OR-robust codebook stored on the instrument.
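The OR-robust property referred to in this example can be sketched in Python as follows. Per the abstract, the Hamming distance between the bitwise OR of any pair of valid code words and the bitwise OR of any other pair is at least 1, so two crowded analytes whose signals merge still produce a pattern that points to a single pair. Code words are represented as integers; the gene names, the example codebooks, and the exact-match `decode` function are hypothetical simplifications and do not reproduce the probabilistic decoder mentioned above.

```python
from itertools import combinations
from typing import Dict, Tuple


def hamming(a: int, b: int) -> int:
    """Number of bit positions in which two code words differ."""
    return bin(a ^ b).count("1")


def is_or_robust(codebook: Dict[str, int], min_distance: int = 1) -> bool:
    """True if the ORs of every two distinct pairs differ in >= min_distance bits."""
    pair_ors = [a | b for a, b in combinations(codebook.values(), 2)]
    return all(hamming(x, y) >= min_distance for x, y in combinations(pair_ors, 2))


def decode(observed: int, codebook: Dict[str, int]) -> Tuple[str, ...]:
    """Exact-match decoding: a single code word first, then the OR of a pair."""
    for gene, word in codebook.items():
        if word == observed:
            return (gene,)
    for (g1, w1), (g2, w2) in combinations(codebook.items(), 2):
        if (w1 | w2) == observed:
            return (g1, g2)
    return ()  # no valid interpretation found


# Hypothetical 6-bit codebook whose pairwise ORs are all distinct.
robust = {"gene_a": 0b000011, "gene_b": 0b001100, "gene_c": 0b110000}
# Hypothetical counter-example: gene_a|gene_b and gene_a|gene_c are both 0b0111.
not_robust = {"gene_a": 0b0011, "gene_b": 0b0101, "gene_c": 0b0110}

print(is_or_robust(robust), is_or_robust(not_robust))
print(decode(0b001111, robust))
```

Raising `min_distance` above 1 would ask for extra separation between merged patterns, at the cost of fewer usable code words for a given length.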
- FIG. 11 illustrates an example of a computing device or system in accordance with one or more examples of the disclosure.
- Device 1100 can be a host computer connected to a network.
- Device 1100 can be a client computer or a server.
- device 1100 can be any suitable type of microprocessor-based device, such as a personal computer, workstation, server, or handheld computing device (portable electronic device), such as a phone or tablet.
- the device can include, for example, one or more of processor 1110, input device 1120, output device 1130, memory / storage 1140, and communication device 1160.
- Input device 1120 and output device 1130 can generally correspond to those described above, and they can either be connectable or integrated with the computer.
- Input device 1120 can be any suitable device that provides input, such as a touch screen, keyboard or keypad, mouse, or voice-recognition device.
- Output device 1130 can be any suitable device that provides output, such as a touch screen, haptics device, or speaker.
- Storage 1140 can be any suitable device that provides storage, such as an electrical, magnetic, or optical memory including a RAM, cache, hard drive, or removable storage disk.
- Communication device 1160 can include any suitable device capable of transmitting and receiving signals over a network, such as a network interface chip or device.
- the components of the computer can be connected in any suitable manner, such as via a physical bus 1170 or wirelessly.
- Software 1150 which can be stored in memory / storage 1140 and executed by processor 1110, can include, for example, the programming that embodies the functionality of the present disclosure (e.g., as embodied in the methods and systems described above).
- Software 1150 can also be stored and/or transported within any non-transitory computer-readable storage medium for use by or in connection with an instruction execution system, apparatus, or device, such as those described above, that can fetch instructions associated with the software from the instruction execution system, apparatus, or device and execute the instructions.
- a computer-readable storage medium can be any medium, such as storage 1140, that can contain or store programming for use by or in connection with an instruction execution system, apparatus, or device.
- Software 1150 can also be propagated within any transport medium for use by or in connection with an instruction execution system, apparatus, or device, such as those described above, that can fetch instructions associated with the software from the instruction execution system, apparatus, or device and execute the instructions.
- a transport medium can be any medium that can communicate, propagate, or transport programming for use by or in connection with an instruction execution system, apparatus, or device.
- the transport readable medium can include, but is not limited to, an electronic, magnetic, optical, electromagnetic, or infrared wired or wireless propagation medium.
- Device 1100 may be connected to a network, which can be any suitable type of interconnected communication system.
- the network can implement any suitable communications protocol and can be secured by any suitable security protocol.
- the network can comprise network links of any suitable arrangement that can implement the transmission and reception of network signals, such as wireless network connections, T1 or T3 lines, cable networks, DSL, or telephone lines.
- Device 1100 can implement any operating system suitable for operating on the network.
- Software 1150 can be written in any suitable programming language, such as C, C++, Java, or Python.
- application software embodying the functionality of the present disclosure can be deployed in different configurations, such as in a client/server arrangement or through a web browser as a web-based application or web service, for example.
- polynucleotide refers to polymeric forms of nucleotides of any length, either ribonucleotides or deoxyribonucleotides.
- this term comprises, but is not limited to, single-, double-, or multi-stranded DNA or RNA, genomic DNA, cDNA, DNA-RNA hybrids, or a polymer comprising purine and pyrimidine bases or other natural, chemically or biochemically modified, non-natural, or derivatized nucleotide bases.
- “Ligation” may refer to the formation of a covalent bond or linkage between the termini of two or more nucleic acids, e.g., oligonucleotides and/or polynucleotides, in a template-driven reaction.
- the nature of the bond or linkage may vary widely and the ligation may be carried out enzymatically or chemically.
- ligations are usually carried out enzymatically to form a phosphodiester linkage between a 5' carbon terminal nucleotide of one oligonucleotide with a 3' carbon of another nucleotide.
Abstract
Methods are described for designing a codebook comprising a set of code words that are assigned to barcoded target analytes in a multiplexed in situ analysis, where the codebook is designed to minimize the influence of spatial crowding of target analytes on the accurate decoding and detection of target analytes. Also described are methods for performing in situ decoding using the described codebooks, where, for all valid code words in the codebook, a first Hamming distance between a first bitwise logical OR combination of any pair of valid code words and a second bitwise logical OR combination of any other pair of valid code words is greater than or equal to 1.
Applications Claiming Priority (2)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| US202463649266P | 2024-05-17 | 2024-05-17 | |
| US63/649,266 | 2024-05-17 | | |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| WO2025240918A1 (fr) | 2025-11-20 |
Family
ID=95981260
Family Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| PCT/US2025/029852 Pending WO2025240918A1 (fr) | 2024-05-17 | 2025-05-16 | Systems and methods for codebook generation |
Country Status (1)
| Country | Link |
|---|---|
| WO (1) | WO2025240918A1 (fr) |
Citations (58)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US4318846A (en) | 1979-09-07 | 1982-03-09 | Syva Company | Novel ether substituted fluorescein polyamino acid compounds as fluorescers and quenchers |
| US4757141A (en) | 1985-08-26 | 1988-07-12 | Applied Biosystems, Incorporated | Amino-derivatized phosphite and phosphate linking agents, phosphoramidite precursors, and useful conjugates thereof |
| US5066580A (en) | 1988-08-31 | 1991-11-19 | Becton Dickinson And Company | Xanthene dyes that emit to the red of fluorescein |
| US5091519A (en) | 1986-05-01 | 1992-02-25 | Amoco Corporation | Nucleotide compositions with linking groups |
| US5151507A (en) | 1986-07-02 | 1992-09-29 | E. I. Du Pont De Nemours And Company | Alkynylamino-nucleotides |
| US5188934A (en) | 1989-11-14 | 1993-02-23 | Applied Biosystems, Inc. | 4,7-dichlorofluorescein dyes as molecular probes |
| US5366860A (en) | 1989-09-29 | 1994-11-22 | Applied Biosystems, Inc. | Spectrally resolvable rhodamine dyes for nucleic acid sequence determination |
| EP0703991A1 (fr) | 1994-04-04 | 1996-04-03 | Spectragen, Inc. | DNA sequencing by stepwise ligation and cleavage |
| US5688648A (en) | 1994-02-01 | 1997-11-18 | The Regents Of The University Of California | Probes labelled with energy transfer coupled dyes |
| US5800996A (en) | 1996-05-03 | 1998-09-01 | The Perkin Elmer Corporation | Energy transfer dyes with enchanced fluorescence |
| US5847162A (en) | 1996-06-27 | 1998-12-08 | The Perkin Elmer Corporation | 4, 7-Dichlororhodamine dyes |
| US5990479A (en) | 1997-11-25 | 1999-11-23 | Regents Of The University Of California | Organo Luminescent semiconductor nanocrystal probes for biological applications and process for making and using such probes |
| US6207392B1 (en) | 1997-11-25 | 2001-03-27 | The Regents Of The University Of California | Semiconductor nanocrystal probes for biological applications and process for making and using such probes |
| US6251303B1 (en) | 1998-09-18 | 2001-06-26 | Massachusetts Institute Of Technology | Water-soluble fluorescent nanocrystals |
| US6322901B1 (en) | 1997-11-13 | 2001-11-27 | Massachusetts Institute Of Technology | Highly luminescent color-selective nano-crystalline materials |
| US20020045045A1 (en) | 2000-10-13 | 2002-04-18 | Adams Edward William | Surface-modified semiconductive and metallic nanoparticles having enhanced dispersibility in aqueous media |
| US6391937B1 (en) | 1998-11-25 | 2002-05-21 | Motorola, Inc. | Polyacrylamide hydrogels and hydrogel arrays made from polyacrylamide reactive prepolymers |
| US6426513B1 (en) | 1998-09-18 | 2002-07-30 | Massachusetts Institute Of Technology | Water-soluble thiol-capped nanocrystals |
| US20030013091A1 (en) | 2001-07-03 | 2003-01-16 | Krassen Dimitrov | Methods for detection and quantification of analytes in complex mixtures |
| US20030017264A1 (en) | 2001-07-20 | 2003-01-23 | Treadway Joseph A. | Luminescent nanoparticles and methods for their preparation |
| US6576291B2 (en) | 2000-12-08 | 2003-06-10 | Massachusetts Institute Of Technology | Preparation of nanocrystallites |
| US20050100900A1 (en) | 1997-04-01 | 2005-05-12 | Manteia Sa | Method of nucleic acid amplification |
| US7057026B2 (en) | 2001-12-04 | 2006-06-06 | Solexa Limited | Labelled nucleotides |
| US20060240439A1 (en) | 2003-09-11 | 2006-10-26 | Smith Geoffrey P | Modified polymerases for improved incorporation of nucleotide analogues |
| US20060281109A1 (en) | 2005-05-10 | 2006-12-14 | Barr Ost Tobias W | Polymerases |
| US20070166705A1 (en) | 2002-08-23 | 2007-07-19 | John Milton | Modified nucleotides |
| US20090118128A1 (en) | 2005-07-20 | 2009-05-07 | Xiaohai Liu | Preparation of templates for nucleic acid sequencing |
| US7544794B1 (en) | 2005-03-11 | 2009-06-09 | Steven Albert Benner | Method for sequencing DNA and RNA by synthesis |
| US20100015607A1 (en) | 2005-12-23 | 2010-01-21 | Nanostring Technologies, Inc. | Nanoreporters and methods of manufacturing and use thereof |
| US20100047924A1 (en) | 2008-08-14 | 2010-02-25 | Nanostring Technologies, Inc. | Stable nanoreporters |
| US20100055733A1 (en) | 2008-09-04 | 2010-03-04 | Lutolf Matthias P | Manufacture and uses of reactive microcontact printing of biomolecules on soft hydrogels |
| US20100112710A1 (en) | 2007-04-10 | 2010-05-06 | Nanostring Technologies, Inc. | Methods and computer systems for identifying target-specific sequences for use in nanoreporters |
| US20100261026A1 (en) | 2005-12-23 | 2010-10-14 | Nanostring Technologies, Inc. | Compositions comprising oriented, immobilized macromolecules and methods for their preparation |
| US20100262374A1 (en) | 2006-05-22 | 2010-10-14 | Jenq-Neng Hwang | Systems and methods for analyzing nanoreporters |
| US7883869B2 (en) | 2006-12-01 | 2011-02-08 | The Trustees Of Columbia University In The City Of New York | Four-color DNA sequencing by synthesis using cleavable fluorescent nucleotide reversible terminators |
| US20110059865A1 (en) | 2004-01-07 | 2011-03-10 | Mark Edward Brennan Smith | Modified Molecular Arrays |
| US7956171B2 (en) | 2007-05-18 | 2011-06-07 | Helicos Biosciences Corp. | Nucleotide analogs |
| US8034923B1 (en) | 2009-03-27 | 2011-10-11 | Steven Albert Benner | Reagents for reversibly terminating primer extension |
| US8071755B2 (en) | 2004-05-25 | 2011-12-06 | Helicos Biosciences Corporation | Nucleotide analogs |
| US20120270305A1 (en) | 2011-01-10 | 2012-10-25 | Illumina Inc. | Systems, methods, and apparatuses to image a sample for biological or chemical analysis |
| US20130079232A1 (en) | 2011-09-23 | 2013-03-28 | Illumina, Inc. | Methods and compositions for nucleic acid sequencing |
| US20130260372A1 (en) | 2012-04-03 | 2013-10-03 | Illumina, Inc. | Integrated optoelectronic read head and fluidic cartridge useful for nucleic acid sequencing |
| US8703461B2 (en) | 2009-06-05 | 2014-04-22 | Life Technologies Corporation | Mutant RB69 DNA polymerase |
| US8808989B1 (en) | 2013-04-02 | 2014-08-19 | Molecular Assemblies, Inc. | Methods and apparatus for synthesizing nucleic acids |
| US20140371088A1 (en) | 2013-06-14 | 2014-12-18 | Nanostring Technologies, Inc. | Multiplexable tag-based reporter system |
| US9217178B2 (en) | 2004-12-13 | 2015-12-22 | Illumina Cambridge Limited | Method of nucleotide detection |
| US20160116384A1 (en) | 2014-02-21 | 2016-04-28 | Massachusetts Institute Of Technology | Expansion microscopy |
| US9399798B2 (en) | 2011-09-13 | 2016-07-26 | Lasergen, Inc. | 3′-OH unblocked, fast photocleavable terminating nucleotides and methods for nucleic acid sequencing |
| US9512422B2 (en) | 2013-02-26 | 2016-12-06 | Illumina, Inc. | Gel patterned surfaces |
| US20170253918A1 (en) | 2016-03-01 | 2017-09-07 | Expansion Technologies | Combining protein barcoding with expansion microscopy for in-situ, spatially-resolved proteomics |
| US20180052081A1 (en) | 2016-05-11 | 2018-02-22 | Expansion Technologies | Combining modified antibodies with expansion microscopy for in-situ, spatially-resolved proteomics |
| US9951385B1 (en) | 2017-04-25 | 2018-04-24 | Omniome, Inc. | Methods and apparatus that increase sequencing-by-binding efficiency |
| US10655176B2 (en) | 2017-04-25 | 2020-05-19 | Omniome, Inc. | Methods and apparatus that increase sequencing-by-binding efficiency |
| US10768173B1 (en) | 2019-09-06 | 2020-09-08 | Element Biosciences, Inc. | Multivalent binding composition for nucleic acid analysis |
| US10982280B2 (en) | 2018-11-14 | 2021-04-20 | Element Biosciences, Inc. | Multipart reagents having increased avidity for polymerase binding |
| US20220084628A1 (en) | 2020-09-16 | 2022-03-17 | 10X Genomics, Inc. | Methods and systems for barcode error correction |
| WO2023172915A1 (fr) * | 2022-03-08 | 2023-09-14 | 10X Genomics, Inc. | In situ code design methods for minimizing optical overlap |
| EP4273263A2 (fr) * | 2014-07-30 | 2023-11-08 | President and Fellows of Harvard College | Systems and methods for determining nucleic acids |
2025
- 2025-05-16 WO PCT/US2025/029852 patent/WO2025240918A1/fr active Pending
Patent Citations (65)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US4318846A (en) | 1979-09-07 | 1982-03-09 | Syva Company | Novel ether substituted fluorescein polyamino acid compounds as fluorescers and quenchers |
| US4757141A (en) | 1985-08-26 | 1988-07-12 | Applied Biosystems, Incorporated | Amino-derivatized phosphite and phosphate linking agents, phosphoramidite precursors, and useful conjugates thereof |
| US5091519A (en) | 1986-05-01 | 1992-02-25 | Amoco Corporation | Nucleotide compositions with linking groups |
| US5151507A (en) | 1986-07-02 | 1992-09-29 | E. I. Du Pont De Nemours And Company | Alkynylamino-nucleotides |
| US5066580A (en) | 1988-08-31 | 1991-11-19 | Becton Dickinson And Company | Xanthene dyes that emit to the red of fluorescein |
| US5366860A (en) | 1989-09-29 | 1994-11-22 | Applied Biosystems, Inc. | Spectrally resolvable rhodamine dyes for nucleic acid sequence determination |
| US5188934A (en) | 1989-11-14 | 1993-02-23 | Applied Biosystems, Inc. | 4,7-dichlorofluorescein dyes as molecular probes |
| US5688648A (en) | 1994-02-01 | 1997-11-18 | The Regents Of The University Of California | Probes labelled with energy transfer coupled dyes |
| EP0703991A1 (fr) | 1994-04-04 | 1996-04-03 | Spectragen, Inc. | DNA sequencing by stepwise ligation and cleavage |
| US5552278A (en) | 1994-04-04 | 1996-09-03 | Spectragen, Inc. | DNA sequencing by stepwise ligation and cleavage |
| US5800996A (en) | 1996-05-03 | 1998-09-01 | The Perkin Elmer Corporation | Energy transfer dyes with enchanced fluorescence |
| US5847162A (en) | 1996-06-27 | 1998-12-08 | The Perkin Elmer Corporation | 4, 7-Dichlororhodamine dyes |
| US20050100900A1 (en) | 1997-04-01 | 2005-05-12 | Manteia Sa | Method of nucleic acid amplification |
| US6322901B1 (en) | 1997-11-13 | 2001-11-27 | Massachusetts Institute Of Technology | Highly luminescent color-selective nano-crystalline materials |
| US5990479A (en) | 1997-11-25 | 1999-11-23 | Regents Of The University Of California | Organo Luminescent semiconductor nanocrystal probes for biological applications and process for making and using such probes |
| US6423551B1 (en) | 1997-11-25 | 2002-07-23 | The Regents Of The University Of California | Organo luminescent semiconductor nanocrystal probes for biological applications and process for making and using such probes |
| US6207392B1 (en) | 1997-11-25 | 2001-03-27 | The Regents Of The University Of California | Semiconductor nanocrystal probes for biological applications and process for making and using such probes |
| US6251303B1 (en) | 1998-09-18 | 2001-06-26 | Massachusetts Institute Of Technology | Water-soluble fluorescent nanocrystals |
| US6319426B1 (en) | 1998-09-18 | 2001-11-20 | Massachusetts Institute Of Technology | Water-soluble fluorescent semiconductor nanocrystals |
| US6426513B1 (en) | 1998-09-18 | 2002-07-30 | Massachusetts Institute Of Technology | Water-soluble thiol-capped nanocrystals |
| US6444143B2 (en) | 1998-09-18 | 2002-09-03 | Massachusetts Institute Of Technology | Water-soluble fluorescent nanocrystals |
| US6391937B1 (en) | 1998-11-25 | 2002-05-21 | Motorola, Inc. | Polyacrylamide hydrogels and hydrogel arrays made from polyacrylamide reactive prepolymers |
| US20020045045A1 (en) | 2000-10-13 | 2002-04-18 | Adams Edward William | Surface-modified semiconductive and metallic nanoparticles having enhanced dispersibility in aqueous media |
| US6576291B2 (en) | 2000-12-08 | 2003-06-10 | Massachusetts Institute Of Technology | Preparation of nanocrystallites |
| US20030013091A1 (en) | 2001-07-03 | 2003-01-16 | Krassen Dimitrov | Methods for detection and quantification of analytes in complex mixtures |
| US20070166708A1 (en) | 2001-07-03 | 2007-07-19 | Krassen Dimitrov | Methods for detection and quantification of analytes in complex mixtures |
| US20030017264A1 (en) | 2001-07-20 | 2003-01-23 | Treadway Joseph A. | Luminescent nanoparticles and methods for their preparation |
| US7057026B2 (en) | 2001-12-04 | 2006-06-06 | Solexa Limited | Labelled nucleotides |
| US20060188901A1 (en) | 2001-12-04 | 2006-08-24 | Solexa Limited | Labelled nucleotides |
| US20070166705A1 (en) | 2002-08-23 | 2007-07-19 | John Milton | Modified nucleotides |
| US20060240439A1 (en) | 2003-09-11 | 2006-10-26 | Smith Geoffrey P | Modified polymerases for improved incorporation of nucleotide analogues |
| US9889422B2 (en) | 2004-01-07 | 2018-02-13 | Illumina Cambridge Limited | Methods of localizing nucleic acids to arrays |
| US20110059865A1 (en) | 2004-01-07 | 2011-03-10 | Mark Edward Brennan Smith | Modified Molecular Arrays |
| US8071755B2 (en) | 2004-05-25 | 2011-12-06 | Helicos Biosciences Corporation | Nucleotide analogs |
| US9217178B2 (en) | 2004-12-13 | 2015-12-22 | Illumina Cambridge Limited | Method of nucleotide detection |
| US7544794B1 (en) | 2005-03-11 | 2009-06-09 | Steven Albert Benner | Method for sequencing DNA and RNA by synthesis |
| US20060281109A1 (en) | 2005-05-10 | 2006-12-14 | Barr Ost Tobias W | Polymerases |
| US20090118128A1 (en) | 2005-07-20 | 2009-05-07 | Xiaohai Liu | Preparation of templates for nucleic acid sequencing |
| US20100015607A1 (en) | 2005-12-23 | 2010-01-21 | Nanostring Technologies, Inc. | Nanoreporters and methods of manufacturing and use thereof |
| US20100261026A1 (en) | 2005-12-23 | 2010-10-14 | Nanostring Technologies, Inc. | Compositions comprising oriented, immobilized macromolecules and methods for their preparation |
| US20100262374A1 (en) | 2006-05-22 | 2010-10-14 | Jenq-Neng Hwang | Systems and methods for analyzing nanoreporters |
| US7883869B2 (en) | 2006-12-01 | 2011-02-08 | The Trustees Of Columbia University In The City Of New York | Four-color DNA sequencing by synthesis using cleavable fluorescent nucleotide reversible terminators |
| US20100112710A1 (en) | 2007-04-10 | 2010-05-06 | Nanostring Technologies, Inc. | Methods and computer systems for identifying target-specific sequences for use in nanoreporters |
| US7956171B2 (en) | 2007-05-18 | 2011-06-07 | Helicos Biosciences Corp. | Nucleotide analogs |
| US20100047924A1 (en) | 2008-08-14 | 2010-02-25 | Nanostring Technologies, Inc. | Stable nanoreporters |
| US20100055733A1 (en) | 2008-09-04 | 2010-03-04 | Lutolf Matthias P | Manufacture and uses of reactive microcontact printing of biomolecules on soft hydrogels |
| US8034923B1 (en) | 2009-03-27 | 2011-10-11 | Steven Albert Benner | Reagents for reversibly terminating primer extension |
| US8703461B2 (en) | 2009-06-05 | 2014-04-22 | Life Technologies Corporation | Mutant RB69 DNA polymerase |
| US20120270305A1 (en) | 2011-01-10 | 2012-10-25 | Illumina Inc. | Systems, methods, and apparatuses to image a sample for biological or chemical analysis |
| US9399798B2 (en) | 2011-09-13 | 2016-07-26 | Lasergen, Inc. | 3′-OH unblocked, fast photocleavable terminating nucleotides and methods for nucleic acid sequencing |
| US20130079232A1 (en) | 2011-09-23 | 2013-03-28 | Illumina, Inc. | Methods and compositions for nucleic acid sequencing |
| US20130260372A1 (en) | 2012-04-03 | 2013-10-03 | Illumina, Inc. | Integrated optoelectronic read head and fluidic cartridge useful for nucleic acid sequencing |
| US9512422B2 (en) | 2013-02-26 | 2016-12-06 | Illumina, Inc. | Gel patterned surfaces |
| US8808989B1 (en) | 2013-04-02 | 2014-08-19 | Molecular Assemblies, Inc. | Methods and apparatus for synthesizing nucleic acids |
| US20140371088A1 (en) | 2013-06-14 | 2014-12-18 | Nanostring Technologies, Inc. | Multiplexable tag-based reporter system |
| US20160116384A1 (en) | 2014-02-21 | 2016-04-28 | Massachusetts Institute Of Technology | Expansion microscopy |
| EP4273263A2 (en) * | 2014-07-30 | 2023-11-08 | President and Fellows of Harvard College | Systems and methods for determining nucleic acids |
| US20170253918A1 (en) | 2016-03-01 | 2017-09-07 | Expansion Technologies | Combining protein barcoding with expansion microscopy for in-situ, spatially-resolved proteomics |
| US20180052081A1 (en) | 2016-05-11 | 2018-02-22 | Expansion Technologies | Combining modified antibodies with expansion microscopy for in-situ, spatially-resolved proteomics |
| US9951385B1 (en) | 2017-04-25 | 2018-04-24 | Omniome, Inc. | Methods and apparatus that increase sequencing-by-binding efficiency |
| US10655176B2 (en) | 2017-04-25 | 2020-05-19 | Omniome, Inc. | Methods and apparatus that increase sequencing-by-binding efficiency |
| US10982280B2 (en) | 2018-11-14 | 2021-04-20 | Element Biosciences, Inc. | Multipart reagents having increased avidity for polymerase binding |
| US10768173B1 (en) | 2019-09-06 | 2020-09-08 | Element Biosciences, Inc. | Multivalent binding composition for nucleic acid analysis |
| US20220084628A1 (en) | 2020-09-16 | 2022-03-17 | 10X Genomics, Inc. | Methods and systems for barcode error correction |
| WO2023172915A1 (en) * | 2022-03-08 | 2023-09-14 | 10X Genomics, Inc. | In situ code design methods for minimizing optical crowding |
Non-Patent Citations (14)
| Title |
|---|
| "Methods in Enzymology", vol. 572, 1 January 2016, ELSEVIER, ACADEMIC PRESS, NL, ISBN: 978-0-12-805382-9, article J.R. MOFFITT ET AL: "RNA Imaging with Multiplexed Error-Robust Fluorescence In Situ Hybridization (MERFISH)", pages: 1 - 49, XP055693313, DOI: 10.1016/bs.mie.2016.03.020 * |
| ARCHER ET AL.: "Selective and flexible depletion of problematic sequences from RNA-seq libraries at the cDNA stage", BMC GENOMICS, vol. 15, 2014, pages 401, XP021187323, DOI: 10.1186/1471-2164-15-401 |
| BOLOGNESI ET AL., J. HISTOCHEM. CYTOCHEM., vol. 65, no. 8, 2017, pages 431 - 444 |
| CHEN ET AL., NAT. METHODS, vol. 13, 2016, pages 679 - 684 |
| CHEN ET AL., SCIENCE, vol. 347, no. 6221, 2015, pages 543 - 548 |
| CHEN KOK HAO ET AL: "Spatially resolved, highly multiplexed RNA profiling in single cells", SCIENCE - AUTHOR MANUSCRIPT, vol. 348, no. 6233, 24 April 2015 (2015-04-24), US, XP055879252, ISSN: 0036-8075, DOI: 10.1126/science.aaa6090 * |
| GALE ET AL.: "A Review of Current Methods in Microfluidic Device Fabrication and Future Commercialization Prospects", INVENTIONS, vol. 3, no. 60, 2018, pages 1 - 25 |
| HAUGLAND: "Handbook of Fluorescent Probes and Research Chemicals", 2002, MOLECULAR PROBES, INC. |
| JAMUR ET AL., METHODS MOL. BIOL., vol. 588, 2010, pages 63 - 66 |
| KELLER AND MANAK: "DNA Probes", 1993, STOCKTON PRESS |
| LIN ET AL., NAT COMMUN., vol. 6, 2015, pages 8390 |
| PIRICI ET AL., J. HISTOCHEM. CYTOCHEM., vol. 57, 2009, pages 899 - 905 |
| VANDERNOOT, V.A.: "cDNA normalization by hydroxyapatite chromatography to enrich transcriptome diversity in RNA-seq applications", BIOTECHNIQUES, vol. 53, no. 6, 2012, pages 373 - 380 |
| WETMUR: "Critical Reviews in Biochemistry and Molecular Biology", vol. 26, 1991, IRL PRESS, pages: 227 - 259 |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| US12365944B1 (en) | | Compositions and methods for amplification and sequencing |
| USRE48913E1 (en) | | Spatially addressable molecular barcoding |
| JP2024056705A (en) | | Methods and compositions for combinatorial barcoding |
| US12400733B2 (en) | | In situ code design methods for minimizing optical crowding |
| US20240026426A1 (en) | | Decoy oligonucleotides and related methods |
| US20240331348A1 (en) | | Multi-resolution in situ decoding |
| CN115210760A (en) | | Spatial analysis of analytes |
| US20240263220A1 (en) | | In situ analysis of variant sequences in biological samples |
| US12417646B2 (en) | | Systems and methods for image segmentation |
| WO2025240918A1 (en) | | Systems and methods for codebook generation |
| WO2024107727A1 (en) | | Systems and methods for active vibration mitigation |
| US20250270636A1 (en) | | Multi-fluorophore single nucleotide complexes for sequencing |
| US20250285229A1 (en) | | Multi-focus image fusion with background removal |
| US20250277262A1 (en) | | Click-chemistry retention of fluorescent nucleotides |
| US20250257394A1 (en) | | Polymerase-conjugate binding stabilization |
| US12406371B2 (en) | | Systems and methods for image segmentation using multiple stain indicators |
| US20250207189A1 (en) | | Dinucleotide stochastic sequencing |
| US20250188524A1 (en) | | Graphical user interface and method of estimating an instrument run completion time |
| US20250117932A1 (en) | | Feature pyramiding for in situ data visualizations (aka dynamic display of molecular information dependent on zoom level) |
| US20250012786A1 (en) | | Systems and methods for tissue bounds detection |
| US20250092443A1 (en) | | Rolling circle amplification methods and probes for improved spatial analysis |
| US20250257391A1 (en) | | Rolling circle amplification comprising crosslinking and de-crosslinking |