
WO2024254071A1 - Three-dimensional imaging systems and methods for determining the presence of nucleic acids in thick tissues - Google Patents


Publication number
WO2024254071A1
Authority
WO
WIPO (PCT)
Prior art keywords
sample
images
nucleic acid
confocal microscope
merfish
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
PCT/US2024/032412
Other languages
English (en)
Inventor
Xiaowei Zhuang
Rongxin FANG
Aaron Halpern
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Harvard University
Original Assignee
Harvard University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Harvard University
Publication of WO2024254071A1
Anticipated expiration
Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T5/00 Image enhancement or restoration
    • G06T5/60 Image enhancement or restoration using machine learning, e.g. neural networks
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q1/00 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
    • C12Q1/68 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
    • C12Q1/6813 Hybridisation assays
    • C12Q1/6816 Hybridisation assays characterised by the detection means
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q1/00 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
    • C12Q1/68 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
    • C12Q1/6813 Hybridisation assays
    • C12Q1/6841 In situ hybridisation
    • G PHYSICS
    • G02 OPTICS
    • G02B OPTICAL ELEMENTS, SYSTEMS OR APPARATUS
    • G02B21/00 Microscopes
    • G02B21/0004 Microscopes specially adapted for specific applications
    • G02B21/002 Scanning microscopes
    • G02B21/0024 Confocal scanning microscopes (CSOMs) or confocal "macroscopes"; Accessories which are not restricted to use with CSOMs, e.g. sample holders
    • G02B21/0052 Optical details of the image generation
    • G02B21/0076 Optical details of the image generation arrangements using fluorescence or luminescence
    • G PHYSICS
    • G02 OPTICS
    • G02B OPTICAL ELEMENTS, SYSTEMS OR APPARATUS
    • G02B21/00 Microscopes
    • G02B21/36 Microscopes arranged for photographic purposes or projection purposes or digital imaging or video purposes including associated control and data processing arrangements
    • G02B21/365 Control or image processing arrangements for digital or video microscopes
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01N INVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
    • G01N21/00 Investigating or analysing materials by the use of optical means, i.e. using sub-millimetre waves, infrared, visible or ultraviolet light
    • G01N21/62 Systems in which the material investigated is excited whereby it emits light or causes a change in wavelength of the incident light
    • G01N21/63 Systems in which the material investigated is excited whereby it emits light or causes a change in wavelength of the incident light optically excited
    • G01N21/64 Fluorescence; Phosphorescence
    • G01N21/6428 Measuring fluorescence of fluorescent products of reactions or of fluorochrome labelled reactive substances, e.g. measuring quenching effects, using measuring "optrodes"
    • G01N2021/6439 Measuring fluorescence of fluorescent products of reactions or of fluorochrome labelled reactive substances, e.g. measuring quenching effects, using measuring "optrodes" with indicators, stains, dyes, tags, labels, marks
    • G01N2021/6441 Measuring fluorescence of fluorescent products of reactions or of fluorochrome labelled reactive substances, e.g. measuring quenching effects, using measuring "optrodes" with indicators, stains, dyes, tags, labels, marks with two or more labels
    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01N INVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
    • G01N21/00 Investigating or analysing materials by the use of optical means, i.e. using sub-millimetre waves, infrared, visible or ultraviolet light
    • G01N21/62 Systems in which the material investigated is excited whereby it emits light or causes a change in wavelength of the incident light
    • G01N21/63 Systems in which the material investigated is excited whereby it emits light or causes a change in wavelength of the incident light optically excited
    • G01N21/64 Fluorescence; Phosphorescence
    • G01N21/645 Specially adapted constructive features of fluorimeters
    • G01N21/6456 Spatial resolved fluorescence measurements; Imaging
    • G01N21/6458 Fluorescence microscopy
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning
    • G06N20/20 Ensemble learning
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/10 Image acquisition modality
    • G06T2207/10056 Microscopic image
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/20 Special algorithmic details
    • G06T2207/20081 Training; Learning
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/20 Special algorithmic details
    • G06T2207/20084 Artificial neural networks [ANN]

Definitions

  • the present disclosure generally relates to microscopy including confocal microscopy and spatial genomics.
  • MERFISH: multiplexed error-robust fluorescence in situ hybridization
  • Recent advances in genome-scale imaging methods have allowed in situ gene expression profiling, 3D-genome imaging, and epigenomic profiling of individual cells, which has in turn allowed identification and spatial mapping of molecularly defined cell types in intact tissues.
  • multiplexed error-robust fluorescence in situ hybridization (MERFISH) allows simultaneous imaging of thousands of genes by using combinatorial labeling to assign unique barcodes to individual genes, sequential rounds of imaging to read out the barcodes of individual RNA molecules, and error-robust barcoding schemes to ensure high detection accuracy.
  • MERFISH has also been extended to allow spatially resolved 3D-genome imaging and epigenomic profiling of individual cells. See, e.g., U.S. Pat. No. 11,098,303, incorporated herein by reference.
  • MERFISH measurements have been performed on thin tissue sections of ~10 micrometer thickness.
  • tissue deformation that occurs during sectioning makes it challenging to align images of serial thin sections and determine the 3D molecular and cellular architecture of tissues.
  • thick-tissue MERFISH imaging faces several challenges. First, the fluorescence background caused by out-of-focus signal reduces image quality.
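The sequential readout described above for MERFISH can be illustrated with a minimal Python sketch. It assumes, purely for illustration, that each imaging round yields one normalized brightness value per RNA spot and that a fixed threshold converts that value into a bit. The function name, the threshold, and the example values are hypothetical and are not taken from the disclosure.

```python
from typing import Sequence, Tuple


def read_barcode(round_intensities: Sequence[float],
                 threshold: float = 0.5) -> Tuple[int, ...]:
    """Convert per-round spot intensities into a binary barcode.

    Bit i is 1 when the spot is brighter than `threshold` in imaging
    round i, and 0 otherwise. The threshold is an illustrative assumption.
    """
    return tuple(1 if value > threshold else 0 for value in round_intensities)


# Hypothetical intensities of one RNA spot over four imaging rounds.
spot = [0.9, 0.1, 0.8, 0.05]
print(read_barcode(spot))
```

A real pipeline would typically also register the images between rounds and normalize intensities per channel before thresholding; those steps are omitted here.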
  • the present disclosure generally relates to microscopy including confocal microscopy.
  • the subject matter of the present disclosure involves, in some cases, interrelated products, alternative solutions to a particular problem, and/or a plurality of different uses of one or more systems and/or articles.
  • One aspect is generally directed to using a confocal microscope to acquire images of a sample, and determining nucleic acids within the sample using MERFISH.
  • Another aspect is generally directed to using a confocal microscope to acquire images of a sample, and determining nucleic acids within the sample in 3 dimensions using the images.
  • Yet another aspect is generally directed to exposing a sample to a plurality of nucleic acid probes; for each of the nucleic acid probes, determining binding of the nucleic acid probes within the sample by acquiring images of the sample using a confocal microscope; creating codewords based on the binding of the nucleic acid probes; and for at least some of the codewords, matching the codeword to a valid codeword wherein, if no match is found, applying error correction to the codeword to form a valid codeword.
  • Yet another aspect is generally directed to using a confocal microscope to acquire images of a sample, enhancing the images using machine learning, and determining nucleic acids within the sample.
  • Still another aspect is generally directed to using a confocal microscope to acquire images of a sample, enhancing the images using machine learning, and determining nucleic acids within the sample using MERFISH.
  • Another aspect is generally directed to using a confocal microscope to acquire images of a sample, and determining nucleic acids within the sample by exposing the sample to a plurality of nucleic acid probes and determining binding of the plurality of the nucleic acid probes to the sample.
  • Yet another aspect is generally directed to using a confocal microscope to acquire images of a sample, enhancing the images using machine learning, and determining nucleic acids within the sample by exposing the sample to a plurality of nucleic acid probes and determining binding of the plurality of the nucleic acid probes to the sample.
  • the present disclosure encompasses methods of making one or more of the embodiments described herein. In still another aspect, the present disclosure encompasses methods of using one or more of the embodiments described herein.
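The codeword-matching aspect described above (match a measured codeword to a valid codeword and, if no match is found, apply error correction) can be sketched as follows. This is a minimal illustration under stated assumptions, not the disclosed implementation: the codebook, the gene names, and the choice of nearest-codeword correction by Hamming distance are all hypothetical.

```python
from typing import Dict, Optional, Sequence, Tuple

Codeword = Tuple[int, ...]

# Hypothetical codebook: 8-bit codewords that differ from one another
# in 4 bit positions, so any single-bit error has a unique nearest match.
CODEBOOK: Dict[Codeword, str] = {
    (1, 1, 1, 1, 0, 0, 0, 0): "gene_A",
    (1, 1, 0, 0, 1, 1, 0, 0): "gene_B",
    (0, 0, 1, 1, 1, 1, 0, 0): "gene_C",
}


def hamming_distance(a: Sequence[int], b: Sequence[int]) -> int:
    """Number of positions at which two equal-length codewords differ."""
    return sum(x != y for x, y in zip(a, b))


def decode(measured: Codeword,
           codebook: Dict[Codeword, str] = CODEBOOK,
           max_errors: int = 1) -> Optional[str]:
    """Return the label for `measured`, correcting up to `max_errors` bits.

    An exact match is used when available. Otherwise the codeword is
    corrected only if exactly one valid codeword lies within `max_errors`
    bit flips; ambiguous or distant codewords are discarded (None).
    """
    if measured in codebook:
        return codebook[measured]
    nearby = [label for valid, label in codebook.items()
              if hamming_distance(measured, valid) <= max_errors]
    return nearby[0] if len(nearby) == 1 else None


print(decode((1, 1, 1, 1, 0, 0, 0, 0)))  # exact match
print(decode((0, 1, 1, 1, 0, 0, 0, 0)))  # one bit flipped
print(decode((0, 0, 1, 1, 0, 0, 0, 0)))  # two bits flipped
```

Discarding ambiguous codewords rather than guessing is a design choice made for this sketch; it trades detection efficiency for a lower misidentification rate.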
  • Figs. 1A-1D illustrate enhanced performance of confocal MERFISH imaging by deep learning, in accordance with one embodiment
  • Figs. 2A-2G illustrate imaging of thick brain tissue sections, in another embodiment
  • Figs. 3A-3F illustrate spatial organization of cell types in tissue sections, in yet another embodiment
  • Fig. 4 illustrates a comparison of epifluorescence and confocal images, in still another embodiment
  • Figs. 5A-5D illustrate MERFISH images of RNA molecules in tissues, in yet another embodiment
  • Figs. 6A-6E illustrate MERFISH encoding and readouts, in still another embodiment
  • Figs. 7A-7D illustrate displacement of RNA molecules, in yet another embodiment
  • Figs. 8A-8C illustrate gel expansion, in one embodiment
  • Figs. 9A-9C illustrate 3D MERFISH imaging in mouse tissue, in another embodiment
  • Fig. 10 illustrates cell type identification in mouse tissue, in one embodiment
  • Figs. 11A-11B illustrate MERFISH imaging in mouse tissue, in another embodiment
  • Figs. 12A-12B illustrate cell type identification in mouse tissue, in still another embodiment
  • Figs. 13A-13B illustrate multiple read sequences distributed in a population of different nucleic acid probes, in accordance with another embodiment.
  • the present disclosure generally relates to microscopy, including confocal microscopy.
  • techniques such as MERFISH can be used to determine nucleic acids within a sample, with images acquired using confocal microscopy.
  • the nucleic acids may be determined in 3 dimensions.
  • relatively thick samples, e.g., greater than 10 micrometers thick or greater than 100 micrometers thick, may be analyzed.
  • deep learning or other machine learning techniques may be used to enhance the image quality and/or speed up the confocal imaging process.
  • nucleic acids may be determined in 2 or 3 dimensions.
  • the transcriptome of a cell may be determined.
  • Certain embodiments are directed to determining nucleic acids, such as mRNA, within cells at relatively high resolutions.
  • a plurality of nucleic acid probes may be applied to a sample, and their binding within the sample determined, e.g., using confocal microscopy, to determine locations of the nucleic acid probes within the sample.
  • the sample may include a cell culture, a suspension of cells, a biological tissue, a biopsy, an organism, or the like.
  • the sample may also be cell-free but nevertheless contain nucleic acids.
  • the cell may be a human cell, or any other suitable cell, e.g., a mammalian cell, a fish cell, an insect cell, a plant cell, or the like. More than one cell may be present in some cases.
  • Confocal microscopy can be used, in certain aspects, to analyze relatively thick samples, e.g., samples having a thickness of at least 10 micrometers, at least 20 micrometers, at least 30 micrometers, at least 40 micrometers, at least 50 micrometers, at least 60 micrometers, at least 70 micrometers, at least 80 micrometers, at least 90 micrometers, at least 100 micrometers, at least 110 micrometers, at least 125 micrometers, at least 150 micrometers, at least 175 micrometers, at least 200 micrometers, at least 225 micrometers, at least 250 micrometers, at least 300 micrometers, at least 350 micrometers, at least 400 micrometers, at least 450 micrometers, at least 500 micrometers, etc.
  • confocal microscopy uses a spatial pinhole, a spinning disk, or other techniques to block out-of-focus light in image formation.
  • capturing images at different focal depths in a sample may allow for the reconstruction of three-dimensional images of the sample. For instance, different images can be obtained of a sample with different depths.
  • the images may be spaced apart (e.g., in the z-axis) by at least 1 micrometer, at least 2 micrometers, at least 3 micrometers, at least 4 micrometers, at least 5 micrometers, at least 6 micrometers, at least 7 micrometers, at least 8 micrometers, at least 9 micrometers, at least 10 micrometers, at least 12 micrometers, at least 15 micrometers, at least 20 micrometers, etc.
  • the confocal microscope may be a laser scanning confocal microscope, a spinning disk confocal microscope, a dual spinning-disk confocal microscope, a programmable array microscope (e.g., that may use an electronically controlled spatial light modulator to produce a set of moving pinholes), or the like.
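As a small worked example of acquiring images at different focal depths with a fixed z-axis spacing, the sketch below lists the focal-plane depths needed to step through a sample of a given thickness. The function name and the convention that the first plane sits at depth 0 are assumptions made for illustration.

```python
import math
from typing import List


def focal_plane_depths(thickness_um: float, spacing_um: float) -> List[float]:
    """Depths (in micrometers) of focal planes covering a sample.

    Planes start at depth 0 and are `spacing_um` apart; the last plane
    is the deepest one that does not exceed `thickness_um`.
    """
    if thickness_um < 0 or spacing_um <= 0:
        raise ValueError("thickness must be >= 0 and spacing must be > 0")
    n_planes = math.floor(thickness_um / spacing_um) + 1
    return [i * spacing_um for i in range(n_planes)]


# A 100-micrometer-thick sample imaged with 10-micrometer z-spacing.
depths = focal_plane_depths(100, 10)
print(len(depths), depths[0], depths[-1])
```

Stacking the images acquired at these depths in order gives the three-dimensional volume referred to above.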
  • one or more images that are acquired can be enhanced using machine learning or artificial intelligence techniques such as deep learning, graph convolutional networks, reinforcement learning, neural networks, recurrent neural networks, or the like.
  • machine learning techniques may involve training a machine learning model on a training set of images, e.g., acquired as discussed herein, then using the trained machine learning model to enhance images.
  • a model may be trained using a data set having suitable properties (for example, precision, clarity of image, edges in image, color, shape, etc.).
  • the training data may be labelled, for example, to allow for supervised learning, unsupervised learning, reinforcement techniques, or the like.
  • one or more images may be enhanced by reducing blur, increasing edge definition, improving boundary definition, improving contrast, or the like.
  • the output image may have enhanced properties or image segments with enhanced properties. Examples include, but are not limited to, the enhancement of signal-to-noise ratio or signal-to-background ratio.
  • the trained machine learning model can be updated, e.g., based on evaluation of the enhanced images.
  • the output of the machine learning model, e.g., one or more enhanced images
  • updating the trained machine learning model may be performed via reinforcement learning, or other techniques.
  • a pre-trained model may be used that is able to take an input and produce an enhanced image output.
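One way to evaluate whether an enhanced image has an improved signal-to-background ratio, as mentioned above, is sketched below. The definition used here (mean intensity of spot pixels divided by mean intensity of the remaining pixels) is one common convention and is an assumption; the disclosure does not specify a formula. The function name, the mask, and the pixel values are hypothetical.

```python
from statistics import mean
from typing import Sequence


def signal_to_background(image: Sequence[Sequence[float]],
                         spot_mask: Sequence[Sequence[bool]]) -> float:
    """Mean intensity inside the spot mask divided by the mean outside it."""
    signal = [px for row, mask_row in zip(image, spot_mask)
              for px, is_spot in zip(row, mask_row) if is_spot]
    background = [px for row, mask_row in zip(image, spot_mask)
                  for px, is_spot in zip(row, mask_row) if not is_spot]
    if not signal or not background:
        raise ValueError("mask must select some, but not all, pixels")
    return mean(signal) / mean(background)


# Hypothetical 2x2 images: a raw frame and the same frame after enhancement.
mask = [[False, False], [True, False]]
raw = [[10, 12], [50, 11]]
enhanced = [[2, 3], [60, 1]]

print(signal_to_background(raw, mask))
print(signal_to_background(enhanced, mask))
```

A metric of this kind could serve as the evaluation step when deciding whether to update a trained model, although the disclosure leaves the evaluation criterion open.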
  • Various methods or processes outlined herein may be coded as software that is executable on one or more processors that employ any one of a variety of operating systems or platforms, which may in some cases be used to improve the underlying computer system on which it is implemented. Additionally, such software may be written using any of numerous suitable programming languages and/or programming or scripting tools, and also may be compiled as executable machine language code or intermediate code that is executed on a virtual machine or a suitable framework.
  • inventive concepts described herein may be embodied as at least one non-transitory computer readable storage medium (e.g., a computer memory, one or more floppy discs, compact discs, optical discs, magnetic tapes, flash memories, circuit configurations in Field Programmable Gate Arrays or other semiconductor devices, etc.) encoded with one or more programs that, when executed on one or more computers or other processors, implement the various embodiments.
  • the non-transitory computer-readable medium or media may be transportable, such that the program or programs stored thereon may be loaded onto any computer resource to implement various embodiments as discussed above.
  • the terms "program," "software," and "application" refer to any type of computer code or set of computer-executable instructions that can be employed to program a computer or other processor to implement various aspects of embodiments as discussed above.
  • one or more computer programs that when executed perform methods such as discussed herein need not reside on a single computer or processor, but may be distributed in a modular fashion among different computers or processors to implement various embodiments.
  • data structures may be stored in non-transitory computer-readable storage media in any suitable form.
  • Data structures may have fields that are related through location in the data structure. Such relationships may likewise be achieved by assigning storage for the fields with locations in a non-transitory computer-readable medium that convey relationship between the fields.
  • any suitable mechanism may be used to establish relationships among information in fields of a data structure, including through the use of pointers, tags or other mechanisms that establish relationships among data elements.
  • the sample may be immobilized or embedded within a polymer or a gel, partially or completely.
  • the sample may be embedded within a relatively large polymer or gel, which can then be sectioned or sliced in some cases to produce smaller portions for analysis, e.g., using various microtomy techniques commonly available to those of ordinary skill in the art.
  • tissues or organs may be immobilized within a suitable polymer or gel.
  • the polymer may be selected to be relatively optically transparent.
  • the polymer may also be one that does not significantly distort during the polymerization process, although in some cases, the polymer may exhibit some distortion.
  • the amount of distortion may be determined as a relative change in size that is less than 5, less than 4, less than 3, less than 2, less than 1.5, less than 1.3, or less than 1.2 (i.e., a change in size of 2 means that a sample doubles in linear dimension), or inverses of these (i.e., an inverse change in size of 2 means that a sample halves in linear dimensions).
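The relative change in size described above can be computed directly from a linear dimension measured before and after gel formation. The sketch below treats expansion and shrinkage symmetrically, so that a change of 2 and an inverse change of 2 are both compared against the same limit; the function names and the example measurements are hypothetical.

```python
def linear_size_change(before_um: float, after_um: float) -> float:
    """Relative change in linear size: 2.0 means the sample doubled."""
    if before_um <= 0 or after_um <= 0:
        raise ValueError("dimensions must be positive")
    return after_um / before_um


def within_distortion_limit(before_um: float, after_um: float,
                            limit: float) -> bool:
    """True if the change in size, or its inverse, is less than `limit`."""
    change = linear_size_change(before_um, after_um)
    magnitude = change if change >= 1 else 1 / change
    return magnitude < limit


print(linear_size_change(10, 20))            # sample doubles in linear dimension
print(within_distortion_limit(10, 12, 1.3))  # 1.2-fold expansion vs. limit 1.3
print(within_distortion_limit(20, 10, 1.5))  # halving (inverse change of 2) vs. 1.5
```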
  • the gel may be prepared at relatively low temperatures, e.g., less than 25 °C, less than 20 °C, less than 15 °C, less than 10 °C, less than 8 °C, less than 6 °C, less than 4 °C, less than 2 °C, less than 0 °C, etc.
  • the gel may be prepared using relatively low concentrations of initiator, e.g., less than 1% vol/vol, less than 0.5% vol/vol, less than 0.2% vol/vol, or less than 0.1% vol/vol.
  • Non-limiting examples of suitable polymers include polyacrylamide and agarose.
  • the polymer is a gel or a hydrogel.
  • a variety of polymers could be used in various embodiments that involve chemical cross links between gel subunits, including but not limited to acrylic acid, acrylamide, ethylene glycol diacrylate, ethylene glycol dimethacrylate, poly(ethylene glycol dimethacrylate); and/or hydrophobic or hydrogen bonding interactions, such as poly(N-isopropyl acrylamide), methyl cellulose, (ethylene oxide)-(propylene oxide)-(ethylene oxide) terpolymers, sodium alginate, poly(vinyl alcohol), alginate, chitosan, gum Arabic, gelatin, and agarose.
  • anchor probes may be used during the polymerization process.
  • the anchor probes may include a portion that is able to polymerize with the polymer during the polymerization process, and is able to immobilize a target, e.g., chemically and/or physically.
  • the anchor probe may include an acrydite portion that can polymerize and become incorporated into the polymer.
  • the anchor probe may also contain a portion that can interact with and bind to nucleic acid molecules, or other molecules in which immobilization is desired, e.g., proteins or lipids, other desired targets, etc.
  • the immobilization may be covalent or non-covalent.
  • the anchor probe may comprise a nucleic acid comprising an acrydite portion (e.g., at the 5’ end, the 3’ end, an internal base, etc.) and a nucleic acid sequence substantially complementary to at least a portion of the target nucleic acid.
  • the nucleic acid may be complementary to at least 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, or 20 or more nucleotides of the nucleic acid.
  • the complementarity may be exact (Watson-Crick complementarity), or there may be 1, 2, or more mismatches.
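The notion of exact complementarity versus complementarity with a small number of mismatches can be made concrete with a short sketch. It assumes that both the probe and the target site are written 5' to 3' and have the same length, so the probe is compared against the reverse complement of the target; the sequences and function names are hypothetical.

```python
_COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA or RNA sequence, returned as DNA."""
    dna = seq.upper().replace("U", "T")
    return "".join(_COMPLEMENT[base] for base in reversed(dna))


def count_mismatches(probe: str, target_site: str) -> int:
    """Number of probe positions that do not pair with the target site."""
    if len(probe) != len(target_site):
        raise ValueError("probe and target site must have equal length")
    expected = reverse_complement(target_site)
    observed = probe.upper().replace("U", "T")
    return sum(p != e for p, e in zip(observed, expected))


target = "AUGGCCUA"                          # hypothetical RNA target site
print(reverse_complement(target))            # exactly complementary probe
print(count_mismatches("TAGGCCAT", target))  # exact complementarity
print(count_mismatches("TAGGCGAT", target))  # one mismatch
```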
  • the anchor probe can be configured to immobilize mRNA, e.g., in the case of transcriptome analysis.
  • the anchor probe may contain a plurality of thymine nucleotides, e.g., sequentially, for binding to the poly-A tail of an mRNA.
  • the anchor probe can have at least 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, or 20 or more consecutive thymine nucleotides (SEQ ID NO: 2) (e.g., a poly-dT portion) within the anchor probe.
  • at least some of the thymine nucleotides may be “locked” thymine nucleotides. These may comprise at least 20%, at least 30%, at least 40%, at least 50%, at least 60%, at least 70%, or at least 80% of these thymine nucleotides.
  • the locked and non-locked nucleotides may alternate. Such locked thymine nucleotides may be useful, for example, to stabilize the hybridization of the poly-A tails of the mRNA with the anchor probe.
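The alternating arrangement of locked and non-locked thymine nucleotides can be sketched as a list of base tokens, which also makes it easy to check the locked fraction against the percentages given above. The "+T" token for a locked thymine is a notation chosen for this illustration, and the 15-nucleotide length is a hypothetical example rather than a value from the disclosure.

```python
from typing import List


def alternating_poly_dt(length: int) -> List[str]:
    """Poly-dT stretch in which every second thymine is locked ("+T")."""
    if length < 1:
        raise ValueError("length must be at least 1")
    return ["T" if i % 2 == 0 else "+T" for i in range(length)]


def locked_fraction(bases: List[str]) -> float:
    """Fraction of bases in the stretch that are locked."""
    return sum(base.startswith("+") for base in bases) / len(bases)


stretch = alternating_poly_dt(15)
print(" ".join(stretch))
print(locked_fraction(stretch))
```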
  • nucleic acids such as DNA or RNA may be immobilized by covalent bonding.
  • an alkylating agent may be used that covalently binds to RNA or DNA and contains a second chemical moiety that can be incorporated into the polyacrylamide as it is polymerized.
  • the terminal ribose in an RNA molecule may be oxidized using sodium periodate (or another oxidizing agent) to produce an aldehyde, which may be crosslinked to acrylamide, or other polymer or gel.
  • chemical agents that are able to modify bases may be used, such as aldehydes, e.g., paraformaldehyde or glutaraldehyde, alkylating agents, or succinimidyl-containing groups; chemical agents that modify the terminal phosphate, such as carbodiimides, e.g., EDC (1-ethyl-3-(3-dimethylaminopropyl)carbodiimide); chemical agents that modify internal sugars, such as p-maleimidophenyl isocyanate; or chemical agents that modify terminal sugars, such as sodium periodate.
  • these chemical agents can carry a second chemical moiety that can then be directly cross-linked to the gel or polymer, and/or which can be further modified with a compound that can be directly cross linked to the gel or polymer.
  • a nucleic acid may be immobilized using anchor probes having substantially complementary portions to the DNA or RNA. There may be 5, 6, 7, 8, 9, 10, 12, 13, 14, 15, 20, 25, 30, 35, 40, 45, 50 or more complementary nucleotides between the anchor probe and the nucleic acid.
  • the nucleic acids may be physically tangled within the polymer or gel, e.g., due to their length, and, thus, unable to diffuse from their original location within the gel.
  • Similar anchor probes may be used to immobilize other components to a polymer or gel, in other embodiments.
  • an anchor probe may comprise an antibody able to specifically bind to a suitable target, e.g., another protein, a lipid, a carbohydrate, a virus, etc., and an acrydite moiety that can become incorporated within a polymer or gel.
  • the embedding of the sample within the matrix and the immobilization of nucleic acids (or other desired targets) may be performed in any suitable order in various embodiments. For instance, immobilization may occur before, during, or after embedding of the sample. In some cases, the target may be chemically modified or reacted to cross-link to the gel or polymer before or during formation of the gel or polymer.
  • Such clearance may include removal of the components, and/or degradation of the components (e.g., to smaller components, components that are not fluorescent, etc.) that are not the desired target. In some cases, at least 50%, at least 60%, at least 70%, at least 80%, or at least 90% of the undesired components within the sample may be cleared. Multiple clearance steps can also be performed in certain embodiments, e.g., to remove various undesired components. As discussed, it is believed that the removal of such components may decrease background during analysis (for example, by decreasing background and/or off-target binding), while desired components (such as nucleic acids) can be immobilized and thus not cleared.
  • proteins may be cleared from the sample using enzymes, denaturants, chelating agents, chemical agents, and the like, which may break down the proteins into smaller components and/or amino acids. These smaller components may be easier to remove physically, and/or may be sufficiently small or inert such that they do not significantly affect the background.
  • lipids may be cleared from the sample using surfactants or the like. In some cases, one or more of these are used, e.g., simultaneously or sequentially.
  • suitable enzymes include proteinases such as proteinase K, proteases or peptidases, or digestive enzymes such as trypsin, pepsin, or chymotrypsin.
  • Non-limiting examples of suitable denaturants include guanidine HCl, acetone, acetic acid, urea, or lithium perchlorate.
  • Non-limiting examples of chemical agents able to denature proteins include solvents such as phenol, chloroform, guanidinium isothiocyanate, urea, formamide, etc.
  • Non-limiting examples of surfactants include Triton X-100 (polyethylene glycol p-(1,1,3,3-tetramethylbutyl)-phenyl ether), SDS (sodium dodecyl sulfate), Igepal CA-630, or poloxamers.
  • Non-limiting examples of chelating agents include ethylenediaminetetraacetic acid (EDTA), citrate, or polyaspartic acid.
  • Tris, or tris(hydroxymethyl)aminomethane: a buffer solution
  • Non-limiting examples of DNA enzymes that may be used to remove DNA include DNase I, dsDNase, a variety of restriction enzymes, etc.
  • Non-limiting examples of techniques to clear RNA include RNA enzymes such as RNase A, RNase T, or RNase H, or chemical agents, e.g., via alkaline hydrolysis (for example, by increasing the pH to greater than 10).
  • Non-limiting examples of systems to remove sugars or extracellular matrix include enzymes such as chitinase, heparinases, or other glycosylases.
  • Non-limiting examples of systems to remove lipids include enzymes such as lipidases, chemical agents such as alcohols (e.g., methanol or ethanol), or detergents such as Triton X-100 or sodium dodecyl sulfate. Many of these are readily available commercially. In this way, the background of the sample may be removed, which may facilitate analysis of the nucleic acid probes or other desired targets, e.g., using fluorescence microscopy, or other techniques as discussed herein. As mentioned, in various embodiments, various targets (e.g., nucleic acids, certain proteins, lipids, viruses, or the like) may be immobilized, while other non-targets may be cleared using suitable agents or enzymes. As a non-limiting example, if a protein (such as an antibody) is immobilized, then RNA enzymes, DNA enzymes, systems to remove lipids, sugars, etc. may be used.
  • the nucleic acids to be determined may be, for example, DNA, RNA, epigenetic elements, or other nucleic acids that are present within a cell (or other sample).
  • the nucleic acids may be endogenous to the cell, or added to the cell.
  • the nucleic acid may be viral, or artificially created.
  • the nucleic acid to be determined may be expressed by the cell.
  • the nucleic acid is RNA in some embodiments.
  • the RNA may be coding and/or non-coding RNA.
  • Non-limiting examples of RNA that may be studied within the cell include mRNA, siRNA, rRNA, miRNA, tRNA, lncRNA, snoRNAs, snRNAs, exRNAs, piRNAs, or the like.
  • RNA present within a cell may be determined so as to produce a partial or complete transcriptome of the cell.
  • at least 4 types of mRNAs are determined within a cell, and in some cases, at least 3, at least 4, at least 7, at least 8, at least 12, at least 14, at least 15, at least 16, at least 22, at least 30, at least 31, at least 32, at least 50, at least 63, at least 64, at least 72, at least 75, at least 100, at least 127, at least 128, at least 140, at least 255, at least 256, at least 500, at least 1,000, at least 1,500, at least 2,000, at least 2,500, at least 3,000, at least 4,000, at least 5,000, at least 7,500, at least 10,000, at least 12,000, at least 15,000, at least 20,000, at least 25,000, at least 30,000, at least 40,000, at least 50,000, at least 75,000, or at
  • the transcriptome of a cell may be determined. It should be understood that the transcriptome generally encompasses all RNA molecules produced within a cell, not just mRNA. Thus, for instance, the transcriptome may also include rRNA, tRNA, siRNA, etc. In some embodiments, at least 5%, at least 10%, at least 15%, at least 20%, at least 25%, at least 30%, at least 40%, at least 50%, at least 60%, at least 70%, at least 80%, at least 90%, or 100% of the transcriptome of a cell may be determined.
  • the determination of one or more nucleic acids within the cell or other sample may be qualitative and/or quantitative. In addition, the determination may also be spatial, e.g., the position of the nucleic acid within the cell or other sample may be determined in two or three dimensions. In some embodiments, the positions, number, and/or concentrations of nucleic acids within the cell (or other sample) may be determined. In some cases, a significant portion of the genome of a cell may be determined. The determined genomic segments may be continuous or interspersed on the genome.
  • At least 4 genomic segments are determined within a cell, and in some cases, at least 3, at least 4, at least 7, at least 8, at least 12, at least 14, at least 15, at least 16, at least 22, at least 30, at least 31, at least 32, at least 50, at least 63, at least 64, at least 72, at least 75, at least 100, at least 127, at least 128, at least 140, at least 255, at least 256, at least 500, at least 1,000, at least 1,500, at least 2,000, at least 2,500, at least 3,000, at least 4,000, at least 5,000, at least 7,500, at least 10,000, at least 12,000, at least 15,000, at least 20,000, at least 25,000, at least 30,000, at least 40,000, at least 50,000, at least 75,000, or at least 100,000 genomic segments may be determined within a cell.
  • the entire genome of a cell may be determined. It should be understood that the genome generally encompasses all DNA molecules present within a cell, not just chromosomal DNA. Thus, for instance, the genome may also include, in some cases, mitochondrial DNA, chloroplast DNA, plasmid DNA, etc. In some embodiments, at least about 5%, at least about 10%, at least about 15%, at least about 20%, at least about 25%, at least about 30%, at least about 40%, at least about 50%, at least about 60%, at least about 70%, at least about 80%, at least about 90%, or 100% of the genome of a cell may be determined.
  • the epigenome of a cell may be determined.
  • the epigenome may encompass chromatin with chemically modified DNA or chemically modified histones, or chromatin with other DNA-binding proteins. In some embodiments, at least about 5%, at least about 10%, at least about 15%, at least about 20%, at least about 25%, at least about 30%, at least about 40%, at least about 50%, at least about 60%, at least about 70%, at least about 80%, at least about 90%, or 100% of the epigenome of a cell may be determined.
  • certain aspects are directed to systems and methods that allow the copy numbers and spatial localizations of thousands of RNA species, genomic segments, and/or epigenetic elements to be determined in single cells.
  • Some of these techniques are called Multiplexed Error-Robust Fluorescence in situ Hybridization or “MERFISH,” e.g., as would be known to those of ordinary skill in the art. See, e.g., U.S. Pat. No. 11,098,303, incorporated herein by reference in its entirety. In some cases, error correction can also be used.
  • codewords may be based on the binding of the plurality of nucleic acid probes, and in some cases, the codewords may define an error-correcting code to reduce or prevent misidentification of the nucleic acids.
  • a relatively large number of different targets may be identified using a relatively small number of labels, e.g., by using various combinatorial approaches.
  • Using error-robust encoding schemes may allow the imaging of hundreds to thousands of RNA species, genomic segments, and/or epigenetic elements in a sample.
  • nucleic acid probes may be used to determine one or more nucleic acids within a cell or other sample.
  • the probes may comprise nucleic acids (or entities that can hybridize to a nucleic acid, e.g., specifically) such as DNA, RNA, LNA (locked nucleic acids), PNA (peptide nucleic acids), or combinations thereof.
  • additional components may also be present within the nucleic acid probes, e.g., as discussed below. Any suitable method may be used to introduce nucleic acid probes into a cell.
  • the cell is fixed prior to introducing the nucleic acid probes, e.g., to preserve the positions of the nucleic acids within the cell.
  • Techniques for fixing cells are known to those of ordinary skill in the art.
  • a cell may be fixed using chemicals such as formaldehyde, paraformaldehyde, glutaraldehyde, ethanol, methanol, acetone, acetic acid, or the like.
  • a cell may be fixed using Hepes-glutamic acid buffer-mediated organic solvent (HOPE).
  • the nucleic acid probes may be introduced into the cell (or other sample) using any suitable method.
  • the cell may be sufficiently permeabilized such that the nucleic acid probes may be introduced into the cell by flowing a fluid containing the nucleic acid probes around the cells.
  • the cells may be sufficiently permeabilized as part of a fixation process; in other embodiments, cells may be permeabilized by exposure to certain chemicals such as ethanol, methanol, Triton, or the like.
  • techniques such as electroporation or microinjection may be used to introduce nucleic acid probes into a cell or other sample.
  • nucleic acid probes that are introduced into a cell (or other sample).
  • the probes may comprise any of a variety of entities that can hybridize to a nucleic acid, typically by Watson-Crick base pairing, such as DNA, RNA, LNA, PNA, etc., depending on the application.
  • the nucleic acid probe typically contains a target sequence that is able to bind to at least a portion of a target nucleic acid, in some cases specifically.
  • the target sequence may be able to bind to a specific target nucleic acid (e.g., an mRNA, or other nucleic acids as discussed herein).
  • the nucleic acid probes may be determined using signaling entities (e.g., as discussed below), and/or by using secondary nucleic acid probes able to bind to the nucleic acid probes (i.e., to primary nucleic acid probes). The determination of such nucleic acid probes is discussed in detail below.
  • more than one type of (primary) nucleic acid probe may be applied to a sample, e.g., simultaneously.
  • the target sequence may be positioned anywhere within the nucleic acid probe (or primary nucleic acid probe or encoding nucleic acid probe).
  • the target sequence may contain a region that is substantially complementary to a portion of a target nucleic acid.
  • the portions may be at least 50%, at least 60%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 92%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or 100% complementary.
  • the target sequence may be at least 5, at least 10, at least 15, at least 20, at least 25, at least 30, at least 35, at least 40, at least 50, at least 60, at least 65, at least 75, at least 100, at least 125, at least 150, at least 175, at least 200, at least 250, at least 300, at least 350, at least 400, or at least 450 nucleotides in length.
  • the target sequence may be no more than 500, no more than 450, no more than 400, no more than 350, no more than 300, no more than 250, no more than 200, no more than 175, no more than 150, no more than 125, no more than 100, no more than 75, no more than 65, no more than 60, no more than 55, no more than 50, no more than 45, no more than 40, no more than 35, no more than 30, no more than 20, or no more than 10 nucleotides in length.
  • the target sequence may have a length of between 10 and 30 nucleotides, between 20 and 40 nucleotides, between 5 and 50 nucleotides, between 10 and 200 nucleotides, or between 25 and 35 nucleotides, between 10 and 300 nucleotides, etc.
  • complementarity is determined on the basis of Watson-Crick nucleotide base pairing.
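  • As a minimal sketch of how a percent-complementarity figure of the kind listed above could be computed under Watson-Crick pairing: the function names are hypothetical, and the sketch assumes DNA sequences of equal length written 5' to 3' and compared without gaps. It illustrates the arithmetic only and is not a probe-design procedure from this disclosure.

```python
# Illustrative sketch only: names and the ungapped, equal-length comparison
# are assumptions, not part of the disclosure.
WATSON_CRICK = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq: str) -> str:
    """Return the Watson-Crick reverse complement of a DNA sequence."""
    return "".join(WATSON_CRICK[base] for base in reversed(seq.upper()))


def percent_complementarity(probe: str, target_region: str) -> float:
    """Percentage of positions at which the probe pairs with the target region.

    Both sequences are written 5'->3'. The probe is compared position by
    position against the reverse complement of the target region.
    """
    if not probe or len(probe) != len(target_region):
        raise ValueError("sequences must be non-empty and of equal length")
    ideal = reverse_complement(target_region)
    matches = sum(p == q for p, q in zip(probe.upper(), ideal))
    return 100.0 * matches / len(probe)
```

  • For example, a 4-nucleotide probe with one mismatched position would score 75% under this definition.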
  • the target sequence of a (primary) nucleic acid probe may be determined with reference to a target nucleic acid suspected of being present within a cell or other sample.
  • a target nucleic acid to a protein may be determined using the protein’s sequence, by determining the nucleic acids that are expressed to form the protein.
  • only a portion of the nucleic acids encoding the protein are used, e.g., having the lengths as discussed above.
  • more than one target sequence that can be used to identify a particular target may be used. For instance, multiple probes can be used, sequentially and/or simultaneously, that can bind to or hybridize to different regions of the same target.
  • Hybridization typically refers to an annealing process by which complementary single-stranded nucleic acids associate through Watson-Crick nucleotide base pairing (e.g., hydrogen bonding, guanine-cytosine and adenine-thymine) to form double-stranded nucleic acid.
  • a nucleic acid probe such as a primary nucleic acid probe, may also comprise one or more “read” sequences. However, it should be understood that read sequences are not necessary in all cases.
  • the nucleic acid probe may comprise 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16 or more, 20 or more, 32 or more, 40 or more, 50 or more, 64 or more, 75 or more, 100 or more, 128 or more read sequences.
  • the read sequences may be positioned anywhere within the nucleic acid probe. If more than one read sequence is present, the read sequences may be positioned next to each other, and/or interspersed with other sequences.
  • the read sequences may be of any length. If more than one read sequence is used, the read sequences may independently have the same or different lengths. For instance, the read sequence may be at least 5, at least 10, at least 15, at least 20, at least 25, at least 30, at least 35, at least 40, at least 50, at least 60, at least 65, at least 75, at least 100, at least 125, at least 150, at least 175, at least 200, at least 250, at least 300, at least 350, at least 400, or at least 450 nucleotides in length.
  • the read sequence may be no more than 500, no more than 450, no more than 400, no more than 350, no more than 300, no more than 250, no more than 200, no more than 175, no more than 150, no more than 125, no more than 100, no more than 75, no more than 65, no more than 60, no more than 55, no more than 50, no more than 45, no more than 40, no more than 35, no more than 30, no more than 20, or no more than 10 nucleotides in length.
  • the read sequence may have a length of between 10 and 30 nucleotides, between 20 and 40 nucleotides, between 5 and 50 nucleotides, between 10 and 200 nucleotides, or between 25 and 35 nucleotides, between 10 and 300 nucleotides, etc.
  • the read sequence may be arbitrary or random in some embodiments.
  • the read sequences are chosen so as to reduce or minimize homology with other components of the cell or other sample, e.g., such that the read sequences do not themselves bind to or hybridize with other nucleic acids suspected of being within the cell or other sample.
  • the homology may be less than 10%, less than 8%, less than 7%, less than 6%, less than 5%, less than 4%, less than 3%, less than 2%, or less than 1%.
  • the basepairs are sequential.
  • a population of nucleic acid probes may contain a certain number of read sequences, which may be less than the number of targets of the nucleic acid probes in some cases.
  • Those of ordinary skill in the art will be aware that if there is one signaling entity and n read sequences, then in general 2^n - 1 different nucleic acid targets may be uniquely identified. However, not all possible combinations need be used.
  • a population of nucleic acid probes may target 12 different nucleic acid sequences, yet contain no more than 8 read sequences.
  • a population of nucleic acids may target 140 different nucleic acid species, yet contain no more than 16 read sequences.
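  • The 2^n - 1 bound and the two numerical examples above (12 targets with no more than 8 read sequences, 140 targets with no more than 16) can be checked with a short sketch. The function names are hypothetical. The bound counts every non-empty subset of the n read sequences, assuming one signaling entity.

```python
def max_targets(num_read_sequences: int) -> int:
    """Upper bound 2^n - 1: one target per non-empty subset of n read sequences."""
    return 2 ** num_read_sequences - 1


def read_sequences_suffice(num_targets: int, num_read_sequences: int) -> bool:
    """True if the requested number of targets fits within the 2^n - 1 bound."""
    return num_targets <= max_targets(num_read_sequences)
```

  • Under these assumptions, 8 read sequences allow up to 255 targets and 16 allow up to 65,535, so both examples use only a small fraction of the available combinations.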
  • each probe may contain 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, etc. or more read sequences.
  • a population of nucleic acid probes may each contain the same number of read sequences, although in other cases, there may be different numbers of read sequences present on the various probes.
  • a first nucleic acid probe may contain a first target sequence, a first read sequence, and a second read sequence
  • a second, different nucleic acid probe may contain a second target sequence, the same first read sequence, but a third read sequence instead of the second read sequence.
  • Such probes may thereby be distinguished by determining the various read sequences present or associated with a given probe or location, as discussed herein.
  • nucleic acid probes (and their corresponding, complementary sites on the encoding probes), in certain embodiments, may be made using only 2 or only 3 of the 4 bases, such as leaving out all the “G”s or leaving out all of the “C”s within the probe. Sequences lacking either “G”s or “C”s may form very little secondary structure in certain embodiments, and can contribute to more uniform, faster hybridization.
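  • A minimal sketch of drawing a candidate sequence from a three-letter alphabet, as in the “leave out all the G's” example above. The function name, the default length, and the use of uniform random sampling are assumptions for illustration. A real design would also screen candidates for homology and other properties discussed herein.

```python
import random


def three_base_sequence(length: int = 20, omit: str = "G", seed=None) -> str:
    """Draw a random DNA sequence that never uses the omitted base."""
    if len(omit) != 1 or omit not in "ACGT":
        raise ValueError("omit must be one of A, C, G, T")
    alphabet = [base for base in "ACGT" if base != omit]
    rng = random.Random(seed)  # a seed makes the draw reproducible
    return "".join(rng.choice(alphabet) for _ in range(length))
```

  • Note that a sequence lacking “G” has a complementary site lacking “C”, so both strands of the pair remain three-letter sequences.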
  • the nucleic acid probe may contain a signaling entity. It should be understood that signaling entities are not required in all cases, however; for instance, the nucleic acid probe may be determined using secondary nucleic acid probes in some embodiments, as is discussed in additional detail below.
  • primer sequences may be present, e.g., to allow for enzymatic amplification of probes.
  • primer sequences suitable for applications such as amplification (e.g., using PCR or other suitable techniques). Many such primer sequences are available commercially.
  • sequences that may be present within a primary nucleic acid probe include, but are not limited to promoter sequences, operons, identification sequences, nonsense sequences, or the like.
  • a primer is a single-stranded or partially double-stranded nucleic acid (e.g., DNA) that serves as a starting point for nucleic acid synthesis, allowing polymerase enzymes such as nucleic acid polymerase to extend the primer and replicate the complementary strand.
  • a primer is (e.g., is designed to be) complementary to and to hybridize to a target nucleic acid.
  • a primer is a synthetic primer.
  • a primer is a non-naturally-occurring primer.
  • a primer typically has a length of 10 to 50 nucleotides.
  • a primer may have a length of 10 to 40, 10 to 30, 10 to 20, 25 to 50, 15 to 40, 15 to 30, 20 to 50, 20 to 40, or 20 to 30 nucleotides. In some embodiments, a primer has a length of 18 to 24 nucleotides.
  • the components of the nucleic acid probe may be arranged in any suitable order.
  • the components may be arranged in a nucleic acid probe as: primer — read sequences — targeting sequence — read sequences — reverse primer.
  • the “read sequences” in this structure may each contain any number (including 0) of read sequences, so long as at least one read sequence is present in the probe.
  • Non-limiting example structures include primer — targeting sequence — read sequences — reverse primer, primer — read sequences — targeting sequence — reverse primer, targeting sequence — primer — targeting sequence — read sequences — reverse primer, targeting sequence — primer — read sequences — targeting sequence — reverse primer, primer — target sequence — read sequences — targeting sequence — reverse primer, targeting sequence — primer — read sequence — reverse primer, targeting sequence — primer — read sequence — reverse primer, targeting sequence — read sequence — primer, read sequence — targeting sequence — primer, read sequence — primer — targeting sequence — reverse primer, etc.
  • the reverse primer is optional in some embodiments, including in all of the above-described examples.
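  • The layout primer, read sequences, targeting sequence, read sequences, reverse primer can be sketched as a simple concatenation. The class and field names are hypothetical, and the short sequences in the usage note are placeholders rather than real primers or targets. The sketch enforces the condition stated above for this structure, namely that at least one read sequence is present, and treats the reverse primer as optional.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ProbeLayout:
    """Hypothetical container for the parts of one encoding probe."""
    targeting_sequence: str
    left_reads: List[str] = field(default_factory=list)   # may be empty
    right_reads: List[str] = field(default_factory=list)  # may be empty
    primer: str = ""
    reverse_primer: Optional[str] = None  # optional, per the text above

    def assemble(self) -> str:
        """Concatenate the parts in the order given in the text (5'->3')."""
        if not (self.left_reads or self.right_reads):
            raise ValueError("this layout needs at least one read sequence")
        parts = [self.primer, *self.left_reads,
                 self.targeting_sequence, *self.right_reads]
        if self.reverse_primer:
            parts.append(self.reverse_primer)
        return "".join(parts)
```

  • For instance, `ProbeLayout(targeting_sequence="TTTT", left_reads=["AAA"], right_reads=["CCC"], primer="GG", reverse_primer="GG").assemble()` joins the five parts in order.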
  • the nucleic acid probes may be directly determined by determining signaling entities (if present), and/or the nucleic acid probes may be determined by using one or more secondary nucleic acid probes, in accordance with certain embodiments.
  • the determination may be spatial, e.g., in two or three dimensions.
  • the determination may be quantitative, e.g., the amount or concentration of a primary nucleic acid probe (and of a target nucleic acid) may be determined.
  • the secondary probes may comprise any of a variety of entities able to hybridize to a nucleic acid, e.g., DNA, RNA, LNA, and/or PNA, etc., depending on the application.
  • a secondary nucleic acid probe may contain a recognition sequence able to bind to or hybridize with a read sequence of a primary nucleic acid probe. In some cases, the binding is specific, or the binding may be such that a recognition sequence preferentially binds to or hybridizes with only one of the read sequences that are present.
  • the secondary nucleic acid probe may also contain one or more signaling entities. If more than one secondary nucleic acid probe is used, the signaling entities may be the same or different.
  • the recognition sequences may be of any length. If more than one recognition sequence is used, the recognition sequences may independently have the same or different lengths. For instance, the recognition sequence may be at least 5, at least 10, at least 15, at least 20, at least 25, at least 30, at least 35, at least 40, or at least 50 nucleotides in length. In some cases, the recognition sequence may be no more than 75, no more than 65, no more than 60, no more than 55, no more than 50, no more than 45, no more than 40, no more than 35, no more than 30, no more than 20, or no more than 10 nucleotides in length.
  • the recognition sequence may have a length of between 10 and 30, between 20 and 40, or between 25 and 35 nucleotides, etc. In one embodiment, the recognition sequence is of the same length as the read sequence. In addition, in some cases, the recognition sequence may be at least 50%, at least 60%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 92%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or 100% complementary to a read sequence of the primary nucleic acid probe.
  • nucleic acid probes are used that contain various “read sequences.”
  • a population of primary nucleic acid probes may contain certain “read sequences” which can bind certain of the secondary nucleic acid probes, and the locations of the primary nucleic acid probes are determined within the sample using secondary nucleic acid probes, e.g., which comprise a signaling entity.
  • a population of read sequences may be combined in various combinations to produce different nucleic acid probes, e.g., such that a relatively small number of read sequences may be used to produce a relatively large number of different nucleic acid probes.
  • a population of primary nucleic acid probes may each contain a certain number of read sequences, some of which are shared between different primary nucleic acid probes such that the total population of primary nucleic acid probes may contain a certain number of read sequences.
  • a population of nucleic acid probes may have any suitable number of read sequences.
  • a population of primary nucleic acid probes may have 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20 etc. read sequences. More than 20 are also possible in some embodiments.
  • a population of nucleic acid probes may, in total, have 1 or more, 2 or more, 3 or more, 4 or more, 5 or more, 6 or more, 7 or more, 8 or more, 9 or more, 10 or more, 11 or more, 12 or more, 13 or more, 14 or more, 15 or more, 16 or more, 20 or more, 24 or more, 32 or more, 40 or more, 50 or more, 60 or more, 64 or more, 100 or more, 128 or more, etc. of possible read sequences present, although some or all of the probes may each contain more than one read sequence, as discussed herein.
  • the population of nucleic acid probes may have no more than 100, no more than 80, no more than 64, no more than 60, no more than 50, no more than 40, no more than 32, no more than 24, no more than 20, no more than 16, no more than 15, no more than 14, no more than 13, no more than 12, no more than 11, no more than 10, no more than 9, no more than 8, no more than 7, no more than 6, no more than 5, no more than 4, no more than 3, or no more than two read sequences present. Combinations of any of these are also possible, e.g., a population of nucleic acid probes may comprise between 10 and 15 read sequences in total.
  • the total number of read sequences within the population may be no greater than 4. It should be understood that although 4 read sequences are used in this example for ease of explanation, in other embodiments, larger numbers of nucleic acid probes may be realized, for example, using 5, 8, 10, 16, 32, etc. or more read sequences, or any other suitable number of read sequences described herein, depending on the application. Referring now to Fig.
  • each of the primary nucleic acid probes contains two different read sequences, then by using 4 such read sequences (A, B, C, and D), up to 6 probes may be separately identified.
  • the ordering of read sequences on a nucleic acid probe is not essential, i.e., “AB” and “BA” may be treated as being synonymous (although in other embodiments, the ordering of read sequences may be essential and “AB” and “BA” may not necessarily be synonymous).
  • up to 10 probes may be separately identified, as is shown in Fig. 13B.
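  • The count of 6 for four read sequences taken two at a time, with “AB” and “BA” treated as synonymous, can be reproduced by enumerating unordered pairs. The function name is hypothetical. The sketch covers only the four-read-sequence case; the conditions that give 10 are not stated in this passage and are not modeled.

```python
from itertools import combinations


def unordered_probe_labels(read_sequences, reads_per_probe=2):
    """List every unordered choice of distinct read sequences per probe."""
    return ["".join(combo)
            for combo in combinations(read_sequences, reads_per_probe)]


labels = unordered_probe_labels(["A", "B", "C", "D"])
print(labels)       # ['AB', 'AC', 'AD', 'BC', 'BD', 'CD']
print(len(labels))  # 6
```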
  • the read sequences and/or the pattern of binding of nucleic acid probes within a sample may be used to define an error-detecting and/or an error-correcting code, for example, to reduce or prevent misidentification or errors of the nucleic acids.
  • the codeword may be subjected to error detection and/or correction.
  • the codewords may be organized such that, if no match is found for a given set of read sequences or binding pattern of nucleic acid probes, then the result may be identified as an error, and optionally, error correction may be applied to determine the correct target for the nucleic acid probes.
  • the codewords may have fewer “letters” or positions than the total number of nucleic acids encoded by the codewords, e.g., where each codeword encodes a different nucleic acid.
  • Such error-detecting and/or error-correcting codes may take a variety of forms.
  • a variety of such codes have previously been developed in other contexts such as the telecommunications industry, such as Golay codes or Hamming codes.
  • the read sequences or binding patterns of the nucleic acid probes are assigned such that not every possible combination is assigned.
  • a primary nucleic acid probe contains 2 read sequences
  • up to 6 primary nucleic acid probes could be identified; but the number of primary nucleic acid probes used may be less than 6.
  • different probes may be produced, but the number of primary nucleic acid probes that are used may be any number more or less than In addition, these may be randomly assigned, or assigned in specific ways to increase the ability to detect and/or correct errors.
  • the number of rounds may be arbitrarily chosen. If in each round, each target can give two possible outcomes, such as being detected or not being detected, up to 2^n different targets may be possible for n rounds of probes, but the number of nucleic acid targets that are actually used may be any number less than 2^n. As another example, if in each round, each target can give more than two possible outcomes, such as being detected in different color channels, more than 2^n (e.g., 3^n, 4^n, ...) different targets may be possible for n rounds of probes. In some cases, the number of nucleic acid targets that are actually used may be any number less than this number. In addition, these may be randomly assigned, or assigned in specific ways to increase the ability to detect and/or correct errors.
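  • The k^n scaling described above (k possible outcomes per round, n rounds) can be illustrated by enumerating every outcome pattern. The function name and the outcome labels are hypothetical. The enumeration includes the pattern in which the target is never detected, which is counted in k^n but may not be usable in practice.

```python
from itertools import product


def enumerate_barcodes(num_rounds, outcomes=(0, 1)):
    """All possible per-round outcome patterns for one target."""
    return list(product(outcomes, repeat=num_rounds))


binary = enumerate_barcodes(3)                           # 2^3 patterns
ternary = enumerate_barcodes(3, ("off", "red", "green")) # 3^3 patterns
print(len(binary), len(ternary))  # 8 27
```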
  • the codewords or nucleic acid probes may be assigned within a code space such that the assignments are separated by a Hamming distance, which measures the number of incorrect “reads” in a given pattern that cause the nucleic acid probe to be misinterpreted as a different valid nucleic acid probe.
  • the Hamming distance may be at least 2, at least 3, at least 4, at least 5, at least 6, or the like.
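  • A minimal sketch of the Hamming distance between two binary codewords and of the minimum distance across a set of codewords. The function names are hypothetical, and codewords are represented as strings of “0” and “1” purely for illustration.

```python
from itertools import combinations


def hamming_distance(a: str, b: str) -> int:
    """Number of positions at which two equal-length codewords differ."""
    if len(a) != len(b):
        raise ValueError("codewords must have equal length")
    return sum(x != y for x, y in zip(a, b))


def minimum_distance(codebook) -> int:
    """Smallest pairwise Hamming distance within a set of codewords."""
    return min(hamming_distance(a, b) for a, b in combinations(codebook, 2))
```

  • A codebook whose minimum distance is 4, for example, requires at least four misread positions before one valid codeword can be mistaken for another.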
  • the assignments may be formed as a Hamming code, for instance, a Hamming(7, 4) code, a Hamming(15, 11) code, a Hamming(31, 26) code, a Hamming(63, 57) code, a Hamming(127, 120) code, etc.
  • the assignments may form a SECDED code, e.g., a SECDED(8,4) code, a SECDED(16,4) code, a SECDED(16, 11) code, a SECDED(22, 16) code, a SECDED(39, 32) code, a SECDED(72, 64) code, etc.
  • the assignments may form an extended binary Golay code, a perfect binary Golay code, or a ternary Golay code.
  • the assignments may represent a subset of the possible values taken from any of the codes described above.
  • a code with the same error correcting properties of the SECDED code may be formed by using only binary words that contain a fixed number of ‘1’ bits, such as 4, to encode the targets.
  • the assignments may represent a subset of the possible values taken from codes described above for the purpose of addressing asymmetric readout errors.
  • a code in which the number of ‘1’ bits is fixed for all used binary words may eliminate the biased measurement of words with different numbers of ‘1’s when the rate at which ‘0’ bits are measured as ‘1’s and the rate at which ‘1’ bits are measured as ‘0’s are different.
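  • One way to picture a code built only from binary words with a fixed number of ‘1’ bits is a greedy search that keeps a word only if it is far enough from every word already kept. This is an illustrative construction, not the construction used in this disclosure. The function names are hypothetical, and a greedy search is not guaranteed to find the largest possible codebook.

```python
from itertools import combinations


def constant_weight_words(length, weight):
    """Yield every binary word of the given length with exactly `weight` ones."""
    for ones in combinations(range(length), weight):
        yield "".join("1" if i in ones else "0" for i in range(length))


def greedy_codebook(length, weight, min_distance):
    """Greedily keep constant-weight words that stay >= min_distance apart."""
    chosen = []
    for word in constant_weight_words(length, weight):
        if all(sum(a != b for a, b in zip(word, other)) >= min_distance
               for other in chosen):
            chosen.append(word)
    return chosen
```

  • Because every kept word has the same number of ‘1’ bits, an asymmetry between 0-to-1 and 1-to-0 readout errors affects all codewords equally, which is the motivation given above.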
  • the codeword may be compared to the known nucleic acid codewords. If a match is found, then the nucleic acid target can be identified or determined. If no match is found, then an error in the reading of the codeword may be identified. In some cases, error correction can also be applied to determine the correct codeword, and thus resulting in the correct identity of the nucleic acid target. In some cases, the codewords may be selected such that, assuming that there is only one error present, only one possible correct codeword is available, and thus, only one correct identity of the nucleic acid target is possible.
  • this may also be generalized to larger codeword spacings or Hamming distances; for instance, the codewords may be selected such that if two, three, or four errors are present (or more in some cases), only one possible correct codeword is available, and thus, only one correct identity of the nucleic acid targets is possible.
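  • The match, detect, and correct logic described above can be sketched as a nearest-codeword lookup. The function name, the dictionary representation, and the gene labels in the usage note are hypothetical. The sketch assumes binary codewords of equal length and returns no call when the measured word is too far from every valid codeword or equally close to more than one.

```python
def decode(measured: str, codebook: dict, max_errors: int = 1):
    """Return (target, errors_corrected), or (None, None) if undecodable.

    `codebook` maps each valid codeword to the target it encodes.
    """
    if measured in codebook:  # exact match: no error detected
        return codebook[measured], 0
    candidates = []
    for word, target in codebook.items():
        if len(word) != len(measured):
            continue
        distance = sum(a != b for a, b in zip(measured, word))
        if distance <= max_errors:
            candidates.append((distance, target))
    if not candidates:  # error detected, but too many bits off to correct
        return None, None
    best = min(distance for distance, _ in candidates)
    nearest = [target for distance, target in candidates if distance == best]
    if len(nearest) == 1:  # unique nearest codeword: correct the error
        return nearest[0], best
    return None, None  # ambiguous between several codewords
```

  • With a codebook such as `{"11110000": "geneA", "00001111": "geneB", "11001100": "geneC"}`, the measured word `"11110001"` differs from the first codeword at one position and from the others at several, so it would be corrected to the first target.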
  • signaling entities are determined, e.g., to determine nucleic acid probes and/or to create codewords.
  • signaling entities within a sample may be determined, e.g., spatially, using a variety of techniques.
  • the signaling entities may be fluorescent, and techniques for determining fluorescence within a sample, such as fluorescence microscopy or confocal microscopy, may be used to spatially identify the positions of signaling entities within a cell.
  • the positions of entities within the sample may be determined in two or even three dimensions.
  • more than one signaling entity may be determined at a time (e.g., signaling entities with different colors or emissions), and/or sequentially.
  • the spatial positions of the entities may be determined at relatively high resolutions.
  • the positions may be determined at spatial resolutions of better than about 100 micrometers, better than about 30 micrometers, better than about 10 micrometers, better than about 3 micrometers, better than about 1 micrometer, better than about 800 nm, better than about 600 nm, better than about 500 nm, better than about 400 nm, better than about 300 nm, better than about 200 nm, better than about 100 nm, better than about 90 nm, better than about 80 nm, better than about 70 nm, better than about 60 nm, better than about 50 nm, better than about 40 nm, better than about 30 nm, better than about 20 nm, or better than about 10 nm, etc.
  • Non-limiting examples include STORM (stochastic optical reconstruction microscopy), STED (stimulated emission depletion microscopy), NSOM (Near-field Scanning Optical Microscopy), 4Pi microscopy, SIM (Structured Illumination Microscopy), SMI (Spatially Modulated Illumination) microscopy, RESOLFT (Reversible Saturable Optically Linear Fluorescence Transition Microscopy), GSD (Ground State Depletion Microscopy), SSIM (Saturated Structured-Illumination Microscopy), SPDM (Spectral Precision Distance Microscopy), Photo-Activated Localization Microscopy (PALM), Fluorescence Photoactivation Localization Microscopy (FPALM), LIMON (3D Light Microscopical Nanosizing Microscopy), Super-resolution optical fluctuation imaging (SOFI), or the like.
  • This example presents a method to allow three-dimensional (3D) imaging of thick tissue specimens by combination of confocal microscopy for optical sectioning and deep learning for increasing imaging speed and quality.
  • This example also presents a method to allow three-dimensional (3D) single-cell genome-scale imaging of nucleic acids such as RNA, DNA, and/or epigenetic elements in thick tissue specimens by integrating MERFISH with confocal microscopy and deep learning.
  • This example demonstrates 3D MERFISH on mouse brain tissue sections of up to 200 micrometer thickness with high detection efficiency and accuracy. It is expected that 3D MERFISH imaging of thick tissues, and 3D thick-tissue imaging in general, will facilitate a wide range of biological applications.
  • the approach in this example addresses the above-described challenges by using spinning disk confocal microscopy to eliminate out-of-focus fluorescence background, exploring deep learning to speed up the confocal imaging process, utilizing an index-matched objective to remove depth-induced spherical aberration, and optimizing the MERFISH protocol for thick-tissue imaging.
  • Although this example presents a demonstration for determining RNA species in thick tissue samples, it should be understood that the present disclosure is not so limited, and this is generalizable, for example, to measure protein and/or other nucleic acids such as DNA and epigenetic elements, e.g., as discussed herein.
  • This example uses spinning disk confocal microscopy to achieve optical sectioning and eliminate the out-of-focus fluorescence background, which significantly improved the signal-to-noise ratio (SNR) in the MERFISH images of thick tissue sections (Fig. 4).
  • the spinning-disk confocal detection geometry also cuts a large amount of in-focus fluorescence signals, and hence a substantially longer exposure time per imaging frame or higher illumination light intensity is required to achieve a high SNR for imaging individual RNA molecules in the sample. This led to either a drastic reduction in imaging speed or a substantial photobleaching of out-of-focus fluorophores before they were imaged.
  • Deep learning, which has been used to improve the quality of fluorescence microscopy images in a variety of applications, could potentially improve the SNR of confocal MERFISH images acquired at high speed or with low illumination intensity.
  • MERFISH was performed to measure 242 genes in mouse cortex sections, imaging the same fields of view (FOVs) with both long (1 sec) and short (0.1 sec) exposure times per frame to obtain high- and low-SNR image pairs, respectively.
  • The low-SNR MERFISH images acquired with the 0.1-sec exposure time led to a substantially lower (4x lower) detection efficiency compared to the high-SNR measurement acquired with the 1-sec exposure time (Figs. 1A and 1B).
  • A neural network was trained on a subset of the short- and long-exposure-time image pairs, and this model was subsequently used to improve the quality of the remaining images taken with the short exposure time.
  • This deep-learning approach indeed improved the SNR of the 0.1-sec images substantially (Fig. 1C).
  • The detection accuracy of MERFISH images acquired with the 0.1-sec exposure time was increased and became nearly identical to that measured with the 1-sec exposure time (Fig. 1D).
  • This advance allowed the acquisition of high-quality confocal MERFISH images at fast speed under low illumination intensity.
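  • As a non-limiting illustration, the comparison between short- and long-exposure measurements described above (median ratio of per-gene copy numbers and Pearson correlation coefficient r) could be computed as in the following minimal sketch. The function name `compare_gene_counts`, the synthetic counts, and the choice to correlate log10-transformed counts are assumptions made for illustration only and are not taken from this example.

```python
import numpy as np

def compare_gene_counts(counts_short, counts_long):
    """Return (median short/long count ratio, Pearson r of log10 counts)."""
    short = np.asarray(counts_short, dtype=float)
    long_ = np.asarray(counts_long, dtype=float)
    keep = (short > 0) & (long_ > 0)  # skip genes with zero counts in either run
    ratio = float(np.median(short[keep] / long_[keep]))
    r = float(np.corrcoef(np.log10(short[keep]), np.log10(long_[keep]))[0, 1])
    return ratio, r

# Synthetic per-gene counts (not measured data): the short exposure
# detects one quarter of the molecules seen with the long exposure.
long_counts = np.array([100.0, 400.0, 50.0, 1000.0, 250.0])
short_counts = long_counts / 4.0
ratio, r = compare_gene_counts(short_counts, long_counts)
print(f"median ratio = {ratio:.2f}, Pearson r = {r:.3f}")
```

  • A median ratio near 1 together with a high r would indicate that the short-exposure measurement recovers a similar number of molecules per gene as the long-exposure measurement.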
  • RNAs were labeled with a library of encoding oligonucleotide probes containing barcode-determining readout sequences, and the barcodes were then detected bit-by-bit with readout oligonucleotide probes conjugated with fluorophores.
  • probe concentrations and incubation time Figs. 6A-6E
  • The RNA copy number detected for individual genes per z-plane decreased substantially with tissue depth and exhibited a poor correlation with bulk RNA-seq data (Figs. 7A-7C). This may be due to the displacement of RNA molecules between imaging rounds in the thick tissue sample, which made these molecules difficult to decode and identify from their multi-bit images.
  • Fiducial beads embedded in the polyacrylamide gel used for MERFISH sample embedding were imaged. A significant displacement in the positions of the beads in all three dimensions (xyz) was observed between imaging rounds, especially in the deeper part of the sample (Fig. 7D).
  • The displacement of RNA molecule positions between imaging rounds could be attributed to one or more factors.
  • The piezo-actuator used for z-scanning may not have consistently placed the sample at the pre-defined z position during each imaging round.
  • The polyacrylamide gel, which is prone to expansion or shrinkage when buffer conditions change, may have contributed to the displacement of RNA molecules across imaging rounds due to its inconsistent size changes.
  • Two-color imaging was used to measure two bits in each hybridization round, and the axial chromatic aberration between the two colors, which increases with imaging depth, may have resulted in misalignment of RNA molecules between bits.
  • The gel was allowed to relax for 10 minutes following two rounds of washing with the imaging buffer in order to completely remove any residual gel expansion induced by the cleavage or probe-incubation buffer (Fig. 8C).
  • The axial chromatic aberration was calibrated by imaging fiducial beads in the two color channels, which allowed precise alignment of images between these channels.
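  • By way of a hedged illustration, a depth-dependent axial chromatic offset could be calibrated from matched fiducial bead positions as sketched below. The linear offset model, the function names `fit_axial_offset` and `correct_z`, and the synthetic bead positions are assumptions for illustration; this example does not specify the functional form used for the calibration.

```python
import numpy as np

def fit_axial_offset(bead_z_ch1, bead_z_ch2):
    """Fit offset(z) = slope * z + intercept from matched bead z positions.

    bead_z_ch1 / bead_z_ch2 are the apparent z positions (micrometers) of the
    same beads in the two color channels.
    """
    z1 = np.asarray(bead_z_ch1, dtype=float)
    z2 = np.asarray(bead_z_ch2, dtype=float)
    slope, intercept = np.polyfit(z1, z2 - z1, deg=1)
    return float(slope), float(intercept)

def correct_z(z_ch2, slope, intercept):
    """Map channel-2 z coordinates into the channel-1 frame.

    Inverts z2 = z1 + slope * z1 + intercept.
    """
    return (np.asarray(z_ch2, dtype=float) - intercept) / (1.0 + slope)

# Synthetic beads: the offset grows by 1% of depth plus a constant 0.2 um.
z_ch1 = np.array([0.0, 25.0, 50.0, 75.0, 100.0])
z_ch2 = z_ch1 + 0.01 * z_ch1 + 0.2
slope, intercept = fit_axial_offset(z_ch1, z_ch2)
z_aligned = correct_z(z_ch2, slope, intercept)
```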
  • The RNA copy numbers of individual genes detected per unit area per z-plane were compared with the results from 10-micrometer thin-tissue MERFISH measurements performed previously using an epi-fluorescence setup.
  • The detection efficiency of the thick-tissue measurements was ~20% higher than that of the thin-tissue measurements (Fig. 2G), which may reflect the background reduction provided by confocal optical sectioning.
  • Because confocal imaging provides a reduced depth per z-section compared to epi-fluorescence imaging, the detection efficiency of thick-tissue measurements by 3D MERFISH could potentially be even higher.
  • When the RNA copy number per cell detected in the 100-micrometer-thick samples was compared with the results from individual 10-micrometer-thick sections of the same sample (obtained by dividing the 100-micrometer z-range into 10 equal-thickness sections), the former was found to be two-fold higher than the latter (Fig. 9C). This is believed to be because most cell bodies were completely captured within the 100-micrometer-thick tissues, whereas many cells were only partially captured in the 10-micrometer-thick tissue sections.
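  • The division of the 100-micrometer z-range into 10 equal-thickness sections can be sketched as follows. This is only an illustration under stated assumptions: the function name `counts_per_section` and the synthetic molecule positions for a single cell are hypothetical, and the sketch simply shows why a cell straddling a section boundary contributes fewer molecules to each thin section than to the whole thick volume.

```python
import numpy as np

def counts_per_section(z_um, z_min=0.0, z_max=100.0, n_sections=10):
    """Count molecules falling in each of n_sections equal-thickness z-ranges."""
    edges = np.linspace(z_min, z_max, n_sections + 1)
    counts, _ = np.histogram(z_um, bins=edges)
    return counts

# Synthetic cell whose molecules span z = 40..59 um, one molecule per micrometer.
cell_z = np.arange(40, 60)
per_section = counts_per_section(cell_z)
whole_cell = int(per_section.sum())            # count in the full 100-um volume
max_in_thin_section = int(per_section.max())   # best case for a single 10-um section
```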
  • RNA copy numbers of individual genes measured per cell per z-plane at different tissue depths showed excellent correlation with each other with only a slight reduction of the RNA copy number across the entire tissue depth (Figs. 11A, 11B). From the MERFISH-derived single-cell expression profiles, 21 excitatory neuronal clusters, 26 inhibitory neuronal clusters, and 7 non-neuronal cell subclasses were identified in this region (Fig. 3C, 3D; Fig. 12A).
  • FIG. 1 Deep learning enhances performance of confocal MERFISH imaging.
  • FIG. 1A A single-bit 242-gene high-pass-filtered MERFISH confocal image in a brain tissue section taken with an exposure time of 0.1 sec (left) and a magnified view of a single cell marked by the white box in the left image for a closer examination (right).
  • FIG. 1B The correlation between the copy number of individual genes detected per field-of-view (FOV) using 0.1-sec frame rate and those obtained using 1-sec frame rate. The median ratio of the copy number and the Pearson correlation coefficient r are shown.
  • FIG. 1C The same image as in (Fig. 1A) but after enhancement of signal-to-noise ratio (SNR) by a deep-learning algorithm.
  • FIG. 1D The same as (Fig. 1B) but after deep learning was used to enhance the SNR of the 0.1-sec images.
  • FIG. 2A 3D images of DAPI and total polyA mRNA from a single FOV in a 100-micrometer thick mouse brain tissue slice (top), alongside a single z-plane at tissue depth of 50 micrometer marked by the box in the top image (bottom).
  • Fig. 2B Maximum projected high-pass-filtered MERFISH bit images of ten consecutive 1-micrometer z-planes, captured for the cells marked in box in Fig. 2A, bottom.
  • Fig. 2C RNA molecules identified in the same region as (Fig. 2B) with RNA molecules shaded by their genetic identities.
  • Fig. 2E The Pearson correlation between RNA copy number for individual genes per z-plane detected by MERFISH and FPKM from bulk RNA-seq at different tissue depths.
  • Fig. 2F Number of detected RNA molecules per FOV at different tissue depths.
  • Fig. 3 Spatial organization of cell types in the mouse cortex and hypothalamus by 3D thick tissue MERFISH.
  • Fig. 3A UMAP visualization of subclasses of cells identified in a 100 micrometer-thick section in the mouse cortex. Cells are shaded by subclass identities.
  • Fig. 3B 3D spatial maps of the identified subclasses of excitatory neurons (left), inhibitory neurons (middle) and non-neuronal cells (right) within the 100-micrometer mouse cortex section.
  • Fig. 3C UMAP visualization of major cell types identified in a 200-micrometer-thick section of the mouse anterior hypothalamus. Cells are shaded by cell type identities.
  • Fig. 3D 3D spatial maps of the excitatory neurons (left), inhibitory neurons (middle) and non-neuronal cells (right) identified in the 200-micrometer-thick mouse hypothalamus section.
  • Fig. 3E Distributions of the nearest-neighbor distances from cells in individual inhibitory neuronal subclasses to cells in the same subclass (“to self”) or other subclasses (“to other”), measured by MERFISH in the mouse cortex.
  • Fig. 3F Distributions of the nearest-neighbor distances in the mouse anterior hypothalamus as described in (Fig. 3E). *FDR < 0.01 in (Fig. 3E) and (Fig.
  • RNA molecules at different tissue depths in 100-micrometer and 200-micrometer thick brain tissue sections are shown.
  • Fig. 5A Number of RNA molecules detected per FOV at the tissue depths of 10 micrometers and 90 micrometers of the 242-gene MERFISH measurements in a 100-micrometer thick section of the mouse cortex.
  • Fig. 5B Logarithmic distribution of integrated photon counts of individual RNA molecules at the tissue depths of 10 micrometers and 90 micrometers identified in (Fig. 5A).
  • Fig. 5C Number of RNA molecules detected per FOV at tissue depths of 10 micrometers and 190 micrometers of the 156-gene MERFISH measurements in a 200-micrometer thick section of the mouse hypothalamus.
  • Fig. 5D Logarithmic distribution of integrated photon counts of individual RNA molecules at the tissue depths of 10 micrometers and 190 micrometers identified in (Fig. 5C).
  • FIG. 6A Example bit-1 high-pass filtered MERFISH images of a 242-gene MERFISH measurement in a 100-micrometer thick section of mouse cortex stained with different concentrations of encoding probes. The concentration values refer to the concentration of each individual encoding probe.
  • FIG. 6B Distribution of integrated photon counts of individual RNA molecules identified at different encoding probe concentrations. The signals from individual RNA molecules increased with the encoding probe concentration and reached saturation at 1.0 nM per probe. Thus, an encoding probe concentration of 1 nM per probe was used for staining thick tissue samples.
  • Fig. 7 Displacement of RNA molecules between different imaging rounds reduces detection accuracy and efficiency.
  • Fig. 7A Comparison of RNA copy number for individual genes per FOV per z-plane detected in a 242-gene MERFISH measurement of the 100 micrometer-thick mouse cortex tissue section with the FPKM values obtained by bulk RNA-seq. The Pearson correlation coefficient (r) is shown.
  • Fig. 7B Pearson correlation of RNA copy number of individual genes per FOV per z-plane detected at different tissue depths in the 100 micrometer-thick section with FPKM measured by bulk RNA-seq.
  • Fig. 7C Total RNA copy number detected per FOV per z-plane at different tissue depths.
  • Fig. 8A Quantification of the gel expansion factor in various buffers used in the MERFISH protocol. The initial gel size was the same as the coverslip, and the expansion factor after buffer exchange was determined as the ratio of the gel size after buffer exchange to the coverslip size.
  • Fig. 8B In each round of MERFISH imaging, the sample is treated with the readout probes in a wash buffer (containing either 10% ethylene carbonate (EC) or 10% formamide) and left to hybridize for 15 minutes.
  • The sample was rinsed with the wash buffer to remove excess readout probes, followed by treatment with imaging buffer (either glucose-based or bacterial protocatechuate 3,4-dioxygenase (rPCO)-based imaging buffer).
  • The sample is treated with Tris(2-carboxyethyl)phosphine (TCEP) cleavage buffer to eliminate fluorescent signals, and finally washed with 2X saline-sodium citrate (SSC) to remove the cleavage buffer.
  • The dashed line highlights the expansion factor for 2X SSC, which is the base for all other buffers, i.e., all other buffers contain 2X SSC.
  • Fig. 8C XZ projection images of fiducial beads embedded in a gel undergoing buffer exchange for the indicated time periods. Wash buffer containing 15% EC in 2X SSC causes noticeable gel distortion, which is recoverable after a 15-min treatment with 2X SSC.
  • FIG. 9 3D MERFISH imaging of 242 genes in a 100 micrometer-thick section of the mouse cortex.
  • Fig. 9A Example images of decoded RNA molecules at different tissue depths. Each image shows decoded barcodes in a 10-micrometer thick z-range, as indicated. Bottom panels show zoomed-in views of the regions marked by white boxes in the top panels for closer examination. Identified RNA molecules are shaded by their genetic identities.
  • Fig. 9B DAPI (left) and polyA mRNA (middle) images of an example field of view (FOV), which were used for cell segmentation. Cell boundary segmentation determined using a deep learning-based segmentation algorithm (Cellpose 2.0) is shown in the right panel.
  • Fig. 10 Cell types identified in the 100-micrometer thick section in the mouse cortex. UMAP visualization of excitatory (left) and inhibitory (right) neuronal clusters identified in the mouse cortex, colored by their cluster identities.
  • Fig. 11A The median RNA copy numbers per cell along the tissue depth of the 200-micrometer thick section in the mouse hypothalamus. The first and last 10 micrometers were excluded from the analysis due to their incomplete cell coverage.
  • Fig. 11B The Pearson correlation coefficient of the RNA copy numbers for individual genes along tissue depth versus those of the initial 1 micrometer of the 200-micrometer thick section.
  • Fig. 12 Cell types identified in the 200-micrometer thick section in the mouse hypothalamus and their spatial organization.
  • Fig. 12A UMAP visualization of excitatory and inhibitory neuronal clusters identified in the mouse anterior hypothalamus in a 200-micrometer thick section, colored by their cluster identities.
  • Fig. 12B 2D spatial visualization of individual excitatory and inhibitory neuronal clusters. The hypothalamic nuclei in which each cluster is localized, and the top 1 or 2 notable genes of each cluster, are listed for individual clusters. Three clusters, specifically E20, I1, and I5, along with the corresponding nuclei in which they are localized, are marked by dashed lines.
  • Animals. Adult C57BL/6J male mice aged 7-9 weeks were used in this study. Mice were maintained on a 12-h light/12-h dark cycle (12:00 noon to 12:00 midnight dark period), at a temperature of 22 +/- 1 °C, a humidity of 30-70%, with ad libitum access to food and water. Animal care and experiments were carried out in accordance with NIH guidelines and were approved by the Harvard University Institutional Animal Care and Use Committee (IACUC).
  • Tissue preparation for 3D MERFISH. Mice, aged 7-9 weeks, were deeply anesthetized using isoflurane. Subsequently, a transcardial perfusion was performed with phosphate buffered saline (PBS), followed by a 4% paraformaldehyde (PFA) solution. The brain tissue was then carefully dissected and post-fixed in a 4% PFA solution overnight at 4 °C. Following this, the brain tissue was thoroughly rinsed with PBS. Sections of 100- or 200-micrometer thickness were then prepared by embedding the brain tissue in 4% low melting point agarose (Thermo Fisher Scientific, 16520-050).
  • The sections were obtained using a Vibratome (Leica). Finally, these sections were collected in 1x PBS and preserved in 70% ethanol. The sample was stored at a temperature of 4 °C, and the sections were left to rest overnight for permeabilization in 70% ethanol.
  • The encoding-probe mixture comprised approximately 1 nM of each encoding probe, 1 micromolar of a polyA-anchor probe (IDT), 0.1% wt/v yeast tRNA (15401-011, Life Technologies) and 10% v/v dextran sulfate (D8906, Sigma) in the encoding-probe wash buffer. The sample was incubated at 37 °C for 24-48 h.
  • The polyA-anchor probe sequence (/5Acryd/TTGAGTGGATGGAGTGTAATT+TT+TT+TT+TT+TT+TT+TT+TT+T (SEQ ID NO: 1)) contained a mixture of DNA and LNA nucleotides, where T+ is locked nucleic acid, and /5Acryd/ is a 5' acrydite modification.
  • The polyA anchor allows polyadenylated mRNAs to be anchored to the polyacrylamide gel during the hydrogel embedding step as described below. After hybridization, the sample was washed three times for 20 min each at 47 °C in encoding-probe wash buffer to rinse off excess probes, then three times in 2x SSC at room temperature.
  • The sample was embedded in a hydrogel to clear the tissue background and remove off-target probe binding.
  • The sample was incubated in monomer solution with 2 M NaCl, 4% (vol/vol) of 19:1 acrylamide/bisacrylamide, 60 mM Tris-HCl pH 8, and 0.2% (vol/vol) TEMED for 30 minutes at room temperature. Then 100 microliters of ice-cold monomer solution containing 0.2% (vol/vol) 488 nm fiducial beads (Invitrogen) was placed onto a 40-mm silane-coated coverslip. The silane modification procedure allowed the hydrogel to covalently couple to the coverslip surface as described previously.
  • The tissue was gently transferred to the coverslip using a brush and flattened, and the excess monomer solution was carefully aspirated.
  • a hydrophobic glass plate treated with GelSlick (Lonza)
  • The coverslip bearing the flattened sample was then inverted onto the droplet to form a uniform layer of monomer solution.
  • A 50 g weight was placed on top of the coverslip to ensure the tissue remained flat and fully attached to the coverslip.
  • The sample was allowed to polymerize completely for a minimum of 1 hour at room temperature, which allowed the mobile sample slice to become fully attached to the coverslip.
  • The coverslip bearing the polymerized sample was then removed from the glass plate with a thin razor blade.
  • The sample was then incubated in digestion buffer containing 2% (wt/vol) sodium dodecyl sulfate (SDS) (ThermoFisher), 0.5% (vol/vol) Triton X-100 (ThermoFisher), and 1% (vol/vol) Proteinase K (New England Biolabs) in 2x SSC for 24 hours at 37 °C.
  • MERFISH encoding and readout probes. Two sets of MERFISH encoding probes were used in this study: a MERFISH encoding probe set previously designed to target 242 genes in the mouse primary motor cortex, and another set targeting 156 genes in the hypothalamic preoptic area of the mouse brain. In previous studies of the hypothalamic preoptic area, 135 out of 156 genes were imaged using combinatorial MERFISH imaging while the remaining 20 genes were measured in sequential rounds of multicolor smFISH. In this study, these 20 genes were incorporated into the combinatorial imaging rounds. Fluorescent readout probes, conjugated to either Cy5, Cy3B or Alexa488 dye molecules through a disulfide linkage, were purchased from Bio-Synthesis, Inc.
  • 3D MERFISH imaging platform. Two 3D MERFISH imaging platforms were constructed in this study.
  • One (setup 1) was built around a Nikon Ti-U microscope body equipped with a Nikon 40x 1.15 NA water immersion lens (Nikon, MRD77410) or 60x 1.2 NA water immersion lens (Olympus, UPLSAPO60XW) and a spinning disk confocal unit (Andor Dragonfly, ACC-CR-DFLY-202-40).
  • Illumination was provided with solid-state lasers at 647 nm (MPB Communications, 2RU-VFL-P-1500-647-B1R), 561 nm (MPB Communications; 2RU-VFL-P-1000-560-B1R), 488 nm (Coherent, Genesis MX488-1000 STM) and 405 nm (Coherent, Obis 405-200C).
  • The output of the 647 nm, 561 nm and 488 nm lasers was controlled by an acousto-optic tunable filter (Crystal Technologies, AODS 20160-8 and PCAOM Vis), while the 405 nm laser was controlled via direct modulation.
  • The coaligned beams were coupled into the input fiber of a beam homogenizer (Andor, Borealis BCU-120), which provided uniform illumination for the spinning disk. Fluorescence emission was isolated with a penta-bandpass dichroic (Andor, CR-DFLY-DMPN-06I) and an emission filter (Andor, TR-DFLY-P45568-600) in the imaging path. Sample position was controlled by a motorized XY stage (Prior, Proscan H117E1N5/F), while z-scanning was controlled by a piezo objective nanopositioner (Queensgate, OP400 or Mad City Labs, F200S).
  • Initial focus was acquired via a custom-built autofocus system that monitored the position of an IR laser (Thorlabs, LP980-SF15) reflected from the coverslip surface with a CMOS camera (Thorlabs, DCC1545M). All laser and piezo control signals were generated by a DAQ card (National Instruments, PCIe-6353) and were synced to the fire signal of the sCMOS camera (Hamamatsu, Orca-Flash4.0).
  • A similar imaging platform was built based on an Olympus IX71 microscope body, spinning disk confocal unit (Andor, CSU-W1), beam homogenizer (Andor, Borealis BCU 100), piezo objective nanopositioner (Mad City Labs, Nano-F200S) and XY stage (Märzhäuser, Scan IM 112x74).
  • The illumination was provided by solid-state lasers at 647 nm (MPB Communications, 2RU-VFL-P-2000-647-B1R), 561 nm (MPB Communications; 2RU-VFL-P-1000-560-B1R), 488 nm (MPB, 2RU-VFL-P-500-488-B1R) and 405 nm (Coherent, Cube 405), and was controlled via mechanical shutters (Uniblitz, LS6T2). Details regarding which imaging platform was used to acquire specific data are listed in Table 1.
  • Fluidic system and sample chamber.
  • MERFISH samples were imaged on 40-mm round cover glass (Bioptechs) and mounted in a flow chamber (Bioptechs, FCS2) using a 0.5 mm gasket (Bioptechs, 1907-1422-500).
  • the fluidic system contained a peristaltic pump (Gilson, Minipuls 3), and four eight-way valves (Hamilton, MVP 36798 with 8-5 Distribution Valve) assembled to provide up to 24 readout bit solutions, and four additional buffers (2x SSC, wash, cleave and image). Image acquisition and fluidic control were fully automated using custom-built software.
  • 3D MERFISH imaging. To prepare the sample for imaging, it was first stained with a readout hybridization mixture containing the readout probes associated with a probe complementary to the polyA-anchor probe and conjugated via a disulfide bond to the dye Alexa488 at a concentration of 25 nM per probe.
  • The readout hybridization mixture comprised the readout-probe wash buffer of 2x SSC, 10% v/v ethylene carbonate (E26258, Sigma) and 0.1% v/v Triton X-100, supplemented with the readout probes.
  • The sample was incubated in this mixed buffer for 30 min at room temperature, and then washed in the readout-probe wash buffer supplemented with 1 microgram/ml DAPI for 30 min to stain nuclei within the sample.
  • The sample was then washed in 2x SSC for 15 min and loaded into the flow chamber.
  • Imaging buffer containing 5 mM 3,4-dihydroxybenzoic acid (P5630, Sigma), 2 mM trolox (238813, Sigma), 50 micromolar trolox quinone, 1:500 recombinant protocatechuate 3,4-dioxygenase (rPCO; OYC Americas), 1:500 murine RNase inhibitor and 5 mM NaOH (to adjust pH to 7.0) in 2x SSC was introduced into the chamber. At least 15 min was allowed for the imaging buffer to fully penetrate into the deep part of the tissue. The sample was then imaged with a low-magnification 10X air objective using 405-nm illumination to produce a tiled image of the sample.
  • This image was then used to locate the region of interest (ROI) in each slice and to generate a grid of field-of-view (FOV) positions to cover the ROI. After determining these positions, a high-numerical aperture objective was used to image each of the FOV positions.
  • Images were collected in the 488-nm and 405-nm channels to image the 488 nm fiducial beads, the total polyA mRNA stained by the polyA-anchor probe and the nuclei stained by DAPI. These two channels were later used for cell segmentation.
  • A single image of the fiducial beads was taken on the coverslip surface for every imaging round as a spatial reference to correct for slight differences in the stage position.
  • 1-micrometer-thick z-stacks were collected for all channels in each FOV.
  • The fluorescent dyes were removed by flowing 2 mL of cleavage buffer comprising 2x SSC and 50 mM of Tris(2-carboxyethyl)phosphine (TCEP; 646547, Sigma) with a 15-min incubation in the flow chamber in order to cleave the disulfide bond linking the dyes to the readout probes.
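  • One possible way to generate a grid of FOV positions covering a rectangular ROI is sketched below for illustration only. The function name `fov_grid`, the FOV size, the overlap between neighboring FOVs, and the row-by-row ordering are all assumptions; this example does not specify how the grid was generated.

```python
import math

def fov_grid(x_min, x_max, y_min, y_max, fov_size_um, overlap_um=20.0):
    """Return (x, y) center positions of square FOVs that tile a rectangular ROI.

    Neighboring FOVs overlap by overlap_um so that cells on FOV borders are
    imaged at least once in full (hypothetical default).
    """
    step = fov_size_um - overlap_um
    nx = max(1, math.ceil((x_max - x_min - fov_size_um) / step) + 1)
    ny = max(1, math.ceil((y_max - y_min - fov_size_um) / step) + 1)
    xs = [x_min + fov_size_um / 2 + i * step for i in range(nx)]
    ys = [y_min + fov_size_um / 2 + j * step for j in range(ny)]
    return [(x, y) for y in ys for x in xs]  # row-by-row ordering

# Hypothetical 1000 x 500 micrometer ROI imaged with 200-micrometer FOVs.
positions = fov_grid(0.0, 1000.0, 0.0, 500.0, fov_size_um=200.0)
```

  • With the hypothetical values above, the grid contains 6 columns and 3 rows of FOV positions, and the last FOV in each row or column extends to or beyond the ROI edge.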
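  • As a hedged sketch of how a fiducial bead image might serve as a spatial reference, the lateral offset between two imaging rounds can be estimated by phase correlation of the bead images. Phase correlation, integer-pixel precision, and the function name `estimate_shift` are assumptions chosen for illustration; the registration method actually used is not specified in this example.

```python
import numpy as np

def estimate_shift(reference, moving):
    """Estimate the integer-pixel (dy, dx) shift between two bead images.

    The returned shift satisfies
    np.roll(moving, (dy, dx), axis=(0, 1)) ~= reference.
    """
    f_ref = np.fft.fft2(reference)
    f_mov = np.fft.fft2(moving)
    cross = f_ref * np.conj(f_mov)
    cross /= np.abs(cross) + 1e-12          # keep phase information only
    corr = np.fft.ifft2(cross).real
    peak = np.unravel_index(np.argmax(corr), corr.shape)
    # Convert wrapped peak coordinates to signed shifts.
    return tuple(int(p) if p <= s // 2 else int(p) - s
                 for p, s in zip(peak, corr.shape))

# Synthetic bead image and a copy displaced by (3, -2) pixels.
rng = np.random.default_rng(0)
round_1 = rng.random((64, 64))
round_2 = np.roll(round_1, (3, -2), axis=(0, 1))
dy, dx = estimate_shift(round_1, round_2)
registered = np.roll(round_2, (dy, dx), axis=(0, 1))
```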
  • MERFISH image analysis was performed using a customized version of MERlin, a Python-based MERFISH analysis pipeline.
  • A content-aware deep learning-based image restoration algorithm, CSBDeep, was used to enhance the quality of MERFISH images captured with short exposure times. Specifically, an individual model was trained for each MERFISH bit color channel (560 and 650) separately. To accomplish this, 50 image pairs were randomly selected for each color channel, each pair comprising a low and a high signal-to-noise ratio (SNR) image.
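  • For illustration only, the random selection of 50 training image pairs per color channel, together with a percentile-based intensity normalization of the kind commonly applied before training image restoration networks, might look like the sketch below. The function names, the number of available FOVs (400), the random seeds, and the percentile values are assumptions and are not taken from this example; the sketch does not call the CSBDeep library itself.

```python
import numpy as np

def select_training_pairs(n_fovs, n_pairs=50, seed=0):
    """Randomly choose FOV indices whose short/long exposure images form pairs."""
    rng = np.random.default_rng(seed)
    chosen = rng.choice(n_fovs, size=min(n_pairs, n_fovs), replace=False)
    return np.sort(chosen)

def percentile_normalize(image, p_low=1.0, p_high=99.8):
    """Rescale so the p_low and p_high percentiles map to about 0 and 1."""
    image = np.asarray(image, dtype=np.float32)
    lo, hi = np.percentile(image, [p_low, p_high])
    return (image - lo) / (hi - lo + 1e-6)

# One independent training set per bit color channel, as described in the text.
training_sets = {
    channel: select_training_pairs(n_fovs=400, seed=i)
    for i, channel in enumerate(("560", "650"))
}
```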
  • RNA molecules were identified by a pixel-based decoding algorithm. Briefly, barcodes were individually assigned to each pixel, adjacent pixels with the same barcode were aggregated into putative RNA molecules, and the list of putative RNA molecules was filtered to enrich for correctly identified transcripts. In detail, to assign each pixel to one of the barcodes, the intensity vector measured for each pixel was compared to the vectors corresponding to the valid barcodes.
  • Each image in a bit was normalized by the median intensity across all FOVs in that bit to eliminate the intensity variation between hybridizations and color channels.
  • After this intensity normalization, intensity variations across pixels were further normalized by dividing the intensity vector for each pixel by its L2 norm.
  • Each of the predesigned barcodes was likewise normalized by its L2 norm.
  • The normalized barcode vector that was closest to the pixel’s normalized intensity vector was identified. Pixels at a distance larger than 0.65 from any valid barcode in the first step were excluded, and any pixels with an intensity less than 10 were disregarded, as they potentially arise from off-target probe binding or from noise-induced artifacts amplified by the deep learning algorithm.
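  • The pixel-assignment step described above can be sketched as follows. This is a minimal illustration rather than the MERlin implementation: the function name `decode_pixels`, the toy 4-bit codebook, the use of Euclidean distance between unit vectors, and the interpretation of "intensity" as the L2 norm of the pixel's intensity vector are assumptions. The thresholds 0.65 and 10 are the values stated in the text.

```python
import numpy as np

def decode_pixels(intensities, codebook, max_distance=0.65, min_intensity=10.0):
    """Assign each pixel to its nearest barcode, or -1 if it is rejected.

    intensities: (n_pixels, n_bits) intensity vectors, already normalized per bit.
    codebook:    (n_barcodes, n_bits) binary barcodes.
    """
    x = np.asarray(intensities, dtype=float)
    cb = np.asarray(codebook, dtype=float)
    magnitude = np.linalg.norm(x, axis=1)
    x_unit = x / np.maximum(magnitude, 1e-12)[:, None]        # L2-normalize pixels
    cb_unit = cb / np.linalg.norm(cb, axis=1, keepdims=True)  # L2-normalize barcodes
    dists = np.linalg.norm(x_unit[:, None, :] - cb_unit[None, :, :], axis=2)
    best = np.argmin(dists, axis=1)
    best_dist = dists[np.arange(len(x)), best]
    keep = (best_dist <= max_distance) & (magnitude >= min_intensity)
    return np.where(keep, best, -1)

# Toy codebook with three 4-bit barcodes (hypothetical, for illustration).
codebook = np.array([[1, 1, 0, 0], [0, 1, 1, 0], [0, 0, 1, 1]])
pixels = np.array([
    [100.0, 90.0, 5.0, 3.0],   # bright and close to barcode 0
    [1.0, 1.0, 0.0, 0.0],      # right pattern but too dim
    [50.0, 50.0, 50.0, 50.0],  # bright but far from every barcode
])
assigned = decode_pixels(pixels, codebook)
```

  • Adjacent pixels sharing the same assigned barcode would then be grouped into putative RNA molecules, a step not shown in this sketch.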
  • The deep learning model was then trained to transform the low-quality, noisy short-exposure images into high-quality long-exposure images.
  • The purpose of the deep learning model is to improve the quality of the image.
  • The deep learning model was a convolutional neural network.
  • 3D cell segmentation was performed on the co-staining of DAPI and total mRNA using the deep learning-based cell segmentation algorithm, Cellpose 2.0.
  • The segmentation model was fine-tuned with the user-in-the-loop approach of Cellpose 2.0 using randomly selected z-slices containing the DAPI and polyA mRNA channels, with the “CP” model as a starting point.
  • The segmentation model was applied to the 3D z-stacks of each FOV to generate 3D segmentation masks using the 3D mode of Cellpose 2.0.
  • The cell boundaries were extracted for each cell and exported as polygons.
  • This method allowed duplicated cells that appeared in multiple adjacent FOVs to be identified and removed. By implementing this method, the final dataset contained only unique cells, and inaccuracies in downstream analysis arising from duplicated cells could be avoided.
  • The detected RNA molecules were assigned to a cell if the molecule position was within the boundaries of that cell, in order to obtain the cell-by-gene matrix.
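  • A minimal sketch of assigning molecules to cells by polygon boundaries is given below for a single z-plane. The even-odd ray-casting test, the function names `points_in_polygon` and `cell_by_gene`, and the toy square cells are assumptions for illustration; in a 3D dataset the same test would be applied per z-plane using the polygon of each cell in that plane.

```python
import numpy as np

def points_in_polygon(points, polygon):
    """Even-odd ray casting. points: (n, 2); polygon: (m, 2) ordered vertices."""
    pts = np.asarray(points, dtype=float)
    poly = np.asarray(polygon, dtype=float)
    x, y = pts[:, 0], pts[:, 1]
    inside = np.zeros(len(pts), dtype=bool)
    x0, y0 = poly[-1]
    for x1, y1 in poly:
        crosses = (y0 > y) != (y1 > y)  # edge straddles the point's horizontal line
        with np.errstate(divide="ignore", invalid="ignore"):
            x_at_y = x0 + (y - y0) * (x1 - x0) / (y1 - y0)
        inside ^= crosses & (x < x_at_y)  # toggle for each crossing to the right
        x0, y0 = x1, y1
    return inside

def cell_by_gene(points, gene_idx, cell_polygons, n_genes):
    """Count molecules of each gene inside each cell polygon."""
    gene_idx = np.asarray(gene_idx, dtype=int)
    matrix = np.zeros((len(cell_polygons), n_genes), dtype=int)
    for c, polygon in enumerate(cell_polygons):
        inside = points_in_polygon(points, polygon)
        matrix[c] = np.bincount(gene_idx[inside], minlength=n_genes)
    return matrix

# Two toy square cells and four molecules; the last molecule lies between cells.
cells = [[(0, 0), (10, 0), (10, 10), (0, 10)],
         [(20, 0), (30, 0), (30, 10), (20, 10)]]
molecules = [(5, 5), (2, 8), (25, 5), (15, 5)]
genes = [0, 1, 1, 0]
matrix = cell_by_gene(molecules, genes, cells, n_genes=2)
```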
  • Unsupervised clustering analysis of 3D MERFISH data. After obtaining the cell-by-gene matrix as described above, the matrix was preprocessed using the following steps. First, cells that were potentially artifacts due to segmentation errors were removed. Specifically, cells with a small volume (<300 micrometer²), or a low RNA count (<30), or those captured in fewer than 5 or more than 40 of the 1-micrometer z-sections, were excluded. These criteria were selected to exclude cells with low quality or insufficient information. Next, ~10% of cells were removed as putative doublets identified using DoubletFinder. After the above preprocessing steps, the single-cell data were analyzed using Seurat as described below.
  • The gene vector of each cell was normalized by dividing it by the total RNA count of that cell and then multiplying the result by a constant of 10,000, to ensure that all cells contained the same total RNA counts. Following this normalization, a log transformation was performed on the cell-by-gene matrix.
  • The normalized single-cell expression profiles were z-scored, followed by dimensionality reduction by principal component analysis.
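  • The cell-level quality filters described above can be expressed as a simple boolean mask, as in the hedged sketch below. The function name `qc_mask` and the toy cell table are hypothetical; the thresholds are those stated in the text, with the volume threshold taken in the units given there. Doublet removal is not shown.

```python
import numpy as np

def qc_mask(volume, rna_count, n_z_planes,
            min_volume=300, min_rna=30, min_planes=5, max_planes=40):
    """Return True for cells that pass all quality filters.

    volume:     cell volume, in the units used in the text.
    rna_count:  total RNA molecules assigned to the cell.
    n_z_planes: number of 1-micrometer z-sections in which the cell appears.
    """
    volume = np.asarray(volume)
    rna_count = np.asarray(rna_count)
    n_z_planes = np.asarray(n_z_planes)
    return ((volume >= min_volume)
            & (rna_count >= min_rna)
            & (n_z_planes >= min_planes)
            & (n_z_planes <= max_planes))

# Toy cells: too small, too few RNAs, too few planes, too many planes, passing.
volumes = [250, 800, 900, 1200, 1000]
rna_counts = [100, 20, 150, 300, 200]
z_planes = [10, 12, 3, 45, 15]
passed = qc_mask(volumes, rna_counts, z_planes)
```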
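  • For illustration, the normalization, log transformation, z-scoring, and principal component analysis described above could be written with NumPy as in the sketch below. The analysis in this example was performed with Seurat; the NumPy code, the function names, the use of log(1 + x) as the log transformation, and the synthetic count matrix are assumptions made only to make the steps concrete.

```python
import numpy as np

def normalize_log(counts, scale=10_000):
    """Scale each cell to the same total count, then apply log(1 + x)."""
    counts = np.asarray(counts, dtype=float)
    totals = counts.sum(axis=1, keepdims=True)
    return np.log1p(counts / totals * scale)

def zscore(x):
    """Z-score each gene (column) across cells."""
    mean = x.mean(axis=0)
    std = x.std(axis=0, ddof=1)
    std[std == 0] = 1.0  # leave constant genes at zero instead of dividing by 0
    return (x - mean) / std

def pca(x, n_components):
    """Project cells onto the leading principal components via SVD."""
    centered = x - x.mean(axis=0)
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    return u[:, :n_components] * s[:n_components]

# Synthetic cell-by-gene counts: 20 cells x 8 genes (not measured data).
rng = np.random.default_rng(0)
counts = rng.poisson(5, size=(20, 8)) + 1
log_norm = normalize_log(counts)
scaled = zscore(log_norm)
pcs = pca(scaled, n_components=3)
```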
  • a reference to “A and/or B”, when used in conjunction with open-ended language such as “comprising” can refer, in one embodiment, to A only (optionally including elements other than B); in another embodiment, to B only (optionally including elements other than A); in yet another embodiment, to both A and B (optionally including other elements); etc.
  • the phrase “at least one,” in reference to a list of one or more elements, should be understood to mean at least one element selected from any one or more of the elements in the list of elements, but not necessarily including at least one of each and every element specifically listed within the list of elements and not excluding any combinations of elements in the list of elements.
  • This definition also allows that elements may optionally be present other than the elements specifically identified within the list of elements to which the phrase “at least one” refers, whether related or unrelated to those elements specifically identified.
  • “at least one of A and B” can refer, in one embodiment, to at least one, optionally including more than one, A, with no B present (and optionally including elements other than B); in another embodiment, to at least one, optionally including more than one, B, with no A present (and optionally including elements other than A); in yet another embodiment, to at least one, optionally including more than one, A, and at least one, optionally including more than one, B (and optionally including other elements); etc.

Abstract

The present invention relates generally to microscopy, including confocal microscopy. In some cases, techniques such as MERFISH can be used to determine nucleic acids in a sample, with images obtained by confocal microscopy. In some cases, the nucleic acids can be determined in 3 dimensions. In some cases, relatively thick samples, e.g., at least 100 micrometers thick, can be determined. In some embodiments, deep learning or other machine learning techniques can be used to improve image quality and/or speed up the confocal imaging process.
PCT/US2024/032412 2023-06-05 2024-06-04 Three-dimensional imaging systems and methods for determining the presence of nucleic acids in thick tissues Pending WO2024254071A1 (fr)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US202363506283P 2023-06-05 2023-06-05
US63/506,283 2023-06-05

Publications (1)

Publication Number Publication Date
WO2024254071A1 true WO2024254071A1 (fr) 2024-12-12

Family

ID=93796397

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2024/032412 Pending WO2024254071A1 (fr) 2023-06-05 2024-06-04 Three-dimensional imaging systems and methods for determining the presence of nucleic acids in thick tissues

Country Status (1)

Country Link
WO (1) WO2024254071A1 (fr)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20190276881A1 (en) * 2016-11-08 2019-09-12 President And Fellows Of Harvard College Multiplexed imaging using merfish, expansion microscopy, and related technologies
US20190301980A1 (en) * 2016-11-18 2019-10-03 Tissuevision, Inc. Automated tissue section capture, indexing and storage system and methods
US20200080139A1 (en) * 2013-04-30 2020-03-12 California Institute Of Technology Multiplex labeling of molecules by sequential hybridization barcoding
US20200258223A1 (en) * 2018-05-14 2020-08-13 Tempus Labs, Inc. Determining biomarkers from histopathology slide images
WO2021102122A1 (fr) * 2019-11-20 2021-05-27 President And Fellows Of Harvard College Multi-focal imaging methods for molecular profiling
WO2023025318A1 (fr) * 2021-08-27 2023-03-02 Westlake University Methods and formulations for proteomic analysis by expansion of biological samples

Similar Documents

Publication Publication Date Title
US20240271193A1 (en) Multiplexed imaging using merfish, expansion microscopy, and related technologies
Goh et al. Highly specific multiplexed RNA imaging in tissues with split-FISH
US20230279465A1 (en) Methods of anchoring fragmented nucleic acid targets in a polymer matrix for imaging
EP4386761A2 (fr) Matrix imprinting and clearing
US12098418B2 (en) RNA fixation and detection in CLARITY-based hydrogel tissue
Fang et al. Three-dimensional single-cell transcriptome imaging of thick tissues
Trcek et al. mRNA quantification using single-molecule FISH in Drosophila embryos
US20240305314A1 (en) Spectral unmixing combined with decoding for super-multiplexed in situ analysis
EP3332029B1 (fr) Nanoscale imaging of proteins and nucleic acids by expansion microscopy
JP6605452B2 (ja) Multiplex labeling of molecules by sequential hybridization barcoding
US20240331348A1 (en) Multi-resolution in situ decoding
EP4490737A1 (fr) In situ code design methods for minimizing optical overlap
WO2024254071A1 (fr) Three-dimensional imaging systems and methods for determining the presence of nucleic acids in thick tissues
US20250283822A1 (en) Compositions and methods for improved multiplexed error robust fluorescence in situ hybridization
US20250146069A1 (en) Anchored primary nucleic acid probes and methods thereof; ribonuclease-insensitive methods for determining cellular nucleic acid in a biological sample
CN117460837A (zh) Linked amplification tethered with exponential radiance
HK40113243A (en) Matrix imprinting and clearing
US20250285229A1 (en) Multi-focus image fusion with background removal
US20250327114A1 (en) Barcode detection using argonaute proteins
Wadsworth Uniquely Quantifying Highly Similar RNA Transcripts at the Single Molecule Level

Legal Events

Date Code Title Description
121 EP: the EPO has been informed by WIPO that EP was designated in this application

Ref document number: 24819862

Country of ref document: EP

Kind code of ref document: A1