
WO2025166157A1 - Three-dimensional base calling in next-generation sequencing analysis - Google Patents

Three-dimensional base calling in next-generation sequencing analysis

Info

Publication number
WO2025166157A1
Authority
WO
WIPO (PCT)
Prior art keywords
flow cell
computer
cell images
implemented method
neural network
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
PCT/US2025/014022
Other languages
English (en)
Inventor
Haosen WANG
Ryan Kelley
Connor THOMPSON
Minghao GUO
Weston DAMRON
Christopher Brown
Michael Previte
Eric KOFMAN
Amirali Kia
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Element Biosciences Inc
Original Assignee
Element Biosciences Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Element Biosciences Inc

Classifications

    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/0002 Inspection of images, e.g. flaw detection
    • G06T7/0012 Biomedical image inspection
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/084 Backpropagation, e.g. using gradient descent
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/09 Supervised learning
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/60 Type of objects
    • G06V20/69 Microscopic objects, e.g. biological cells or cellular parts
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B30/00 ICT specially adapted for sequence analysis involving nucleotides or amino acids
    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01N INVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
    • G01N15/00 Investigating characteristics of particles; Investigating permeability, pore-volume or surface-area of porous materials
    • G01N15/10 Investigating individual particles
    • G01N15/14 Optical investigation techniques, e.g. flow cytometry
    • G01N15/1429 Signal processing
    • G01N15/1433 Signal processing using image recognition
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/10 Image acquisition modality
    • G06T2207/10024 Color image
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/10 Image acquisition modality
    • G06T2207/10064 Fluorescence image
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/20 Special algorithmic details
    • G06T2207/20081 Training; Learning
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/20 Special algorithmic details
    • G06T2207/20084 Artificial neural networks [ANN]
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/30 Subject of image; Context of image processing
    • G06T2207/30004 Biomedical image processing
    • G06T2207/30024 Cell structures in vitro; Tissue sections in vitro
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B40/00 ICT specially adapted for biostatistics; ICT specially adapted for bioinformatics-related machine learning or data mining, e.g. knowledge discovery or pattern finding
    • G16B40/10 Signal processing, e.g. from mass spectrometry [MS] or from PCR

Definitions

  • Embodiments of this disclosure relate generally to image processing and base calling in sequencing data analysis, and particularly to three-dimensional (3D) images of in situ samples.
  • next-generation sequencing or NGS-like applications such as sequencing by synthesis, sequencing by binding, or sequencing by avidity
  • NGS next-generation sequencing
  • a new strand is synthesized one nucleotide base at a time.
  • one base attaches to any given strand.
  • image(s) are recorded.
  • a base-calling algorithm is applied to the image(s) to “read” the successive signals from each cluster or polony and convert the optical signals into an identification of the nucleotide base sequence added to each DNA fragment.
  • Traditional sequencing data analysis relies on two-dimensional (2D) flow cell images.
  • flow cell images at a selected z level can include signals from out-of-focus polonies located at adjacent z levels and other undesired signals, e.g., from the cell membrane.
  • 3D three-dimensional
  • the image processing methods herein may function to reverse the imaging process of an optical system and virtually improve the full width at half maximum (FWHM) of the optical system.
  • the image processing methods disclosed herein may advantageously increase the detectable density of polonies or clusters in 3D samples or traditional 2D samples.
  • the methods herein may advantageously lessen the impact of color mixing caused by neighboring polonies in two or three dimensions by computationally increasing the spatial resolution of the flow cell images.
  • a neural network e.g., a convolutional neural network
  • a convolutional neural network is used in generating a high-resolution z-stack of flow cell images of the 3D sample from the low-resolution z-stack that has been acquired from the sequencing system, and subsequent primary analysis can be performed based on the high-resolution flow cell images instead of the low-resolution flow cell images.
  • the neural network, e.g., a convolutional neural network, is used in image processing of the high-resolution z-stacks of flow cell images of the samples to generate the base calls.
  • Embodiments of these aspects include corresponding computer systems, apparatus, and computer program products recorded on computer storage device(s), which, alone or in combination, are configured to perform the operations of the methods.
  • the computer system has installed on it software, firmware, hardware, or their combinations that in operation cause the computer system to perform the operations or actions.
  • the computer program product includes instructions that, when executed by a hardware processor, cause the hardware processor to perform the operations or actions.
  • FIG. 1 illustrates a block diagram of a sequencing system for performing sequencing, flow cell image processing, and/or primary analysis operations including base calling using flow cell images, according to some embodiments.
  • FIGS. 2A-2C show an exemplary simulated flow cell image (FIG. 2A, of an in situ cell sample) and two different images (FIGS. 2B-2C) predicted using the systems and methods herein and corresponding to the image in FIG. 2A, according to some embodiments.
  • the predicted images are at different z levels.
  • FIGS. 2D-2E show exemplary simulated flow cell images in the reference set.
  • the simulated flow cell images are generated using the methods herein with the first (FIG. 2D) and second (FIG. 2E) resolutions, according to some embodiments.
  • FIGS. 3A-3D show two exemplary flow cell images (FIGS. 3A and 3D) with multiple cells at two different z levels, and two different predicted images at different z levels (FIGS. 3B-3C) generated from the image in FIG. 3A using the systems and methods herein according to some embodiments.
  • FIGS. 3E-3F show improved detection of targets per cell in the same imaging area (FIG. 3E) and fewer false positives (FIG. 3F) using the methods herein when compared with non-artificial intelligence-based methods; in this case, the targets are polonies or clusters within the cells.
  • FIG. 3G shows improved detection of targets in simulated flow cell images of sample(s) using the neural network herein, which produces a higher R² value than a traditional method.
  • FIG. 4 illustrates a block diagram of a computer system for performing image processing, sequencing analysis, training of neural network(s), predicting base calls, image intensities, high resolution images, and/or classifications using the pre-trained neural networks, and/or base calling, according to some embodiments.
  • FIG. 5A is a flow chart of an exemplary method of predicting 3D flow cell images of sequencing sample(s) and performing base calling using the 3D flow cell images, according to some embodiments.
  • FIG. 5B is a flow chart of an exemplary method of training a neural network that can be used to predict higher resolution flow cell images of sequencing sample(s), according to some embodiments.
  • FIG. 5C is a schematic showing an exemplary embodiment of the first reconfigurable logic device, the integrated circuit, and their connection(s) to the processor of the sequencing system.
  • FIG. 5D is a schematic showing an exemplary embodiment of using the first reconfigurable logic device and the integrated circuit in parallel with a sequencing run in progress within a predetermined time window.
  • FIG. 5E is a flow chart of an exemplary method of training a neural network, thereby generating a pre-trained neural network that can be used to predict higher resolution flow cell images of sequencing sample(s), base calls, intensities, and/or classifications, according to some embodiments.
  • FIG. 5F shows scatter plots for an exemplary embodiment of generating reference intensities from high resolution training flow cell images.
  • FIG. 6 is a schematic showing exemplary embodiments of padlock probes.
  • FIG. 7 is a schematic showing a workflow for generating inside a cell circularized padlock probes, comprising generating first and second cDNAs from first and second target RNA molecules (respectively), hybridizing first and second padlock probes to the first and second cDNA molecules (respectively) to generate first and second circularized padlock probes (respectively).
  • FIG. 8 is a schematic showing a rolling circle and sequencing workflow inside a cell, comprising generating first and second concatemers by conducting rolling circle amplification using first and second covalently closed circular molecules (respectively). The first and second concatemers are subjected to a sequencing workflow using universal sequencing primers, sequencing polymerases, and a plurality of nucleotide reagents.
  • FIG. 9 is a schematic showing an exemplary workflow for sequencing a concatemer that is generated inside the cell.
  • FIG. 10 is a schematic showing an exemplary workflow for sequencing a concatemer that is generated inside the cell.
  • FIG. 11 is a schematic showing an exemplary workflow for sequencing a concatemer that is generated inside the cell.
  • FIG. 12 is a schematic showing an exemplary workflow for sequencing a concatemer that is generated inside the cell.
  • FIG. 13 is a schematic showing a workflow for generating circularized padlock probes, comprising generating first and second cDNAs from first and second target RNA molecules (respectively), hybridizing first and second padlock probes to the first and second cDNA molecules (respectively) to generate first and second circularized padlock probes (respectively).
  • FIG. 14 is a schematic showing a rolling circle and sequencing workflow comprising generating first and second concatemers by conducting rolling circle amplification using first and second covalently closed circular molecules (respectively).
  • FIG. 15 is a schematic of an exemplary low binding support comprising a glass substrate and alternating layers of hydrophilic coatings which are covalently or non-covalently adhered to the glass, and which further comprises chemically-reactive functional groups that serve as attachment sites for oligonucleotide primers (e.g., capture oligonucleotides).
  • oligonucleotide primers e.g., capture oligonucleotides
  • FIG. 16 is a schematic of various exemplary configurations of multivalent molecules.
  • Left (Class I) schematics of multivalent molecules having a “starburst” or “helter-skelter” configuration.
  • Center (Class II) a schematic of a multivalent molecule having a dendrimer configuration.
  • Right (Class III) a schematic of multiple multivalent molecules formed by reacting streptavidin with 4-arm or 8-arm PEG-NHS with biotin and dNTPs. Nucleotide units are designated ‘N’, biotin is designated ‘B’, and streptavidin is designated ‘SA’.
  • FIG. 17 is a schematic of an exemplary multivalent molecule comprising a generic core attached to a plurality of nucleotide-arms.
  • FIG. 18 is a schematic of an exemplary multivalent molecule comprising a dendrimer core attached to a plurality of nucleotide-arms.
  • FIG. 19 shows a schematic of an exemplary multivalent molecule comprising a core attached to a plurality of nucleotide-arms, where the nucleotide arms comprise biotin, spacer, linker and a nucleotide unit.
  • FIG. 20 is a schematic of an exemplary nucleotide-arm comprising a core attachment moiety, spacer, linker and nucleotide unit.
  • FIG. 21 shows the chemical structure of an exemplary spacer (top), and the chemical structures of various exemplary linkers, including an 11-atom Linker, 16-atom Linker, 23-atom Linker and an N3 Linker (bottom).
  • FIG. 22 shows the chemical structures of various exemplary linkers, including Linkers 1-9.
  • FIG. 23A shows the chemical structures of various exemplary linkers joined/attached to nucleotide units.
  • FIG. 23B shows the chemical structures of various exemplary linkers joined/attached to nucleotide units.
  • FIG. 23C shows the chemical structures of various exemplary linkers joined/attached to nucleotide units.
  • FIG. 23D shows the chemical structures of various exemplary linkers joined/attached to nucleotide units.
  • FIG. 24 shows the chemical structure of an exemplary biotinylated nucleotide-arm.
  • FIG. 25 is a schematic of a guanine tetrad (e.g., G-tetrad).
  • FIG. 26 is a schematic of an exemplary intramolecular G-quadruplex structure.
  • FIG. 27 shows an exemplary support with multiple tiles for immobilizing 2D or 3D sample(s) thereon for sequencing, including the cellular sample(s), according to some aspects.
  • FIG. 28 shows a flow chart of an exemplary method of predicting base calls of the flow cell images (e.g., of in situ samples) using the neural network disclosed herein, according to some embodiments.
  • FIG. 29 shows a flow chart of an exemplary method of training the neural network that can be used to predict base calls or high resolution flow cell images, according to some embodiments.
  • FIGS. 30A-30B show a flow cell image (FIG. 30A) and its high resolution image predicted using the neural network that is pre-trained using reference base calls.
  • base calls are determined from the high resolution image using non-neural network based algorithm(s).
  • FIG. 31 shows a block diagram of an exemplary method of training the neural network(s) and an exemplary method of predicting high resolution flow cell images and/or predicting base calls using such pretrained neural network(s).
  • system, apparatus, method, and/or computer program product embodiments, and/or combinations and sub-combinations thereof, that enable image processing of flow cell images, e.g., flow cell images obtained from in situ samples or traditional 2D samples in a sequencing run, to: 1) generate images with improved spatial resolution and improved detectable density of polonies or clusters, which may be used for subsequent sequencing analysis including but not limited to base calling; or 2) predict intensities, base call(s), or classifications of polonies or clusters.
  • the techniques herein can be used while a sequencing run is still in progress to improve efficiency of sequencing and sequencing analysis, reduce data storage required during sequencing and sequencing analysis, and improve accuracy and reliability of sequencing analysis.
  • the techniques herein can be used on flow cell images obtained using various imaging and/or sequencing techniques of volumetric 3D samples and/or traditional 2D samples and/or obtained using various sequencing systems, e.g., next generation sequencing (NGS) systems.
  • NGS next generation sequencing
  • the techniques disclosed herein are useful for base calling in NGS, and NGS flow cell images will be used as the primary example herein for describing the application of these techniques.
  • image analysis techniques may also be useful in other applications where spot-detection and/or CCD imaging is used.
  • the techniques herein can be used for processing flow cell images (e.g., 2D or 3D) to generate accurate and reliable image intensities for polonies or clusters with improved spatial resolution, and thus an improved maximum polony or cluster density detected in the sample(s), for accurate and reliable sequencing analysis.
  • the technologies disclosed herein may advantageously function to reverse the imaging process of an optical system and virtually improve the full width at half maximum (FWHM) of the imager so that the density of polony locations is not limited by the optical design of the sequencing systems.
  • FWHM full width at half maximum
  • the technologies disclosed herein may advantageously increase the detected density of polonies, e.g., by 2x, 4x, 8x, 16x, 27x, 40x, 50x, 100x or more relative to the polony density detectable using traditional optical systems and image processing methods.
  • the technologies disclosed herein may advantageously increase the spatial resolution of flow cell images in each of the one or more spatial dimensions by 2x, 4x, 8x, 16x, 27x, 40x, 50x, 100x or more relative to flow cell images acquired using traditional optical systems and/or image processing methods.
  • the methods herein may also advantageously lessen the impact of color mixing of polonies that may be caused by neighboring polonies or clusters by computationally increasing the spatial resolution of flow cell images.
  • In situ samples such as cells or tissue can have a thickness along the axial or z direction that cannot remain in-focus within a single 2D image.
  • a z-stack of multiple 2D flow cell images may be acquired to cover clusters or polonies at different z levels, e.g., in a 3D cellular sample. Interferences may occur in the z-stack of flow cell images, such as out-of-focus polonies and background signal from cellular components.
  • a polony located at a first z level can appear in a first flow cell image taken at that z level, and it may also generate a blob of signal in a second 2D flow cell image taken at an adjacent z level where it is out of focus.
  • the blob of signal may interfere with intensities of polonies at or near the same x-y location in the second flow cell image, thus deteriorating the accuracy and reliability of base calls.
  • color mixing from neighboring polonies may interfere with polony intensity or polony density that can be detected for subsequent base calling.
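The out-of-focus interference described above can be illustrated with a small numerical sketch. It assumes, purely for illustration, that the axial response of the imager to a point-like polony is Gaussian; the disclosure does not specify a point spread function, and the function names and sample values below are hypothetical.

```python
import math


def fwhm_to_sigma(fwhm_um: float) -> float:
    """Convert a Gaussian full width at half maximum to its standard deviation."""
    return fwhm_um / (2.0 * math.sqrt(2.0 * math.log(2.0)))


def out_of_focus_fraction(dz_um: float, axial_fwhm_um: float) -> float:
    """Relative peak signal of a point emitter seen dz_um away from its focal plane.

    Assumes a Gaussian axial profile, which is an illustrative simplification
    and not a model taken from the disclosure.
    """
    sigma = fwhm_to_sigma(axial_fwhm_um)
    return math.exp(-(dz_um ** 2) / (2.0 * sigma ** 2))


if __name__ == "__main__":
    # Hypothetical z spacing of 1 um between adjacent levels.
    for fwhm in (2.0, 1.0, 0.5):
        leak = out_of_focus_fraction(dz_um=1.0, axial_fwhm_um=fwhm)
        print(f"axial FWHM {fwhm:.1f} um: {leak:.3f} of peak signal at the adjacent level")
```

Under this assumed model, a narrower effective FWHM leaves less residual signal in the neighboring z level, which is the sense in which a virtually improved FWHM would reduce interference between polonies at nearby x-y locations.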
  • the techniques disclosed herein advantageously train a neural network to efficiently and accurately predict polony or cluster locations in the sample(s).
  • the techniques disclosed herein advantageously train a neural network to efficiently and accurately predict high resolution intensities, base calls, and/or classifications for polonies or clusters in the sample(s).
  • the samples herein are not limited to 3D samples, e.g., in situ cells and/or tissue.
  • the samples herein may also include traditional 2D samples.
  • the techniques disclosed herein may advantageously utilize the reconfigurable logic device, e.g., FPGAs, and other integrated circuits, e.g., AI chips or neural processing units (NPUs), to: 1) predict high-resolution polony or cluster locations based on low-resolution flow cell images; or 2) predict intensities, base calls and/or classifications at the high resolution for the polonies or clusters in the sample(s).
  • the reconfigurable logic device e.g., FPGAs, and other integrated circuits, e.g., AI chips or neural processing units (NPUs)
  • the utilization of the reconfigurable logic device, e.g., FPGAs, and other integrated circuits, e.g., AI chips or neural processing units (NPUs), on-board the sequencing system may advantageously reduce computational time, reduce energy consumption, improve sequencing analysis efficiency, reduce data storage space required, and reduce sequencing system cost in analysis of flow cell images when compared with sequencing analysis using existing sequencing systems.
  • FPGAs field-programmable gate arrays
  • NPUs neural processing units
  • the techniques disclosed herein advantageously train a neural network based on a loss function that is determined by comparison to reference base calls as ground truth, while the trained neural network may be used to accurately and reliably predict high resolution post image-processing flow cell images based on the flow cell images that are acquired from the sample(s).
  • the techniques disclosed herein advantageously allow a mismatch in the training outputs and the prediction outputs.
  • the neural network may be trained by generating training base calls as training outputs and comparing the training outputs to reference base calls as ground truth. The trained neural network may then be used to predict high resolution flow cell images or to predict base calls.
  • Such mismatching in training and prediction outputs may advantageously allow reference base calls to be considered in training parameters of the neural network and in prediction of a higher-resolution, higher-quality version of the flow cell images that can be used to improve base calling accuracy and reliability.
  • Such training and prediction advantageously enable utilization of a simplified neural network that imposes less computational burden and reduces computational time and power consumption in making predictions.
  • the techniques disclosed herein may advantageously utilize the reconfigurable logic device, e.g., FPGAs, and other integrated circuits, e.g., AI chips or neural processing units (NPUs), to perform one or more operations in the training and/or the prediction.
  • Primary analysis can include some or all of the operations and/or steps needed to perform base calling and compute quality scores of the base calls.
  • Primary analysis can involve the formation of a template image for at least part of the flow cell.
  • the template image can include the estimated locations of all detected clusters or polonies in a common coordinate system.
  • the template image can include a polony map that is 2D or 3D.
  • Template images are generated by identifying cluster or polony locations in all images in the first cycle or the first few cycles of the sequencing process. Generation of the template image may need sufficient spatial resolution to differentiate the polonies from background features, neighboring polonies, and/or duplicate polonies that are out-of-focus.
  • FIG. 1 illustrates a block diagram of a computer-implemented system 100, according to one or more embodiments disclosed herein.
  • the system 100 has a sequencing system 110 that includes a flow cell 112, a sequencer 114, an imager 116, data storage 122, and user interface 124.
  • the sequencing system 110 may be connected to a cloud 130.
  • the sequencing system 110 may include one or more of dedicated processors 118, a first reconfigurable logic device, e.g., Field-Programmable Gate Array(s) (FPGAs) 120, and a computing system 126.
  • FPGAs Field-Programmable Gate Array
  • the flow cell 112 is configured to capture DNA fragments and form DNA sequences for base-calling on the flow cell.
  • the flow cell 112 can include a support as disclosed herein.
  • the support can be a solid support.
  • the support can include a surface coating thereon as disclosed herein.
  • the surface coating can be a polymer coating as disclosed herein.
  • a flow cell 112 can include multiple tiles or imaging areas thereon, and each tile may be separated into a grid of subtiles.
  • Each subtile can include a plurality of clusters or polonies immobilized thereon.
  • a flow cell can have 424 tiles, and each tile can be divided into a 6 x 9 grid, therefore 54 subtiles.
  • the flow cell image as disclosed herein can be an image including signals of a plurality of clusters or polonies.
  • the flow cell image can include one or more tiles of signals or one or more subtiles of signals.
  • a flow cell image can be an image that includes all the tiles and approximately all signals thereon.
  • each tile may include millions of polonies or clusters.
  • a tile can include about 1 to 10 million clusters or polonies.
  • Each polony can be a collection of many copies of DNA fragments.
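The tile and subtile layout described above reduces to simple grid arithmetic. The sketch below uses the example figures from the text (424 tiles, a 6 x 9 grid); treating 6 as the row count and 9 as the column count, and the tile dimensions in pixels, are assumptions made only for illustration.

```python
TILES_PER_FLOW_CELL = 424    # example figure from the text
GRID_ROWS, GRID_COLS = 6, 9  # example grid from the text; row/column orientation is assumed


def subtiles_per_tile(rows: int = GRID_ROWS, cols: int = GRID_COLS) -> int:
    """Number of subtiles in one tile for a rows x cols grid."""
    return rows * cols


def subtile_index(x_px: int, y_px: int, tile_width_px: int, tile_height_px: int,
                  rows: int = GRID_ROWS, cols: int = GRID_COLS) -> tuple:
    """Return the (row, col) of the subtile containing pixel (x_px, y_px).

    The tile size in pixels is a hypothetical input; the text does not state one.
    """
    if not (0 <= x_px < tile_width_px and 0 <= y_px < tile_height_px):
        raise ValueError("pixel lies outside the tile")
    col = x_px * cols // tile_width_px
    row = y_px * rows // tile_height_px
    return row, col


if __name__ == "__main__":
    print(subtiles_per_tile())                        # 54
    print(TILES_PER_FLOW_CELL * subtiles_per_tile())  # 22896
    print(subtile_index(899, 599, tile_width_px=900, tile_height_px=600))  # (5, 8)
```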
  • the flow cell images may be acquired using the imager 116 at single or multiple z levels along a z axis orthogonal to the image plane of the flow cell images.
  • the flow cell images can include multiple z levels in order to cover the whole sample(s) in 3D.
  • the z axis can extend from the objective lens of the imager 116 disclosed herein to the support, e.g., flow cell 112.
  • the z axis can be orthogonal to the image plane of the flow cell images.
  • Each z level of flow cell images may be separated from the adjacent z level(s) by a predetermined distance, for example, ranging from about 0.1 µm to about 15 µm, or from 0.02 µm to 10 µm.
  • Each z level of flow cell images may be separated from the adjacent z level(s) by a distance ranging from 0.5 µm to 10 µm, from 0.01 µm to 5 µm, or from 0.1 µm to 15 µm.
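As a rough illustration of how a z-stack covers a sample, the sketch below lists the z positions needed to span a given thickness at a fixed spacing. The function name, the choice to start at zero, and the sample thickness are hypothetical; only the spacing ranges come from the text.

```python
import math


def z_levels(sample_thickness_um: float, step_um: float, start_um: float = 0.0) -> list:
    """Return evenly spaced z positions that span the sample thickness.

    The last level is placed at or just beyond the far side of the sample so
    that the whole thickness is covered.
    """
    if step_um <= 0:
        raise ValueError("step_um must be positive")
    if sample_thickness_um < 0:
        raise ValueError("sample_thickness_um must be non-negative")
    # Small tolerance guards against floating-point noise in the division.
    n_steps = math.ceil(sample_thickness_um / step_um - 1e-9)
    return [round(start_um + i * step_um, 6) for i in range(n_steps + 1)]


if __name__ == "__main__":
    # Hypothetical 10 um thick sample imaged every 0.5 um.
    levels = z_levels(10.0, 0.5)
    print(len(levels))  # 21
```

A finer spacing increases the number of images per cycle and channel, which is one reason storage and compute cost grow quickly for 3D samples.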
  • flow cell images can be acquired from one or more sequencing cycles and/or one or more channels.
  • Each flow cell image may include in its field of view at least part of one or more tiles or subtiles of the flow cell.
  • FIG. 27 shows a portion of a flow cell 2712 with multiple tiles 2710.
  • the image plane is defined by the x and y axes, and the z direction (i.e., z axis) is orthogonal to the x-y plane.
  • Although the flow cell images, samples, and the z axis are described in a Cartesian coordinate system as shown in FIG. 27, any other coordinate system can be used to define spatial locations and relationships herein.
  • Other coordinate systems can include but are not limited to the polar coordinate system, cylindrical, or spherical coordinate systems.
  • the sequencer 114 may be configured to flow a nucleotide mixture onto the flow cell 112, cleave blockers from the nucleotides in between flowing steps, and perform other steps for the formation of the DNA sequences on the flow cell 112.
  • the nucleotides may have fluorescent elements attached that emit light or energy in a wavelength that indicates the type of nucleotide. Each type of fluorescent element may correspond to a particular nucleotide base (e.g., A, G, C, T). The fluorescent elements may emit light in visible wavelengths.
  • the sequencer 114 and the flow cell 112 may be configured to perform various sequencing methods disclosed herein, for example, sequencing-by-avidite.
  • each nucleotide base may be assigned a color. Different types of nucleotides can have different colors. Adenine (A) may be red, cytosine (C) may be blue, guanine (G) may be green, and thymine (T) may be yellow, for example.
  • the color or wavelength of the fluorescent element for each nucleotide may be selected so that the nucleotides are distinguishable from one another based on the wavelengths of light emitted by the fluorescent elements.
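The color assignment above suggests the simplest possible base caller: pick the brightest channel at a polony location in each cycle. The sketch below uses the example color mapping from the text; the brightest-channel rule, the function names, and the intensity values are illustrative assumptions and do not represent the neural-network-based method of the disclosure.

```python
# Example color assignment from the text; real instruments may differ.
CHANNEL_TO_BASE = {"red": "A", "blue": "C", "green": "G", "yellow": "T"}


def call_base(intensities: dict) -> str:
    """Return the base whose channel has the highest intensity.

    intensities maps a channel name to the measured signal for one polony in
    one cycle. This naive rule ignores crosstalk, background, and phasing.
    """
    if not intensities:
        raise ValueError("no channel intensities given")
    unknown = set(intensities) - set(CHANNEL_TO_BASE)
    if unknown:
        raise ValueError(f"unknown channels: {sorted(unknown)}")
    brightest = max(intensities, key=intensities.get)
    return CHANNEL_TO_BASE[brightest]


def call_read(cycles: list) -> str:
    """Concatenate per-cycle base calls for a single polony."""
    return "".join(call_base(c) for c in cycles)


if __name__ == "__main__":
    # Hypothetical intensities for one polony over three cycles.
    cycles = [
        {"red": 0.90, "blue": 0.10, "green": 0.20, "yellow": 0.05},
        {"red": 0.10, "blue": 0.15, "green": 0.80, "yellow": 0.10},
        {"red": 0.05, "blue": 0.10, "green": 0.10, "yellow": 0.70},
    ]
    print(call_read(cycles))  # AGT
```

When signal from a neighboring or out-of-focus polony adds to the wrong channel, this rule can pick the wrong base, which is the color-mixing problem the text describes.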
  • the imager 116 may be configured to capture images of the flow cell 112 after each flowing step.
  • the imager 116 includes a camera configured to capture digital images, such as a CMOS or a CCD camera.
  • the camera may be configured to capture images at the wavelengths of the fluorescent elements bound to the nucleotides.
  • the images acquired by the imager of the sample(s) immobilized on at least a portion of the flow cell can be called the flow cell images.
  • the imager 116 can include one or more optical systems disclosed herein.
  • the optical system(s) can be configured to capture optical signals from the flow cell and generate corresponding flow cell images thereof. The flow cell images can then be used for base calling.
  • the images of the flow cell may be captured in groups, where each image in the group is taken at a wavelength or in a spectrum that matches or includes only one of the fluorescent elements. In another embodiment, the images may be captured as single images that capture all of the wavelengths of the fluorescent elements.
  • the resolution of the imager 116 can control the level of detail in the flow cell images, including pixel size. In existing systems, this resolution is very important, as it controls the accuracy with which a spot-finding algorithm identifies the polony or cluster centers. In some embodiments, the image resolution of flow cell images disclosed herein can be about 10 nanometers (nm) to a few hundred nm or greater.
  • the image resolution of flow cell images can be in a range from 0.1 nm to 1000 nm. In some embodiments, the image resolution of flow cell images can be in a range from 1 nm to 500 nm. In some embodiments, the image resolution of flow cell images can be in a range from 5 nm to 300 nm.
  • One way to increase the accuracy of polony or cluster finding is to improve the resolution of the imager 116, or improve the processing performed on images taken by imager 116. Detecting polony or cluster centers in pixels other than those detected by a spot-finding algorithm can be performed. These methods can allow for improved accuracy in detection of polony or cluster centers without increasing the resolution of the imager 116. The resolution of the imager 116 may even be better than existing systems with comparable performance, which may reduce the cost of the sequencing system 110.
  • the image quality of the flow cell images can control the base calling quality.
  • One way to increase the accuracy of base calling is to improve the imager 116, or improve the processing performed on images taken by imager 116 to result in a better image quality.
  • the methods described herein may predict high resolution flow cell images (e.g., at 2x, 4x, or more of the existing flow cell image resolution, in a common coordinate system) so that the detectable polony or cluster density can be improved with reduced or eliminated interference from neighboring polonies, cellular background signal, color mixing, and/or other noise in the flow cell images.
  • 3D base calling can be more accurate using the methods herein when compared with existing methods without using such high resolution flow cell images.
  • Such methods herein can allow for accurate and efficient base calling.
  • the methods can be advantageously performed in parallel with a sequencing run in the computer-implemented system 100, without interference with or delay of existing sequencing workflow of the sequencing system 110.
  • the results of predicted high resolution flow cell images can be available for making base calling in the current sequencing cycle in the sequencing workflow.
  • some or all of the operations disclosed herein can be advantageously performed by the first reconfigurable logic device, e.g., FPGA(s), or the integrated circuit, e.g., an application-specific integrated circuit (ASIC) chip, neural processing unit (NPU), or artificial intelligence (AI) chip, and data can be communicated between the CPU(s) and the first reconfigurable logic device or integrated circuit to reduce the total operational time relative to methods operating using only the CPUs.
  • the sequencing system 110 may be configured to perform operations or actions for image processing of the flow cell images across different cycles and/or channels.
  • the operations or actions disclosed herein may be performed by the dedicated processors 118, the reconfigurable logic device(s) and/or integrated circuit(s) 120, the computing system 126, or a combination thereof.
  • One or more operations or actions in the methods 500, 600, 700, 2800, 2900 disclosed herein may be performed by the dedicated processors 118, the reconfigurable logic device(s) and/or integrated circuit(s) 120, the computing system 126, or a combination thereof.
  • which operations or actions are to be performed by the dedicated processors 118, the reconfigurable logic device(s) and/or integrated circuit(s) 120, the computing system 126, or their combinations can be determined based on one or more of a computation time for the specific operation(s), the complexity of computation in the specific operation(s), the need for data transmission between the hardware devices, the power required for the specific operation(s), or their combinations.
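  • As a minimal sketch of how such a selection could be made, assuming each candidate device comes with an estimated computation time, data transmission time, and power draw for a given operation: the `Estimate` fields, the additive cost function, the `power_weight` parameter, and all numbers below are hypothetical illustrations, not values or rules taken from this disclosure.

```python
from dataclasses import dataclass

@dataclass
class Estimate:
    """Hypothetical per-device estimate for one operation."""
    compute_ms: float    # estimated computation time
    transfer_ms: float   # estimated time to move data to/from the device
    power_w: float       # estimated power draw while running

def pick_device(estimates, power_weight=0.0):
    """Return the device name with the lowest combined cost.

    Cost = compute time + transfer time + power_weight * power.
    This additive cost is an assumption made for illustration.
    """
    def cost(name):
        e = estimates[name]
        return e.compute_ms + e.transfer_ms + power_weight * e.power_w
    return min(estimates, key=cost)

# Illustrative numbers only; they are not measurements from any system.
registration = {
    "dedicated_processors_118": Estimate(40.0, 5.0, 15.0),
    "logic_device_120": Estimate(8.0, 12.0, 10.0),
    "computing_system_126": Estimate(120.0, 0.0, 65.0),
}
chosen = pick_device(registration)
```

  • With these made-up estimates, the device with the smallest sum of computation and transmission time is selected; raising `power_weight` shifts the choice toward lower-power devices.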
  • Image processing operations or actions of the flow cell images can be performed after the corresponding flow cell images are acquired but before base calling of the flow cell images is performed.
  • the data storage 122 is used to store information used in the methods herein. This information may include the flow cell images themselves or information and/or images derived from the flow images captured by the imager 116.
  • the DNA sequences determined from the base-calling may be stored in the data storage 122. Parameters identifying polony or cluster locations may also be stored in the data storage 122. Raw and/or processed image intensities of each polony or cluster may be stored in the data storage 122.
  • the region and/or subtile that each polony or cluster corresponds to may also be stored in the data storage 122.
  • the transformation matrix of each region and/or subtile for different cycle(s) and/or channel(s) may also be stored in the data storage 122.
  • Cell images may be stored in the data storage 122.
  • the flow cell images, the processed images, and/or the filtered images may be stored in the data storage.
  • Other information or images that can facilitate 3D base calling of the sample can be saved in the data storage.
  • the user interface 124 may be used by a user to operate the sequencing system or access data stored in the data storage 122 or the computing system 126.
  • the computing system 126 may control the general operation of the sequencing system and may be coupled to the user interface 124. It may also perform steps in image processing, base calling, their preceding operations, and/or subsequent operations including but not limited to predicting high resolution flow cell images.
  • the computing system 126 is a computer system 400, as described in more detail in FIG. 4.
  • the computing system 126 may store information regarding the operation(s) of the sequencing system 110, such as configuration information, instructions for operating the sequencing system 110, or user information.
  • the computing system 126 may be configured to pass information between the sequencing system 110 and the cloud 130.
  • the computing system 126 can include one or more general purpose computers that provide interfaces to run a variety of programs in an operating system, such as Windows™ or Linux™. Such an operating system typically provides great flexibility to a user.
  • the computing system 126 may include one or more processors, e.g., CPUs. The CPUs may be configured for artificial intelligence algorithm development and training (e.g., neural network training), either alone or in combination with the reconfigurable logic device and/or integrated circuit 120.
  • the sequencing system may include one or more reconfigurable logic devices 120 and/or one or more other integrated circuits 120.
  • the reconfigurable logic device 120 can include one or more FPGA devices.
  • the integrated circuit 120 herein may or may not be reconfigurable, and it may include an AI chip, an application-specific integrated circuit (ASIC) chip, a neural processing unit (NPU), or a combination thereof.
  • the reconfigurable logic device and/or integrated circuit 120 may be configured for artificial intelligence algorithm development and training (e.g., training of a neural network), either alone or in combination with the CPU and/or GPU.
  • the reconfigurable logic device and/or integrated circuit 120 includes a main unit and an edge unit.
  • the main unit may be an FPGA device and the edge unit may be an ASIC or AI chip.
  • the edge unit is an additional hardware processing module that may be individually installed and/or uninstalled on the system 110.
  • the edge unit may be configured for artificial intelligence algorithm development and training.
  • the edge unit may be configured for making inferences or predictions using deployed AI algorithm(s), e.g., neural networks.
  • the edge unit may communicate electronically with the main unit, e.g., data communication via DMA connections.
  • the edge unit may communicate electronically for data with other parts of the system 100 via various connections, such as a chip2chip connection.
  • the edge unit may include a neural processing unit (NPU) chip, an AI chip, or any other integrated circuit(s).
  • the dedicated processors 118 may be configured to perform operations in the methods disclosed herein.
  • the dedicated processors 118 may include one or more reconfigurable logic devices and/or integrated circuits disclosed herein.
  • the dedicated processors 118 may not include general-purpose processors, but instead custom processors with specific hardware or instructions for performing those steps.
  • Dedicated processors directly run specific software without an operating system. The lack of an operating system reduces overhead, at the cost of the flexibility in what the processor may perform.
  • a dedicated processor may make use of a custom programming language, which may be designed to operate more efficiently than the software run on general-purpose computers. This may increase the speed at which the steps are performed and allow for real time processing.
  • the reconfigurable logic device and/or the integrated circuit 120 may be configured to perform some or all of operations in the methods herein.
  • the reconfigurable logic device and/or the integrated circuit may be programmed as hardware that can perform specific task(s).
  • a special programming language may be used to transform software steps into hardware componentry.
  • Each software step may correspond to at least one operation or action in the methods disclosed herein.
  • Each software step may include at least a part of the operation or action in the methods disclosed herein.
  • the reconfigurable logic device and/or integrated circuit generally processes data faster than a general-purpose computer. Similar to dedicated processors, this may be at the cost of flexibility. The lack of software overhead may also allow the reconfigurable logic device and/or the integrated circuit to operate faster than a dedicated processor, although this will depend on the exact processing to be performed and on the specific reconfigurable logic device and/or integrated circuit and dedicated processor.
  • a group of the reconfigurable logic devices and/or integrated circuits 120 may be configured to perform the steps in parallel.
  • a number of processing engines of the FPGA(s) may be configured to perform one or more identical image processing steps for an image, a set of images, a subtile, or a select region in one or more images.
  • Each FPGA 120 may perform its own part of the image processing step(s) in parallel, reducing the time needed to process data. This may allow the image processing step(s) to be completed in real time.
  • a number of processing engines of a first FPGA may be configured to generate a polony map for a tile of the flow cell.
  • Each processing engine may be responsible for generating a portion, e.g., non-overlapping portion, of the polony map at a different subtile within the tile, e.g., in parallel.
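  • The division of a tile into non-overlapping subtiles that are processed in parallel can be sketched in software as follows. This is only an illustration under stated assumptions: software threads stand in for hardware processing engines, and the polony detection rule (a pixel above a threshold that is the maximum of its 3x3 neighborhood within the subtile) as well as the names `subtile_polony_map` and `tile_polony_map` are hypothetical, not the method of this disclosure.

```python
from concurrent.futures import ThreadPoolExecutor
import numpy as np

def subtile_polony_map(subtile, threshold):
    """Mark pixels above threshold that are the maximum of their 3x3 neighborhood."""
    padded = np.pad(subtile, 1, mode="constant", constant_values=-np.inf)
    h, w = subtile.shape
    # Stack the nine shifted views so the neighborhood maximum is one reduction.
    neighborhood = np.stack([
        padded[dr:dr + h, dc:dc + w] for dr in range(3) for dc in range(3)
    ])
    return (subtile >= neighborhood.max(axis=0)) & (subtile > threshold)

def tile_polony_map(tile, subtile_size, threshold, workers=4):
    """Build a tile-level map from non-overlapping subtiles, one task per subtile."""
    tile = np.asarray(tile, dtype=float)
    result = np.zeros(tile.shape, dtype=bool)
    origins = [(r, c)
               for r in range(0, tile.shape[0], subtile_size)
               for c in range(0, tile.shape[1], subtile_size)]

    def work(origin):
        r, c = origin
        return subtile_polony_map(
            tile[r:r + subtile_size, c:c + subtile_size], threshold)

    # Each worker thread plays the role of one processing engine (an assumption).
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for (r, c), part in zip(origins, pool.map(work, origins)):
            result[r:r + part.shape[0], c:c + part.shape[1]] = part
    return result

# Toy 8x8 tile with two bright spots, split into four 4x4 subtiles.
tile = np.zeros((8, 8))
tile[1, 2] = 9.0
tile[6, 5] = 7.0
polony_map = tile_polony_map(tile, subtile_size=4, threshold=1.0)
```

  • Because the subtiles do not overlap, each task writes to its own region of the result and no synchronization between tasks is needed; a polony that straddles a subtile border would need extra handling that this sketch omits.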
  • a second FPGA may be configured to perform intensity normalization in parallel with the generation of the polony map.
  • a number of FPGA(s) and integrated circuits, e.g., AI chips, may be configured to perform one or more image processing step(s) for the flow cell images.
  • Each FPGA 120 may perform its own part of the processing step(s) in parallel, reducing the time needed to process data, while each AI chip may perform polony or cluster prediction after receiving data from its corresponding FPGA. This may allow the image processing steps to be completed in real time.
  • a first and second FPGA may be configured to perform intensity registration in parallel for a different subtile or tile of the flow cell.
  • a corresponding AI chip may perform prediction of the high resolution flow cell image of the corresponding subtile or tile after image registration is completed by its corresponding FPGA. Further discussion of the use of FPGAs is provided below.
  • the reconfigurable logic device and/or the integrated circuit may be configured to perform some or all of the operations or actions in the methods disclosed herein in real time. Performing the operations or actions in real time may allow the system 110 to use less memory and/or data storage, as the data may be processed as it is received. This is an improvement over conventional systems that may need to store the data before it may be processed and consequently require more memory/data storage or access to a computer system located in the cloud 130. Further, performing the operations or actions in real time may allow more efficient sequencing analysis, as the analysis is performed in parallel while a sequencing run is still in progress.
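  • The memory benefit of processing data as it is received can be illustrated with a small streaming sketch. The generator `acquire_images` is a hypothetical stand-in for an imager, and the per-image reduction (a mean intensity) is an arbitrary placeholder for whatever processing is actually performed.

```python
import numpy as np

def acquire_images(n, shape=(4, 4)):
    """Hypothetical stand-in for an imager that yields one image at a time."""
    for i in range(n):
        yield np.full(shape, float(i))

def stream_mean_intensities(images):
    """Reduce each image to one number as it arrives; no image is retained."""
    for image in images:
        yield float(image.mean())

# Only one image exists in memory at any moment; the results are small scalars.
means = list(stream_mean_intensities(acquire_images(3)))
```

  • A store-then-process design would instead hold all images before the first result is produced, which is the extra memory or storage cost described above.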
  • performing the processing steps using the FPGAs and AI chips may allow the system to use less power, e.g., by a factor of 2, 5, 10, 20, or more, thus producing less heat than performing the same processing steps using the CPUs and/or GPUs. Further discussion of the use of FPGAs is provided below.
  • the sequencing system 110 may have dedicated processors 118, the reconfigurable logic device and/or integrated circuit 120, or the computing system 126.
  • the sequencing system may use one, two, or all of these elements to accomplish one or more operations or actions in the methods disclosed herein. In some embodiments, when these hardware elements are present together, the image processing tasks are split between them.
  • the reconfigurable logic device 120 may be used to perform some or all of: the preprocessing operations, color correction, polony map generation, image registration, predicting high resolution flow cell images, training a neural network, generating the training flow cell images, base calling, and any subsequent operations, while the computing system 126 may perform other processing functions for the sequencing system 110 such as intensity normalization and registering images for base calling with cell staining image(s).
  • one or more reconfigurable logic devices and/or integrated circuits 120 can accelerate base calling and/or any primary analysis steps of flow cell images acquired from 2D or 3D sample(s).
  • the reconfigurable logic devices and/or integrated circuits can accelerate primary analysis of 2D sample(s) or 3D volumetric sample(s) by 2x, 4x, 5x, 10x, 15x, 20x, 25x, 30x, 40x, 50x, 100x, 200x, 400x, 500x, 800x, 1000x, or more compared with traditional primary analysis methods using only CPUs and/or GPUs.
  • one or more reconfigurable logic devices and/or integrated circuits 120 herein can accelerate sequencing and sequencing analysis (including at least primary analysis) of the flow cell images acquired from 2D or 3D sample(s).
  • the reconfigurable logic devices and/or integrated circuits herein can accelerate sequencing and sequencing analysis (including at least primary analysis) of the flow cell images acquired from 2D or 3D sample(s) by 2x, 4x, 5x, 10x, 15x, 20x, 25x, 30x, 40x, 50x, 100x, 200x, 400x, 500x, 800x, 1000x, or more compared with traditional sequencing systems with only CPUs and/or GPUs.
  • making inferences or predictions of high resolution images, of base calls, or of classifications, using the neural network disclosed herein and the reconfigurable logic devices and/or integrated circuits, can take less than 800 ms, 500 ms, 400 ms, 300 ms, 200 ms, 100 ms, 50 ms, 20 ms, or less per tile per cycle.
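  • To put the per-tile, per-cycle figures in context, the total inference time for a run can be estimated by simple multiplication. The sketch below assumes tiles and cycles are processed one after another; the tile count (50) and cycle count (15) are illustrative assumptions, and the function name is hypothetical.

```python
def total_inference_seconds(ms_per_tile_per_cycle, tiles, cycles):
    """Total inference time if tiles and cycles are processed sequentially."""
    return ms_per_tile_per_cycle * tiles * cycles / 1000.0

# 800 ms and 20 ms are the largest and smallest per-tile, per-cycle bounds
# listed above; 50 tiles and 15 cycles are assumed for illustration.
upper = total_inference_seconds(800, tiles=50, cycles=15)  # 600.0 seconds
lower = total_inference_seconds(20, tiles=50, cycles=15)   # 15.0 seconds
```

  • Under these assumptions the inference step would add between 15 seconds and 10 minutes to the run; parallel processing across devices would lower these totals further.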
  • the tile size can be varied in different flow cells.
  • the tile size may be at least 0.001 mm², 0.01 mm², 0.05 mm², 0.1 mm², 0.5 mm², 1 mm², 2 mm², 3 mm², or more.
  • one or more reconfigurable logic devices and/or integrated circuits 120 can enable primary analysis (base calling) of polonies for flow cell images at multiple z levels.
  • processing time using reconfigurable logic devices can be less than 400 hours for at least 50 flow cell images (e.g., covering 50 tiles and from two or more color channels) with a FOV of at least 1 mm² and a resolution of 1 µm or better in three dimensions for one or more flow cycles, e.g., 1-15 cycles.
  • the flow cell images can be from multiple z-levels to cover some or all of the volumetric 3D samples (e.g., completely covering at least two samples).
  • one or more reconfigurable logic devices and/or integrated circuits 120 can be used for accelerating primary analysis of 3D samples involving training neural network(s) and using the trained neural networks for making predictions or inferences.
  • neural network(s) can be used to predict polony locations and/or predict cell boundaries thereby identifying polonies within the cell(s).
  • Using the reconfigurable logic device and/or integrated circuits 120 for computations associated with neural networks can reduce the training and/or prediction time needed in comparison with usage of GPUs or other computer processors, thereby accelerating sequence analysis, and enabling sequence analysis of flow cycles while subsequent flow cycles are to be performed or in progress in the sequence run.
  • the reconfigurable logic device(s) and/or integrated circuits 120 can accelerate training and/or prediction by 10x, 20x, 50x, 80x, 100x, 200x, 500x, 600x, 800x, 1000x, or more compared with training and/or prediction using CPUs and/or GPUs.
  • the reconfigurable logic devices and/or integrated circuits 120 can be used to achieve optimal acceleration in sequencing analysis.
  • one or more FPGA chips can be used in combination with an integrated circuit specific for computations corresponding to artificial intelligence (AI) algorithms, e.g., an NPU.
  • the integrated circuit(s) can be specific circuits for AI functions.
  • the integrated circuit(s) can include application-specific integrated circuits (ASIC).
  • Computational tasks can be distributed to the FPGA(s) and the integrated circuit(s) to optimize computational time, energy consumption, heat dissipation, etc.
  • the AI chip may be used only for computations involving a neural network (e.g., predicting polony locations, predicting high resolution flow cell images, or training the neural network) and the FPGA(s) may be used for the rest of the primary analysis steps.
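  • This division of labor amounts to a simple routing rule, sketched below. The step names and device labels are hypothetical; the three neural-network steps mirror the examples given above, and every other step is routed to the FPGA(s).

```python
# Hypothetical step names; the neural-network steps follow the examples in the text.
NEURAL_NETWORK_STEPS = {
    "predict_polony_locations",
    "predict_high_resolution_images",
    "train_neural_network",
}

def assign_device(step):
    """Route neural-network steps to the AI chip and everything else to the FPGA(s)."""
    return "ai_chip" if step in NEURAL_NETWORK_STEPS else "fpga"

# An assumed ordering of primary analysis steps, for illustration only.
pipeline = [
    "image_registration",
    "predict_high_resolution_images",
    "intensity_extraction",
    "base_calling",
]
plan = {step: assign_device(step) for step in pipeline}
```

  • A static table like this is the simplest possible scheduler; a real system could also weigh computation time, energy consumption, or heat dissipation when distributing tasks.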
  • the primary analysis time using dual FPGA chips or a single FPGA chip in connection with the AI chip(s) can be less than 400, 300, 200, 100, 50, or 20 hours for at least 50 flow cell images (e.g., covering about 50 tiles of the flow cell and from two or more color channels) with a FOV of at least 1 mm² and a resolution of 1 µm or better for each flow cell image in three dimensions for one or more flow cycles, e.g., 1-15 cycles.
  • the flow cell images can be from multiple z-levels to cover some or all of the volumetric 3D samples (e.g., 10 to 20 z-locations to completely cover at least two samples).
  • the primary analysis time may include a total time of image processing from obtaining raw flow cell images acquired using the imager 116 to generating base calls and saving base call results.
  • the 3D samples herein include polonies or clusters that are centered at different z levels that are spaced apart from each other by at least 0.01 µm, 0.05 µm, 0.1 µm, 0.2 µm, 0.5 µm, 1 µm, or more along the z direction or axial direction.
  • the cloud 130 may be a network, remote storage, or some other remote computing system separate from the sequencing system 110.
  • the connection to cloud 130 may allow access to data stored externally to the sequencing system 110 or allow for updating of software in the sequencing system 110.
  • FIG. 5C shows an exemplary embodiment of the reconfigurable logic device and the integrated circuit(s) of the sequencing system disclosed herein.
  • the sequencing system 110 may include one or more reconfigurable logic devices 120_a.
  • the sequencing system comprises a single reconfigurable logic device, i.e., a first reconfigurable logic device 120_a.
  • the sequencing system comprises multiple reconfigurable logic devices (not shown).
  • the reconfigurable logic device may comprise data processing engines 5011 configured to perform data processing in parallel. Each data processing engine may include a combination of digital logic circuit to perform its function, e.g., intensity extraction, convolution, registration, etc.
  • the sequencing system 110 may further include reconfigurable routing channels 5013 that may function as connections among the data processing engines 5011 and may also connect the data processing engines to other structural elements, e.g., the first processor and the memory device, of the sequencing system 110.
  • a neural network may be deployed at least partly on the reconfigurable logic device 120_a so that the reconfigurable logic device can be used for at least some computational tasks for generating inferences using the neural network.
  • the neural network may be pretrained using various training methods and data, for example, using the training methods and training data disclosed herein.
  • the sequencing system may further include a first processor 120_c to selectively activate or deactivate different combinations of the data processing engines 5011 and the reconfigurable routing channels 5013.
  • the FPGA(s) 120 as shown in FIG. 1 of the sequencing system 110 may include one or more of the reconfigurable logic device 120_a, the integrated circuit 120_b, and the processor 120_c.
  • the FPGA(s) 120 may only include the reconfigurable logic device 120_a and the processor 120_c, but not the integrated circuit(s) 120_b.
  • the different combinations of the data processing engines 5011 and the reconfigurable routing channels 5013 may be configured to perform operation(s) in sequencing analysis to facilitate generating the sequencing analysis result(s).
  • the sequencing analysis may include operations or steps of primary analysis.
  • Such operation(s) may include one or more of (a) obtaining sensor data from one or more sensors (in the imager 116) of the sequencing system; (b) processing the sensor data to generate a first plurality of flow cell images; (c) predicting a second plurality of flow cell images using the neural network based on the sensor data or the first plurality of flow cell images; (d) determining polonies from the second plurality of flow cell images; and (e) performing a corresponding base calling for each of the determined polonies based on the second plurality of flow cell images.
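  • Operations (a) through (e) can be read as a linear pipeline, sketched below on toy data. Every function body is a placeholder assumption: nearest-neighbor upsampling stands in for the neural network, a fixed threshold stands in for polony determination, and a brightest-channel rule stands in for base calling. The function names and the channel-to-base order are hypothetical.

```python
import numpy as np

def process_sensor_data(sensor_data):
    """(b) Placeholder processing: convert raw counts to float images."""
    return sensor_data.astype(float)

def predict_high_resolution(images, factor=2):
    """(c) Stand-in for the neural network: nearest-neighbor upsampling."""
    return images.repeat(factor, axis=1).repeat(factor, axis=2)

def find_polonies(images, threshold):
    """(d) Placeholder: pixels whose summed channel intensity exceeds threshold."""
    return np.argwhere(images.sum(axis=0) > threshold)

def call_bases(images, polonies, bases="ACGT"):
    """(e) Placeholder: brightest channel at each polony location."""
    return [bases[int(np.argmax(images[:, r, c]))] for r, c in polonies]

# (a) Toy sensor data: 4 channels of 2x2 pixels, one bright pixel in channel 3.
sensor_data = np.zeros((4, 2, 2), dtype=np.uint16)
sensor_data[3, 0, 1] = 100

first_images = process_sensor_data(sensor_data)
second_images = predict_high_resolution(first_images)
polonies = find_polonies(second_images, threshold=50)
base_calls = call_bases(second_images, polonies)
```

  • The point of the sketch is the data flow: polony determination and base calling both operate on the second (predicted) plurality of images rather than on the first.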
  • the sensor data includes raw data that has been acquired from the sensor(s) of the imager without any additional image processing.
  • the sensor data includes raw flow cell images that have not been processed by the computing system 126, the dedicated processors 118, and/or the reconfigurable logic device and integrated circuit(s) 120 of the sequencing system 110.
  • the sequencing system comprises: a first reconfigurable logic device 120_a comprising a first plurality of data processing engines 5011 configured to perform data processing in parallel; first reconfigurable routing channels 5013 connecting at least some of the first plurality of data processing engines 5011; a neural network deployed at least partly on the first reconfigurable logic device 120_a; and a first processor 120_c that selectively activates or deactivates different combinations of the first plurality of data processing engines 5011 and the first reconfigurable routing channels 5013 to perform operation(s) in sequencing analysis to facilitate generating the sequencing analysis result(s).
  • the sequencing analysis may include operations or steps of primary analysis.
  • Such operation(s) may include one or more of (a) obtaining sensor data directly from one or more sensors of the sequencing system; (b) processing the sensor data to generate a first plurality of flow cell images; (c) performing a first convolution in one or more dimensions on the first plurality of flow cell images, thereby generating a first convolution result; (d) repetitively performing, for one or more times, downsampling operations comprising: (1) performing a second convolution in one or more dimensions on the first convolution result, thereby generating a second convolution result; and (2) performing a down sampling of the second convolution result by a down sampling factor thereby generating a first down-sampled result, wherein in each repetition, the second convolution comprises a corresponding number of filters, thereby generating a third convolution result after (d); (e) performing the second convolution in one or more dimensions on the third convolution result, thereby generating a fourth convolution result; (f) repetitively performing up sampling operations comprising: (3) performing an up sampling of the fourth convolution result
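  • The convolution and sampling operations in (c) through the start of (f) can be sketched for a single-channel 2D image as follows. This covers only the steps that are stated above. The fixed 3x3 averaging kernel (standing in for learned filters), the use of one filter per convolution, the two repetitions of (d), the factor of 2, strided down sampling, and nearest-neighbor up sampling are all assumptions made for illustration.

```python
import numpy as np

def conv2d_same(image, kernel):
    """2D cross-correlation with zero padding; output shape equals input shape."""
    kh, kw = kernel.shape
    padded = np.pad(image, ((kh // 2, kh // 2), (kw // 2, kw // 2)))
    out = np.zeros(image.shape, dtype=float)
    for r in range(image.shape[0]):
        for c in range(image.shape[1]):
            out[r, c] = np.sum(padded[r:r + kh, c:c + kw] * kernel)
    return out

def downsample(image, factor):
    """Keep every factor-th pixel along each axis."""
    return image[::factor, ::factor]

def upsample(image, factor):
    """Nearest-neighbor up sampling by an integer factor."""
    return image.repeat(factor, axis=0).repeat(factor, axis=1)

kernel = np.full((3, 3), 1.0 / 9.0)  # fixed averaging filter (assumption)
image = np.arange(64, dtype=float).reshape(8, 8)

first = conv2d_same(image, kernel)             # (c) first convolution
x = first
for _ in range(2):                             # (d) repeated twice here
    x = downsample(conv2d_same(x, kernel), 2)  # (1) convolve, (2) down sample
third = x                                      # result after (d)
fourth = conv2d_same(third, kernel)            # (e) convolution at the coarsest scale
up = upsample(fourth, 2)                       # (f)(3) first up sampling step
```

  • Each repetition of (d) halves the spatial size, so the 8x8 input becomes 2x2 before (e), and the first up sampling step in (f) brings it back to 4x4.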
  • obtaining sensor data from one or more sensors (in the imager 116) of the sequencing system may be via a direct connection.
  • the direct connection between the first reconfigurable logic device (120 and 120_a) and the sensor(s) lacks other hardware components that may process or store the sensor data thus causing undesired complexity, delay, and possible errors in sensor data communication.
  • Such hardware components include the first processor 120_c, the memory device 5030, or any processors, e.g., computing system 126, e.g., CPU, of the sequencing system.
  • the direct sensor data communication herein advantageously improves data transmission efficiency from the sensor to the FPGAs 120, frees up other hardware, e.g., CPUs and storage devices, for other data processing functions, decreases the power consumption associated with indirect data communication, and reduces the time spent in data communication and thus in sequencing analysis.
  • the connection between the first reconfigurable logic device (120 and 120_a) and sensor may include other hardware components that may process or store the sensor data.
  • Such hardware components may include the first processor 120_c, the memory device 5030, or any processors, e.g., CPUs 126 of the sequencing system.
  • the sensor data may be saved into the memory device 5030, and then it can be accessed by the first reconfigurable logic device using memory controller(s) 5013.
  • the reconfigurable logic device may include digital logic circuits therein, in the sense that it is also an integrated circuit.
  • the integrated circuit may differ from the reconfigurable logic device in various ways, e.g., the integrated circuit may not be as flexible in reconfiguration as the reconfigurable logic device.
  • the integrated circuit herein, e.g., the AI chip, NPU, etc., may not be reconfigurable.
  • the sequencing system 110 comprises at least one reconfigurable logic device but lacks any integrated circuits, e.g., AI chips, ASIC chips, or NPUs.
  • the reconfigurable logic device may perform one or more operations in sequencing analysis and may forward its output back to the CPU as end results of primary analysis, e.g. base calls. Alternatively, the reconfigurable logic device may forward its output back to the CPU so that subsequent operations may be performed based on its output by the CPU to generate the end results of sequencing analysis.
  • the sequencing system 110 comprises at least one reconfigurable logic device, and at least one integrated circuit as shown in FIG. 5C.
  • the integrated circuit may perform one or more operations in sequencing analysis and may forward its output back to the reconfigurable logic device so that subsequent operations may be performed based on its output at the reconfigurable logic device.
  • the output of the reconfigurable logic device or the integrated circuit comprises base calls of nucleotide bases in a sample immobilized on a support. In some embodiments, the output data of the reconfigurable logic device or the integrated circuit comprises identification of base calling locations in two dimensions. In some embodiments, the output data of the reconfigurable logic device or the integrated circuit comprises identification of base calling locations in three dimensions.
  • the data communication between any two of the reconfigurable logic device, the integrated circuits, the first processor, and the second processor may be direct such that the direct communication lacks any other hardware components that may process or store the data.
  • Such other hardware components may include memory device(s), and/or other processor(s) of the sequencing system.
  • Such direct communication may include DMA connections.
  • the data communication between any two of the reconfigurable logic device, the integrated circuits, the first processor, and the second processor may be direct such that the data may not be utilized by other logic circuits or stored before reaching its communication destination, but the data may be stored in a memory device before reaching its communication destination.
  • the sequencing system 110 may include a first reconfigurable logic device 120_a, e.g., FPGA, comprising a first plurality of data processing engines 5011 configured to perform data processing in parallel; an integrated circuit 120_b, e.g., an AI chip; a neural network deployed at least partly on the integrated circuit; and a first processor to selectively activate or deactivate different combinations of the first plurality of data processing engines alone or in combination with the first routing channels to perform operation(s) in sequencing analysis to facilitate generating the sequencing analysis result(s).
  • the sequencing analysis may include operations or steps of primary analysis.
  • the sequencing analysis may include operations or steps of secondary analysis.
  • Such operation(s) may include one or more of: obtaining sensor data from one or more image sensors of the sequencing system; processing the sensor data to generate a first plurality of flow cell images; and communicating the sensor data, the first plurality of flow cell images, or both to the integrated circuit.
  • the sequencing system may include a second processor or the first processor to control the integrated circuit to perform one or more operations in sequencing analysis to facilitate generating the sequencing analysis result(s).
  • the sequencing analysis may include operations or steps of primary analysis and/or secondary analysis.
  • Such operation(s) may include one or more of: receiving the sensor data, the first plurality of flow cell images, or both from the first reconfigurable logic device; predicting a second plurality of flow cell images using the neural network based on the sensor data, the first plurality of flow cell images, or both; determining polonies from the second plurality of flow cell images; performing a corresponding base calling for each of the determined polonies based on the second plurality of flow cell images; and forwarding the second plurality of flow cell images, the determined polonies, and corresponding base callings of polonies in the second plurality of flow cell images to one or more of: the first reconfigurable logic device 120_a, the first 120_c or second processor, and/or one or more processors of the sequencing system 126.
  • the operation of forwarding the second plurality of flow cell images, the determined polonies, and corresponding base callings of polonies in the second plurality of flow cell images comprises forwarding to a memory device herein, e.g., DDR memory, so that one or more of: the first reconfigurable logic device 120_a, the first 120_c or second processor, and/or one or more processors of the computing system 126 can access the data from the memory. Accessing data from the memory, including reading, writing, editing, etc., may be assisted by the memory controllers disclosed herein.
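By way of a non-limiting illustration, the polony determination and base calling operations listed above can be sketched as follows. The function names (`find_polonies`, `call_bases`, `summed`), the local-maximum detection rule, the intensity threshold, the channel-to-base mapping, and the brightest-channel calling rule are all assumptions made for this sketch and are not stated in the disclosure; the neural network prediction step is omitted.

```python
# Hypothetical sketch: polony detection and base calling on small 2D images.
# One image per color channel; the channel-to-base mapping is an assumption.
CHANNEL_BASES = ("A", "C", "G", "T")


def summed(channel_images):
    """Pixel-wise sum across channels, used here only to locate polonies."""
    return [[sum(px) for px in zip(*rows)] for rows in zip(*channel_images)]


def find_polonies(image, threshold):
    """Return (row, col) of pixels above threshold that are strict local
    maxima among their 4-connected neighbors (assumed detection rule)."""
    rows, cols = len(image), len(image[0])
    centers = []
    for r in range(rows):
        for c in range(cols):
            value = image[r][c]
            if value <= threshold:
                continue
            neighbors = [
                image[rr][cc]
                for rr, cc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1))
                if 0 <= rr < rows and 0 <= cc < cols
            ]
            if all(value > n for n in neighbors):
                centers.append((r, c))
    return centers


def call_bases(channel_images, centers):
    """Assign to each polony the base whose channel is brightest there."""
    calls = {}
    for r, c in centers:
        intensities = [img[r][c] for img in channel_images]
        calls[(r, c)] = CHANNEL_BASES[intensities.index(max(intensities))]
    return calls
```

In this sketch the polony locations are found once on the channel sum and then reused for every channel, so each polony receives exactly one call per cycle.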
  • the sequencing system 110 comprises at least one reconfigurable logic device, and at least one integrated circuit as shown in FIG. 5C.
  • the integrated circuit may perform one or more operations in sequencing analysis and may generate its output as the end results of primary analysis and forward its output to one or more devices including: the reconfigurable logic device, the first or second processor, the hardware processor of the sequencing system, etc., so that the end results can be saved or presented to a user.
  • the output of the reconfigurable logic device or the integrated circuit comprises base calls of nucleotide bases in a sample immobilized on a support.
  • the output data of the reconfigurable logic device or the integrated circuit comprises identification of base calling locations in two dimensions.
  • the output data of the reconfigurable logic device or the integrated circuit comprises identification of base calling locations in three dimensions.
  • the integrated circuit may perform one or more operations in sequencing analysis and generate its output as intermediate results of primary analysis, e.g., location of polonies, and may forward its output back to one or more of: the reconfigurable logic device, the first or second processor, the hardware processor of the sequencing system, etc., so that the end results can be determined based on its output.
  • the integrated circuit may forward its output, either intermediate or end results, to be stored in a memory device, so that one or more devices including: the reconfigurable logic device, the first or second processor, and the hardware processor of the sequencing system can access the stored output whenever needed.
  • the access to the output stored in a memory device can be via a memory controller of the sequencing system, e.g., 5013.
  • the sequencing system comprises: a first reconfigurable logic device comprising a first plurality of data processing engines configured to perform data processing in parallel with each other; an integrated circuit; a neural network deployed at least partly on the integrated circuit; a first processor to selectively activate or deactivate different combinations of the first plurality of data processing engines.
  • the different combinations of the first plurality of data processing engines may be configured to perform operations comprising: obtaining sensor data from one or more image sensors of the sequencing system to generate the first plurality of flow cell images; and communicating the sensor data, the first plurality of flow cell images, or both to the integrated circuit.
  • the integrated circuit may perform operations comprising: (1) receiving the sensor data, the first plurality of flow cell images, or both from the first reconfigurable logic device; and (2) predicting a second plurality of flow cell images using the neural network based on the sensor data, the first plurality of flow cell images, or both; and (3) communicating the second plurality of flow cell images to the first reconfigurable logic device or one or more hardware processors of the sequencing system.
  • the sequencing system comprises: a first reconfigurable logic device comprising a first plurality of data processing engines arranged in a first pipeline and configured to perform data processing in parallel with each other; an integrated circuit; a neural network deployed at least partly on the integrated circuit; a first processor of the first reconfigurable logic device to selectively activate or deactivate different combinations of the first plurality of data processing engines to perform operations comprising: (a) obtaining sensor data from one or more sensors of the sequencing system; (b) processing the sensor data to generate a first plurality of flow cell images; and (c) communicating the sensor data, the first plurality of flow cell images, or both to the integrated circuit; wherein the integrated circuit performs operations comprising: (d) receiving the sensor data, the first plurality of flow cell images, or both from the first reconfigurable logic device; (e) performing a first convolution in one or more dimensions on the first plurality of flow cell images, thereby generating a first convolution result; (f) repetitively performing, for one or more times, down
  • the first reconfigurable routing channels comprise one or more electronic nodes, and the electronic nodes are programmable.
  • the electronic nodes here may include junction points in the circuit(s).
  • the electronic nodes may include points where two or more circuit elements are connected together.
  • the first reconfigurable routing channels comprise one or more interconnects.
  • the interconnect may include the physical wiring(s) that connects transistors and other components on an integrated circuit.
  • the reconfigurable routing channels comprise one or more memory controllers, e.g., 5013 in FIG. 5C.
  • the first reconfigurable routing channels comprise one or more networks-on-chip (NoCs), e.g., 5013 in FIG. 5C.
  • the first reconfigurable routing channels may comprise one or more of: a network-on-chip (NoC), and a memory controller.
  • the first reconfigurable routing channels may be configured to passively communicate data between components of the sequencing system.
  • the reconfigurable routing channels may be configured to communicate data bilaterally between the data processing engines, e.g., 5011 in FIG. 5C and the memory device, e.g., 5030 in FIG. 5C.
  • the first reconfigurable routing channels may be configured to allow data communication between the first reconfigurable logic device, e.g., 120_a, and one or more memory devices, e.g., 5030.
  • the first reconfigurable routing channels may be configured to allow data communication between the first reconfigurable logic device, e.g., 120_a, and the integrated circuit, e.g., 120_b.
  • the reconfigurable logic device herein may each comprise one or more data processing engines, e.g., 5011.
  • Each data processing engine may comprise multiple digital logic circuits.
  • the first reconfigurable logic device may be configured to communicate data with one or more memory devices external thereto.
  • the first reconfigurable logic device may be configured to communicate data with one or more memory devices external thereto via the first reconfigurable routing channels.
  • the first reconfigurable logic device may comprise digital circuits that are integrated to form an FPGA device.
  • the FPGA device in FIG. 5C includes the first reconfigurable logic device, the DMA connections, the first reconfigurable routing channels (e.g., NoC and memory controllers).
  • the sequencing system may further comprise one or more memory devices electrically connected for data communication with one or more components of the sequencing system, the one or more components may include one or more of: the first reconfigurable logic device; the integrated circuit; the first reconfigurable routing channels; the one or more memory controllers; the first processor; a second processor; and one or more processors of the sequencing system.
  • the sequencing system further comprises one or more direct data access (DMA) connections, e.g., 5012 in FIG. 5C, that are in data communication with the plurality of data processing engines and the first reconfigurable routing channels, e.g., 5013 in FIG. 5C.
  • the DMA connections may be configured to actively communicate data between components of the sequencing system.
  • the DMA connections may be configured to fetch data or send data to components that are connected thereto, e.g., the data processing engines, e.g., 5011 in FIG. 5C and the reconfigurable routing channels, e.g., 5013 in FIG. 5C.
  • the DMA connections herein may be configured to actively request data from or actively send data directly to: the first reconfigurable logic device; the first reconfigurable routing channels; the integrated circuit; or a combination thereof.
  • One or more direct data access (DMA) connections may be in data communication with the first reconfigurable routing channels and the integrated circuit herein.
  • the DMA connections may be configured to allow data communication based on a predetermined protocol, e.g., a PCIe protocol.
  • the first reconfigurable routing channels are configured to allow data communication between the first reconfigurable logic device and one or more memory devices.
  • the one or more DMA connections and the first reconfigurable routing channels are configured to allow data communication between the first reconfigurable logic device and the integrated circuit.
  • the sequencing system further comprises an integrated circuit that is different from the first reconfigurable logic device, e.g., 120_b in FIG. 5C.
  • the integrated circuit herein may not be reconfigurable.
  • the integrated circuit may comprise an application specific integrated circuit (ASIC) chip.
  • the integrated circuit comprises a neural processing unit (NPU) or an artificial intelligence (AI) chip.
  • the integrated circuit may comprise a second plurality of data processing engines, each data processing engine comprising multiple digital logic circuits.
  • the integrated circuit may further comprise: second plurality of data processing engines and second routing channels, each connecting at least some of the second plurality of data processing engines.
  • the sequencing system further comprises a first processor.
  • the first processor may be configured to selectively activate or deactivate different combinations of the first plurality of data processing engines and the first reconfigurable routing channels to perform the operations disclosed herein.
  • the sequencing system further comprises a second processor.
  • the second processor may be configured to control digital circuits of the integrated circuit herein.
  • the first processor, or a second processor, e.g., of the integrated circuit is configured to selectively activate or deactivate different combinations of the second plurality of data processing engines and the second reconfigurable routing channels to perform the operations.
  • the first processor or a second processor may be configured to selectively activate or deactivate different combinations of the second plurality of data processing engines and the second reconfigurable routing channels to perform the operations herein.
  • the sequencing system may further comprise a housing that encloses the first reconfigurable logic device, the first reconfigurable routing channels, the one or more DMA connections, the integrated circuit, and the first processor therein.
  • the sequencing system further comprises: a housing that encloses at least the first reconfigurable logic device therein and the integrated circuit is external to the housing.
  • the sequencing system further comprises: a power source that is configured to supply different power levels to the first reconfigurable logic device and the integrated circuit.
  • a first power level supplied by the power source to the first reconfigurable logic device may be higher than a second power level supplied to the integrated circuit while a sequencing run and/or sequencing analysis is in progress.
  • a maximum power output of the power source of the sequencing system is 2x, 3x, 5x, 8x, 10x, or 20x lower than the maximum power output of the power source of sequencers, e.g., traditional sequencers without the first reconfigurable logic device (e.g., FPGA), the integrated circuit (e.g., AI chip), or both.
  • the time consumption in performing a sequencing run and corresponding sequencing analysis (e.g., primary analysis) thereof using the sequencing system is 2x, 3x, 5x, 8x, 10x, or 20x lower than the time consumption in performing the same sequencing run using a sequencer without the first reconfigurable logic device, the integrated circuit, or both (e.g., a traditional sequencer without FPGA and/or AI chips).
  • Time consumption in performing a sequencing run and sequencing analysis of the sequencing run (e.g., primary analysis) using the sequencing system is 2x, 3x, 5x, 8x, 10x, or 20x lower than the time consumption in performing the same sequencing run and analysis using a sequencer without the first reconfigurable logic device, the integrated circuit, or both (e.g., a traditional sequencer without FPGA and/or AI chips).
  • a maximum power output of the power source to the sequencing system in performing a sequencing run and corresponding sequencing analysis thereof is less than 900 Watts, 800 Watts, 700 Watts, 650 Watts, 600 Watts, 550 Watts, or 500 Watts.
  • the power source may be configured to supply a first power level to the first reconfigurable logic device, the first power level is less than 500 Watts, 400 Watts, 350 Watts, or 300 Watts.
  • the power source may be configured to supply a second power level to the integrated circuit, the second power level is less than 450 Watts, 400 Watts, 350 Watts, or 300 Watts.
  • one or more components of the first reconfigurable logic device and/or integrated circuit may include a computational performance of at least 2, 4, 8, 10, 16, 20, 30, 40, 50, 60, 70, 80, or 100 Giga-operations per second (GOPs) or more.
  • one or more processing engines of the first reconfigurable logic device and/or integrated circuit may include a computational performance of at least 12, 4, 8, 10, 16, 20, 30, 40, 50, 60, 70, 80, or 100 Giga-operations per second (GOPs), or more.
  • the first reconfigurable logic device and/or the integrated circuit includes a computational performance of at least 10, 20, 40, 50, 60, 80, or 100 Tera-operations per second (TOPs).
  • one or more components are located on a first printed circuit board (PCB).
  • the one or more components may include: the first reconfigurable logic device; the first reconfigurable routing channels; the first processor; and the one or more DMA connections.
  • the integrated circuit is located on a second printed circuit board (PCB) different from the first printed circuit board, e.g., as shown in FIG. 5C.
  • the integrated circuit and the second PCB may be positioned within a same housing of the sequencing system as the first PCB or external to the housing of the sequencing system. Being on a separate PCB makes connecting the first reconfigurable logic device, e.g., an FPGA device, with various integrated circuits on a chip convenient, efficient, and easily customizable.
  • the first PCB board may be a main board.
  • the second PCB board may be a daughter board or edge unit.
  • the sequencing system lacks any graphics processing units (GPUs) or tensor processing units (TPUs). Instead, the sequencing system utilizes FPGAs, AI chips, NPUs, or other ASIC chips for performing the operations disclosed herein.
  • the sequencing system disclosed herein advantageously requires less power, generates less heat, and reduces the hardware complexity and costs for performing NGS sequencing runs and corresponding sequencing analysis than sequencers that use GPUs or TPUs.
  • the sequencing systems include logic devices that are not limited to reconfigurable logic devices (e.g., FPGAs) and/or other integrated circuits (e.g., Al chips, NPUs).
  • the sequencing systems include various types of processing units or processors configured for reconfigurable parallel processing.
  • the sequencing systems include various types of logic devices or integrated circuits, e.g., ASIC chips.
  • the sequencing systems include GPUs, TPUs, or other various types of processing units that are configured to perform one or more operations disclosed herein.
  • the sequencing systems include GPUs, TPUs, or other various types of processing units that are configured to perform one or more operations that can be performed by the reconfigurable logic devices (e.g., FPGAs) and/or other integrated circuits (e.g., AI chips, NPUs).
  • the first processor may be positioned on the first PCB board together with the reconfigurable logic device for convenient and efficient control of the reconfigurable logic device.
  • the first processor is a separate processor from one or more processors of the sequencing system configured to control the optical system, the fluidics of the sequencing system, etc.
  • the first processor can be configured to only control the components on the first PCB board, e.g., the FPGA device, alone or in combination with components on the second PCB board, e.g., the AI chip.
  • the sequencing system may comprise a second processor that is configured to separately control the AI chip.
  • the first processor or second processor of the sequencing system, e.g., 120_c, may comprise a CPU.
  • the one or more hardware processors of the sequencing system comprises a CPU.
  • the first or second processor comprises only CPU(s).
  • the sequencing system may further comprise a heat dissipator configured to maintain a system temperature in a range from 0 degrees to 120 degrees Celsius or less than 120 degrees Celsius.
  • the operation for processing the sensor data to generate the first plurality of flow cell images comprises one or more of: registering the first plurality of flow cell images to a reference coordinate system; adjusting image intensities of the first plurality of flow cell images; color correction of the first plurality of flow cell images; correcting phasing and prephasing of the first plurality of flow cell images; and subtracting background intensities from the first plurality of flow cell images.
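By way of a non-limiting illustration, two of the processing operations listed above (subtracting background intensities and adjusting image intensities) can be sketched as follows. The median-based background estimate, the peak normalization, and all function names are assumptions chosen for this sketch; the disclosure does not specify how these operations are implemented, and registration, color correction, and phasing correction are not shown.

```python
def estimate_background(image):
    """Assumed estimator: the median pixel value of the image."""
    values = sorted(px for row in image for px in row)
    mid = len(values) // 2
    if len(values) % 2:
        return float(values[mid])
    return (values[mid - 1] + values[mid]) / 2.0


def subtract_background(image, background):
    """Subtract a per-image background level, clipping at zero."""
    return [[max(px - background, 0.0) for px in row] for row in image]


def normalize_intensity(image):
    """Assumed intensity adjustment: scale so the brightest pixel is 1.0."""
    peak = max(px for row in image for px in row)
    if peak == 0:
        return [[0.0 for _ in row] for row in image]
    return [[px / peak for px in row] for row in image]


def preprocess(image):
    """Background subtraction followed by intensity normalization."""
    background = estimate_background(image)
    return normalize_intensity(subtract_background(image, background))
```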
  • each of the one or more operations performed by the first reconfigurable logic device or the integrated circuit is performed in real time. In some embodiments, each of the one or more operations performed by the first reconfigurable logic device or the integrated circuit is performed within the time window of performing sequencing reactions and/or imaging of a single sequencing cycle of the sequencing run. In some embodiments, each of the one or more operations performed by the first reconfigurable logic device or the integrated circuit is performed within the time window of performing sequencing reactions and/or imaging of a single z-level of a single sequencing cycle.
  • FIG. 5D shows an exemplary embodiment of performing sequencing analysis in parallel with performing a sequencing run.
  • the sequencing run includes multiple sequencing cycles, only part of a single cycle is shown herein.
  • flow cell images are acquired at multiple z-levels from different color channels of an in situ sample.
  • the sequencing reactions are repeatedly performed for each z-level in each cycle within a time window 5601.
  • the operations of the integrated circuit are performed within a processing window 5602 within the time window 5609 of a single sequencing cycle and also within a time window 5601 for sequencing reactions and imaging at a single z-level 5601.
  • the operations of the first reconfigurable logic device are also performed within a processing window 5603 that is within the time window 5609 of each sequencing cycle.
  • the processing windows 5602 and 5603 may be of identical or different duration depending on various factors such as sequencing data, primary analysis algorithms, etc.
  • the operations are not just performed within the processing windows but completed within the processing windows with respect to the data of the current cycle, e.g., of a preceding z-level of the current cycle that sensor data has been acquired.
  • the operations are completed within the processing windows with respect to the data of a preceding cycle, e.g., the cycle immediately preceding the current cycle.
  • the operations are performed for a single z level in each cycle within a predetermined time window, e.g., 5602, 5603.
  • the predetermined time window is for a single z level in a single sequencing cycle.
  • the predetermined time window is less than 1000 ms, 900 ms, 800 ms, 700 ms, 600 ms, 500 ms, 400 ms, 300 ms, 250 ms, 200 ms, or 100 ms.
  • each of the one or more operations are performed within the predetermined time window and in parallel while the sequencing run is in progress.
  • each of the one or more operations are performed in parallel within a time window that sequencing, imaging, or both of a subsequent sequencing cycle is completed.
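By way of a non-limiting illustration, checking that an operation completes within a predetermined time window can be sketched as follows. The helper name `run_within_window` and the use of a wall-clock timer are assumptions for this sketch; the disclosure does not describe how the window is enforced in hardware.

```python
import time


def run_within_window(operation, window_ms):
    """Run operation() and report whether it finished inside window_ms.

    Returns (result, elapsed_ms, completed_in_window).
    """
    start = time.perf_counter()
    result = operation()
    elapsed_ms = (time.perf_counter() - start) * 1000.0
    return result, elapsed_ms, elapsed_ms <= window_ms
```

A caller could, for example, pass the processing of one z-level as `operation` and one of the listed window values, such as 250 ms, as `window_ms`.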
  • the first plurality of flow cell images herein may be obtained from a single z level of a 2D or 3D sample. In some embodiments, the first plurality of flow cell images herein may be obtained from multiple z levels covering at least part of an in situ sample, e.g., of cells or tissue(s). The first plurality of flow cell images may be obtained from one or more color channels at each z level of the multiple z levels covering at least part of the in situ sample. In some embodiments, the first plurality of flow cell images are from a single color channel. In some embodiments, the first plurality of flow cell images are from multiple color channels. In some embodiments, the first plurality of flow cell images are from a single sequencing cycle.
  • the first plurality of flow cell images are from multiple sequencing cycles.
  • the first plurality of flow cell images may be of a first spatial resolution in x, y, and/or z directions.
  • the second plurality of flow cell images may be generated based on the first plurality of flow cell images.
  • the second plurality of flow cell images may be of a second spatial resolution in x, y, and/or z directions.
  • the first spatial resolution may be lower than the second spatial resolution, and a higher resolution herein indicates that a pixel size is smaller so that the polonies in the flow cell images are of finer spatial details.
  • the first spatial resolution may be 2x, 4x, 6x, 8x, 10x, 16x, 24x, 32x, or 48x lower than the second spatial resolution in x, y, and/or z directions.
  • the first spatial resolution may be at least 2x, 4x, 6x, 8x, 10x, 16x, 24x, 32x, or 48x lower than the second spatial resolution in x, y, and/or z directions.
  • the first and second resolutions are in 3D.
  • the first resolution is in a range of 0.1 um to 5 um.
  • the second resolution is in a range of 0.01 um to 2 um.
  • the second resolution is at least 4, 6, or 8 times greater than the first resolution in all three dimensions.
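By way of a non-limiting illustration, the relationship between a resolution factor, pixel size, and image dimensions can be sketched as follows. The function names and the example shapes are assumptions for this sketch only.

```python
def upsampled_shape(shape, factor):
    """Pixel counts after raising resolution by `factor` along every axis."""
    return tuple(n * factor for n in shape)


def pixel_size_after(pixel_size_um, factor):
    """A `factor`-times higher resolution means a `factor`-times smaller pixel."""
    return pixel_size_um / factor


def voxel_count(shape):
    """Total number of pixels (2D) or voxels (3D) for a given shape."""
    total = 1
    for n in shape:
        total *= n
    return total
```

For instance, a factor of 4 in all three dimensions turns a hypothetical 4 x 64 x 64 volume into 16 x 256 x 256, which is 4^3 = 64 times as many voxels to predict.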
  • the sequencing system further comprises one or more image sensors configured to receive optical signals generated from sequencing reactions of a sample immobilized on a support.
  • the support may comprise a glass or plastic substrate.
  • the support may be included in a flow cell device.
  • the one or more image sensors may be configured to generate sensor data based on the optical signals.
  • the sequencing system further comprises: one or more hardware processors; one or more data storage devices storing instructions executable by the one or more hardware processors to cause the one or more hardware processors to perform operations disclosed herein.
  • the one or more data storage devices may include one or more memory devices.
  • the one or more memory devices may be accessible by the one or more processors, the first processor, the second processor, the first reconfigurable logic device, and the integrated circuit.
  • the one or more processors are separate from the first or second processors.
  • the operations performed by the one or more processors may include one or more of 1) recording sensor data generated in the sequencing system in one or more flow cycles; 2) optionally processing the recorded sensor data; 3) sending the recorded sensor data or the optionally processed data to the first reconfigurable logic device or the integrated circuit; 4) receiving outcome from the first reconfigurable logic device or integrated circuit; and 5) generating sequencing analysis results based on the received outcome.
  • the operations performed by the one or more processors may include one or more of 1) receiving outcome from the first reconfigurable logic device or integrated circuit; and 2) generating sequencing analysis results based on the received outcome.
  • the sequencing analysis results comprise primary analysis results.
  • the sequencing analysis results comprise a data file in a predetermined data format.
  • the sequencing analysis results comprise base calls of nucleotide bases in a sample immobilized on a support.
  • the sequencing analysis results comprise quality measurements of base calls of nucleotide bases in a sample immobilized on a support.
  • the sequencing analysis results comprise quality scores corresponding to base calls of nucleotide bases in a sample immobilized on a support.
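By way of a non-limiting illustration, base calls and corresponding quality scores can be written to a data file as sketched below. The disclosure does not name the quality scale or the file format; the Phred scale (Q = -10 log10 p) and the FASTQ record layout with an ASCII offset of 33 are assumptions chosen here because they are widely used, and the function names are hypothetical.

```python
import math


def phred_quality(error_probability):
    """Phred-scaled quality, Q = -10 * log10(p), rounded to an integer."""
    return round(-10.0 * math.log10(error_probability))


def fastq_record(read_id, bases, error_probabilities):
    """Format one read as a four-line FASTQ record (assumed output format)."""
    qualities = "".join(
        chr(phred_quality(p) + 33) for p in error_probabilities
    )
    return f"@{read_id}\n{bases}\n+\n{qualities}\n"
```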
  • the sequencing system further comprises: a sample immobilized on a support; and an optical system comprising: an illumination system; an objective lens and the one or more image sensors.
  • the optical system is configured to emit light to the sample and to collect optical signals emitted from the sample, thereby generating the first plurality of flow cell images.
  • the support may be comprised in a flow cell device.
  • the operation(s) performed by the first reconfigurable routing channels or the integrated circuit using the neural network comprises one or more of: generating quality measurements of the base callings; and generating a data output file based on the base callings.
  • the neural network herein comprises a convolutional neural network (CNN).
  • the neural network comprises a U-Net.
  • the neural network has been pretrained.
  • the neural network has been trained using the first reconfigurable logic device or the integrated circuit.
  • the neural network is a 3D neural network.
  • the first convolution comprises a 3D convolution with a convolution kernel.
  • the convolutional kernel has at least four dimensions.
  • the convolutional kernel is m x m x m x n, wherein m is an integer in a range from 3 to 30, wherein n is an integer.
  • n is an integer from 1 to 16384.
  • the second convolution in operation (1) comprises a corresponding number of n, 2*n, 4*n, and 8*n filters in a first, second, third, and fourth repetition, respectively.
  • the second convolution in (4) comprises a corresponding number of 2*n, 2*n, 4*n, 8*n filters in a last repetition, last minus one, last minus two, and last minus three repetition, respectively.
  • n is in a range from 4 to 1024.
  • the neural network has been trained using the first reconfigurable logic device or the integrated circuit.
  • the neural network is a 2D neural network.
  • the first convolution comprises a 2D convolution with a convolution kernel.
  • the convolutional kernel has at least three dimensions.
  • the convolutional kernel is m x m x n, wherein m is an integer in a range from 3 to 30, wherein n is an integer. In some embodiments, n is an integer from 1 to 16384.
  • the second convolution in operation (1) comprises a corresponding number of n, 2*n, 4*n, and 8*n filters in a first, second, third, and fourth repetition, respectively.
  • the second convolution in (4) comprises a corresponding number of 2*n, 2*n, 4*n, 8*n filters in a last repetition, last minus one, last minus two, and last minus three repetition, respectively.
  • n is in a range from 4 to 1024.
  • the second convolution in operation (1) comprises a corresponding number of n, 2*n, 4*n filters in a first, second, third repetition, respectively.
  • the second convolution in (4) comprises a corresponding number of 2*n, 2*n, 4*n filters in a last repetition, last minus one, and last minus two repetition, respectively.
  • n is in a range from 4 to 1024.
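By way of a non-limiting illustration, the filter counts and kernel shapes recited above can be tabulated as follows. The function names are hypothetical; the sketch only restates the recited counts (n, 2*n, 4*n, 8*n on the down-sampling side and 2*n, 2*n, 4*n, 8*n counted backwards from the last up-sampling repetition) and does not build a network.

```python
def encoder_filters(n, repetitions):
    """Filters per down-sampling repetition: n, 2n, 4n, ..."""
    return [n * 2 ** i for i in range(repetitions)]


def decoder_filters(n, repetitions):
    """Filters per up-sampling repetition, in execution order.

    Counted from the last repetition backwards the counts are
    2n, 2n, 4n, 8n, so in execution order they are ..., 8n, 4n, 2n, 2n.
    """
    from_last = [2 * n] + [n * 2 ** i for i in range(1, repetitions)]
    return list(reversed(from_last))


def kernel_shape(m, n, dims):
    """m x m x n for a 2D convolution, m x m x m x n for a 3D convolution."""
    return (m,) * dims + (n,)
```

With n = 16 and four repetitions this gives 16, 32, 64, 128 filters going down and 128, 64, 32, 32 going up.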
  • the neural network is pretrained with 2D flow cell images at multiple z-levels that encompass the 3D volume of the volumetric sample(s).
  • the neural networks pretrained with 2D flow cell images are less complex and require less computational effort in making predictions or inferences, thereby providing higher efficiency and saving time and computational effort in the prediction of polony locations.
  • the neural network pretrained with 2D flow cell images may predict polony locations per tile per cycle in a time window that is 10x, 50x, 80x, 100x, 200x, 400x, 600x, 800x, 1000x, 1500x, or 2000x shorter than the time window for making identical predictions using neural networks trained from 3D volumes of flow cell images.
  • the neural network pretrained with 2D flow cell images may predict polony locations per tile per cycle using the reconfigurable logic device and/or other integrated circuits, e.g., FPGA and/or AI chips, in a time window that is 5x, 10x, 20x, 40x, 50x, 80x, 100x, 200x, 400x, 600x, 800x, or 1000x shorter than the time window for an identical neural network using CPUs or other processors.
  • the operation (e) performing a corresponding base calling for each of the determined polonies based on the second plurality of flow cell images comprises: performing a corresponding base calling for each of the determined polonies based on the second plurality of flow cell images and based on a fourth plurality of flow cell images, wherein the fourth plurality of images are predicted using a second neural network based on a third plurality of flow cell images.
  • the third plurality of flow cell images are acquired from one or more color channels that are different from the single channel, and wherein the third plurality of flow cell images comprises the first resolution.
  • the fourth plurality of flow cell images comprises the second resolution.
  • the first plurality of flow cell images are from one or more color channels. In some embodiments, the first plurality of flow cell images are of unbalanced nucleotide diversity. In some embodiments, the first plurality of flow cell images comprises: an unbalanced diversity of nucleotide bases of A, G, C and T/U among concatemer molecules immobilized on the support in one or more flow cycles. In some embodiments, the first plurality of flow cell images comprises: a balanced diversity of nucleotide bases of A, G, C and T/U among concatemer molecules immobilized on the support in one or more cycles. In some embodiments, two or more different concatemer molecules among the concatemer molecules have different insert sequences.
  • the first plurality of flow cell images comprises optical signals emitted from nucleotide reagents bound to a balanced diversity of nucleotide bases of A, G, C and T/U among the concatemer molecules immobilized on the support. In some embodiments, the first plurality of flow cell images comprises optical signals emitted from nucleotide reagents bound to an unbalanced diversity of nucleotide bases of A, G, C and T/U among the concatemer molecules immobilized on the support in the one or more subsequent cycles.
  • the unbalanced diversity of nucleotide bases of A, G, C and T/U among the concatemer molecules comprises: a percentage of (1) a number of one or more types of nucleotide bases to (2) a total number of bases is less than 20%, 15%, 10%, or 5% in the one or more cycles.
  • the balanced diversity of nucleotide bases of A, G, C and T/U among the plurality of concatemer molecules comprises: a percentage of (1) a number of each type of nucleotide bases to (2) a total number of bases in the one or more cycles is more than 10%, 15%, or 20%.
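The balanced and unbalanced diversity criteria above can be illustrated with a minimal sketch. It assumes the per-cycle base calls are available as a string, and it picks 10% as the cutoff for both criteria (one of the listed values). The function names and the "undetermined" outcome for a fraction exactly at the cutoff are illustrative assumptions, not part of the disclosure.

```python
from collections import Counter

BASES = "ACGT"  # T also stands in for U


def base_fractions(calls):
    """Return the fraction of each base type among the calls of one cycle."""
    normalized = calls.upper().replace("U", "T")
    counts = Counter(normalized)
    total = sum(counts[base] for base in BASES)
    if total == 0:
        raise ValueError("no A, C, G, or T/U calls in this cycle")
    return {base: counts[base] / total for base in BASES}


def classify_diversity(calls, unbalanced_below=0.10, balanced_above=0.10):
    """Classify one cycle using assumed cutoffs taken from the listed values.

    Unbalanced: at least one base type is below `unbalanced_below`.
    Balanced: every base type is above `balanced_above`.
    """
    lowest = min(base_fractions(calls).values())
    if lowest < unbalanced_below:
        return "unbalanced"
    if lowest > balanced_above:
        return "balanced"
    return "undetermined"  # assumption: a fraction exactly at the cutoff


if __name__ == "__main__":
    print(classify_diversity("ACGT" * 25))
    print(classify_diversity("A" * 95 + "CCGGT"))
```

The same check could be applied cycle by cycle, since the text defines both criteria over "the one or more cycles".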
  • the cellular sample comprises overloaded concatemer molecules with a spatial density in a range of 10^2 to 10^15 per mm^2. In some embodiments, the cellular sample comprises overloaded concatemer molecules with a spatial density in a range of 10^3 to 10^10 per mm^2.
  • the down-sampling factor is 2, 4, or 8. In some embodiments, the up-sampling factor is 2, 4, or 8. In some embodiments, the downsampling factor is 2, 4, 8, 16, 32, 64, or more. In some embodiments, the up-sampling factor is 2, 4, 8, 16, 32, 64, or more.
  • one or more of operations of (a) to (k) are performed while a sequencing run is being performed.
  • the first plurality of flow cell images are acquired in sequencing cycles ranging from 1 to 500.
  • the one or more cycles comprises a current cycle N.
  • N is in a range from 1 to 500.
  • the one or more cycles comprises a single cycle ranging from 1 to 500.
  • the one or more cycles comprises multiple cycles ranging from 1 to 500.
  • one or more of the operations, e.g., operations (a) to (j), are performed while the sequencing reactions in cycles subsequent to the current cycle N are yet to be performed or are currently being performed.
  • the training data set of flow cell images comprises z-stacks of flow cell images taken at different z-locations, and each z-stack is used as a 3D volume for training the neural network.
  • the training data set of flow cell images comprises 2D flow cell images taken at different z-locations, and individual 2D flow cell images at multiple z-levels are used as 2D images for training the neural network.
  • the training data set of flow cell images comprises simulated flow cell images of in situ samples at different z-locations. In some embodiments, the training data set of flow cell images comprises actual flow cell images acquired from in situ samples at different z-locations.
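The two ways of using z-stacks for training described above (each z-stack as one 3D volume, or each z-level as a separate 2D image) can be sketched as follows. The nested-list representation and the names `training_samples` and `nested_shape` are assumptions made for illustration only.

```python
def nested_shape(array):
    """Return the shape of a rectangular nested list, e.g. (Z, H, W)."""
    shape = []
    while isinstance(array, list):
        shape.append(len(array))
        if not array:
            break
        array = array[0]
    return tuple(shape)


def training_samples(z_stacks, mode):
    """Turn z-stacks into training samples.

    z_stacks: list of z-stacks, each a list of 2D images (one per z-location).
    mode "3d": every z-stack is one sample of shape (Z, H, W).
    mode "2d": every 2D image at every z-level is its own sample of shape (H, W).
    """
    if mode == "3d":
        return [stack for stack in z_stacks]
    if mode == "2d":
        return [image for stack in z_stacks for image in stack]
    raise ValueError("mode must be '3d' or '2d'")


if __name__ == "__main__":
    stack = [[[0, 1], [2, 3]] for _ in range(3)]  # 3 z-levels of 2x2 pixels
    volumes = training_samples([stack, stack], "3d")
    images = training_samples([stack, stack], "2d")
    print(len(volumes), nested_shape(volumes[0]))
    print(len(images), nested_shape(images[0]))
```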
  • performing the first convolution in one or more dimensions on the first plurality of flow cell images comprises: performing a first convolution in 3D on the first plurality of flow cell images, thereby generating a first convolution result.
  • performing a second convolution in one or more dimensions on the first convolution result, thereby generating a second convolution result comprises: performing the second convolution in 3D on the first convolution result, thereby generating a second convolution result.
  • performing the first convolution in one or more dimensions on the first plurality of flow cell images comprises: performing a first convolution in 2D on the first plurality of flow cell images, thereby generating a first convolution result.
  • performing a second convolution in one or more dimensions on the first convolution result, thereby generating a second convolution result comprises: performing the second convolution in 2D on the first convolution result, thereby generating a second convolution result.
  • repetitively performing up-sampling operations comprises: (3) performing an up-sampling of the fourth convolution result by an up-sampling factor, thereby generating a first up-sampled result; (4) concatenating the first up-sampled result in a current up-sampling repetition with the first down-sampled result in a previous down-sampling repetition, wherein the first up-sampled result has a same size as the first down-sampled result in the previous down-sampling repetition; and (5) performing the second convolution in one or more dimensions of the first up-sampled result, thereby generating a fifth convolution result.
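The size-matching requirement for concatenation can be checked with a short shape trace. This is a sketch under assumptions: a conventional U-Net layout in which each up-sampled result is paired with the encoder result of matching size, convolutions that preserve spatial size, and a hypothetical function name `trace_unet_shapes`. It tracks spatial sizes only and performs no convolution.

```python
def trace_unet_shapes(input_shape, depth=3, factor=2):
    """Trace spatial shapes through `depth` down-sampling and up-sampling steps.

    Returns (skip_shapes, bottleneck_shape, decoder_shapes).
    Assumes size-preserving convolutions, so only the sampling steps
    change the spatial shape.
    """
    if any(size % (factor ** depth) for size in input_shape):
        raise ValueError("each dimension must be divisible by factor**depth")

    shape = tuple(input_shape)
    skip_shapes = []
    for _ in range(depth):
        skip_shapes.append(shape)                        # kept for concatenation
        shape = tuple(size // factor for size in shape)  # down-sampling

    bottleneck_shape = shape
    decoder_shapes = []
    for skip in reversed(skip_shapes):
        shape = tuple(size * factor for size in shape)   # up-sampling
        if shape != skip:
            raise AssertionError("up-sampled size must match the skip size")
        decoder_shapes.append(shape)                     # concatenation is valid here
    return skip_shapes, bottleneck_shape, decoder_shapes


if __name__ == "__main__":
    skips, bottom, decoder = trace_unet_shapes((16, 64, 64), depth=3, factor=2)
    print(skips, bottom, decoder)
```

Choosing input sizes divisible by the sampling factor raised to the number of repetitions is what guarantees that every up-sampled result has the same size as its counterpart.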
  • the different combinations of the first plurality of data processing engines are configured to perform operations further comprising: (a) receiving the second plurality of flow cell images from the integrated circuit; (b) determining polonies from the second plurality of flow cell images; (c) performing a corresponding base calling for each of the determined polonies based on the second plurality of flow cell images; and (d) forwarding the second plurality of flow cell images, the determined polonies, the corresponding base callings to the first processor or one or more hardware processors of the sequencing system or a combination thereof.
  • the one or more operations performed by the first reconfigurable logic device further comprises: forwarding the second plurality of flow cell images, the determined polonies, the corresponding base callings, or a combination thereof to the first processor or one or more hardware processors of the sequencing system.
  • the one or more operations performed by the integrated circuit further comprises forwarding the second plurality of flow cell images, the corresponding base callings, or both to the first reconfigurable logic device, the first processor or one or more hardware processors of the sequencing system.
  • the operations performed by the first reconfigurable logic device or the integrated circuit further comprising: registering the second plurality of flow cell images to a common coordinate system.
  • the operations performed by the integrated circuit further comprising one or more of: determining polonies from the second plurality of flow cell images; performing a corresponding base call for each of the determined polonies based on the second plurality of flow cell images; and forwarding the second plurality of flow cell images, the corresponding base callings, or both to the first reconfigurable device, the first processor, or one or more hardware processors of the sequencing system.
  • the operation (d) or (i) of determining polonies from the second plurality of flow cell images comprises: generating a 3D polony map comprising spatial location of polonies based on the determined polonies.
  • the operation of generating a 3D polony map comprising spatial location of polonies based on the determined polonies may further comprise: deleting duplicate polonies from the determined polonies, wherein the duplicate polonies are out-of-focus.
  • the operation of determining polonies from the second plurality of flow cell images comprises: superimposing the second plurality of flow cell images with corresponding cell staining images; and generating the polony map by only including polonies that are within cell boundaries in the corresponding cell staining images.
  • Exemplary embodiments of methods for generating the polony maps are disclosed in U.S. Patent Application No. 18/078,820 and PCT Application No. PCT/US2023/076125, which are incorporated by reference in their entireties.
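Removing out-of-focus duplicates when building a 3D polony map can be sketched as below. The rule used here is an assumption for illustration: detections that nearly coincide in x and y and lie within a few z-levels of each other are treated as one polony, and the brightest detection is taken as the in-focus one. The tuple layout, tolerances, and the name `build_polony_map` are hypothetical and do not reproduce the methods of the applications cited above.

```python
def build_polony_map(detections, xy_tolerance=1.0, z_tolerance=2):
    """Build a 3D polony map from per-z-level detections.

    detections: list of (x, y, z, intensity) tuples.
    Assumed rule: among detections within `xy_tolerance` in x and y and
    within `z_tolerance` z-levels, keep only the brightest one.
    """
    kept = []
    # Visit the brightest detections first so they claim their neighborhood.
    for candidate in sorted(detections, key=lambda d: d[3], reverse=True):
        x, y, z, _ = candidate
        is_duplicate = any(
            abs(x - kx) <= xy_tolerance
            and abs(y - ky) <= xy_tolerance
            and abs(z - kz) <= z_tolerance
            for kx, ky, kz, _ in kept
        )
        if not is_duplicate:
            kept.append(candidate)
    # Order by z, then y, then x for a stable map.
    return sorted(kept, key=lambda d: (d[2], d[1], d[0]))


if __name__ == "__main__":
    detections = [
        (10.0, 10.0, 0, 50),    # blurred copy below the focal plane
        (10.3, 10.2, 1, 200),   # in-focus polony
        (10.1, 9.9, 2, 60),     # blurred copy above the focal plane
        (40.0, 40.0, 1, 120),   # separate polony
        (10.0, 10.0, 9, 150),   # same x, y but far away in z: kept
    ]
    for polony in build_polony_map(detections):
        print(polony)
```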
  • sequencing methods comprising operations herein. Such operation may include one or more of: (a) obtaining, by a first reconfigurable logic device of a sequencing system, sensor data from one or more sensors of the sequencing system; (b) processing, by the first reconfigurable logic device, the sensor data to generate a first plurality of flow cell images; (c) predicting, by the first reconfigurable logic device, a second plurality of flow cell images using a neural network at least partly deployed on the first reconfigurable device and based on the sensor data or the first plurality of flow cell images; (d) determining, by the first reconfigurable logic device, polonies from the second plurality of flow cell images; (e) performing, by the first reconfigurable logic device, a corresponding base calling for each of the determined polonies based on the second plurality of flow cell images; and (f) optionally forwarding, by the first reconfigurable logic device, the second plurality of flow cell images, the corresponding base calling,
  • sequencing methods comprising operations herein. Such operations may include one or more of (a) obtaining, by the first reconfigurable logic device, sensor data from one or more image sensors of the sequencing system; (b) processing, by the first reconfigurable logic device, the sensor data to generate a first plurality of flow cell images; (c) communicating, by the first reconfigurable logic device to an integrated circuit, the sensor data, the first plurality of flow cell images, or both; (d) receiving, by the integrated circuit and from the first reconfigurable logic device, the sensor data, the first plurality of flow cell images, or both; (e) predicting, by the integrated circuit, a second plurality of flow cell images using the neural network based on the sensor data, the first plurality of flow cell images, or both; (f) determining, by the integrated circuit, polonies from the second plurality of flow cell images; and (g) performing, by the integrated circuit, a corresponding base calling for each of the determined polonies based on
  • sequencing methods comprising operations herein.
  • Such operations may include one or more of (a) obtaining, by the first reconfigurable logic device of a sequencing system, sensor data from one or more image sensors of the sequencing system to generate the first plurality of flow cell images; (b) communicating, by the first reconfigurable logic device, the sensor data, the first plurality of flow cell images, or both to the integrated circuit; (c) receiving, by the integrated circuit of the sequencing system, the sensor data, the first plurality of flow cell images, or both from the first reconfigurable logic device; (d) predicting, by the integrated circuit, a second plurality of flow cell images using a neural network deployed at least partly on the integrated circuit and based on the sensor data, the first plurality of flow cell images, or both; and (e) communicating, by the integrated circuit, the second plurality of flow cell images to the first reconfigurable logic device or one or more hardware processors of the sequencing system.
  • the first reconfigurable routing channels comprise one or more electronic nodes, and the electronic nodes are programmable.
  • the electronic nodes here may include junction points in the circuit(s).
  • the electronic nodes may include points where two or more circuit elements are connected together.
  • the first reconfigurable routing channels comprises one or more interconnects.
  • the interconnect may include the physical wiring(s) that connects transistors and other components on an integrated circuit.
  • the reconfigurable routing channels comprise one or more memory controllers, e.g., 5013 in FIG. 5C.
  • the first reconfigurable routing channels comprise one or more networks-on-chip (NoCs), e.g., 5013 in FIG. 5C.
  • the first reconfigurable routing channels may comprise one or more of a network-on-chip (NoC), and a memory controller.
  • the first reconfigurable routing channels may be configured to passively communicate data between components of the sequencing system.
  • the reconfigurable routing channels may be configured to communicate data bilaterally between the data processing engines, e.g., 5011 in FIG. 5C and the memory device, e.g., 5030 in FIG. 5C.
  • the first reconfigurable routing channels may be configured to allow data communication between the first reconfigurable logic device and one or more memory devices.
  • the first reconfigurable routing channels may be configured to allow data communication between the first reconfigurable logic device and the integrated circuit.
  • the reconfigurable logic device herein may each comprise one or more data processing engines.
  • Each data processing engine may comprise multiple digital logic circuits.
  • the first reconfigurable logic device may be configured to communicate data with one or more memory devices external thereto.
  • the first reconfigurable logic device may be configured to communicate data with one or more memory devices external thereto via the first reconfigurable routing channels.
  • the first reconfigurable logic device may comprise a first integrated circuit forming a FPGA device.
  • the FPGA device in FIG. 5C includes the first reconfigurable logic device, the DMA connections, and the first reconfigurable routing channels (e.g., NoC and memory controllers).
  • the sequencing system may further comprise one or more memory devices electrically connected for data communication with one or more components of the sequencing system. The one or more components may include one or more of: the first reconfigurable logic device; the integrated circuit; the first reconfigurable routing channels; the one or more memory controllers; the first processor; a second processor; and one or more processors of the sequencing system.
  • the sequencing system further comprises one or more direct data access (DMA) connections, e.g., 5012 in FIG. 5C, that are in data communication with the plurality of data processing engines and the first reconfigurable routing channels, e.g., 5013 in FIG. 5C.
  • the DMA connections may be configured to actively communicate data between components of the sequencing system.
  • the DMA connections may be configured to fetch data or send data to components that are connected thereto, e.g., the data processing engines, e.g., 5011 in FIG. 5C and the reconfigurable routing channels, e.g., 5013 in FIG. 5C.
  • the DMA connections herein may be configured to actively request data from or actively send data directly to: the first reconfigurable logic device; the first reconfigurable routing channels; the integrated circuit; or a combination thereof.
  • One or more direct data access (DMA) connections may be in data communication with the first reconfigurable routing channels and the integrated circuit herein.
  • the DMA connections may be configured to allow data communication based on a predetermined protocol, e.g., a PCIe protocol.
  • the first reconfigurable routing channels are configured to allow data communication between the first reconfigurable logic device and one or more memory devices.
  • the one or more DMA connections and the first reconfigurable routing channels are configured to allow data communication between the first reconfigurable logic device and the integrated circuit.
  • the sequencing system further comprises an integrated circuit that is different from the first reconfigurable logic device, e.g., 120_b in FIG. 5C.
  • the integrated circuit herein may not be reconfigurable.
  • the integrated circuit may comprise an application specific integrated circuit (ASIC) chip.
  • the integrated circuit comprises a neural processing unit (NPU) or an artificial intelligence (AI) chip.
  • the integrated circuit may comprise a second plurality of data processing engines, each data processing engine comprising multiple digital logic circuits.
  • the integrated circuit may further comprise: a second plurality of data processing engines and second routing channels, each connecting at least some of the second plurality of data processing engines.
  • the sequencing system further comprises a first processor.
  • the first processor may be configured to selectively activate or deactivate different combinations of the first plurality of data processing engines and the first reconfigurable routing channels to perform the operations disclosed herein.
  • the first processor or a second processor is configured to selectively activate or deactivate different combinations of the second plurality of data processing engines and the second reconfigurable routing channels to perform the operations.
  • the first processor or a second processor may be configured to selectively activate or deactivate different combinations of the second plurality of data processing engines and the second reconfigurable routing channels to perform the operations herein.
  • the sequencing system may further comprise a housing that encloses the first reconfigurable logic device, the first reconfigurable routing channels, the one or more DMA connections, the integrated circuit, and the first processor therein.
  • the sequencing system further comprises: a housing that encloses at least the first reconfigurable logic device therein and the integrated circuit is external to the housing.
  • the sequencing system further comprises: a power source that is configured to supply different power levels to the first reconfigurable logic device and the integrated circuit.
  • a first power level supplied by the power source to the first reconfigurable logic device may be higher than a second power level supplied to the integrated circuit while a sequencing run and/or sequencing analysis is in progress.
  • a maximum power output of the power source of the sequencing system is 2x, 3x, 5x, 8x, 10x, or 20x lower than the maximum power output of the power source of sequencers, e.g., traditional sequencers without the first reconfigurable logic device (e.g., FPGA), the integrated circuit (e.g., AI chip), or both.
  • the time consumption in performing a sequencing run and corresponding sequencing analysis (e.g., primary analysis) thereof using the sequencing system is 2x, 3x, 5x, 8x, 10x, or 20x lower than the time consumption in performing the same sequencing run using a sequencer without the first reconfigurable logic device, the integrated circuit, or both (e.g., a traditional sequencer without FPGA and/or AI chips).
  • Time consumption in performing a sequencing run and sequencing analysis of the sequencing run (e.g., primary analysis) using the sequencing system is 2x, 3x, 5x, 8x, 10x, or 20x lower than the time consumption in performing the same sequencing run and analysis using a sequencer without the first reconfigurable logic device, the integrated circuit, or both (e.g., a traditional sequencer without FPGA and/or AI chips).
  • a maximum power output of the power source to the sequencing system in performing a sequencing run and corresponding sequencing analysis thereof is less than 900 Watts, 800 Watts, 700 Watts, 650 Watts, 600 Watts, 550 Watts, or 500 Watts.
  • the sequencing system further comprises a power source configured to supply a first power level to the first reconfigurable logic device, the first power level is less than 500 Watts, 400 Watts, 350 Watts, or 300 Watts.
  • the sequencing system further comprises a power source configured to supply a second power level to the integrated circuit, the second power level is less than 450 Watts, 400 Watts, 350 Watts, or 300 Watts.
  • one or more components are located on a first printed circuit board (PCB).
  • the one or more components may include: the first reconfigurable logic device; the first reconfigurable routing channels; the first processor; and the one or more DMA connections.
  • the integrated circuit is located on a second printed circuit board (PCB) different from the first printed circuit board, e.g., as shown in FIG. 5C.
  • the integrated circuit and the second PCB may be positioned within a same housing of the sequencing system as the first PCB or external to the housing of the sequencing system. Placing the integrated circuit on a separate PCB makes connecting the first reconfigurable logic device, e.g., the FPGA device, with various integrated circuits convenient, efficient, and easily customizable.
  • the first PCB board may be a main board.
  • the second PCB board may be a daughter board.
  • the sequencing system lacks any graphics processing units (GPUs) or tensor processing units (TPUs). Instead, the sequencing system utilizes FPGAs, AI chips, NPUs, or other ASIC chips for performing the operations disclosed herein.
  • the sequencing system disclosed herein advantageously requires less power, generates less heat, and reduces the hardware costs for performing NGS sequencing runs and corresponding sequencing analysis.
  • the first processor may be positioned on the first PCB board together with the reconfigurable logic device for convenient and efficient control of the reconfigurable logic device.
  • the first processor is a separate processor from one or more processors of the sequencing system configured to control the optical system, the fluidics of the sequencing system, etc.
  • the first processor can be configured to only control the components on the first PCB board, e.g., the FPGA device, alone or in combination with components on the second PCB board, e.g., the AI chip.
  • the sequencing system may comprise a second processor that is configured to separately control the AI chip.
  • the first processor or second processor of the sequencing system, e.g., 120_c, may comprise a CPU.
  • the one or more hardware processors of the sequencing system comprises a CPU.
  • the sensor data at the imager 116 can be communicated directly to the data processing engine(s) 5011 of the first reconfigurable logic device 120_a.
  • the sensor data may be saved into a memory device, e.g., 5030 so that it can be accessed by the data processing engine.
  • the first processor 120_c may control operation of the data processing engines and the routing channels to process the sensor data and generate the first plurality of flow cell images.
  • the processing may include operations disclosed herein such as intensity normalization, color correction, phasing and prephasing correction, background subtraction, etc.
  • the first plurality of flow cell images may then be communicated from the processing engines through the routing channels to the memory device 5030 so that the integrated circuit may be controlled by the first processor or a second processor to access the first plurality of flow cell images for subsequent steps in primary analysis.
  • the first plurality of flow cell images may be directly communicated to the integrated circuit 120_b via DMA connections 5012.
  • the integrated circuit is only used for predicting higher-resolution polony locations using a pretrained CNN, thereby generating the second plurality of flow cell images with a resolution that is at least 8 times higher than the resolution of the first plurality of flow cell images.
  • the CNN may be pretrained using simulated images or real flow cell images.
  • the second plurality of flow cell images are transmitted back from the integrated circuit to the first reconfigurable logic device for subsequent processing steps such as base calling.
  • the base calls along with quality information may then be saved into a FastQ data file.
  • Other information including cell segmentation and staining may also be saved in the same file or another FastQ file with compatible data format.
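Saving base calls with quality information can be illustrated with a minimal sketch of a four-line FASTQ record using the common Phred+33 quality encoding. The read identifier, the function name `fastq_record`, and the example scores are hypothetical; the disclosure does not specify the quality encoding, so Phred+33 is an assumption here.

```python
def fastq_record(read_id, bases, phred_scores):
    """Format one read as a four-line FASTQ record (Phred+33 assumed)."""
    if len(bases) != len(phred_scores):
        raise ValueError("need exactly one quality score per base")
    if any(score < 0 or score > 93 for score in phred_scores):
        raise ValueError("Phred+33 can encode scores from 0 to 93 only")
    # Each score is stored as the printable ASCII character chr(score + 33).
    quality = "".join(chr(score + 33) for score in phred_scores)
    return f"@{read_id}\n{bases}\n+\n{quality}\n"


if __name__ == "__main__":
    # Hypothetical read: four base calls with their quality scores.
    print(fastq_record("polony_1", "ACGT", [40, 30, 20, 2]), end="")
```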
  • the sequencing system may further comprise a heat dissipator configured to maintain a system temperature in a range from 0 degrees to 120 degrees Celsius or less than 120 degrees Celsius.
  • the operation for processing the sensor data to generate the first plurality of flow cell images comprises one or more of registering the first plurality of flow cell images to a reference coordinate system; adjusting image intensities of the first plurality of flow cell images; color correction of the first plurality of flow cell images; correcting phasing and prephasing of the first plurality of flow cell images; and subtracting background intensities from the first plurality of flow cell images.
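Two of the listed processing operations, background subtraction and intensity adjustment, can be sketched on a small image. The choices here are assumptions for illustration: the background is estimated as the median pixel value and intensities are scaled so the brightest pixel equals 1.0. The disclosure does not state which estimators are used, and the function names are hypothetical.

```python
from statistics import median


def subtract_background(image):
    """Subtract an assumed background level (the median pixel) and clip at zero."""
    background = median(value for row in image for value in row)
    return [[max(value - background, 0) for value in row] for row in image]


def normalize_intensity(image):
    """Scale intensities so that the brightest pixel becomes 1.0."""
    peak = max(value for row in image for value in row)
    if peak == 0:
        return [[0.0 for _ in row] for row in image]
    return [[value / peak for value in row] for row in image]


if __name__ == "__main__":
    raw = [
        [10, 10, 10],
        [10, 110, 10],
        [10, 10, 60],
    ]
    print(normalize_intensity(subtract_background(raw)))
```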
  • each of the one or more operations performed by the first reconfigurable logic device or the integrated circuit are performed within the time window of performing a single sequencing cycle of the sequencing run.
  • FIG. 5D shows an exemplary embodiment of performing sequencing analysis in parallel with performing a sequencing run.
  • the sequencing run includes multiple sequencing cycles. For each cycle, flow cell images are acquired at multiple z-levels from different color channels. The sequencing reactions are repeatedly performed for each z-level in each cycle within a time window 5601. The operations of the integrated circuit are performed within a processing window 5602 that falls within the time window 5609 of a single sequencing cycle and also within the time window 5601 for sequencing reactions and imaging at a single z-level.
  • the operations of the first reconfigurable logic device are also performed within a processing window 5603 that is within the time window 5609 of each sequencing cycle.
  • the processing windows 5602 and 5603 may be identical or different depending on various factors such as sequencing data, primary analysis algorithms, etc.
  • the operations are not just performed within the processing windows but completed within the processing windows with respect to the data of the current cycle, e.g., a preceding z-level for which sensor data has been acquired.
  • the operations are completed within the processing windows with respect to the data of a preceding cycle, e.g., the cycle immediately preceding the current cycle.
  • the operations are performed for a single z level in each cycle within a predetermined time window, e.g., 5602, 5603.
  • the predetermined time window is for a single z level in a single sequencing cycle.
  • the predetermined time window is less than 1000 ms, 900 ms, 800 ms, 700 ms, 600 ms, 500 ms, 400 ms, 300 ms, 250 ms, 200 ms, or 100 ms.
  • each of the one or more operations are performed within the predetermined time window and in parallel while the sequencing run is in progress.
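Whether the operations for each z-level complete within the predetermined time window can be checked with a small budget calculation. This sketch assumes the per-stage latencies are measured in milliseconds and run back to back for a given z-level. The stage names, the example numbers, and the function name `over_budget_levels` are hypothetical.

```python
def over_budget_levels(latencies_ms, window_ms):
    """Return the z-levels whose total processing time does not fit the window.

    latencies_ms: {z_level: {stage_name: milliseconds}}.
    A z-level fits only if its summed latency is less than `window_ms`.
    """
    late = []
    for z_level, stages in sorted(latencies_ms.items()):
        if sum(stages.values()) >= window_ms:
            late.append(z_level)
    return late


if __name__ == "__main__":
    # Hypothetical measurements for two z-levels of one sequencing cycle.
    measured = {
        0: {"predict": 120.0, "base_call": 60.0},
        1: {"predict": 200.0, "base_call": 90.0},
    }
    print(over_budget_levels(measured, window_ms=250.0))
    print(over_budget_levels(measured, window_ms=1000.0))
```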
  • the first plurality of flow cell images herein may be obtained from multiple z levels covering at least part of an in situ sample, e.g., of cells or tissue(s).
  • the first plurality of flow cell images may be obtained from one or more color channels at each z level of the multiple z levels covering at least part of the in situ sample.
  • the first plurality of flow cell images are from a single color channel.
  • the first plurality of flow cell images may be of a first spatial resolution in x, y, and/or z directions.
  • the second plurality of flow cell images may be generated based on the first plurality of flow cell images.
  • the second plurality of flow cell images may be of a second spatial resolution in x, y, and/or z directions.
  • the first spatial resolution may be lower than the second spatial resolution, and a higher resolution herein indicates that the pixel size is smaller so that the polonies in the flow cell images show finer spatial detail.
  • the first spatial resolution may be 2x, 4x, 6x, 8x, 10x, 16x, 24x, 32x, or 48x lower than the second spatial resolution in x, y, and/or z directions.
  • the first spatial resolution may be at least 2x, 4x, 6x, 8x, 10x, 16x, 24x, 32x, or 48x lower than the second spatial resolution in x, y, and/or z directions.
  • the first and second resolutions are in 3D.
  • the first resolution is in a range of 0.1 um to 5 um.
  • the second resolution is in a range of 0.01 um to 2 um.
  • the second resolution is at least 4, 6, or 8 times greater than the first resolution in all three dimensions.
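The relationship between the two resolutions can be made concrete with a short calculation. Following the convention stated above that a higher resolution means a smaller pixel size, a gain of k in every dimension multiplies the number of samples per axis by k and divides the pixel size by k. The input shape, the 0.8 um pixel size, and the function name `predicted_grid` are assumptions for illustration.

```python
def predicted_grid(shape, pixel_size_um, factor):
    """Return the sampling grid of the predicted (higher-resolution) images.

    shape: (z, y, x) sample counts of the acquired images.
    pixel_size_um: pixel size of the acquired images in micrometers.
    factor: resolution gain applied in all three dimensions.
    """
    if factor < 1:
        raise ValueError("factor must be at least 1")
    new_shape = tuple(size * factor for size in shape)
    new_pixel_size_um = pixel_size_um / factor
    return new_shape, new_pixel_size_um


if __name__ == "__main__":
    # Hypothetical acquisition: 4 z-levels of 256 x 256 pixels at 0.8 um.
    shape, pixel = predicted_grid((4, 256, 256), pixel_size_um=0.8, factor=8)
    print(shape, pixel)
```

With these example numbers both pixel sizes stay inside the ranges given above for the first and second resolutions.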
  • the sequencing system further comprises one or more image sensors configured to receive optical signals generated from sequencing reactions of a sample immobilized on a support.
  • the support may comprise a glass or plastic substrate.
  • the support may be comprised in a flow cell device.
  • the one or more image sensors may be configured to generate sensor data based on the optical signals.
  • the sequencing system further comprises: one or more hardware processors; one or more data storage devices storing instructions executable by the one or more hardware processors to cause the one or more hardware processors to perform operations disclosed herein.
  • the one or more data storage devices may include one or more memory devices.
  • the one or more memory devices may be accessible by the one or more processors, the first processor, the second processor, the first reconfigurable logic device, or the integrated circuit.
  • the one or more processors are separate from the first or second processors.
  • the operations performed by the one or more processors may include one or more of: 1) recording sensor data generated in the sequencing system in one or more flow cycles; 2) optionally processing the recorded sensor data; 3) sending the recorded sensor data or the optionally processed data to the first reconfigurable logic device or the integrated circuit; 4) receiving outcome from the first reconfigurable logic device or integrated circuit; and 5) generating sequencing analysis results based on the received outcome.
  • the operations performed by the one or more processors may include one or more of: 1) receiving outcome from the first reconfigurable logic device or integrated circuit; and 2) generating sequencing analysis results based on the received outcome.
  • the sequencing analysis results comprise primary analysis results.
  • the sequencing analysis results comprise a data file in a predetermined data format.
  • the sequencing analysis results comprise base calls of nucleotide bases in a sample immobilized on a support.
  • the sequencing analysis results comprise quality measurements of base calls of nucleotide bases in a sample immobilized on a support.
  • the sequencing analysis results comprise quality scores corresponding to base calls of nucleotide bases in a sample immobilized on a support.
  • the sequencing system further comprises: a sample immobilized on a support; and an optical system comprising: an illumination system; an objective lens; and the one or more image sensors.
  • the optical system is configured to emit light to the sample and to collect optical signals emitted from the sample, thereby generating the first plurality of flow cell images.
  • the support may be comprised in a flow cell device.
  • the output data comprises base calls of nucleotide bases in a sample immobilized on a support. In some embodiments, the output data comprises identification of base calling locations in two dimensions. In some embodiments, the output data comprises identification of base calling locations in three dimensions.
  • the operation(s) performed by the first reconfigurable routing channels or the integrated circuit using the neural network comprises one or more of: generating quality measurements of the base callings; and generating a data output file based on the base callings.
  • the neural network comprises a convolutional neural network (CNN).
  • the neural network comprises a U-Net.
  • the neural network has been trained using the first reconfigurable logic device or the integrated circuit.
  • the first convolution comprises a 3D convolution with a convolution kernel.
  • the convolutional kernel has at least four dimensions.
  • the convolutional kernel is m x m x m x n, wherein m is an integer in a range from 3 to 30, wherein n is an integer. In some embodiments, n is an integer from 1 to 16384.
  • the second convolution in operation (1) comprises a corresponding number of n, 2*n, 4*n, and 8*n filters in a first, second, third, and fourth repetition, respectively.
  • the second convolution in (4) comprises a corresponding number of 2*n, 2*n, 4*n, 8*n filters in a last repetition, last minus one, last minus two, and last minus three repetition, respectively.
  • n is in a range from 4 to 1024.
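The filter counts and kernel size described above can be tabulated with a short sketch. It reads the decoder list in execution order, so the "last" repetition comes at the end, and it counts the entries of an m x m x m x n kernel as m * m * m * n. The function names and the choice of n = 16 and m = 3 in the example are assumptions for illustration.

```python
def filter_schedule(n):
    """Return (encoder, decoder) filter counts in execution order.

    Encoder: n, 2n, 4n, 8n for the first to fourth repetition.
    Decoder: the text gives 2n, 2n, 4n, 8n counted back from the last
    repetition, so in execution order this is 8n, 4n, 2n, 2n.
    """
    if n < 1:
        raise ValueError("n must be a positive integer")
    encoder = [n, 2 * n, 4 * n, 8 * n]
    decoder_from_last = [2 * n, 2 * n, 4 * n, 8 * n]
    decoder = list(reversed(decoder_from_last))
    return encoder, decoder


def kernel_entries(m, n):
    """Number of entries in an m x m x m x n convolution kernel."""
    if not 3 <= m <= 30:
        raise ValueError("m is described as an integer from 3 to 30")
    return m * m * m * n


if __name__ == "__main__":
    encoder, decoder = filter_schedule(16)
    print(encoder, decoder)
    print(kernel_entries(3, 16))
```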
  • the operation (e) performing a corresponding base calling for each of the determined polonies based on the second plurality of flow cell images comprises: performing a corresponding base calling for each of the determined polonies based on the second plurality of flow cell images and based on a fourth plurality of flow cell images, wherein the fourth plurality of images are predicted using a second neural network based on a third plurality of flow cell images.
  • the third plurality of flow cell images are acquired from one or more color channels that are different from the single channel, and wherein the third plurality of flow cell images comprises the first resolution.
  • the fourth plurality of flow cell images comprises the second resolution.
  • the first plurality of flow cell images are from one or more color channels. In some embodiments, the first plurality of flow cell images are of unbalanced nucleotide diversity. In some embodiments, the first plurality of flow cell images comprises: an unbalanced diversity of nucleotide bases of A, G, C and T/U among concatemer molecules immobilized on the support in one or more flow cycles. In some embodiments, the first plurality of flow cell images comprises: a balanced diversity of nucleotide bases of A, G, C and T/U among concatemer molecules immobilized on the support in one or more cycles. In some embodiments, two or more different concatemer molecules among the concatemer molecules have different insert sequences.
  • the first plurality of flow cell images comprises optical signals emitted from nucleotide reagents bound to a balanced diversity of nucleotide bases of A, G, C and T/U among the concatemer molecules immobilized on the support. In some embodiments, the first plurality of flow cell images comprises optical signals emitted from nucleotide reagents bound to an unbalanced diversity of nucleotide bases of A, G, C and T/U among the concatemer molecules immobilized on the support in the one or more subsequent cycles.
  • the unbalanced diversity of nucleotide bases of A, G, C and T/U among the concatemer molecules comprises: a percentage of (1) a number of one or more types of nucleotide bases to (2) a total number of bases is less than 20%, 15%, 10%, or 5% in the one or more cycles.
  • the balanced diversity of nucleotide bases of A, G, C and T/U among the plurality of concatemer molecules comprises: a percentage of (1) a number of each type of nucleotide bases to (2) a total number of bases in the one or more cycles is more than 10%, 15%, or 20%.
  • the cellular sample comprises overloaded concatemer molecules with a spatial density in a range of 10^2 to 10^15 per mm^2. In some embodiments, the cellular sample comprises overloaded concatemer molecules with a spatial density in a range of 10^3 to 10^10 per mm^2.
  • the down-sampling factor is 2, 4, or 8. In some embodiments, the up-sampling factor is 2, 4, or 8. In some embodiments, the down-sampling factor is 2, 4, 8, 16, 32, or 64. In some embodiments, the up-sampling factor is 2, 4, 8, 16, 32, or 64.
  • one or more of operations of (a) to (k) are performed while a sequencing run is being performed.
  • the first plurality of flow cell images are acquired in sequencing cycles ranging from 1 to 500.
  • the one or more cycles comprises a current cycle N.
  • N is in a range from 1 to 500.
  • the one or more cycles comprises a single cycle ranging from 1 to 500.
  • the one or more cycles comprises multiple cycles ranging from 1 to 500.
  • one or more operations, e.g., operations (a) to (j), are performed while the sequencing reactions in cycles subsequent to the current cycle N are yet to be performed or are currently being performed.
  • performing the first convolution in one or more dimensions on the first plurality of flow cell images comprises: performing a first convolution in 3D on the first plurality of flow cell images, thereby generating a first convolution result.
  • performing a second convolution in one or more dimensions on the first convolution result, thereby generating a second convolution result comprises: performing the second convolution in 3D on the first convolution result, thereby generating a second convolution result.
  • performing the first convolution in one or more dimensions on the first plurality of flow cell images comprises: performing a first convolution in 2D on the first plurality of flow cell images, thereby generating a first convolution result.
  • performing a second convolution in one or more dimensions on the first convolution result, thereby generating a second convolution result comprises: performing the second convolution in 2D on the first convolution result, thereby generating a second convolution result.
  • repetitively performing up-sampling operations comprises: (3) performing an up-sampling of the fourth convolution result by an up-sampling factor, thereby generating a first up-sampled result; (4) concatenating the first up-sampled result in a current up-sampling repetition with the first down-sampled result in a previous down-sampling repetition, wherein the first up-sampled result has a same size as the first down-sampled result in the previous down-sampling repetition; and (5) performing the second convolution in one or more dimensions of the first up-sampled result, thereby generating a fifth convolution result.
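The size-matching requirement for the concatenation in the up-sampling repetitions can be illustrated with a small shape-tracking sketch. It tracks only the (z, y, x) sizes, not pixel data. The function name `unet_sizes` is hypothetical, and the sketch assumes the conventional U-Net arrangement, in which each up-sampled result is paired with the result held at the same resolution on the down-sampling path. It makes no claim about the layers actually used in the disclosed network.

```python
def unet_sizes(size, factor, repetitions):
    """Trace spatial sizes through down-sampling and up-sampling repetitions.

    size        -- (z, y, x) size of the input flow cell image stack
    factor      -- shared down-sampling / up-sampling factor, e.g., 2, 4, or 8
    repetitions -- number of down-sampling (and up-sampling) repetitions

    Returns (down_path, up_path). Raises ValueError when an up-sampled
    result would not match its concatenation partner in size.
    """
    down_path = [tuple(size)]
    for _ in range(repetitions):
        # Integer division models down-sampling by the factor.
        down_path.append(tuple(s // factor for s in down_path[-1]))

    up_path = []
    current = down_path[-1]
    for level in range(repetitions - 1, -1, -1):
        current = tuple(s * factor for s in current)
        partner = down_path[level]  # result held at the same resolution
        if current != partner:
            raise ValueError(
                f"cannot concatenate {current} with {partner}; "
                "input sizes should be divisible by factor ** repetitions"
            )
        up_path.append(current)
    return down_path, up_path


down_path, up_path = unet_sizes((16, 512, 512), factor=2, repetitions=3)
print(down_path)  # [(16, 512, 512), (8, 256, 256), (4, 128, 128), (2, 64, 64)]
print(up_path)    # [(4, 128, 128), (8, 256, 256), (16, 512, 512)]
```

With an odd size such as (9, 512, 512) and a factor of 2, the up-sampled size (8, 512, 512) no longer matches its partner and the function raises, which is why the input size is usually chosen as a multiple of the factor raised to the number of repetitions.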
  • the different combinations of the first plurality of data processing engines are configured to perform operations further comprising: (a) receiving the second plurality of flow cell images from the integrated circuit; (b) determining polonies from the second plurality of flow cell images; and (c) performing a corresponding base calling for each of the determined polonies based on the second plurality of flow cell images; and (d) forwarding the second plurality of flow cell images, the determined polonies, the corresponding base callings, or a combination thereof to the first processor or one or more hardware processors of the sequencing system.
  • the one or more operations performed by the first reconfigurable logic device further comprises: forwarding the second plurality of flow cell images, the determined polonies, the corresponding base callings, or a combination thereof to the first processor or one or more hardware processors of the sequencing system.
  • the one or more operations performed by the integrated circuit further comprises forwarding the second plurality of flow cell images, the corresponding base callings, or both to the first reconfigurable logic device, the first processor or one or more hardware processors of the sequencing system.
  • the operations performed by the integrated circuit further comprising one or more of: determining polonies from the second plurality of flow cell images; performing a corresponding base call for each of the determined polonies based on the second plurality of flow cell images; and forwarding the second plurality of flow cell images, the corresponding base callings, or both to the first reconfigurable device, the first processor, or one or more hardware processors of the sequencing system.
  • the operations performed by the first reconfigurable logic device or the integrated circuit further comprising: registering the second plurality of flow cell images to a common coordinate system.
  • the operation (d) or (i) of determining polonies from the second plurality of flow cell images comprises: generating a 3D polony map comprising spatial location of polonies based on the determined polonies.
  • the operation of generating a 3D polony map comprising spatial location of polonies based on the determined polonies may further comprise: deleting duplicate polonies from the determined polonies, wherein the duplicate polonies are out-of-focus.
  • the operation of determining polonies from the second plurality of flow cell images comprises: superimposing the second plurality of flow cell images with corresponding cell staining images; and generating the polony map by only including polonies that are within cell boundaries in the corresponding cell staining images.
  • Exemplary embodiments of methods for generating 3D polony map are disclosed in U.S. Patent Application No. 18/078,820 and PCT Application No. PCT/US2023/076125, which are incorporated by reference in their entireties.
  • the method further comprises: providing the cellular sample harboring a plurality of RNA which comprises the first target RNA molecule and the second target RNA molecule. In some embodiments, the method further comprises: generating inside the cellular sample a plurality of cDNA molecules which include a first target cDNA molecule that corresponds to the first target RNA molecule and a second target cDNA molecule that corresponds to the second target RNA molecule. In some embodiments, the method further comprises: contacting the plurality of cDNA molecules in the cellular sample with a plurality of target-specific padlock probes which includes at least a first plurality of first target-specific padlock probes and a second plurality of second target-specific padlock probes.
  • the method further comprises: contacting the plurality of RNA molecules in the cellular sample with a plurality of target-specific padlock probes which includes at least a first plurality of target-specific padlock probes and a second plurality of target-specific padlock probes.
  • individual padlock probes in the first plurality of first target-specific padlock probes comprise: first and second terminal regions, wherein the first terminal region selectively hybridizes to a first region of the first target cDNA molecule or the first target RNA molecule, and the second terminal region selectively hybridizes to a second region of the first target cDNA molecule or the first target RNA molecule.
  • contacting the plurality of RNA molecules in the cellular sample with the plurality of target-specific padlock probes comprises: hybridizing the first and second terminal regions of the first target-specific padlock probes to proximal positions on the first target cDNA molecule or the first target RNA molecule to form a circularized first target-specific padlock probe having a nick or gap between the hybridized first and second terminal regions.
  • the first target-specific padlock probe comprises a first target barcode sequence that corresponds to and uniquely identifies the first target cDNA sequence or the first target RNA sequence.
  • the first target-specific padlock probe comprises a first target barcode sequence that is located adjacent to one of the regions of the first target-specific padlock probe that selectively hybridizes to the first target cDNA molecule or the first target RNA sequence.
  • the first target-specific padlock probe comprises at least one universal adaptor sequence.
  • the first target-specific padlock probe comprises a universal primer binding site for a rolling circle amplification primer or a complementary sequence thereof.
  • the first target-specific padlock probe comprises a universal compaction oligonucleotide binding site or a complementary sequence thereof.
  • the method further comprises: closing the nick or gap in the at least first and second circularized target-specific padlock probes by conducting an enzymatic reaction, thereby generating at least a first covalently closed circular padlock probe and a second covalently closed circular padlock probe inside the cellular sample.
  • the method further comprises: conducting a rolling circle amplification reaction inside the cellular sample using the first and second covalently closed circular padlock probes as template molecules, thereby generating a plurality of concatemer molecules including at least the first concatemer molecule that corresponds to the first target RNA molecule, and the second concatemer molecule that corresponds to the second target RNA molecule.
  • the first concatemer comprises: tandem repeat units of: a first target barcode sequence that uniquely identifies the first target RNA or the first target cDNA sequence, a first insert sequence that corresponds to the first target RNA or the first target cDNA, and a first sequencing primer binding site or a complementary sequence thereof.
  • the first concatemer further comprises: a universal binding site for an amplification primer or a complementary sequence thereof, and a universal binding site for a compaction oligonucleotide or a complementary sequence thereof.
  • the second concatemer comprises: tandem repeat units of: a second target barcode sequence that uniquely identifies the second target RNA or the second target cDNA sequence, a second insert sequence that corresponds to the second target RNA or the second target cDNA, and a second sequencing primer binding site or a complementary sequence thereof.
  • the second concatemer further comprises: a universal binding site for an amplification primer or a complementary sequence thereof, and a universal binding site for a compaction oligonucleotide or a complementary sequence thereof.
  • conducting the one or more cycles of sequencing reactions comprises: contacting the plurality of concatemer molecules inside the cellular sample with (i) a plurality of universal sequencing primers, (ii) a plurality of sequencing polymerases, and (iii) a plurality of nucleotide reagents, under a condition suitable for hybridizing the plurality of universal sequencing primers to their respective universal sequencing primer binding sites on the concatemers.
  • the plurality of nucleotide reagents comprise: multivalent molecules, nucleotides, nucleotide analogs, or their combinations.
  • individual nucleotides or nucleotide analogs are detectably labeled or non-labeled.
  • the detectably labeled individual nucleotides or nucleotide analogs comprise a different detectable color label that corresponds with each different type of nucleotide base of A, G, C, and T/U.
  • an individual multivalent molecule comprises a core attached with multiple nucleotide arms, and each arm of the individual multivalent molecule comprises the same type of nucleotide base.
  • generating the first plurality of flow cell images comprises: in each cycle, imaging, by an optical system, optical color signals emitted from the nucleotide reagents that are bound to the plurality of concatemer molecules.
  • the first plurality of flow cell images comprises optical color signals emitted from the nucleotide reagents that are bound to the plurality of concatemer molecules.
  • conducting the one or more cycles of sequencing reactions comprises: sequencing only the first target barcode sequence region of the first concatemer, thereby generating the first sequencing read product.
  • conducting the one or more cycles of sequencing reactions comprises: sequencing the first target barcode sequence region and at least a portion of the first insert sequence of the first concatemer, thereby generating the first sequencing read product.
  • conducting the one or more cycles of sequencing reactions comprises: sequencing only the second target barcode sequence region of the second concatemer, thereby generating the second sequencing read product. In some embodiments, conducting the one or more cycles of sequencing reactions comprises: sequencing the second target barcode sequence region and at least a portion of the second insert sequence of the second concatemer, thereby generating the second sequencing read product.
  • the method further comprises: removing a first sequencing read product from the first concatemer molecule and retaining the first concatemer molecule in the cellular sample, and removing a second sequencing read product from the second concatemer molecule and retaining the second concatemer molecule in the cellular sample.
  • the method further comprises: reiteratively sequencing the plurality of concatemers by repeating the following operations for at least once: generating the first plurality of flow cell images of a cellular sample immobilized on a support by conducting one or more cycles of sequencing reactions thereby generating the first sequencing read product and the second sequencing product, the cellular sample comprising a plurality of concatemer molecules therewithin, wherein a first concatemer molecule of the plurality of concatemer molecules corresponds to a first target RNA molecule of the cellular sample, and a second concatemer molecule of the plurality of concatemer molecules corresponds to a second target RNA molecule of the cellular sample, wherein the first plurality of flow cell images; and removing a first sequencing read product from the first concatemer molecule and retaining the first concatemer molecule in the cellular sample, and removing a second sequencing read product from the second concatemer molecule and retaining the second concatemer molecule in the cellular sample.
  • the first sequencing read product comprises some or all of: a first target barcode sequence in one or more tandem units of the first concatemer molecule; a first insert sequence in one or more tandem units of the first concatemer molecule; or their combinations.
  • the method further comprises: confirming presence of the first target RNA molecule, the second target RNA molecule, or both molecules in the cellular sample based on the performed base calling of the second plurality of flow cell images at the base calling locations in the base calling template.
  • the method further comprises: generating, by the sequencing system, the second plurality of flow cell images of the cellular sample immobilized on the support by conducting subsequent cycles of sequencing reactions after the one or more cycles.
  • generating the first plurality of flow cell images of the cellular sample immobilized on the support comprises: sequencing at least the first concatemer inside the cellular sample under a condition that inhibits sequencing the second concatemer.
  • sequencing at least the first concatemer inside the cellular sample comprises: generating a plurality of first sequencing read products, and wherein the sequences of the first sequencing read products are aligned with a first target reference sequence to confirm presence of the first target RNA in the cellular sample.
  • generating the first plurality of flow cell images of the cellular sample immobilized on the support comprises: sequencing at least the second concatemer inside the cellular sample under a condition that inhibits sequencing the first concatemer.
  • sequencing at least the second concatemer inside the cellular sample comprises: generating a plurality of second sequencing read products, and wherein sequences of the second sequencing read products are aligned with a second target reference sequence to confirm presence of the second target RNA in the cellular sample.

Predicting high resolution flow cell images
  • FIG. 5A shows a flow chart of a computer-implemented method 500 for predicting high resolution flow cell images thereby improving detectable polony density in the flow cell images.
  • the method 500 can include some or all of the operations disclosed herein. The operations may be performed in, but are not limited to, the order that is described herein.
  • the method 500 can be performed by one or more processors disclosed herein.
  • the processor can include one or more of: a computing system comprising a processing unit 118, a reconfigurable logic device 120, an integrated circuit that is not reconfigurable 120, or their combinations.
  • the processing unit can include a central processing unit (CPU).
  • the reconfigurable logic device can include one or more FPGA devices.
  • the integrated circuit can include a chip such as an AI chip or an ASIC chip.
  • the one or more processors can include the computer system 400 disclosed herein.
  • some or all operations in method 500, 600, 700, 2800, and 2900 can be performed by the reconfigurable logic device, e.g., the FPGA(s), and/or the integrated circuit, e.g., the AI chip.
  • the data produced by the reconfigurable logic device and/or integrated circuit, e.g., the FPGA(s) after performing one or more operations can be communicated to various hardware elements of the system 100, e.g., CPU(s) or GPU(s), so that subsequent operation(s) in method 500, 600, 700, 2800, and 2900 can be performed by such various hardware using the communicated data.
  • data can also be communicated in the opposite direction from various hardware e.g., CPU(s), to the reconfigurable logic device or the integrated circuit for processing.
  • all the operations in method 500, 600, 700, 2800, and 2900 can be performed by CPU(s).
  • the operations performed by CPU(s) can be performed by other processors such as the dedicated processors, or GPU(s).
  • all the operations in method 500, 600, 700, 2800, and 2900 can be performed by the reconfigurable logic device and/or the integrated circuit, e.g., FPGA(s) and/or the AI chip(s).
  • the sensor data acquired by the imager 116 may be directly communicated to the reconfigurable logic device and/or the integrated circuit, e.g., via DMA connections. In some embodiments, the sensor data acquired by the imager 116 may be directly communicated to the reconfigurable logic device and/or the integrated circuit without being routed first to a CPU, a GPU, or any other processing units before reaching the reconfigurable logic device and/or the integrated circuit.
  • predicting high resolution flow cell images using the methods 500 herein with the reconfigurable logic device, e.g., the FPGA, and/or other integrated circuit, e.g., AI chips, may require at least 2x, 8x, 10x, 15x, 20x, 40x, 50x, or 100x less power than making the same prediction(s) using other computing hardware including but not limited to CPUs or GPUs.
  • the sequencing system herein further comprises: a power source that is configured to supply identical or different power levels to the reconfigurable logic device and the integrated circuit.
  • a maximum power output of the power source to the sequencing system in performing methods 500, 600, 700, 2800, and/or 2900 is less than 2000 Watts, 1000 Watts, 900 Watts, 800 Watts, 700 Watts, 650 Watts, 600 Watts, 550 Watts, 500 Watts, 400 Watts, 300 Watts, 200 Watts, or 100 Watts.
  • the sequencing system herein comprises: a first reconfigurable logic device, e.g., a FPGA unit, comprising a plurality of data processing engines configured to perform data processing in parallel; first reconfigurable routing channels, each connecting at least some of the first plurality of data processing engines; a neural network deployed at least partly on the first reconfigurable logic device; a first processor to selectively activate or deactivate different combinations of the first plurality of data processing engines and the first reconfigurable routing channels to perform one or more operations in methods herein (e.g., methods 500, 2800) to make predictions.
  • the sequencing system herein comprises: a first reconfigurable logic device comprising a first plurality of data processing engines arranged in a first pipeline and configured to perform data processing in parallel with each other; an integrated circuit in data communication with the first reconfigurable logic device; a neural network deployed at least partly on the integrated circuit and/or the first reconfigurable logic device; a first processor of the first reconfigurable logic device to selectively activate or deactivate different combinations of the first plurality of data processing engines to perform one or more operations in methods herein (e.g., methods 500, 2800) to make prediction using the neural network.
  • the first reconfigurable logic device and the integrated circuit are within the same physical housing as the other elements of the sequencing system as shown in FIG. 1.
  • the first reconfigurable logic device and the integrated circuit are not physically external to the sequencing system 110 as shown in FIG. 1, e.g., not in the cloud 130.
  • the method 500 can comprise an operation 510 of (i) generating, by the sequencing system 110, a first plurality of flow cell images of sample(s) immobilized on a support by conducting one or more cycles of sequencing reactions.
  • the sample(s) may comprise concatemer molecules therewithin.
  • the sample(s) may include concatemer molecules from one or more different sample sources.
  • the sample(s) may include a thickness along the z-axis so that the first plurality of flow cell images may be acquired at a z-stack of different z-locations with a first resolution to cover the sample in 3D.
  • the samples may be acquired from a single z-location of a 2D or 3D sample.
  • the sample can be in situ.
  • the sample can be a 3D sample.
  • the sample can be a volumetric sample that may contain different biological information at the same x-y location but different z levels.
  • the sample can be a cellular sample including multiple cells, tissue, or their combination.
  • the sample can be any biological sample that has a thickness that is greater than a predetermined threshold along the z axis. For example, the thickness can be greater than 2 um, 3 um, 4 um, 5 um, 10 um, 20 um, or more.
  • the z axis is orthogonal to the image plane defined by the x and y axes.
  • the sample can be traditional 2D sequencing samples.
  • such computer-implemented method comprises an operation (i) of generating, by a sequencing system, a first plurality of flow cell images of a sample immobilized on a support by conducting one or more cycles of sequencing reactions, wherein the first plurality of flow cell images are acquired with a first resolution.
  • Such operation is similar to operation 510 in FIG. 5A except that the sample may be a 2D or 3D sample.
  • the sample comprises concatemer molecules therewithin.
  • the sample comprises template molecules therewithin.
  • the flow cell images can be acquired using the optical system of the imager 116 disclosed herein, from the 1, 2, 3, 4, or more color channels.
  • Each flow cell image can include at least a portion of one or more tiles (e.g., imaging areas). Each tile can be divided into multiple subtiles.
  • Each tile or subtile can include a plurality of polonies or clusters. Each subtile can include multiple regions with each region including a number of polonies or clusters.
  • the flow cell image as disclosed herein can be an image that is acquired from a flow cell 112 as shown in FIG. 1 or 2712 as shown in FIG. 27.
  • the flow cell images are acquired from a single color channel, and subsequent prediction is by using a pretrained neural network corresponding to that single channel.
  • the flow cell images are acquired from 2, 3, 4, or more color channels, and subsequent prediction is by using a pretrained neural network corresponding to the multiple color channels.
  • a flow cell image herein can be an image of one or more tiles, one or more subtiles, one or more segmented regions within tile(s) or subtile(s), or their combinations.
  • Each flow cell image can comprise a field of view (FOV).
  • the FOV can be orthogonal to the z axis.
  • the FOV can be within the x-y plane.
  • the FOV of different flow cell images at different z levels can be identical within the x-y plane.
  • the FOV of different flow cell images at different z levels can have at least an overlapping portion within the x-y plane.
  • the image resolution of different flow cell images at different z levels can be about identical or exactly identical.
  • FIGS. 3A and 3D show two exemplary flow cell images acquired at two different z levels along the z axis of a same 3D sample within a same sequencing cycle.
  • the FOV can be in 3D and be of various sizes to cover the volumetric sample to be imaged.
  • the FOV along x, y, and/or z direction can be in a range from 10 um to 5 mm.
  • the FOV along x, y, and/or z direction can be in a range from about 0.1 um to about 2 mm.
  • the FOV along x, y, and/or z direction can be in a range from 0.5 um to 1 mm.
  • the FOV can be about 0.5 mm by 0.5 mm by 20 um for certain cellular samples along the x, y, and z direction, respectively.
  • the flow cell images herein may be of various sizes; the pixel number along the x, y, and/or z axis may be any integer greater than 64 or 128.
  • the flow cell images herein may be of various sizes; the pixel number along the x, y, and/or z axis may be in a range from 2 to 65536.
  • a single flow cell image can be separated into different number of regions, for example, 4, 8, 16, or even more regions, and each region may include a size of 256 by 256 by 1, 512 by 512 by 3, or other sizes.
  • the number of pixels along x, y, and/or z direction may be adjusted to maintain a particular spatial resolution in a given FOV. For example, with a spatial resolution of 0.2 um, to cover a FOV of 0.8 mm, the number of pixels may be 4000.
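The arithmetic in the example above (a 0.8 mm FOV at 0.2 um resolution giving 4000 pixels) can be reproduced with a short sketch. The function name `pixels_needed` is hypothetical, and the sketch assumes one pixel per resolution element, so the spatial resolution is treated as the pixel pitch. Lengths are passed as integer nanometers to avoid floating-point rounding.

```python
def pixels_needed(fov_nm, pixel_pitch_nm):
    """Smallest number of pixels that covers the FOV along one axis.

    Both arguments are integer lengths in nanometers. Treating the
    spatial resolution as the pixel pitch is an assumption of this sketch.
    """
    if fov_nm <= 0 or pixel_pitch_nm <= 0:
        raise ValueError("lengths must be positive")
    # Ceiling division on integers: round up when the FOV is not an
    # exact multiple of the pixel pitch.
    return -(-fov_nm // pixel_pitch_nm)


# 0.8 mm FOV = 800,000 nm; 0.2 um resolution = 200 nm.
print(pixels_needed(800_000, 200))  # 4000

# 0.5 mm by 0.5 mm by 20 um FOV at the same 200 nm pitch along each axis.
print([pixels_needed(f, 200) for f in (500_000, 500_000, 20_000)])  # [2500, 2500, 100]
```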
  • Each flow cell image at a specific z level may include intensities generated by polonies or clusters at the corresponding z level.
  • signals from polonies or clusters are small bright spots within the images.
  • Each bright spot can be of various sizes on the order of a few pixels, e.g., less than a pixel, about a pixel, about 2 pixels, 3 pixels, 4 pixels, 5 pixels, or more.
  • each signal spot of the polonies or clusters can be any number of pixels in the range from 0.01 pixel to about 100 pixels.
  • each signal spot of the polonies or clusters can be any number of pixels in the range from 0.1 pixel to about 16 pixels.
  • Each flow cell image can also include intensities generated by the cell and its structural elements. Such structural elements can be background objects or components, e.g., in FIG. 3A. Each flow cell image can also include noise and/or artifacts that are not from the polonies or cellular structures.
  • when the depth of field of the optical system spans a range, e.g., 0.1 um, 0.2 um, 0.3 um, 0.5 um, 0.6 um, 0.8 um, 1 um, 2 um, 3 um, 4 um, 5 um, etc., extending along the z axis, polonies or clusters that are within the depth of field can appear in focus or approximately in focus in the flow cell image.
  • Flow cell images at a specific z level can also include signals from polonies or clusters that are not within the focus range of the image; such polonies or clusters are out-of-focus. As shown in FIG. 3A, bigger and blurry signal spots represent out-of-focus polonies or clusters. Some of the out-of-focus polonies or clusters are circled in FIG. 3A.
  • Each flow cell image at a specific z level can also include noise caused by the optical system and/or undesired signal from the sample.
  • the undesired signal can be signal coming from components of the sample such as membrane, cytosol, and mitochondria.
  • Such background objects can be any objects that are relatively larger in size than the polonies or clusters. As shown in FIG. 3A, there is a blurry cellular contour (at the arrows) in the flow cell image, and most of the signal spots are contained within the blurry contour.
  • background objects can include any objects within the 3D sample that are not polonies or clusters.
  • the flow cell images are from multiple color channels. In some embodiments, the flow cell images are of unbalanced nucleotide diversity. In some embodiments, the flow cell images comprise: an unbalanced diversity of nucleotide bases of A, G, C and T/U among concatemer molecules immobilized on the support in one or more sequencing cycles. In some embodiments, the flow cell images comprise: a balanced diversity of nucleotide bases of A, G, C and T/U among concatemer molecules immobilized on the support in one or more cycles. In some embodiments, two or more different concatemer molecules among the concatemer molecules have different insert sequences. In some embodiments, different insert sequences correspond to different target RNA molecules or target cDNA molecules.
  • each location of the determined polonies corresponds to a location of the concatemer molecules.
  • the flow cell images comprise optical signals emitted from nucleotide reagents bound to a balanced diversity of nucleotide bases of A, G, C and T/U among the concatemer molecules immobilized on the support.
  • the flow cell images comprise optical signals emitted from nucleotide reagents bound to an unbalanced diversity of nucleotide bases of A, G, C and T/U among the concatemer molecules immobilized on the support in the one or more subsequent cycles.
  • the unbalanced diversity of nucleotide bases of A, G, C and T/U among the concatemer molecules comprises: a percentage of (1) a number of one or more types of nucleotide bases to (2) a total number of bases that is less than 20%, 15%, 10%, or 5% in the one or more sequencing cycles.
  • the balanced diversity of nucleotide bases of A, G, C and T/U among the plurality of concatemer molecules comprises: a percentage of (1) a number of each type of nucleotide bases to (2) a total number of bases in the one or more cycles is more than 10%, 15%, or 20%.
  • base calls from the polonies include 4 different bases, and the percentage of polonies for each of the 4 different bases can be greater than about 10% so that the data are of balanced diversity.
  • bases called from the plurality of polonies include 4 or fewer different bases, and the percentage of polonies for one or more bases can be less than about 10%, and such data can be considered as unbalanced diversity.
  • bases called from the plurality of polonies include 4 or fewer different bases, and the percentage of polonies for some of the bases can be less than about 5%, about 2%, or even about 1%, and such data can be considered as unbalanced diversity.
  • the unbalanced diversity data include bases A, T, C, G in the plurality of polonies, and their percentages of the total base calls are about 1%, about 2%, about 1%, and about 95%, respectively.
  • plexity can also be a factor: when plexity is lower than a threshold number, e.g., 8 or 16, the signal is considered to be of unbalanced diversity.
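The balanced versus unbalanced distinction described above can be illustrated with a short sketch. The function names (`base_fractions`, `is_balanced`) and the default 10% threshold are illustrative assumptions taken from the example percentages in the text; they are not a definition from the disclosure.

```python
from collections import Counter

BASES = "ACGT"

def base_fractions(base_calls):
    """Return the fraction of polonies called as each of A, C, G, T in one cycle."""
    counts = Counter(base_calls)
    total = sum(counts[b] for b in BASES)
    if total == 0:
        raise ValueError("no A/C/G/T base calls provided")
    return {b: counts[b] / total for b in BASES}

def is_balanced(base_calls, min_fraction=0.10):
    """True when every base exceeds min_fraction of the calls (assumed threshold)."""
    return all(f > min_fraction for f in base_fractions(base_calls).values())

# Hypothetical cycle resembling the text's example: about 1% A, 2% T, 1% C,
# and the remainder G (96% here so that the counts sum to 100).
skewed_cycle = "A" * 1 + "T" * 2 + "C" * 1 + "G" * 96
# Hypothetical cycle with 25% of each base.
even_cycle = "ACGT" * 25
```

Under these assumptions, `is_balanced(even_cycle)` holds while `is_balanced(skewed_cycle)` does not, because three of the four bases fall below the 10% threshold.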
  • the method 500 is configured to predict high resolution flow cell images even if the polonies in the acquired flow cell images are of unbalanced diversity in one or more sequencing cycles.
  • the method 500 comprises an operation 520 of (ii) providing, by a processor, the first plurality of flow cell images as an input to a neural network (e.g., CNN), wherein the neural network is pre-trained using a training data set of training flow cell images using a training method 600 herein.
  • the neural network is pretrained so that the values of parameters of the neural network have been optimized based on the training.
  • the neural network may be retrained when needed, for example, for predicting flow cell images from different cellular samples.
  • the computer-implemented method 500 may be used to predict high resolution flow cell images that are at higher resolution than the first plurality of flow cell images (e.g., 2x, 4x, or 6x along at least one spatial dimension) acquired by the imager 116.
  • the high resolution flow cell images may be post image-processing images of the first plurality of flow cell images, e.g., by going through the image processing part 3120 of the neural network in FIG. 31.
  • Image processing herein may include various image processing steps including but not limited to: background removal, background reduction, artifact removal, artifact suppression, adjusting signal to noise ratio, adjusting contrast to noise ratio, intensity normalization, intensity offset correction, noise reduction, color correction, phasing or dephasing correction, image registration, and deconvolution.
  • the neural network in operation 520 is a first neural network that can be trained using method 700 disclosed herein.
  • the method 500 comprises an operation 520’ in replacement of the operation 520.
  • the operation 520’ includes (ii) providing, by a processor or a first reconfigurable logic device, the first plurality of flow cell images as an input to a neural network, wherein the neural network is pre-trained using a training data set of training flow cell images and reference base calls of the training dataset.
  • the operation 520’ is similar to the operation 520, e.g., as shown in FIG. 5A, with the exception of a different neural network.
  • the operation 520’ may replace the operation 520 in method 500.
  • the neural network in operation 520’ is a different neural network from that in operation 520.
  • the neural network in operation 520 is a first neural network
  • the neural network in operation 520’ is a second neural network that is different from the first neural network in operation 520.
  • the difference(s) between the first and second neural networks may include but are not limited to: different types of neural networks, differences in values of parameters, number of parameters, number of convolutional layers, number of layers, or a combination thereof.
  • the second neural network in operation 520’ is a different neural network that is pretrained using the same training data set of flow cell images as that used for training the first neural network in operation 520.
  • the second neural network in operation 520’ is a different neural network that is pretrained using a different training data set of flow cell images from that used for training the first neural network in operation 520. In some embodiments, the second neural network in operation 520’ is pre-trained using reference base calls of the training dataset.
  • the first neural network in operation 520 is pretrained using reference images or reference intensities as ground truths, e.g., reference high resolution images or reference intensities in high resolution images, and the second neural network in operation 520’ is pre-trained using reference base calls of the training flow cell images in the training datasets as ground truths.
  • the reference base calls may be generated using various base calling methods including those methods disclosed herein in relation to training methods for predicting base calls. In some embodiments, the reference base calls may be generated using methods that do not use a neural network. Exemplary embodiments of generating base calls from flow cell images are disclosed in U.S. Patent Application No. 18/078,820 and PCT Application No. PCT/US2023/076125, which are incorporated by reference in their entireties.
  • the second neural network in operation 520’ may be trained using a training method similar to method 700 in FIG. 5E. In such embodiments, the reference intensities are not used in operations, e.g., operations 725, 730, and 755. Instead, reference base calls are used in such operations, e.g., operations 725’, 730’, and 755.
  • the loss function for training the second neural network in operation 520’ may be different from the loss function used in training the first neural network in the operation 520. In some embodiments, various loss functions may be used for training the second neural network in operation 520’.
  • the second neural network is pre-trained using one or more loss functions based on comparing training base calls of the training flow cell images to the reference base calls of the training flow cell images.
  • the loss function for training the second neural network in operation 520’ may be based on comparison of training outputs, e.g., base calls, to the reference base calls.
  • training of the second neural network in operation 520’ may be completed when the loss function satisfies predetermined criteria.
  • the predetermined criteria can be customized to include various aspects of training outputs.
  • the predetermined criteria are determined based on the comparison of training base calls to reference base calls.
  • the predetermined criteria are based on the correctness of the training base calls in comparison to the reference base calls.
  • the predetermined criteria are at least based on training time that has been spent.
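A completion check combining the criteria listed above (correctness of training base calls against reference base calls, and training time spent) might be sketched as follows. The function names and the default thresholds (`min_accuracy`, `max_seconds`) are hypothetical; the text does not specify particular values or how the criteria are combined.

```python
def base_call_accuracy(training_calls, reference_calls):
    """Fraction of positions where the training base call matches the reference."""
    if len(training_calls) != len(reference_calls) or not reference_calls:
        raise ValueError("inputs must be non-empty and of equal length")
    matches = sum(t == r for t, r in zip(training_calls, reference_calls))
    return matches / len(reference_calls)

def training_complete(training_calls, reference_calls, elapsed_seconds,
                      min_accuracy=0.99, max_seconds=3600.0):
    """Stop when accuracy is high enough OR the time budget is spent.

    Both thresholds, and the OR combination, are assumptions for illustration.
    """
    accuracy = base_call_accuracy(training_calls, reference_calls)
    return accuracy >= min_accuracy or elapsed_seconds >= max_seconds
```

For example, `training_complete("ACGT", "ACGA", 10.0)` evaluates to false under these defaults (75% accuracy, little time spent), while the same calls with `elapsed_seconds=4000.0` satisfy the time-based criterion.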
  • FIG. 31 is a block diagram showing an exemplary embodiment of the first and second neural networks and the method for training such neural networks.
  • neural network 3110 may be any artificial intelligence-based or machine learning-based model that can include an image processing part 3120 and a base calling part 3130.
  • each of the image processing part 3120 and the base calling part 3130 may be any artificial intelligence-based or machine learning-based model that achieves functions similar to those of the neural network-based equivalent.
  • the neural network 3110 may be the first neural network in operation 520 or the second neural network in operation 520’.
  • the method for training the neural network 3110 may be method 700 as an example.
  • the neural network may include two separate parts: the first part is the image processing part 3120, and the second part is the base calling part 3130.
  • the image processing part 3120 is configured to perform one or more image processing steps disclosed herein, e.g., in relation to method 500, on the flow cell images herein, e.g., the first or second plurality of flow cell images.
  • the one or more image processing steps may include but are not limited to: background removal, background reduction, artifact removal, artifact suppression, adjusting signal to noise ratio, adjusting contrast to noise ratio, intensity normalization, intensity offset correction, noise reduction, color correction, phasing or dephasing correction, image registration, intensity extraction, and deconvolution.
  • the base calling part 3130 is configured to perform base calling using the output images 3150 from the image processing part 3120 of the neural network.
  • the base calling part 3130 may be configured to perform some image processing steps including but not limited to intensity extraction, color correction, and/or phasing or dephasing correction in embodiments where such image processing steps are not performed in the image processing part 3120 of the neural network.
  • the first or second part of the neural network 3120, 3130 may each include one or more structural elements of the neural network such as a convolutional layer.
  • the first or second part of the neural network 3120, 3130 may include one or more embedding layers of the neural network.
  • the first part of the neural network 3120 may include at least part of an encoder of the neural network
  • the second part of the neural network may include at least part of a decoder of the neural network.
  • the second part of the neural network may include at least part of: a convolutional layer, a pooling layer, a fully connected layer, a SoftMax layer, an input layer, an output layer, an embedding layer, an encoder, and a decoder of the neural network.
  • the base calling part 3130 may lack any structural element of a neural network, e.g., a convolutional layer or a pooling layer. In some embodiments, the base calling part 3130 may lack any artificial-intelligence based algorithm. In some embodiments, the base calling part 3130 may lack any convolutional layers of the neural network. In some embodiments, the base calling part 3130 may lack any part of an embedding layer or a decoder of the neural network. In some embodiments, the base calling part 3130 may lack any part of: a convolutional layer, a pooling layer, a fully connected layer, a SoftMax layer, an input layer, an output layer, an embedding layer, an encoder, and a decoder of the neural network.
  • the base calling part 3130 may only comprise non-neural network base calling algorithm(s).
  • the neural network 3110 that generates the output images 3150 is the second neural network in operation 520’.
  • the neural network 3110 that generates the output base calls 3160 is the neural network disclosed in relation to method 2800.
  • training of the neural network may include training of one or more parameters of the base calling part 3130.
  • the one or more parameters may include a feature size.
  • the back propagation for finding adjustments of values for parameters of the neural network 3110 goes through the base calling part 3130, with adjustment of parameters of the base calling part 3130, and then through the image processing part 3120 (also with adjustment of parameters), as shown by the solid gray line with arrow in FIG. 31.
  • training of the neural network does not include training of any parameters of the base calling part 3130.
  • the back propagation for finding adjustments of values for parameters of the neural network 3110 may go through the base calling part 3130 without making any adjustment to parameters of the base calling part 3130, and then through the image processing part 3120 (with adjustment of parameters), as shown by the solid gray line with arrow in FIG. 31.
  • training of the neural network may include training of the base calling part 3130 and the image processing part 3120, including adjusting parameters from both parts, as shown by the solid gray line with arrow in FIG. 31.
  • training of the neural network may only include training of the image processing part 3120 but not training of any of the parameters in the base calling part 3130.
  • the back propagation for updating, and thereby training, the parameters of the neural network 3110 goes directly to the image processing part 3120 without going through the base calling part 3130, as shown by the dotted gray line with arrow in FIG. 31.
  • the base calling part 3130 is not trained and the parameters in the base calling part 3130 are fixed.
  • the loss function may be based on the output of the base calling part 3130, and the value of the loss function may be determined based on the output of the base calling part.
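The case in which back propagation passes through a fixed base calling part while only the image processing part is adjusted can be illustrated with a deliberately tiny scalar model. Everything here is a toy assumption: each "part" is a single multiplication (`w` for the image processing part, `v` for the base calling part), the loss is a squared error, and the names `train_step` and `fit` are hypothetical.

```python
def train_step(w, v, x, target, lr=0.05, train_base_caller=False):
    """One gradient step on loss = (v * (w * x) - target) ** 2.

    w: parameter of the toy image processing part (always trained).
    v: parameter of the toy base calling part (frozen unless requested).
    """
    y = w * x            # output of the image processing part
    z = v * y            # output of the base calling part
    err = z - target
    loss = err * err
    # The chain rule carries the gradient through v even when v itself is fixed.
    grad_w = 2.0 * err * v * x
    grad_v = 2.0 * err * y
    new_w = w - lr * grad_w
    new_v = v - lr * grad_v if train_base_caller else v
    return new_w, new_v, loss

def fit(w, v, x, target, steps, lr=0.05, train_base_caller=False):
    """Repeat train_step and return the final (w, v, last_loss)."""
    loss = None
    for _ in range(steps):
        w, v, loss = train_step(w, v, x, target, lr, train_base_caller)
    return w, v, loss
```

With `fit(0.5, 2.0, 1.0, 3.0, steps=200)`, `v` stays at 2.0 while `w` converges toward 1.5, so the loss computed on the base calling output still drives the update of the image processing parameter. Setting `train_base_caller=True` corresponds to the variant in which both parts are adjusted.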
  • the input 3140 may go through the image processing part 3120 to generate the output images 3150.
  • the output images 3150 comprise the second plurality of flow cell images, e.g., disclosed herein in relation to method 500.
  • the output images 3150 comprise high resolution post-processing images corresponding to the input images 3140.
  • the output images may go through the base calling part 3130 to generate the base calls 3160.
  • the output images 3150 may go through various base calling algorithms, e.g., non-neural network based traditional base calling algorithms, but not the base calling part 3130 for generating the base calls.
  • the neural network 3110 advantageously reduces the time required to make predictions, and reduces the computational burden and power required to make the predictions, compared with existing neural networks that predict base calls.
  • the input images 3140 comprise raw flow cell images acquired at the imager 116. In some embodiments, the input images 3140 comprise the first plurality of flow cell images disclosed herein. In some embodiments, the input images 3140 may be from multiple color channels and multiple sequencing cycles. In some embodiments, the input images 3140 may be from multiple color channels and a single sequencing cycle. In some embodiments, the input images 3140 may be from a single color channel and multiple sequencing cycles. In some embodiments, the input images 3140 may be from a single z level or multiple z levels.
  • references or ground truths 3180 can be compared with the output base calls 3160, and the value of the loss function 3170 can be calculated based on such comparison.
  • the value of the loss function can then be used during training for back propagation into the neural network 3110, e.g., as gradients, for adjusting values of the parameters of the neural network 3110.
  • adjusting parameters of the neural network may include adjusting parameters of the base calling part 3130 and the image processing part 3120. In other words, both parts 3120, 3130 are trained during training of the neural network, e.g., using the training methods 700, 2900 herein.
  • the value of the loss function can then be back propagated into the neural network 3110 for adjusting values of the parameters of only the image processing part 3120, but not the base calling part 3130.
  • only the image processing part 3120, but not the base calling part 3130, is trained during training of the neural network, e.g., using the training methods 700, 2900 herein.
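As one possible instance of a loss computed by comparing output base calls with reference base calls, the sketch below uses a mean cross-entropy over per-polony base probabilities. Cross-entropy is an assumed choice here (the text only states that various loss functions may be used), and the name `cross_entropy` and the A, C, G, T probability ordering are illustrative.

```python
import math

BASES = "ACGT"

def cross_entropy(predicted_probs, reference_calls):
    """Mean negative log-likelihood of the reference base at each polony.

    predicted_probs: one [pA, pC, pG, pT] list per polony (assumed ordering).
    reference_calls: string of reference bases, one per polony.
    """
    if len(predicted_probs) != len(reference_calls) or not reference_calls:
        raise ValueError("inputs must be non-empty and of equal length")
    eps = 1e-12  # guards against log(0) for confidently wrong predictions
    total = 0.0
    for probs, ref in zip(predicted_probs, reference_calls):
        total -= math.log(max(probs[BASES.index(ref)], eps))
    return total / len(reference_calls)
```

A uniform prediction of 0.25 per base gives a loss of ln 4, and the loss approaches zero as the predicted probability of the reference base approaches 1.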
  • the neural network 3110 that is trained only on the image processing part 3120 is the second neural network in operation 520’.
  • the neural network 3110 that is trained on both the image processing part 3120 and the base calling part 3130 is the second neural network in operation 520’.
  • the second neural network in operation 520’ comprises a convolutional neural network. In some embodiments, the second neural network in operation 520’ comprises a recurrent neural network. In some embodiments, the second neural network in operation 520’ comprises a U-Net, residual U-Net, ResNet (residual neural network), and/or an LSTM (long short-term memory) neural network.
  • the training flow cell images are acquired only from a same color channel.
  • each of the training flow cell images comprises flow cell images of a same field of view from a plurality of sequencing cycles stacked along a time dimension.
  • the plurality of sequencing cycles may be of a same sequencing run.
  • the plurality of sequencing cycles may be consecutive sequencing cycles in the sequencing run.
  • each of the training flow cell images comprises flow cell images of a same field of view from one or more sequencing cycles.
  • each of the training flow cell images comprises flow cell images of the sample at one or more z-levels.
  • the training flow cell images comprise flow cell images of the sample at multiple different fields of view of the same sample.
  • the training flow cell images comprise flow cell images of the sample at multiple different fields of view of one or more sample(s).
  • the different fields of view may be at the same x, y, or z location of the same sample.
  • the different fields of view may be different subtiles of the sample at the same z location, but different x, y locations.
  • the multiple different fields of view may be adjacent to each other, with no or at least some spatial overlap with other fields of view.
  • each of the training flow cell images comprises flow cell images of different fields of view (e.g., adjacent FOVs of the same sample) from a plurality of sequencing cycles stacked along one or two spatial dimensions.
  • the training dataset for the neural network may only include flow cell images of the same color channel, so that the neural network is not trained on variations across different color channels that may be caused by differences in optical elements in response to different colored light signals (e.g., emission filter, illumination, etc.), differences in fluorescent dyes, or other factors of the sequencing system.
  • such variations may cause, without limitation, different background levels, different signal to noise ratios, different artifacts in the field of view, different full width at half maximum (FWHM) of emission light signals, different point spread functions (PSF), etc.
  • Training the neural network using flow cell images of the same color channel may advantageously remove fitting to variations across different color channels, and may simplify and speed up training the neural network and avoid possible errors in training.
  • a different neural network is trained with flow cell images of a corresponding color channel.
  • the neural network is trained to be a channel-specific neural network.
  • 3 different neural networks are trained using corresponding flow cell images of the corresponding color channels.
  • Each channel-specific neural network is used for prediction of high resolution flow cell images of the corresponding color channel.
  • a single neural network may be trained using flow cell images of such same colors from the two or more channels. Such neural network may be used to predict or make inferences of high resolution flow cell images from the two or more channels of the same color.
  • a different neural network may be trained using flow cell images of a single channel.
  • Each different neural network is a channel specific neural network that may be used for prediction or inference only of the corresponding channel.
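The channel-specific arrangement described above amounts to routing each image to the model trained for its color channel. A minimal sketch follows, in which `predict_per_channel` is a hypothetical helper and the models are arbitrary callables standing in for trained channel-specific neural networks.

```python
def predict_per_channel(images_by_channel, models_by_channel):
    """Apply each channel's own model to that channel's images.

    images_by_channel: dict mapping a channel label to a list of images.
    models_by_channel: dict mapping the same labels to callables (stand-ins
    for channel-specific neural networks).
    """
    outputs = {}
    for channel, images in images_by_channel.items():
        if channel not in models_by_channel:
            raise KeyError(f"no channel-specific model for channel {channel!r}")
        model = models_by_channel[channel]
        outputs[channel] = [model(image) for image in images]
    return outputs
```

Keeping one model per channel means a model is never applied to images from a channel it was not trained on; a missing model is reported as an error instead of silently falling back to another channel's network.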
  • the method 500 comprises an operation 530 of (iii) predicting, by the first reconfigurable device or an integrated circuit, a second plurality of flow cell images using the neural network, wherein each of the second plurality of flow cell images is with a second resolution and corresponds to a corresponding image of the first plurality of flow cell images, and wherein the second resolution is at least 2 to 32 times greater than the first resolution in one or more spatial dimensions.
  • the operation 530 of (iii) predicting the second plurality of flow cell images using the neural network comprises predicting high resolution post-processing images corresponding to the first plurality of flow cell images, wherein the processing comprises various image processing or intensity processing steps.
  • the processing steps may comprise one or more of: noise reduction; background reduction; background removal; artifact removal; artifact suppression; intensity offset correction; intensity normalization; adjusting signal to noise ratio; adjusting contrast to noise ratio; color correction; phasing and/or dephasing; image registration; and deconvolution.
  • predicting high resolution post-processing images corresponding to the first plurality of flow cell images may advantageously allow a higher resolution and higher image quality version of the first plurality of flow cell images to be generated, and the higher resolution, higher quality version may be used for generating more accurate and reliable base calls.
  • the second neural network of method 500, e.g., in operation 520’, may be trained using reference base calls as ground truths, rather than reference flow cell images or reference intensities.
  • the reference base calls may be generated using various methods including methods disclosed herein in relation to training neural network for predicting base calls herein.
  • the training may optimize at least some of the parameters of the neural network for producing training base calls that are similar enough to the reference base calls (e.g., as determined by the value of the loss function satisfying predetermined criteria).
  • the trained neural network may be used to predict high resolution post-processing images corresponding to the first plurality of flow cell images, and such high resolution post-processing images may be used to produce accurate and reliable base calls.
  • the embodiments of method 500 with the operation 520’ may improve base calling accuracy and reliability, reduce computational complexity in the prediction, free up storage space, and save power and time when compared with methods that predict base calls directly.
  • training the second neural network in the operation 520’ for each corresponding color channel in comparison with training of first neural network in the operation 520 with flow cell images from multiple color channels may require less computations, require less power consumption, require less memory or data storage, reduce training time, and avoid possible training failures.
  • FIG. 30A shows an exemplary flow cell image of the first plurality of flow cell images.
  • the exemplary flow cell image is of a 2D sequencing sample, and is acquired from one of the 4 different color channels.
  • the image size is 608 pixels by 608 pixels.
  • FIG. 30B is a high resolution image of the flow cell image in FIG. 30A, and it is predicted using method 500 with the second neural network in the operation 520’ herein.
  • the neural network is pretrained using a training data set comprising training flow cell images.
  • the neural network is pretrained using the training data set and reference base calls instead of reference intensities.
  • the high resolution image has a size of 1216 by 1216 pixels, which provides 2x the resolution of the flow cell image in FIG. 30A in the x and y directions.
  • the detectable polony density in the high resolution image is at least 2x, 4x, or more higher than that in the first plurality of flow cell images, e.g., FIG. 30A.
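The size relationship in this example (608 by 608 pixels in, 1216 by 1216 pixels out at 2x) can be checked with a small placeholder. Nearest-neighbor pixel repetition is used here purely to show the output dimensions; it is an assumption for illustration and is not the neural network prediction, which produces new image content and not repeated pixels. The name `upsample_nearest` is hypothetical.

```python
def upsample_nearest(image, factor):
    """Repeat each pixel `factor` times along x and y (dimension placeholder only)."""
    if factor < 1:
        raise ValueError("factor must be a positive integer")
    out = []
    for row in image:
        wide = [pixel for pixel in row for _ in range(factor)]
        # Copy the widened row so that output rows are independent lists.
        out.extend(list(wide) for _ in range(factor))
    return out

# A blank 608 x 608 stand-in for the flow cell image of FIG. 30A.
low_res = [[0] * 608 for _ in range(608)]
high_res = upsample_nearest(low_res, 2)
```

Here `high_res` has 1216 rows of 1216 pixels, matching the 2x relation stated for FIG. 30B.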
  • the neural network predicts the high resolution image with less background noise, blurriness, bright artifacts, etc.
  • the high resolution image in combination with other high resolution images from the other 3 color channels, can then be used together to determine base calls.
  • the error rate in determining polonies, and the error rate in making base calls can be lower using the high resolution image in FIG. 30B than using the flow cell image in FIG. 30A.
  • the prediction of high resolution images rather than prediction of base calls directly may advantageously reduce computational complexity, computation burden, power consumption, storage usage for performing base calls while maintaining or improving accuracy and reliability.
  • the method 500 includes an operation 540 of (iv) determining, by the processor, the first reconfigurable logic device, or the integrated circuit, polonies from the second plurality of flow cell images.
  • determining the polonies comprises determining locations of the polonies, locations of the center of the polonies, size of the polonies, or a combination thereof.
  • the location of the polonies, or the locations of the center of the polonies may be 2D or 3D.
  • the polonies exclude duplicate polonies.
  • the method 500 comprises an operation 550 of (v) performing, by the processor, the first reconfigurable logic device, or the integrated circuit, a corresponding base calling for each of the determined polonies based on the second plurality of flow cell images.
  • the second neural network in the operation 520’ when it is being trained, may include one or more layers, e.g., convolutional layers, for generating base calls from the high resolution post image processing images.
  • the second neural network, after it is trained, in operation 520’ may utilize only a subset of the layers of the second neural network being trained since the neural network only predicts the high resolution post image processing images but not the base calls.
  • the neural network, after it is trained, in operation 520’ may utilize only a subset of the layers of the neural network in operation 520’ since the neural network only predicts the high resolution post image processing images but not the base calls.
  • the pretrained second neural network using reference base calls may have a first number of layers, while a second number of the layers in the pretrained second neural network is used in operation 530 in predicting the high resolution flow cell images.
  • the second number of layers is less than the first number of layers.
  • the pretrained second neural network may have 5 layers with the first 4 layers for predicting high resolution post image processing images and the last layer for predicting base calls. In the operation 530, only the first 4 layers of the pretrained second neural network are used.
  • the operation 550 may use the last layer of the pretrained second neural network.
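The 5-layer example above, in which inference for images uses only the first 4 layers and base calling reuses the last layer, can be sketched with a list of callables. The layers below are arbitrary integer functions standing in for trained neural network layers, and `run_layers`, `predict_image`, and `call_base` are hypothetical names.

```python
def run_layers(layers, x, count=None):
    """Apply the first `count` layers in order (all layers when count is None)."""
    selected = layers if count is None else layers[:count]
    for layer in selected:
        x = layer(x)
    return x

# Hypothetical 5-layer network: 4 image processing layers, then 1 base calling layer.
image_layers = [lambda v: v + 1, lambda v: v * 2, lambda v: v - 3, lambda v: v * v]
base_call_layer = lambda v: "ACGT"[v % 4]
network = image_layers + [base_call_layer]

def predict_image(x):
    """Operation-530-style inference: only the first 4 layers are used."""
    return run_layers(network, x, count=4)

def call_base(image_output):
    """Operation-550-style step that reuses the last layer on the image output."""
    return network[4](image_output)
```

Running the two stages in sequence, `call_base(predict_image(x))`, gives the same result as running all 5 layers at once, while `predict_image` alone exposes the intermediate image output.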
  • the second neural network in operation 520’ utilizes the same number of layers as the second neural network being trained.
  • the neural network in operation 520’ may lack any neural network layers that are specific for generating base calls based on the high resolution post image-processing images.
  • the neural network in operation 520’ may rely on other non-neural network based algorithms or software for base calling of the high resolution post image-processing images.
  • the pretrained second neural network may have 5 layers with the first 4 layers for predicting high resolution post image processing images and the last layer for predicting base calls. In the operation 530, only the first 4 layers of the pretrained second neural network are used.
  • the operation 550 may use non-neural network based algorithms or software for base calling.
  • the neural network in operation 520’ has fewer convolutional layers than the neural network in operation 520. In some embodiments, the second neural network in operation 520’ has the same number of layers as the first neural network in operation 520.
  • the neural network herein has less than or equal to 18, 15, 12, 10, 8, 7, 6, 5, 4, 3, or 2 layers. In some embodiments, the neural network herein has 6, 5, 4, 3, or 2 layers. In some embodiments, the neural network has less than 256, 128, 96, 80, 64, or 32 features.
  • cycle N may be one of the reference cycle(s) for generating the polony map.
  • cycle N may be a cycle different from the reference cycle(s).
  • the polony map can be generated in the reference cycle(s) as a subsequent operation after the methods herein have improved the detectable polony density in flow cell images. Polonies from one or more channels within the reference cycle(s) can be included in the polony map in a reference coordinate system, while base calling of cycle N is yet to be performed.
  • cycle N is the current cycle.
  • N can be any non-zero integer.
  • N can be any integer from 1 to 150, from 1 to 200, or from 1 to 1000.
  • the polony map disclosed herein can include individual regions within a subtile or tile. Each polony map can include a plurality of polonies therein. In some embodiments, the polony map can be of about the same size as a flow cell image so that all the polonies, from different tiles, and from multiple channels, can be registered to the same polony map. However, such a polony map may contain polonies that will not be used in at least some operations described herein to reduce computational burden without sacrificing accuracy. In some embodiments, more than one polony map can be generated, and each corresponds to at least part of a subtile of a flow cell image from a channel. The more than one polony map may be tiled together in order to cover the entire sample region of the flow cell device.
  • the polony map disclosed herein can include polonies that are within individual cells or tissue, or on the membrane thereof. In some embodiments, the polony map disclosed herein can exclude polonies or signal spots that are outside cell boundaries. In some embodiments, the polony map disclosed herein can exclude duplicate polonies; such duplication may occur at different z-locations, with one or more duplicates in focus and/or out of focus in the flow cell images. The duplicate polonies may be within the same flow cell image or in different flow cell images.
  • The polony map herein can be initialized as a virtual image that has a black or dark background with no signals from polonies. For example, the polony map can be initialized to be zero or include otherwise minimal image intensity at all pixels.
  • the intensity of the polony can be added to the polony map at the location determined by the coordinates and with the size and shape determined based on registration.
  • the polony map can be a virtual image that combines image intensity from polonies obtained from 2, 3, 4, or even more channels at the reference cycle.
  • the pixels of the template that contain no polonies remain black or dark so that the polony map can have a cleaner background without the noise that appears in actual flow cell images.
  • the polony map includes a list of entries, and each entry corresponds to information for identifying a corresponding polony.
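The virtual-image form of the polony map (zero-initialized background, with polony intensities added at their registered coordinates) might be sketched as below. This is a simplification under stated assumptions: each polony is written to a single pixel, whereas the text describes size and shape determined by registration, and `render_polony_map` is a hypothetical name.

```python
def render_polony_map(height, width, polonies):
    """Build a zero-initialized virtual image and add each polony's intensity.

    polonies: iterable of (row, col, intensity) tuples in the reference
    coordinate system; entries outside the image bounds are skipped.
    """
    polony_map = [[0.0] * width for _ in range(height)]
    for row, col, intensity in polonies:
        if 0 <= row < height and 0 <= col < width:
            # Intensities from multiple channels at the same pixel accumulate.
            polony_map[row][col] += intensity
    return polony_map
```

Pixels that receive no polony keep their initial value of zero, which corresponds to the dark, noise-free background described above.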
  • each entry can include spatial coordinates of the corresponding polony center in the reference coordinate system, and image intensity of the polony.
  • the entry may also include a unique identification number of the polony.
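The list-of-entries form of the polony map can be represented with a small record type holding the fields named above (center coordinates in the reference coordinate system, image intensity, and a unique identification number). The class name `PolonyEntry`, the field names, and the sequential ID scheme are assumptions for illustration.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class PolonyEntry:
    polony_id: int    # unique identification number (assumed sequential)
    x: float          # polony center, reference coordinate system
    y: float
    z: float          # 0.0 can be used for a 2D polony map
    intensity: float  # image intensity of the polony

def build_polony_list(detections):
    """Assign sequential IDs to (x, y, z, intensity) detections."""
    return [PolonyEntry(i, *d) for i, d in enumerate(detections, start=1)]
```

For instance, `build_polony_list([(10.5, 20.0, 0.0, 130.0), (11.0, 42.5, 1.0, 88.0)])` yields two entries with IDs 1 and 2.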
  • the polonies can be from a subtile of flow cell images within a reference cycle, and more specifically, from one or more selected regions of the subtile.
  • the flow cell images can be from different channels of 1, 2, 3, 4, or more channels of the system 100.
  • a reference cycle can be any cycle of the first 5 or 6 cycles. In some embodiments, the reference cycle can be any cycle that is greater than 0. In some embodiments, the reference cycle is the first cycle.
  • the operation 540 comprises performing image processing step(s) to adjust image intensities of polonies.
  • the image processing steps comprise one or more of the following: background subtraction; image sharpening; intensity offset adjustment; color correction; intensity normalization; phasing and prephasing correction; image registration; quality score estimation, or the like.
  • the image registration is configured to align images from different cycles and/or different channels, for example, with respect to a template image (i.e., a polony map) or a reference coordinate system.
  • the image registration herein is configured to register polonies or clusters from different cycles and different channels, to a template image or a reference coordinate system.
  • the second plurality of flow cell images may be the output of the neural network.
  • the second resolution may be 2 to 32 times greater than the first resolution in one or more spatial dimensions.
  • the second resolution may be 4 to 32 times greater than the first resolution in 2D or 3D.
  • the operation 540 is based on a polony map that has been generated.
  • the polony map may be 2D or 3D.
  • the polony map has the second resolution.
  • the operation 540 comprises generating a polony map, and determining the polonies based on the generated polony map. The details of generating a 2D or 3D polony map have been disclosed in U.S. Patent Application Nos. 18/078,820 and 18/078,797, which are incorporated herein by reference in their entirety.
  • the base calling can be performed using polony locations in the second plurality of flow cell images from different channels in cycle N, after the second plurality of flow cell images from different channels are registered relative to the polony map disclosed herein.
  • Various existing 2D base calling algorithms can be used.
  • the base calling results can be saved with their 3D coordinates. Such 3D coordinates can be used to register the base calling across different cycles and at different z levels.
  • the method 500 can comprise an operation 550 of (v) performing, by the processor, a corresponding base calling for each of the determined polonies.
  • the operation 550 of performing base calling may be based on the second plurality of images generated in operation 530.
  • the operation 550 may be further based on the polony map determined in operation 540.
  • the base calling can be performed using intensity of the polonies from different channels per cycle per z level.
  • the method 500 may include an operation of saving the base calls obtained in operation 550 in a predetermined format, e.g., in a FastQ file compatible with subsequent operations so that subsequent analysis such as adaptor trimming and secondary analysis can be performed.
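As a hedged illustration of per-cycle base calling from channel intensities followed by saving in FASTQ form, the sketch below picks the brightest channel for each cycle. The channel-to-base mapping, the toy quality score, and the names `call_base` and `fastq_record` are assumptions made for this example only; the text leaves the base calling algorithm open.

```python
CHANNEL_TO_BASE = {0: "A", 1: "C", 2: "G", 3: "T"}  # hypothetical channel assignment


def call_base(channel_intensities):
    """Return (base, phred) for one polony in one cycle from its channel intensities."""
    ranked = sorted(range(len(channel_intensities)),
                    key=lambda c: channel_intensities[c], reverse=True)
    best, second = ranked[0], ranked[1]
    total = channel_intensities[best] + channel_intensities[second]
    purity = channel_intensities[best] / total if total > 0 else 0.5
    # Toy quality score (an assumption): map purity in [0.5, 1.0] onto Phred 2..40.
    phred = int(round(2 + (purity - 0.5) * 2 * 38))
    return CHANNEL_TO_BASE[best], phred


def fastq_record(read_id, intensities_per_cycle):
    """Build one four-line FASTQ record with Phred+33 quality encoding."""
    calls = [call_base(c) for c in intensities_per_cycle]
    seq = "".join(base for base, _ in calls)
    qual = "".join(chr(q + 33) for _, q in calls)
    return f"@{read_id}\n{seq}\n+\n{qual}\n"


record = fastq_record(
    "polony_1 x=10 y=20 z=2",
    [[900, 50, 40, 10], [20, 30, 800, 25], [10, 700, 15, 5]],
)
```

Carrying the x, y, and z coordinates in the read header is one way to keep the 3D location attached to each read for later registration.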
  • the neural network is a convolutional neural network (CNN).
  • the neural network is a U-Net.
  • the neural network comprises a U-Net with a first predetermined repetition of down-sampling and convolution operations and then a second predetermined repetition of up-sampling, concatenation, and convolution operations.
  • the first and second predetermined repetition can have an identical quantity, e.g., 3 or 4.
  • the neural network is a U-Net with a first predetermined number of filters in each repetition of down sampling, and then a second predetermined number of filters in each repetition of up sampling and/or concatenation.
  • the first predetermined number of filters can be 32, 64, 128, and 256 filters in three repetitions and the second predetermined number can be 128, 64, 64, and 32 filters in the corresponding three repetitions.
  • the first predetermined number of filters can be 32, 64, 128, and 256 filters in three repetitions and the second predetermined number can be 256, 128, 64, and 32 filters in the corresponding three repetitions.
  • the operation 530 may comprise: performing, by the processor, a first convolution in one or more dimensions on the first plurality of flow cell images, thereby generating a first convolution result; repetitively performing, for one or more times, down-sampling operations comprising: (a) performing, by the processor, a second convolution in one or more dimensions on the first convolution result, thereby generating a second convolution result; and (b) performing, by the processor, a down sampling of the second convolution result by a down sampling factor thereby generating a first down-sampled result.
  • the second convolution may comprise a corresponding number of filters, thereby generating a third convolution result after the repetitions.
  • the operation 530 may further comprise: performing, by the processor, the second convolution in one or more dimensions on the third convolution result, thereby generating a fourth convolution result; repetitively performing, for one or more times, up sampling operations comprising: (c) performing, by the processor, an up sampling of the fourth convolution result by an up sampling factor thereby generating a first up-sampled result; and (d) performing, by the processor, the second convolution in one or more dimensions of the first up-sampled result, thereby generating a fifth convolution result.
  • the second convolution may comprise a corresponding number of filters, thereby generating a sixth convolution result after the repetitions.
  • the first convolution comprises a 3D convolution with a convolution kernel.
  • the convolutional kernel may have 4 dimensions.
  • the convolutional kernel is m*m*m for the first three spatial dimensions and the size of its fourth dimension is determined by the filter number in the corresponding repetition.
  • m can be an integer in the range of 2 to 20.
  • the input can be 512x512 flow cell images, and the z-stack can have 12 slices.
  • the first convolution can include 32 filters and each filter has one kernel that is 3x3x3x1.
  • the output from that convolutional block is 512x512x12x32.
  • a double convolutional block i.e., the second convolution having two first convolutions with 32 filters.
  • the input to both of those blocks is 512x512x12x32 and the output is 512x512x12x32.
  • Each filter uses a kernel sized 3x3x3x32. The number of filters may correspond to features of the input.
  • the second convolution comprises two 3D convolutional layers, e.g., as shown in the pseudo code.
  • the second convolution comprises two repetitions or blocks of the first convolution in 3D; the usage of the output and the number of filters change, as the convolution process increases the depth of the image.
  • the depth of image may increase as the number of features or filters increases.
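The shape bookkeeping in the 3D example above can be checked with a few lines of arithmetic. This sketch assumes stride-1 convolutions with padding that preserves the spatial size (implied by the 512x512x12 output) and reads the last kernel dimension as the depth of the input to that layer; the helper names are hypothetical.

```python
def conv_output_shape(input_shape, num_filters):
    """Size-preserving, stride-1 convolution: spatial dims kept, depth = filter count."""
    *spatial, _channels = input_shape
    return (*spatial, num_filters)


def kernel_shape(m, input_shape):
    """Per-filter kernel: m along each spatial axis, then the input depth."""
    *spatial, channels = input_shape
    return (*([m] * len(spatial)), channels)


def weight_count(m, input_shape, num_filters):
    """Number of kernel weights in one layer (biases not counted)."""
    size = 1
    for d in kernel_shape(m, input_shape):
        size *= d
    return size * num_filters


x0 = (512, 512, 12, 1)          # 512x512 images, 12 z-slices, 1 input channel
k1 = kernel_shape(3, x0)        # kernel of the first convolution
x1 = conv_output_shape(x0, 32)  # output of the first convolution, 32 filters
k2 = kernel_shape(3, x1)        # kernel inside the double convolutional block
x2 = conv_output_shape(x1, 32)  # output of the double convolutional block
```

The depth of the feature map tracks the filter count, which is the sense in which the depth of the image increases as the number of filters increases.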
  • the first and second resolution is in 2D or 3D.
  • the first convolution comprises a 2D convolution with a convolution kernel.
  • the convolutional kernel may have 3 dimensions.
  • the convolutional kernel is m x m for the first two spatial dimensions and the size of its third dimension is determined by the filter number in the corresponding repetition.
  • m can be an integer in the range of 2 to 20.
  • the input can be flow cell images with a size of 512x512x1.
  • the first convolution can include 64 filters and each filter has one kernel that is 3x3x1.
  • the output from that convolutional block is 512x512x64.
  • a double convolutional block i.e., the second convolution having two first convolutions with 32 filters.
  • the input to both of those blocks is 512x512x64 and the output is 512x512x32.
  • Each filter can use a kernel sized 3x3x32.
  • the second convolution comprises two convolutional layers, e.g., as shown in the pseudo codes.
  • the second convolution comprises two repetitions or blocks of the first convolution; the usage of the output and the number of filters change, as the convolution process increases the depth of the image.
  • the depth of image may increase as the number of features or filters increases.
  • the first and second resolution is in 2D or 3D.
  • the second convolution in operation (a) comprises a corresponding number of n, 2*n, 4*n, and 8*n filters in a first, second, third, and fourth repetition, respectively.
  • the second convolution in operation (c) comprises a corresponding number of 2*n, 2*n, 4*n, 8*n filters in a last repetition, last minus one, last minus two, and last minus three repetition, respectively.
  • n can be an integer in the range from 8 to 256.
  • operation (a) comprises 32, 64, 128, and 256 filters in three repetitions
  • operation (c) comprises 128, 64, 64, and 32 filters in the corresponding three repetitions.
  • the second convolution in operation (c) comprises a corresponding number of n, 2*n, 4*n, 8*n filters in a last repetition, last minus one, last minus two, and last minus three repetition, respectively.
  • operation (a) comprises 32, 64, 128, and 256 filters in four repetitions
  • operation (c) comprises 256, 128, 64, and 32 filters in the corresponding four repetitions.
  • the second convolution in operation (c) comprises a corresponding number of n, 2*n, 4*n filters in a last repetition, last minus one, last minus two, repetition, respectively.
  • operation (a) comprises 32, 64, 128 filters in three repetitions and operation (c) comprises 128, 64, and 32 filters in the corresponding three repetitions.
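The mirrored filter counts in the examples above (n, 2*n, 4*n, ... on the way down and the reverse on the way up) can be generated as follows. This is only a sketch of the symmetric variant; `unet_filter_schedule` is a hypothetical helper, and the range check on n follows the 8 to 256 range stated above.

```python
def unet_filter_schedule(n, repetitions):
    """Return (down, up) filter counts per repetition for a symmetric U-Net."""
    if not 8 <= n <= 256:
        raise ValueError("n is expected to be an integer from 8 to 256")
    down = [n * 2 ** i for i in range(repetitions)]  # n, 2*n, 4*n, ...
    up = list(reversed(down))                        # ..., 4*n, 2*n, n
    return down, up


down4, up4 = unet_filter_schedule(32, 4)
down3, up3 = unet_filter_schedule(32, 3)
```

With n = 32 this reproduces the 32/64/128/256 and 256/128/64/32 pairing for four repetitions and the 32/64/128 and 128/64/32 pairing for three.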
  • the operation 530 may further comprise: performing, by the processor, the first convolution in one or more dimensions on the sixth convolution result, thereby generating a seventh convolution result; and predicting, by the processor, the second plurality of flow cell images based on the seventh convolution result.
  • Each of the second plurality of flow cell images may correspond to the corresponding flow cell image of the first plurality of flow cell images with a second resolution that is 2, 4, 6, 8, 10, 12, or 16 times greater than the first resolution in one or more spatial dimensions.
  • the second resolution is at least 4, 6, or 8 times greater than the first resolution in all three dimensions.
  • the first plurality of flow cell images are from a single color channel. In some embodiments, the first plurality of flow cell images are from one or more color channels. In some embodiments, the first plurality of flow cell images are of unbalanced nucleotide diversity in one or more sequencing cycles.
  • the cellular sample comprises overloaded concatemer molecules with a spatial density in a range of 10^2-10^15 per mm^2. In some embodiments, the cellular sample comprises overloaded concatemer molecules with a spatial density in a range of 10^3-10^10 per mm^2.
  • the first resolution is in a range of 0.1 um to 5 um. In some embodiments, the first resolution is in a range of 0.01 um to 10 um. In some embodiments, the second resolution is in a range of 0.02 um to 2 um. In some embodiments, the second resolution is in a range of 0.001 um to 3 um. In some embodiments, the down-sampling factor is 2, 4, 6, 8, 16, or more. In some embodiments, the up-sampling factor is 2, 4, 6, 8, 16, or more.
  • one or more of operations (ii) to (v) are performed while a sequencing run is being performed. In some embodiments, one or more operations (ii) to (v) are performed in parallel as the corresponding sequencing run to reduce sequencing analysis time.
  • the one or more cycles comprises a current cycle N.
  • N may be in a range from 1 to 150, 1 to 300, 1 to 500, or 1 to 1000.
  • one or more of operations (ii) to (v) are performed while the sequencing reactions in cycles subsequent to the current cycle N are yet to be performed or are currently being performed.
  • the training data set of training flow cell images comprises z-stacks of training flow cell images taken at different z-locations.
  • Each z-stack may represent an individual FOV of cellular sample(s).
  • the z-axis is orthogonal to image planes of the flow cell images.
  • the training data set of training flow cell images comprises flow cell images from multiple sequencing cycles.
  • One or more sequencing cycles may be of unbalanced diversity, so that the images appear dimmer or contain fewer polonies than images from sequencing cycles of high nucleotide diversity.
  • the number of polonies in the training flow cell images in a particular cycle may vary from 1% to 99% of a total number of polonies within a FOV of that cycle.
  • when the number of polonies in the training flow cell image of a particular cycle is from 1% to 5% or 1% to 10% of the total number of polonies within that cycle, the cycle is of low or unbalanced diversity.
  • when the number of polonies in the training flow cell image of a particular cycle is greater than 10% or 15% of the total number of polonies within that cycle, the cycle is of high or balanced diversity.
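The diversity criterion above reduces to comparing the fraction of visible polonies against a threshold. The sketch below assumes a single cut-off (10% by default, 5% being another value mentioned in the text); the function name and the returned labels are hypothetical.

```python
def classify_diversity(polonies_in_image, total_polonies, low_max=0.10):
    """Label a cycle 'low' or 'high' diversity from the fraction of polonies seen."""
    if total_polonies <= 0:
        raise ValueError("total_polonies must be positive")
    fraction = polonies_in_image / total_polonies
    # Assumption: fractions up to low_max count as low diversity.
    return "low" if fraction <= low_max else "high"


labels = [
    classify_diversity(30, 1000),                 # 3% of polonies visible
    classify_diversity(400, 1000),                # 40% of polonies visible
    classify_diversity(80, 1000, low_max=0.05),   # 8% with a stricter cut-off
]
```

A training set can then be split into the balanced and unbalanced subsets described above by grouping cycles on this label.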
  • the training data set of training flow cell images comprises flow cell images from multiple samples and multiple sequencing cycles, and the training flow cell images include a subset of flow cell images with unbalanced diversity in multiple sequencing cycles and another subset of flow cell images with balanced diversity in multiple sequencing cycles.
  • the training flow cell images from one or more cycles may be transformed from other training flow cell images from different cycle(s) to simulate the transformation that may occur across cycles within a same color channel.
  • the operation of performing, by the processor, the first convolution in one or more dimensions on the first plurality of flow cell images comprises: performing, by the processor, a first convolution in 3D on the first plurality of flow cell images, thereby generating a first convolution result.
  • operation (a) comprises performing, by the processor, the second convolution in 3D on the first convolution result, thereby generating a second convolution result.
  • the operation of performing, by the processor, the first convolution in one or more dimensions on the first plurality of flow cell images comprises: performing, by the processor, a first convolution in 2D on the first plurality of flow cell images, thereby generating a first convolution result.
  • operation (a) comprises performing, by the processor, the second convolution in 2D on the first convolution result, thereby generating a second convolution result.
  • repetitively performing, for one or more times, operations comprising (c) and (d) comprises: repetitively performing, for one or more times, operations comprising (c), (d), and (e), wherein (e) is after operation (c) and before operation (d), and wherein (e) comprises: concatenating, by the processor, the first up-sampled result in a current up-sampling repetition with the first down-sampled result in a previous down-sampling repetition, wherein the first up-sampled result has a same size as the first down-sampled result in the previous down-sampling repetition.
  • operation (e) is in each repetition.
  • repetitively performing, for one or more times, operations comprising (c) and (d) comprise: repetitively performing operations comprising (c), (d), and (e) in each repetition of one or more repetitions.
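The size-matching requirement for the concatenation in (e) can be traced with shape arithmetic alone. The sketch below assumes a sampling factor of 2, filter counts that double per repetition, and that the feature map stored for concatenation is the one whose spatial size matches the up-sampled result (here the convolution output before down-sampling, as in a conventional U-Net). `trace_unet_shapes` is a hypothetical helper.

```python
def trace_unet_shapes(spatial, base_filters, repetitions, factor=2):
    """Follow feature-map shapes through the down and up paths of a U-Net."""
    shape = (*spatial, base_filters)                 # after the first convolution
    skips = []
    for i in range(repetitions):
        filters = base_filters * 2 ** i
        shape = (*shape[:-1], filters)               # (a) second convolution
        skips.append(shape)                          # kept for a later concatenation
        shape = (*(d // factor for d in shape[:-1]), filters)   # (b) down-sample
    shape = (*shape[:-1], base_filters * 2 ** repetitions)      # bottleneck convolution
    trace = []
    for i in reversed(range(repetitions)):
        up = (*(d * factor for d in shape[:-1]), shape[-1])     # (c) up-sample
        skip = skips[i]
        assert up[:-1] == skip[:-1], "spatial sizes must match to concatenate"
        cat = (*up[:-1], up[-1] + skip[-1])          # (e) concatenate along depth
        shape = (*cat[:-1], base_filters * 2 ** i)   # (d) convolution
        trace.append((up, skip, cat, shape))
    return trace


trace = trace_unet_shapes((512, 512), base_filters=32, repetitions=3)
```

The internal assertion fails when a spatial dimension is not divisible by the factor at every level, which is why a dimension such as a short z-stack may need padding or a smaller factor along that axis.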
  • the kernel may take any size that is smaller than the size of the flow cell image undergoing the convolution.
  • the kernel can be 2 by 2 by 2, 3 by 3 by 3, 4 by 4 by 4, 5 by 5 by 5, or 6 by 6 by 6 in the first three spatial dimensions.
  • the kernel size can be customized to remove at least some of the noise and unwanted signal that are larger than the kernel size.
  • the kernel can be circular.
  • the kernel can be in various other shapes.
  • when the focus of the optical system includes a range, e.g., 0.1 um, 0.2 um, 0.3 um, 0.5 um, 0.6 um, 0.8 um, 1 um, 2 um, 3 um, 4 um, 5 um, etc.
  • Polonies or clusters that are within the range of focus can appear in-focus or about in-focus in the flow cell image.
  • Flow cell images at a specific z level can also include signals from polonies or clusters that are not within the focus range of the image, but at different z levels.
  • Such polonies or clusters are out-of-focus.
  • bigger and blurred signal spots represent out-of-focus polonies or clusters.
  • Some of the out-of-focus polonies or clusters are circled in FIG. 3A.
  • Each flow cell image at a specific z level can also include noise caused by the optical system and/or undesired signal from the sample.
  • the undesired signal can be signal coming from components of the sample such as membrane, cytosol, and mitochondria.
  • Such background objects can be any objects relatively larger in size than the polonies or clusters. As shown in FIG. 3A, there is a blurry cellular contour (at the arrows) in the flow cell image, and most of the signal spots are contained within the blurry contour.
  • background objects can include any objects that are within the 3D sample but are not polonies or clusters.
  • the method 500 includes an operation of registering the second plurality of flow cell images.
  • the images are registered across channels and/or across different cycles.
  • the images are registered before any base calling is performed in operation 550.
  • the images are registered across channels and different cycles before generating or obtaining the polony maps.
  • the images are registered across channels and different cycles before one or more primary analysis steps here.
  • the images can be registered after one or more preprocessing operations disclosed herein are performed.
  • Various image registration techniques can be used to register the images.
  • the images can be registered using 2D or 3D registration techniques.
  • the operation of registering the flow cell images is with respect to a reference coordinate system. In some embodiments, the operation of registering the flow cell images is with respect to one or more template images.
  • the operation of registering the images can comprise generating the one or more template images in a reference coordinate system. In some embodiments, the operation of registering the images can comprise registering polonies to template polonies in the one or more template images.
  • the operation of registering the images can comprise determining a plurality of transformations based on the one or more template images. Each of the plurality of transformations can correspond to a corresponding subtile of the flow cell images, the processed images, or the filtered images, and can be configured to register the subtile to the one or more template images. Each transformation can be used to register a corresponding subtile or tile to the one or more template images.
  • the plurality of transformations can comprise one or more affine transformations.
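A per-subtile affine registration, as described above, amounts to applying x' = A x + t to the polony coordinates of each subtile. The sketch below uses made-up matrices and offsets purely for illustration; how the transformations are estimated from the template images is not shown, and `apply_affine` is a hypothetical helper.

```python
def apply_affine(points, matrix, offset):
    """Map (x, y) points with a 2D affine transformation x' = A x + t."""
    (a, b), (c, d) = matrix
    tx, ty = offset
    return [(a * x + b * y + tx, c * x + d * y + ty) for x, y in points]


# One hypothetical transformation per subtile, keyed by subtile index.
subtile_transforms = {
    0: (((1.0, 0.0), (0.0, 1.0)), (2.0, -1.0)),    # pure translation
    1: (((1.01, 0.0), (0.0, 0.99)), (0.5, 0.25)),  # slight scaling plus a shift
}

polonies_by_subtile = {0: [(10.0, 20.0)], 1: [(100.0, 200.0)]}
registered = {
    s: apply_affine(pts, *subtile_transforms[s])
    for s, pts in polonies_by_subtile.items()
}
```

Using one small transformation per subtile lets local distortions be corrected without fitting a single global model to the whole image.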
  • the operation of registering the images can comprise performing image registration of the polonies based on fiducial markers.
  • the fiducial markers can be located on the flow cell. Alternatively, the fiducial markers can be external to the flow cell.
  • the image registration as an image processing step herein is configured to align images from different cycles and/or different channels, for example, with respect to a template image or a reference coordinate system.
  • the image registration herein is configured to register polonies or clusters from different cycles and/or different channels, e.g., in the filtered image, to a template image or a reference coordinate system.
  • the base calling can be performed using the filtered images from different channels in cycle N after the filtered images from different channels are registered relative to the corresponding template image disclosed herein.
  • the operation 540 can comprise an operation of extracting polony intensities based on the polony map.
  • the location information of such polony can be obtained from the polony map, e.g., 2D coordinates of the polony and the z level.
  • the corresponding flow cell image and its pixel(s) can be determined. Image intensity of such pixels can be extracted from the corresponding processed image after one or more image processing steps as intensity of such pixel for performing base calling.
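Extracting polony intensities from the polony map, as outlined above, can be sketched as a lookup: the map supplies (x, y, z) for each polony, and the intensity is read from the processed image of every channel at that location. The nested-list layout `processed_stack[channel][z][y][x]`, the single-pixel readout, and the function name are assumptions for this example.

```python
def extract_intensities(polony_locations, processed_stack):
    """Return {polony_id: [intensity per channel]} using locations from the polony map.

    polony_locations maps polony_id -> (x, y, z);
    processed_stack is indexed as [channel][z][y][x].
    """
    result = {}
    for polony_id, (x, y, z) in polony_locations.items():
        result[polony_id] = [channel[z][y][x] for channel in processed_stack]
    return result


# Two channels, two z levels, 2x2 pixels each (toy values).
processed_stack = [
    [[[1, 2], [3, 4]], [[5, 6], [7, 8]]],
    [[[10, 20], [30, 40]], [[50, 60], [70, 80]]],
]
polony_locations = {7: (1, 0, 0), 8: (0, 1, 1)}
intensities = extract_intensities(polony_locations, processed_stack)
```

The per-channel intensity lists produced this way are the inputs a base caller would consume for each polony in a given cycle.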
  • the operation of registering the flow cell images may be based on background objects in the flow cell images.
  • the background objects can be used to align the flow cell image to the cell images by using one or more transformation(s).
  • the cell staining images herein are staining images of the sample(s) immobilized on the support, with possible transformation (e.g., translation) from the sample(s) in the flow cell images.
  • the transformation may be represented by a single transformation of the whole image or be separated into multiple transformations, each representing a portion of the whole image.
  • the method 500 may include an operation of registering the base calling in 550 to the cell staining images.
  • registration may be based on fiducial markers.
  • fiducial markers can also be included in the cell staining images. Aligning the fiducial markers can generate the transformation(s) between the flow cell images or between flow cell images and cell staining images. The transformation(s) can be used to register or align polonies or clusters between the sequencing images and the cell images.
  • the simulated z-stack is 2048x2048x3, and each cell may include 200 to 2000 polonies.
  • the spatial resolution can be about 0.1 um.
  • Prediction is performed independently for each 512x512 region of the simulated z-stack.
  • the predicted high-resolution z-stack is 8192x8192x12.
  • FIGS. 2A-2C show simulated flow cell images, and two different predicted flow cell images with 4x resolution at different z-locations.
  • FIGS. 3A and 3D show two actual flow cell images at different z-locations in a 512x512x3 z-stack.
  • the predicted high resolution flow cell images (2048x2048) in FIGS. 3B-3C are at two different z-locations corresponding to the low resolution image in FIG. 3A.
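The sizes quoted above are consistent with a 4x increase along every axis and independent prediction on 512x512 regions. The following sketch only checks that arithmetic; `tile_origins` and `predicted_shape` are hypothetical helpers, and non-overlapping tiling is an assumption.

```python
def tile_origins(height, width, tile):
    """Top-left corners of non-overlapping tile x tile regions covering the image."""
    return [(r, c) for r in range(0, height, tile) for c in range(0, width, tile)]


def predicted_shape(shape, scale):
    """Shape after increasing the resolution by the same factor along every axis."""
    return tuple(d * scale for d in shape)


origins = tile_origins(2048, 2048, 512)          # regions predicted independently
per_tile = predicted_shape((512, 512, 3), 4)     # one region after prediction
full = predicted_shape((2048, 2048, 3), 4)       # whole simulated z-stack
```

Sixteen regions of 512x512x3 each become 2048x2048x12, and together they assemble into the 8192x8192x12 predicted z-stack.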
  • the neural network may be used to predict polony locations using z-stack(s) of flow cell images comprising flow cell images from multiple z-levels forming 3D volume(s).
  • m is in a range from 2 to 10
  • filters can be in a range from 8 to 1024
  • the fourth dimension of k size can match the number of filters in the corresponding repetition.
  • the input flow cell images can have various sizes in 3D as disclosed herein, e.g., 1024 by 1024 by 4.
  • b_1 = conv_block(inputs, filters, k_size)
  • for n = 1:m-1
  • u_n = upsampling3D(b_{m+n})
  • cats_n = concatenate(u_n, b_{m+1-n})
  • b_{m+n+1} = double_conv(cats_n, filters * 2^(m-n-1), k_size)
  • b_{2m+1} = conv_block(b_{2m}, filters, k_size)
  • the neural network may be used for predicting polony locations based on 2D flow cell images at different z-levels.
  • m is in a range from 2 to 10
  • filters can be in a range from 8 to 1024
  • k size can be in 3 dimensions
  • the third dimension of k size can match the number of filters in the corresponding repetition.
  • the input flow cell images can have various sizes in 2D as disclosed herein, e.g., 1024 by 1024, and there can be 3, 4, 5, or other numbers of z-levels.
  • for n = 2:m-1
  • u_n = upsampling2D(b_{m+n-2})
  • cats_n = concatenate(u_n, b_{m-n+1})
  • b_{m+n-1} = double_conv(cats_n, filters * 2^(m-n-1), k_size)
  • model = tf.keras.Model(inputs, outputs)
  • the methods and systems herein can be used to predict base calls for some or all polonies of the flow cell images.
  • the systems and methods herein advantageously use a neural network that is pretrained for predicting the base calls for polonies of flow cell images.
  • the same neural network may also be advantageously used, without additional training, to generate a polony map or a template image so that the locations of the predicted base calls can be determined.
  • the embodiments herein use a convolutional neural network as an example; however, it is understood that various other neural networks or machine learning models may also be used to achieve prediction of base calls using the systems and methods herein.
  • the methods for predicting base calls may include one or more operations here. When there are multiple operations involved, such operations may or may not be performed in the order that is described herein.
  • FIG. 28 shows a flow chart of a computer-implemented method 2800 for predicting base calls for flow cell images of biological samples, e.g., cellular samples, thereby enabling efficient and accurate primary analysis.
  • the method 2800 can include some or all of the operations disclosed herein. The operations may be performed in, but are not limited to, the order that is described herein.
  • the method 2800 can be performed by one or more processors disclosed herein.
  • the processor can include one or more of: a processing unit, e.g., a CPU, a reconfigurable logic device, an integrated circuit that is not reconfigurable, or their combinations.
  • the processing unit can include a central processing unit (CPU).
  • the reconfigurable logic device can include one or more FPGA devices.
  • the integrated circuit can include a chip such as an AI chip or an ASIC chip.
  • the processor can include the computing system 400.
  • some or all operations in method 2800 can be performed by the reconfigurable logic device, e.g., the FPGA(s), and/or the integrated circuit, e.g., the AI chip(s).
  • the data produced by the reconfigurable logic device and/or integrated circuit, e.g., the FPGA(s) after performing one or more operations can be communicated to various hardware elements of the system 100, e.g., CPU(s) or GPU(s), so that subsequent operation(s) in method 500, 600, 700, 2800, and 2900 can be performed by such various hardware using the communicated data.
  • data can also be communicated in the opposite direction from various hardware e.g., CPU(s), to the reconfigurable logic device or the integrated circuit for processing.
  • CPU(s) e.g., a central processing unit
  • all the operations in the methods herein can be performed by CPU(s).
  • the operations performed by CPU(s) can be performed by other processors such as the dedicated processors, or GPU(s).
  • all the operations in the methods herein can be performed by the reconfigurable logic device and/or the integrated circuit, e.g., FPGA(s) and/or the AI chip(s).
  • the sensor data acquired by the imager 116 may be directly communicated to the reconfigurable logic device and/or the integrated circuit, e.g., via DMA connections. In some embodiments, the sensor data acquired by the imager 116 may be directly communicated to the reconfigurable logic device and/or the integrated circuit without being routed first to a CPU, a GPU, or any other processing units before reaching the reconfigurable logic device and/or the integrated circuit.
  • making predictions or inferences using the method 2800 herein with the reconfigurable logic device, e.g., the FPGA, and/or other integrated circuit, e.g., AI chips, may require at least 2x, 8x, 10x, 15x, 20x, 40x, 50x, or 100x less power than making prediction(s) or inference(s) with the same neural network(s) with identical training images using other computing hardware including but not limited to CPUs or GPUs.
  • the sequencing system herein further comprises: a power source that is configured to supply identical or different power levels to the reconfigurable logic device and the integrated circuit.
  • a maximum power output of the power source to the sequencing system in performing methods 500, 600, 700, 2800, and/or 2900 is less than 2000 Watts, 1000 Watts, 900 Watts, 800 Watts, 700 Watts, 650 Watts, 600 Watts, 550 Watts, 500 Watts, 400 Watts, 300 Watts, 200 Watts, or 100 Watts.
  • the method 2800 can comprise an operation 2810 of (i) generating, by the sequencing system 110, a first plurality of flow cell images of sample(s) immobilized on a support by conducting one or more cycles of sequencing reactions.
  • the sample(s) may be traditional 2D sequencing samples containing biological analytes.
  • the sample(s) may be cellular or tissue samples.
  • the samples may comprise concatemer molecules therewithin.
  • the sample(s) may include concatemer molecules from one or more different sample sources.
  • the sample(s) may include a thickness along the z-axis so that the first plurality of flow cell images may be acquired at a z-stack of different z-locations with a first resolution to cover the cellular sample in 3D.
  • the sample can be in situ.
  • the sample can be a 3D sample.
  • the sample can be a volumetric sample that may contain different biological information at the same x-y location but different z level.
  • the sample can include multiple cells, tissue, or their combinations.
  • the 3D sample can be any biological sample that has a thickness that is greater than a predetermined threshold along the z axis. For example, the thickness can be greater than 1 um, 2 um, 3 um, 4 um, 5 um, 10 um, 20 um, or more.
  • the z axis is orthogonal to the image plane defined by x and y axes.
  • the sample can be traditional 2D sequencing samples.
  • the flow cell images can be acquired using the optical system of the imager 116 disclosed herein, from the 1, 2, 3, 4, or more channels.
  • Each flow cell image can include at least a portion of one or more tiles (e.g., imaging areas), and each tile can be divided into multiple subtiles.
  • Each tile or subtile can include a plurality of polonies or clusters.
  • Each subtile can include multiple regions with each region including a number of polonies.
  • the flow cell image as disclosed herein can be an image that is acquired from a flow cell 112 as shown in FIG. 1 or 2712 as shown in FIG. 27.
  • the flow cell images are acquired from a single color channel, and subsequent prediction is by using a pretrained neural network corresponding to that single channel.
  • the flow cell images are acquired from 2, 3, 4, or more color channels, and subsequent prediction is by using a pretrained neural network corresponding to the multiple color channels.
  • a flow cell image herein can be an image of one or more tiles, one or more subtiles, one or more segmented regions within tile(s) or subtile(s), or their combinations.
  • Each flow cell image can comprise a field of view (FOV).
  • the FOV can be orthogonal to the z axis.
  • the FOV can be within the x-y plane.
  • the FOV of different flow cell images at different z levels can be identical within the x-y plane.
  • the FOV of different flow cell images at different z levels can have at least an overlapping portion within the x-y plane.
  • the image resolution of different flow cell images at different z levels can be about identical or exactly identical.
  • FIGS. 3A and 3D show two exemplary flow cell images acquired at two different z levels along the z axis of a same 3D sample within a same sequencing cycle.
  • the FOV can be in 3D and be of various sizes to cover the volumetric sample to be imaged.
  • the FOV along x, y, and/or z direction can be in a range from 10 um to 5 mm.
  • the FOV along x, y, and/or z direction can be in a range from about 0.1 um to about 2 mm.
  • the FOV along x, y, and/or z direction can be in a range from 0.5 um to 1 mm.
  • the FOV can be about 0.5 mm by 0.5 mm by 20 um for certain cellular samples along the x, y, and z direction, respectively.
  • the flow cell images herein may be of various sizes, and the pixel number along x, y, and/or z axis may be any integer greater than 64 or 128.
  • the flow cell images herein may be of various sizes, and the pixel number along x, y, and/or z axis may be in a range from 2 to 65536.
  • a single flow cell image can be separated into different number of regions, for example, 4, 8, 16, or even more regions, and each region may include a size of 256 by 256 by 1, 512 by 512 by 3, or other sizes.
  • the number of pixels along x, y, and/or z direction may be adjusted to maintain a particular spatial resolution in a given FOV. For example, with a spatial resolution of 0.2 um, to cover a FOV of 0.8 mm, the number of pixels may be 4000.
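  • As an illustration of the arithmetic in the preceding example, the following is a minimal sketch that computes the pixel count along one axis from a FOV and a spatial resolution. The function name `pixels_for_fov` and its parameters are hypothetical and are not part of the disclosed system.

```python
import math


def pixels_for_fov(fov_um: float, resolution_um: float) -> int:
    """Pixels needed along one axis to cover fov_um at resolution_um per pixel.

    Hypothetical helper for illustration only.
    """
    if fov_um <= 0 or resolution_um <= 0:
        raise ValueError("fov_um and resolution_um must be positive")
    # Round the ratio first to absorb floating-point error, then round up
    # so that the whole FOV is covered.
    return math.ceil(round(fov_um / resolution_um, 9))


# 0.8 mm (800 um) FOV at 0.2 um per pixel, as in the example above.
print(pixels_for_fov(800.0, 0.2))
```

  • With the values from the example above (0.8 mm FOV, 0.2 um resolution), the sketch yields 4000 pixels along that axis.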
  • Each flow cell image at a specific z level may include intensities generated by polonies or clusters at the corresponding z level.
  • signals from polonies or clusters are small bright spots within the images.
  • Each bright spot can be of various sizes that are less than a couple of pixels, e.g., less than a pixel, about a pixel, about 2 pixels, 3 pixels, 4 pixels, 5 pixels, or more.
  • each signal spot of the polonies or clusters can be any number of pixels in the range from 0.01 pixel to about 100 pixels.
  • each signal spot of the polonies or clusters can be any number of pixels in the range from 0.1 pixel to about 16 pixels.
  • Each flow cell image can also include intensities generated by the cell and its structural elements. Such structural elements can be background objects or components, e.g., in FIG. 3A. Each flow cell image can also include noise and/or artifacts that are not from the polonies or cellular structures.
  • the depth of field of the optical system includes a range, e.g., 0.1 um, 0.2 um, 0.3 um, 0.5 um, 0.6 um, 0.8 um, 1 um, 2 um, 3 um, 4 um, 5 um, etc., extending along the z axis.
  • Polonies or clusters that are within the range of depth of field can appear in-focus or about in-focus in the flow cell image.
  • Flow cell images at a specific z level can also include signals from polonies or clusters that are not within the focus range of the image. Such polonies or clusters are out-of-focus. As shown in FIG. 3A, bigger and blurry signal spots represent out-of-focus polonies or clusters. Some of the out-of-focus polonies or clusters are circled in FIG. 3A.
  • Each flow cell image at a specific z level can also include noise caused by the optical system and/or undesired signal from the sample.
  • the undesired signal can be signal coming from components of the sample such as membrane, cytosol, and mitochondria.
  • Such background objects can be any objects that are relatively larger in size than the polonies or clusters. As shown in FIG. 3A, there is a blurry cellular contour (at the arrows) in the flow cell image, and most of the signal spots are contained within the blurry contour.
  • background objects can include any objects that are within the 3D sample but are not polonies or clusters.
  • base calls from the polonies include 4 different bases, and percentage of polonies for each of the 4 different bases can be greater than about 10% so that the data are relatively diverse.
  • bases called from the plurality of polonies include 4 or fewer different bases, and the percentage of polonies for one or more bases can be less than about 10%, and such data can be considered as data of unbalanced diversity.
  • bases called from the plurality of polonies include 4 or fewer different bases, and the percentage of polonies for some of the bases can be less than about 5%, about 2%, or even about 1%, and such data can be considered as data of unbalanced diversity.
  • the bases called for bases A, T/U, C, G in the plurality of polonies can be about 1%, about 2%, about 1%, and about 95%, respectively.
  • the base called for bases A, T/U, C, G in the plurality of polonies can be about 10%, about 10%, about 10%, and about 70%, respectively.
  • plexity can also be a factor: when plexity is lower than a number, e.g., 8 or 16, the signal could be of unbalanced diversity.
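  • The diversity criterion described above can be sketched as a simple check on the fraction of polonies called for each base. The function name `is_unbalanced`, the dictionary layout, and the strict comparison against a 10% threshold are assumptions for illustration; the description only states "about 10%" and does not fix how boundary values are treated.

```python
def is_unbalanced(base_fractions, min_fraction=0.10):
    """Return True if any of the four bases falls below min_fraction.

    base_fractions maps a base letter to the fraction of polonies called
    as that base (hypothetical layout; "T" also stands for U here).
    """
    bases = ("A", "T", "C", "G")
    # A base missing from the dictionary is treated as a fraction of 0.
    return any(base_fractions.get(b, 0.0) < min_fraction for b in bases)


# Roughly the 1% / 2% / 1% / 95% example from the text.
skewed = {"A": 0.01, "T": 0.02, "C": 0.01, "G": 0.95}
even = {"A": 0.25, "T": 0.25, "C": 0.25, "G": 0.25}
print(is_unbalanced(skewed), is_unbalanced(even))
```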
  • the method 2800 is configured to predict base calls of flow cell images, e.g., of a first resolution, even if the polonies in the flow cell images are of unbalanced nucleotide diversity in one or more sequencing cycles, and the base calls may be spatially aligned to the polonies of the flow cell images, of a second resolution.
  • the second resolution may be higher than the first resolution.
  • the method 2800 comprises an operation 2802 of (ia) generating, by a processor or a first reconfigurable logic device, a second plurality of flow cell images comprising a second resolution.
  • each of the second plurality of flow cell images corresponds to a corresponding flow cell image of the first plurality of flow cell images.
  • the second plurality of flow cell images may be generated using various up-sampling algorithms including but not limited to interpolation.
  • the second resolution may be greater than the first resolution in one or more spatial dimensions.
  • the second resolution may be at least 2 times greater than the first resolution in one or more spatial dimensions.
  • the second resolution may be 2 to 32 times greater than the first resolution in one or more spatial dimensions.
  • the second resolution may be 4 to 64 times greater than the first resolution in one or more spatial dimensions, e.g., along x, y, and/or z direction.
  • the second resolution may be at least 2 to 32 times greater than the first resolution in one or more spatial dimensions.
  • the second resolution may be at least 4 to 64 times greater than the first resolution in one or more spatial dimensions.
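  • As one possible illustration of up-sampling by interpolation, the sketch below uses nearest-neighbor interpolation with NumPy to increase the pixel count along each axis by an integer factor. The choice of nearest-neighbor interpolation and the function name `upsample_nearest` are assumptions; the description permits various up-sampling algorithms and does not specify this one.

```python
import numpy as np


def upsample_nearest(image, factors):
    """Nearest-neighbor up-sampling of a 2D or 3D image.

    factors holds one integer factor per axis, e.g., (2, 2) for a 2D
    image or (1, 4, 4) to leave z unchanged in a (z, y, x) volume.
    Hypothetical helper for illustration only.
    """
    image = np.asarray(image)
    if len(factors) != image.ndim:
        raise ValueError("need one factor per image axis")
    out = image
    for axis, factor in enumerate(factors):
        if factor < 1:
            raise ValueError("factors must be >= 1")
        # Repeat every pixel `factor` times along this axis.
        out = np.repeat(out, factor, axis=axis)
    return out


first = np.arange(4).reshape(2, 2)       # image at the first resolution
second = upsample_nearest(first, (2, 2))  # 2x along both x and y
print(second.shape)
```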
  • the method 2800 comprises an operation 2804 of (ii) providing, by a processor, the second plurality of flow cell images as an input to a neural network, e.g., a convolutional neural network (CNN), wherein the neural network is pretrained using a training data set of training flow cell images using a training method disclosed herein, e.g., 600, 700, 2900 herein.
  • the neural network is pre-trained so that the values of parameters (e.g., weights) of the neural network have been optimized based on the training.
  • the neural network may be retrained when needed, for example, for predicting flow cell images from different cellular samples.
  • the method 2800 may include image processing step(s) that can be performed on the first or second plurality of flow cell images, optionally prior to providing any input to the neural network.
  • the processing step(s) may include: intensity normalization, background subtraction, background removal, artifact reduction, artifact removal, adjustment of signal to noise ratio, adjustment of contrast to noise ratio, color correction, adjusting intensity offset, image registration, phasing and prephasing, filtering, segmentation, noise reduction, deconvolution (e.g., to differentiate neighboring or at least partly overlapping signal spots), or a combination thereof.
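  • Two of the listed processing steps, background subtraction and intensity normalization, can be sketched as follows. Estimating the background as a low percentile of the image and rescaling to the range 0 to 1 are assumptions chosen for illustration; the description does not specify how these steps are implemented, and both function names are hypothetical.

```python
import numpy as np


def subtract_background(image, percentile=10.0):
    """Subtract a flat background estimated as a low intensity percentile."""
    image = np.asarray(image, dtype=np.float64)
    background = np.percentile(image, percentile)
    # Clip at zero so that subtraction never produces negative intensities.
    return np.clip(image - background, 0.0, None)


def normalize_intensity(image):
    """Rescale intensities linearly to the range [0, 1]."""
    image = np.asarray(image, dtype=np.float64)
    lo, hi = image.min(), image.max()
    if hi == lo:
        # A constant image carries no contrast; return zeros.
        return np.zeros_like(image)
    return (image - lo) / (hi - lo)


raw = np.array([[10.0, 10.0], [10.0, 110.0]])
processed = normalize_intensity(subtract_background(raw))
print(processed)
```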
  • the method 2800 comprises an operation 2804’ of providing, by the processor, the first reconfigurable logical device, or the integrated circuit, the first or the second plurality of flow cell images to a polony map generation algorithm or a base calling algorithm.
  • the polony map generation algorithm and the base calling algorithm do not include a trained neural network or an artificial intelligence-based algorithm.
  • Exemplary polony map generation algorithms for generating 2D or 3D polony maps and base calling algorithms for generating base calls have been disclosed in U.S. Application No. 18/078,797 and 18/078,820, and U.S. Patent No. 10,266,888, and are incorporated herein by reference in their entireties.
  • the method 2800 comprises an operation 2806 of (iia) determining, by the first reconfigurable device or the integrated circuit, the polony map based on the second plurality of flow cell images.
  • the operation 2806 can be based on the operation of 2804 in some embodiments, and based on the operation of 2804’ in some other embodiments.
  • the polony map is 3D. In some embodiments, the 3D polony map includes multiple 2D polony maps at different z levels. In some embodiments, the 3D polony map has the second resolution. In some embodiments, the polony map is generated using a polony map generation algorithm. In some embodiments, the polony map generation algorithm lacks any neural network or artificial intelligence based algorithms. In some embodiments, the polony map generation algorithm lacks any neural network that has been pretrained and can predict base calls in operation 2812 without additional training for predicting the polony map. In some embodiments, the polony map generation algorithm utilizes traditional algorithms that lack artificial intelligence.
  • the neural network in operation 2804-2806 is the same pretrained neural network used in operation 2812.
  • the same pretrained neural networks may include identical parameters, layers, and neural network structures therewithin.
  • the same pretrained neural networks may include an identical number of parameters, an identical number of layers, and neural network structures therewithin.
  • the method 2800 may further comprise an operation to train the neural network before operations 2804 and 2806.
  • the neural network is trained before operation 2804 and 2806, e.g., using method 700 or 2900 disclosed herein.
  • the pretrained neural network may be used to predict polony locations, polony shape and/or size, polony center locations, or equivalently the polony map.
  • the method 2800 may further include one or more operations in method 500, e.g., operations 530, 540, and/or 550 for predicting locations of the polonies, thus predicting the polony map.
  • the same neural network used in operations 2804, 2806, and 2812 may be trained using identical training data including identical flow cell images of samples.
  • the identical training data may also include identical “ground truths” or references in training.
  • the same neural networks may comprise identical values for parameters, identical number of layers, and identical neural network structures.
  • the same neural network used in operations 2804, 2806, and 2812 may be trained using at least a different portion of the identical training data.
  • the same neural networks may comprise identical parameters with identical or different values for such parameters, identical layers, and identical neural network structures. Training of the same neural network may be performed before operation 2804 and does not require retraining the neural network after operation 2806 and before operation 2812. The pretrained neural network may then be used in operations 2804-2806 and operation 2812 without retraining to allow fast and efficient prediction of the base calls using method 2800.
  • the pretrained neural network may be used in operations 2804-2806 to update an existing polony map.
  • the existing polony map may be generated in an earlier cycle of the sequencing run.
  • the predicted polony map using the pretrained neural network may be used to update the existing polony map in a later cycle of the sequencing run.
  • an initial polony map may be generated by a non-neural network algorithm in the first cycle or first several cycles, e.g., cycles 1-4, of the sequencing run.
  • the neural network may be trained using data of the first cycles or a number of cycles, e.g., cycles 1-4 or cycles 1-5.
  • the pretrained neural network then can be used to predict a second polony map that can be used to update the initial polony map.
  • the second polony map may advantageously reselect more accurate and reliable locations of the polonies for making predictions of base calls, intensities, or classifications, e.g., in operation 2812.
  • Such prediction may be repeated by training the neural network with different cycles that have been completed in the sequencing run to improve reselection of polony locations.
  • the trained neural network may be retrained using data of cycles 1-6 or 1-7 following the training using data from cycles 1-4, and make another prediction of the polony map after the training.
  • the same neural network used in operation 2806 and 2816 may be trained using different reference information as the “ground truth” in training.
  • the training of the neural network for predicting polony locations may use reference intensities as the “ground truth,” while the training of the neural network for predicting base call may use reference base calls as the “ground truth.”
  • the same neural network used in operations 2804-2806 and 2816 may be trained using identical reference information as the “ground truth” in training.
  • the training of the neural network for predicting polony locations may use reference intensities as the “ground truth,” and the training of the neural network for predicting base call may use reference base calls that can be determined based on such reference intensities.
  • the same neural network may be trained to predict base calls using various training methods, e.g., method 2900 disclosed herein.
  • Reference base calls may be used for the training of the neural network.
  • the reference base calls used in training, e.g., using method 2900 may include spatial information thereof.
  • the reference base calls used in training, e.g., using method 2900 may be of a first resolution, a second resolution, or a third resolution. In some embodiments, the third resolution can be higher than the first and second resolution.
  • the same neural network may be trained to predict base calls, e.g., using method 2900. After being trained, such neural network may be used to predict base calls of the second plurality of flow cell images. The prediction of base calls can then be processed for determining locations of the polonies, thereby generating the polony map.
  • the polony map may be determined as the locations at which the base calls are predicted with a probability satisfying a predetermined threshold.
  • the polony locations, thus the polony map may be determined as the locations in which one or more quality metrics satisfy a predetermined threshold.
  • quality metrics can include but are not limited to maximum, median, or average intensity of the polony among different color channels, a Q score of the base call, a clarity of the base call, and a purity of the base call.
  • the second plurality of flow cell images may be used for generating the polony map at the second resolution using operations 2804-2806 or operations 2804’-2806.
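  • The threshold-based determination of polony locations described above can be sketched as follows, assuming the neural network outputs a per-pixel probability for each of four bases. The array layout, the base ordering, the threshold value of 0.9, and the function name `polony_locations` are all assumptions for illustration.

```python
import numpy as np

BASES = "ATCG"  # assumed ordering of the last axis of the probability array


def polony_locations(probs, threshold=0.9):
    """Return (row, col, base) for pixels whose best base probability
    satisfies the threshold.

    probs has shape (H, W, 4) with per-pixel base probabilities
    (hypothetical layout).
    """
    probs = np.asarray(probs, dtype=np.float64)
    best = probs.max(axis=-1)        # highest probability at each pixel
    best_idx = probs.argmax(axis=-1)  # index of the corresponding base
    rows, cols = np.nonzero(best >= threshold)
    return [(int(r), int(c), BASES[best_idx[r, c]]) for r, c in zip(rows, cols)]


probs = np.full((2, 2, 4), 0.25)          # uninformative everywhere...
probs[0, 1] = [0.02, 0.95, 0.02, 0.01]    # ...except one confident pixel
print(polony_locations(probs))
```

  • The same pattern applies to other quality metrics: replace the probability array with the metric of interest and keep the locations at which it satisfies the predetermined threshold.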
  • the method may include an operation of generating the polony map based on the first plurality of flow cell images at the first resolution, and an operation of up-sampling to generate the polony map at the second resolution after operation 2810.
  • the first plurality of flow cell images may be provided instead of the second plurality of flow cell images in operation 2804 or operation 2804’ and then the operation 2806 may be replaced by an operation of determining the polony map based on the first plurality of flow cell images.
  • cycle N may be one of the reference cycle(s) for generating the polony map.
  • cycle N may be a cycle different from the reference cycle(s).
  • the polony map can be generated in the reference cycle(s) as a subsequent operation after the methods herein have improved the detectable polony density in flow cell images. Polonies from one or more channels within the reference cycle(s) can be included in the polony map in a reference coordinate system, while base calling of cycle N is yet to be performed.
  • cycle N is the current cycle.
  • N can be any non-zero integer.
  • N can be any integer from 1 to 150.
  • N can be any integer from 1 to 20, 1 to 200, 1 to 300, 1 to 500, or 1 to 1000.
  • the polony map disclosed herein can include individual regions within a tile or subtile. Each polony map can include a plurality of polonies therein. In some embodiments, the polony map can be of about the same size as a flow cell image so that all the polonies, from different tiles, and from multiple channels, can be registered to the same polony map. However, such a polony map may contain polonies that will not be used in at least some operations described herein to reduce computational burden without sacrificing accuracy. In some embodiments, more than one polony map can be generated, and each corresponds to at least part of a subtile of a flow cell image from a channel. The more than one polony map may be tiled together in order to cover the entire sample region of the flow cell device.
  • the polony map disclosed herein can include polonies that are within individual cells or tissue, or on the membrane thereof. In some embodiments, the polony map disclosed herein can exclude polonies or signal spots that are outside cell boundaries. In some embodiments, the polony map disclosed herein can exclude duplicate polonies, such duplication may occur at different z-locations, with one or more in-focus and/or out-of-focus in the flow cell images. The duplicate polonies may be within the same flow cell image or in different flow cell images.
  • the polony map herein can be initialized as a virtual image that has a black or dark background with no signals from polonies.
  • the polony map can be initialized to be zero or include otherwise minimal image intensity at all pixels.
  • the intensity of the polony can be added to the polony map at the location determined by the coordinates and with the size and shape determined based on registration.
  • the polony map can be a virtual image that combines image intensity from polonies obtained from 2, 3, 4, or even more channels at the reference cycle.
  • the pixels of the template that contain no polonies remain black or dark so that the polony map can have a cleaner background without the noise that appears in actual flow cell images.
  • the polony map includes a list of entries, and each entry corresponds to information for identifying a corresponding polony.
  • each entry can include spatial coordinates of the corresponding polony center in the reference coordinate system, and image intensity of the polony.
  • the entry may also include a unique identification number of the polony.
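  • The two representations of the polony map described above, a list of entries and a virtual image initialized to zero, can be sketched together as follows. The class name `PolonyEntry`, its field names, and the single-pixel deposit of each polony's intensity are assumptions for illustration; the description states that size and shape are determined based on registration, which this sketch does not model.

```python
from dataclasses import dataclass

import numpy as np


@dataclass
class PolonyEntry:
    """One polony map entry (hypothetical field names)."""
    polony_id: int    # unique identification number
    x: float          # center coordinates in the reference coordinate system
    y: float
    z: float
    intensity: float  # image intensity of the polony


def render_polony_map(entries, shape):
    """Render entries into a virtual image of shape (Z, Y, X).

    The image starts at zero (dark background); each polony's intensity
    is added at the pixel nearest to its center.
    """
    image = np.zeros(shape, dtype=np.float64)
    for e in entries:
        zi, yi, xi = int(round(e.z)), int(round(e.y)), int(round(e.x))
        inside = 0 <= zi < shape[0] and 0 <= yi < shape[1] and 0 <= xi < shape[2]
        if inside:
            image[zi, yi, xi] += e.intensity
    return image


entries = [PolonyEntry(polony_id=1, x=2.0, y=1.0, z=0.0, intensity=5.0)]
virtual = render_polony_map(entries, shape=(1, 3, 4))
print(virtual[0])
```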
  • the polonies can be from a subtile of flow cell images within a reference cycle, and more specifically, from one or more selected regions of the subtile.
  • the flow cell images can be from different channels of 1, 2, 3, 4, or more channels of the system 100.
  • a reference cycle can be any cycle of the first 5 or 6 cycles.
  • the reference cycle can be any cycle that is greater than 0.
  • the reference cycle is the first cycle.
  • the processing steps herein comprise performing image processing step(s) herein to adjust image intensities of polonies.
  • the image processing steps comprise one or more of the following: background subtraction; image sharpening; intensity offset adjustment; color correction; intensity normalization; phasing and prephasing correction; image registration; quality score estimation; or the like.
  • the image registration is configured to align images from different cycles and/or different channels, for example, with respect to a template image (i.e., a polony map) or a reference coordinate system.
  • the image registration herein is configured to register polonies or clusters from different cycles and different channels, to a template image or a reference coordinate system.
  • the method 2800 can comprise an operation 2812 of: (iii) predicting, by the first reconfigurable device or the integrated circuit, one or more base calls corresponding to one or more polonies of the second plurality of flow cell images using the neural network; or predicting, by the first reconfigurable device or the integrated circuit, one or more classifications corresponding to one or more pixels of the second plurality of flow cell images using the neural network.
  • the operation 2812 of performing base calling may be based on the second plurality of flow cell images.
  • the operation 2812 may be further based on the determined polony map in operation 2804 or 2804’.
  • the second plurality of flow cell images may be from one or more color channels, one or more z levels, and/or one or more cycles.
  • the prediction of base calls in operation 2812 can be performed using intensity of the polonies.
  • the second plurality of flow cell images may be from a single color channel, a single z level, and/or a single cycle.
  • the prediction of base calling can be performed using intensity of the polonies from a single color channel and one or more cycles. For example, flow cell images acquired from each color channel of the multiple color channels in multiple cycles may use a different pre-trained neural network for predicting the polony intensity of the corresponding channel.
  • the prediction in operation 2812 of base calling can then be performed using intensities of the polony from different color channels.
  • flow cell images from a single z level may require a different pre-trained neural network for predicting the base calls from a different z level using operation 2812.
  • prediction of base calling in operation 2812 can be performed using intensity of the polonies from different color channels, multiple z levels, and multiple cycles.
  • prediction of base calling in operation 2812 can be performed using intensity of the polonies from different color channels, a single z level, and multiple cycles.
  • prediction of base calling in operation 2812 can be performed using intensity of the polonies from a single color channel, one or more z levels, and one or more cycles.
  • prediction of base calling in operation 2812 can be performed using intensity of the polonies from one or more color channels, one or more z levels, and one or more cycles.
  • the operation 2812 of (iii) may include generating outputs that include base calls, e.g., A, T, C, G, and/or U for one or more pixels of the second plurality of flow cell images.
  • the one or more pixels may be determined using a polony map or a location list of polonies disclosed herein so that each pixel of the one or more pixels is comprised in at least one polony in the polony map.
  • the operation 2812 of (iii) may comprise generating outputs that include base calls, e.g., A, T, C, G, and/or U for one or more pixels of the second plurality of flow cell images.
  • the operation 2812 of (iii) may comprise generating outputs that include classifications, e.g., A, T, C, G, U, and/or background for one or more pixels of the second plurality of flow cell images.
  • the one or more pixels may include pixels that are not included in the polony map or the location list disclosed herein. For example, the one or more pixels may include all pixels within the FOV of the second plurality of flow cell images.
  • the one or more pixels include at least one pixel that is not comprised in any polony of the polony map. In some embodiments, the one or more pixels include at least one pixel that is comprised in the background of the polonies and comprises noise signal(s). In some embodiments, the one or more pixels include at least one pixel that is not comprised in any polony in the polony map and at least one pixel that is comprised in at least one polony in the polony map. In some embodiments, the one or more pixels include at least one pixel that is not within a cell membrane or on the cell membrane.
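  • Per-pixel classification over all pixels in the FOV, including pixels outside any polony, can be sketched as an argmax over class scores. The five-class set (A, T, C, G, background), the score array layout, and the function name `classify_pixels` are assumptions for illustration; the scores stand in for the output of the neural network, which is not modeled here.

```python
import numpy as np

CLASSES = ("A", "T", "C", "G", "background")  # assumed class ordering


def classify_pixels(scores):
    """Map per-pixel class scores of shape (..., 5) to class labels.

    Every pixel receives a label, so pixels outside any polony can be
    classified as background rather than being skipped.
    """
    scores = np.asarray(scores, dtype=np.float64)
    if scores.shape[-1] != len(CLASSES):
        raise ValueError("last axis must have one score per class")
    idx = np.argmax(scores, axis=-1)
    return np.array(CLASSES, dtype=object)[idx]


scores = np.array([[[0.1, 0.1, 0.1, 0.1, 0.6],
                    [0.7, 0.1, 0.1, 0.05, 0.05]]])
print(classify_pixels(scores))
```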
  • FIGS. 3E-3F show a comparison of the accuracy of identifying transcripts (corresponding to polonies) using the neural network methods herein (“new algorithm”), e.g., 2800, and a traditional non-neural network based algorithm (“POR-YOLO”).
  • simulated flow cell images of in situ sample with multiple cells are used. Each area may include a number of targets ranging from 0 to 4000. Such targets can be transcripts.
  • the neural network herein, e.g., in method 2800, and a classic non-neural network based algorithm are used to predict/detect transcripts in such cells. And the prediction/determination is then compared with ground truths (or equivalently, the reference polony map) for accuracy.
  • the correct number of targets per area is higher using the neural network and method disclosed herein than using the non-neural network based algorithm.
  • the detected targets per area using the neural network and methods herein are much higher (2x or 3x higher) than that detected by the non-neural network based algorithm when the target density per area is greater than 2000 per area.
  • FIG. 3F shows the false negatives per cell for both the neural network (“new algorithm”) and the non-neural network based algorithm (“POR-YOLO”).
  • the false negatives per area using the neural network and methods herein are much lower (10x or more) than those of the non-neural network based algorithm when the target density per area is greater than 1000 per area.
  • FIG. 3G shows comparison of accuracy of identifying transcripts using the methods, e.g., 2800, and a traditional non-neural network based algorithm.
  • simulated flow cell images of in situ sample with multiple cells are used. Each cell may include a number of transcripts ranging from 0 to 6000.
  • the neural network herein, e.g., in method 2800, and a classic non-neural network based algorithm are used to predict/detect transcripts in such cells. And the prediction/determination is then compared with ground truths (or equivalently, the reference polony map) for accuracy.
  • the R² values show correlations of the prediction/determination with the references.
  • the neural network and method herein, e.g., method 2800, showed consistently higher correlation across all the different numbers of transcripts per cell than the correlation using the classic non-neural network based algorithm, thereby indicating higher accuracy in identifying polonies or clusters in flow cell images (e.g., transcripts) of in situ samples.
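  • A correlation of the kind reported above can be computed as sketched below. The description does not state how its correlation values were calculated; this sketch assumes the squared Pearson correlation coefficient between predicted and reference transcript counts per cell, and the function name `r_squared` and the sample counts are hypothetical.

```python
import numpy as np


def r_squared(predicted, reference):
    """Squared Pearson correlation between predicted and reference counts."""
    predicted = np.asarray(predicted, dtype=np.float64)
    reference = np.asarray(reference, dtype=np.float64)
    if predicted.shape != reference.shape:
        raise ValueError("inputs must have the same shape")
    # np.corrcoef returns the 2x2 correlation matrix; take the off-diagonal.
    r = np.corrcoef(predicted, reference)[0, 1]
    return float(r ** 2)


# Hypothetical per-cell transcript counts, for illustration only.
reference_counts = [100, 500, 1500, 3000, 6000]
predicted_counts = [95, 480, 1450, 2900, 5700]
print(r_squared(predicted_counts, reference_counts))
```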
  • the method 500 or 2800 may include an operation of determining a biological analyte including but not limited to a morphological feature, a transcript, an RNA, an mRNA, a protein, or their combinations based on the base calling or classification of the polony in one or more sequencing cycles.
  • base calling or classification sequence of a polony in 6 consecutive sequencing cycles of ATTCGA may indicate a cellular protein that may be labeled by the unique barcode of “ATTCGA.”
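  • The barcode example above can be sketched as a lookup of the per-cycle base calls in a codebook. The codebook contents, the analyte name, and the function name `decode_barcode` are hypothetical and serve only to illustrate the mapping from a called sequence to a labeled analyte.

```python
def decode_barcode(calls_per_cycle, codebook):
    """Join per-cycle base calls into a barcode and look up its analyte.

    Returns None when the called sequence is not a known barcode.
    """
    barcode = "".join(calls_per_cycle)
    return codebook.get(barcode)


# Hypothetical codebook: barcode sequence -> labeled analyte.
codebook = {"ATTCGA": "hypothetical protein X"}

# Base calls of one polony in 6 consecutive sequencing cycles.
calls = ["A", "T", "T", "C", "G", "A"]
print(decode_barcode(calls, codebook))
```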
  • the method 2800 further includes an operation (iv) of: in response to determining that a first pixel of the one or more pixels has a predicted classification that is different from a background (e.g., the classifications may include A, T, C, G, U, or background), determining a first morphological feature, a first RNA or mRNA, or a first protein based on the one or more predicted classifications.
  • the method 2800 further includes an operation (v) of: in response to determining that a second pixel of the one or more pixels has a predicted classification that is different from the background classification (e.g., the classifications may include A, T, C, G, U, or background), determining a second morphological feature, a second RNA or mRNA, or a second protein based on the one or more predicted classifications.
  • the method 2800 further includes an operation (iv) of determining a first morphological feature, a first RNA or mRNA, or a first protein based on predicted base calls of a first pixel in one or more cycles. In some embodiments, the method 2800 further includes an operation (v) of determining a second morphological feature, a second RNA or mRNA, or a second protein based on predicted base calls of a second pixel in one or more cycles.
  • the method 2800 further includes an operation of determining a spatial relationship of the first pixel and the second pixel, which may include one or more of: visualizing the first and second pixels within a common coordinate system; calculating a spatial distance in 2D or 3D between the first and second pixels; and determining whether the first and second pixels are within a same polony or not.
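  • Two of the spatial relationships named above, the distance between two pixels and whether they belong to the same polony, can be sketched as follows. The (z, y, x) index order, the per-axis voxel size, the labeled volume in which 0 denotes background, and both function names are assumptions for illustration.

```python
import math

import numpy as np


def pixel_distance(p, q, voxel_size_um=(1.0, 1.0, 1.0)):
    """Euclidean distance between pixels p and q given as (z, y, x) indices.

    voxel_size_um scales each axis to physical units; for 2D use, pass
    pixels with the same z index.
    """
    return math.sqrt(sum(((a - b) * s) ** 2 for a, b, s in zip(p, q, voxel_size_um)))


def same_polony(p, q, label_volume):
    """True if both pixels carry the same nonzero polony label.

    label_volume is an integer array indexed as [z, y, x], where 0 means
    background and any other value is a polony identifier.
    """
    label_p = label_volume[tuple(p)]
    label_q = label_volume[tuple(q)]
    return bool(label_p != 0 and label_p == label_q)


labels = np.zeros((1, 2, 2), dtype=int)
labels[0, 0, 0] = labels[0, 0, 1] = 7  # two pixels of polony number 7
print(pixel_distance((0, 0, 0), (0, 0, 1)), same_polony((0, 0, 0), (0, 0, 1), labels))
```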
  • the method 2800 further comprises: (iv) in response to determining that a first pixel of the one or more pixels has a predicted classification that is different from a background classification, determining at least a first target of a first morphological feature; a first RNA or mRNA; and a first protein based on the one or more predicted classifications; and (v) in response to determining that a second pixel of the one or more pixels has a predicted classification that is different from the background classification, determining at least a second target different from the first target from: the first morphological feature; the first RNA or mRNA; and the first protein based on the one or more predicted classifications.
  • the second target is of a different type from the first target (e.g., a protein vs. a morphological feature), thereby advantageously enabling multi-omics analysis and research of the biological analyte(s) of interest using the methods herein.
  • the first target and the second target correspond to the biological analyte(s) of the sample.
  • the method 2800 further comprises: spatially aligning the location of the first and the second targets based on the one or more predicted classifications; and determining a biological analyte of the sample immobilized on the support based on the spatial alignment.
  • the method 2800 herein advantageously allows spatial alignment or, in other words, co-localization of two or more different biological analytes using the neural network disclosed herein.
  • Such different biological analytes may be of different types.
  • a first biological analyte may be a morphological feature
  • a second biological analyte may be a protein or mRNA.
  • Such different biological analytes may be sequenced within a same sequencing run in same or different sequencing cycles.
  • Exemplary embodiments of staining and sequencing different target analytes within cells or tissue are disclosed in PCT application No. PCT/US2025/10310, filed January 3, 2025, the contents of which are incorporated by reference in their entirety.
  • the number of different biological analytes may be limited by the availability of unique barcodes that may be used to differentiate the biological analyte from others.
  • the number of different biological analytes can be in a range from 2 to 100, 4 to 350, 10 to 500, 50 to 1000, or more.
  • protein A may be localized to be within the nucleus of a specific cell type, while protein B may be localized to be adjacent to a certain transcript within the mitochondria but not within the cytosol based on the prediction of intensities, base calling, and/or classification in one or more cycles using methods 500 or 2800.
  • Identification of such different biological analytes may advantageously provide more information, e.g., spatial relationships, which may facilitate biological, physiological, or pathological analysis of the sample(s) being sequenced.
  • the biological analytes herein may be any physical features of the sample(s) or source of sample(s).
  • the detection, localization, and spatial alignment of the biological analytes may correspond to various physiological, biological, pathological characteristics of cells or tissue which may advantageously provide information that may advance understanding of cellular function, regulation, and interactions which in turn may advance existing biomedical research, including but not limited to, more effective disease modeling and drug discovery efforts.
  • the method 2800 further comprises an operation of (iv) determining a location of one or more of a first morphological feature, a first RNA or mRNA, a first transcript, and a first protein based on the corresponding location of the one or more predicted base calls or predicted classifications. In some embodiments, the method 2800 further comprises an operation of (v) determining a location of one or more of: a second morphological feature, a second RNA or mRNA, a second transcript, and a second protein based on the corresponding location of one or more second predicted base calls or predicted classifications.
  • the method 2800 further comprises an operation of (vi) spatially aligning the location of one or more of: a second morphological feature, a second RNA or mRNA, and a second protein with the location of one or more of: the first morphological feature, the first RNA or mRNA, and the first protein; and an operation of (vii) determining a biological character of the sample immobilized on the support based on the spatial alignment.
  • the method 2800 may include an operation of saving the base calls obtained in operation 2812 in a predetermined format, e.g., in a FastQ file compatible with subsequent operations so that subsequent analysis such as adaptor trimming and secondary analysis can be performed.
  • the method 2800 may include an operation 2812 of (iii) performing, by the processor, a corresponding base calling for each of the determined polonies.
  • the operation 2812 comprises extracting a plurality of patches from the second plurality of flow cell images based on the polony map.
  • the polony map may be generated using various algorithms, for example, from operation 2804 or 2804’.
  • the operation 2812 further comprises providing input to the neural network, the input comprising the plurality of patches, wherein each patch comprises one or more patch images from the multiple color channels, and wherein each patch comprises at least a portion of the second plurality of flow cell images; and predicting a plurality of base calls using the neural network and based on the input, wherein each base call corresponds to a corresponding patch.
  • each corresponding patch comprises a polony located at or in close vicinity to a center of the corresponding patch.
  • the polony may be no more than 1 to 10 pixels away from the center of the corresponding patch.
  • each patch comprises 3 to 128 pixels along a spatial dimension, e.g., along x or y direction.
  • the size of the patches is kept relatively small compared to the size of the flow cell images, e.g., 10x, 20x, 50x, 100x, 500x, or 1000x less than the size of the flow cell image.
  • the plurality of patches comprises 100 to 10^8 patches.
  • each patch may contain more than one, two, three, five, or ten polonies therewithin, but only the pixel(s) of the single polony at its center are used for generating base call(s) corresponding to the patch.
  • a first patch may include pixels 1-32 in both x and y directions to cover a polony centered at pixels (16, 16) of the flow cell images
  • a second patch may include pixels 2-33 in both x and y directions to cover a second polony centered at pixels (17, 17.5)
  • a third patch may include pixels 5-36 in both x and y directions to cover a third polony centered at pixels (19, 19) of the flow cell images.
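As a minimal sketch of the patch extraction described above, the following Python function cuts a square patch around a polony location so that the polony sits at or near the patch center. The function name `extract_patch`, the 0-based (row, column) indexing, the rounding of fractional centers, and the clamping at image borders are illustrative assumptions, not part of the disclosure.

```python
def extract_patch(image, center, size=32):
    """Cut a size x size patch from a 2D image (list of rows) around `center`.

    `center` is a (row, col) polony location and may be fractional.
    Returns the patch and the (row, col) of its top-left corner.
    Assumes the image is at least `size` pixels in each dimension.
    """
    rows, cols = len(image), len(image[0])
    half = size // 2
    # Clamp the top-left corner so the patch stays inside the image.
    r0 = min(max(int(round(center[0])) - half, 0), rows - size)
    c0 = min(max(int(round(center[1])) - half, 0), cols - size)
    patch = [row[c0:c0 + size] for row in image[r0:r0 + size]]
    return patch, (r0, c0)


# Hypothetical 64 x 64 image whose pixel value encodes its own position.
image = [[r * 64 + c for c in range(64)] for r in range(64)]
patch_a, origin_a = extract_patch(image, (16, 16))
patch_b, origin_b = extract_patch(image, (19, 19))
```

Because the two polonies are only three pixels apart, `patch_a` and `patch_b` share most of their pixels, which illustrates how patches of neighboring polonies may overlap.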
  • a very limited number of polonies in each patch may be used instead of using only the single polony for generating reference base calls.
  • the very limited number of polonies can be in a range from 1 to 4, 1 to 8, 1 to 20, 1 to 50, or 1 to 100.
  • the very limited number of polonies can be 100x, 1000x, 10^4x, 10^5x, 10^6x, 10^7x, or 10^8x less than a total number of polonies in a corresponding flow cell image.
  • the number of pixels within each patch can be optimized to balance the computational complexity and spatial context information to be included for training the neural network(s).
  • the number of patch images within each patch can be optimized to balance the computational complexity and the spatial context information within each patch for accurate and reliable prediction using the neural network.
  • the number of pixels within each patch can be at least partly based on polony density of the sample being imaged.
  • each patch may include multiple pixels, but prediction may only be performed for a single polony at or near the center of the patch. In training the neural network, e.g., using methods 2900, for predicting the base call, similarly reference base calls are only for a single polony at or near the center of the patch.
  • a very limited number of polonies in each patch may be used for training the neural network(s) or making predictions.
  • the very limited number of polonies can be in a range from 1 to 4, 1 to 8, 1 to 20, 1 to 50, or 1 to 100.
  • the very limited number of polonies can be 100x, 1000x, 10^4x, 10^5x, 10^6x, 10^7x, or 10^8x less than a total number of polonies in a corresponding flow cell image.
  • each patch may comprise multiple patch images corresponding to different color channels.
  • each patch may comprise a patch image covering same pixels within the x-y plane in three different color channels. The same pixels may be pixels determined after registration to correct for the spatial offset across different color channels.
  • each patch may comprise multiple patch images corresponding to different cycles, e.g., continuous cycles n-1, n, n+1, within a sequencing run.
  • each patch may comprise 3 images, each from a different color channel in 4 adjacent cycles, so that each patch may comprise 12 patch images in total.
  • each patch may include 5 different z levels to make the total number of patch images of 60.
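The patch image counts in the example above follow from multiplying the number of color channels, cycles, and z levels. A short sketch, in which the function name `patch_image_count` is hypothetical:

```python
def patch_image_count(channels, cycles, z_levels=1):
    """Number of 2D patch images stacked into one patch."""
    return channels * cycles * z_levels


# 3 color channels in 4 adjacent cycles at a single z level.
count_2d = patch_image_count(channels=3, cycles=4)              # 12
# The same patch extended to 5 z levels.
count_3d = patch_image_count(channels=3, cycles=4, z_levels=5)  # 60
```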
  • At least two patches of the plurality of patches comprise at least partially overlapped patch images that comprise some identical pixels.
  • each patch of the plurality of patches comprise at least partially overlapped pixels with another patch of the plurality of patches.
  • the first plurality of flow cell images are acquired only from a single color channel so that flow cell images acquired from different color channels may require different neural networks for predicting high resolution intensities, base calls, classifications, etc., as disclosed herein.
  • the first plurality of flow cell images are acquired only from a single z level, so that flow cell images acquired at different z levels of 3D sample(s), e.g., in situ cells, may require different neural networks for predicting high resolution intensities, base calls, classifications, etc., as disclosed herein.
  • the first plurality of flow cell images are acquired from the one or more cycles.
  • the one or more cycles comprises a plurality of cycles in a sequencing run.
  • the one or more cycles comprises a current cycle N, and the first plurality of flow cell images are acquired from at least one cycle prior to the current cycle N.
  • the current cycle N is the cycle of a sequencing run in which sequencing is currently being performed.
  • the flow cell images may have been acquired in the current cycle N, but no flow cell images have been acquired in the next cycle N+1.
  • the operation 2802 (ii) of providing, by the processor or the first reconfigurable logic device, the second plurality of flow cell images as the input to the neural network comprises: (ii) providing, by the processor or the first reconfigurable logic device, the second plurality of flow cell images as the input to the neural network without providing a polony map or locations of polonies in the second plurality of flow cell images as the input to the neural network.
  • the operation (ii) of method 2800 does not require the input of a polony map, a location list of polonies, or the like to be provided as input to the neural network in order to predict the base calls.
  • the spatial locations of the polonies within the flow cell images are not used in predicting the base calls using the neural network.
  • each patch may contain relative spatial information of the polony with respect to the rest of the pixels in the same patch(es) that may be used for predicting the base calling using the neural network.
  • the method 2800 may predict base calling, e.g., in operation 2812, without using the input of a polony map, a location list of polonies, or the like. Instead, the polony map, the location list of polonies, or the like may be used to extract the plurality of patches from the second plurality of flow cell images.
  • the operation of predicting the plurality of base calls using the neural network and based on the input, wherein each base call corresponds to a corresponding patch comprises: predicting a probability map for each channel of the multiple color channels corresponding to the corresponding patch; and determining the base call of the corresponding patch based on the probability maps. For example, for flow cell images from 4 different color channels, 4 different probability maps may be generated. Each probability map may have the same size and dimension as the flow cell images or cover at least a portion of the flow cell images. Each pixel in the probability map may have a probability value corresponding to the channel.
  • pixel (12, 12) may have probability values of 0.2, 0.01, 0.2, and 0.59 in 4 different channels representing nucleotides A, T, C, and G, respectively, and the base call of pixel (12, 12) may be determined from the largest probability among the different color channels, which is 0.59 and corresponds to nucleotide G.
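The per-pixel decision in this example can be sketched as picking the channel with the largest probability. The function name `call_base` and the fixed channel-to-nucleotide order (A, T, C, G) are assumptions taken from the example for illustration only.

```python
def call_base(probabilities, bases=("A", "T", "C", "G")):
    """Return the base whose channel has the largest probability at one pixel."""
    best = max(range(len(bases)), key=lambda i: probabilities[i])
    return bases[best], probabilities[best]


# Probability values of one pixel in the 4 channel probability maps.
base, prob = call_base([0.2, 0.01, 0.2, 0.59])
```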
  • the neural network may be trained to predict probability maps.
  • training of the neural network to predict probability maps can be based on reference polony maps or any equivalent information indicative of polony locations, e.g., a location list of polonies.
  • the neural network to predict probability maps can be trained by comparing each probability map to a corresponding reference polony map.
  • the neural network may be trained to minimize a loss function based on the comparison of the probability map and the corresponding reference polony map.
  • a probability map may be initialized to have random values in each pixel, and the neural network may be trained to produce higher value for pixel(s) corresponding to polonies than pixels corresponding to non-polony structure(s) in the probability map.
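One way to compare a predicted probability map with a reference polony map is a per-pixel binary cross-entropy, sketched below. The text does not name a specific loss function, so the choice of cross-entropy, the function name, and the binary (0 or 1) encoding of the reference map are all assumptions.

```python
import math


def binary_cross_entropy(prob_map, reference_map, eps=1e-7):
    """Mean per-pixel cross-entropy between a probability map and a 0/1 reference map."""
    total, count = 0.0, 0
    for p_row, y_row in zip(prob_map, reference_map):
        for p, y in zip(p_row, y_row):
            p = min(max(p, eps), 1.0 - eps)  # avoid log(0)
            total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
            count += 1
    return total / count


reference = [[0, 1], [0, 0]]         # 1 marks a polony pixel
good = [[0.1, 0.9], [0.1, 0.1]]      # high value at the polony pixel
poor = [[0.5, 0.5], [0.5, 0.5]]      # uninformative map
loss_good = binary_cross_entropy(good, reference)
loss_poor = binary_cross_entropy(poor, reference)
```

Training that minimizes such a loss pushes the map toward higher values at polony pixels than at non-polony pixels, so `loss_good` is smaller than `loss_poor`.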
  • the sum of values for each pixel in all probability maps of different color channels may add up to a fixed number, e.g., 1, 10, 100, etc.
  • pixel (24, 25) in 3 probability maps corresponding to 3 different color channels may be 0.24, 0.51, and 0.25, which adds up to 1.
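The constraint that the channel values of a pixel add up to a fixed number can be sketched as a simple rescaling. The function name and the use of plain division are assumptions; a trained network might instead enforce the constraint with a softmax output layer.

```python
def normalize_channels(values, total=1.0):
    """Rescale non-negative per-channel values of one pixel so they sum to `total`."""
    s = sum(values)
    if s <= 0:
        raise ValueError("at least one channel value must be positive")
    return [total * v / s for v in values]


# Raw scores of one pixel in 3 color channels, rescaled to sum to 1.
probs = normalize_channels([24, 51, 25])
```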
  • each base call corresponds to a corresponding patch which includes one or more patch images.
  • the operation of predicting the plurality of base calls using the neural network and based on the input comprises: generating a first single intensity for a first channel of the multiple color channels corresponding to the corresponding patch; and determining the base call of the corresponding patch based on the single intensity.
  • a first single intensity of a first color channel may be determined using prediction by the neural network disclosed herein.
  • the first single intensity may or may not be normalized.
  • the first single intensity may correspond to the single polony of the corresponding patch containing one or multiple patch images of the same polony at adjacent cycles of a sequencing run.
  • the first single intensity may correspond to one of the adjacent cycles, e.g., a current cycle.
  • a base call may be determined based on the first single intensity of the current cycle, e.g., by comparing the first single intensity with other intensities of the same polony from other color channels.
  • the other intensities may be predicted similarly using the same or different neural networks.
  • the method further comprises an operation of predicting a second single intensity for a second channel of the multiple color channels corresponding to the corresponding patch using a second neural network; and determining the base call of the corresponding patch based on at least the first single intensity and the second single intensity.
  • the method further comprises an operation of predicting a second single intensity for a second channel of the multiple color channels corresponding to the corresponding patch using a second neural network or the same first neural network; and an operation of predicting a third single intensity for a third channel of the multiple color channels corresponding to the corresponding patch using a third neural network or the same first neural network; and determining the base call of the corresponding patch based on at least the first, second, and third single intensities.
  • the first, second, and third intensities may be predicted using different neural networks (e.g., each of the neural networks may be trained using different training data but with identical neural network layers and numbers of parameters) to be 50, 690, and 80, respectively, for the same polony.
  • the base call of the polony may correspond to the nucleotide that lights up in the second color channel with an intensity of 690 but not the first or third color channel.
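A minimal sketch of the intensity comparison in this example: the base call is taken from the channel with the largest predicted single intensity. The channel names and the channel-to-nucleotide mapping below are hypothetical, since the text does not specify which nucleotide each channel reports.

```python
def call_base_from_intensities(intensities, channel_to_base):
    """Return the base of the channel with the largest predicted intensity."""
    channel = max(intensities, key=intensities.get)
    return channel_to_base[channel]


# Predicted single intensities of one polony in three color channels.
intensities = {"channel_1": 50, "channel_2": 690, "channel_3": 80}
# Hypothetical mapping from color channel to nucleotide.
channel_to_base = {"channel_1": "A", "channel_2": "C", "channel_3": "T"}
base = call_base_from_intensities(intensities, channel_to_base)
```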
  • the operation (iii) of predicting, by the first reconfigurable device or the integrated circuit, one or more base calls corresponding to one or more polonies of the second plurality of flow cell images using the neural network comprises: determining two or more pixels of the second plurality of flow cell images as duplications of a single polony; and selecting one pixel of the two or more pixels as a center of the single polony.
  • the two or more pixels may be at a same z level. In some embodiments, the two or more pixels may be at different z levels.
  • Exemplary embodiments of the operation of determining two or more pixels of the second plurality of flow cell images as duplications of a single polony and selecting one pixel of the two or more pixels as a center of the single polony are disclosed in PCT Application No. PCT/US23/76125, which is incorporated herein by reference in its entirety.
  • the methods 500 and 2800 herein may be performed using artificial intelligence-based models other than neural networks.
  • the methods 600, 700 and 2900 may be used to train artificial intelligence-based models other than neural networks for making predictions or inferences using methods 500 or 2800.
  • Some non-limiting examples of the artificial intelligence-based models include: random forest, decision tree, k-means clustering, and gradient boosted tree.
  • the artificial intelligence-based models may be used to predict intensities, classifications, or base calls by working on intensities from flow cell images and/or the high resolution flow cell images.
  • the artificial intelligence-based models other than neural networks may predict intensities, classifications, or base calls using information only including intensities, and such information may lack spatial context of the intensities, shapes of the polonies, background noise, signal from other cellular structures, etc.
  • the neural networks herein predict intensities, classifications, or base calls by advantageously using the flow cell images or high resolution flow cell images which not only include the intensities but also other information including but not limited to background noise, polony sizes and shapes, spatial relationship among polonies, etc. for more accurate predictions or inferences.
  • the neural network herein is a convolutional neural network (CNN).
  • the neural network is a 3D CNN.
  • the neural network is a 2D CNN.
  • the neural network comprises one or more convolutional layers.
  • the neural network is a recurrent neural network (RNN).
  • the neural network is a 3D RNN.
  • the neural network is a 2D RNN.
  • the neural network comprises one or more long short-term memory (LSTM) layers.
  • the neural network is a U-Net.
  • the neural network includes a residual network (ResNet).
  • the neural network can include a transformer based model like a vision transformer (ViT).
  • the neural network comprises a U-Net with a first predetermined repetition of down-sampling and convolution operations and then a second predetermined repetition of up-sampling, concatenation, and convolution operations.
  • the first and second predetermined repetition can have an identical quantity, e.g., 3 or 4.
  • the neural network is a U-Net with a first predetermined number of filters in each repetition of down sampling, and then a second predetermined number of filters in each repetition of up sampling and/or concatenation.
  • the first predetermined number of filters can be 32, 64, 128, and 256 filters in three repetitions and the second predetermined number can be 128, 64, 64, and 32 filters in the corresponding three repetitions.
  • the first predetermined number of filters can be 32, 64, 128, and 256 filters in three repetitions and the second predetermined number can be 256, 128, 64, and 32 filters in the corresponding three repetitions.
  • the operation 2812 may comprise: performing, by the processor, a first convolution in one or more dimensions on the first plurality of flow cell images, thereby generating a first convolution result; repetitively performing, for one or more times, down-sampling operations comprising: (a) performing, by the processor, a second convolution in one or more dimensions on the first convolution result, thereby generating a second convolution result; and (b) performing, by the processor, a down sampling of the second convolution result by a down sampling factor thereby generating a first down-sampled result.
  • the second convolution may comprise a corresponding number of filters, thereby generating a third convolution result after the repetitions.
  • the operation 2812 may further comprise: performing, by the processor, the second convolution in one or more dimensions on the third convolution result, thereby generating a fourth convolution result; repetitively performing, for one or more times, up sampling operations comprising: (c) performing, by the processor, an up sampling of the fourth convolution result by an up sampling factor thereby generating a first up-sampled result; and (d) performing, by the processor, the second convolution in one or more dimensions of the first up-sampled result, thereby generating a fifth convolution result.
  • the second convolution may comprise a corresponding number of filters, thereby generating a sixth convolution result after the repetitions.
  • the first convolution comprises a 3D convolution with a convolution kernel.
  • the convolutional kernel may have 4 dimensions.
  • the convolutional kernel is m*m*m for the first three spatial dimensions and the size of its fourth dimension is determined by the filter number in the corresponding repetition.
  • m can be an integer in the range of 2 to 20.
  • the input can be 512x512 flow cell images, and the z-stack can have 12 slices.
  • the first convolution can include 32 filters and each filter has one kernel that is 3x3x3x1.
  • the output from that convolutional block is 512x512x12x32.
  • a double convolutional block, i.e., the second convolution, has two first convolutions with 32 filters.
  • the input to both of those blocks is 512x512x12x32 and the output is 512x512x12x32.
  • Each filter uses a kernel sized 3x3x3x32. The number of filters may correspond to features of the input.
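The tensor sizes in the example above can be checked with a small shape calculation. The sketch below assumes stride 1 and padding that preserves the spatial size ("same" padding), neither of which is stated in the text; the function name `conv_shapes` is hypothetical. Each filter's kernel spans m pixels along every spatial dimension and the full channel depth of its input.

```python
def conv_shapes(input_shape, num_filters, m=3):
    """Shapes for one convolution with stride 1 and 'same' padding (assumed).

    `input_shape` lists the spatial dimensions followed by the channel count.
    Returns (output shape, kernel shape of one filter, number of weights).
    """
    *spatial, in_channels = input_shape
    kernel_shape = tuple([m] * len(spatial) + [in_channels])
    output_shape = tuple(spatial + [num_filters])
    weights = num_filters * m ** len(spatial) * in_channels
    return output_shape, kernel_shape, weights


# 512x512 images with a 12-slice z-stack and one input channel, 32 filters.
first = conv_shapes((512, 512, 12, 1), 32)
# A following convolution in the double block takes the 32-channel output.
second = conv_shapes(first[0], 32)
```

The same function covers the 2D case by passing a shape with two spatial dimensions, e.g. `conv_shapes((512, 512, 1), 64)`.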
  • the second convolution comprises two 3D convolutional layers, e.g., as shown in the pseudo code.
  • the second convolution comprises two repetitions or blocks of the first convolution in 3D, and usage of the output and the number of filters changes, as the convolution process will increase the depth of the image.
  • the depth of image may increase as the number of features or filters increases.
  • the first and second resolutions are in 2D or 3D.
  • the first convolution comprises a 2D convolution with a convolution kernel.
  • the convolutional kernel may have 3 dimensions.
  • the convolutional kernel is m x m for the first two spatial dimensions and the size of its third dimension is determined by the filter number in the corresponding repetition.
  • m can be an integer in the range of 2 to 20.
  • the input can be flow cell images with a size of 512x512x1.
  • the first convolution can include 64 filters and each filter has one kernel that is 3x3x1.
  • the output from that convolutional block is 512x512x64.
  • a double convolutional block, i.e., the second convolution, has two first convolutions with 32 filters.
  • the input to both of those blocks is 512x512x64 and the output is 512x512x32.
  • Each filter can use a kernel sized 3x3x32.
  • the second convolution comprises at least two convolutional layers or exactly two convolutional layers, e.g., as shown in the pseudo codes.
  • the second convolution comprises two repetitions or blocks of the first convolution, and usage of the output and the number of filters changes, as the convolution process will increase the depth of the image.
  • the depth of image may increase as the number of features or filters increases.
  • the first and second resolutions are in 2D or 3D.
  • the second convolution in operation (a) comprises a corresponding number of n, 2*n, 4*n, and 8*n filters in a first, second, third, and fourth repetition, respectively.
  • the second convolution in operation (c) comprises a corresponding number of 2*n, 2*n, 4*n, 8*n filters in a last repetition, last minus one, last minus two, and last minus three repetition, respectively.
  • n can be an integer in the range from 8 to 256.
  • operation (a) comprises 32, 64, 128, and 256 filters in three repetitions
  • operation (c) comprises 128, 64, 64, and 32 filters in the corresponding three repetitions.
  • the second convolution in operation (c) comprises a corresponding number of n, 2*n, 4*n, 8*n filters in a last repetition, last minus one, last minus two, and last minus three repetition, respectively.
  • operation (a) comprises 32, 64, 128, and 256 filters in four repetitions
  • operation (c) comprises 256, 128, 64, and 32 filters in the corresponding four repetitions.
  • the second convolution in operation (c) comprises a corresponding number of n, 2*n, and 4*n filters in a last repetition, last minus one repetition, and last minus two repetition, respectively.
  • operation (a) comprises 32, 64, 128 filters in three repetitions and operation (c) comprises 128, 64, and 32 filters in the corresponding three repetitions.
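The symmetric filter counts listed above (n, 2*n, 4*n, ... on the way down and the reverse on the way up) can be generated as follows. This is only a sketch of the symmetric variants; it does not cover the variant in which the up-sampling path uses a different set of filter counts, and the function name is hypothetical.

```python
def filter_schedule(n, repetitions):
    """Filter counts for a symmetric encoder/decoder with doubling filters."""
    down = [n * 2 ** i for i in range(repetitions)]
    up = list(reversed(down))
    return down, up


four_level = filter_schedule(32, 4)
three_level = filter_schedule(32, 3)
```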
  • the operation 2800 may further comprise: performing, by the processor, the first convolution in one or more dimensions on the sixth convolution result, thereby generating a seventh convolution result; and predicting, by the processor, the second plurality of flow cell images based on the seventh convolution result.
  • Each of the second plurality of flow cell images may correspond to the corresponding flow cell image of the first plurality of flow cell images with a second resolution that is 2, 4, 6, 8, 10, 12, or 16 times greater than the first resolution in one or more spatial dimensions.
  • the second resolution is at least 4, 6, or 8 times greater than the first resolution in all three dimensions.
  • the first plurality of flow cell images are from a single color channel. In some embodiments, the first plurality of flow cell images are from one or more color channels. In some embodiments, the first plurality of flow cell images are of unbalanced nucleotide diversity in one or more sequencing cycles.
  • the cellular sample comprises overloaded concatemer molecules with a spatial density in a range of 10^2-10^15 per mm^2. In some embodiments, the cellular sample comprises overloaded concatemer molecules with a spatial density in a range of 10^3-10^10 per mm^2.
  • the first resolution is in a range of 0.1 um to 5 um. In some embodiments, the first resolution is in a range of 0.01 um to 10 um. In some embodiments, the second resolution is in a range of 0.02 um to 2 um. In some embodiments, the second resolution is in a range of 0.001 um to 3 um. In some embodiments, the down-sampling factor is 2, 4, 6, 8, 16, or more. In some embodiments, the up-sampling factor is 2, 4, 6, 8, 16, or more.
  • one or more operations are performed while a sequencing run is being performed. In some embodiments, one or more operations are performed in parallel with the corresponding sequencing run to reduce sequencing analysis time.
  • the sequencing analysis time includes a total time required from when the raw flow cell images are acquired in each cycle of a sequencing run to when the base calls for each cycle of the sequencing run are generated.
  • In some embodiments, the sequencing analysis time includes a total time required from when a sequencing run starts to when the base calls for each cycle of the sequencing run are generated.
  • the sequencing analysis time includes a first time duration to complete a sequencing run and a second time duration to generate base calls for the sequencing run.
  • the first and second time durations may overlap at least partly with each other (e.g., performing base calling while the sequencing run is still in progress) to reduce the sequencing analysis time.
  • the one or more cycles comprises a current cycle N.
  • N may be in a range from 1 to 1000.
  • one or more operations are performed while the sequencing reactions in cycles subsequent to the current cycle N are yet to be performed or are currently being performed.
  • the training data set of training flow cell images comprises z-stacks of training flow cell images taken at different z-locations.
  • Each z-stack may represent an individual FOV of a 3D sample(s), e.g., an in situ cellular sample.
  • the z-axis is orthogonal to image planes of the flow cell images.
  • the training data set of training flow cell images comprises flow cell images from multiple sequencing cycles.
  • One or more sequencing cycles may be of unbalanced nucleotide diversity so that images appear dimmer or the number of polonies is lower than in images from sequencing cycles of high nucleotide diversity.
  • the number of polonies in the training flow cell images in a particular cycle may vary from 1% to 99% of a total number of polonies within a FOV of that cycle.
  • when the number of polonies in the training flow cell image of a particular cycle is from 1% to 5% or 1% to 10% of the total number of polonies within that cycle, the cycle is of low or unbalanced diversity.
  • when the number of polonies in the training flow cell image of a particular cycle is greater than 10% or 15% of the total number of polonies within that cycle, the cycle is of high or balanced diversity.
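The thresholds above can be sketched as a simple classification of a cycle by the fraction of polonies visible in its image. The function name, the returned labels, and the single default cut-off of 10% are assumptions chosen from the ranges given in the text.

```python
def classify_diversity(polonies_in_image, total_polonies, threshold=0.10):
    """Label a cycle 'low' or 'high' diversity from the fraction of visible polonies."""
    fraction = polonies_in_image / total_polonies
    return "low" if fraction <= threshold else "high"


label_dim = classify_diversity(polonies_in_image=5, total_polonies=100)
label_bright = classify_diversity(polonies_in_image=40, total_polonies=100)
```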
  • the training data set of training flow cell images comprises flow cell images from multiple samples and multiple sequencing cycles, and the training flow cell images include a subset of flow cell images with unbalanced diversity in multiple sequencing cycles and another subset of flow cell images with balanced diversity in multiple sequencing cycles.
  • the operation of performing, by the processor, the first convolution in one or more dimensions on the first plurality of flow cell images comprises: performing, by the processor, a first convolution in 3D on the first plurality of flow cell images, thereby generating a first convolution result.
  • operation (a) comprises: performing, by the processor, the second convolution in 3D on the first convolution result, thereby generating a second convolution result.
  • the operation of performing, by the processor, the first convolution in one or more dimensions on the first plurality of flow cell images comprises: performing, by the processor, a first convolution in 2D on the first plurality of flow cell images, thereby generating a first convolution result.
  • operation (a) comprises: performing, by the processor, the second convolution in 2D on the first convolution result, thereby generating a second convolution result.
  • repetitively performing, for one or more times, operations comprising (c) and (d) comprises: repetitively performing, for one or more times, operations comprising (c), (d), and (e), wherein (e) is after operation (c) and before operation (d), and wherein (e) comprises: concatenating, by the processor, the first up-sampled result in a current up-sampling repetition with the first down-sampled result in a previous down-sampling repetition, wherein the first up-sampled result has a same size as the first down-sampled result in the previous down-sampling repetition.
  • operation (e) is in each repetition.
  • repetitively performing, for one or more times, operations comprising (c) and (d) comprise: repetitively performing operations comprising (c), (d), and (e) in each repetition of one or more repetitions.
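The sequence of convolution, down-sampling, up-sampling, and concatenation operations can be followed with a shape trace. The sketch below tracks only the spatial edge length and the channel count. It assumes the common U-Net convention of saving the convolution result just before each down-sampling as the tensor to concatenate, a bottleneck convolution that keeps the channel count, and equal down- and up-sampling factors; these are assumptions, and the function name is hypothetical.

```python
def trace_unet_shapes(size, down_filters, factor=2):
    """Trace (step, spatial size, channels) through a U-Net-like network."""
    steps = [("first conv", size, down_filters[0])]
    skips = []
    channels = down_filters[0]
    for f in down_filters:
        channels = f
        steps.append(("conv", size, channels))          # operation (a)
        skips.append((size, channels))                  # saved for concatenation
        size //= factor
        steps.append(("down", size, channels))          # operation (b)
    steps.append(("bottleneck conv", size, channels))
    for f in reversed(down_filters):
        size *= factor                                  # operation (c)
        skip_size, skip_channels = skips.pop()
        if skip_size != size:
            raise ValueError("up-sampled and saved results differ in size")
        channels += skip_channels                       # operation (e)
        steps.append(("up+concat", size, channels))
        channels = f
        steps.append(("conv", size, channels))          # operation (d)
    return steps


trace = trace_unet_shapes(512, [32, 64, 128])
```

The size check mirrors the requirement that the up-sampled result has the same size as the result it is concatenated with, which holds when the input size is divisible by the sampling factor at every level.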
  • the kernel may take any size that is smaller than the size of the flow cell image undergoing the convolution.
  • the kernel can be 2 by 2 by 2, 3 by 3 by 3, 4 by 4 by 4, 5 by 5 by 5, 6 by 6 by 6, 10 by 10 by 10 in the first three spatial dimensions.
  • the kernel size can be customized to remove at least some of the noise and unwanted signal that are larger than the kernel size.
  • the kernel can be circular.
  • the kernel can be in various other shapes.
  • the focus of the optical system includes a range, e.g., 0.1 um, 0.2 um, 0.3 um, 0.5 um, 0.6 um, 0.8 um, 1 um, 2 um, 3 um, 4 um, 5 um, etc. expanding along z axis.
  • Polonies or clusters that are within the range of focus can appear in-focus or about in-focus in the flow cell image.
  • Flow cell images at a specific z level can also include signals from polonies or clusters that are not within the focus range of the image, but at different z levels. Such polonies or clusters are out-of-focus. As shown in FIG. 3A, bigger and blurred signal spots represent out-of-focus polonies or clusters. Some of the out-of-focus polonies or clusters are circled in FIG. 3A.
  • Each flow cell image at a specific z level can also include noise caused by the optical system and/or undesired signal from the sample.
  • the undesired signal can be signal coming from components of the sample such as membrane, cytosol, and mitochondria.
  • Such background objects can be any objects relatively larger in size than the polonies or clusters. As shown in FIG. 3A, there is a blurry cellular contour (at the arrows) in the flow cell image, and most of the signal spots are contained within the blurry contour.
  • background objects can include any objects within the 3D sample but are not polonies or clusters.
  • the method 2800 includes an operation of registering the second plurality of flow cell images.
  • the images are registered across channels and/or across different cycles.
  • the flow cell images are registered before any base calling is performed in operation 2812 or 2804, 2804’.
  • the images are registered across channels and different cycles before generating or obtaining the 3D polony maps.
  • the flow cell images are registered across channels and different cycles before one or more primary analysis steps here.
  • the flow cell images can be registered after one or more preprocessing operations disclosed herein are performed.
  • Various image registration techniques can be used to register the flow cell images.
  • the flow cell images can be registered using 2D or 3D registration techniques.
  • the operation of registering the flow cell images is with respect to a reference coordinate system. In some embodiments, the operation of registering the flow cell images is with respect to one or more template images.
  • the operation of registering the images can comprise generating the one or more template images in a reference coordinate system. In some embodiments, the operation of registering the images can comprise registering polonies to template polonies in the one or more template images.
  • the operation of registering the images can comprise determining a plurality of transformations based on the one or more template images. Each of the plurality of transformations can correspond to a subtile of the flow cell images, the processed images, or the filtered images, and can be configured to register that subtile to the one or more template images. Each transformation can be used to register a corresponding subtile or tile to the one or more template images.
  • the plurality of transformations can comprise one or more affine transformations.
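The per-subtile affine registration described above can be illustrated with a minimal sketch. It assumes square subtiles of a fixed pixel size and polony coordinates given as (x, y) pairs; the function names, the `transforms` dictionary layout, and the use of NumPy are illustrative assumptions and are not taken from the disclosure.

```python
import numpy as np

def apply_affine(points_xy, matrix, offset):
    """Map (N, 2) coordinates with x' = A @ x + t (one affine transformation)."""
    points_xy = np.asarray(points_xy, dtype=float)
    return points_xy @ np.asarray(matrix, dtype=float).T + np.asarray(offset, dtype=float)

def register_by_subtile(points_xy, subtile_size, transforms):
    """Apply to each point the affine transformation of the subtile containing it.

    transforms: hypothetical dict mapping a (row, col) subtile index to an
    (A, t) pair; points in subtiles without an entry are left unchanged.
    """
    points_xy = np.asarray(points_xy, dtype=float)
    registered = points_xy.copy()
    cols = (points_xy[:, 0] // subtile_size).astype(int)  # x selects the column
    rows = (points_xy[:, 1] // subtile_size).astype(int)  # y selects the row
    for (row, col), (matrix, offset) in transforms.items():
        in_subtile = (rows == row) & (cols == col)
        registered[in_subtile] = apply_affine(points_xy[in_subtile], matrix, offset)
    return registered
```

In this sketch each subtile carries its own translation, rotation, scale, and shear, which is one way a set of local affine transformations can approximate a distortion that varies across the tile.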
  • the operation of registering the images can comprise performing image registration of the polonies based on fiducial markers.
  • the fiducial markers can be located on the flow cell. Alternatively, the fiducial markers can be external to the flow cell.
  • the image registration herein is configured to align images from different cycles and/or different channels, for example, with respect to a template image or a reference coordinate system. In some embodiments, the image registration herein is configured to register polonies or clusters from different cycles and/or different channels, e.g., in the filtered image, to a template image or a reference coordinate system.
  • the base calling can be performed using the filtered images from different channels in cycle N after the filtered images from different channels are registered relative to the corresponding template image disclosed herein.
  • the location information of such polony can be obtained from the polony map, e.g., 2D coordinates of the polony and the z level.
  • the corresponding flow cell image and its pixel(s) can be determined. Image intensity of such pixels can be extracted from the corresponding processed image after one or more primary analysis steps as intensity of such pixel for performing base calling.
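As a hedged illustration of the lookup described above, the sketch below reads per-channel intensities for each polony from a stack of processed images, using the 2D coordinates and z level stored in the polony map. The array layout `(channels, z_levels, height, width)`, the integer pixel coordinates, and the function name are assumptions made for the example.

```python
import numpy as np

def extract_polony_intensities(stack, polony_map):
    """Return an (n_polonies, n_channels) array of pixel intensities.

    stack: processed images for one cycle, shaped (channels, z_levels, height, width).
    polony_map: iterable of (x, y, z) entries, where x and y are integer pixel
    coordinates and z is the index of the z level containing the polony.
    """
    stack = np.asarray(stack)
    coords = np.asarray(polony_map, dtype=int).reshape(-1, 3)
    x, y, z = coords[:, 0], coords[:, 1], coords[:, 2]
    # Index every channel at (z, y, x) for all polonies at once.
    return stack[:, z, y, x].T
```

The resulting per-polony intensity vectors are the kind of input a downstream base caller could consume; how the base is then decided is outside the scope of this sketch.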
  • the operation of registering the flow cell images may be based on background objects in the flow cell images.
  • the background objects can be used to align the flow cell image to the cell images by using one or more transformation(s).
  • the cell staining images herein are staining images of the sample(s) immobilized on the support, with possible transformation (e.g., translation) from the sample(s) in the flow cell images.
  • the transformation may be represented by a single transformation of the whole image or be separated into multiple transformations, each representing a portion of the whole image.
  • the polonies or clusters can be registered to the cell staining images.
  • the method 2800 may further include an operation of registering the base callings, e.g., of a 3D sample, to the cell staining images containing morphological information of the sample.
  • such registration may be based on fiducial markers.
  • fiducial markers can also be included in the cell staining images. Aligning the fiducial markers can generate the transformation(s) between the flow cell images or between flow cell images and cell staining images. The transformation(s) can be used to register or align polonies or clusters between the sequencing images and the cell images.
  • the fiducial markers can be within the sample or external to the sample.
  • the fiducial markers can be biological features inherent to the sample(s).
  • the fiducial markers may be immobilized on the flow cell but external to the sample.
  • the method 2800 further comprises an operation of determining a location of one or more of: a morphological feature, an RNA or mRNA, and a protein based on the corresponding location of each predicted base call.
  • the samples may be labeled so that the base calls may uniquely identify a morphological feature, a RNA or mRNA, or a protein of the sample in 3D. Such information can be used to advantageously provide nucleotide sequencing in spatial context of the sample.
  • the same pretrained neural network (e.g., with same parameters and neural network structure) can be advantageously used for predicting the polony map and for predicting the base calls.
  • the same neural network herein can be trained before operation 2806 and 2812, and requires no additional training in between the operations of 2806 and 2812.
  • the operation 2806 further comprises predicting, by the first reconfigurable device or the integrated circuit, a base call corresponding to each polony of the second plurality of flow cell images using the neural network at a third resolution; and determining the polony map based on the predicted base calls and a corresponding quality index of each predicted base call at the third resolution.
  • the third resolution is at least 2 to 32 times greater than the first or second resolution in one or more spatial dimensions. In some embodiments the third resolution is greater than the first and second resolution in one or more spatial dimensions. In some embodiments, the third resolution is identical to the first or second resolution in one or more spatial dimensions.
  • the different patches may include some overlapped pixels.
  • the different patches do not include any overlapped pixels.
  • patch 1 may include 12 different patch images, each from one of the 4 different color channels and one of the three consecutive cycles in a sequence run.
  • Patch 2 may also include 12 different patches cropped from non-overlapped pixels of the same flow cell images.
  • Patch 3 may include 12 different patched images, each patch image with more than half of the pixels being identical to the patch images of patch 1.
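The patch layout described above (12 patch images per patch, from 4 color channels and 3 consecutive cycles) can be sketched as follows. The `(cycles, channels, height, width)` array layout, the square patch shape, and both function names are assumptions for illustration only.

```python
import numpy as np

def crop_patch_stack(images, top, left, size):
    """Crop the same window from every cycle and channel.

    images: array shaped (cycles, channels, height, width).
    Returns an array shaped (cycles * channels, size, size), e.g. 12 patch
    images for 3 cycles and 4 color channels.
    """
    images = np.asarray(images)
    cycles, channels = images.shape[:2]
    window = images[:, :, top:top + size, left:left + size]
    return window.reshape(cycles * channels, size, size)

def overlap_fraction(origin_a, origin_b, size):
    """Fraction of pixels shared by two size x size patches with (top, left) origins."""
    shared_rows = max(0, size - abs(origin_a[0] - origin_b[0]))
    shared_cols = max(0, size - abs(origin_a[1] - origin_b[1]))
    return (shared_rows * shared_cols) / (size * size)
```

With 32-pixel patches, origins (0, 0) and (32, 32) share no pixels, as for patch 1 and patch 2 above, while origins (0, 0) and (0, 8) share three quarters of their pixels, as for patch 1 and patch 3.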
  • the sequencing system herein comprises: a first reconfigurable logic device, e.g., a FPGA unit, comprising a plurality of data processing engines configured to perform data processing in parallel; first reconfigurable routing channels, each connecting at least some of the first plurality of data processing engines; a neural network deployed at least partly on the first reconfigurable logic device; a first processor to selectively activate or deactivate different combinations of the first plurality of data processing engines and the first reconfigurable routing channels to perform one or more operations in methods herein (e.g., methods 600, 700, 2900) to train the neural network.
  • the sequencing system herein comprises: a first reconfigurable logic device comprising a first plurality of data processing engines arranged in a first pipeline and configured to perform data processing in parallel with each other; an integrated circuit in data communication with the first reconfigurable logic device; a neural network deployed at least partly on the integrated circuit and/or the first reconfigurable logic device; a first processor of the first reconfigurable logic device to selectively activate or deactivate different combinations of the first plurality of data processing engines to perform one or more operations in methods herein (e.g., methods 600, 700, 2900) to train the neural network.
  • the first reconfigurable logic device and the integrated circuit are within the same physical housing as the other elements of the sequencing system as shown in FIG. 1. In some embodiments, the first reconfigurable logic device and the integrated circuit are not physically external to the sequencing system 110 as shown in FIG. 1, e.g., not in the cloud 130.
  • FIG. 5B shows an exemplary method 600 for training the neural network, e.g., CNN, which can be used to predict high resolution flow cell images with improved detectable polony density.
  • training can be done onboard using the sequencing system, e.g., the FPGA or AI chips onboard the sequencing system. In such cases, training may be done using hardware elements within the physical housing of the sequencing system 110 shown in FIG. 1. In some embodiments, training can be performed external to the sequencing system 110. For example, training may be performed using hardware elements over the cloud 130. In some embodiments, training the neural network using the methods herein with the reconfigurable logic device, e.g., the FPGA, and/or other integrated circuit, e.g., AI chips, can be at least 2x, 8x, 10x, 20x, 40x, 50x, 100x or faster than training the same neural network(s) with similar training images using CPUs or GPUs.
  • training the neural network using the methods herein with the reconfigurable logic device, e.g., the FPGA, and/or other integrated circuit, e.g., AI chips, can be at least 100x, 200x, 400x, 500x, 800x, 1000x or faster than training the same neural network(s) with similar training images using CPUs or GPUs.
  • the neural network is trained with the same type of flow cell images as those on which the neural network may make predictions after being trained.
  • the neural network is trained with 2D flow cell images at multiple z levels and then may be used to predict base calls for 2D flow cell images at multiple z levels to cover a 3D in situ sample.
  • the neural network is trained with 2D flow cell images from a single organ origin and then may be used to predict base calls for 2D flow cell images of samples extracted from the same organ, e.g., liver.
  • the neural network is trained with 2D flow cell images at multiple z-levels that encompass the 3D volume of the volumetric sample(s). Compared with training the neural network with z-stacks of flow cell images, training the neural network with 2D flow cell images reduces the amount of computational effort, and reduces training time and cost. Further, the neural network trained with 2D flow cell images can be less complicated than the neural network trained with 3D training data, and can make prediction more efficient and simpler. In some embodiments, the neural network trained with 2D flow cell images may provide higher efficiency and save time and computational effort in its training and subsequent prediction of polony locations.
  • the sequencing system comprises: a first reconfigurable logic device, e.g., a FPGA unit, comprising a plurality of data processing engines configured to perform data processing in parallel; first reconfigurable routing channels, each connecting at least some of the first plurality of data processing engines; a neural network deployed at least partly on the first reconfigurable logic device; a first processor to selectively activate or deactivate different combinations of the first plurality of data processing engines and the first reconfigurable routing channels to perform operations to train the neural network comprising: (a) generating a training set comprising a plurality of training flow cell images or receiving the training set from one or more data storage devices of the sequencing system, the plurality of training flow cell images having a first spatial resolution; (b) up-sampling the corresponding plurality of training flow cell images to generate a reference set comprising high resolution training flow cell images having a second resolution; (c) generating a training output by inputting the training set to the neural network; (d) repeatedly performing, until the output error
  • the sequencing system comprises: a first reconfigurable logic device, e.g., a FPGA unit, comprising a first plurality of data processing engines arranged in a first pipeline and configured to perform data processing in parallel with each other; an integrated circuit, e.g., a NPU chip or AI chip, comprising a second plurality of data processing engines configured to perform data processing in parallel, wherein the first reconfigurable logic device is configured to communicate data with the integrated circuit; a neural network deployed at least partly on the second reconfigurable logic device; a first processor to selectively activate or deactivate different combinations of the first plurality of data processing engines to perform operations comprising: processing sensor data to generate the first plurality of flow cell images, wherein the integrated circuit is configured to perform operations including: (a) generating a training set comprising a plurality of training flow cell images or receiving the training set from one or more data storage devices of the sequencing system, the plurality of training flow cell images having a first spatial resolution; (b) up-sampling the
  • the system herein may comprise one or more hardware processors; one or more data storage devices storing instructions executable by the one or more hardware processors to cause the one or more hardware processors to perform operations, the operations comprising: (a) generating a training set comprising a plurality of training flow cell images or receiving the training set from one or more data storage devices of the sequencing system, the plurality of training flow cell images having a first spatial resolution; (b) up-sampling the corresponding plurality of training flow cell images to generate a reference set comprising high resolution training flow cell images having a second resolution; (c) generating a training output by inputting the training set to the neural network; (d) repeatedly performing, until the output error satisfies a stopping criterion, training operations comprising: determining an output error by comparing the training output and the reference set; and adjusting current values of parameters of the neural network based on the output error; and (e) generating a trained neural network with adjusted parameters.
  • the method 600 for training the neural network comprises an operation 610 of generating a corresponding plurality of training flow cell images for one or more sample(s) with a first resolution.
  • the operation 610 may be performed by simulation, thus the corresponding plurality of training flow cell images may be simulated images of 2D or 3D samples.
  • the simulation can be based on characteristics of actual flow cell images of sample(s). Such characteristics may include but are not limited to: image resolution, FOV, pixel size, and/or characteristics of the optical system such as depth of field, point spread function, etc.
  • the operation 610 may be performed using the imager 116 of the sequencing system.
  • the corresponding plurality of training flow cell images may be real images of 2D or 3D samples with a first resolution. It is worth noting that the training flow cell images may be generated based on the characteristics of the sample(s) for which predictions are to be made. For example, for predicting polony locations in 3D samples, the training flow cell images may only include images (simulated or real images) of 3D samples of similar characteristics, e.g., liver samples, kidney samples, etc. As another example, for predicting polony locations in traditional 2D samples, the training flow cell images may only include 2D flow cell images with similar plexity and/or polony density. In some embodiments, the training flow cell images may include a combination of flow cell images, either of 2D or 3D samples, and with or without similar characteristics.
  • the corresponding plurality of training flow cell images may include flow cell images at multiple z-levels.
  • the corresponding plurality of training flow cell images may include z-stacks of flow cell images, each z-stack may include a 3D volume made up from multiple z-levels of flow cell images comprised in the z-stack.
  • the corresponding plurality of training flow cell images may include flow cell images at multiple z-levels (2D images) but not a z-stack of flow cell images.
  • the training data set of flow cell images comprises simulated flow cell images of in situ samples at different z-locations.
  • the training data set of flow cell images comprises actual flow cell images acquired from in situ samples at different z-locations.
  • polony locations are identified in such actual flow cell images at a sub-pixel resolution to provide the high resolution “truth maps” in the training data set. Identification of polony or cluster locations at a sub-pixel resolution, e.g., at 0.02 pixel, 0.05 pixel, 0.1 pixel, 0.25 pixel, etc., may be performed using various image processing methods. For example, embodiments of identification of polony or cluster locations at a sub-pixel resolution have been disclosed in U.S. Patent No. 11,200,446, which is incorporated herein by reference in its entirety.
  • the method 600 comprises an operation 620 of (1) up-sampling, by the processor, the corresponding plurality of training flow cell images for each cellular sample to a second resolution to generate a reference set comprising high resolution training flow cell images or (2) generating, by the processor, a reference set of reference flow cell images at a second resolution higher than the first resolution, each reference flow cell image in the reference set corresponding to an individual image of the corresponding pluralities of training flow cell images.
  • the operation of up-sampling in (1) can be based on the imaging process. For example, the point spread function can be virtually improved by 4x if the up-sampling is to achieve 4x spatial resolution. In some embodiments, the operation of up-sampling is in 2D.
  • each corresponding plurality of training flow cell images may include a z-stack with more than one z level to cover a 3D volumetric sample.
  • the resolution in x and y may be different from the resolution in z direction.
  • FIGS. 2A and 2D-2E show exemplary flow cell images that are generated for training the neural network, e.g., CNN.
  • the simulated flow cell images with higher resolution e.g., FIG. 2E
  • Such images are used as “ground truth.”
  • such images have no signal originating from pixels other than the polonies.
  • such images have no signal originating from cellular background in the sample(s).
  • such images may include features that are specific to polonies in flow cell images during sequencing runs, such as polony intensity, polony shape, pattern of distribution (e.g., within regions determined by the cell boundaries).
  • the method includes generating simulated flow cell images with low resolution, e.g., FIG. 2D, which mimic the real flow cell images that a user would acquire during sequencing of cells and are included in a training set.
  • Such simulated flow cell images may have polony features, cell features, background, noise, etc.
  • the low resolution simulated flow cell images may then be up-sampled to be at the high resolution.
  • the simulated images may include a z-stack of flow cell images taken at different z-locations to simulate flow cell images of a volumetric sample.
  • generating simulated images may add computational load to the training process, and may require specific criteria in order to mimic polony features and other information that may be contained within the real flow cell images during sequencing.
  • simulated images may remove possible imaging artifacts, e.g., caused by vibration, over-heating, bubbles, etc., and avoid training on such distracting features that are not part of the polonies in the sample and may reduce accuracy and reliability of training the neural network.
  • the training set may include flow cell images from different cell geometries, different in situ samples, different image intensities, different polony densities, different nucleotide diversities, etc.
  • the method 600 comprises an operation 630 of providing, by the processor, the training set as inputs to the neural network to generate corresponding training outputs.
  • Each corresponding training output may include output flow cell images, e.g., a z-stack of output images.
  • the method 600 comprises the operation 640 of repeatedly training the neural network, e.g., CNN, by performing one or more operations until the output error satisfies a stopping criterion.
  • the training operation 640 comprises one or more operations including: the operation 655 of determining an output error by comparing the training output and the reference set; and the operation 660 of adjusting current values of parameters of the convolutional neural network based on the output error. Determining the output error can be based on various metrics.
  • the metrics can include minimum mean square error of image intensities from some or all of the pixels of the training output relative to the corresponding z-stack in the reference set.
  • Values of the parameters of the neural network can be adjusted based on the output error or one or more previous output errors.
  • the stopping criterion can be customized based on but not limited to training time, computational complexity, required accuracy, power consumption, and/or convergence rate.
  • the stopping criterion can be (1) stop after 10 epochs to reduce training time.
  • the stopping criterion can be (2) stop when the value of the loss function (or the output error) is less than a predetermined value close to 0.
  • z-stacks of training flow cell images from a same color channel can be used to train the neural network, e.g., CNN, for that particular channel.
  • a certain percentage, e.g., 80%, of the training set may be used for training, and the rest of the training set, e.g., 20%, may be used for validation.
  • Batch size can be one
  • Epochs can be about 10, 12, 15, 20, or more.
  • Various optimizers can be used.
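The training settings above (an 80%/20% training/validation split, a batch size of one, a capped number of epochs, and a loss threshold close to 0) can be illustrated with a minimal sketch. A toy one-parameter model, `output = gain * input`, stands in for the neural network; the model, the learning rate, and all names here are assumptions made for the example, not the disclosed network or optimizer.

```python
import numpy as np

def split_train_validation(samples, train_fraction=0.8, seed=0):
    """Shuffle samples and split them, e.g. 80% for training and 20% for validation."""
    order = np.random.default_rng(seed).permutation(len(samples))
    n_train = int(round(train_fraction * len(samples)))
    return [samples[i] for i in order[:n_train]], [samples[i] for i in order[n_train:]]

def train(pairs, max_epochs=10, loss_threshold=1e-6, learning_rate=0.1):
    """Fit the toy model until either stopping criterion is met.

    pairs: list of (input_image, reference_image) arrays of equal shape.
    Returns (gain, final_loss, epochs_run).
    """
    gain, loss, epoch = 0.0, float("inf"), 0
    for epoch in range(1, max_epochs + 1):       # criterion (1): stop after max_epochs
        for image, reference in pairs:           # batch size of one
            error = gain * image - reference
            gain -= learning_rate * 2.0 * float(np.mean(error * image))
        loss = float(np.mean([np.mean((gain * x - y) ** 2) for x, y in pairs]))
        if loss < loss_threshold:                # criterion (2): loss close to 0
            break
    return gain, loss, epoch
```

The validation subset is not used for parameter updates in this sketch; it would typically be used to monitor the loss on images the model has not been fitted to.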
  • the convolutional neural network comprises one or more U-Net units.
  • comparing the training output to the reference set comprises: calculating mean square error in image intensity of one or more pixels in each pair of an image from the reference set and a corresponding image from the training output. In some embodiments, comparing the training output to the reference set comprises: determining one or more values of a loss function. In some embodiments, each pair of the image from the reference set and the corresponding image from the training output comprises a same image size, a same field of view, a same resolution, or a combination thereof. In some embodiments, the one or more pixels excludes pixels that are outside of cell boundaries. In some embodiments, the cell boundaries are determined based on image segmentation of cell boundaries of the high resolution flow cell images in the reference set.
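A minimal sketch of the comparison described above follows: mean square error in image intensity between a training output image and its reference image, optionally restricted to pixels inside cell boundaries. The boolean `cell_mask` is assumed to come from a separate segmentation step that is not shown, and the function name is illustrative.

```python
import numpy as np

def masked_mse(output, reference, cell_mask=None):
    """Mean square error between two images of the same size.

    cell_mask: optional boolean array, True for pixels inside cell boundaries;
    pixels where it is False are excluded from the error.
    """
    output = np.asarray(output, dtype=float)
    reference = np.asarray(reference, dtype=float)
    if output.shape != reference.shape:
        raise ValueError("output and reference must have the same image size")
    squared_error = (output - reference) ** 2
    if cell_mask is None:
        return float(squared_error.mean())
    cell_mask = np.asarray(cell_mask, dtype=bool)
    if not cell_mask.any():
        raise ValueError("cell_mask selects no pixels")
    return float(squared_error[cell_mask].mean())
```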
  • the method 600 includes an operation 670 of generating a trained neural network with the adjusted values in parameters obtained in operation 660.
  • the trained neural network may be used to predict high resolution intensities that can be used to determine high resolution base calls of flow cell images, e.g., using method 500.
  • FIG. 5E shows an exemplary method 700 for training the neural network, e.g., CNN, which can be used to predict high resolution flow cell images with improved detectable polony density.
  • training of the neural networks using the methods 600, 700 may utilize training images that are real flow cell images of samples, simulated flow cell images, or a combination thereof.
  • Training with real flow cell images may advantageously eliminate the need for generating simulated images that mimic the characteristics of polonies of different samples, which simplifies the training process, especially when the sample includes heterogeneous intensities and polony densities across the flow cell image(s) and may include various types of cells or tissue.
  • Training with real flow cell images may advantageously improve training results (e.g., the trained neural network can make improved predictions) compared with training using only simulated images with similar computational cost and neural network complexity.
  • the prediction quality can be measured based on various metrics including but not limited to error rate in base calls, error rate in intensity values, density of base calls, density of polonies, etc.
  • the values of metrics can be determined in alignment with results produced using existing primary analysis methods without using neural network(s). For example, the error rate of base calling using a first neural network trained on simulated flow cell images can be determined in comparison with base calling using an existing primary analysis method without neural networks. The error rate in base calling using a second neural network trained using real images of in situ sample can also be obtained in comparison with base calling using the existing primary analysis method without neural network.
  • the error rate in base calling using the first neural network can be higher than the error rate in base calling using the second neural network.
  • the error rate in base calling using the first neural network can be 2x, 3x, 4x, 5x, 6x, 10x, or higher than the error rate in base calling using the second neural network.
  • training of the neural networks herein can be completed using only the sequencing system, e.g., the FPGA or AI chips onboard the sequencing system 100.
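The error-rate comparison above can be sketched as follows, treating the base calls from an existing primary analysis method as the reference. The sequences in the usage note are made-up illustrative values, not results from the disclosure, and the function name is an assumption.

```python
def base_call_error_rate(predicted, reference):
    """Fraction of positions at which predicted base calls differ from the reference calls."""
    if len(predicted) != len(reference):
        raise ValueError("predicted and reference must have the same length")
    if len(reference) == 0:
        raise ValueError("at least one base call is required")
    mismatches = sum(p != r for p, r in zip(predicted, reference))
    return mismatches / len(reference)
```

For example, with a hypothetical reference of `"ACGTACGT"`, calls of `"ACGTACCA"` give an error rate of 0.25 and calls of `"ACGTACGA"` give 0.125, so the first set of calls has twice the error rate of the second.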
  • training can be performed at least partly external to the sequencing system. For example, at least part of the training may be performed using hardware over the cloud.
  • the sequencing system 110 comprises: a first reconfigurable logic device, e.g., a FPGA unit, comprising a first plurality of data processing engines arranged in a first pipeline and configured to perform data processing in parallel with each other; an integrated circuit, e.g., a NPU chip or AI chip, comprising a second plurality of data processing engines configured to perform data processing in parallel, wherein the first reconfigurable logic device is configured to communicate data with the integrated circuit; a neural network (e.g., trained neural network) deployed at least partly on the second reconfigurable logic device; a first processor to selectively activate or deactivate different combinations of the first plurality of data processing engines to perform one or more operations of the sequencing method 600, 700; a second processor or the first processor to control the integrated circuit to perform one or more operations of the sequencing methods 600, 700 to facilitate generating the sequencing analysis result(s).
  • the operations performed by the first reconfigurable logic device may comprise processing or receiving sensor data to generate the first plurality of flow cell images after operation 705.
  • the first reconfigurable logic device or the integrated circuit is configured to perform operations including operation 715 of up-sampling the corresponding plurality of training flow cell images to generate high resolution training flow cell images having a second resolution.
  • the sequencing system herein may comprise one or more hardware processors; one or more data storage devices storing instructions executable by the one or more hardware processors to cause the one or more hardware processors to perform one or more operations of the methods 600, 700, 2800, and/or 2900.
  • training the neural network using the methods 600, 700, and/or 2900 herein with the reconfigurable logic device, e.g., the FPGA, and/or other integrated circuit, e.g., AI chips, can be at least 2x, 8x, 10x, 20x, 40x, 50x, 100x or faster than training the same neural network(s) with similar training images using CPUs or GPUs.
  • training the neural network using the methods herein with the reconfigurable logic device, e.g., the FPGA, and/or other integrated circuit, e.g., AI chips, can be at least 20x, 40x, 60x, 80x, 100x, 200x, 400x, 500x, 800x, 1000x or faster than training the same neural network(s) with similar training images using CPUs or GPUs.
  • training the neural network using the methods 600, 700, and/or 2900 herein with the reconfigurable logic device, e.g., the FPGA, and/or other integrated circuit, e.g., AI chips, may require at least 2x, 8x, 10x, 15x, 20x, 40x, 50x, or 100x less power than training the same neural network(s) with identical training images using CPUs or GPUs.
  • the sequencing system further comprises: a power source that is configured to supply identical or different power levels to the first reconfigurable logic device and the integrated circuit.
  • a maximum power output of the power source to the sequencing system in training the neural network using methods 600, 700, and/or 2900 is less than 1000 Watts, 900 Watts, 800 Watts, 700 Watts, 650 Watts, 600 Watts, 550 Watts, 500 Watts, 400 Watts, or 300 Watts.
  • the neural network is trained with traditional 2D flow cell images at a single z-level.
  • each neural network is trained with 2D flow cell images at a single z-level, and multiple neural networks may be trained to cover a 3D volumetric sample, e.g., in situ sample.
  • the neural network is trained with 2D flow cell images at multiple z-levels that encompass the 3D volume of the volumetric sample(s). Compared with training the neural network with 3D flow cell images (3D volumetric images), training the neural network with 2D flow cell images reduces the amount of computation, training time, and training cost. Further, the neural network trained with 2D flow cell images can be less complicated than the neural network trained with 3D training data, and can make prediction more efficient and simpler. In some embodiments, the neural network trained with 2D flow cell images may provide higher efficiency and save time and computational effort in its training and subsequent prediction of polony locations.
  • the sequencing method 700 comprises an operation 705 of acquiring, by the imager 116 of the sequencing system 110, a training set comprising a corresponding plurality of training flow cell images with a first resolution.
  • the first resolution can be a standard resolution that can be achieved using the imager disclosed herein.
  • the first resolution can be within the range from 0.01 um to 15 um.
  • the first resolution can be within the range from 0.1 um to 5 um.
  • the plurality of training flow cell images in the training set can be from one or more color channels.
  • the plurality of training flow cell images in the training set can be from 2, 3, 4, or more color channels.
  • the plurality of training flow cell images in the training set can be from one or more cycles.
  • the one or more cycles can be any number ranging from 1 to 10, 1 to 20, 1 to 30, 1 to 50, 1 to 100, 1 to 200, or 1 to 500.
  • the plurality of flow cell images can be at a single z level or multiple z levels.
  • the sequencing method 700 comprises an operation 715 of up-sampling, by the sequencing system, the corresponding plurality of training flow cell images to generate high-resolution training flow cell images having a second resolution.
  • the second resolution can be 2x, 4x, 8x, 16x, or higher than the first resolution.
  • the first resolution is in the range from 0.01 um to 5 um
  • the corresponding second resolution that is 4x higher than the first resolution can be in the range from 0.0025 um to 1.25 um.
  • Various up-sampling methods can be used for generating the high-resolution training flow cell images.
  • Each high-resolution training flow cell image corresponds to a training flow cell image at the first resolution.
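As one hedged example among the various up-sampling methods mentioned above, the sketch below uses nearest-neighbor up-sampling, chosen only for its simplicity, and computes the resulting pixel size. The function names are assumptions, and the disclosure does not state that this particular method is used.

```python
import numpy as np

def upsample_nearest(image, factor):
    """Nearest-neighbor up-sampling of the last two axes by an integer factor.

    Works for a single 2D image or for a stack whose last two axes are (y, x).
    """
    image = np.asarray(image)
    return np.repeat(np.repeat(image, factor, axis=-2), factor, axis=-1)

def upsampled_pixel_size(pixel_size_um, factor):
    """Pixel size after up-sampling, e.g. 5 um at 4x becomes 1.25 um."""
    return pixel_size_um / factor
```

Applied to the range given above, a first resolution of 0.01 um to 5 um at 4x corresponds to 0.0025 um to 1.25 um. Each up-sampled image keeps a one-to-one correspondence with its first-resolution source image.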
  • the operation 715 is optional.
  • the high resolution images may be directly generated via computer simulation or acquisition using the sequencing system disclosed herein.
  • the sequencing method 700 comprises determining, by the sequencing system, a location list of polonies in the plurality of flow cell images; and extracting, by the sequencing system, intensities in the plurality of flow cell images based on the location list.
  • the sequencing method 700 comprises determining, by the sequencing system, a location list of polonies in the high resolution training flow cell images; and extracting, by the sequencing system, intensities in the high resolution training flow cell images based on the location list.
  • the sequencing method 700 comprises an operation of processing the high resolution training flow cell images to determine a location list of the polonies (e.g., bright spots in the image) and their processed intensities. Their processed intensities may have been processed using standard image processing such as background noise reduction, filtering, and intensity normalization.
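  • As a non-limiting illustration of extracting processed intensities based on a location list, the following Python sketch reads the pixel value at each listed location, subtracts a background estimate, and normalizes the result. The function name `extract_intensities`, the use of the image median as the background estimate, and normalization by the brightest polony are all assumptions made for this sketch; they are not the specific processing steps of the method.

```python
from statistics import median


def extract_intensities(image, locations):
    """Return background-subtracted, normalized intensities at (row, col) locations."""
    # Assumption: the median pixel value approximates the background level.
    background = median(value for row in image for value in row)
    raw = [image[r][c] - background for r, c in locations]
    # Assumption: normalize by the brightest polony so values fall in [0, 1].
    peak = max(raw) if raw and max(raw) > 0 else 1.0
    return [value / peak for value in raw]


# Toy image with two bright spots standing in for polonies (assumption).
image = [
    [10, 10, 10, 10],
    [10, 90, 10, 10],
    [10, 10, 10, 50],
    [10, 10, 10, 10],
]
location_list = [(1, 1), (2, 3)]
intensities = extract_intensities(image, location_list)
print(intensities)
```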
  • the operation of processing the training flow cell images or the high resolution training flow cell images can include polony map generation using the methods disclosed in detail in U.S. Patent No.
  • the method 700 comprises an operation 725 of generating, by the sequencing system, reference intensities corresponding to the intensities (e.g., processed intensities) in the high resolution training flow cell images based on base calls of the high resolution training flow cell images.
  • the operation 725 may be based on the location list so that only signals from identified polonies are used for generating the reference intensities; other signals, including background noise and possible artifacts from cellular structures in the images, can be excluded.
  • the operation 725 may be based on one or more image processing steps of the training flow cell images (e.g., cell segmentation, cell contouring, noise removal) so that only signals from polonies that are within an area of interest (e.g., within cells) are used for generating the reference intensities.
  • At least part of the one or more samples comprises predetermined bases in the one or more cycles.
  • the base calls for at least some of the polonies in the flow cell images in cycle(s) are predetermined.
  • the base calls can be predetermined by sequencing known barcode sequences in the one or more cycles.
  • the operation of generating the reference intensities in the high resolution training flow cell images based on the base calls of the high resolution training flow cell images comprises: performing color correction on each extracted intensity in the high resolution training flow cell image thereby generating the corresponding reference intensity.
  • the intensities may undergo color correction, phasing/dephasing, normalization, and/or other corrections to reach the reference intensities.
  • the intensities may undergo de-noising to generate the reference intensities. As a nonlimiting example, as shown in FIG.
  • the intensities of the high resolution training flow cell images from two different channels are plotted. Each polony is plotted as a dot with its corresponding intensities in channels 1, 2, 3, and 4. Based on the predetermined base calls, the polonies within area 790 would have a base call of A; thus, the corresponding reference intensity of each polony having a base call of A can be obtained by projecting its dot onto the fitted line in region 790, e.g., by projection with the shortest distance. The vertical-axis value of the projected point on the line may then be the reference intensity of the polony in channel 2, and the horizontal-axis value of the projected point would be the reference intensity in channel 1.
  • the corresponding reference intensity of each polony in area 791 can be obtained by projecting its dot onto the fitted line in region 791, e.g., by projection with the shortest distance. The horizontal-axis value of the projected point on the line may then be the reference intensity of the corresponding polonies in area 790 in channel 1, and the vertical-axis value of the projected point on the fitted line in area 791 may be the reference intensity for the corresponding polonies within area 791 in channel 2. A similar projection may be performed for polonies plotted in the right panel for channels 3 and 4.
  • noises and artifacts such as noise correlated with different channels, e.g., channel optics, illumination, etc.
  • reference intensity determination can be based on various methods for noise reduction and is not limited to the shortest distance projection in FIG. 5F.
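  • As a non-limiting illustration of the shortest-distance projection described above, the following Python sketch fits a line to the two-channel intensities of polonies that share a predetermined base call and projects each point orthogonally onto that line; the coordinates of the projected point are taken as the reference intensities in the two channels. The function names, the sample intensity values, and the choice of a least-squares line through the origin are assumptions made for this sketch; other fitting and noise reduction methods may be used.

```python
def fit_slope_through_origin(points):
    """Least-squares slope of a line y = slope * x fitted to (x, y) points."""
    sum_xx = sum(x * x for x, _ in points)
    sum_xy = sum(x * y for x, y in points)
    return sum_xy / sum_xx


def project_onto_line(point, slope):
    """Orthogonal (shortest-distance) projection of a point onto y = slope * x."""
    x, y = point
    t = (x + slope * y) / (1.0 + slope * slope)
    return (t, slope * t)


# Hypothetical extracted intensities (channel 1, channel 2) for polonies
# whose predetermined base call places them in one region of the plot.
region_points = [(10.0, 1.2), (20.0, 1.8), (30.0, 3.0)]
slope = fit_slope_through_origin(region_points)

# Projected coordinates: (reference intensity in channel 1, in channel 2).
reference_intensities = [project_onto_line(p, slope) for p in region_points]
for extracted, reference in zip(region_points, reference_intensities):
    print(extracted, "->", reference)
```

  • In this sketch, the residual between each extracted point and its projection is perpendicular to the fitted line, which is what makes the projection the shortest-distance one.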
  • the algorithm for determining the reference intensity may be iterative such that the reference intensities obtained in earlier iteration(s) can be improved based on customized quality criteria in later iterations.
  • the number of repetitions can be various numbers in a range from 1 to 10, 1 to 100, or more.
  • later iterations can use a different projection method that generates a smaller total distance to the fitted line as shown in FIG. 5F than the projection method that was used in earlier iteration(s).
  • the sequencing method 700 may include an operation 730 of providing the reference intensities for comparison to training output(s) of the neural network.
  • the reference intensities may be provided as flow cell image(s).
  • the reference intensities may be provided as a list of intensities corresponding to their locations in the flow cell images, e.g., as an array with a first column of reference intensity values and a second column of the corresponding spatial coordinates of each reference intensity value. It is advantageous to use the list of intensities to save storage space, reduce data size, and allow efficient data communication.
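  • As a non-limiting illustration of providing the reference intensities as a list rather than as an image, the following Python sketch builds rows of a reference intensity value and its spatial coordinates for the listed polony locations only. The function name `to_reference_list` and the sample values are hypothetical.

```python
def to_reference_list(reference_image, location_list):
    """Rows of (reference intensity, (row, col)) for the listed polony locations only."""
    return [(reference_image[r][c], (r, c)) for r, c in location_list]


# Toy reference image in which only two pixels carry polony signal (assumption).
reference_image = [
    [0.0, 0.0, 0.0, 0.0],
    [0.0, 0.9, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.4],
    [0.0, 0.0, 0.0, 0.0],
]
location_list = [(1, 1), (2, 3)]
reference_list = to_reference_list(reference_image, location_list)
pixel_count = sum(len(row) for row in reference_image)

print(reference_list)
print(len(reference_list), "list rows versus", pixel_count, "image pixels")
```

  • The saving grows with image size, because the list scales with the number of polonies while the image scales with the number of pixels.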
  • the input to the neural network may also include the location list.
  • the operation 730 comprises an operation of providing the reference intensities in a plurality of patches for comparison to training output(s) of the neural network, wherein each patch comprises one or more patch images from one or more color channels, one or more cycles, one or more z-levels, or a combination thereof.
  • the patches of the flow cell images may be used for training.
  • Each patch may comprise one or more patch images cropped from the flow cell images (e.g., the second plurality of flow cell images).
  • the training method 700 is configured to train the neural network for predicting one or more base calls within each individual patch, e.g., a single base call at or close to the center of the patch.
  • the one or more base calls may be much less than the total number of base calls in the flow cell images.
  • the one or more base calls may be 10x, 100x, 500x, 1000x, 5000x, 10⁴x, 10⁵x, 10⁶x, or more times less than the total number of base calls in the corresponding flow cell images.
  • the method of training using patches of flow cell images does not require training of a large number of polonies (e.g., 1000 polonies) within a patch, thus may advantageously reduce computational complexity and increase training efficiency and accuracy.
  • the sequencing method 700 herein includes an operation 740 of repeatedly performing, until the output error satisfies a stopping criterion, one or more training operations comprising: an operation 755 of determining an output error by comparing the training output to the reference intensities; and an operation 760 of adjusting current values of parameters of the neural network, e.g., CNN, based on the output error.
  • the operation 740 repeats itself using its output (e.g., adjusted parameters of the neural network) from the previous iteration as input to the current iteration.
  • the output error may be based on a comparison between the reference intensities and the predicted intensities during an iteration of training.
  • the comparison may be limited to those intensities and locations included in the location list. In some embodiments, the comparison may be limited to only a subset of intensities and corresponding locations in the location list.
  • the operation 740 may stop when a stopping criterion is met.
  • the stopping criterion can be customized.
  • the stopping criterion can be customized based on training time, computational complexity, convergence rate, and/or various other metrics.
  • Exemplary stopping criteria may include a fixed number of iterations, a fixed duration of training time, or a loss function value below a threshold.
  • the stopping criterion can be (1) stop after 10 epochs to reduce training time.
  • the stopping criterion can be (2) stop when the value of the loss function (or the output error) is less than a predetermined value close to 0. Determining the output error can be based on various metrics, e.g., a loss function.
  • Nonlimiting examples of the loss function can include: the sum of root mean square of the difference between the predicted intensities and the corresponding reference intensities based on the location list, or the sum of mean square errors.
  • the method 700 may further comprise an operation 770 of generating the trained neural network with the adjusted parameters obtained in operation 760, e.g., in the last iteration or any other iteration during the repetition of operation 740.
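  • As a non-limiting illustration of the repeated training operations 740, 755, and 760 and of the exemplary stopping criteria, the following Python sketch runs gradient descent until either a fixed number of epochs is reached or the loss falls below a threshold. A one-parameter linear model stands in for the neural network purely to keep the sketch short; the model, the function names, the learning rate, and the sample values are all assumptions, and the mean square error is used as one of the nonlimiting loss functions mentioned above.

```python
def mean_squared_error(predicted, reference):
    """Mean of squared differences between predicted and reference intensities."""
    return sum((p - r) ** 2 for p, r in zip(predicted, reference)) / len(reference)


def train(inputs, reference, learning_rate=0.1, max_epochs=10, loss_threshold=1e-6):
    """Gradient descent on a single weight; stops on epoch count or loss threshold."""
    weight = 0.0
    for _ in range(max_epochs):                      # criterion (1): fixed number of epochs
        predicted = [weight * x for x in inputs]
        loss = mean_squared_error(predicted, reference)   # output error (operation 755)
        if loss < loss_threshold:                    # criterion (2): loss close to 0
            break
        gradient = sum(2 * (p - r) * x
                       for p, r, x in zip(predicted, reference, inputs)) / len(inputs)
        weight -= learning_rate * gradient           # parameter adjustment (operation 760)
    final_loss = mean_squared_error([weight * x for x in inputs], reference)
    return weight, final_loss


# Hypothetical extracted intensities and their reference intensities.
inputs = [1.0, 2.0, 3.0]
reference = [2.0, 4.0, 6.0]
weight, final_loss = train(inputs, reference)
print(weight, final_loss)
```

  • Each pass through the loop uses the weight adjusted in the previous pass, which mirrors how operation 740 feeds its output from one iteration into the next.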
  • the trained neural network may then be used to predict high resolution intensities that can be used to determine high resolution base calls of flow cell images, e.g., using methods 500.
  • FIG. 29 shows an exemplary method 2900 for training the neural network, e.g., CNN, which can be used to predict polony locations (e.g., in operation 2804-2806), intensities of polonies, base calls, and/or classifications of one or more pixels.
  • the prediction using the neural network trained by method 2900 may advantageously allow improved detectable polony density in the sample(s).
  • predicting base calls using method 2800 (with operation 2804’, without predicting the polony map using the neural network in operation 2804, and with predicting the base calls using the neural network in operation 2812) at a polony density of 300,000/mm² or greater, e.g., 750,000/mm², produces an error rate in base calling that is lower than the error rate of base calling using the classic non-neural network based algorithm.
  • predicting base calls using method 2800 (with operation 2804’, without predicting the polony map using the neural network in operation 2804, and with predicting the base calls using the neural network in operation 2812) at a polony density of 750,000/mm² produces an error rate in base calling that is 40%, 50%, 60%, 70% or less of the error rate of base calling using the classic non-neural network based algorithm.
  • predicting base calls using method 2800 (with predicting the polony map using the neural network in operation 2804 and predicting the base calls using the neural network in operation 2812) at a polony density of 300,000/mm² or greater, e.g., 750,000/mm², produces an error rate in base calling that is lower than the error rate of base calling using the classic non-neural network based algorithm.
  • predicting base calls using method 2800 (with predicting the polony map using the neural network in operation 2804 and predicting the base calls using the neural network in operation 2812) at a polony density of 750,000/mm² produces an error rate in base calling that is 50%, 40%, 30%, 20%, 10%, 5% or less of the error rate of base calling using the classic non-neural network based algorithm.
  • training of the neural networks using the methods 600, 700, or 2900 may use training images that are real flow cell images of samples, simulated flow cell images with distribution of signal spots and noise level that is similar to real flow cell images, or a combination thereof.
  • Training with real flow cell images may advantageously eliminate the need for generating simulated images that mimic the characteristics of polonies of different samples, which simplifies the training process, especially when the sample includes heterogeneous intensities or polony densities across the flow cell image(s) and may include various types of cells or tissue.
  • Training with real flow cell images may advantageously improve training results (e.g., the trained neural network can make improved predictions) relative to training using only simulated images with similar computational cost and neural network complexity.
  • the prediction quality can be measured based on various metrics including but not limited to error rate in base calls, error rate in intensity values, density of base calls, density of polonies, etc.
  • the values of metrics can be determined in alignment with results produced using existing primary analysis methods without using neural network(s). For example, the error rate of base calling using a first neural network trained on simulated flow cell images can be determined in comparison with base calling using an existing primary analysis method without using any neural networks.
  • the error rate in base calling using a second neural network trained using real flow cell images of in situ sample can also be obtained in comparison with base calling using the same existing primary analysis method without using any neural networks.
  • the error rate in base calling using the first neural network can be higher than the error rate in base calling using the second neural network.
  • the error rate in base calling using the first neural network can be 2x, 3x, 4x, 5x, 6x, 10x, or higher than the error rate in base calling using the second neural network.
  • training of the neural networks herein can be done using only the sequencing system, e.g., the FPGA or AI chips onboard the sequencing system 110. In such cases, training may be done using hardware elements within the physical housing of the sequencing system 110 shown in FIG. 1. In some embodiments, training can be performed at least partly external to the sequencing system. For example, at least part of the training may be performed using hardware over the cloud 130.
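  • As a non-limiting illustration of comparing error rates against the same existing primary analysis method, the following Python sketch counts mismatches between predicted base calls and reference base calls and forms the ratio of two error rates. The function name `base_call_error_rate` is hypothetical, and the base call strings are made-up toy data, not measured results.

```python
def base_call_error_rate(predicted_calls, reference_calls):
    """Fraction of positions where the predicted base call differs from the reference."""
    if len(predicted_calls) != len(reference_calls):
        raise ValueError("predicted and reference base calls must have equal length")
    mismatches = sum(p != r for p, r in zip(predicted_calls, reference_calls))
    return mismatches / len(reference_calls)


# Toy data only: calls from an existing primary analysis method (reference) and
# from two hypothetical networks trained on simulated and on real images.
reference_calls = "ACGTACGTAC"
calls_first_network = "ACGTTCGAAC"    # two mismatches against the reference
calls_second_network = "ACGTACGAAC"   # one mismatch against the reference

error_first = base_call_error_rate(calls_first_network, reference_calls)
error_second = base_call_error_rate(calls_second_network, reference_calls)
print(error_first, error_second, error_first / error_second)
```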
  • the sequencing system 110 comprises: a first reconfigurable logic device, e.g., an FPGA unit, comprising a first plurality of data processing engines arranged in a first pipeline and configured to perform data processing in parallel with each other; an integrated circuit, e.g., an NPU chip or AI chip, comprising a second plurality of data processing engines configured to perform data processing in parallel, wherein the first reconfigurable logic device is configured to communicate data with the integrated circuit; a neural network (e.g., trained neural network) deployed at least partly on the second reconfigurable logic device; a first processor to selectively activate or deactivate different combinations of the first plurality of data processing engines to perform one or more operations of the sequencing methods 600, 700, or 2900; a second processor or the first processor to control the integrated circuit to perform one or more operations of the sequencing methods 600, 700, or 2900 to facilitate generating the sequencing analysis result(s).
  • the sequencing system herein may comprise one or more hardware processors; one or more data storage devices storing instructions executable by the one or more hardware processors to cause the one or more hardware processors to perform one or more operations of the sequencing method 600, 700, or 2900.
  • training the neural network using the methods 600, 700, or 2900 herein with the reconfigurable logic device, e.g., the FPGA, and/or other integrated circuit, e.g., AI chips, can be at least 2x, 8x, 10x, 20x, 40x, 50x, 100x or faster than training the same neural network(s) with similar training images using CPUs or GPUs.
  • training the neural network using the methods herein with the reconfigurable logic device, e.g., the FPGA, and/or other integrated circuit, e.g., AI chips, can be at least 20x, 40x, 60x, 80x, 100x, 200x, 400x, 500x, 800x, 1000x or faster than training the same neural network(s) with similar training images using CPUs or GPUs.
  • the neural network is trained with the same type of flow cell images as those on which the neural network may make predictions after being trained.
  • the neural network is trained with 2D flow cell images at multiple z levels and then may be used to predict base calls for 2D flow cell images at multiple z levels to cover a 3D in situ sample.
  • the neural network is trained with 2D flow cell images from a single organ origin and then may be used to predict base calls for 2D flow cell images of samples extracted from the same organ, e.g., liver.
  • the neural network is trained with traditional 2D flow cell images at a single z-level.
  • each neural network is trained with 2D flow cell images at a single z-level, and multiple neural networks may be trained to cover a 3D volumetric sample, e.g., in situ sample.
  • the neural network is trained with 2D flow cell images at multiple z-levels that encompass the 3D volume of the volumetric sample(s). Compared with training the neural network with 3D flow cell images (a 3D volumetric image), training the neural network with 2D flow cell images reduces the amount of computation, training time, and training cost. Further, the neural network trained with 2D flow cell images can be less complicated than a neural network trained with 3D training data, and can make prediction simpler and more efficient. In some embodiments, the neural network trained with 2D flow cell images may provide higher efficiency and save time and computational effort in its training and subsequent prediction of polony locations.
  • the sequencing method 2900 comprises an operation 705 of acquiring, by the imager 116 of the sequencing system 110, a training set comprising a plurality of training flow cell images with a first resolution.
  • the plurality of training flow cell images may be real images that are acquired using a sequencing system disclosed herein.
  • the plurality of training flow cell images may be real images of one or more samples immobilized on a support, e.g., a flow cell device.
  • the training flow cell images may be of 2D or 3D samples as disclosed herein in operation 705 relative to methods 700.
  • the plurality of training flow cell images may include simulated flow cell images disclosed herein.
  • the training flow cell images may be generated based on the characteristics of the sample(s) for which predictions are going to be made. For example, for predicting polony locations in cellular samples, the training flow cell images may only include images (simulated or real images) of 3D samples of similar characteristics, e.g., liver samples, kidney samples, etc. As another example, for predicting polony locations in traditional 2D samples, the training flow cell images may only include 2D flow cell images with similar plexity and/or sample density. In some embodiments, the training flow cell images may include a combination of flow cell images, either of 2D or 3D samples, and with or without similar characteristics.
  • the training flow cell images may be generated at multiple z-locations in order to cover characteristics of the sample at different z levels.
  • the corresponding plurality of training flow cell images (simulated or real images) may include flow cell images at multiple z-levels.
  • the corresponding plurality of training flow cell images may include z-stacks of flow cell images, and each z-stack may include a 3D volume made up from the multiple z-levels of flow cell images comprised in the z-stack.
  • the corresponding plurality of training flow cell images may include flow cell images at multiple z-levels (2D images) but not a z-stack of flow cell images (e.g., a 3D volume).
  • the systems and methods can be used to train neural networks to predict base calls for flow cell images acquired from one or more color channels, one or more cycles, and/or one or more z-levels in a sequence run.
  • the training data used to train the neural networks herein may be generated using real flow cell images, and the reference intensities of the training data are advantageously determined after removing errors therein that may be caused by various sources including but not limited to: color cross-talk, spatial misalignment of polonies, phasing and dephasing, and/or blurriness of out-of-focus polonies, thereby allowing more reliable training.
  • the training data used to train the neural network herein does not include full flow cell images. Instead, the training data include patches (e.g., 16 pixels by 16 pixels patches) of the flow cell images from one or more color channels, one or more cycles, and/or one or more z-levels to provide spatial and temporal context for training.
  • the training using method 700 or 2900 may be training per polony, as each patch only contains a very limited number of polonies, e.g., a single polony. The very limited number of polonies can be in a range from 1 to 4, 1 to 8, 1 to 20, 1 to 50, or 1 to 100.
  • the very limited number of polonies can be 100x, 1000x, 10⁴x, 10⁵x, 10⁶x, 10⁷x, or 10⁸x less than a total number of polonies in a corresponding flow cell image.
  • Each patch may include a patch image per color channel, per cycle, and per z-level.
  • Each patch image may share the same pixels of the corresponding portion of the flow cell images.
  • Each patch image may include a single polony at or near the center of the patch image or a very limited number of polonies.
  • Such training data may advantageously allow less complicated and more reliable training than training using flow cell images of one or more subtiles (e.g., 6000 pixels by 8000 pixels).
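  • As a non-limiting illustration of a patch that provides spatial and temporal context, the following Python sketch crops the same 16 pixel by 16 pixel window from every flow cell image of a set indexed by z-level, cycle, and color channel, so that all patch images share the same pixels of the corresponding portion of the flow cell images. The function names, the dictionary layout keyed by `(z_level, cycle, channel)`, and the 64 by 64 toy images are assumptions made for this sketch, which also assumes the polony is far enough from the image border for the window to fit.

```python
def crop(image, top, left, size):
    """Return the size x size window of a 2D image starting at (top, left)."""
    return [row[left:left + size] for row in image[top:top + size]]


def extract_patch(images, center, size=16):
    """Crop the same window around `center` from every (z_level, cycle, channel) image."""
    row, col = center
    top, left = row - size // 2, col - size // 2
    return {key: crop(image, top, left, size) for key, image in images.items()}


# Toy image set: 2 z-levels x 3 cycles x 4 color channels of 64 x 64 pixels,
# with one bright pixel standing in for a polony at (30, 40) (assumption).
images = {}
for z_level in range(2):
    for cycle in range(3):
        for channel in range(4):
            image = [[0] * 64 for _ in range(64)]
            image[30][40] = 100 + channel
            images[(z_level, cycle, channel)] = image

patch = extract_patch(images, center=(30, 40), size=16)
print(len(patch), "patch images of",
      len(patch[(0, 0, 0)]), "x", len(patch[(0, 0, 0)][0]), "pixels")
```

  • In this sketch, the polony lands at index (8, 8) of every patch image, i.e., at or near the center of a 16 by 16 window.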
  • Training with real flow cell images may advantageously eliminate the need for generating simulated images that mimic the characteristics of polonies of different samples, which simplifies the training process, especially when the sample includes heterogeneous intensities or polony densities across the flow cell image(s) and may include various types of cells or tissue.
  • Training with real flow cell images may advantageously improve training results (e.g., the trained neural network can make improved predictions) relative to training using only simulated images with similar computational cost and neural network complexity.
  • training with real flow cell images may advantageously allow a less complex neural network to achieve the predetermined quality than a neural network trained using simulated data.
  • the prediction quality can be measured based on various metrics including but not limited to error rate in base calls, error rate in intensity values, density of base calls, density of polonies, etc.
  • the values of metrics can be determined in alignment with results produced using existing primary analysis methods without using neural network(s).
  • the methods 600, 700, 2900 may be used to train a neural network or any other artificial intelligence-based models using various references or ground truth that are not limited to reference base calls or reference intensities, e.g., in the second resolution.
  • the method 2900 may include an operation 2925’ of generating, by the sequencing system, references corresponding to the intensities in the high resolution training flow cell images.
  • the references have the same spatial resolution as the high resolution training flow cell images.
  • the plurality of training flow cell images are acquired from one or more color channels, and the references comprise reference base calls. Each reference base call may correspond to a polony in the plurality of high resolution training flow cell images.
  • the references may be generated using various algorithms. The references may be based on existing datasets that are publicly available.
  • the plurality of training flow cell images are acquired from one or more color channels, and the references comprise reference classifications.
  • Each reference classification may correspond to a pixel in the plurality of high resolution training flow cell images from the one or more color channels.
  • Exemplary classifications may include nucleotides A, T, C, G, U, and background.
  • the classification of background can be for pixels that are not classified as any type of nucleotides, e.g., not classified as A, T, C, G, or U.
  • the plurality of training flow cell images are acquired from one or more color channels in one or more cycles at one or more z-levels, and the references comprise reference classifications.
  • a first reference classification may correspond to a pixel of a polony and may have a classification that is the base call of that polony, and a second reference classification may correspond to a pixel outside any polony in the plurality of high resolution training flow cell images from multiple color channels, e.g., a background classification.
  • the background classification may or may not be within a cell boundary of in situ cellular sample(s).
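  • As a non-limiting illustration of per-pixel reference classifications, the following Python sketch labels every pixel of a polony with the base call of that polony and labels every remaining pixel as background. The function name `reference_classifications`, the representation of a polony as a base call plus a list of member pixels, and the sample polonies are assumptions made for this sketch.

```python
BACKGROUND = "background"
NUCLEOTIDES = {"A", "T", "C", "G", "U"}


def reference_classifications(height, width, polonies):
    """Label map: each polony pixel gets the polony's base call, others are background.

    `polonies` is a list of (base_call, [(row, col), ...]) entries.
    """
    labels = [[BACKGROUND] * width for _ in range(height)]
    for base_call, pixels in polonies:
        if base_call not in NUCLEOTIDES:
            raise ValueError(f"unexpected base call: {base_call!r}")
        for row, col in pixels:
            labels[row][col] = base_call
    return labels


# Two hypothetical polonies in a 4 x 5 image.
polonies = [
    ("A", [(0, 0), (0, 1), (1, 0)]),
    ("G", [(2, 3), (3, 3)]),
]
labels = reference_classifications(4, 5, polonies)
for row in labels:
    print(row)
```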
  • the plurality of training flow cell images are acquired from a single color channel from one or more sequencing cycles at one or more different z-levels, and the references comprise reference polony maps.
  • Each reference polony map may correspond to at least a portion of an image of the plurality of high resolution training flow cell images in a sequencing cycle.
  • each reference polony map may correspond to a patch extracted from the high resolution training flow cell images so that each pixel in the polony map corresponds to a corresponding pixel of the patch, and the reference polony map indicates which pixel(s) are within a polony, and which pixel(s) are not.
  • the reference polony maps are generated using various algorithms for polony map generation.
  • Exemplary polony map generation algorithms for generating 2D or 3D polony maps have been disclosed in U.S. Application Nos. 18/078,797 and 18/078,820, and U.S. Patent No. 10,266,888, which are incorporated herein by reference in their entireties.
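  • As a non-limiting illustration of a reference polony map for a patch, the following Python sketch builds a binary map with the same pixel grid as the patch, in which 1 marks pixels within a polony and 0 marks pixels that are not. Modeling each polony as a disk of fixed radius around its center is an assumption made only for this sketch; it is not one of the polony map generation algorithms of the incorporated references, and the function name `reference_polony_map` is hypothetical.

```python
def reference_polony_map(patch_size, centers, radius):
    """Binary map of a square patch: 1 inside any polony disk, 0 elsewhere."""
    polony_map = [[0] * patch_size for _ in range(patch_size)]
    for row in range(patch_size):
        for col in range(patch_size):
            inside = any((row - c_row) ** 2 + (col - c_col) ** 2 <= radius ** 2
                         for c_row, c_col in centers)
            if inside:
                polony_map[row][col] = 1
    return polony_map


# One hypothetical polony centered in a 16 x 16 patch, radius 2 pixels.
polony_map = reference_polony_map(16, centers=[(8, 8)], radius=2)
for row in polony_map[5:12]:
    print("".join(str(value) for value in row))
```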
  • the first resolution can be a standard resolution that can be achieved using the imager disclosed herein.
  • the first resolution can be within the range from 0.01 µm to 15 µm.
  • the first resolution can be within the range from 0.01 µm to 5 µm.
  • the plurality of training flow cell images in the training set can be from one or more color channels.
  • the plurality of training flow cell images in the training set can be from 4 color channels.
  • the plurality of training flow cell images in the training set can be from one or more cycles.
  • the one or more cycles can be any number ranging from 1 to 10, 1 to 20, 1 to 30, 1 to 50, 1 to 100, 1 to 200, or 1 to 500.
  • the plurality of flow cell images can be at a single z level or multiple z levels.
  • the sequencing method 2900 comprises an operation 715 of up-sampling, by the sequencing system, the plurality of training flow cell images to generate high-resolution training flow cell images having a second resolution.
  • the second resolution can be 2x, 4x, 8x, 16x, or higher than the first resolution.
  • the first resolution is in the range from 0.01 µm to 5 µm, and the corresponding second resolution, which is 4x higher than the first resolution, can be in the range from 0.0025 µm to 1.25 µm.
  • Various up-sampling methods can be used for generating the high-resolution training flow cell images. Each high-resolution training flow cell image corresponds to a training flow cell image at the first resolution.
  • the sequencing method 2900 comprises an operation of determining, by the sequencing system, locations of polonies in the plurality of flow cell images (e.g., a polony map containing locations of polonies or a polony map containing a location list of the polonies); and optionally extracting, by the sequencing system, intensities in the plurality of flow cell images based on the location list.
  • the sequencing method 2900 comprises an operation of determining, by the sequencing system, locations of polonies (e.g., a polony map containing locations of polonies) in the high resolution training flow cell images; and optionally extracting, by the sequencing system, intensities in the high resolution training flow cell images based on the location list.
  • the sequencing method 2900 comprises an operation of processing the high resolution training flow cell images to determine locations of the polonies (e.g., bright spots in the image) and their processed intensities.
  • Their processed intensities may have been processed using image processing steps including but not limited to background removal, noise reduction, filtering, intensity normalization, intensity offset adjustment, phasing and prephasing, image registration, color correction, and deconvolution.
  • the operation of processing the training flow cell images or the high resolution training flow cell images can include polony map generation.
  • Exemplary polony map generation embodiments are disclosed in detail in U.S. Patent No. 11,200,446 and U.S. Patent Application Nos. 18/078,820 and 18/078,797, which are incorporated herein by reference in their entireties.
  • the method 2900 comprises an operation 2925 of generating, by the sequencing system, reference base calls of the high resolution training flow cell images.
  • the operation 2925 may be based on locations of polonies (e.g., the polony map) so that only signals from polonies identified in the polony map are used for generating the reference base calls; other signals, including background noise and possible artifacts from cellular structures in the images, can be excluded.
  • the reference base calls of the high resolution training flow cell images may be generated based on multiple patches, and each patch comprises one or more patch images from one or more color channels, and wherein each patch comprises at least a portion of the second plurality of flow cell images.
  • the patches may be generated based on the location list or the polony map so that each patch image has a single polony at or near its center pixel(s).
  • Each patch image corresponds to a reference base call of the single polony at or near its center pixels.
  • each patch image corresponds to a very limited number of reference base calls in the patch image, e.g., in a range from 1 to 10 or from 1 to 100.
  • the method 2900 comprises an operation 2925’ (which replaces operation 2925) of generating, by the sequencing system, references, instead of reference base calls, of the high resolution training flow cell images.
  • the operation 2925’ may be based on locations of polonies (e.g., the polony map) so that only signals from polonies identified in the polony map are used for generating the references; other signals, including background noise and possible artifacts from cellular structures in the images, can be excluded.
  • the operation 2925’ may generate the references for some or all of the pixels of the flow cell images without requiring locations of the polonies or the polony map.
  • the operation 2925’ may generate the references as reference classifications of A, T, C, G, U, or background for each pixel of the flow cell images after aligning flow cell images from different color channels.
  • the patches extracted from the training flow cell images have properties that are similar to those of the patches for which predictions are to be generated, e.g., using methods 2800.
  • properties can include patch size, location of the single polony within the patch, range of intensities for pixels within patches.
  • each patch comprises a single polony located at or in close vicinity to a center of the corresponding patch.
  • the polony may be no more than 1 to 10 pixels away from the center of the corresponding patch.
  • each patch comprises 3 to 128 pixels along a spatial dimension, e.g., along x or y direction.
  • the size of the patches is kept relatively small compared to the size of the flow cell images, e.g., 10x, 20x, 50x, 100x, 500x, or 1000x smaller than the size of the flow cell image.
  • the plurality of patches comprises 100 to 10^8 patches.
  • each patch may or may not contain more than one, two, three, five, or ten polonies therewithin, but only the pixel(s) of the single polony at its center is used for generating base call(s) corresponding to the patch.
  • a first patch may include pixels 1-32 in both x and y directions to cover a polony centered at pixels (16, 16) of the flow cell images
  • a second patch may include pixels 2-33 in both x and y directions to cover a second polony centered at pixels (17, 17.5)
  • a third patch may include pixels 5-36 in both x and y directions to cover a third polony centered at pixels (19, 19) of the flow cell images.
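  • As an illustrative sketch only, and not the disclosed implementation, a fixed-size patch window could be placed around each polony location and clamped to the image bounds. The function names `patch_bounds` and `extract_patch`, the 32-pixel default, the use of `floor` for sub-pixel centers, and the 1-based inclusive pixel indexing are all assumptions made for this example. Under these assumptions the first two windows above (pixels 1-32 and 2-33) are reproduced; the polony only needs to be at or near the patch center, so other placements are equally valid.

```python
import math


def patch_bounds(center, patch_size, image_size):
    """Return 1-based inclusive (first, last) pixel indices along one axis.

    The window is placed so the polony center falls at or near the middle,
    then clamped so that it stays inside the image.
    """
    first = math.floor(center) - patch_size // 2 + 1
    first = max(1, min(first, image_size - patch_size + 1))
    return first, first + patch_size - 1


def extract_patch(image, center_x, center_y, patch_size=32):
    """Crop a square patch from `image` (a list of pixel rows)."""
    height, width = len(image), len(image[0])
    x_first, x_last = patch_bounds(center_x, patch_size, width)
    y_first, y_last = patch_bounds(center_y, patch_size, height)
    # Convert 1-based inclusive bounds to Python slices.
    return [row[x_first - 1:x_last] for row in image[y_first - 1:y_last]]


if __name__ == "__main__":
    print(patch_bounds(16, 32, 2048))    # window for a polony centered at pixel 16
    print(patch_bounds(17.5, 32, 2048))  # window for a sub-pixel center
```

  • Because the windows are computed independently per polony, patches of neighboring polonies naturally overlap and share pixels.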
  • the number of pixels within each patch can be optimized to balance the computational complexity and spatial context information to be included for training the neural network(s).
  • the number of pixels within each patch can be at least partly based on polony density of the sample being imaged.
  • the number of patch images within each patch can be optimized to balance the computational complexity and the spatial context information within each patch for accurate and reliable prediction using the neural network.
  • each patch may comprise multiple patch images corresponding to different color channels.
  • each patch may comprise a patch image covering same pixels within the x-y plane in three different color channels. The same pixels may be pixels determined after registration to correct for the spatial offset across different color channels.
  • each patch may comprise multiple patch images corresponding to different cycles, e.g., continuous cycles n-1, n, n+1, within a sequencing run.
  • each patch may comprise 3 images, each from a different color channel in 4 adjacent cycles, so that each patch may comprise 12 patch images in total.
  • each patch may include 5 different z levels to make the total number of patch images of 60.
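  • The patch image counts above follow from multiplying the number of color channels, cycles, and z levels. The short sketch below only restates that arithmetic; the function name `patch_image_count` and the axis order of the returned shape are assumptions for illustration.

```python
def patch_image_count(num_channels, num_cycles, num_z_levels=1):
    """Number of 2D patch images stacked into one patch."""
    return num_channels * num_cycles * num_z_levels


def patch_shape(num_channels, num_cycles, num_z_levels, patch_size):
    """One possible (hypothetical) array layout for a patch."""
    return (num_z_levels, num_cycles, num_channels, patch_size, patch_size)


if __name__ == "__main__":
    print(patch_image_count(3, 4))       # 3 channels x 4 cycles
    print(patch_image_count(3, 4, 5))    # ... x 5 z levels
    print(patch_shape(3, 4, 5, 32))
```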
  • At least two patches of the plurality of patches comprise at least partially overlapped patch images that comprise some identical pixels.
  • each patch of the plurality of patches comprises at least partially overlapped pixels with another patch of the plurality of patches.
  • the training flow cell images are acquired only from a single color channel in one or more sequencing cycles and/or one or more z-levels, so that training flow cell images acquired from different color channels may be used to train different neural networks for predicting high resolution intensities, base calls, classifications, etc., as disclosed herein, for a single color channel.
  • the training flow cell images are acquired only from a single z level from one or more color channels in one or more sequencing cycles, so that training flow cell images acquired at different z levels of 3D sample(s), e.g., in situ cells, may be used to train different neural networks for predicting high resolution intensities, base calls, classifications, etc., as disclosed herein.
  • the training flow cell images are acquired from the one or more cycles from one or more color channels and at one or more z-levels.
  • the one or more cycles comprises a plurality of consecutive cycles in a sequencing run.
  • the operation 2925 or 2925’ of generating the reference base calls or references of the high resolution training flow cell images is for each patch of the plurality of patches.
  • reference intensities of the high resolution training flow cell images may be determined using an operation similar to operation 725 disclosed herein.
  • the operation of generating the reference intensities in the high resolution training flow cell images based on the base calls of the high resolution training flow cell images comprises: performing color correction on each extracted intensity in the high resolution training flow cell image thereby generating the corresponding reference intensity.
  • the intensities may undergo color correction, phasing/dephasing, normalization, and/or other corrections to reach the reference intensities.
  • In FIG. 5F, the intensities of the high resolution training flow cell images from two different channels are plotted. Each polony is plotted as a dot with its corresponding intensity in channels 1, 2, 3, and 4.
  • the polonies within area 790 would have a base call of A; thus, the corresponding reference intensity of each polony having a base call of A can be obtained by projecting the dots onto the fitted line in the region 790, e.g., by projection with the shortest distance.
  • the vertical-axis value of the projected intensity on the line may be the reference intensity, and the horizontal-axis value of the projected intensity would be close to zero. It is understood that the reference intensity determination is not limited to the shortest distance projection in FIG. 5F.
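  • A shortest-distance projection of a dot onto a fitted line is an orthogonal projection. The following is a minimal sketch under stated assumptions: the fitted line is described by a point `origin` on the line (taken here as the plot origin) and a `direction` vector, neither of which is specified in the text, and the function name is hypothetical.

```python
def project_onto_line(point, direction, origin=(0.0, 0.0)):
    """Orthogonally project a 2D intensity point onto a line.

    The line passes through `origin` along `direction`; the returned point
    is the point on the line closest to `point`.
    """
    px, py = point[0] - origin[0], point[1] - origin[1]
    dx, dy = direction
    norm_sq = dx * dx + dy * dy
    if norm_sq == 0:
        raise ValueError("direction must be a non-zero vector")
    t = (px * dx + py * dy) / norm_sq  # position of the projection along the line
    return (origin[0] + t * dx, origin[1] + t * dy)


if __name__ == "__main__":
    # A dot with a small horizontal-channel signal and a large vertical-channel
    # signal, projected onto a fitted line that runs along the vertical axis.
    print(project_onto_line((0.3, 5.0), (0.0, 1.0)))
```

  • When the fitted line runs along the vertical axis, the projected point keeps the vertical-axis value and its horizontal-axis value becomes zero, which matches the description above.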
  • the algorithm for determining the reference intensity may be iterative such that the reference intensities obtained in earlier iteration(s) can be improved based on customized quality criteria in later iterations.
  • the number of repetitions can be various numbers in a range from 1 to 10, 1 to 100, or more.
  • later iterations can use a different projection method that generates a smaller total distance to the fitted line as shown in FIG. 5F than the projection method that was used in earlier iteration(s).
  • the plurality of patches can be extracted from the high resolution training flow cell images after reference intensities are generated.
  • Each patch may include a patch image corresponding to a different color channel, and reference base calls may be determined based on the reference intensities from all color channels.
  • reference classifications of the patch may be determined similarly except that patches that satisfy certain customized conditions are background but not any type of nucleotides. For example, for a patch with 4 different patch images each corresponding to a color channel, if the reference intensities from all 4 channels are very similar to each other and all below a predetermined signal level, the patch then can have background classification.
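  • The background condition in the example above can be sketched as follows. This is an illustration under assumptions, not the disclosed classifier: the threshold names `signal_floor` and `similarity_tol`, their values, and the rule that a non-background patch takes the base of its brightest channel (one channel per base) are all hypothetical.

```python
def classify_patch(intensities, channel_to_base, signal_floor, similarity_tol):
    """Classify a patch from its per-channel reference intensities.

    Returns "background" when all channels are below `signal_floor` and
    within `similarity_tol` of each other; otherwise returns the base
    mapped to the brightest channel (an assumed, simplified rule).
    """
    brightest_value = max(intensities)
    spread = brightest_value - min(intensities)
    if brightest_value < signal_floor and spread <= similarity_tol:
        return "background"
    brightest_channel = max(range(len(intensities)), key=lambda i: intensities[i])
    return channel_to_base[brightest_channel]


if __name__ == "__main__":
    bases = "ACGT"  # hypothetical channel-to-base assignment
    print(classify_patch([0.05, 0.04, 0.06, 0.05], bases, 0.2, 0.05))
    print(classify_patch([0.90, 0.10, 0.05, 0.08], bases, 0.2, 0.05))
```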
  • each patch may only include a single patch image from a single color channel.
  • the operation 2925 or 2925’ may comprise generating a first single reference intensity for a first channel of the multiple color channels corresponding to the corresponding patch in a single sequencing cycle.
  • each patch may include multiple patch images from the same single color channel but from different sequencing cycles.
  • the operation 2925 or 2925’ may comprise generating reference intensities for a first channel of the multiple color channels corresponding to the corresponding patch in one or more sequencing cycles.
  • the operation 2930 or 2930’ may include providing the reference base calls or references so that they are available for comparison to training output(s) of the neural network, depending on how the user may want to train the neural network which may include: a single intensity of a single color channel at one sequencing cycle for each patch (for training a different neural network for each color channel), multiple intensities of a single color channel at multiple sequencing cycles (for training a different neural network for each color channel), or multiple intensities of multiple different color channels at one or more sequencing cycles (for training a single neural network for different color channels).
  • patches of reference base calls or references may also be separated based on z-levels of a 3D sample in order to train different neural networks at different z levels.
  • a single neural network may be trained using patches from different z levels.
  • the method 2900 may include an operation 2930’ of providing, by the processor, the references for comparison to training output(s) of the neural network.
  • the high resolution training flow cell images are also provided in operation 2930 or 2930’ for comparison to training output(s) of the neural network.
  • the method 2900 may include an operation 2955’ of determining an output error by comparing the training output and the references, instead of operation 2955.
  • At least part of the one or more samples comprises predetermined nucleotide bases in the one or more cycles.
  • the base calls for at least some of the polonies in the flow cell images in cycle(s) are predetermined.
  • the base calls can be predetermined by sequencing known barcode sequences in the one or more cycles.
  • the operation of generating the reference base calls in the high resolution training flow cell images comprises: performing color correction on each extracted intensity in the high resolution training flow cell image thereby generating the corresponding reference intensity used for generating reference base calls.
  • the algorithm for determining the reference base calls is based on determination of the reference intensities as disclosed herein, e.g., in methods 700.
  • the sequencing methods 2900 may include an operation 2930 of providing the reference base calls so that they can be compared to the training output(s) of the neural network for training.
  • the operation 2930 is similar to operation 730 in method 700.
  • the reference base calls may be provided as flow cell image(s) or alternatively as patches; each patch may comprise one or more patch images, and each patch image has a polony at or near its center.
  • the reference base calls may be provided as a list of base calls corresponding to their locations in the flow cell images.
  • the operation 2930 comprises an operation of providing the reference base calls in a plurality of patches for comparison during training to the training output(s) of the neural network, wherein each patch comprises one or more patch images from multiple color channels.
  • the patches of the flow cell images may be used for training per polony.
  • Each patch may comprise one or more patch images cropped from the flow cell images (e.g., the second plurality of flow cell images).
  • the training method 2900 may be configured to train the neural network, e.g., a CNN, for predicting a single base call at or close to the center of the patch.
  • the method of training 2900 using patches of flow cell images does not require training of a large number of polonies within a patch, thus may advantageously reduce computational complexity and increase training efficiency and accuracy.
  • the input to the neural network may also include the location list, e.g., a polony map.
  • the sequencing method 2900 herein includes an operation 740 of repeatedly performing, until the output error satisfies a stopping criterion, one or more training operations comprising: an operation 2955 of determining an output error by comparing the training output and the reference base calls; and an operation 760 of adjusting current values of parameters of the convolutional neural network based on the output error.
  • the operation 740 repeats itself using its output (e.g., adjusted parameters of the neural network) from the previous iteration as input to the current iteration.
  • the output error may be based on a comparison between the reference base calls and the predicted base calls during an iteration of training.
  • the comparison may only include locations in the location list, e.g., the polony map.
  • the comparison may include a subset of location in the location list.
  • the operation 740 may stop when a stopping criterion is met.
  • the stopping criterion can be customized.
  • the stopping criterion can be customized based on training time, computational complexity, and convergence rate.
  • Exemplary stopping criteria include a fixed number of iterations, a fixed duration of training time, or a minimized loss function.
  • the stopping criterion can be (1) stop after 10 epochs to reduce training time.
  • the stopping criterion can be (2) stop when the value of the loss function (or the output error) is less than a predetermined value close to 0. Determining the output error can be based on various metrics, e.g., a loss function.
  • Nonlimiting examples of the loss function can include: the sum of root mean square of the difference between the predicted intensities and the corresponding reference base calls based on the location list, or the sum of mean square errors.
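  • The two stopping criteria above, a fixed number of epochs and a loss below a predetermined value, can be combined in one training loop. The sketch below is a toy stand-in under loud assumptions: a single scalar weight fitted by gradient descent replaces the neural network, and the names `train`, `learning_rate`, `max_epochs`, and `loss_tolerance` are hypothetical. Only the control flow of the repeated determine-error-then-adjust operations is illustrated.

```python
def mean_squared_error(predicted, reference):
    """Mean of squared differences between predictions and references."""
    return sum((p - r) ** 2 for p, r in zip(predicted, reference)) / len(reference)


def train(inputs, references, weight=0.0, learning_rate=0.1,
          max_epochs=10, loss_tolerance=1e-6):
    """Fit `weight` so that weight * x approximates the references.

    Stops after `max_epochs` (criterion 1) or as soon as the loss falls
    below `loss_tolerance` (criterion 2), whichever comes first.
    """
    history = []
    for _ in range(max_epochs):                      # criterion (1)
        predicted = [weight * x for x in inputs]
        loss = mean_squared_error(predicted, references)
        history.append(loss)
        if loss < loss_tolerance:                    # criterion (2)
            break
        # Gradient of the mean squared error with respect to the weight.
        gradient = sum(2 * (p - r) * x
                       for p, r, x in zip(predicted, references, inputs)) / len(inputs)
        weight -= learning_rate * gradient           # adjust the parameter
    return weight, history


if __name__ == "__main__":
    final_weight, losses = train([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
    print(final_weight, len(losses), losses[-1])
```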
  • the method 2900 may further comprise an operation 770 of generating the trained neural network with the adjusted parameters obtained in operation 760.
  • Values of the parameters of the neural network can be adjusted based on the output error or one or more previous output errors.
  • z-stacks of training flow cell images from a same channel can be acquired, e.g., in operation 705, to train the neural network, e.g., CNN, for that particular channel.
  • a certain percentage, e.g., 80%, of the training set may be used for training, and the rest of the training set, e.g., 20%, may be used for validation.
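  • A split of the training set into training and validation portions could look like the sketch below. Shuffling before the split, the fixed seed, and the function name `split_training_set` are assumptions; the text only specifies the example percentages.

```python
import random


def split_training_set(patches, train_fraction=0.8, seed=0):
    """Shuffle the patches and split them into training and validation lists."""
    shuffled = list(patches)
    random.Random(seed).shuffle(shuffled)  # reproducible shuffle (assumed step)
    cut = int(round(train_fraction * len(shuffled)))
    return shuffled[:cut], shuffled[cut:]


if __name__ == "__main__":
    training, validation = split_training_set(range(100))
    print(len(training), len(validation))
```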
  • Batch size can be one
  • Epochs can be about 10, 12, 15, 20, or more.
  • various optimizers can be used.
  • the neural network comprises one or more convolutional neural networks. In some embodiments, the neural network comprises one or more U-Net units.
  • comparing the training output to the reference set comprises: calculating mean square error in the predicted intensities generated by the neural network being trained and the corresponding reference intensities based on the location list.
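  • Restricting the mean square error to the locations in the location list can be sketched as follows. The representation of images as nested lists indexed `[row][col]`, the `(row, col)` tuples in the location list, and the function name are assumptions for illustration.

```python
def mse_at_locations(predicted, reference, locations):
    """Mean squared error evaluated only at the listed (row, col) locations.

    Pixels outside the location list (e.g., background) do not contribute.
    """
    if not locations:
        raise ValueError("location list must not be empty")
    total = sum((predicted[r][c] - reference[r][c]) ** 2 for r, c in locations)
    return total / len(locations)


if __name__ == "__main__":
    predicted = [[1.0, 9.0], [3.0, 4.0]]
    reference = [[1.0, 0.0], [1.0, 4.0]]
    # Only two polony locations are compared; the large error at (0, 1)
    # is ignored because that pixel is not in the location list.
    print(mse_at_locations(predicted, reference, [(0, 0), (1, 0)]))
```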
  • the sequencing system is configured to acquire one or more cell images that may include images of the cell and/or tissue with various types of staining, e.g., fluorescent staining, configured to show morphological information of the sample.
  • the one or more cell images can comprise staining of cellular structures that help locate polonies or clusters relative to the stained structures.
  • staining can be of cellular structures or components including but not limited to membranes, nuclei, and mitochondria. Different staining colors may be used to stain different components of the cell.
  • the cell membrane after sequencing analysis and imaging using the sequencing system and reactions can be permeabilized.
  • the one or more cell images can comprise staining of lipids, such as lipids comprised in the cell membrane.
  • the one or more cell images can comprise staining of one or more transmembrane proteins.
  • the transmembrane proteins can be proteins embedded in the permeabilized membrane.
  • the one or more cell images comprise fluorescence or luminescence signals from cell membranes.
  • the one or more cell images can be microscopic images.
  • the one or more images can be fluorescent images.
  • different fluorescent colors can be included in the cell images.
  • the nuclei and the cell membrane can be stained with different colors.
  • the one or more cell images can comprise segments of: cells, membranes, nuclei, and/or other morphological structures.
  • the edge(s) of each segment encompass the entire membrane of the cell within the segment. There can be only one cell in each segment. Some segments may not have any cell in them. In some embodiments, adjacent segments do not overlap with each other. In some embodiments, adjacent segments only overlap with each other by sharing one or more edges. In some embodiments, various segmentation algorithms can be used for segmenting the cells.
  • the cell images disclosed herein are stained.
  • the staining can occur after acquiring flow cell images using the sequencing system 110. In some embodiments, the staining can occur before acquiring sequencing images.
  • the methods of staining the 3D sample, such as cells or tissue, can include one or more operations disclosed herein.
  • the staining of the 3D sample can use various methods that can specifically label one or more cell protein(s) that are located mostly in the membrane but with negligible occurrence in other regions of the cell (e.g., less than 10%, 5%, 2% in amount or concentration).
  • the cell images may be acquired using the sequencing system 100 herein without moving the sample(s) from its position during sequencing. It is advantageous to stain the sample after sequencing and acquire the cell images while keeping the samples immobilized to the sample stage of the sequencing system. Some transformation, e.g., rotation, translation, or shearing, may still occur, so there is a need to register the flow cell images acquired during sequencing to the cell images acquired after sequencing and staining.
  • the cell images may be acquired using optical device(s) external to the sequencing system 100 after the sequencing run has been completed and after moving the sample away from the sequencing system 100.
  • the sequencing system including optical system advantageously enables sequencing and imaging of target analyte(s) or features while they remain intact inside the cell or tissue.
  • the cell or tissue and the targets e.g., target analytes, structure elements, organelles, etc.
  • the one or more samples being imaged using the optical systems herein can be 2D or 3D samples.
  • the 2D sample(s) may include traditional nucleic acid molecules extracted from various sources.
  • the 3D samples can include various samples in which polonies within the sample do not fit into a single z level while keeping the polonies in focus.
  • the 3D samples may include in situ samples such as cells and/or tissues.
  • the cells or tissue samples are immobilized on the flow cell device or other substrate for sequencing and/or imaging without modifying the spatial locations of targets within the cells or tissue.
  • the cells or tissue samples are immobilized on the flow cell device or other substrate for sequencing or imaging without modifying the spatial relationship of targets or target analytes within the cells or tissue.
  • the cells and/or tissue are immobilized with the morphological features, RNA, mRNA, and protein targets of the samples intact inside the cell(s) or tissue during sequencing and/or imaging.
  • the spatial locations or relationships of the target analytes or targets remain intact during sequencing and/or imaging.
  • the spatial locations or relationships of the target analytes or targets during sequencing and/or imaging are not manually reconstructed using artificially added structure or features in the sample.
  • the nucleus, cell membrane, mitochondria, and extracellular matrix can retain their relative spatial relationship to each other in the sample(s) during imaging and/or sequencing.
  • the one or more samples include target analyte(s) that are located inside the sample(s) or on the membrane of the sample(s). In some embodiments, the one or more samples include target analyte(s) that are on the exterior or interior surface of the cell. In some embodiments, the one or more samples include target analyte(s) that are on the exterior or interior surface of the cell membrane. In some embodiments, the one or more samples include target analyte(s) that are part of the extracellular matrix. In some embodiments, the one or more samples include target analyte(s) that are part of and/or located on one or more organelles within the cell or tissue. In some embodiments, the one or more samples include target analytes that are on or in the glycocalyx or belong to part of the glycocalyx.
  • the target analyte(s) comprise at least one polypeptide, lipid, nucleic acid or polysaccharide. In some embodiments, the target analyte(s) comprise at least one polypeptide, enzyme or lipid located anywhere in the sample(s) including the cytoplasm and nucleus. In some embodiments, the target analyte(s) comprise at least one polypeptide, enzyme or lipid located in or on a cellular structure including without limitation any cellular membrane, nucleus, nucleolus, mitochondria, chloroplast, Golgi apparatus, ribosome, endoplasmic reticulum, microtubules, peroxisome and lysosome.
  • the methods, devices, and systems disclosed herein allow sequencing and analysis of various samples and sources.
  • the samples may include nucleic acids extracted from any of a variety of biological samples, e.g., blood samples, saliva samples, urine samples, cell samples, tissue samples, and the like.
  • the samples here may include a variety of different cell, tissue, or sample types known to those of skill in the art.
  • the sample(s) may be from eukaryotes (such as animals, plants, fungi, protista), archaebacteria, or eubacteria.
  • the sample(s) may include prokaryotic or eukaryotic cells, such as adherent or non-adherent eukaryotic cells.
  • the sample(s) may be from, for example, primary or immortalized rodent, porcine, feline, canine, bovine, equine, primate, or human cell lines.
  • the sample(s) may include a variety of different cell, organ, or tissue types (e.g., white blood cells, red blood cells, platelets, epithelial cells, endothelial cells, neurons, glial cells, astrocytes, fibroblasts, skeletal muscle cells, smooth muscle cells, gametes, or cells from the heart, lungs, brain, liver, kidney, spleen, pancreas, thymus, bladder, stomach, colon, or small intestine).
  • the sample(s) may include normal or healthy cells.
  • the sample(s) may include diseased cells, such as cancerous cells, or pathogenic cells that are infecting a host.
  • the sample(s) may include a distinct subset of cell types, e.g., immune cells (such as T cells, cytotoxic (killer) T cells, helper T cells, alpha beta T cells, gamma delta T cells, T cell progenitors, B cells, B-cell progenitors, lymphoid stem cells, myeloid progenitor cells, lymphocytes, granulocytes, Natural Killer cells, plasma cells, memory cells, neutrophils, eosinophils, basophils, mast cells, monocytes, dendritic cells, and/or macrophages, or any combination thereof), undifferentiated human stem cells, human stem cells that have been induced to differentiate, rare cells (e.g., circulating tumor cells (CTCs), circulating epithelial cells, circulating endothelial cells, circulating tumor cells (CTCs), circulating epi
  • the methods disclosed herein may comprise an operation of registering, by the reconfigurable logic device and/or the integrated circuit, the one or more cell images (e.g., with staining) to sequencing images or results of the sample, e.g., base calls of the determined polonies.
  • such operation is performed by the different combinations of the first plurality of data processing engines and the first reconfigurable routing channels after the operation of determining polonies from the second plurality of flow cell images or after the operation of performing a corresponding base calling for each of the determined polonies based on the second plurality of flow cell images.
  • such operation of registering the cell images to flow cell images or base calls may be performed by the integrated circuits, and the registration results, e.g., the transformation(s), may be communicated from the integrated circuit to the reconfigurable logic device or the one or more processors of the sequencing system.
  • the methods herein include saving the registration results, by the reconfigurable logic device, the integrated circuit, or the one or more processors into a predetermined file format, e.g., a FastQ data file, so that it can be accessed using similar software that is configured to access sequencing results such as base calls.
  • the methods further include an operation of accessing both the registration results of the cell images and other sequencing results to present sequencing results in correspondence with the morphological information of the sample, e.g., to a user.
  • the methods may include an operation of displaying base calling results in color that are spatially registered to cellular features, e.g., the nucleus, so that the aligned results conveniently allow the user to identify base calls in relation to the morphological information of cells.
  • saving and accessing the registration results of the cell images and other sequencing results may be performed by the one or more processors, the reconfigurable logic device, and/or the integrated circuit.
  • the registration results of the cell images and other sequencing results may be saved into a memory device that is within the housing of the sequencing system. In some embodiments, the registration results of the cell images and other sequencing results may be saved into a memory device that is on the cloud 130 external to the sequencing system.
  • the fiducial markers can be internal or external to the sample.
  • internal fiducial markers can include at least some of the polonies or clusters or background objects in the sample.
  • external fiducial markers can be microspheres coated on the flow cell so that the signal from the microspheres can function similarly as internal fiducial markers for registration.
  • the same fiducial markers can appear in both the sequencing images, e.g., the flow cell image(s), and the cell images, so that transformation(s) can be derived from aligning the fiducial markers in different images.
  • the transformation(s) can be used for registering or aligning the sequencing image(s) and cell image(s) and objects that appear in them. Exemplary embodiments of image registration methods are described in PCT patent application No. PCT/US2023/067931 (the contents of which are hereby incorporated by reference in their entirety).
  • a polony or other object (e.g., a background object serving as a fiducial marker) with image intensity I centered at location $(x_1, y_1)$ in a sequencing image can appear at location $(x_2, y_2)$ with intensity F in a cell image, where $(x_2, y_2) = M_r \cdot (x_1, y_1)$, and $M_r$ is the transformation matrix.
  • the inverse transformation matrix $M_r^{-1}$ can be determined such that $(x_1, y_1) = M_r^{-1} \cdot (x_2, y_2)$.
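  • The relationship between a transformation matrix and its inverse can be illustrated with a small sketch. It assumes that locations are written in homogeneous coordinates (x, y, 1) and that the transformation is a 3 x 3 matrix; the text does not state the representation, and the function names are hypothetical.

```python
def apply_transform(matrix, point):
    """Map (x, y) through a 3x3 matrix using homogeneous coordinates."""
    x, y = point
    xh = matrix[0][0] * x + matrix[0][1] * y + matrix[0][2]
    yh = matrix[1][0] * x + matrix[1][1] * y + matrix[1][2]
    w = matrix[2][0] * x + matrix[2][1] * y + matrix[2][2]
    return (xh / w, yh / w)


def invert_3x3(m):
    """Invert a 3x3 matrix via its adjugate and determinant."""
    (a, b, c), (d, e, f), (g, h, i) = m
    det = a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)
    if det == 0:
        raise ValueError("transformation matrix is not invertible")
    return [
        [(e * i - f * h) / det, (c * h - b * i) / det, (b * f - c * e) / det],
        [(f * g - d * i) / det, (a * i - c * g) / det, (c * d - a * f) / det],
        [(d * h - e * g) / det, (b * g - a * h) / det, (a * e - b * d) / det],
    ]


if __name__ == "__main__":
    # Hypothetical transformation: 2x scaling plus a translation.
    transform = [[2.0, 0.0, 1.0], [0.0, 2.0, -1.0], [0.0, 0.0, 1.0]]
    in_cell_image = apply_transform(transform, (3.0, 4.0))
    back_in_sequencing_image = apply_transform(invert_3x3(transform), in_cell_image)
    print(in_cell_image, back_in_sequencing_image)
```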
  • the registration of images can be in 2D and can include translation, scaling, rotation, and/or shearing of flow cell images among different channels. Multiple points in the sequencing image and their corresponding points in the cell image can be used to determine the transformation. The minimum number of points that is needed can be determined by the degree of freedom in the transformation.
  • the image registration can be 3D with coordinates in x, y, and z axes.
  • an image, e.g., a flow cell image, a cell image, etc.
  • a transformation can be determined for each subtile to represent the transformation of the whole image.
  • the image transformation of each subtile can be uniquely represented by a transformation matrix.
  • the transformation matrix can be determined as below:
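  • $$M = \begin{bmatrix} M_{11} & M_{12} & M_{13} \\ M_{21} & M_{22} & M_{23} \\ M_{31} & M_{32} & M_{33} \end{bmatrix} \qquad (2)$$
  • the transformation matrix can be defined as the inverse matrix of M, i.e., $M^{-1}$, so that equation (1) can be expressed differently as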
  • the transformation matrix M is an estimation in equations (1) and (3) based on the 2D shifts.
  • the value of n may affect the accuracy of the estimation.
  • more than one region can be selected within a subtile for cross correlation calculation, and more than one 2D shift can be calculated for each subtile and used for estimating the transformation of the subtile.
  • n in equation (1) can be replaced by a larger number, e.g., 2*n when 2 regions are selected per subtile, and the transformation matrix M can be estimated using equations (1) and (2).
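  • A 2D shift for a selected region can be estimated by cross correlation. The sketch below uses a brute-force, unnormalized cross correlation over a small search window; this is one simple choice made for illustration and is not stated in the text. The function name `estimate_shift` and the `max_shift` search range are assumptions.

```python
def estimate_shift(reference, moved, max_shift):
    """Estimate the (dx, dy) shift that best aligns `moved` with `reference`.

    Both inputs are equally sized regions given as lists of pixel rows.
    Every integer shift within +/- max_shift is scored by the sum of
    products over the overlapping pixels, and the best shift is returned.
    """
    height, width = len(reference), len(reference[0])
    best_score, best_shift = None, (0, 0)
    for dy in range(-max_shift, max_shift + 1):
        for dx in range(-max_shift, max_shift + 1):
            score = 0.0
            # Only visit pixels whose shifted partner lies inside the region.
            for y in range(max(0, -dy), min(height, height - dy)):
                for x in range(max(0, -dx), min(width, width - dx)):
                    score += reference[y][x] * moved[y + dy][x + dx]
            if best_score is None or score > best_score:
                best_score, best_shift = score, (dx, dy)
    return best_shift


if __name__ == "__main__":
    size = 8
    reference = [[0.0] * size for _ in range(size)]
    moved = [[0.0] * size for _ in range(size)]
    reference[3][2] = 1.0   # bright spot at x=2, y=3
    moved[4][4] = 1.0       # same spot after moving by dx=2, dy=1
    print(estimate_shift(reference, moved, max_shift=3))
```

  • Each estimated shift gives one pair of before and after coordinates for the center of the selected region, so selecting more regions per subtile yields more coordinate pairs for estimating the transformation.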
  • (a1, b1) ... (an, bn) in equations (1)-(3) are coordinates for selected region(s) (e.g., coordinates of a center pixel of the corresponding region(s)) after transformation, and (x1, y1) ... (xn, yn) are coordinates of the selected region(s) before transformation, e.g., coordinates of a center pixel.
  • n is a number that is no less than 3. The larger the n, the more information can be used to estimate the transformation matrix M. In some embodiments, n is not greater than 9.
  • the transformation of one or more subtiles is linear. In some embodiments, the transformation of all subtiles is linear. In some embodiments, the transformation matrix is a matrix in which M31 and M32 are equal to 0, and M33 is 1. In some embodiments, one or more of the transformations per subtile is an affine transformation and the transformation matrix of the entire flow cell image is an affine matrix.
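  • For the affine case in which M31 = M32 = 0 and M33 = 1, the remaining six entries of M can be estimated from n of at least 3 coordinate pairs by least squares. The sketch below is an assumption-laden illustration: it takes the relation to be (a, b, 1) = M (x, y, 1) for each pair, which is the usual homogeneous form and is consistent with the definitions of (a, b) and (x, y) above, and it solves the normal equations directly. The function names are hypothetical.

```python
def solve_3x3(a, b):
    """Solve the 3x3 linear system a * p = b by Gauss-Jordan elimination."""
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("points are collinear; transformation is not determined")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(3):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [v - factor * p for v, p in zip(m[r], m[col])]
    return [m[i][3] / m[i][i] for i in range(3)]


def estimate_affine(before, after):
    """Least-squares affine matrix mapping `before` points to `after` points."""
    if len(before) < 3 or len(before) != len(after):
        raise ValueError("need at least 3 matching coordinate pairs")
    rows = [(x, y, 1.0) for x, y in before]
    # Normal equations: (A^T A) p = A^T a for the first row of M, and
    # (A^T A) q = A^T b for the second row.
    normal = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    rhs_a = [sum(r[i] * a for r, (a, _) in zip(rows, after)) for i in range(3)]
    rhs_b = [sum(r[i] * b for r, (_, b) in zip(rows, after)) for i in range(3)]
    return [solve_3x3(normal, rhs_a), solve_3x3(normal, rhs_b), [0.0, 0.0, 1.0]]


if __name__ == "__main__":
    before = [(0.0, 0.0), (100.0, 0.0), (0.0, 100.0), (100.0, 100.0)]
    # Hypothetical ground truth: slight scaling and rotation plus a translation.
    after = [(1.01 * x + 0.02 * y + 3.0, -0.02 * x + 0.99 * y - 1.5) for x, y in before]
    for row in estimate_affine(before, after):
        print(row)
```

  • With exactly 3 non-collinear pairs the fit is exact; additional pairs average out noise in the individual coordinate estimates.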
  • the transformation matrix M is an estimation in equations (1) and (3) based on the size of the selected region(s).
  • the size of selected region may affect the accuracy of the estimation.
  • the size of the selected region can be about 128 x 128.
  • the size of the selected region can be about 32 x 32, 48 x 48, 64 x 64, 96 x 96, 160 x 160, 196 x 196, 256 x 256, or of various different sizes.
  • the transformations per subtile as disclosed herein can be calculated using a selected region within a subtile; the selected region can be equal to or smaller than the subtile.
  • the transformation estimated using the region can be used to estimate the transformation of the entire subtile given the intrinsic characteristics of image transformation across sequencing cycles.
  • the image transformation between cycles and/or between neighboring pixels can be relatively small, e.g., with less than about 8%, 5%, or 1% of scaling, rotation, and/or shearing.
  • the transformations disclosed herein can include an image translation with greater than about 5% difference between cycles and/or between neighboring pixels.
  • the transformation of the entire flow cell image can be accurately and reliably estimated by transforming individual subtiles using the plurality of transformations and combining the transformed subtiles into a transformed flow cell image.
  • the techniques disclosed herein advantageously estimate the transformation of the flow cell image by determining a plurality of transformations of its individual subtiles.
  • the plurality of transformations can be linear and yet accurately and reliably estimate the transformation of the flow cell image even if the transformation is non-linear.
  • the techniques disclosed herein advantageously eliminate the need to calculate the transformation of the entire images to be registered or aligned, which can be more computationally intensive, time-consuming, and prone to failure than estimating a plurality of transformations for the corresponding subtiles of the entire images.
  • Various aspects of the methods described herein, such as methods 500, 600, 700, 2800 and 2900, may be implemented, for example, using one or more computer systems, such as computer system 400 shown in FIG. 4.
  • One or more computer systems 400 may be used, for example, to implement any of the aspects discussed herein, as well as combinations and sub-combinations thereof.
  • Computer system 400 may include one or more hardware processors 404.
  • the hardware processor 404 may be a central processing unit (CPU), a graphics processing unit (GPU), or a combination thereof.
  • Processor 404 may be connected to a bus or communication infrastructure 406.
  • Computer system 400 may also include user input/output device(s) 403, such as monitors, keyboards, pointing devices, etc., which may communicate with communication infrastructure 406 through user input/output interface(s) 402.
  • the user input/output devices 403 may be coupled to the user interface 124 in FIG. 1.
  • One or more units of processors 404 may be a graphics processing unit (GPU).
  • a GPU may be a processor that is a specialized electronic circuit designed to process mathematically intensive applications.
  • the GPU may have a parallel structure that is efficient for parallel processing of large blocks of data, such as mathematically intensive data common to computer graphics applications, images, videos, vector processing, array processing, etc., as well as cryptography (including brute-force cracking), generating cryptographic hashes or hash sequences, solving partial hash-inversion problems, and/or producing results of other proof-of-work computations for some blockchain-based applications, for example.
  • the GPU may be particularly useful in at least the image recognition and machine learning aspects described herein.
  • processors 404 may include a coprocessor or other implementation of logic for accelerating cryptographic calculations or other specialized mathematical functions, including hardware-accelerated cryptographic coprocessors. Such accelerated processors may further include instruction set(s) for acceleration using coprocessors and/or other logic to facilitate such acceleration.
  • Computer system 400 may also include a data storage device such as a main or primary memory 408, e.g., random access memory (RAM).
  • Main memory 408 may include one or more levels of cache.
  • Main memory 408 may have stored therein control logic (i.e., computer software) and/or data.
  • Computer system 400 may also include one or more secondary data storage devices or secondary memory 410.
  • Secondary memory 410 may include, for example, a main storage drive 412 and/or a removable storage device or drive 414.
  • Main storage drive 412 may be a hard disk drive or solid-state drive, for example.
  • Removable storage drive 414 may be a floppy disk drive, a magnetic tape drive, a compact disk drive, an optical storage device, tape backup device, and/or any other storage device/drive.
  • Removable storage drive 414 may interact with a removable storage unit 418.
  • Removable storage unit 418 may include a computer usable or readable storage device having stored thereon computer software and/or data.
  • the software may include control logic.
  • the software may include instructions executable by the hardware processor(s) 404.
  • Removable storage unit 418 may be a floppy disk, magnetic tape, compact disk, DVD, optical storage disk, and/or any other computer data storage device.
  • Removable storage drive 414 may read from and/or write to removable storage unit 418.
  • Secondary memory 410 may include other means, devices, components, instrumentalities or other approaches for allowing computer programs and/or other instructions and/or data to be accessed by computer system 400.
  • Such means, devices, components, instrumentalities or other approaches may include, for example, a removable storage unit 422 and an interface 420.
  • Examples of the removable storage unit 422 and the interface 420 may include a program cartridge and cartridge interface (such as that found in video game devices), a removable memory chip (such as an EPROM or PROM) and associated socket, a memory stick and USB port, a memory card and associated memory card slot, and/or any other removable storage unit and associated interface.
  • Computer system 400 may further include a communication or network interface 424.
  • Communication interface 424 may enable computer system 400 to communicate and interact with any combination of external devices, external networks, external entities, etc. (individually and collectively referenced by reference number 428).
  • communication interface 424 may allow computer system 400 to communicate with external or remote devices 428 over communication path 426, which may be wired and/or wireless (or a combination thereof), and which may include any combination of LANs, WANs, the Internet, etc.
  • Control logic and/or data may be transmitted to and from computer system 400 via communication path 426.
  • communication path 426 is the connection to the cloud 130, as depicted in FIG. 1.
  • the external devices, etc. referred to by reference number 428 may be devices, networks, entities, etc. in the cloud 130.
  • Computer system 400 may also be any of a personal digital assistant (PDA), desktop workstation, laptop or notebook computer, netbook, tablet, smart phone, smart watch or other wearable, appliance, part of the Internet of Things (IoT), and/or embedded system, to name a few non-limiting examples, or any combination thereof.
  • the framework described herein may be implemented as a method, process, apparatus, system, or article of manufacture such as a non-transitory computer-readable medium or device.
  • the present framework may be described in the context of distributed ledgers being publicly available, or at least available to untrusted third parties.
  • One example as a modern use case is with blockchain-based systems.
  • the present framework may also be applied in other settings where sensitive or confidential information may need to pass by or through the hands of untrusted third parties, and this technology is in no way limited to distributed ledgers or blockchain uses.
  • Computer system 400 may be a client or server, accessing or hosting any applications and/or data through any delivery paradigm, including but not limited to remote or distributed cloud computing solutions; local or on-premises software (e.g., “on-premise” cloud-based solutions); “as a service” models (e.g., content as a service (CaaS), digital content as a service (DCaaS), software as a service (SaaS), managed software as a service (MSaaS), platform as a service (PaaS), desktop as a service (DaaS), framework as a service (FaaS), backend as a service (BaaS), mobile backend as a service (MBaaS), infrastructure as a service (IaaS), database as a service (DBaaS), etc.); and/or a hybrid model including any combination of the foregoing examples or other services or delivery paradigms.
  • Any applicable data structures, file formats, and schemas may be derived from standards including but not limited to JavaScript Object Notation (JSON), Extensible Markup Language (XML), Yet Another Markup Language (YAML), Extensible Hypertext Markup Language (XHTML), Wireless Markup Language (WML), MessagePack, XML User Interface Language (XUL), or any other functionally similar representations alone or in combination.
  • Any pertinent data, files, and/or databases may be stored, retrieved, accessed, and/or transmitted in human-readable formats such as numeric, textual, graphic, or multimedia formats, further including various types of markup language, among other possible formats.
  • the data, files, and/or databases may be stored, retrieved, accessed, and/or transmitted in binary, encoded, compressed, and/or encrypted formats, or any other machine-readable formats.
  • Interfacing or interconnection among various systems and layers may employ any number of mechanisms, such as any number of protocols, programmatic frameworks, floorplans, or application programming interfaces (API), including but not limited to Document Object Model (DOM), Discovery Service (DS), NSUserDefaults, Web Services Description Language (WSDL), Message Exchange Pattern (MEP), Web Distributed Data Exchange (WDDX), Web Hypertext Application Technology Working Group (WHATWG) HTML5 Web Messaging, Representational State Transfer (REST or RESTful web services), Extensible User Interface Protocol (XUP), Simple Object Access Protocol (SOAP), XML Schema Definition (XSD), XML Remote Procedure Call (XML-RPC), or any other mechanisms, open or proprietary, that may achieve similar functionality and results.
  • Such interfacing or interconnection may also make use of uniform resource identifiers (URI), which may further include uniform resource locators (URL) or uniform resource names (URN).
  • Other forms of uniform and/or unique identifiers, locators, or names may be used, either exclusively or in combination with forms such as those set forth above.
  • Any of the above protocols or APIs may interface with or be implemented in any programming language, procedural, functional, or object-oriented, and may be compiled or interpreted.
  • Non-limiting examples include C, C++, C#, Objective-C, Java, Scala, Clojure, Elixir, Swift, Go, Perl, PHP, Python, Ruby, JavaScript, WebAssembly, or virtually any other language, with any other libraries or schemas, in any kind of framework, runtime environment, virtual machine, interpreter, stack, engine, or similar mechanism, including but not limited to Node.js, V8, Knockout, jQuery, Dojo, Dijit, OpenUI5, AngularJS, Express.js, Backbone.js, Ember.js, DHTMLX, Vue, React, Electron, and so on, among many other non-limiting examples.
  • a tangible, non-transitory apparatus or article of manufacture comprising a tangible, non-transitory computer useable or readable medium having control logic (software) stored thereon may also be referred to herein as a computer program product or program storage device.
  • such control logic, when executed by one or more data processing devices (such as computer system 400), may cause such data processing devices to operate as described herein.
  • the RNA is not extracted from the cellular sample and sequencing information does not need to be tracked and mapped back to an image of the cellular sample. Rather, RNA is retained inside the cellular sample to permit direct imaging of the spatial location of target RNAs within the cells. Additionally, RNA within the cellular sample is not fragmented and enrichment of target RNA is not necessary.
  • Use of target-specific and/or random-sequence reverse transcription primers enables detection of both poly-A and non-poly-A RNAs in either uni-plex or multi-plex modes.
  • the methods comprise repeatedly conducting a small number of sequencing cycles of the same region of the template molecules (e.g., concatemer molecules).
  • the RNA content of the cellular sample can be discovered.
  • the reiterative short sequencing cycles described herein use a reduced amount of sequencing reagents which reduces cost and saves time.
  • Methods for conducting reiterative short sequencing cycles have many uses including but not limited to detecting specific RNAs of interest, mutant RNA sequences, splice variants, and their abundance levels.
  • the concatemers carry tandem repeat units of a cDNA-of-interest, the universal sequencing primer binding site, and the target barcode sequence.
  • the concatemers are sequenced inside the cellular sample, where a small number of sequencing cycles is conducted for each round and multiple rounds of short read sequencing are conducted.
  • the full length of the target barcode and cDNA region are not sequenced. Instead, at least a portion of the target barcode region is reiteratively sequenced. In some embodiments, it is not necessary to sequence the cDNA region. In some embodiments, the target barcode and a portion of the cDNA region are reiteratively sequenced. It is not necessary to sequence the entire length of the cDNA region.
  • a short portion of the cDNA region in the concatemer is resequenced at least once (e.g., reiterative sequencing) from the same start position to generate overlapping sequencing reads that can be aligned to a reference sequence.
  • the same portion of the concatemer molecule can be sequenced at least two, three, four, five, or up to 50 times.
  • the start sequencing site can be any location of the concatemer and is dictated by the sequencing primers which are designed to anneal to a selected position within the concatemer.
  • the reiterative short sequencing reads increase the redundancy of sequencing information for individual bases in the cDNA region. Reiteratively sequencing one strand of the concatemer template molecule provides enough base coverage to reveal the presence of target RNAs in the cellular sample so that pairwise sequencing of the complementary strand is not necessary.
  • a concatemer template molecule includes multiple sequencing primer binding sites along the same concatemer molecule which can be used to generate multiple usable sequencing reads for increased sequencing depth. Together, reiteratively sequencing one strand of the concatemer templates increases sequencing base coverage and sequencing depth compared to sequencing a one-copy template molecule.
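  • To illustrate how reiterative short reads from the same start position add redundancy for individual bases, the following Python sketch combines several reads of the same region by a per-position majority vote. The majority-vote rule, the function name, and the example reads are assumptions chosen for illustration; the disclosure does not specify how redundant reads are combined.

```python
from collections import Counter

def consensus_from_rounds(reads):
    """Per-position majority vote over reads that share one start position.

    Returns (consensus, support), where support[i] is the fraction of reads
    that agree with the consensus base at position i.
    """
    if not reads:
        return "", []
    # Only compare positions covered by every read.
    length = min(len(read) for read in reads)
    consensus = []
    support = []
    for pos in range(length):
        counts = Counter(read[pos] for read in reads)
        base, votes = counts.most_common(1)[0]
        consensus.append(base)
        support.append(votes / len(reads))
    return "".join(consensus), support

# Three hypothetical rounds over the same region; round 3 has one miscall.
rounds = ["ACGTAC", "ACGTAC", "ACCTAC"]
sequence, agreement = consensus_from_rounds(rounds)
```

  • In this toy example the single miscalled base in the third round is outvoted by the other two rounds, which is the sense in which repeated reads of one strand increase per-base coverage.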
  • the methods described herein can be conducted in uni-plex or multi-plex modes. Two or more different target RNAs can be detected and imaged simultaneously inside a cellular sample using different reverse transcription primers, different target-specific padlock probes, and universal sequencing primers. For example, the presence of a housekeeping RNA and at least one target RNA in a cellular sample can be simultaneously detected and imaged using any of the reiterative short read sequencing methods described herein.
  • the present disclosure provides methods for detecting in situ at least two different target RNA molecules in a cellular sample comprising step (a): providing a cellular sample harboring a plurality of RNA which comprises at least a first target RNA molecule and a second target RNA molecule.
  • the cellular sample is fixed and permeabilized.
  • the cellular sample harbors 2-25 different target RNA molecules, or harbors 25-50 different target RNA molecules, or harbors 50-75 different target RNA molecules, or harbors 75-100 different target RNA molecules.
  • the cellular sample harbors more than 100 different target RNA molecules, or more than 250 different target RNA molecules, or more than 500 different target RNA molecules, or more than 1000 different target RNA molecules, or more. In some embodiments, the cellular sample harbors more than 10,000 different target RNA molecules. In some embodiments, the cellular sample comprises a whole cell, a plurality of whole cells, an intact tissue or an intact tumor. In some embodiments, the cellular sample comprises a fresh cellular sample, a freshly-frozen cellular sample, a sectioned cellular sample, an FFPE cellular sample, or a sectioned FFPE cellular sample. In some embodiments, the cellular sample is deposited onto a solid support.
  • the cellular sample is deposited onto a solid support which is passivated with a coating that promotes cell adhesion. In some embodiments, the cellular sample is deposited on a support that lacks immobilized capture oligonucleotides. In some embodiments, the cellular sample is cultured before or after depositing the cellular sample onto the solid support. In some embodiments, the cellular sample is cultured prior to conducting step (b) which is described below. In some embodiments, the cellular sample comprises an expanded cellular sample that has been cultured in a simple or complex cell culture media. In some embodiments, the cellular sample is not cultured or expanded prior to conducting step (b).
  • methods for detecting at least two different target RNA molecules in a cellular sample further comprise step (b): generating inside the cellular sample a plurality of cDNA molecules which include at least a first target cDNA molecule that corresponds to the first target RNA molecule, and the plurality of cDNA molecules includes a second target cDNA molecule that corresponds to the second target RNA molecule.
  • the method comprises generating at least 2-10,000 different target cDNA molecules that correspond to 2-10,000 different target RNA molecules.
  • the generating of step (b) comprises contacting the plurality of RNA inside the cellular sample with (i) a plurality of reverse transcription primers, (ii) a plurality of reverse transcriptase enzymes, and (iii) a plurality of nucleotides, under a condition suitable for conducting a reverse transcription reaction to generate a plurality of cDNA molecules (e.g., a plurality of first strand cDNA molecules) in the cellular sample (e.g., FIG. 7).
  • the plurality of reverse transcription primers comprises a first sub-population of target-specific reverse transcription primers that hybridize selectively to the first target RNA, and comprises a second sub-population of target-specific reverse transcription primers that hybridize selectively to the second target RNA.
  • the first and second sub-populations of target-specific reverse transcription primers have the same sequence or different sequences.
  • the entire length of the first sub-population of target-specific reverse transcription primers hybridizes to a first target RNA molecule.
  • the first sub-population of target-specific reverse transcription primers comprise tailed primers having a portion that hybridizes to a first target RNA molecule and a portion that does not hybridize to a first target RNA molecule.
  • the first sub-population of target-specific reverse transcription primers comprise at least a portion having a poly-T sequence.
  • the first sub-population of target-specific reverse transcription primers comprise at least a portion having a random sequence and/or at least a portion having a target-specific sequence.
  • the entire length of the second sub-population of target-specific reverse transcription primers hybridizes to a second target RNA molecule.
  • the second sub-population of target-specific reverse transcription primers comprise tailed primers having a portion that hybridizes to a second target RNA molecule and a portion that does not hybridize to a second target RNA molecule.
  • the second sub-population of target-specific reverse transcription primers comprise at least a portion having a poly-T sequence.
  • the second sub-population of target-specific reverse transcription primers comprise at least a portion having a random sequence and/or at least a portion having a target-specific sequence.
  • a target RNA molecule that is hybridized to a cDNA molecule can be subjected to enzymatic degradation using a ribonuclease under a condition suitable for degrading RNA in an RNA/DNA duplex.
  • a target RNA molecule that is hybridized to a cDNA molecule is not subjected to enzymatic degradation.
  • methods for detecting at least two different target RNA molecules in a cellular sample further comprise step (c): contacting the plurality of cDNA molecules in the cellular sample with a plurality of target-specific padlock probes which includes at least a first plurality of target-specific padlock probes and a second plurality of target-specific padlock probes.
  • the method comprises contacting the plurality of cDNA molecules in the cellular sample with at least 2-10,000 different target-specific padlock probes.
  • cDNA is not generated from RNA inside the cellular sample.
  • methods for detecting at least two different target RNA molecules in a cellular sample further comprise contacting RNA inside the cell with a plurality of target-specific padlock probes and generating circularized padlock probes.
  • methods for detecting at least two different target RNA molecules in a cellular sample further comprise step (c): contacting the plurality of RNA molecules in the cellular sample with a plurality of target-specific padlock probes which includes at least a first plurality of target-specific padlock probes and a second plurality of target-specific padlock probes.
  • the method comprises contacting the plurality of cDNA molecules in the cellular sample with at least 2-10,000 different target-specific padlock probes.
  • a target RNA molecule can be subjected to enzymatic degradation using a ribonuclease. In some embodiments, a target RNA molecule is not subjected to enzymatic degradation.
  • individual padlock probes in the plurality of first target-specific padlock probes comprise first and second terminal regions (e.g., first and second padlock binding arms), wherein the first terminal region selectively hybridizes to a first region of the first target cDNA molecule (or the first target RNA molecule), and the second terminal region selectively hybridizes to a second region of the first target cDNA molecule (or the first target RNA molecule).
  • the contacting of step (c) comprises: hybridizing the first and second terminal regions of the first target-specific padlock probes to proximal positions on the first target cDNA molecule (or the first target RNA molecule) to form a circularized first target-specific padlock probe having a nick or gap between the hybridized first and second terminal regions (e.g., FIG. 7, left).
  • the first target-specific padlock probe comprises a first target barcode sequence (target BC-1) that corresponds to and uniquely identifies the first target cDNA sequence (or the first target RNA sequence).
  • the first target-specific padlock probe comprises a first target barcode sequence that is located adjacent to one of the regions of the first target-specific padlock probe that selectively hybridizes to the first target cDNA molecule (or the first target RNA sequence).
  • the first target-specific padlock probe comprises at least one universal adaptor sequence, such as for example a universal sequencing primer binding site (or a complementary sequence thereof).
  • the first target-specific padlock probe comprises a universal primer binding site for a rolling circle amplification primer (or a complementary sequence thereof).
  • the first target-specific padlock probe comprises a universal compaction oligonucleotide binding site (or a complementary sequence thereof).
  • individual padlock probes in the plurality of second target-specific padlock probes comprise first and second terminal regions (e.g., first and second padlock binding arms), wherein the first terminal region selectively hybridizes to a first region of the second target cDNA molecule (or the second target RNA molecule), and the second terminal region selectively hybridizes to a second region of the second target cDNA molecule (or the second target RNA molecule).
  • the contacting of step (c) comprises: hybridizing the first and second terminal regions of the second target-specific padlock probes to proximal positions on the second target cDNA molecule (or the second target RNA molecule) to form a circularized second target-specific padlock probe having a nick or gap between the hybridized first and second terminal regions (e.g., FIG. 7, right).
  • the second target-specific padlock probe comprises a second target barcode sequence (target BC-2) that corresponds to and uniquely identifies the second target cDNA sequence (or the second target RNA sequence).
  • the second target-specific padlock probe comprises a second target barcode sequence that is located adjacent to one of the regions of the second target-specific padlock probe that selectively hybridizes to the second target cDNA molecule (or the second target RNA sequence).
  • the second target-specific padlock probe comprises at least one universal adaptor sequence, such as for example a universal sequencing primer binding site (or a complementary sequence thereof).
  • the second target-specific padlock probe comprises a universal primer binding site for a rolling circle amplification primer (or a complementary sequence thereof).
  • the second target-specific padlock probe comprises a universal compaction oligonucleotide binding site (or a complementary sequence thereof).
  • the first target barcode sequence (target BC-1) and the second target barcode sequence (target BC-2) have different sequences and can be used to conduct multiplex RNA detection and sequencing. In some embodiments, the first target barcode sequence (target BC-1) and the second target barcode sequence (target BC-2) have the same sequence and can be used to conduct uni-plex RNA detection and sequencing.
  • the first and second target-specific padlock probes comprise a universal sequencing primer binding site and a target barcode sequence that are adjacent to each other so that the target barcode region of the concatemer is sequenced first.
  • the target barcode sequence can be any length, for example 3-15 bases, or 15-25 bases, or 25-40 bases, or longer.
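  • Because each target barcode uniquely identifies its target, a short read of the barcode region can be mapped back to a target with a lookup. The following Python sketch shows one possible way to do this; the barcode sequences, target names, function names, and the tolerance of one mismatch are hypothetical and are not specified in this disclosure.

```python
def hamming(a, b):
    """Number of mismatched positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))

def decode_barcode(read, barcode_to_target, max_mismatches=1):
    """Return the target whose barcode is the unique best match, else None.

    The read is assumed to start at the first base of the barcode, as when
    the barcode is adjacent to the sequencing primer binding site.
    """
    best, best_dist, tie = None, None, False
    for barcode, target in barcode_to_target.items():
        if len(read) < len(barcode):
            continue  # read too short to cover this barcode
        dist = hamming(read[:len(barcode)], barcode)
        if best_dist is None or dist < best_dist:
            best, best_dist, tie = target, dist, False
        elif dist == best_dist and target != best:
            tie = True  # ambiguous between two different targets
    if best_dist is None or best_dist > max_mismatches or tie:
        return None
    return best

# Hypothetical 6-base barcodes for two targets (multiplex mode).
BARCODES = {"ACGTAC": "target-1", "TTGCAA": "target-2"}
```

  • With distinct barcodes the lookup separates the two targets (multiplex mode); if both probes carried the same barcode, every read would map to a single label (uni-plex mode).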
  • methods for detecting at least two different target RNA molecules in a cellular sample further comprise step (d): closing the nick or gap in the at least first and second circularized target-specific padlock probes by conducting an enzymatic reaction, thereby generating at least a first covalently closed circular padlock probe and a second covalently closed circular padlock probe inside the cellular sample.
  • closing the nick in the first and second circularized padlock probes comprises conducting an enzymatic ligation reaction.
  • closing the gap in the first and second circularized padlock probes comprises conducting a polymerase-catalyzed fill-in reaction using the first or second target cDNA molecule (or the first or second RNA molecule) as a template, and conducting an enzymatic ligation reaction.
  • the method comprises closing the nick or gap in at least 2-10,000 circularized target-specific padlock probes by conducting one or more enzymatic reactions, thereby generating at least 2-10,000 covalently closed circular padlock probes inside the cellular sample.
  • methods for detecting at least two different target RNA molecules in a cellular sample further comprise step (e): conducting a rolling circle amplification reaction inside the cellular sample using the first and second covalently closed circular padlock probes as template molecules, thereby generating a plurality of concatemer molecules including at least a first concatemer molecule that corresponds to a first target RNA molecule, and the plurality of concatemer molecules includes at least a second concatemer molecule that corresponds to a second target RNA molecule.
  • the first concatemer molecule comprises tandem repeat units, wherein a unit comprises a sequence that corresponds to the first target cDNA (or the first target RNA), the first target barcode sequence, and the universal sequencing primer binding site (or a complementary sequence thereof).
  • the second concatemer molecule comprises tandem repeat units, wherein a unit comprises a sequence that corresponds to the second target cDNA (or the second target RNA), the second target barcode sequence, and the universal sequencing primer binding site (or a complementary sequence thereof).
  • the rolling circle amplification reaction of step (e) comprises contacting the covalently closed circularized padlock probes with an amplification primer (e.g., a universal rolling circle amplification primer), a strand-displacing DNA polymerase, and a plurality of nucleotides, under a condition suitable for hybridizing individual amplification primers to a covalently closed padlock probe, and under a condition suitable for conducting primer extension using the covalently closed padlock probe as a template molecule to generate a nucleic acid concatemer.
  • the method comprises conducting a rolling circle amplification reaction inside the cellular sample using the at least 2-10,000 covalently closed circular padlock probes as template molecules, thereby generating at least 2-10,000 concatemer molecules that correspond to at least 2-10,000 target RNA molecules.
  • the plurality of concatemers that are generated inside the cellular sample collapse into a DNA nanoball having a shape and size that is more compact compared to a non-collapsed concatemer.
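  • The tandem repeat structure of the concatemers generated in step (e) can be pictured with a simple string model. The following Python sketch builds a concatemer from repeated units and collects the short read that begins immediately after each copy of the sequencing primer binding site. It is a toy model only: the sequences and function names are hypothetical, strand complementarity is ignored, and the unit order (primer binding site, then barcode, then cDNA) is an assumption chosen so that the barcode is read first.

```python
def build_concatemer(primer_site, barcode, cdna, copies):
    """Toy concatemer: the same unit repeated head-to-tail `copies` times."""
    unit = primer_site + barcode + cdna
    return unit * copies

def short_reads(concatemer, primer_site, cycles):
    """Collect one read of `cycles` bases after each primer binding site."""
    reads = []
    start = concatemer.find(primer_site)
    while start != -1:
        begin = start + len(primer_site)
        read = concatemer[begin:begin + cycles]
        if len(read) == cycles:  # skip reads that run off the end
            reads.append(read)
        start = concatemer.find(primer_site, start + 1)
    return reads

# Hypothetical sequences chosen only for illustration.
PRIMER_SITE = "GATTACAGG"
BARCODE = "ACGTAC"
CDNA = "TTTTCCCCGGGG"

concatemer = build_concatemer(PRIMER_SITE, BARCODE, CDNA, copies=3)
barcode_reads = short_reads(concatemer, PRIMER_SITE, cycles=len(BARCODE))
```

  • Each repeat unit contributes its own primer binding site, so a single concatemer yields several identical short reads of the barcode region in this model.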
  • methods for detecting at least two different target RNA molecules in a cellular sample further comprise step (f): sequencing the plurality of concatemer molecules inside the cellular sample, which comprises sequencing the first concatemer molecule by conducting no more than 2-30 sequencing cycles to generate a plurality of first sequencing read products, and sequencing the second concatemer molecule by conducting no more than 2-30 sequencing cycles to generate a plurality of second sequencing read products (FIG. 8).
  • the sequencing of step (f) comprises sequencing no more than 2-30 bases of the first concatemer molecules to generate a plurality of first sequencing read products, and which comprises sequencing no more than 2-30 bases of the second concatemer molecules to generate a plurality of second sequencing read products.
  • the method comprises sequencing the at least 2-10,000 concatemer molecules inside the cellular sample, which comprises conducting no more than 2-30 sequencing cycles on the 2-10,000 concatemer molecules to generate a plurality of sequencing read products.
  • only the first target barcode region of the first concatemer molecules is sequenced (e.g., FIG. 8, top). In some embodiments, at least a portion or the full length of the first target barcode of the first concatemer molecules is sequenced (e.g., FIG. 8, top). In some embodiments, the first target barcode and a portion of the first cDNA region (or the first RNA region) of the first concatemer molecules are sequenced. In some embodiments, at least a portion of the first cDNA region (or the first RNA region) of the first concatemer molecules is sequenced.
  • only the second target barcode region of the second concatemer molecules is sequenced (e.g., FIG. 8, bottom). In some embodiments, at least a portion or the full length of the second target barcode of the second concatemer molecules is sequenced (e.g., FIG. 8, bottom). In some embodiments, the second target barcode and a portion of the second cDNA region (or the second RNA region) of the second concatemer molecules are sequenced. In some embodiments, at least a portion of the second cDNA region (or the second RNA region) of the second concatemer molecules is sequenced.
  • the sequencing of step (f) comprises contacting the plurality of concatemer molecules inside the cellular sample with (i) a plurality of universal sequencing primers, (ii) a plurality of sequencing polymerases, and (iii) a plurality of nucleotide reagents, under a condition suitable for hybridizing the plurality of universal sequencing primers to their respective universal sequencing primer binding sites on the concatemers.
  • the sequencing of step (f) further comprises conducting no more than 2-30 sequencing cycles to generate at least a first plurality of sequencing read products by sequencing at least the first target barcode region (Target BC-1), and optionally conducting no more than 2-30 sequencing cycles to generate at least a second plurality of sequencing read products by sequencing at least the second target barcode region (Target BC-2).
  • the nucleotide reagents comprise multivalent molecules, nucleotides and/or nucleotide analogs.
  • the sequencing of step (f) comprises sequencing at least a portion of the first and second nucleic acid concatemers using an optical imaging system comprising a field-of-view (FOV) greater than 1.0 mm².
  • the plurality of first and second sequencing read products are detectable by imaging, and wherein the sequencing comprises decoding the plurality of first and second sequencing read products from the images obtained during the no more than 2-30 sequencing cycles.
  • the plurality of the first and second sequencing read products are detectable by imaging, and wherein the sequencing comprises simultaneously imaging the plurality of first and second detectable sequencing read products in the cellular sample (co-localization of the first and second sequencing read products).
  • methods for detecting at least two different target RNA molecules in a cellular sample further comprising step (g): removing the plurality of first sequencing read products from the first concatemer molecules and retaining the first concatemer molecules in the cellular sample, and removing the plurality of second sequencing read products from the second concatemer molecules and retaining the second concatemer molecules in the cellular sample.
  • methods for detecting at least two different target RNA molecules in a cellular sample further comprising step (h): reiteratively sequencing the plurality of concatemers by repeating steps (f) and (g) at least once, wherein the sequences of the plurality of first sequencing read products confirms the presence of the first target RNA molecules in the cellular sample, and wherein the sequences of the plurality of second sequencing read products confirms the presence of the second target RNA molecules in the cellular sample.
  • reiteratively sequencing at least one region of the concatemer comprises repeating steps (f) - (g) at least 2 times, at least 3 times, at least 4 times, at least 5 times, at least 6 times, at least 7 times, at least 8 times, at least 9 times, or at least 10 times.
  • reiteratively sequencing at least one region of the concatemer comprises repeating steps (f) - (g) up to 10 times, up to 20 times, up to 30 times, up to 40 times, or up to 50 times.
  • Examples of reiterative sequencing are shown schematically in FIGS. 9-12.
  • the concatemer includes tandem repeat units where each unit comprises: (i) a universal sequencing primer binding site (Seq), (ii) universal compaction oligonucleotide binding site (CO), (iii) an insert sequence that corresponds to a given target cDNA, and (iv) a target barcode sequence that corresponds to the given target cDNA (BC).
  • universal sequencing primers (solid arrows) hybridize to the universal sequencing primer binding sites and no more than 30 sequencing cycles are conducted to generate a plurality of first sequencing read products (dashed arrows), where the first sequencing read products include only the target barcode sequence.
  • the plurality of first sequencing read products are removed from the concatemer, and the sequencing is repeated where no more than 30 sequencing cycles are conducted to generate another plurality of first sequencing read products (dashed arrows), where the first sequencing read products include only the target barcode sequence.
  • the plurality of first sequencing read products are removed from the concatemer, and the sequencing is once again repeated where no more than 30 sequencing cycles are conducted to generate another plurality of first sequencing read products (dashed arrows), where the first sequencing read products include only the target barcode sequence.
  • the reiterative sequencing can be conducted up to 50 times.
  • the sequences of all of the first sequencing read products can be determined and aligned with a first reference sequence (e.g., reference barcode sequence) to confirm the presence of the first target RNA molecules inside the cellular sample.
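The alignment of short barcode reads with a reference barcode sequence, as described above, can be illustrated with a minimal Python sketch. It assumes fixed-length barcodes, a Hamming-distance comparison, and a tolerance of one mismatch; the function names, the example barcodes, and the tolerance are hypothetical and are not taken from the disclosure.

```python
from typing import Dict, Optional


def hamming(a: str, b: str) -> int:
    """Count mismatched positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return sum(x != y for x, y in zip(a, b))


def match_barcode(read: str, references: Dict[str, str],
                  max_mismatches: int = 1) -> Optional[str]:
    """Return the name of the single closest reference barcode.

    Returns None when no reference is within max_mismatches, or when
    two references tie for the closest match (ambiguous read).
    """
    scored = sorted(
        (hamming(read, seq), name)
        for name, seq in references.items()
        if len(seq) == len(read)
    )
    if not scored or scored[0][0] > max_mismatches:
        return None
    if len(scored) > 1 and scored[1][0] == scored[0][0]:
        return None  # ambiguous: two references are equally close
    return scored[0][1]


# Hypothetical reference barcodes for two targets.
references = {"target_1": "ACGTACGT", "target_2": "TTGCAAGC"}
print(match_barcode("ACGTACGA", references))  # one mismatch to target_1
print(match_barcode("GGGGGGGG", references))  # no reference close enough
```

A read that matches no reference within the tolerance is left unassigned, so it does not count as confirming a target.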
  • the concatemer includes tandem repeat units where each unit comprises: (i) a universal sequencing primer binding site (Seq), (ii) universal compaction oligonucleotide binding site (CO), (iii) an insert sequence that corresponds to a given target cDNA, and (iv) a target barcode sequence that corresponds to the given target cDNA (BC).
  • universal sequencing primers (solid arrows) hybridize to the universal sequencing primer binding sites and no more than 30 sequencing cycles are conducted to generate a plurality of first sequencing read products (dashed arrows), where the first sequencing read products include the target barcode sequence and a portion of the insert sequence.
  • the plurality of first sequencing read products are removed from the concatemer, and the sequencing is repeated where no more than 30 sequencing cycles are conducted to generate another plurality of first sequencing read products (dashed arrows), where the first sequencing read products include the target barcode sequence and a portion of the insert sequence.
  • the plurality of first sequencing read products are removed from the concatemer, and the sequencing is once again repeated where no more than 30 sequencing cycles are conducted to generate another plurality of first sequencing read products (dashed arrows), where the first sequencing read products include the target barcode sequence and a portion of the insert sequence.
  • the reiterative sequencing can be conducted up to 50 times.
  • the sequences of all of the first sequencing read products can be determined and aligned with a first reference sequence (e.g., reference barcode sequence and the insert sequence that corresponds to the target RNA) to confirm the presence of the first target RNA molecules inside the cellular sample.
  • the concatemer includes tandem repeat units where each unit comprises: (i) a universal sequencing primer binding site (Seq), (ii) universal compaction oligonucleotide binding site (CO), and (iii) an insert sequence that corresponds to a given target cDNA.
  • universal sequencing primers (solid arrows) hybridize to the universal sequencing primer binding sites and no more than 30 sequencing cycles are conducted to generate a plurality of first sequencing read products (dashed arrows), where the first sequencing read products include a portion of the insert sequence.
  • the plurality of first sequencing read products are removed from the concatemer, and the sequencing is repeated where no more than 30 sequencing cycles are conducted to generate another plurality of first sequencing read products (dashed arrows), where the first sequencing read products include a portion of the insert sequence.
  • the plurality of first sequencing read products are removed from the concatemer, and the sequencing is once again repeated where no more than 30 sequencing cycles are conducted to generate another plurality of first sequencing read products (dashed arrows), where the first sequencing read products include a portion of the insert sequence.
  • the reiterative sequencing can be conducted up to 50 times.
  • the sequences of all of the first sequencing read products can be determined and aligned with a first reference sequence (e.g., the insert sequence that corresponds to the target RNA) to confirm the presence of the first target RNA molecules inside the cellular sample.
  • the concatemer includes tandem repeat units where each unit comprises: (i) a universal sequencing primer binding site (Seq) and (ii) an insert sequence that corresponds to a given target cDNA.
  • universal sequencing primers (solid arrows) hybridize to the universal sequencing primer binding sites and no more than 30 sequencing cycles are conducted to generate a plurality of first sequencing read products (dashed arrows), where the first sequencing read products include a portion of the insert sequence.
  • the plurality of first sequencing read products are removed from the concatemer, and the sequencing is repeated where no more than 30 sequencing cycles are conducted to generate another plurality of first sequencing read products (dashed arrows), where the first sequencing read products include a portion of the insert sequence.
  • the plurality of first sequencing read products are removed from the concatemer, and the sequencing is once again repeated where no more than 30 sequencing cycles are conducted to generate another plurality of first sequencing read products (dashed arrows), where the first sequencing read products include a portion of the insert sequence.
  • the reiterative sequencing can be conducted up to 50 times.
  • the sequences of all of the first sequencing read products can be determined and aligned with a first reference sequence (e.g., the insert sequence that corresponds to the target RNA) to confirm the presence of the first target RNA molecules inside the cellular sample.
  • At least one concatemer is sequenced by conducting step (f) once (non-reiterative sequencing). In some embodiments, at least one concatemer is sequenced by conducting steps (f) - (g) once. In some embodiments, at least one concatemer is reiteratively sequenced by conducting steps (f) - (g) at least twice.
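One way to combine the short reads obtained from repeated rounds of steps (f) and (g) on the same concatemer is a per-position majority vote. The sketch below is illustrative only: the disclosure states that the read products are determined and aligned with a reference, and the voting scheme, the function name `consensus`, and the use of `N` for tied positions are assumptions.

```python
from collections import Counter
from typing import Sequence


def consensus(reads: Sequence[str]) -> str:
    """Majority-vote consensus of equal-length reads from reiterative rounds.

    A position where the two most frequent bases tie is reported as 'N'.
    """
    if not reads:
        raise ValueError("at least one read is required")
    length = len(reads[0])
    if any(len(r) != length for r in reads):
        raise ValueError("all reads must have the same length")

    called = []
    for column in zip(*reads):  # iterate position by position
        counts = Counter(column).most_common()
        if len(counts) > 1 and counts[0][1] == counts[1][1]:
            called.append("N")
        else:
            called.append(counts[0][0])
    return "".join(called)


# Three hypothetical rounds of sequencing the same barcode.
print(consensus(["ACGT", "ACGA", "ACGT"]))
```

With three or more rounds, a single miscalled base in one round is outvoted by the other rounds.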
  • the plurality of universal sequencing primers can be hybridized to concatemer template molecules with a hybridization reagent comprising an SSC buffer (e.g., 2X saline-sodium citrate buffer) with formamide (e.g., 10-20% formamide).
  • the hybridization conditions comprise a temperature of about 20-30 °C, for about 10-60 minutes.
  • the plurality of sequencing read products can be removed from the concatemers and the plurality of concatemers can be retained inside the cellular sample using a de-hybridization reagent comprising an SSC buffer (e.g., saline-sodium citrate buffer), with or without formamide, at a temperature that promotes nucleic acid denaturation, for example 30 - 90 °C.
  • the plurality of nucleotide reagents of step (f) comprise a plurality of nucleotides that are detectably labeled or non-labeled.
  • individual nucleotides are linked to a detectable reporter moiety.
  • the detectable reporter moiety comprises a fluorophore.
  • the plurality of detectably labeled nucleotide analogs comprise a plurality of chain terminating nucleotides, where the chain terminating moiety is linked to the 3’ nucleotide sugar position to form a 3’ blocked nucleotide analog.
  • the chain terminating moiety can be removed to convert the 3’ blocked nucleotide analog to an extendible nucleotide having a 3’ OH group on the sugar.
  • the labeled nucleotide analogs are each linked to a different fluorophore that corresponds to the nucleo-base (adenine, cytosine, guanine, thymine or uracil) of the analog, where the different fluorophores emit a fluorescent signal during the sequencing of step (f).
  • a sequencing cycle comprises (1) contacting the concatemer/sequencing primer duplex with a sequencing polymerase and a detectably labeled chain terminating nucleotide under a condition suitable for polymerase-catalyzed incorporation of the detectably labeled chain terminating nucleotide into the terminal end of the sequencing primer, (2) detecting and imaging the fluorescent signal and color emitted by the incorporated chain terminating nucleotide, and (3) removing the chain terminating moiety (e.g., unblocking) and the fluorophore from the incorporated nucleotide and retaining the concatemer/sequencing primer duplex.
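Because each nucleo-base is associated with a different fluorophore, the color detected at a spot in each cycle can be translated into a base. The following Python sketch shows the simplest version of that idea, picking the brightest channel in each cycle. It assumes four intensity channels ordered A, C, G, T and a hypothetical purity threshold `min_purity`; it is not the base-calling method of the disclosure.

```python
from typing import Sequence


def call_bases(intensities: Sequence[Sequence[float]],
               channel_to_base: str = "ACGT",
               min_purity: float = 0.6) -> str:
    """Call one base per cycle from per-channel spot intensities.

    intensities[c][k] is the signal of channel k in cycle c. A cycle
    whose brightest channel holds less than min_purity of the total
    signal is reported as 'N'.
    """
    read = []
    for cycle in intensities:
        total = sum(cycle)
        best = max(range(len(cycle)), key=cycle.__getitem__)
        if total <= 0 or cycle[best] / total < min_purity:
            read.append("N")
        else:
            read.append(channel_to_base[best])
    return "".join(read)


# Three hypothetical cycles for one spot: clear A, clear G, mixed signal.
spot = [
    (0.90, 0.05, 0.03, 0.02),
    (0.10, 0.10, 0.70, 0.10),
    (0.30, 0.30, 0.20, 0.20),
]
print(call_bases(spot))
```

The purity check is a crude stand-in for quality filtering: a mixed signal, for example from overlapping spots or phasing, yields `N` instead of a confident call.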
  • no more than 2-30 sequencing cycles are conducted on the plurality of concatemers inside the cellular sample to generate a plurality of sequencing read products.
  • the sequence of the first sequencing read product can be determined and aligned with a first reference sequence to confirm the presence of the first target RNA molecules inside the cellular sample.
  • the sequence of the second sequencing read product can be determined and aligned with a second reference sequence to confirm the presence of the second target RNA molecules inside the cellular sample.
  • the sequences of the first and second sequencing read products can be aligned after each round of generating the first and second sequencing read products which are no more than 30 bases in length, or after generating a set of reiterative sequencing read products wherein the first and second sequencing read products are no more than 30 bases in length.
  • the sequencing reactions are conducted on a sequencing apparatus having a detector that captures fluorescent signals from the sequencing reactions inside the cellular sample.
  • the sequencing apparatus can be configured to relay the fluorescent signal data captured by the detector to a computer system that is programmed to display images of different fluorescent spots which are co-located in the cellular sample, where individual fluorescent spots correspond to different target RNA molecules.
  • when the sequencing is conducted using different fluorescently-labeled nucleotide reagents that correspond to different nucleo-bases (e.g., A, G, C, T/U), the images can have different color fluorescent spots co-located in the same cellular sample at different sequencing cycles.
  • out-of-sync phasing and/or pre-phasing events can occur during synchronized sequencing reactions on clonally amplified template amplicons, where the sequencing reactions comprise polymerase-catalyzed sequencing reactions employing detectably labeled chain terminator nucleotides.
  • a sequencing reaction on one template molecule in the clonally-amplified template molecules moves ahead of (e.g., pre-phasing) or falls behind (e.g., phasing) the sequencing of the other template molecules within the clonally-amplified template molecules.
  • a fluorescent signal is typically detected which corresponds to incorporation of a labeled chain terminator nucleotide.
  • phasing and pre-phasing events can be detected and monitored using incorporation of a labeled chain terminator nucleotide.
  • the plurality of nucleotide reagents of step (f) comprise a plurality of multivalent molecules each comprising a core attached to a plurality of nucleotide-arms, wherein the nucleotide-arms are attached to a nucleotide unit.
  • individual multivalent molecules are labeled with a detectable reporter moiety.
  • the detectable reporter moiety comprises a fluorophore.
  • the core of the multivalent molecule is labeled with a fluorophore, and wherein the fluorophore which is attached to a given core of the multivalent molecule corresponds to the nucleotide base (e.g., adenine, guanine, cytosine, thymine or uracil) of the nucleotide arm.
  • At least one of the nucleotide arms of the multivalent molecule comprises a linker and/or nucleotide base that is attached to a fluorophore, and wherein the fluorophore which is attached to a given nucleotide base corresponds to the nucleotide base (e.g., adenine, guanine, cytosine, thymine or uracil) of the nucleotide arm.
  • a sequencing cycle comprises (1) contacting the concatemer/sequencing primer duplex with a first sequencing polymerase to form a complexed polymerase, (2) contacting the complexed polymerase with a detectably labeled multivalent molecule under a condition suitable for binding a complementary nucleotide unit of the multivalent molecule to the complexed polymerase thereby forming a multivalent-binding complex, and the condition is suitable for inhibiting incorporation of the complementary nucleotide unit into the terminal end of the sequencing primer, (3) detecting and imaging the fluorescent signal and color emitted by the bound detectably labeled multivalent molecule, (4) removing the first sequencing polymerase and the bound detectably labeled multivalent molecule, and retaining the concatemer/sequencing primer duplex, (5) contacting the retained concatemer/sequencing primer duplex with a second sequencing polymerase and a non-labeled chain terminating nucleotide under a condition suitable for polymerase-cata
  • no more than 2-30 sequencing cycles are conducted on the plurality of concatemers inside the cellular sample to generate a plurality of sequencing read products.
  • the sequence of the first sequencing read product can be determined and aligned with a first reference sequence to confirm the presence of the first target RNA molecules inside the cellular sample.
  • the sequence of the second sequencing read product can be determined and aligned with a second reference sequence to confirm the presence of the second target RNA molecules inside the cellular sample.
  • the sequences of the first and second sequencing read products can be aligned after each round of generating the first and second sequencing read products which are no more than 30 bases in length, or after generating a set of reiterative sequencing read products wherein the first and second sequencing read products are no more than 30 bases in length.
  • the sequencing reactions are conducted on a sequencing apparatus having a detector that captures fluorescent signals from the sequencing reactions inside the cellular sample.
  • the sequencing apparatus can be configured to relay the fluorescent signal data captured by the detector to a computer system that is programmed to display images of different fluorescent spots which are co-located in the cellular sample, where individual fluorescent spots correspond to different target RNA molecules.
  • individual cycle times can be achieved in less than 30 minutes.
  • the field of view (FOV) can exceed 1 mm² and the cycle time for scanning a large area (> 10 mm²) can be less than 5 minutes.
  • steps (2) and (3) can be conducted at a gentle temperature of about 35 - 45 °C, or about 39 - 42 °C.
  • steps (2) and (3) can be conducted at a gentle temperature which can help retain the compact size and shape of a DNA nanoball during multiple sequencing cycles (e.g., up to 30 cycles) which can improve FWHM (full width half maximum) of a spot image of the DNA nanoball inside a cellular sample.
  • the DNA nanoball does not unravel during multiple sequencing cycles.
  • the spot image of the DNA nanoball does not enlarge during multiple sequencing cycles.
  • the spot image of the DNA nanoball remains a discrete spot during multiple sequencing cycles.
  • the spot image can be represented as a Gaussian spot and the size can be measured as a FWHM.
  • a smaller spot size as indicated by a smaller FWHM typically correlates with an improved image of the spot.
  • the FWHM of a nanoball spot can be about 10 µm or smaller.
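For a spot modeled as a Gaussian with standard deviation σ, the full width at half maximum follows from the standard relation FWHM = 2·sqrt(2·ln 2)·σ, approximately 2.355·σ. The short Python sketch below applies that relation; the function names and the one-dimensional profile are illustrative assumptions and are not part of the disclosure.

```python
import math

# Standard conversion factor between sigma and FWHM of a Gaussian.
FWHM_PER_SIGMA = 2.0 * math.sqrt(2.0 * math.log(2.0))


def fwhm_from_sigma(sigma: float) -> float:
    """Full width at half maximum of a Gaussian with the given sigma."""
    return FWHM_PER_SIGMA * sigma


def sigma_from_fwhm(fwhm: float) -> float:
    """Inverse conversion: sigma of a Gaussian with the given FWHM."""
    return fwhm / FWHM_PER_SIGMA


def gaussian(x: float, sigma: float, amplitude: float = 1.0) -> float:
    """One-dimensional Gaussian spot profile centered at x = 0."""
    return amplitude * math.exp(-(x * x) / (2.0 * sigma * sigma))


sigma = 2.0  # hypothetical spot sigma, in the same length unit as the FWHM
width = fwhm_from_sigma(sigma)
# At half the FWHM from the center, the profile is at half its peak value.
print(width, gaussian(width / 2.0, sigma))
```

Fitting a Gaussian to a spot image and converting the fitted σ with `fwhm_from_sigma` gives a single number that can be tracked across sequencing cycles to check that the spot does not enlarge.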
  • out-of-sync phasing and/or pre-phasing events can occur during synchronized polymerase-catalyzed sequencing reactions employing detectably labeled multivalent molecules.
  • a fluorescent signal can be detected which corresponds to binding of complementary nucleotide unit of a multivalent molecule to the complexed polymerase thereby forming a multivalent-binding complex.
  • phasing and pre-phasing events can be detected and monitored using binding of labeled multivalent molecules.
  • the phasing and/or pre-phasing rate, when conducting up to 30 sequencing cycles with detectably labeled multivalent molecules, can be less than about 5%, or less than about 1%, or less than about 0.01%, or less than about 0.001%.
  • the phasing and/or pre-phasing rates for conducting up to 30 sequencing cycles using labeled chain terminator nucleotides can be about 5%.
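To see why a lower phasing or pre-phasing rate matters over 30 cycles, consider a simplified model in which each template copy independently falls out of sync with a fixed probability per cycle and never returns to sync. The Python sketch below computes the fraction of copies still in phase under that model. The per-cycle interpretation of the rates and the function name `in_phase_fraction` are assumptions made for illustration only.

```python
def in_phase_fraction(cycles: int, phasing_rate: float,
                      prephasing_rate: float = 0.0) -> float:
    """Fraction of template copies still in sync after a number of cycles.

    Simplified model: in every cycle each copy lags with probability
    phasing_rate or runs ahead with probability prephasing_rate, and a
    copy that has left the in-phase population never rejoins it.
    """
    if cycles < 0:
        raise ValueError("cycles must be non-negative")
    stay = 1.0 - phasing_rate - prephasing_rate
    if not 0.0 <= stay <= 1.0:
        raise ValueError("rates must be non-negative and sum to at most 1")
    return stay ** cycles


# Compare a 5% rate with a 0.01% rate over 30 cycles.
for rate in (0.05, 0.0001):
    print(rate, round(in_phase_fraction(30, rate), 4))
```

Under this model a 5% per-cycle rate leaves roughly a fifth of the copies in phase after 30 cycles, whereas a 0.01% rate leaves more than 99% in phase.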
  • the present disclosure provides methods for conducting in situ multiplex and multi-omics detection and identification using coded padlock probes.
  • the padlock probes are designed to selectively detect target RNA.
  • the RNA-specific padlock probes selectively hybridize to cDNA that corresponds to target RNA.
  • the RNA-specific probes carry barcodes that uniquely identify the cDNA.
  • the RNA-specific padlock probes also carry batch-specific sequencing primer binding sites.
  • Both types of padlock probes are used to generate concatemers having multiple copies of batch-specific sequencing primer binding sites and barcodes.
  • the concatemers can collapse into DNA nanoballs having compact shape and size that produce increased signal intensity and color differentiation during sequencing.
  • the limit of optical resolution impedes the ability to perform highly multiplex sequencing.
  • the batch-specific sequencing primer binding sites on the padlock probes enable sequencing a desired subset (e.g., a batch) of the concatemers using selected batch-specific sequencing primers to reduce overcrowding of signals and images.
  • the use of batch-specific sequencing primers produces optical images that are intense and resolvable. Conducting multiple rounds of sequencing on the same cellular sample using different batch-specific sequencing primers enables multiplex sequencing to reveal numerous target RNAs.
  • the batch-specific sequencing methods described herein have many uses. For example, the number of spots that are imaged and associated with sequencing can be counted. The counted spots can be used as a measure of RNA levels in a cellular sample.
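Counting imaged spots as a measure of RNA levels can be sketched as a simple tally of decoded spots per cell and per target. The record layout below, a pair of cell identifier and decoded target name, and the function name `count_spots` are hypothetical; spots that could not be assigned to a target are skipped.

```python
from collections import Counter
from typing import Dict, Iterable, Optional, Tuple


def count_spots(
    spots: Iterable[Tuple[str, Optional[str]]]
) -> Dict[str, Counter]:
    """Tally decoded spots per cell and per target.

    Each spot is (cell_id, target), where target is None when the spot
    could not be assigned to a reference sequence.
    """
    counts: Dict[str, Counter] = {}
    for cell_id, target in spots:
        if target is None:
            continue  # undecoded spots do not contribute to RNA counts
        counts.setdefault(cell_id, Counter())[target] += 1
    return counts


# Hypothetical decoded spots from two cells.
spots = [
    ("cell_1", "target_1"),
    ("cell_1", "target_1"),
    ("cell_1", "target_2"),
    ("cell_2", "target_2"),
    ("cell_2", None),
]
for cell, tally in count_spots(spots).items():
    print(cell, dict(tally))
```

The resulting per-cell counts can be compared between targets or between cells as a relative measure of expression.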
  • the present disclosure provides methods for detecting in situ at least two different target RNA molecules, comprising step (a): providing a cellular sample deposited on a solid support, wherein the cellular sample harbors (i) a first plurality of DNA amplicons (e.g., first concatemers) that correspond to a first target cDNA or RNA molecule, and (ii) a second plurality of DNA amplicons (e.g., second concatemers) that correspond to a second target cDNA or RNA molecule.
  • the method further comprises step (b): sequencing the first plurality of DNA amplicons inside the cellular sample under a condition that inhibits sequencing the second plurality of DNA amplicons, wherein sequencing the first plurality of DNA amplicons inside the cellular sample comprises generating a plurality of first sequencing read products, wherein the sequences of the first sequencing read products are aligned with a first target reference sequence to confirm the presence of the first target RNA in the cellular sample.
  • the first amplicons can be reiteratively sequenced by conducting no more than 2-30 sequencing cycles, or can be reiteratively sequenced by conducting 1-250 sequencing cycles.
  • the method further comprises step (c): sequencing the second plurality of DNA amplicons inside the cellular sample under a condition that inhibits sequencing the first plurality of DNA amplicons, wherein sequencing the second plurality of DNA amplicons inside the cellular sample comprises generating a plurality of second sequencing read products, wherein the sequences of the second sequencing read products are aligned with a second target reference sequence to confirm the presence of the second target RNA in the cellular sample.
  • the second amplicons can be reiteratively sequenced by conducting no more than 2-30 sequencing cycles, or can be reiteratively sequenced by conducting 1-250 sequencing cycles.
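The batch-wise logic of steps (b) and (c), in which each round sequences only the amplicons whose batch-specific primer binding site matches the primer in use, can be modeled as a filter over the amplicons. The sketch below is a bookkeeping illustration only: the dictionary keys `batch_site` and `barcode`, the example barcodes, and the function name `sequence_in_batches` are hypothetical.

```python
from typing import Dict, List, Sequence


def sequence_in_batches(concatemers: Sequence[Dict[str, str]],
                        primer_order: Sequence[str]) -> Dict[str, List[str]]:
    """Group the barcodes that would be read in each sequencing round.

    In a given round only concatemers carrying the binding site for the
    batch-specific primer in use are sequenced; all others stay dark.
    """
    rounds: Dict[str, List[str]] = {}
    for primer in primer_order:
        rounds[primer] = [
            c["barcode"] for c in concatemers if c["batch_site"] == primer
        ]
    return rounds


# Hypothetical concatemers carrying one of two batch-specific sites.
concatemers = [
    {"batch_site": "Batch Seq-1", "barcode": "ACGT"},
    {"batch_site": "Batch Seq-2", "barcode": "TTGA"},
    {"batch_site": "Batch Seq-1", "barcode": "ACGT"},
]
print(sequence_in_batches(concatemers, ["Batch Seq-1", "Batch Seq-2"]))
```

Splitting the targets over more batches lowers the number of spots lit in any one round, which is how the batch-specific primers reduce overcrowding in the images.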
  • the present disclosure provides methods for detecting in situ at least two different target RNA molecules, comprising step (a): providing a cellular sample deposited on a solid support, wherein the cellular sample harbors a first plurality of target RNA and a second plurality of target RNA.
  • the first plurality of target RNA encode a first polypeptide.
  • the second plurality of target RNA encode a second polypeptide.
  • the cellular sample is fixed and permeabilized.
  • the cellular sample harbors 2-25 different target RNA molecules, or harbors 25-50 different target RNA molecules, or harbors 50-75 different target RNA molecules, or harbors 75-100 different target RNA molecules. In some embodiments, the cellular sample harbors more than 100 different target RNA molecules, or more than 250 different target RNA molecules, or more than 500 different target molecules, or more than 1000 different target RNA molecules, or more. In some embodiments, the cellular sample harbors more than 10,000 different target RNA molecules. In some embodiments, the cellular sample comprises a whole cell, a plurality of whole cells, an intact tissue or an intact tumor.
  • the cellular sample comprises a fresh cellular sample, a freshly-frozen cellular sample, a sectioned cellular sample, or an FFPE cellular sample.
  • the cellular sample is deposited onto a solid support.
  • the cellular sample is deposited onto a solid support which is passivated with a coating that promotes cell adhesion.
  • the cellular sample is deposited on a support that lacks immobilized capture oligonucleotides.
  • the cellular sample is cultured prior to conducting step (b) which is described below.
  • the cellular sample harbors 2-25 different target polypeptide molecules, or harbors 25-50 different target polypeptide molecules, or harbors 50-75 different target polypeptide molecules, or harbors 75-100 different target polypeptide molecules. In some embodiments, the cellular sample harbors more than 100 different target polypeptide molecules, or more than 250 different target polypeptide molecules, or more than 500 different target molecules, or more than 1000 different target polypeptide molecules, or more. In some embodiments, the cellular sample harbors more than 10,000 different target polypeptide molecules.
  • the target polypeptide molecules are encoded by the target RNA molecules.
  • the methods comprise step (b): generating inside the cellular sample a plurality of cDNA by (i) generating at least a first plurality of target cDNA from the first plurality of target RNA, and (ii) generating at least a second plurality of target cDNA from the second plurality of target RNA (e.g., FIG. 13).
  • the first target cDNAs correspond to the first target RNA molecules.
  • the second target cDNAs correspond to the second target RNA molecules.
  • the method comprises generating at least 2-10,000 different target cDNA molecules that correspond to 2-10,000 different target RNA molecules.
  • the generating of step (b) comprises contacting the plurality of RNA inside the cellular sample with (i) a plurality of reverse transcription primers, (ii) a plurality of reverse transcriptase enzymes, and (iii) a plurality of nucleotides, under a condition suitable for conducting a reverse transcription reaction to generate a plurality of cDNA molecules (e.g., a plurality of first strand cDNA molecules) in the cellular sample.
  • the plurality of reverse transcription primers comprises a first sub-population of target-specific reverse transcription primers that hybridize selectively to the first target RNA, and/or comprises a second sub-population of target-specific reverse transcription primers that hybridize selectively to the second target RNA. In some embodiments, the plurality of reverse transcription primers comprises a first sub-population of random-sequence reverse transcription primers that hybridize to the first target RNA, and/or comprises a second sub-population of random-sequence reverse transcription primers that hybridize to the second target RNA.
  • the first padlock probe comprises (i) a first target barcode sequence (target BC-1) that uniquely identifies the first target RNA, (ii) a first batch-specific sequencing primer binding site (Batch Seq-1) (or a complementary sequence thereof), (iii) a universal binding site for an amplification primer (universal RCA) (or a complementary sequence thereof), and (iv) a universal binding site for a compaction oligonucleotide (or a complementary sequence thereof).
  • the second padlock probe comprises (i) a second target barcode sequence (target BC-2) that uniquely identifies the second target RNA, (ii) a second batch-specific sequencing primer binding site (Batch Seq-2) (or a complementary sequence thereof), (iii) a universal binding site for an amplification primer (universal RCA) (or a complementary sequence thereof), and (iv) a universal binding site for a compaction oligonucleotide (or a complementary sequence thereof).
  • the methods comprise step (c): generating inside the cellular sample a plurality of DNA concatemers which correspond to the first and second plurality of target RNA molecules, comprising: (1) generating a first plurality of covalently closed circular padlock probes by contacting the first plurality of target cDNA with a first plurality of padlock probes, wherein the contacting is conducted under a condition suitable for hybridizing the first and second binding arms of the first padlock probes to proximal positions on their respective first target cDNA molecules to form a first plurality of circularized padlock probes each having a nick or gap between the hybridized first and second binding arms, wherein the first padlock probes include (i) a first target barcode sequence (target BC-1) that uniquely identifies the first target RNA or cDNA, (ii) a first batch-specific sequencing primer binding site (Batch Seq-1) (or a complementary sequence thereof), and (iii) a universal binding site for an amplification primer
  • the rolling circle amplification reaction can be conducted in the presence or absence of a plurality of compaction oligonucleotides.
  • the method comprises contacting the plurality of cDNA molecules in the cellular sample with at least 2-10,000 different target-specific padlock probes.
  • the first padlock probe further comprises a universal compaction oligonucleotide binding site (or a complementary sequence thereof).
  • closing the nick in the first circularized padlock probes comprises conducting an enzymatic ligation reaction.
  • closing the gap in the first circularized padlock probes comprises conducting a polymerase-catalyzed fill-in reaction using the first target cDNA molecule as a template, and conducting an enzymatic ligation reaction.
  • the method comprises closing the nick or gap in at least 2-10,000 circularized target-specific padlock probes by conducting an enzymatic reaction, thereby generating at least 2-10,000 covalently closed circular padlock probes inside the cellular sample.
  • each concatemer molecule in the first plurality comprises tandem repeat units, wherein a unit comprises the sequence of the first target cDNA and (i) the first target barcode sequence (target BC-1) that uniquely identifies the first target RNA, (ii) the first batch-specific sequencing primer binding site (Batch Seq-1) (or a complementary sequence thereof), and (iii) the universal binding site for an amplification primer (universal RCA) (or a complementary sequence thereof).
  • the unit further comprises the universal compaction oligonucleotide binding site (or a complementary sequence thereof).
  • step (c) further comprises: generating inside the cellular sample a plurality of DNA concatemers which correspond to the second plurality of target RNA molecules, comprising: (1) generating a second plurality of covalently closed circular padlock probes by contacting the second plurality of target cDNA with a second plurality of padlock probes, wherein the contacting is conducted under a condition suitable for hybridizing the first and second binding arms of the second padlock probes to proximal positions on their respective second target cDNA molecules to form a second plurality of circularized padlock probes each having a nick or gap between the hybridized first and second binding arms, wherein the second padlock probes include (i) a second barcode sequence (target BC-2) that uniquely identifies the second target cDNA or RNA, (ii) a second batch-specific sequencing primer binding site (Batch Seq-2) (or a complementary sequence thereof) wherein the sequence of the second batch-specific sequencing primer binding site differs from the sequence of the
  • the rolling circle amplification reaction can be conducted in the presence or absence of a plurality of compaction oligonucleotides.
  • the method comprises contacting the plurality of cDNA molecules in the cellular sample with at least 2-10,000 different target-specific padlock probes.
  • the second padlock probe further comprises a universal compaction oligonucleotide binding site (or a complementary sequence thereof).
  • closing the nick in the second circularized padlock probes comprises conducting an enzymatic ligation reaction.
  • closing the gap in the second circularized padlock probes comprises conducting a polymerase-catalyzed fill-in reaction using the second target cDNA molecule as a template, and conducting an enzymatic ligation reaction.
  • the method comprises closing the nick or gap in at least 2-10,000 circularized target-specific padlock probes by conducting an enzymatic reaction, thereby generating at least 2-10,000 covalently closed circular padlock probes inside the cellular sample.
  • each concatemer molecule in the second plurality comprises tandem repeat units, wherein a unit comprises the sequence of the second target cDNA and (i) the second target barcode sequence (target BC-2) that uniquely identifies the second target cDNA or RNA, (ii) the second batch-specific sequencing primer binding site (Batch Seq-2) (or a complementary sequence thereof), and (iii) the universal binding site for an amplification primer (universal RCA) (or a complementary sequence thereof).
  • the unit further comprises the universal compaction oligonucleotide binding site (or a complementary sequence thereof).
  • the methods further comprise step (d): sequencing the first plurality of concatemer molecules inside the cellular sample under a condition that inhibits sequencing the second plurality of concatemers (e.g., FIG. 14).
  • in step (d), sequencing the first plurality of concatemers inside the cellular sample comprises conducting no more than 2-30 sequencing cycles to generate a plurality of first sequencing read products, wherein the sequences of the first sequencing read products are aligned with a first target reference sequence to confirm the presence of the first target RNA in the cellular sample.
  • in step (d), sequencing the first plurality of concatemers inside the cellular sample comprises conducting 1-250 sequencing cycles to generate a plurality of first sequencing read products, wherein the sequences of the first sequencing read products are aligned with a first target reference sequence to confirm the presence of the first target RNA in the cellular sample.
  • the first and second concatemers are subjected to a first sequencing workflow using first batch-specific sequencing primers, sequencing polymerases, and a plurality of nucleotide reagents.
  • the first concatemers undergo reiterative sequencing but the second concatemers do not.
  • the first and second concatemers are subjected to a second sequencing workflow using second batch-specific sequencing primers, sequencing polymerases, and a plurality of nucleotide reagents.
  • the second concatemers undergo reiterative sequencing but the first concatemers do not.
  • in step (d), in the first concatemer molecules, only the first target barcode region (target BC-1) is sequenced. In some embodiments, in the first concatemer molecules, at least a portion or the full length of the first target barcode (target BC-1) is sequenced. In some embodiments, in the first concatemer molecules, the first target barcode (target BC-1) is sequenced and a portion of the first cDNA region is sequenced.
  • sequencing the first concatemers of step (d) comprises step (1) contacting the first plurality of concatemer molecules inside the cellular sample with (i) a plurality of first batch-specific sequencing primers, (ii) a plurality of sequencing polymerases, and (iii) a plurality of nucleotide reagents, under a condition suitable for hybridizing the plurality of first batch-specific sequencing primers to their respective first batch-specific sequencing primer binding sites on the first concatemers.
  • the sequencing further comprises step (2) conducting no more than 2-30 sequencing cycles to generate a first plurality of sequencing read products using the first concatemers as template molecules.
  • the sequencing of step (d) comprises sequencing at least a portion of the first nucleic acid concatemers using an optical imaging system comprising a field-of-view (FOV) greater than 1.0 mm².
  • the plurality of first sequencing read products are detectable by imaging, and wherein the sequencing comprises decoding the plurality of first sequencing read products from the images obtained during the no more than 2-30 sequencing cycles, or from the images obtained during the 1-250 sequencing cycles.
  • the methods further comprise step (e): removing the plurality of first sequencing read products from the first concatemer molecules and retaining the first concatemer molecules inside the cellular sample.
  • a 3’ blocking moiety can be added to the first sequencing read products to inhibit further sequencing reactions.
  • a nucleotide analog can be incorporated where the nucleotide analog inhibits incorporation of a subsequent nucleotide.
  • Exemplary blocking nucleotide analogs include dideoxynucleotide or a nucleotide having a 2’ or 3’ chain terminating moiety.
  • the methods further comprise step (f): reiteratively sequencing the plurality of first concatemers by repeating steps (d) and (e) at least once. In some embodiments, reiterative sequencing of step (f) is optional.
  • sequencing the first concatemers of step (f) comprises step (1) contacting the first plurality of concatemer molecules inside the cellular sample with (i) a plurality of first batch-specific sequencing primers, (ii) a plurality of sequencing polymerases, and (iii) a plurality of nucleotide reagents, under a condition suitable for hybridizing the plurality of first batch-specific sequencing primers to their respective first batch-specific sequencing primer binding sites on the first concatemers.
  • the sequencing further comprises step (2) conducting no more than 2-30 sequencing cycles to generate a first plurality of sequencing read products using the first concatemers as template molecules.
  • the sequencing further comprises step (3) removing the first plurality of sequencing read products from the first concatemers and retaining the plurality of first concatemers inside the cellular sample.
  • the sequencing further comprises step (4) repeating steps (1) - (3) at least once (e.g., FIG. 14).
  • step (4) comprises repeating steps (1) - (3) at least 2 times, at least 3 times, at least 4 times, at least 5 times, at least 6 times, at least 7 times, at least 8 times, at least 9 times, or at least 10 times.
  • step (4) comprises repeating steps (1) - (3) up to 10 times, up to 20 times, up to 30 times, up to 40 times, or up to 50 times.
  • the reiterative sequencing of the first concatemers of step (f) can be conducted using a sequencing-by-binding procedure, labeled and/or nonlabeled chain-terminating nucleotides, or multivalent molecules. These three sequencing methods are described below.
  • the plurality of universal sequencing primers can be hybridized to concatemer template molecules with a hybridization reagent comprising an SSC buffer (e.g., 2X saline-sodium citrate buffer) with formamide (e.g., 10-20% formamide).
  • the hybridization conditions comprise a temperature of about 20-30 °C, for about 10-60 minutes.
  • the plurality of sequencing read products can be removed from the concatemers and the plurality of concatemers can be retained inside the cellular sample using a de-hybridization reagent comprising an SSC buffer (e.g., saline-sodium citrate buffer), with or without formamide, at a temperature that promotes nucleic acid denaturation, for example 30-90 °C.
  • the methods further comprise step (g): sequencing the second plurality of concatemer molecules inside the cellular sample under a condition that inhibits sequencing the first plurality of concatemers (e.g., FIG. 14).
  • in step (g), sequencing the second plurality of concatemers inside the cellular sample comprises conducting no more than 2-30 sequencing cycles to generate a plurality of second sequencing read products, wherein the sequences of the second sequencing read products are aligned with a second target reference sequence to confirm the presence of the second target RNA in the cellular sample.
  • in step (g), sequencing the second plurality of concatemers inside the cellular sample comprises conducting 1-250 sequencing cycles to generate a plurality of second sequencing read products, wherein the sequences of the second sequencing read products are aligned with a second target reference sequence to confirm the presence of the second target RNA in the cellular sample.
  • in step (g), in the second concatemer molecules, only the second target barcode region (target BC-2) is sequenced. In some embodiments, in the second concatemer molecules, at least a portion or the full length of the second target barcode (target BC-2) is sequenced. In some embodiments, in the second concatemer molecules, the second target barcode (target BC-2) is sequenced and a portion of the second cDNA region is sequenced.
  • sequencing the second concatemers of step (g) comprises step (1) contacting the second plurality of concatemer molecules inside the cellular sample with (i) a plurality of second batch-specific sequencing primers, (ii) a plurality of sequencing polymerases, and (iii) a plurality of nucleotide reagents, under a condition suitable for hybridizing the plurality of second batch-specific sequencing primers to their respective second batch-specific sequencing primer binding sites on the second concatemers.
  • the sequencing further comprises step (2) conducting no more than 2-30 sequencing cycles to generate a second plurality of sequencing read products using the second concatemers as template molecules.
  • the sequencing of step (g) comprises sequencing at least a portion of the second nucleic acid concatemers using an optical imaging system comprising a field-of-view (FOV) greater than 1.0 mm².
  • the plurality of second sequencing read products are detectable by imaging, and wherein the sequencing comprises decoding the plurality of second sequencing read products from the images obtained during the no more than 2-30 sequencing cycles, or from the images obtained during the 1-250 sequencing cycles.
  • the methods further comprise step (h): removing the plurality of second sequencing read products from the second concatemer molecules and retaining the second concatemer molecules inside the cellular sample.
  • a 3’ blocking moiety can be added to the second sequencing read products to inhibit further sequencing reactions.
  • a nucleotide analog can be incorporated where the nucleotide analog inhibits incorporation of a subsequent nucleotide.
  • Exemplary blocking nucleotide analogs include dideoxynucleotide or a nucleotide having a 2’ or 3’ chain terminating moiety.
  • the methods further comprise step (i): reiteratively sequencing the plurality of second concatemers by repeating steps (g) and (h) at least once. In some embodiments, reiterative sequencing of step (i) is optional.
  • sequencing the second concatemers of step (i) comprises step (1) contacting the second plurality of concatemer molecules inside the cellular sample with (i) a plurality of second batch-specific sequencing primers, (ii) a plurality of sequencing polymerases, and (iii) a plurality of nucleotide reagents, under a condition suitable for hybridizing the plurality of second batch-specific sequencing primers to their respective second batch-specific sequencing primer binding sites on the second concatemers.
  • the sequencing further comprises step (2) conducting no more than 2-30 sequencing cycles to generate a second plurality of sequencing read products using the second concatemers as template molecules.
  • the sequencing further comprises step (3) removing the second plurality of sequencing read products from the second concatemers and retaining the plurality of second concatemers inside the cellular sample.
  • the sequencing further comprises step (4) repeating steps (1) - (3) at least once (e.g., FIG. 14).
  • step (4) comprises repeating steps (1) - (3) at least 2 times, at least 3 times, at least 4 times, at least 5 times, at least 6 times, at least 7 times, at least 8 times, at least 9 times, or at least 10 times.
  • step (4) comprises repeating steps (1) - (3) up to 10 times, up to 20 times, up to 30 times, up to 40 times, or up to 50 times.
  • the reiterative sequencing of the second concatemers of step (i) can be conducted using a sequencing-by-binding procedure, labeled and/or nonlabeled chain-terminating nucleotides, or multivalent molecules. These three sequencing methods are described below.
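The batch-wise workflow of steps (d) through (i) can be illustrated with a small simulation. This is only a sketch under stated assumptions: the `Concatemer` class, the `sequence_batch` function, and the `expected_read` field are hypothetical names introduced here, and the chemistry is reduced to string slicing. It shows that priming with one batch-specific primer yields read products only from concatemers carrying the matching binding site, and that the concatemers are retained between rounds.

```python
from dataclasses import dataclass, field


@dataclass
class Concatemer:
    batch_site: str      # label of the batch-specific primer binding site (illustrative)
    expected_read: str   # bases a primer extension would report (hypothetical data)
    reads: list = field(default_factory=list)


def sequence_batch(concatemers, batch_primer, cycles, rounds):
    """Simulate `rounds` rounds of at most `cycles` sequencing cycles for one batch."""
    for _ in range(rounds):
        for c in concatemers:
            # Step (1): only concatemers with the matching binding site are primed.
            if c.batch_site != batch_primer:
                continue
            # Step (2): generate a read product of at most `cycles` bases.
            c.reads.append(c.expected_read[:cycles])
            # Step (3): the read product is removed and the concatemer is retained,
            # modeled here by leaving `expected_read` unchanged for the next round.
    return concatemers
```

Calling `sequence_batch` first with the first batch primer label and then with the second mirrors the order of the first and second sequencing workflows described above.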
  • the plurality of nucleotide reagents of steps (d) and (g) comprise a plurality of nucleotides that are detectably labeled or non-labeled.
  • individual nucleotides are linked to a detectable reporter moiety.
  • the detectable reporter moiety comprises a fluorophore.
  • the plurality of detectably labeled nucleotide analogs comprise a plurality of chain terminating nucleotides, where the chain terminating moiety is linked to the 3’ nucleotide sugar position to form a 3’ blocked nucleotide analog.
  • the chain terminating moiety can be removed to convert the 3’ blocked nucleotide analog to an extendible nucleotide having a 3’ OH group on the sugar.
  • the labeled nucleotide analogs are linked to a different fluorophore that corresponds to the nucleo-bases adenine, cytosine, guanine, thymine or uracil, where the different fluorophores emit a fluorescent signal.
  • a sequencing cycle comprises (1) contacting the concatemer/sequencing primer duplex with a sequencing polymerase and a detectably labeled chain terminating nucleotide under a condition suitable for polymerase-catalyzed incorporation of the detectably labeled chain terminating nucleotide into the terminal end of the sequencing primer, (2) detecting and imaging the fluorescent signal and color emitted by the incorporated chain terminating nucleotide, and (3) removing the chain terminating moiety (e.g., unblocking) and the fluorophore from the incorporated nucleotide and retaining the concatemer/sequencing primer duplex.
  • no more than 2-30 sequencing cycles are conducted on the plurality of concatemers inside the cellular sample to generate a plurality of sequencing read products.
  • the sequence of the first sequencing read product can be determined and aligned with a first reference sequence to confirm the presence of the first target RNA molecules inside the cellular sample.
  • the sequence of the second sequencing read product can be determined and aligned with a second reference sequence to confirm the presence of the second target RNA molecules inside the cellular sample.
  • the sequences of the first and second sequencing read products can be aligned after each round of generating the first and second sequencing read products, which are no more than 30 bases in length, or after generating a set of reiterative sequencing read products, wherein the first and second sequencing read products are no more than 30 bases in length.
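Aligning a short read product with a reference sequence to confirm the presence of a target can be sketched as a nearest-match lookup. The following is a minimal illustration, not the method of the source: the Hamming-distance criterion, the `max_mismatches` threshold, and the names `hamming` and `assign_target` are assumptions made for this example.

```python
def hamming(a, b):
    """Count mismatched positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def assign_target(read, references, max_mismatches=1):
    """Return the name of the unique best-matching reference, or None.

    `references` maps a target name to its reference sequence (hypothetical data).
    """
    scored = []
    for name, ref in references.items():
        n = min(len(read), len(ref))  # compare only the overlapping prefix
        scored.append((hamming(read[:n], ref[:n]), name))
    if not scored:
        return None
    scored.sort()
    best_distance, best_name = scored[0]
    if best_distance > max_mismatches:
        return None  # no reference is close enough
    if len(scored) > 1 and scored[1][0] == best_distance:
        return None  # ambiguous: two references match equally well
    return best_name
```

A read that matches no reference within the threshold, or matches two equally well, is left unassigned rather than forced onto a target.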
  • the sequencing reactions are conducted on a sequencing apparatus having a detector that captures fluorescent signals from the sequencing reactions inside the cellular sample.
  • the sequencing apparatus can be configured to relay the fluorescent signal data captured by the detector to a computer system that is programmed to display images of different fluorescent spots which are co-located in the cellular sample, where individual fluorescent spots correspond to different target RNA molecules.
  • when the sequencing is conducted using different fluorescently-labeled nucleotide reagents that correspond to different nucleo-bases (e.g., A, G, C, T/U), the images can have different color fluorescent spots co-located in the same cellular sample at different sequencing cycles.
  • out-of-sync phasing and/or pre-phasing events can occur during synchronized sequencing reactions on clonally amplified template amplicons, where the sequencing reactions comprise polymerase-catalyzed sequencing reactions employing detectably labeled chain terminator nucleotides.
  • a sequencing reaction on one template molecule in the clonally-amplified template molecules moves ahead of (e.g., pre-phasing) or falls behind (e.g., phasing) the sequencing of the other template molecules within the clonally-amplified template molecules.
  • a fluorescent signal is typically detected which corresponds to incorporation of a labeled chain terminator nucleotide.
  • phasing and pre-phasing events can be detected and monitored using incorporation of a labeled chain terminator nucleotide.
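The effect of phasing and pre-phasing on a clonal population can be illustrated with a simple probabilistic model. This is a sketch under assumptions that are not in the source: each strand independently falls behind with probability `phasing`, jumps one base ahead with probability `prephasing`, and otherwise advances by exactly one base per cycle; the default rates are arbitrary illustrative values.

```python
def in_phase_fraction(cycles, phasing=0.01, prephasing=0.005):
    """Return, for each cycle, the fraction of strands at the expected position."""
    # dist[k] is the fraction of strands that have incorporated k bases so far.
    dist = {0: 1.0}
    fractions = []
    for cycle in range(1, cycles + 1):
        new_dist = {}
        for pos, frac in dist.items():
            # A strand lags (+0, phasing), advances normally (+1),
            # or runs ahead (+2, pre-phasing) in this cycle.
            steps = ((0, phasing), (1, 1.0 - phasing - prephasing), (2, prephasing))
            for step, prob in steps:
                new_dist[pos + step] = new_dist.get(pos + step, 0.0) + frac * prob
        dist = new_dist
        # In-phase strands have incorporated exactly `cycle` bases.
        fractions.append(dist.get(cycle, 0.0))
    return fractions
```

Under this model the in-phase fraction shrinks as cycles accumulate, which is one way to see why out-of-sync strands blur the signal of a clonal spot in later cycles.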
  • the plurality of nucleotide reagents of steps (d) and (g) comprise a plurality of multivalent molecules each comprising a core attached to a plurality of nucleotide-arms, wherein the nucleotide-arms are attached to a nucleotide unit.
  • individual multivalent molecules are labeled with a detectable reporter moiety.
  • the detectable reporter moiety comprises a fluorophore.
  • the core of the multivalent molecule is labeled with a fluorophore, and wherein the fluorophore which is attached to a given core of the multivalent molecule corresponds to the nucleotide base (e.g., adenine, guanine, cytosine, thymine or uracil) of the nucleotide arm.
  • at least one of the nucleotide arms of the multivalent molecule comprises a linker and/or nucleotide base that is attached to a fluorophore, and wherein the fluorophore which is attached to a given nucleotide base corresponds to the nucleotide base (e.g., adenine, guanine, cytosine, thymine or uracil) of the nucleotide arm.
  • a sequencing cycle comprises (1) contacting the concatemer/sequencing primer duplex with a first sequencing polymerase to form a complexed polymerase, (2) contacting the complexed polymerase with a detectably labeled multivalent molecule under a condition suitable for binding a complementary nucleotide unit of the multivalent molecule to the complexed polymerase thereby forming a multivalent-binding complex, and the condition is suitable for inhibiting incorporation of the complementary nucleotide unit into the terminal end of the sequencing primer, (3) detecting and imaging the fluorescent signal and color emitted by the bound detectably labeled multivalent molecule, (4) removing the first sequencing polymerase and the bound detectably labeled multivalent molecule, and retaining the concatemer/sequencing primer duplex, (5) contacting the retained concatemer/sequencing primer duplex with a second sequencing polymerase and a non-labeled chain terminating nucleotide under a condition suitable for polymerase-cata
  • no more than 2-30 sequencing cycles are conducted on the plurality of concatemers inside the cellular sample to generate a plurality of sequencing read products.
  • the sequence of the first sequencing read product can be determined and aligned with a first reference sequence to confirm the presence of the first target RNA molecules inside the cellular sample.
  • the sequence of the second sequencing read product can be determined and aligned with a second reference sequence to confirm the presence of the second target RNA molecules inside the cellular sample.
  • the sequences of the first and second sequencing read products can be aligned after each round of generating the first and second sequencing read products, which are no more than 30 bases in length, or after generating a set of reiterative sequencing read products, wherein the first and second sequencing read products are no more than 30 bases in length.
  • the sequencing reactions are conducted on a sequencing apparatus having a detector that captures fluorescent signals from the sequencing reactions inside the cellular sample.
  • the sequencing apparatus can be configured to relay the fluorescent signal data captured by the detector to a computer system that is programmed to display images of different fluorescent spots which are co-located in the cellular sample, where individual fluorescent spots correspond to different target RNA molecules.
  • individual cycle times can be achieved in less than 30 minutes.
  • the field of view (FOV) can exceed 1 mm² and the cycle time for scanning a large area (> 10 mm²) can be less than 5 minutes.
  • the plurality of RNA or cDNA inside the cellular sample can be amplified to generate amplicons of the RNA or cDNA where the amplicons comprise concatemers.
  • the plurality of RNA or cDNA molecules inside the cellular sample can be amplified by conducting a padlock probe circularization and rolling circle amplification workflow.
  • the methods comprise contacting the plurality of RNA or cDNA molecules inside the cellular sample with a plurality of padlock probes, including a first plurality of target-specific padlock probes that hybridize with first target RNA or cDNA molecules, and a second plurality of target-specific padlock probes that hybridize with second target RNA or cDNA molecules.
  • the padlock probes comprise single-stranded oligonucleotides.
  • the padlock probes comprise DNA, RNA, or DNA and RNA.
  • individual padlock probes comprise an internal region between the first and second terminal regions, where the internal region comprises at least one universal adaptor sequence including a sample barcode sequence, an amplification primer binding site, a sequencing primer binding site, a compaction oligonucleotide binding site and/or a surface capture primer binding site (FIG. 6).
  • the padlock probes comprise at least one target barcode sequence that corresponds to a given target RNA or target cDNA to which the padlock probes bind.
  • the padlock probes comprise at least one unique identification sequence (e.g., unique molecular index (UMI)).
  • the padlock probes comprise at least one restriction enzyme recognition sequence.
  • a padlock probe comprises a single-stranded nucleic acid molecule having two terminal regions (e.g., first and second binding arms) and an internal region.
  • the first terminal region of an individual padlock probe has a first target-specific sequence that selectively hybridizes to a first region of a target RNA or target cDNA molecule
  • the second terminal region of the individual padlock probe has a second target-specific sequence that selectively hybridizes to a second region of the same target RNA or target cDNA molecule.
  • the internal region of a padlock comprises a target barcode sequence (e.g., Target BC-1 or Target BC-2, left and right schematics respectively) which corresponds to a given target RNA or target cDNA.
  • the target barcode sequence uniquely identifies the target RNA or target cDNA.
  • the internal region of a padlock comprises a universal primer binding site for a sequencing primer (or a complementary sequence thereof).
  • the internal region of a padlock comprises a universal primer binding site for a rolling circle amplification primer (or a complementary sequence thereof).
  • the internal region of a padlock comprises a universal binding site for a compaction oligonucleotide (or a complementary sequence thereof).
  • the internal region of a padlock probe includes a target barcode sequence and at least one universal primer binding site (e.g., for binding a sequencing primer, for binding a rolling circle amplification primer and/or for binding a compaction oligonucleotide) in any arrangement and orientation (FIG. 6, top and bottom).
  • individual padlock probes comprise first and second terminal regions (e.g., first and second binding arms) that hybridize to portions of target RNA or target cDNA molecules to form a plurality of RNA-padlock probe complexes or a plurality of cDNA-padlock probe complexes, wherein individual complexes have the first and second terminal probe regions hybridized to proximal regions of an RNA or cDNA molecule to form a nick or gap between the first and second terminal probe ends.
  • the first terminal region of an individual padlock probe has a first target-specific sequence that selectively hybridizes to a first region of a target RNA or cDNA molecule
  • the second terminal region of the individual padlock probe has a second target-specific sequence that selectively hybridizes to a second region of the same target RNA or cDNA molecule, where a nick or gap is formed between the hybridized first and second terminal regions, thereby circularizing the padlock probe (e.g., FIG. 7).
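The geometry of the two binding arms can be made concrete with a short sketch that derives a padlock probe sequence from a target strand. This is an illustration only: the function `design_padlock`, its parameters, and the DNA-only alphabet are assumptions made here, and real probe design would also weigh melting temperature, specificity, and secondary structure. The probe is written 5' to 3', so its 5' arm is the reverse complement of the target region upstream of the junction and its 3' arm is the reverse complement of the region downstream, which places both probe termini next to each other on the target.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def design_padlock(target, junction, arm_len, internal, gap=0):
    """Return a padlock probe (5'->3') whose arms flank `junction` on `target`.

    `target` is read 5'->3'; `gap` is the number of unpaired target bases left
    between the hybridized arms (0 leaves only a nick).
    """
    start = junction - arm_len
    end = junction + gap + arm_len
    if start < 0 or end > len(target) or arm_len <= 0 or gap < 0:
        raise ValueError("arms do not fit on the target")
    upstream = target[start:junction]          # target region 5' of the junction
    downstream = target[junction + gap:end]    # target region 3' of the gap
    # 5' arm pairs with the upstream region, 3' arm with the downstream region,
    # so the probe's 5' and 3' termini face each other across the nick or gap.
    return revcomp(upstream) + internal + revcomp(downstream)
```

With `gap` greater than zero, the 3' end of the hybridized probe would be extended across the unpaired target bases before ligation, consistent with the fill-in reaction described for gapped probes.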
  • the first padlock probe comprises (i) a first target barcode sequence (target BC-1) that uniquely identifies the first target RNA or the first target cDNA, (ii) a first sequencing primer binding site (or a complementary sequence thereof), (iii) a universal binding site for an amplification primer (universal RCA) (or a complementary sequence thereof), and (iv) a universal binding site for a compaction oligonucleotide (or a complementary sequence thereof).
  • the second padlock probe comprises (i) a second target barcode sequence (target BC-2) that uniquely identifies the second target RNA or the second target cDNA, (ii) a second sequencing primer binding site (or a complementary sequence thereof), (iii) a universal binding site for an amplification primer (universal RCA) (or a complementary sequence thereof), and (iv) a universal binding site for a compaction oligonucleotide (or a complementary sequence thereof).
  • the padlock probes comprise canonical nucleotides and/or nucleotide analogs.
  • the padlock probes are modified to confer resistance to nuclease degradation (e.g., ribonuclease degradation).
  • the padlock probes comprise at least one phosphorothioate diester bond at their 5’ ends which can render the padlock probes resistant to nuclease degradation.
  • the padlock probes comprise 2-5 or more consecutive phosphorothioate diester bonds at their 5’ ends.
  • the padlock probes comprise at least one ribonucleotide and/or at least one 2’-O-methyl, 2’-O-methoxyethyl (MOE), or 2’ fluoro-base nucleotide.
  • the padlock probes comprise phosphorylated 3’ ends.
  • the padlock probes comprise at least one locked nucleic acid (LNA) base.
  • the padlock probes comprise a phosphorylated 5’ end (e.g., using a polynucleotide kinase).
  • individual padlock probes in a set of padlock probes comprise first and second terminal regions that hybridize to the same target regions of the target RNA or cDNA molecules to form a plurality of RNA-padlock probe complexes or a plurality of cDNA-padlock probe complexes having the same RNA or cDNA sequence.
  • a set of padlock probes (e.g., a plurality of padlock probes) comprise at least two sub-sets of padlock probes.
  • individual padlock probes in a first sub-set of padlock probes comprise first and second terminal regions that hybridize to the same target regions (e.g., a first target region) of the target RNA or cDNA molecules to form a first plurality of RNA-padlock probe complexes or a first plurality of cDNA-padlock probe complexes having the same RNA or cDNA sequence.
  • individual padlock probes in a second sub-set of padlock probes comprise first and second terminal regions that hybridize to the same target regions (e.g., a second target region) of the target RNA or cDNA molecules to form a second plurality of RNA-padlock probe complexes or a second plurality of cDNA-padlock probe complexes having the same cDNA sequence.
  • the first and second sub-sets of padlock probes hybridize to different target regions of the same target RNA or cDNA molecules.
  • the first and second subsets of padlock probes hybridize to different target regions of different target RNA or cDNA molecules.
  • the set of padlock probes comprise 2-10 subsets of padlock probes, or 10-25 sub-sets of padlock probes, or 25-50 sub-sets of padlock probes, or up to 100 sub-sets of padlock probes. In some embodiments, the set of padlock probes comprise at least 100 sub-sets of padlock probes, at least 500 sub-sets of padlock probes, at least 1000 sub-sets of padlock probes, at least 10,000 sub-sets of padlock probes, or more sub-sets of padlock probes.
  • the nicks can be enzymatically ligated to generate covalently closed circular padlock probes.
  • the ligase enzyme can discriminate between matched and mis-matched hybridized ends to ensure target-specific hybridization.
  • the ligation reaction comprises use of a ligase enzyme, including a T3, T4, T7 or Taq DNA ligase enzyme.
  • the size of the gap between the hybridized first and second terminal regions is 1-25 bases.
  • the 3’ OH end of a hybridized padlock probe can serve as an initiation site for a polymerase-catalyzed fill-in reaction (e.g., gap fill-in reaction) using the target cDNA molecule (or the target RNA molecule) as a template. After the fill-in reaction, the remaining nick can be enzymatically ligated to generate covalently closed circular padlock probes.
  • the gap-filling reaction comprises contacting the circularized padlock probe with a DNA polymerase and a plurality of nucleotides.
  • the DNA polymerase comprises E. coli DNA polymerase I, Klenow fragment of E. coli DNA polymerase I, T7 DNA polymerase, or T4 DNA polymerase.
  • the ligase enzyme can discriminate between matched and mismatched hybridized ends to ensure target-specific hybridization.
  • the ligation reaction comprises use of a ligase enzyme, including a T3, T4, T7 or Taq DNA ligase enzyme.
  • the plurality of covalently closed circular padlock probes can be subjected to a rolling circle amplification reaction to generate a plurality of concatemer molecules each having two or more tandem copies of a unit, wherein the unit comprises a target sequence that corresponds to a target RNA molecule and any additional sequence(s) carried by the padlock probes including universal adaptor sequence(s), unique molecular index sequence(s) and/or restriction enzyme recognition sequence(s).
  • the rolling circle amplification reaction comprises contacting the covalently closed circularized padlock probes with an amplification primer (e.g., a universal rolling circle amplification primer), a strand-displacing DNA polymerase, and a plurality of nucleotides, under a condition suitable for hybridizing individual amplification primers to a covalently closed padlock probe, and under a condition suitable for conducting primer extension using the covalently closed padlock probe as a template molecule to generate a nucleic acid concatemer.
  • the plurality of nucleotides in the rolling circle amplification reaction comprise any mixture of two or more of dATP, dGTP, dCTP, dTTP and/or dUTP.
  • any of the rolling circle amplification reactions described herein can be conducted in the presence or in the absence of a plurality of compaction oligonucleotides.
  • when the rolling circle amplification reaction includes a plurality of nucleotides which includes dUTP, the resulting concatemer can be cross-linked to a cross-linking reactive group by treating the cellular sample with a succinimide ester (NHS), maleimide (Sulfo-SMCC), imidoester (DMP), carbodiimide (DCC, EDC) or phenyl azide.
  • polymerization of the cross-linking reactive group can be initiated with light or UV light.
  • the resulting concatemer can be cross-linked to a matrix by treating the cellular sample with a cross-linked agarose, cross-linked dextran or cross-linked polyethylene glycol (PEG), polyacrylamide, cellulose alginate or polyamide.
  • the PEG comprises a sulfo-NHS ester moiety at one or both ends, for example a PEGylated bis(sulfosuccinimidyl)suberate (e.g., BS(PEG)9 from Thermo Fisher Scientific, catalog No. 21582).
  • the rolling circle amplification reaction can be conducted at a constant temperature (e.g., isothermal) wherein the constant temperature is at room temperature to about 30 °C, or about 30 - 40 °C, or about 40 - 50 °C, or about 50 - 65 °C.
  • the DNA polymerase having a strand displacing activity can be selected from a group consisting of phi29 DNA polymerase, large fragment of Bst DNA polymerase, large fragment of Bsu DNA polymerase, Bca (exo-) DNA polymerase, Klenow fragment of E. coli DNA polymerase, T5 polymerase, M-MuLV reverse transcriptase, HIV viral reverse transcriptase, and Deep Vent DNA polymerase.
  • the phi29 DNA polymerase can be wild type phi29 DNA polymerase (e.g., MagniPhi from Expedeon), variant EquiPhi29 DNA polymerase (e.g., from Thermo Fisher Scientific), or chimeric QualiPhi DNA polymerase (e.g., from 4basebio).
  • the rolling circle amplification primers can be modified to increase resistance to nuclease degradation.
  • the rolling circle amplification primers comprise at least one phosphorothioate diester bond at their 5’ ends which can render the amplification primers resistant to exonuclease degradation.
  • the rolling circle amplification primers comprise 2-5 or more consecutive phosphorothioate diester bonds at their 5’ ends.
  • the rolling circle amplification primers comprise at least one ribonucleotide and/or at least one 2’-O-methyl or 2’-O-methoxyethyl (MOE) nucleotide.
  • the rolling circle amplification reaction can be conducted in the presence of a plurality of compaction oligonucleotides which, when hybridized to a concatemer molecule, compacts the size and/or shape of the concatemer to form a compact nanoball.
  • the compaction oligonucleotides comprise single stranded oligonucleotides having a first region at one end that hybridizes to a portion of a concatemer molecule and a second region at the other end that hybridizes to another portion of the same concatemer molecule, where hybridization of the compaction oligonucleotide to a given concatemer compacts the size and/or shape of the concatemer.
  • the compaction oligonucleotides include a 5’ region, an optional internal region (intervening region), and a 3’ region.
  • the 5’ and 3’ regions of the compaction oligonucleotide can hybridize to any portions of the concatemer.
  • the 5’ and 3’ regions of the compaction oligonucleotide can hybridize to different portions of the concatemer to pull together distal portions of the concatemer causing compaction of the concatemer to form a DNA nanoball.
  • the 5’ region of the compaction oligonucleotide is designed to hybridize to a first portion of the concatemer molecule (e.g., a universal compaction oligonucleotide binding site), and the 3’ region of the compaction oligonucleotide is designed to hybridize to a second portion of the concatemer molecule (e.g., a universal compaction oligonucleotide binding site).
  • Inclusion of compaction oligonucleotides during RCA can promote formation of DNA nanoballs having tighter size and shape compared to concatemers generated in the absence of the compaction oligonucleotides.
  • the compact and stable characteristics of the DNA nanoballs improve in situ sequencing accuracy by increasing signal intensity, and the nanoballs retain their shape and size during multiple sequencing cycles.
  • the compaction oligonucleotides comprise single stranded oligonucleotides comprising DNA, RNA, or a combination of DNA and RNA.
  • the compaction oligonucleotides can be any length, including 20-150 nucleotides, or 30-100 nucleotides, or 40-80 nucleotides in length.
  • the compaction oligonucleotides comprise a 5’ region and a 3’ region, and optionally an intervening region between the 5’ and 3’ regions.
  • the intervening region can be any length, for example about 2-20 nucleotides in length.
  • the intervening region comprises a homopolymer having consecutive identical bases (e.g., AAA, GGG, CCC, TTT or UUU).
  • the intervening region comprises a non-homopolymer sequence.
  • the 5’ region of the compaction oligonucleotides can be wholly complementary or partially complementary along its length to a first portion of a concatemer molecule.
  • the 3’ region of the compaction oligonucleotides can be wholly complementary or partially complementary along its length to a second portion of a concatemer molecule.
  • the 5’ region of the compaction oligonucleotides can hybridize to a first universal sequence portion of a concatemer molecule.
  • the 3’ region of the compaction oligonucleotides can hybridize to a second universal sequence portion of a concatemer molecule.
  • the 5’ region of the compaction oligonucleotide can have the same sequence as the 3’ region.
  • the 5’ region of the compaction oligonucleotide can have a sequence that is different from the 3’ region.
  • the 3’ region of the compaction oligonucleotide can have a sequence that is a reverse sequence of the 5’ region.
  • the 5’ region of the compaction oligonucleotide can have a sequence that is a reverse sequence of the 3’ region.
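The relationships between the 5’ and 3’ regions listed above (same sequence, different sequence, or reverse sequence) can be checked programmatically. The following is a minimal sketch, assuming the regions are given as plain strings written 5’ to 3’; the function names and the example regions are hypothetical and are not taken from this disclosure. Note that a reverse sequence is distinct from a reverse complement, so both are shown.

```python
# Illustrative helpers only; names and example regions are hypothetical.

# Map each base to its DNA complement (U is treated like T).
_COMPLEMENT = str.maketrans("ACGTU", "TGCAA")


def reverse_sequence(seq):
    """Return the bases of seq in reverse order (no complementing)."""
    return seq.upper()[::-1]


def reverse_complement(seq):
    """Return the reverse complement of seq as a DNA sequence."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def region_relationship(five_prime_region, three_prime_region):
    """Classify how the 3' region relates to the 5' region."""
    five = five_prime_region.upper()
    three = three_prime_region.upper()
    if three == five:
        return "same"
    if three == reverse_sequence(five):
        return "reverse"
    return "different"


if __name__ == "__main__":
    print(region_relationship("CATGTA", "ATGTAC"))
    print(reverse_complement("CATGTA"))
```

A palindromic region would be reported as "same" because that comparison is made first; a caller that needs to distinguish the two cases can compare against `reverse_sequence` directly.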
  • the 3’ region of any of the compaction oligonucleotides can include an additional three bases at the terminal 3’ end which comprises 2’-O-methyl RNA bases (e.g., designated mUmUmU) or the terminal 3’ end lacks additional 2’-O-methyl RNA bases.
  • the compaction oligonucleotides comprise one or more modified bases or linkages at their 5’ or 3’ ends to confer certain functionalities. In some embodiments, the compaction oligonucleotides comprise at least one phosphorothioate linkage at their 5’ and/or 3’ ends to confer exonuclease resistance. In some embodiments, at least one nucleotide at or near the 3’ end comprises a 2’ fluoro base which confers exonuclease resistance. In some embodiments, the 3’ end of the compaction oligonucleotides comprises at least one 2’-O-methyl RNA base which blocks polymerase-catalyzed extension.
  • the 3’ end of the compaction oligonucleotide comprises three bases comprising 2’-O-methyl RNA base (e.g., designated mUmUmU).
  • the compaction oligonucleotides comprise a 3’ inverted dT at their 3’ ends which blocks polymerase-catalyzed extension.
  • the compaction oligonucleotides comprise 3’ phosphorylation which blocks polymerase-catalyzed extension.
  • the internal region of the compaction oligonucleotides comprise at least one locked nucleic acid (LNA) which increases the thermal stability of duplexes formed by hybridizing a compaction oligonucleotide to a concatemer molecule.
  • the compaction oligonucleotides comprise a phosphorylated 5’ end (e.g., using a polynucleotide kinase).
  • the compaction oligonucleotide comprises the sequence 5’-CATGTAATGCACGTACTTTCAGGGTAAACATGTAATGCACGTACTTT
  • the compaction oligonucleotides include an additional three bases at the terminal 3’ end which comprises 2’-O-methyl RNA bases (e.g., designated mUmUmU) or the terminal 3’ end lacks additional 2’-O-methyl RNA bases.
  • the compaction oligonucleotides can include at least one region having consecutive guanines.
  • the compaction oligonucleotides can include at least one region having 2, 3, 4, 5, 6 or more consecutive guanines.
  • the compaction oligonucleotides comprise four consecutive guanines which can form a guanine tetrad structure (see FIG. 25).
  • the guanine tetrad structure can be stabilized via Hoogsteen hydrogen bonding.
  • the guanine tetrad structure can be stabilized by a central cation including potassium, sodium, lithium, rubidium or cesium.
  • At least one compaction oligonucleotide can form a guanine tetrad (FIG. 25) and hybridize to the universal binding sequences in a concatemer which can cause the concatemer to fold to form an intramolecular G-quadruplex structure (FIG. 26).
  • the concatemers can self-collapse to form compact nanoballs. Formation of the guanine tetrads and G-quadruplexes in the nanoballs may increase the stability of the nanoballs to retain their compact size and shape which can withstand changes in pH, temperature and/or repeated flows of reagents during sequencing inside the cellular sample.
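The consecutive-guanine regions described above can be located in a candidate oligonucleotide sequence with a simple pattern search. The following is a minimal sketch, assuming the sequence is supplied as a plain string; the function name, the `min_len` parameter and the example sequence are hypothetical and only illustrate the idea. Finding a run of guanines does not by itself establish that a guanine tetrad or G-quadruplex forms.

```python
import re


def guanine_runs(seq, min_len=2):
    """Return (start_index, run_length) for each run of at least
    min_len consecutive guanines in seq (0-based start index)."""
    pattern = re.compile("G{%d,}" % min_len)
    return [(m.start(), len(m.group())) for m in pattern.finditer(seq.upper())]


def has_four_consecutive_guanines(seq):
    """True if seq contains at least one run of four or more guanines."""
    return any(length >= 4 for _, length in guanine_runs(seq, min_len=4))


if __name__ == "__main__":
    example = "ACGGGGTTGGA"  # hypothetical sequence, for illustration only
    print(guanine_runs(example))
    print(has_four_consecutive_guanines(example))
```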
  • the plurality of compaction oligonucleotides in the rolling circle amplification reaction have the same sequence.
  • the plurality of compaction oligonucleotides in the rolling circle amplification reaction comprise a mixture of two or more different populations of compaction oligonucleotides having different sequences.
  • the immobilized concatemer template molecule can self-collapse into a compact nucleic acid nanoball.
  • the nanoballs can be imaged and a FWHM measurement can be obtained to give the shape/size of the nanoballs.
  • inclusion of compaction oligonucleotides in the rolling circle amplification reaction can promote collapsing of a concatemer into a DNA nanoball.
  • Conducting RCA with compaction oligonucleotides helps retain the compact size and shape of a DNA nanoball during multiple sequencing cycles which can improve FWHM (full width half maximum) of a spot image of the DNA nanoball inside a cellular sample.
  • the DNA nanoball does not unravel during multiple sequencing cycles.
  • the spot image of the DNA nanoball does not enlarge during multiple sequencing cycles.
  • the spot image of the DNA nanoball remains a discrete spot during multiple sequencing cycles.
  • the spot image can be represented as a Gaussian spot and the size can be measured as a FWHM.
  • a smaller spot size as indicated by a smaller FWHM typically correlates with an improved image of the spot.
  • the FWHM of a nanoball spot can be about 10 µm or smaller.
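For a spot modeled as an ideal Gaussian, the FWHM is related to the Gaussian standard deviation σ by the standard identity FWHM = 2·sqrt(2·ln 2)·σ, which is approximately 2.355·σ. The following is a minimal sketch of that conversion, assuming a symmetric Gaussian spot profile; the function names are hypothetical, and the disclosure does not specify how the Gaussian is fitted to the image.

```python
import math

# Conversion factor between Gaussian sigma and FWHM: 2 * sqrt(2 * ln 2).
_FWHM_FACTOR = 2.0 * math.sqrt(2.0 * math.log(2.0))


def gaussian_fwhm(sigma):
    """Full width at half maximum of a Gaussian with standard deviation sigma."""
    return _FWHM_FACTOR * sigma


def sigma_from_fwhm(fwhm):
    """Standard deviation of a Gaussian with the given FWHM."""
    return fwhm / _FWHM_FACTOR


def gaussian_profile(x, sigma, amplitude=1.0):
    """Value of a zero-centered Gaussian spot profile at offset x."""
    return amplitude * math.exp(-(x * x) / (2.0 * sigma * sigma))


if __name__ == "__main__":
    sigma = 1.5  # hypothetical spot sigma, in the same length unit as the FWHM
    fwhm = gaussian_fwhm(sigma)
    # At half the FWHM from the center, the profile falls to half its peak.
    print(fwhm, gaussian_profile(fwhm / 2.0, sigma))
```

Units carry through unchanged, so a σ expressed in micrometers yields an FWHM in micrometers.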
  • each nanoball carries numerous tandem copies of a polynucleotide unit along their lengths, where the polynucleotide unit includes a sequence-of-interest (e.g., that corresponds to target RNA or target cDNA) and at least a universal sequencing primer binding site.
  • Each polynucleotide unit can bind a sequencing primer, a sequencing polymerase and a detectably-labeled nucleotide reagent (e.g., detectably labeled multivalent molecules), to form a detectable sequencing complex (e.g., a detectable ternary complex).
  • Each nanoball carries numerous detectable sequencing complexes.
  • the compact nature of the nanoballs increases the local concentration of detectably-labeled nucleotide reagents that are used during the sequencing workflow, which increases the signal intensity emitted from a nanoball to give a discrete detectable signal which can be imaged as a fluorescent spot inside the cellular sample.
  • Each spot corresponds to a concatemer and each concatemer corresponds to a target RNA molecule in the cellular sample. Multiple spots can be detected and imaged simultaneously in the cellular sample.
  • the DNA nanoballs have a compact shape and size that produce increased signal intensity and color differentiation during sequencing.
  • the cellular sample comprises a whole cell, a plurality of whole cells, an intact tissue or an intact tumor.
  • the cellular sample comprises a fresh cellular sample, a freshly-frozen cellular sample, a sectioned cellular sample, or an FFPE cellular sample.
  • the cellular sample comprises one or more living cells or non-living cells.
  • the cellular sample can be obtained from a virus, fungus, prokaryote or eukaryote. In some embodiments, the cellular sample can be obtained from an animal, insect or plant. In some embodiments, the cellular sample comprises one or more virally-infected cells.
  • the cellular sample can be obtained from any organism including human, simian, ape, canine, feline, bovine, equine, murine, porcine, caprine, lupine, ranine, piscine, plant, insect or bacteria.
  • the cellular sample can be obtained from any organ including head, neck, brain, breast, ovary, cervix, colon, rectum, endometrium, gallbladder, intestines, bladder, prostate, testicles, liver, lung, kidney, esophagus, pancreas, thyroid, pituitary, thymus, skin, heart, larynx, or other organs.
  • the cellular sample harbors a plurality of RNA which include target RNA and non-target RNA.
  • cells typically produce RNA by gene expression which includes transcription of DNA (e.g., genomic DNA) into RNA molecules.
  • the transcribed RNA can undergo splicing or may not be spliced.
  • the transcribed RNA can be translated into a polypeptide (e.g., coding RNA), or may not undergo translation but can be processed into tRNA or rRNA (e.g., non-coding RNA).
  • the plurality of RNA harbored by the cellular sample includes target and non-target RNA.
  • the plurality of RNA harbored by the cellular sample comprises wild type RNA, mutant RNA or splice variant RNA.
  • the plurality of RNA harbored by the cellular sample comprises pre-spliced RNA, partially spliced RNA, or fully spliced RNA.
  • the plurality of RNA harbored by the cellular sample comprises coding RNA, non-coding RNA, mRNA, tRNA, rRNA, microRNA (miRNA), mature microRNA, or immature microRNA.
  • the plurality of RNA harbored by the cellular sample comprises housekeeping RNA, cell-specific RNA, tissue-specific RNA or disease-specific RNA. In some embodiments, the plurality of RNA harbored by the cellular sample comprises RNA expressed by one or more cells in response to a stimulus such as heat, light, a chemical or a drug. In some embodiments, the plurality of RNA harbored by the cellular sample comprises RNA found in healthy cells or diseased cells. In some embodiments, the plurality of RNA harbored by the cellular sample comprises RNA transcribed from transgenic DNA sequences that are introduced into the cellular sample using recombinant DNA procedures.
  • the RNA can be transcribed from a transgenic DNA sequence that is controlled by an inducible or constitutive promoter sequence.
  • the plurality of RNA harbored by the cellular sample comprises RNA that is transcribed from DNA sequences that are not transgenic.
  • the cellular sample can be cultured on the support.
  • the methods comprise culturing the cellular sample on the support under a condition suitable for expanding the cellular sample for 2-10 generations or more.
  • the cultured cellular sample can generate a colony of cells.
  • the methods comprise culturing the cellular sample to confluence or non-confluence.
  • the methods comprise culturing the cellular sample on the support in a simple or complex cell culture media.
  • the cell culture media comprises D-MEM high glucose (e.g., from Thermo Fisher Scientific, catalog No.
  • fetal bovine serum (e.g., 10% FBS; for example from Thermo Fisher Scientific, catalog No. A3160402)
  • MEM non-essential amino acids (e.g., 0.1 mM MEM, for example from Thermo Fisher Scientific, catalog No. 11140050)
  • L-glutamine (e.g., 6 mM L-glutamine, for example from Thermo Fisher Scientific, catalog No. A2916801)
  • MEM sodium pyruvate e.g., 1 mM sodium pyruvate, for example from Thermo Fisher Scientific, catalog No.
  • the methods comprise culturing the cellular sample at a humidity and temperature that is suitable for culturing the cell(s) on the support.
  • exemplary suitable conditions comprise approximately 37 °C with a humidified atmosphere of approximately 5-10% carbon dioxide in air.
  • the cellular sample can be cultured with suitable aeration with oxygen and/or nitrogen.
  • simple cell media refers to a cell media that typically lacks ingredients to support cell growth and/or proliferation in culture.
  • Simple cell media can be used for example to wash, suspend, or dilute the cellular sample.
  • Simple cell media can be mixed with certain ingredients to prepare a cell media that can support cell growth and/or proliferation in culture.
  • a simple cell media comprises any one or any combination of two or more of a buffer, a phosphate compound, a sodium compound, a potassium compound, a calcium compound, a magnesium compound and/or glucose.
  • the simple cell media comprises PBS (phosphate buffered saline), DPBS (Dulbecco’s phosphate-buffered saline), HBSS (Hank’s balanced salt solution), DMEM (Dulbecco’s Modified Eagle’s Medium), EMEM (Eagle’s Minimum Essential Medium), and/or EBSS.
  • the cellular sample can be placed in a simple cell media prior to or during the step of conducting any of the nucleic acid methods described herein.
  • complex cell media refers to a cell media that can be used to support cell growth and/or proliferation in culture without supplementation or additives.
  • Complex cell media can include any combination of two or more of a buffering system (e.g., HEPES), inorganic salt(s), amino acid(s), protein(s), polypeptide(s), carbohydrate(s), fatty acid(s), lipid(s), purine(s) and their derivatives (e.g., hypoxanthine), pyrimidine(s) and their derivatives, and/or trace element(s).
  • Complex cell media includes fluids obtained from a fluid or tissue extract
  • complex cell media can be a serum-containing media, for example complex cell media includes fluids such as fetal bovine serum, blood plasma, blood serum, lymph fluid, human placental cord serum and amniotic fluid.
  • complex cell media can be a serum-free media, which are typically (but not necessarily) defined cell culture media.
  • complex cell media can be a chemically-defined media which typically (but not necessarily) include recombinant polypeptides, and ultra-pure inorganic and/or organic compounds.
  • complex cell media can be a protein-free media which include for example MEM (minimal essential media) and RPMI-1640 (Roswell Park Memorial Institute).
  • the complex cell media comprises IMDM (Iscove’s Modified Dulbecco’s Medium). In some embodiments, the complex cell media comprises DMEM (Dulbecco’s Modified Eagle’s Medium). In some embodiments, the cellular sample can be placed in a complex cell media prior to or during the step of conducting any of the nucleic acid methods described herein.
  • the cellular sample comprises a fixed cellular sample.
  • the cellular sample can be treated with a fixation reagent (e.g., a fixing reagent) that preserves the cell and its contents to inhibit degradation and can inhibit cell lysis.
  • the fixation reagent can preserve RNA harbored by the cellular sample.
  • the fixation reagent inhibits loss of nucleic acids from the cellular sample.
  • the fixation reagent can cross-link the RNA to prevent the RNA from escaping the cellular sample.
  • a cross-linking fixation reagent comprises any combination of an aldehyde, formaldehyde, paraformaldehyde, formalin, glutaraldehyde, imidoesters, N-hydroxysuccinimide esters (NHS) and/or glyoxal (a bifunctional aldehyde).
  • the fixation reagent comprises at least one alcohol, including methanol or ethanol. In some embodiments, the fixation reagent comprises at least one ketone, including acetone. In some embodiments, the fixation reagent comprises acetic acid, glacial acetic acid and/or picric acid. In some embodiments, the fixation reagent comprises mercuric chloride. In some embodiments, the fixation reagent comprises a zinc salt comprising zinc sulphate or zinc chloride. In some embodiments, the fixation reagent can denature polypeptides.
  • the fixation reagent comprises 4% w/v of paraformaldehyde to water/PBS. In some embodiments, the fixation reagent comprises 10% of 35% formaldehyde at a neutral pH. In some embodiments, the fixation reagent comprises 2% v/v of glutaraldehyde to water/PBS. In some embodiments, the fixation reagent comprises 25% of 37% formaldehyde solution, 70% picric acid and 5% acetic acid.
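The "X% of Y% stock" formulations above imply a final concentration of the active component after dilution. The following is a minimal worked sketch of that arithmetic, assuming that "10% of 35% formaldehyde" means the stock solution makes up 10% of the final volume; that reading, and the function name, are assumptions made for illustration rather than statements from this disclosure.

```python
def final_percent(stock_percent, stock_volume_percent):
    """Final concentration (in percent) when a stock at stock_percent
    makes up stock_volume_percent of the final volume."""
    if not 0.0 <= stock_volume_percent <= 100.0:
        raise ValueError("stock_volume_percent must be between 0 and 100")
    return stock_percent * stock_volume_percent / 100.0


if __name__ == "__main__":
    # 10% (by volume) of a 35% formaldehyde stock.
    print(final_percent(35.0, 10.0))
    # 25% (by volume) of a 37% formaldehyde stock.
    print(final_percent(37.0, 25.0))
```

Under this reading, the two formaldehyde-containing formulations correspond to roughly 3.5% and 9.25% formaldehyde in the final reagent.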
  • the cellular sample can be fixed on the support with 4% paraformaldehyde for about 30-60 minutes and washed with PBS.
  • the cellular sample can be stained, de-stained or unstained.
  • the cellular sample comprises a permeabilized cellular sample.
  • the methods comprise treating the cellular sample with a permeabilization reagent that alters the cell membrane to permit penetration of experimental reagents into the cells.
  • the permeabilization reagent removes membrane lipids from the cell membrane.
  • the cellular sample can be treated with a permeabilization reagent which comprises any combination of an organic solvent, detergent, chemical compound, cross-linking agent and/or enzyme.
  • the organic solvents comprise acetone, ethanol, and methanol.
  • the detergents comprise saponin, Triton X-100, Tween-20, sodium dodecyl sulfate (SDS), an N-lauroylsarcosine sodium salt solution, or a nonionic polyoxyethylene surfactant (e.g., NP40).
  • the crosslinking agent comprises paraformaldehyde.
  • the enzyme comprises trypsin, pepsin or protease (e.g. proteinase K).
  • the cells can be permeabilized using an alkaline condition, or an acidic condition with a protease enzyme.
  • the permeabilization reagent comprises water and/or PBS.
  • the fixed cells can be permeabilized with 70% ethanol for about 30-60 minutes, and the permeabilizing reagent can be exchanged with PBS-T (e.g., PBS with 0.05% Tween-20).
  • the cells can be post-fixed with 3% paraformaldehyde and 0.1% glutaraldehyde for about 30-60 minutes, and washed with PBS-T multiple times.
  • the cellular sample is infused with a swellable polyelectrolyte hydrogel (U.S. patent No. 10,309,879 and Chen 2015 Science 347:543, the contents of these documents are incorporated by reference in their entireties).
  • a fixed and permeabilized cellular sample can be infused with sodium acrylate, acrylamide and a cross-linker N,N’-methylenebisacrylamide.
  • ammonium persulfate (APS) initiator and tetramethylethylenediamine (TEMED) accelerator can be infused to achieve polymerization.
  • the cellular sample can be infused with proteinase K for proteolysis and incubated in a digestion buffer.
  • the gel inside the cellular sample can be swelled by addition of water.
  • the plurality of RNAs inside the cellular sample can be converted to cDNA.
  • the methods comprise contacting the plurality of RNA inside the fixed and permeabilized cellular sample with (i) a plurality of reverse transcription primers, (ii) a plurality of reverse transcriptase enzymes, and (iii) a plurality of nucleotides, under a condition suitable for conducting a reverse transcription reaction to generate a plurality of cDNA molecules (e.g., a plurality of first strand cDNA molecules) in the cellular sample.
  • synthesis of second strand cDNA molecules is omitted.
  • the RNA inside the cellular sample is not converted into cDNA, where the RNA is hybridized to target-specific padlock probes.
  • the reverse transcriptase enzyme exhibits RNA-dependent DNA polymerase activity.
  • the reverse transcriptase enzyme comprises a reverse transcriptase enzyme from AMV (avian myeloblastosis virus), M-MuLV (Moloney murine leukemia virus), or HIV (human immunodeficiency virus).
  • the reverse transcriptase enzyme comprises a recombinant enzyme that exhibits reduced RNase H activity, for example REVERTAID (e.g., from Thermo Fisher Scientific, catalog No. EP0441).
  • the reverse transcriptase can be a commercially-available enzyme, including MULTISCRIBE (e.g., from Thermo Fisher Scientific, catalog # 4311235), THERMOSCRIPT (e.g., from Thermo Fisher Scientific, catalog # 12236-014), or ARRAYSCRIPT (e.g., from Ambion, catalog No. AM2048).
  • the reverse transcriptase enzyme comprises SUPERSCRIPT II (e.g., catalog No. 18064014), SUPERSCRIPT III (e.g., catalog No. 18080044), or SUPERSCRIPT IV enzymes (e.g., catalog No. 18090010) (all SUPERSCRIPT enzymes from Invitrogen).
  • the reverse transcription reaction can include an RNase inhibitor.
  • the reverse transcription primers comprise a single-stranded oligonucleotide comprising DNA, RNA, or chimeric DNA/RNA.
  • the reverse transcription primers comprise any combination of adenine (A), thymine (T), guanine (G), cytosine (C), uracil (U) and/or inosine (I).
  • the reverse transcription primers can be any length, for example 5-25 bases, or 25-50 bases, or 50-75 bases, or 75-100 bases in length or longer.
  • the reverse transcription primers each comprise a 5’ end and 3’ end.
  • the 3’ end of the reverse transcription primers can include a 3’ OH moiety which serves as a nucleotide polymerization initiation site in a polymerase-catalyzed primer extension reaction.
  • the 3’ end of the reverse transcription primers has a chain terminating moiety which blocks a polymerase-catalyzed primer extension reaction. The chain terminating moiety can be removed to convert the 3’ sugar position to an extendible 3’ OH.
  • the reverse transcription primers are modified to confer resistance to nuclease degradation (e.g., ribonuclease degradation).
  • the reverse transcription primers comprise at least one phosphorothioate diester bond at their 5’ ends which can render the reverse transcription primers resistant to nuclease degradation.
  • the reverse transcription primers comprise 2-5 or more consecutive phosphorothioate diester bonds at their 5’ ends.
  • the plurality of reverse transcription primers comprise at least one ribonucleotide and/or at least one 2’-O-methyl, 2’-O-methoxyethyl (MOE), or 2’ fluoro-base nucleotide.
  • the reverse transcription primers comprise phosphorylated 3’ ends. In some embodiments, the reverse transcription primers comprise locked nucleic acid (LNA) bases. In some embodiments, the reverse transcription primers comprise a phosphorylated 5’ end (e.g., using a polynucleotide kinase).
  • the entire length of a reverse transcription primer can hybridize to a portion of an RNA molecule.
  • individual reverse transcription primers comprise a 3’ region having a sequence that hybridizes to a portion of an RNA molecule and a 5’ region that carries a tail that does not hybridize to an RNA molecule.
  • the 5’ tail comprises a universal adaptor sequence including any one or any combination of two or more of a sample barcode sequence, an amplification primer binding site, a sequencing primer binding site, a compaction oligonucleotide binding site and/or a surface capture primer binding site.
  • the 5’ tail comprises a unique identification sequence (e.g., unique molecular index (UMI)).
  • the 5’ tail comprises a restriction enzyme recognition sequence.
  • individual reverse transcription primers comprise at least a portion of the 3’ region having a homopolymer sequence, for example poly-A, poly-T, poly-C, poly-G or poly-U.
  • the reverse transcription primers can hybridize to any portion of an RNA molecule, including the 5’ or the 3’ end of the RNA molecule, or an internal portion of the RNA molecule.
  • the plurality of reverse transcription primers comprises a first sub-population of target-specific reverse transcription primers that hybridize selectively to the first target RNA (e.g., targeted transcriptomics). In some embodiments, the plurality of reverse transcription primers further comprise a second sub-population of target-specific reverse transcription primers that hybridize selectively to the second target RNA. In some embodiments, the target-specific reverse transcription primers comprise a pre-determined sequence at the 3’ region which hybridizes to a target RNA molecule. In some embodiments, the pre-determined sequence portion of the reverse transcription primers can be 4-20 bases, or 20-40 bases, or 40-50 bases in length.
  • the first sub-population of target-specific reverse transcription primers can selectively hybridize to an RNA transcribed in the cellular sample by a housekeeping gene.
  • selection of the housekeeping gene may be dependent upon the type of cellular sample to be used for the in situ methods described herein.
  • Exemplary housekeeping genes include glyceraldehyde-3-phosphate dehydrogenase (GAPDH), beta-actins (ACTB), tubulins, PPIA (peptidyl-prolyl cis-trans isomerase), NME4 (NME/NM23 nucleoside diphosphate kinase 4), SMARCAL1 (SWI/SNF related matrix associated actin dependent regulator of chromatin, subfamily A like 1), and POMK (protein-O-mannose kinase).
  • the second sub-population of target-specific reverse transcription primers can selectively hybridize to an RNA transcribed from a gene that is expressed in the cellular sample being examined (e.g., a cell-specific or tissue-specific RNA).
  • the plurality of reverse transcription primers comprises a first sub-population of random-sequence reverse transcription primers that hybridize to the first target RNA (e.g., whole transcriptomics).
  • the plurality of reverse transcription primers further comprises a second sub-population of random-sequence reverse transcription primers that hybridize to the second target RNA.
  • the reverse transcription primers comprise a random and/or degenerate sequence at the 3’ region which hybridizes to an RNA molecule.
  • the random-sequence or the degenerate-sequence portion of the reverse transcription primers can be 4-20 bases, or 20-40 bases, or 40-50 bases in length.
  • sequencing polymerases can be used for conducting sequencing reactions.
  • the sequencing polymerase(s) is/are capable of binding and incorporating a complementary nucleotide opposite a nucleotide in a concatemer template molecule.
  • the sequencing polymerase(s) is/are capable of binding a complementary nucleotide unit of a multivalent molecule opposite a nucleotide in a concatemer template molecule.
  • the plurality of sequencing polymerases comprise recombinant mutant polymerases.
  • suitable polymerases for use in sequencing with nucleotides and/or multivalent molecules include but are not limited to: Klenow DNA polymerase; Thermus aquaticus DNA polymerase I (Taq polymerase); KlenTaq polymerase; Candidatus altiarchaeales archaeon; Candidatus Hadarchaeum Yellowstonense; Hadesarchaea archaeon; Euryarchaeota archaeon; Thermoplasmata archaeon; Thermococcus polymerases such as Thermococcus litoralis; bacteriophage T7 DNA polymerase; human alpha, delta and epsilon DNA polymerases; bacteriophage polymerases such as T4, RB69 and phi29 bacteriophage DNA polymerases; Pyrococcus furiosus DNA polymerase (Pfu polymerase); Bacillus subtilis DNA polymerase III; E. coli DNA polymerase III alpha and epsilon; 9 degree N polymerase; reverse transcriptases such as HIV type M or O reverse transcriptases; avian myeloblastosis virus reverse transcriptase; Moloney Murine Leukemia Virus (MMLV) reverse transcriptase.
  • DNA polymerases include those from various Archaea genera, such as Aeropyrum, Archaeoglobus, Desulfurococcus, Pyrobaculum, Pyrococcus, Pyrolobus, Pyrodictium, Staphylothermus, Stetteria, Sulfolobus, Thermococcus, and Vulcanisaeta and the like or variants thereof, including such polymerases as are known in the art such as 9 degrees N, VENT, DEEP VENT, THERMINATOR, Pfu, KOD, Pfx, Tgo and RB69 polymerases.
  • the sequencing comprises conducting sequencing-by-binding (SBB) reactions inside the cellular sample, where the cDNA amplicons are the concatemer molecules.
  • the sequencing-by-binding (SBB) procedure employs non-labeled chain-terminating nucleotides.
  • a cycle of sequencing-by-binding comprises the steps of (a) sequentially contacting a primed concatemer (e.g., a concatemer annealed to a plurality of sequencing primers) with at least two separate mixtures under ternary complex stabilizing conditions, wherein the at least two separate mixtures each include a polymerase and a nucleotide, whereby the sequentially contacting results in the primed concatemer being contacted, under the ternary complex stabilizing conditions, with nucleotide cognates for first, second and third base types in the template; (b) examining the at least two separate mixtures to determine whether a ternary complex formed; and (c) identifying the next correct nucleotide for the primed concatemer, wherein the next correct nucleotide is identified as a cognate of the first, second or third base type if a ternary complex is detected in step (b), and wherein the next correct nucleotide is imputed to be
  • any of the sequencing methods described herein can employ at least one nucleotide.
  • the nucleotides comprise a base, sugar and at least one phosphate group.
  • at least one nucleotide in the plurality comprises an aromatic base, a five carbon sugar (e.g., ribose or deoxyribose), and one or more phosphate groups (e.g., 1-10 phosphate groups).
  • the plurality of nucleotides can comprise at least one type of nucleotide selected from a group consisting of dATP, dGTP, dCTP, dTTP and dUTP.
  • the plurality of nucleotides can comprise a mixture of any combination of two or more types of nucleotides selected from a group consisting of dATP, dGTP, dCTP, dTTP and/or dUTP.
  • at least one nucleotide in the plurality is not a nucleotide analog.
  • at least one nucleotide in the plurality comprises a nucleotide analog.
  • At least one nucleotide in the plurality of nucleotides comprises a chain of one, two or three phosphorus atoms where the chain is typically attached to the 5’ carbon of the sugar moiety via an ester or phosphoramide linkage.
  • at least one nucleotide in the plurality is an analog having a phosphorus chain in which the phosphorus atoms are linked together with intervening O, S, NH, methylene or ethylene.
  • the phosphorus atoms in the chain include substituted side groups including O, S or BH3.
  • the chain includes phosphate groups substituted with analogs including phosphoramidate, phosphorothioate, phosphorodithioate, and O-methylphosphoroamidite groups.
  • At least one nucleotide in the plurality of nucleotides comprises a terminator nucleotide analog having a chain terminating moiety (e.g., blocking moiety) at the sugar 2’ position, at the sugar 3’ position, or at the sugar 2’ and 3’ position.
  • the chain terminating moiety can inhibit polymerase-catalyzed incorporation of a subsequent nucleotide unit or free nucleotide in a nascent strand during a primer extension reaction.
  • the chain terminating moiety is attached to the 3’ sugar hydroxyl position where the sugar comprises a ribose or deoxyribose sugar moiety. In some embodiments, the chain terminating moiety is removable/cleavable from the 3’ sugar hydroxyl position to generate a nucleotide having a 3’-OH sugar group which is extendible with a subsequent nucleotide in a polymerase-catalyzed nucleotide incorporation reaction.
  • the chain terminating moiety comprises an alkyl group, alkenyl group, alkynyl group, allyl group, aryl group, benzyl group, azide group, amine group, amide group, keto group, isocyanate group, phosphate group, thio group, disulfide group, carbonate group, urea group, or silyl group.
  • the chain terminating moiety is cleavable/removable from the nucleotide, for example by reacting the chain terminating moiety with a chemical agent, pH change, light or heat.
  • the chain terminating moieties alkyl, alkenyl, alkynyl and allyl are cleavable with tetrakis(triphenylphosphine)palladium(0) (Pd(PPh3)4) with piperidine, or with 2,3-dichloro-5,6-dicyano-1,4-benzoquinone (DDQ).
  • the chain terminating moieties aryl and benzyl are cleavable with H2 Pd/C.
  • the chain terminating moieties amine, amide, keto, isocyanate, phosphate, thio, disulfide are cleavable with phosphine or with a thiol group including beta-mercaptoethanol or dithiothreitol (DTT).
  • the chain terminating moiety carbonate is cleavable with potassium carbonate (K2CO3) in MeOH, with triethylamine in pyridine, or with Zn in acetic acid (AcOH).
  • the chain terminating moieties urea and silyl are cleavable with tetrabutyl ammonium fluoride, pyridine-HF, with ammonium fluoride, or with triethylamine trihydrofluoride.
  • At least one nucleotide in the plurality of nucleotides comprises a terminator nucleotide analog having a chain terminating moiety (e.g., blocking moiety) at the sugar 2’ position, at the sugar 3’ position, or at the sugar 2’ and 3’ position.
  • the chain terminating moiety comprises an azide, azido or azidomethyl group.
  • the chain terminating moiety comprises a 3’-O-azido or 3’-O-azidomethyl group.
  • the chain terminating moieties azide, azido and azidomethyl group are cleavable/removable with a phosphine compound.
  • the phosphine compound comprises a derivatized tri-alkyl phosphine moiety or a derivatized tri-aryl phosphine moiety.
  • the phosphine compound comprises Tris(2-carboxyethyl)phosphine (TCEP) or bis-sulfo triphenyl phosphine (BS-TPP) or Tri(hydroxypropyl)phosphine (THPP).
  • the cleaving agent comprises 4-dimethylaminopyridine (4-DMAP).
  • the nucleotide comprises a chain terminating moiety which is selected from a group consisting of 3’-deoxy nucleotides, 2’,3’-dideoxynucleotides, 3’-methyl, 3’-azido, 3’-azidomethyl, 3’-O-azidoalkyl, 3’-O-ethynyl, 3’-O-aminoalkyl, 3’-O-fluoroalkyl, 3’-fluoromethyl, 3’-difluoromethyl, 3’-trifluoromethyl, 3’-sulfonyl, 3’-malonyl, 3’-amino, 3’-O-amino, 3’-sulfhydryl, 3’-aminomethyl, 3’-ethyl, 3’-butyl, 3’-tert-butyl
  • the plurality of nucleotides comprises a plurality of nucleotides labeled with a detectable reporter moiety.
  • the detectable reporter moiety comprises a fluorophore.
  • the fluorophore is attached to the nucleotide base.
  • the fluorophore is attached to the nucleotide base with a linker which is cleavable/removable from the base.
  • at least one of the nucleotides in the plurality is not labeled with a detectable reporter moiety.
  • a particular detectable reporter moiety (e.g., fluorophore) can correspond to the nucleotide base (e.g., dATP, dGTP, dCTP, dTTP or dUTP) to permit detection and identification of the nucleotide base.
  • the cleavable linker on the nucleotide base comprises a cleavable moiety comprising an alkyl group, alkenyl group, alkynyl group, allyl group, aryl group, benzyl group, azide group, amine group, amide group, keto group, isocyanate group, phosphate group, thio group, disulfide group, carbonate group, urea group, or silyl group.
  • the cleavable linker on the base is cleavable/removable from the base by reacting the cleavable moiety with a chemical agent, pH change, light or heat.
  • the cleavable moieties alkyl, alkenyl, alkynyl and allyl are cleavable with tetrakis(triphenylphosphine)palladium(0) (Pd(PPh3)4) with piperidine, or with 2,3-dichloro-5,6-dicyano-1,4-benzoquinone (DDQ).
  • the cleavable moieties aryl and benzyl are cleavable with H2 Pd/C.
  • the cleavable moieties amine, amide, keto, isocyanate, phosphate, thio, disulfide are cleavable with phosphine or with a thiol group including beta-mercaptoethanol or dithiothreitol (DTT).
  • the cleavable moiety carbonate is cleavable with potassium carbonate (K2CO3) in MeOH, with triethylamine in pyridine, or with Zn in acetic acid (AcOH).
  • the cleavable moieties urea and silyl are cleavable with tetrabutylammonium fluoride, pyridine-HF, with ammonium fluoride, or with triethylamine trihydrofluoride.
  • the cleavable linker on the nucleotide base comprises a cleavable moiety including an azide, azido or azidomethyl group.
  • the cleavable moieties azide, azido and azidomethyl group are cleavable/removable with a phosphine compound.
  • the phosphine compound comprises a derivatized tri-alkyl phosphine moiety or a derivatized tri-aryl phosphine moiety.
  • the phosphine compound comprises Tris(2-carboxyethyl)phosphine (TCEP) or bis-sulfo triphenyl phosphine (BS-TPP) or Tri(hydroxypropyl)phosphine (THPP).
  • the cleaving agent comprises 4-dimethylaminopyridine (4-DMAP).
  • the chain terminating moiety (e.g., at the sugar 2’ and/or sugar 3’ position) and the cleavable linker on the nucleotide base have the same or different cleavable moieties.
  • the chain terminating moiety (e.g., at the sugar 2’ and/or sugar 3’ position) and the detectable reporter moiety linked to the base are chemically cleavable/removable with the same chemical agent.
  • the chain terminating moiety (e.g., at the sugar 2’ and/or sugar 3’ position) and the detectable reporter moiety linked to the base are chemically cleavable/removable with different chemical agents.
  • the sequencing employs at least one multivalent molecule which comprises a plurality of nucleotide arms attached to a core and having any configuration including a starburst, helter skelter, or bottle brush configuration (e.g., FIG. 16).
  • the multivalent molecule comprises: (1) a core; and (2) a plurality of nucleotide arms which comprise (i) a core attachment moiety, (ii) a spacer comprising a PEG moiety, (iii) a linker, and (iv) a nucleotide unit, wherein the core is attached to the plurality of nucleotide arms, wherein the spacer is attached to the linker, wherein the linker is attached to the nucleotide unit.
  • the nucleotide unit comprises a base, sugar and at least one phosphate group, and the linker is attached to the nucleotide unit through the base.
  • the linker comprises an aliphatic chain or an oligo ethylene glycol chain, where both linker chains have 2-6 subunits. In some embodiments, the linker also includes an aromatic moiety.
  • An exemplary nucleotide arm is shown in FIG. 20. Exemplary multivalent molecules are shown in FIGS. 16-19. An exemplary spacer is shown in FIG. 21 (top) and exemplary linkers are shown in FIG. 21 (bottom) and FIG. 22. Exemplary nucleotides attached to a linker are shown in FIGS. 23A-23D. An exemplary biotinylated nucleotide arm is shown in FIG. 24.
  • a multivalent molecule comprises a core attached to multiple nucleotide arms, and wherein the multiple nucleotide arms have the same type of nucleotide unit which is selected from a group consisting of dATP, dGTP, dCTP, dTTP and dUTP.
  • a multivalent molecule comprises a core attached to multiple nucleotide arms, where each arm includes a nucleotide unit.
  • the nucleotide unit comprises an aromatic base, a five carbon sugar (e.g., ribose or deoxyribose), and one or more phosphate groups (e.g., 1-10 phosphate groups).
  • the plurality of multivalent molecules can comprise one type of multivalent molecule having one type of nucleotide unit selected from a group consisting of dATP, dGTP, dCTP, dTTP and dUTP.
  • the plurality of multivalent molecules can comprise a mixture of any combination of two or more types of multivalent molecules, where individual multivalent molecules in the mixture comprise nucleotide units selected from a group consisting of dATP, dGTP, dCTP, dTTP and/or dUTP.
  • the nucleotide unit comprises a chain of one, two or three phosphorus atoms where the chain is typically attached to the 5’ carbon of the sugar moiety via an ester or phosphoramide linkage.
  • at least one nucleotide unit is a nucleotide analog having a phosphorus chain in which the phosphorus atoms are linked together with intervening O, S, NH, methylene or ethylene.
  • the phosphorus atoms in the chain include substituted side groups including O, S or BH3.
  • the chain includes phosphate groups substituted with analogs including phosphoramidate, phosphorothioate, phosphorodithioate, and O-methylphosphoroamidite groups.
  • the multivalent molecule comprises a core attached to multiple nucleotide arms, and wherein individual nucleotide arms comprise a nucleotide unit which is a nucleotide analog having a chain terminating moiety (e.g., blocking moiety) at the sugar 2’ position, at the sugar 3’ position, or at the sugar 2’ and 3’ position.
  • the nucleotide unit comprises a chain terminating moiety (e.g., blocking moiety) at the sugar 2’ position, at the sugar 3’ position, or at the sugar 2’ and 3’ position.
  • the chain terminating moiety can inhibit polymerase-catalyzed incorporation of a subsequent nucleotide unit or free nucleotide in a nascent strand during a primer extension reaction.
  • the chain terminating moiety is attached to the 3’ sugar hydroxyl position where the sugar comprises a ribose or deoxyribose sugar moiety.
  • the chain terminating moiety is removable/cleavable from the 3’ sugar hydroxyl position to generate a nucleotide having a 3’-OH sugar group which is extendible with a subsequent nucleotide in a polymerase-catalyzed nucleotide incorporation reaction.
  • the chain terminating moiety comprises an alkyl group, alkenyl group, alkynyl group, allyl group, aryl group, benzyl group, azide group, amine group, amide group, keto group, isocyanate group, phosphate group, thio group, disulfide group, carbonate group, urea group, or silyl group.
  • the chain terminating moiety is cleavable/removable from the nucleotide unit, for example by reacting the chain terminating moiety with a chemical agent, pH change, light or heat.
  • the chain terminating moieties alkyl, alkenyl, alkynyl and allyl are cleavable with tetrakis(triphenylphosphine)palladium(0) (Pd(PPh3)4) with piperidine, or with 2,3-dichloro-5,6-dicyano-1,4-benzoquinone (DDQ).
  • the chain terminating moieties aryl and benzyl are cleavable with H2 Pd/C.
  • the chain terminating moieties amine, amide, keto, isocyanate, phosphate, thio, disulfide are cleavable with phosphine or with a thiol group including beta-mercaptoethanol or dithiothreitol (DTT).
  • the chain terminating moiety carbonate is cleavable with potassium carbonate (K2CO3) in MeOH, with triethylamine in pyridine, or with Zn in acetic acid (AcOH).
  • the chain terminating moieties urea and silyl are cleavable with tetrabutyl ammonium fluoride, pyridine-HF, with ammonium fluoride, or with triethylamine trihydrofluoride.
  • the nucleotide unit comprises a chain terminating moiety (e.g., blocking moiety) at the sugar 2’ position, at the sugar 3’ position, or at the sugar 2’ and 3’ position.
  • the chain terminating moiety comprises an azide, azido or azidomethyl group.
  • the chain terminating moiety comprises a 3’-O-azido or 3’-O-azidomethyl group.
  • the chain terminating moieties azide, azido and azidomethyl group are cleavable/removable with a phosphine compound.
  • the phosphine compound comprises a derivatized tri-alkyl phosphine moiety or a derivatized tri-aryl phosphine moiety.
  • the phosphine compound comprises Tris(2-carboxyethyl)phosphine (TCEP) or bis-sulfo triphenyl phosphine (BS-TPP) or Tri(hydroxypropyl)phosphine (THPP).
  • the cleaving agent comprises 4-dimethylaminopyridine (4-DMAP).
  • the nucleotide unit comprises a chain terminating moiety which is selected from a group consisting of 3’-deoxy nucleotides, 2’,3’-dideoxynucleotides, 3’-methyl, 3’-azido, 3’-azidomethyl, 3’-O-azidoalkyl, 3’-O-ethynyl, 3’-O-aminoalkyl, 3’-O-fluoroalkyl, 3’-fluoromethyl, 3’-difluoromethyl, 3’-trifluoromethyl, 3’-sulfonyl, 3’-malonyl, 3’-amino, 3’-O-amino, 3’-sulfhydryl, 3’-aminomethyl, 3’-ethyl, 3’-butyl, 3’-tert-butyl, 3’-fluorenylmethyloxycarbonyl,
  • the multivalent molecule comprises a core attached to multiple nucleotide arms, wherein the nucleotide arms comprise a spacer, linker and nucleotide unit, and wherein the core, linker and/or nucleotide unit is labeled with a detectable reporter moiety.
  • the detectable reporter moiety comprises a fluorophore.
  • a particular detectable reporter moiety (e.g., fluorophore) that is attached to the multivalent molecule can correspond to the base (e.g., dATP, dGTP, dCTP, dTTP or dUTP) of the nucleotide unit to permit detection and identification of the nucleotide base.
  • At least one nucleotide arm of a multivalent molecule has a nucleotide unit that is attached to a detectable reporter moiety.
  • the detectable reporter moiety is attached to the nucleotide base.
  • the detectable reporter moiety comprises a fluorophore.
  • a particular detectable reporter moiety (e.g., fluorophore) that is attached to the multivalent molecule can correspond to the base (e.g., dATP, dGTP, dCTP, dTTP or dUTP) of the nucleotide unit to permit detection and identification of the nucleotide base.
  • the core of a multivalent molecule comprises an avidin-like or streptavidin-like moiety and the core attachment moiety comprises biotin.
  • the core comprises a streptavidin-type or avidin-type moiety which includes an avidin protein, as well as any derivatives, analogs and other non-native forms of avidin that can bind to at least one biotin moiety.
  • Other forms of avidin moieties include native and recombinant avidin and streptavidin as well as derivatized molecules, e.g., non-glycosylated avidin and truncated streptavidins.
  • avidin moiety includes de-glycosylated forms of avidin, bacterial streptavidin produced by Streptomyces (e.g., Streptomyces avidinii), as well as derivatized forms, for example, N-acyl avidins, e.g., N-acetyl, N-phthalyl and N-succinyl avidin, and the commercially available products EXTRAVIDIN, CAPTAVIDIN, NEUTRAVIDIN and NEUTRALITE AVIDIN.
  • any of the methods for sequencing nucleic acid molecules described herein can include forming a binding complex, where the binding complex comprises (i) a polymerase, a nucleic acid concatemer molecule duplexed with a primer, and a nucleotide, or the binding complex comprises (ii) a polymerase, a nucleic acid concatemer molecule duplexed with a primer, and a nucleotide unit of a multivalent molecule.
  • the binding complex has a persistence time of greater than about 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9 or 1 second.
  • the binding complex has a persistence time of greater than about 0.1-0.25 seconds, or about 0.25-0.5 seconds, or about 0.5-0.75 seconds, or about 0.75-1 second, or about 1-2 seconds, or about 2-3 seconds, or about 3-4 seconds, or about 4-5 seconds, and/or wherein the method is or may be carried out at a temperature of at or above 15 °C, at or above 20 °C, at or above 25 °C, at or above 35 °C, at or above 37 °C, at or above 42 °C, at or above 55 °C, at or above 60 °C, or at or above 72 °C, or at or above 80 °C, or within a range defined by any of the foregoing.
  • the binding complex (e.g., ternary complex) remains stable until subjected to a condition that causes dissociation of interactions between any of the polymerase, template molecule, primer and/or the nucleotide unit or the nucleotide.
  • a dissociating condition comprises contacting the binding complex with any one or any combination of a detergent, EDTA and/or water.
  • the present disclosure provides said method wherein the binding complex is deposited on, attached to, or hybridized to, a surface showing a contrast to noise ratio in the detecting step of greater than 20.
  • the present disclosure provides said method wherein the contacting is performed under a condition that stabilizes the binding complex when the nucleotide or nucleotide unit is complementary to a next base of the template nucleic acid, and destabilizes the binding complex when the nucleotide or nucleotide unit is not complementary to the next base of the template nucleic acid.
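
The contrast-to-noise ratio (CNR) threshold of greater than 20 mentioned above can be illustrated with a minimal sketch. The text does not define how CNR is computed, so the formula used here (mean signal minus mean background, divided by the standard deviation of the background) is an assumption based on a commonly used definition. The function names and pixel values are hypothetical and for illustration only.

```python
import statistics


def contrast_to_noise_ratio(signal_pixels, background_pixels):
    """Return CNR for two collections of pixel intensities.

    Assumed definition (not taken from the source text):
        CNR = (mean(signal) - mean(background)) / stdev(background)
    """
    mean_signal = statistics.fmean(signal_pixels)
    mean_background = statistics.fmean(background_pixels)
    # Sample standard deviation of the background is used as the noise term.
    noise = statistics.stdev(background_pixels)
    if noise == 0:
        raise ValueError("background has zero variance; CNR is undefined")
    return (mean_signal - mean_background) / noise


def meets_cnr_threshold(signal_pixels, background_pixels, threshold=20.0):
    """Check whether the CNR is strictly greater than the threshold."""
    return contrast_to_noise_ratio(signal_pixels, background_pixels) > threshold


# Hypothetical intensities for a bright spot and its surrounding background.
bright_spot = [1000, 1020, 980, 1010]
dim_spot = [105, 106]
background = [100, 102, 98, 100]

print(meets_cnr_threshold(bright_spot, background))
print(meets_cnr_threshold(dim_spot, background))
```

Other noise estimates (for example, combining signal and background variance) would give different values, so the threshold is only meaningful relative to the definition actually used.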

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Computing Systems (AREA)
  • Software Systems (AREA)
  • Artificial Intelligence (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • Biophysics (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Medical Informatics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Multimedia (AREA)
  • Biotechnology (AREA)
  • Evolutionary Biology (AREA)
  • Analytical Chemistry (AREA)
  • Proteomics, Peptides & Aminoacids (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Databases & Information Systems (AREA)
  • Chemical & Material Sciences (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Nuclear Medicine, Radiotherapy & Molecular Imaging (AREA)
  • Radiology & Medical Imaging (AREA)
  • Quality & Reliability (AREA)
  • Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)
  • Investigating Or Analysing Biological Materials (AREA)

Abstract

The present disclosure relates to sequencing systems and sequencing methods for training neural networks and for using the trained neural networks for sequencing analysis after flow cell images have been obtained with the sequencing systems. The sequencing systems of the disclosure can comprise field-programmable gate arrays (FPGAs), artificial intelligence (AI) chips, or a combination thereof.
PCT/US2025/014022 2024-02-02 2025-01-31 Three-dimensional base calling in next generation sequencing analysis Pending WO2025166157A1 (fr)

Applications Claiming Priority (12)

Application Number Priority Date Filing Date Title
US202463549327P 2024-02-02 2024-02-02
US202463549333P 2024-02-02 2024-02-02
US63/549,333 2024-02-02
US63/549,327 2024-02-02
US202463570038P 2024-03-26 2024-03-26
US63/570,038 2024-03-26
US202463661332P 2024-06-18 2024-06-18
US63/661,332 2024-06-18
US202463724712P 2024-11-25 2024-11-25
US63/724,712 2024-11-25
US202463736743P 2024-12-20 2024-12-20
US63/736,743 2024-12-20

Publications (1)

Publication Number Publication Date
WO2025166157A1 (fr) 2025-08-07

Family

ID=96591469

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2025/014022 Pending WO2025166157A1 (fr) 2024-02-02 2025-01-31 Three-dimensional base calling in next generation sequencing analysis

Country Status (1)

Country Link
WO (1) WO2025166157A1 (fr)

Similar Documents

Publication Publication Date Title
US20230326065A1 (en) Primary analysis in next generation sequencing
AU2022407175A1 (en) Primary analysis in next generation sequencing
US20250285709A1 (en) Phasing and prephasing correction of base calling in next generation sequencing
US20250209617A1 (en) Image registration in primary analysis
US20250349138A1 (en) Three-dimensional base calling in next generation sequencing analysis
US12469162B2 (en) Primary analysis in next generation sequencing
EP4590856A2 (fr) Increasing sequencing throughput in next generation sequencing of three-dimensional samples
US20250232578A1 (en) Quality measurement of base calling in next generation sequencing
US20250363596A1 (en) Color correction of flow cell images
WO2024081805A1 (fr) Separating sequencing data in parallel with a sequencing run in next generation sequencing data analysis
AU2023282904A1 (en) Adapter trimming and determination in next generation sequencing data analysis
EP4445327A1 (fr) Primary analysis in next generation sequencing
WO2025166157A1 (fr) Three-dimensional base calling in next generation sequencing analysis

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 25749461

Country of ref document: EP

Kind code of ref document: A1