
WO2025166157A1 - Three-dimensional base calling in next generation sequencing analysis - Google Patents

Three-dimensional base calling in next generation sequencing analysis

Info

Publication number
WO2025166157A1
Authority
WO
WIPO (PCT)
Prior art keywords
flow cell
computer
cell images
implemented method
neural network
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
PCT/US2025/014022
Other languages
French (fr)
Inventor
Haosen WANG
Ryan Kelley
Connor THOMPSON
Minghao GUO
Weston DAMRON
Christopher Brown
Michael Previte
Eric KOFMAN
Amirali Kia
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Element Biosciences Inc
Original Assignee
Element Biosciences Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Element Biosciences Inc
Publication of WO2025166157A1
Legal status (current): Pending
Anticipated expiration


Classifications

    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/0002 Inspection of images, e.g. flaw detection
    • G06T7/0012 Biomedical image inspection
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/084 Backpropagation, e.g. using gradient descent
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/09 Supervised learning
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/60 Type of objects
    • G06V20/69 Microscopic objects, e.g. biological cells or cellular parts
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B30/00 ICT specially adapted for sequence analysis involving nucleotides or amino acids
    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01N INVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
    • G01N15/00 Investigating characteristics of particles; Investigating permeability, pore-volume or surface-area of porous materials
    • G01N15/10 Investigating individual particles
    • G01N15/14 Optical investigation techniques, e.g. flow cytometry
    • G01N15/1429 Signal processing
    • G01N15/1433 Signal processing using image recognition
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/10 Image acquisition modality
    • G06T2207/10024 Color image
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/10 Image acquisition modality
    • G06T2207/10064 Fluorescence image
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/20 Special algorithmic details
    • G06T2207/20081 Training; Learning
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/20 Special algorithmic details
    • G06T2207/20084 Artificial neural networks [ANN]
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/30 Subject of image; Context of image processing
    • G06T2207/30004 Biomedical image processing
    • G06T2207/30024 Cell structures in vitro; Tissue sections in vitro
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B40/00 ICT specially adapted for biostatistics; ICT specially adapted for bioinformatics-related machine learning or data mining, e.g. knowledge discovery or pattern finding
    • G16B40/10 Signal processing, e.g. from mass spectrometry [MS] or from PCR

Definitions

  • Embodiments of this disclosure relate generally to image processing and base calling in sequencing data analysis, and particularly to three-dimensional (3D) images of in situ samples.
  • next-generation sequencing or NGS-like applications such as sequencing by synthesis, sequencing by binding, or sequencing by avidity
  • NGS next-generation sequencing
  • in sequencing by synthesis, a new strand is synthesized one nucleotide base at a time; in each cycle, one base attaches to any given strand and image(s) are recorded.
  • a base-calling algorithm is applied to the image(s) to “read” the successive signals from each cluster or polony and convert the optical signals into an identification of the nucleotide base sequence added to each DNA fragment.
  • Traditional sequencing data analysis relies on two-dimensional (2D) flow cell images.
  • flow cell images at a selected z level can include signals from out-of-focus polonies located at adjacent z levels and other undesired signals, e.g., from the cell membrane.
  • 3D three-dimensional
  • the image processing methods herein may function to reverse the imaging process of an optical system and virtually improve the full width at half maximum (FWHM) of the optical system.
  • the image processing methods disclosed herein may advantageously increase the detectable density of polonies or clusters in 3D samples or traditional 2D samples.
  • the methods herein may advantageously lessen the impact of color mixing of polonies that may be caused by neighboring polonies in 2D or 3D by computationally increasing the spatial resolution of the flow cell images.
  • a neural network, e.g., a convolutional neural network
  • a convolutional neural network is used in generating a high-resolution z-stack of flow cell images of the 3D sample from the low-resolution z-stack that has been acquired from the sequencing system, and subsequent primary analysis can be performed based on the high-resolution flow cell images instead of the low-resolution flow cell images.
  • the neural network, e.g., a convolutional neural network, is used in image processing of the high-resolution z-stacks of flow cell images of the samples to generate the base calls.
  • Embodiments of these aspects include corresponding computer systems, apparatus, and computer program products recorded on computer storage device(s), which, alone or in combination, are configured to perform the operations of the methods.
  • the computer system has installed on it software, firmware, hardware, or their combinations that in operation cause the computer system to perform the operations or actions.
  • the computer program product includes instructions that, when executed by a hardware processor, cause the hardware processor to perform the operations or actions.
  • FIG. 1 illustrates a block diagram of a sequencing system for performing sequencing, flow cell image processing, and/or primary analysis operations including base calling using flow cell images, according to some embodiments.
  • FIGS. 2A-2C show an exemplary simulated flow cell image (FIG. 2A, of an in situ cell sample) and two different images (FIGS. 2B-2C) predicted using the systems and methods herein and corresponding to the image in FIG. 2A, according to some embodiments.
  • the predicted images are at different z levels.
  • FIGS. 2D-2E show exemplary simulated flow cell images in the reference set.
  • the simulated flow cell images are generated using the methods herein with the first (FIG. 2D) and second (FIG. 2E) resolutions, according to some embodiments.
  • FIGS. 3A-3D show two exemplary flow cell images (FIGS. 3A and 3D) with multiple cells at two different z levels, and two different predicted images at different z levels (FIGS. 3B-3C) generated from the image in FIG. 3A using the systems and methods herein according to some embodiments.
  • FIGS. 3E-3F show improved detection of targets per cell in the same imaging area (FIG. 3E) and fewer false positives (FIG. 3F) using the methods herein when compared with non-artificial-intelligence-based methods; in this case, the targets are polonies or clusters within the cells.
  • FIG. 3G shows improved detection of targets in simulated flow cell images of sample(s) using the neural network herein, which produces a higher R² value than a traditional method.
  • FIG. 4 illustrates a block diagram of a computer system for performing image processing, sequencing analysis, training of neural network(s), predicting base calls, image intensities, high resolution images, and/or classifications using the pre-trained neural networks, and/or base calling, according to some embodiments.
  • FIG. 5A is a flow chart of an exemplary method of predicting 3D flow cell images of sequencing sample(s) and performing base calling using the 3D flow cell images, according to some embodiments.
  • FIG. 5B is a flow chart of an exemplary method of training a neural network that can be used to predict higher resolution flow cell images of sequencing sample(s), according to some embodiments.
  • FIG. 5C is a schematic of an exemplary embodiment of the first reconfigurable logic device, the integrated circuit, and their connection(s) to the processor of the sequencing system.
  • FIG. 5D is a schematic of an exemplary embodiment of using the first reconfigurable logic device and the integrated circuit in parallel with a sequencing run in progress within a predetermined time window.
  • FIG. 5E is a flow chart of an exemplary method of training a neural network, thereby generating a pre-trained neural network that can be used to predict higher resolution flow cell images of sequencing sample(s), base calls, intensities, and/or classifications, according to some embodiments.
  • FIG. 5F shows scatter plots for an exemplary embodiment of generating reference intensities from high resolution training flow cell images.
  • FIG. 6 is a schematic showing exemplary embodiments of padlock probes.
  • FIG. 7 is a schematic showing a workflow for generating circularized padlock probes inside a cell, comprising generating first and second cDNAs from first and second target RNA molecules (respectively), hybridizing first and second padlock probes to the first and second cDNA molecules (respectively) to generate first and second circularized padlock probes (respectively).
  • FIG. 8 is a schematic showing a rolling circle and sequencing workflow inside a cell, comprising generating first and second concatemers by conducting rolling circle amplification using first and second covalently closed circular molecules (respectively). The first and second concatemers are subjected to a sequencing workflow using universal sequencing primers, sequencing polymerases, and a plurality of nucleotide reagents.
  • FIG. 9 is a schematic showing an exemplary workflow for sequencing a concatemer that is generated inside the cell.
  • FIG. 10 is a schematic showing an exemplary workflow for sequencing a concatemer that is generated inside the cell.
  • FIG. 11 is a schematic showing an exemplary workflow for sequencing a concatemer that is generated inside the cell.
  • FIG. 12 is a schematic showing an exemplary workflow for sequencing a concatemer that is generated inside the cell.
  • FIG. 13 is a schematic showing a workflow for generating circularized padlock probes, comprising generating first and second cDNAs from first and second target RNA molecules (respectively), hybridizing first and second padlock probes to the first and second cDNA molecules (respectively) to generate first and second circularized padlock probes (respectively).
  • FIG. 14 is a schematic showing a rolling circle and sequencing workflow comprising generating first and second concatemers by conducting rolling circle amplification using first and second covalently closed circular molecules (respectively).
  • FIG. 15 is a schematic of an exemplary low binding support comprising a glass substrate and alternating layers of hydrophilic coatings which are covalently or non-covalently adhered to the glass, and which further comprises chemically-reactive functional groups that serve as attachment sites for oligonucleotide primers (e.g., capture oligonucleotides).
  • FIG. 16 is a schematic of various exemplary configurations of multivalent molecules.
  • Left (Class I) schematics of multivalent molecules having a “starburst” or “helter-skelter” configuration.
  • Center (Class II) a schematic of a multivalent molecule having a dendrimer configuration.
  • Right (Class III) a schematic of multiple multivalent molecules formed by reacting streptavidin with 4-arm or 8-arm PEG-NHS with biotin and dNTPs. Nucleotide units are designated ‘N’, biotin is designated ‘B’, and streptavidin is designated ‘SA’.
  • FIG. 17 is a schematic of an exemplary multivalent molecule comprising a generic core attached to a plurality of nucleotide-arms.
  • FIG. 18 is a schematic of an exemplary multivalent molecule comprising a dendrimer core attached to a plurality of nucleotide-arms.
  • FIG. 19 shows a schematic of an exemplary multivalent molecule comprising a core attached to a plurality of nucleotide-arms, where the nucleotide arms comprise biotin, spacer, linker and a nucleotide unit.
  • FIG. 20 is a schematic of an exemplary nucleotide-arm comprising a core attachment moiety, spacer, linker and nucleotide unit.
  • FIG. 21 shows the chemical structure of an exemplary spacer (top), and the chemical structures of various exemplary linkers, including an 11-atom Linker, 16-atom Linker, 23-atom Linker and an N3 Linker (bottom).
  • FIG. 22 shows the chemical structures of various exemplary linkers, including Linkers 1-9.
  • FIG. 23A shows the chemical structures of various exemplary linkers joined/attached to nucleotide units.
  • FIG. 23B shows the chemical structures of various exemplary linkers joined/attached to nucleotide units.
  • FIG. 23C shows the chemical structures of various exemplary linkers joined/attached to nucleotide units.
  • FIG. 23D shows the chemical structures of various exemplary linkers joined/attached to nucleotide units.
  • FIG. 24 shows the chemical structure of an exemplary biotinylated nucleotide-arm.
  • FIG. 25 is a schematic of a guanine tetrad (i.e., a G-tetrad).
  • FIG. 26 is a schematic of an exemplary intramolecular G-quadruplex structure.
  • FIG. 27 shows an exemplary support with multiple tiles for immobilizing 2D or 3D sample(s) thereon for sequencing, including the cellular sample(s), according to some aspects.
  • FIG. 28 shows a flow chart of an exemplary method of predicting base calls of the flow cell images (e.g., of in situ samples) using the neural network disclosed herein, according to some embodiments.
  • FIG. 29 shows a flow chart of an exemplary method of training the neural network that can be used to predict base calls or high resolution flow cell images, according to some embodiments.
  • FIGS. 30A-30B show a flow cell image (FIG. 30A) and its high resolution image predicted using the neural network that is pre-trained using reference base calls.
  • base calls are determined from the high resolution image using non-neural network based algorithm(s).
  • FIG. 31 shows a block diagram of an exemplary method of training the neural network(s) and an exemplary method of predicting high resolution flow cell images and/or predicting base calls using such pretrained neural network(s).
  • system, apparatus, method, and/or computer program product embodiments, and/or combinations and sub-combinations thereof, which enable image processing of flow cell images, e.g., flow cell images obtained from in situ samples or traditional 2D samples in a sequencing run, to: 1) generate images with improved spatial resolution and improved detectable density of polonies or clusters, which may be used for subsequent sequencing analysis, including but not limited to base calling; or 2) predict intensities, base call(s), or classifications of polonies or clusters.
  • the techniques herein can be used while a sequencing run is still in progress to improve efficiency of sequencing and sequencing analysis, reduce data storage required during sequencing and sequencing analysis, and improve accuracy and reliability of sequencing analysis.
  • the techniques herein can be used on flow cell images obtained using various imaging and/or sequencing techniques of volumetric 3D samples and/or traditional 2D samples and/or obtained using various sequencing systems, e.g., next generation sequencing (NGS) systems.
  • the techniques disclosed herein are useful for base calling in NGS, and NGS flow cell images will be used as the primary example herein for describing the application of these techniques.
  • image analysis techniques may also be useful in other applications where spot-detection and/or CCD imaging is used.
  • the techniques herein can be used for processing flow cell images (e.g., 2D or 3D) to generate accurate and reliable image intensities for polonies or clusters with improved spatial resolution, and thus an improved maximum polony or cluster density detected in the sample(s), for accurate and reliable sequencing analysis.
  • the technologies disclosed herein may advantageously function to reverse the imaging process of an optical system and virtually improve the full width at half maximum (FWHM) of the imager so that the density of polony locations is not limited by the optical design of the sequencing systems.
  • FWHM full width at half maximum
  • the disclosed technologies herein may advantageously increase the detected density of polonies, e.g., by 2x, 4x, 8x, 16x, 27x, 40x, 50x, 100x, or more relative to the polony density detectable using traditional optical systems and image processing methods.
  • the disclosed technologies herein may advantageously increase the spatial resolution of flow cell images in each of the one or more spatial dimensions by 2x, 4x, 8x, 16x, 27x, 40x, 50x, 100x, or more relative to flow cell images acquired using traditional optical systems and/or image processing methods.
  • the methods herein may also advantageously lessen the impact of color mixing of polonies that may be caused by neighboring polonies or clusters by computationally increasing the spatial resolution of flow cell images.
  • In situ samples such as cells or tissue can have a thickness along the axial or z direction that cannot remain in-focus within a single 2D image.
  • a z-stack of multiple 2D flow cell images may be acquired to cover clusters or polonies at different z levels, e.g., in a 3D cellular sample. Interferences may occur in the z-stack of flow cell images, such as out-of-focus polonies and background signal from cellular components.
  • a polony located at a first z level can appear in focus in a first flow cell image taken at that z level, and it may also generate a blob of signal in a second 2D flow cell image taken at an adjacent z level, where it is out of focus.
  • the blob of signal may interfere with intensities of polonies at or near the same x-y location in the second flow cell image, thus deteriorating the accuracy and reliability of base calls.
  • color mixing from neighboring polonies may interfere with polony intensity or polony density that can be detected for subsequent base calling.
  • the techniques disclosed herein advantageously train a neural network to efficiently and accurately predict polony or cluster locations in the sample(s).
  • the techniques disclosed herein advantageously train a neural network to efficiently and accurately predict high resolution intensities, base calls, and/or classifications for polonies or clusters in the sample(s).
  • the samples herein are not limited to 3D samples, e.g., in situ cells and/or tissue.
  • the samples herein may also include traditional 2D samples.
  • the techniques disclosed herein may advantageously utilize the reconfigurable logic device, e.g., FPGAs, and other integrated circuits, e.g., AI chips or neural processing units (NPUs), to: 1) predict high-resolution polony or cluster locations based on low-resolution flow cell images; or 2) predict intensities, base calls, and/or classifications at the high resolution for the polonies or clusters in the sample(s).
  • the utilization of the reconfigurable logic device, e.g., FPGAs, and other integrated circuits, e.g., AI chips or neural processing units (NPUs), on-board the sequencing system may advantageously reduce computational time, reduce energy consumption, improve sequencing analysis efficiency, reduce data storage space required, and reduce sequencing system cost in analysis of flow cell images when compared with sequencing analysis using existing sequencing systems.
  • FPGAs field-programmable gate arrays
  • NPUs neural processing units
  • the techniques disclosed herein advantageously train a neural network based on a loss function that is determined by comparison to reference base calls as ground truth, while the trained neural network may be used to accurately and reliably predict high resolution post image-processing flow cell images based on the flow cell images that are acquired from the sample(s).
  • the techniques disclosed herein advantageously allow a mismatch in the training outputs and the prediction outputs.
  • the neural network may be trained by generating training base calls as training outputs and comparing the training outputs to reference base calls as ground truth. The trained neural network may then be used to predict high resolution flow cell images or to predict base calls.
  • Such mismatching in training and prediction outputs may advantageously allow reference base calls to be considered in training the parameters of the neural network, and allow prediction of higher-resolution, higher-quality versions of the flow cell images that can be used to improve base calling accuracy and reliability.
  • Such training and prediction advantageously enable utilization of a simplified neural network, which imposes a lower computational burden and reduces computational time, power consumption, and the time needed to make predictions.
  • the techniques disclosed herein may advantageously utilize the reconfigurable logic device, e.g., FPGAs, and other integrated circuits, e.g., AI chips or neural processing units (NPUs), to perform one or more operations in the training and/or the prediction.
  • Primary analysis can include some or all of the operations and/or steps needed to perform base calling and compute quality scores of the base calls.
  • Primary analysis can involve the formation of a template image for at least part of the flow cell.
  • the template image can include the estimated locations of all detected clusters or polonies in a common coordinate system.
  • the template image can include a polony map that is 2D or 3D.
  • Template images are generated by identifying cluster or polony locations in all images in the first cycle or the first few cycles of the sequencing process. Generation of the template image may need sufficient spatial resolution to differentiate the polonies from background features, neighboring polonies, and/or duplicate polonies that are out-of-focus.
  • FIG. 1 illustrates a block diagram of a computer-implemented system 100, according to one or more embodiments disclosed herein.
  • the system 100 has a sequencing system 110 that includes a flow cell 112, a sequencer 114, an imager 116, data storage 122, and user interface 124.
  • the sequencing system 110 may be connected to a cloud 130.
  • the sequencing system 110 may include one or more of dedicated processors 118, a first reconfigurable logic device, e.g., Field-Programmable Gate Array(s) (FPGAs) 120, and a computing system 126.
  • the flow cell 112 is configured to capture DNA fragments and form DNA sequences for base-calling on the flow cell.
  • the flow cell 112 can include a support as disclosed herein.
  • the support can be a solid support.
  • the support can include a surface coating thereon as disclosed herein.
  • the surface coating can be a polymer coating as disclosed herein.
  • a flow cell 112 can include multiple tiles or imaging areas thereon, and each tile may be separated into a grid of subtiles.
  • Each subtile can include a plurality of clusters or polonies immobilized thereon.
  • a flow cell can have 424 tiles, and each tile can be divided into a 6 x 9 grid, giving 54 subtiles per tile.
  • the flow cell image as disclosed herein can be an image including signals of a plurality of clusters or polonies.
  • the flow cell image can include one or more tiles of signals or one or more subtiles of signals.
  • a flow cell image can be an image that includes all the tiles and approximately all signals thereon.
  • each tile may include millions of polonies or clusters.
  • a tile can include about 1 to 10 million clusters or polonies.
  • Each polony can be a collection of many copies of DNA fragments.
  • the flow cell images may be acquired using the imager 116 at single or multiple z levels along a z axis orthogonal to the image plane of the flow cell images.
  • the flow cell images can include multiple z levels in order to cover the whole sample(s) in 3D.
  • the z axis can extend from the objective lens of the imager 116 disclosed herein to the support, e.g., flow cell 112.
  • the z axis can be orthogonal to the image plane of the flow cell images.
  • Each z level of flow cell images may be separated from the adjacent z level(s) by a predetermined distance, for example, ranging from about 0.1 µm to about 15 µm, or from 0.02 µm to 10 µm.
  • Each z level of flow cell images may be separated from the adjacent z level(s) by a distance ranging from 0.5 µm to 10 µm, from 0.01 µm to 5 µm, or from 0.1 µm to 15 µm.
  • flow cell images can be acquired from one or more sequencing cycles and/or one or more channels.
  • Each flow cell image may include in its field of view at least part of one or more tiles or subtiles of the flow cell.
  • FIG. 27 shows a portion of a flow cell 2712 with multiple tiles 2710.
  • the image plane is defined by the x and y axes, and the z direction (i.e., z axis) is orthogonal to the x-y plane.
  • although the flow cell images, samples, and the z axis are described in a Cartesian coordinate system as shown in FIG. 27, any other coordinate system can be used to define spatial locations and relationships herein.
  • Other coordinate systems can include but are not limited to polar, cylindrical, or spherical coordinate systems.
  • the sequencer 114 may be configured to flow a nucleotide mixture onto the flow cell 112, cleave blockers from the nucleotides in between flowing steps, and perform other steps for the formation of the DNA sequences on the flow cell 112.
  • the nucleotides may have fluorescent elements attached that emit light or energy in a wavelength that indicates the type of nucleotide. Each type of fluorescent element may correspond to a particular nucleotide base (e.g., A, G, C, T). The fluorescent elements may emit light in visible wavelengths.
  • the sequencer 114 and the flow cell 112 may be configured to perform various sequencing methods disclosed herein, for example, sequencing by avidity.
  • each nucleotide base may be assigned a color, and different types of nucleotides can have different colors. For example, adenine (A) may be red, cytosine (C) may be blue, guanine (G) may be green, and thymine (T) may be yellow.
  • the color or wavelength of the fluorescent element for each nucleotide may be selected so that the nucleotides are distinguishable from one another based on the wavelengths of light emitted by the fluorescent elements.
  • the imager 116 may be configured to capture images of the flow cell 112 after each flowing step.
  • the imager 116 includes a camera configured to capture digital images, such as a CMOS or a CCD camera.
  • the camera may be configured to capture images at the wavelengths of the fluorescent elements bound to the nucleotides.
  • the images acquired by the imager of the sample(s) immobilized on at least a portion of the flow cell can be called the flow cell images.
  • the imager 116 can include one or more optical systems disclosed herein.
  • the optical system(s) can be configured to capture optical signals from the flow cell and generate corresponding flow cell images thereof. The flow cell images can then be used for base calling.
  • the images of the flow cell may be captured in groups, where each image in the group is taken at a wavelength or in a spectrum that matches or includes only one of the fluorescent elements. In another embodiment, the images may be captured as single images that capture all of the wavelengths of the fluorescent elements.
  • the resolution of the imager 116 can control the level of detail in the flow cell images, including pixel size. In existing systems, this resolution is very important because it determines how accurately a spot-finding algorithm identifies the polony or cluster centers. In some embodiments, the image resolution of flow cell images disclosed herein can be from about 10 nanometers (nm) to a few hundred nm or greater.
  • the image resolution of flow cell images can be in a range from 0.1 nm to 1000 nm. In some embodiments, the image resolution of flow cell images can be in a range from 1 nm to 500 nm. In some embodiments, the image resolution of flow cell images can be in a range from 5 nm to 300 nm.
  • One way to increase the accuracy of polony or cluster finding is to improve the resolution of the imager 116, or to improve the processing performed on images taken by the imager 116. Polony or cluster centers can also be detected in pixels other than those identified by a spot-finding algorithm. These methods can improve the accuracy of detecting polony or cluster centers without increasing the resolution of the imager 116. The resolution of the imager 116 may even be lower than that of existing systems with comparable performance, which may reduce the cost of the sequencing system 110.
  • the image quality of the flow cell images can control the base calling quality.
  • One way to increase the accuracy of base calling is to improve the imager 116, or to improve the processing performed on images taken by the imager 116, to yield better image quality.
  • the methods described herein may predict higher-resolution flow cell images (e.g., 2x, 4x, or more than the existing flow cell image resolution, in a common coordinate system) so that the detectable polony or cluster density can be increased, with reduced or eliminated interference from neighboring polonies, cellular background signal, color mixing, and/or other noise in the flow cell images.
  • 3D base calling using the methods herein can be more accurate than existing methods that do not use such high-resolution flow cell images.
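Predicting 2x or 4x images in a common coordinate system requires a fixed rule for mapping positions between the native and the upsampled pixel grids. A minimal sketch, assuming the half-pixel-centre convention used by many image resampling libraries (integer coordinates denote pixel centres on both grids); the function names `lr_to_hr` and `hr_to_lr` are hypothetical.

```python
def lr_to_hr(x: float, y: float, scale: int) -> tuple[float, float]:
    """Map a (sub-)pixel position on the native grid to a grid upsampled by `scale`.

    Assumes pixel centres sit at integer coordinates, so native pixel i spans
    [i - 0.5, i + 0.5) and covers upsampled pixels i*scale .. i*scale + scale - 1.
    """
    return (x + 0.5) * scale - 0.5, (y + 0.5) * scale - 0.5


def hr_to_lr(x: float, y: float, scale: int) -> tuple[float, float]:
    """Inverse of lr_to_hr: map an upsampled-grid position back to the native grid."""
    return (x + 0.5) / scale - 0.5, (y + 0.5) / scale - 0.5


# The centre of native pixel (10, 4) lies between upsampled pixels 20 and 21 (x)
# and between 8 and 9 (y) at 2x.
print(lr_to_hr(10, 4, 2))  # (20.5, 8.5)
```

Applying one convention to every cycle and channel keeps polony coordinates comparable between the native and the predicted images.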
  • Such methods herein can allow for accurate and efficient base calling.
  • the methods can be advantageously performed in parallel with a sequencing run in the computer-implemented system 100, without interference with or delay of existing sequencing workflow of the sequencing system 110.
  • the results of predicted high resolution flow cell images can be available for making base calling in the current sequencing cycle in the sequencing workflow.
  • some or all of the operations disclosed herein can be advantageously performed by the first reconfigurable logic device, e.g., FPGA(s), or by the integrated circuit, e.g., an application-specific integrated circuit (ASIC) chip, a neural processing unit (NPU), or an artificial intelligence (AI) chip, and data can be communicated between the CPU(s) and the first reconfigurable logic device or integrated circuit to reduce the total operational time relative to methods that use only the CPUs.
  • the sequencing system 110 may be configured to perform operations or actions for image processing of the flow cell images across different cycles and/or channels.
  • the operations or actions disclosed herein may be performed by the dedicated processors 118, the reconfigurable logic device(s) and/or integrated circuit(s) 120, the computing system 126, or a combination thereof.
  • One or more operations or actions in the methods 500, 600, 700, 2800, 2900 disclosed herein may be performed by the dedicated processors 118, the reconfigurable logic device(s) and/or integrated circuit(s) 120, the computing system 126, or a combination thereof.
  • which operations or actions are to be performed by the dedicated processors 118, the reconfigurable logic device(s) and/or integrated circuit(s) 120, the computing system 126, or their combinations can be determined based on one or more of a computation time for the specific operation(s), the complexity of computation in the specific operation(s), the need for data transmission between the hardware devices, the power required for the specific operation(s), or their combinations.
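The factors listed above can be combined into a simple cost model. The sketch below is a hypothetical heuristic, not the allocation scheme of this disclosure: it folds computational complexity into an estimated compute time, adds data-transfer time, and optionally penalises energy (power multiplied by busy time). All names and numbers are assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DeviceEstimate:
    """Hypothetical per-operation cost estimate for one hardware target."""
    device: str        # e.g. "fpga", "dedicated_processor", "cpu" (illustrative labels)
    compute_s: float   # estimated computation time; reflects computational complexity
    transfer_s: float  # estimated time to move input/output data to and from the device
    power_w: float     # estimated power draw while the operation runs


def choose_device(estimates: list[DeviceEstimate], energy_weight: float = 0.0) -> str:
    """Return the device minimising busy time plus a weighted energy penalty.

    cost = busy_s + energy_weight * power_w * busy_s, with busy_s = compute_s + transfer_s
    """
    def cost(e: DeviceEstimate) -> float:
        busy_s = e.compute_s + e.transfer_s
        return busy_s + energy_weight * e.power_w * busy_s

    return min(estimates, key=cost).device


# Illustrative numbers only (not measurements).
fpga = DeviceEstimate("fpga", compute_s=0.05, transfer_s=0.01, power_w=20.0)
cpu = DeviceEstimate("cpu", compute_s=0.40, transfer_s=0.0, power_w=150.0)
print(choose_device([fpga, cpu]))  # fpga
```

Raising `energy_weight` shifts the choice toward low-power devices, while a slow data link (large `transfer_s`) can make a nominally faster accelerator the worse option.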
  • Image processing operations or actions of the flow cell images can be performed after the corresponding flow cell images are acquired but before base calling of the flow cell images is performed.
  • the data storage 122 is used to store information used in the methods herein. This information may include the flow cell images themselves or information and/or images derived from the flow cell images captured by the imager 116.
  • the DNA sequences determined from the base-calling may be stored in the data storage 122. Parameters identifying polony or cluster locations may also be stored in the data storage 122. Raw and/or processed image intensities of each polony or cluster may be stored in the data storage 122.
  • the region and/or subtile that each polony or cluster corresponds to may also be stored in the data storage 122.
  • the transformation matrix of each region and/or subtile for different cycle(s) and/or channel(s) may also be stored in the data storage 122.
  • Cell images may be stored in the data storage 122.
  • the flow cell images, the processed images, and/or the filtered images may be stored in the data storage.
  • Other information or images that can facilitate 3D base calling of the sample can be saved in the data storage.
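One way to organise the stored items listed above is sketched below. The record layout, the field names, and the choice of a 3×3 homogeneous matrix for the per-subtile transformation are assumptions for illustration; the disclosure does not specify a storage schema.

```python
from dataclasses import dataclass, field

import numpy as np


@dataclass
class PolonyRecord:
    """Hypothetical per-polony entry in data storage 122."""
    polony_id: int
    x: float                                  # polony location, image coordinates
    y: float
    subtile: tuple[int, int]                  # (row, col) of the subtile holding it
    intensities: dict = field(default_factory=dict)  # {(cycle, channel): intensity}
    bases: str = ""                           # base calls accumulated over cycles


@dataclass
class SubtileTransform:
    """Hypothetical per-subtile, per-cycle, per-channel registration transform."""
    subtile: tuple[int, int]
    cycle: int
    channel: str
    matrix: np.ndarray                        # 3x3 homogeneous (e.g. affine) matrix

    def apply(self, x: float, y: float) -> tuple[float, float]:
        """Map a point from this cycle/channel image into the reference frame."""
        v = self.matrix @ np.array([x, y, 1.0])
        return float(v[0] / v[2]), float(v[1] / v[2])


shift = SubtileTransform((0, 1), cycle=3, channel="green",
                         matrix=np.array([[1.0, 0.0, 1.5],
                                          [0.0, 1.0, -2.0],
                                          [0.0, 0.0, 1.0]]))
print(shift.apply(10.0, 10.0))  # (11.5, 8.0)
```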
  • the user interface 124 may be used by a user to operate the sequencing system or access data stored in the data storage 122 or the computing system 126.
  • the computing system 126 may control the general operation of the sequencing system and may be coupled to the user interface 124. It may also perform steps in image processing, base calling, their preceding operations, and/or subsequent operations including but not limited to predicting high resolution flow cell images.
  • the computing system 126 is a computer system 400, as described in more detail in FIG. 4.
  • the computing system 126 may store information regarding the operation(s) of the sequencing system 110, such as configuration information, instructions for operating the sequencing system 110, or user information.
  • the computing system 126 may be configured to pass information between the sequencing system 110 and the cloud 130.
  • the computing system 126 can include one or more general-purpose computers that provide interfaces to run a variety of programs in an operating system, such as Windows™ or Linux™. Such an operating system typically provides great flexibility to a user.
  • the computing system 126 may include one or more processors, e.g., CPUs, which may be configured for artificial intelligence algorithm development and training (e.g., neural network training), either alone or in combination with the reconfigurable logic device and/or integrated circuit 120.
  • the sequencing system may include one or more reconfigurable logic devices 120 and/or one or more other integrated circuits 120.
  • the reconfigurable logic device 120 can include one or more FPGA devices.
  • the integrated circuit 120 herein may or may not be reconfigurable, and it may include an AI chip, an application-specific integrated circuit (ASIC) chip, a neural processing unit (NPU), or a combination thereof.
  • the reconfigurable logic device and/or integrated circuit 120 may be configured for artificial intelligence algorithm development and training (e.g., training of a neural network), either alone or in combination with the CPU and/or GPU.
  • the reconfigurable logic device and/or integrated circuit 120 may include a main unit and an edge unit.
  • the main unit may be an FPGA device and the edge unit may be an ASIC or AI chip.
  • the edge unit is an additional hardware processing module that may be individually installed and/or uninstalled on the system 110.
  • the edge unit may be configured for artificial intelligence algorithm development and training.
  • the edge unit may be configured for making inferences or predictions using deployed AI algorithm(s), e.g., neural networks.
  • the edge unit may communicate electronically with the main unit, e.g., via DMA connections for data communication.
  • the edge unit may exchange data electronically with other parts of the system 100 via various connections, such as a chip2chip connection.
  • the edge unit may include a neural processing unit (NPU) chip, an AI chip, or any other integrated circuit(s).
  • the dedicated processors 118 may be configured to perform operations in the methods disclosed herein.
  • the dedicated processors 118 may include one or more reconfigurable logic devices and/or integrated circuits disclosed herein.
  • the dedicated processors 118 may be custom processors with specific hardware or instructions for performing those steps, rather than general-purpose processors.
  • Dedicated processors run specific software directly, without an operating system. The lack of an operating system reduces overhead at the cost of flexibility in what the processor can perform.
  • a dedicated processor may make use of a custom programming language, which may be designed to operate more efficiently than the software run on general-purpose computers. This may increase the speed at which the steps are performed and allow for real time processing.
  • the reconfigurable logic device and/or the integrated circuit 120 may be configured to perform some or all of operations in the methods herein.
  • the reconfigurable logic device and/or the integrated circuit may be programmed as hardware that can perform specific task(s).
  • a special programming language may be used to transform software steps into hardware componentry.
  • Each software step may correspond to at least one operation or action in the methods disclosed herein.
  • Each software step may include at least a part of the operation or action in the methods disclosed herein.
  • the reconfigurable logic device and/or integrated circuit generally processes data faster than a general-purpose computer. As with dedicated processors, this may come at the cost of flexibility. The lack of software overhead may also allow the reconfigurable logic device and/or the integrated circuit to operate faster than a dedicated processor, although this will depend on the exact processing to be performed and on the specific reconfigurable logic device and/or integrated circuit and dedicated processor used.
  • a group of the reconfigurable logic devices and/or integrated circuits 120 may be configured to perform the steps in parallel.
  • a number of processing engines of the FPGA(s) may be configured to perform one or more identical image processing steps for an image, a set of images, a subtile, or a select region in one or more images.
  • Each FPGA 120 may perform its own part of the image processing step(s) in parallel, reducing the time needed to process data. This may allow the image processing step(s) to be completed in real time.
  • a number of processing engines of a first FPGA may be configured to generate a polony map for a tile of the flow cell.
  • Each processing engine may be responsible for generating a portion, e.g., non-overlapping portion, of the polony map at a different subtile within the tile, e.g., in parallel.
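The subtile-parallel scheme can be mimicked in software. This Python sketch is an analogy only: threads stand in for FPGA processing engines, and a plain threshold stands in for the actual polony-finding logic, which the text does not specify here. `subtile_slices`, `engine_polony_map`, and `polony_map` are hypothetical names.

```python
from concurrent.futures import ThreadPoolExecutor

import numpy as np


def subtile_slices(height: int, width: int, rows: int, cols: int):
    """Split a tile into rows x cols non-overlapping subtiles that cover every pixel."""
    r_edges = np.linspace(0, height, rows + 1).astype(int)
    c_edges = np.linspace(0, width, cols + 1).astype(int)
    return [(slice(r_edges[i], r_edges[i + 1]), slice(c_edges[j], c_edges[j + 1]))
            for i in range(rows) for j in range(cols)]


def engine_polony_map(block: np.ndarray, threshold: float) -> np.ndarray:
    """Toy stand-in for one processing engine: flag pixels above a threshold."""
    return block > threshold


def polony_map(tile: np.ndarray, rows: int = 2, cols: int = 2,
               threshold: float = 0.5, workers: int = 4) -> np.ndarray:
    """Build a tile-level polony map by processing subtiles concurrently."""
    out = np.zeros(tile.shape, dtype=bool)
    slices = subtile_slices(tile.shape[0], tile.shape[1], rows, cols)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        parts = pool.map(lambda s: engine_polony_map(tile[s], threshold), slices)
        for s, part in zip(slices, parts):
            out[s] = part  # subtiles are disjoint, so writes never collide
    return out
```

Because the toy operation is pixel-local, non-overlapping subtiles suffice; a spot finder that inspects neighbouring pixels would typically need a small overlap (halo) at subtile borders to avoid missing or duplicating polonies there.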
  • a second FPGA may be configured to perform intensity normalization in parallel with the generation of the polony map.
  • a number of FPGA(s) and integrated circuits, e.g., AI chips, may be configured to perform one or more image processing step(s) for the flow cell images.
  • Each FPGA 120 may perform its own part of the processing step(s) in parallel, reducing the time needed to process data, while each AI chip may perform polony or cluster prediction after receiving data from its corresponding FPGA. This may allow the image processing steps to be completed in real time.
  • a first and a second FPGA may be configured to perform image registration in parallel, each for a different subtile or tile of the flow cell.
  • a corresponding AI chip may predict a high-resolution flow cell image of the corresponding subtile or tile after image registration is completed by its corresponding FPGA. Further discussion of the use of FPGAs is provided below.
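The FPGA-then-AI-chip hand-off described above behaves like a two-stage pipeline: while the AI chip works on one subtile, the FPGA can already register the next. A Python sketch of that pattern, with threads and a bounded queue standing in for the hardware and its data link; the stage functions are deliberately trivial placeholders (an integer shift for registration, nearest-neighbour upsampling in place of the neural network), and all names are hypothetical.

```python
import queue
import threading

import numpy as np


def register(img: np.ndarray, shift: tuple[int, int]) -> np.ndarray:
    """Stage 1 stand-in (FPGA): undo a known integer (dy, dx) offset."""
    return np.roll(img, (-shift[0], -shift[1]), axis=(0, 1))


def predict_high_res(img: np.ndarray, scale: int = 2) -> np.ndarray:
    """Stage 2 stand-in (AI chip): nearest-neighbour upsampling instead of a network."""
    return np.kron(img, np.ones((scale, scale), dtype=img.dtype))


def run_pipeline(subtiles: dict, shifts: dict) -> dict:
    handoff = queue.Queue(maxsize=2)  # bounded buffer between the two stages
    results = {}

    def registration_stage():
        for key, img in subtiles.items():
            handoff.put((key, register(img, shifts[key])))
        handoff.put(None)  # sentinel: no more subtiles

    def prediction_stage():
        while (item := handoff.get()) is not None:
            key, registered = item
            results[key] = predict_high_res(registered)

    workers = [threading.Thread(target=registration_stage),
               threading.Thread(target=prediction_stage)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    return results
```

The bounded queue keeps the registration stage from running arbitrarily far ahead of prediction, which mirrors the limited buffering available between two devices.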
  • the reconfigurable logic device and/or the integrated circuit may be configured to perform some or all of the operations or actions in the methods disclosed herein in real time. Performing the operations or actions in real time may allow the system 110 to use less memory and/or data storage, as the data may be processed as it is received. This is an improvement over conventional systems that may need to store the data before processing it and consequently require more memory/data storage or access to a computer system located in the cloud 130. Further, performing the operations or actions in real time may allow more efficient sequencing analysis, because the analysis is performed in parallel while a sequencing run is still in progress.
  • performing the processing steps using the FPGAs and AI chips may reduce power consumption, e.g., by 2x, 5x, 10x, 20x, or more, and thus produce less heat than performing the same processing steps using the CPUs and/or GPUs. Further discussion of the use of FPGAs is provided below.
  • the sequencing system 110 may have dedicated processors 118, the reconfigurable logic device and/or integrated circuit 120, or the computing system 126.
  • the sequencing system may use one, two, or all of these elements to accomplish one or more operations or actions in the methods disclosed herein. In some embodiments, when these hardware elements are present together, the image processing tasks are split between them.
  • the reconfigurable logic device 120 may be used to perform some or all of: the preprocessing operations, color correction, polony map generation, image registration, predicting high resolution flow cell images, training a neural network, generating the training flow cell images, base calling, and any subsequent operations, while the computing system 126 may perform other processing functions for the sequencing system 110 such as intensity normalization and registering images for base calling with cell staining image(s).
  • one or more reconfigurable logic devices and/or integrated circuits 120 can accelerate base calling and/or any primary analysis steps of flow cell images acquired from 2D or 3D sample(s).
  • the reconfigurable logic devices and/or integrated circuits can accelerate primary analysis of 2D sample(s) or 3D volumetric sample(s) by 2x, 4x, 5x, 10x, 15x, 20x, 25x, 30x, 40x, 50x, 100x, 200x, 400x, 500x, 800x, 1000x, or more compared with traditional primary analysis methods using only CPUs and/or GPUs.
  • one or more reconfigurable logic devices and/or integrated circuits 120 herein can accelerate sequencing and sequencing analysis (including at least primary analysis) of the flow cell images acquired from 2D or 3D sample(s).
  • the reconfigurable logic devices and/or integrated circuits herein can accelerate sequencing and sequencing analysis (including at least primary analysis) of the flow cell images acquired from 2D or 3D sample(s) by 2x, 4x, 5x, 10x, 15x, 20x, 25x, 30x, 40x, 50x, 100x, 200x, 400x, 500x, 800x, 1000x, or more compared with traditional sequencing systems with only CPUs and/or GPUs.
  • the time for making inferences or predictions of high-resolution images, base calls, or classifications using the neural network disclosed herein and the reconfigurable logic devices and/or integrated circuits can be less than 800 ms, 500 ms, 400 ms, 300 ms, 200 ms, 100 ms, 50 ms, 20 ms, or less per tile per cycle.
  • the tile size can be varied in different flow cells.
  • the tile size may be at least 0.001 mm², 0.01 mm², 0.05 mm², 0.1 mm², 0.5 mm², 1 mm², 2 mm², 3 mm², or more.
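The per-tile, per-cycle latencies above translate into a run-level budget by simple multiplication. A worked sketch, assuming a hypothetical 50 mm² imaged area, 1 mm² tiles, 15 cycles, and tiles processed one after another (ignoring any parallelism across devices); the function names are hypothetical.

```python
import math


def tiles_needed(imaged_area_mm2: float, tile_area_mm2: float) -> int:
    """Number of tiles needed to cover an imaged area (partial tiles round up)."""
    return math.ceil(imaged_area_mm2 / tile_area_mm2)


def inference_seconds(ms_per_tile_per_cycle: float, n_tiles: int, n_cycles: int) -> float:
    """Total inference time when every tile is processed once per cycle."""
    return ms_per_tile_per_cycle * n_tiles * n_cycles / 1000.0


n = tiles_needed(50.0, 1.0)           # 50 tiles of 1 mm^2
print(inference_seconds(100, n, 15))  # 75.0
print(inference_seconds(800, n, 15))  # 600.0
```

At 100 ms per tile per cycle this gives 75 s of inference for the whole run; at the 800 ms bound it gives 600 s (10 minutes).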
  • one or more reconfigurable logic devices and/or integrated circuits 120 can enable primary analysis (base calling) of polonies for flow cell images at multiple z levels.
  • processing time using reconfigurable logic devices can be less than 400 hours for at least 50 flow cell images (e.g., covering 50 tiles and from two or more color channels) with a FOV of at least 1 mm² and a resolution of 1 µm or better in three dimensions for one or more flow cycles, e.g., 1-15 cycles.
  • the flow cell images can be from multiple z-levels to cover some or all of the volumetric 3D samples (e.g., completely covering at least two samples).
  • one or more reconfigurable logic devices and/or integrated circuits 120 can be used for accelerating primary analysis of 3D samples involving training neural network(s) and using the trained neural networks for making predictions or inferences.
  • neural network(s) can be used to predict polony locations and/or predict cell boundaries thereby identifying polonies within the cell(s).
  • Using the reconfigurable logic device and/or integrated circuits 120 for computations associated with neural networks can reduce the training and/or prediction time needed in comparison with GPUs or other computer processors, thereby accelerating sequencing analysis and enabling analysis of completed flow cycles while subsequent flow cycles are pending or in progress in the sequencing run.
  • the reconfigurable logic device(s) and/or integrated circuits 120 can accelerate training and/or prediction by 10x, 20x, 50x, 80x, 100x, 200x, 500x, 600x, 800x, 1000x, or more compared with training and/or prediction using CPUs and/or GPUs.
  • the reconfigurable logic devices and/or integrated circuits 120 can be used to achieve optimal acceleration in sequencing analysis.
  • one or more FPGA chips can be used in combination with an integrated circuit specific for computations corresponding to artificial intelligence (AI) algorithms, e.g., an NPU.
  • the integrated circuit(s) can be specific circuits for AI functions.
  • the integrated circuit(s) can include application-specific integrated circuits (ASICs).
  • Computational tasks can be distributed to the FPGA(s) and the integrated circuit(s) to optimize computational time, energy consumption, heat dissipation, etc.
  • the AI chip may be used only for computations involving a neural network (e.g., predicting polony locations, predicting high-resolution flow cell images, or training the neural network) and the FPGA(s) may be used for the rest of the primary analysis steps.
  • the primary analysis time using dual FPGA chips or a single FPGA chip in connection with the AI chip(s) can be less than 400, 300, 200, 100, 50, or 20 hours for at least 50 flow cell images (e.g., covering about 50 tiles of the flow cell and from two or more color channels) with a FOV of at least 1 mm² and a resolution of 1 µm or better for each flow cell image in three dimensions for one or more flow cycles, e.g., 1-15 cycles.
  • the flow cell images can be from multiple z-levels to cover some or all of the volumetric 3D samples (e.g., 10 to 20 z-locations to completely cover at least two samples).
  • the primary analysis time may include a total time of image processing from obtaining raw flow cell images acquired using the imager 116 to generating base calls and saving base call results.
  • the 3D samples herein include polonies or clusters that are centered at different z-levels spaced apart from each other by at least 0.01 µm, 0.05 µm, 0.1 µm, 0.2 µm, 0.5 µm, 1 µm, or more along the z direction or axial direction.
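With 3D samples imaged at several z-locations, each polony has to be attributed to a z-level before its intensities are read out. One simple rule, offered only as a sketch, assigns each polony centre to the nearest imaged plane and checks the minimum plane spacing; the function name, the nearest-plane rule, and the plane positions are assumptions.

```python
import numpy as np


def assign_z_level(z_centres_um, z_planes_um, min_spacing_um: float = 0.0) -> np.ndarray:
    """Return, for each polony centre, the index of the nearest imaged z-plane
    (index into the sorted plane positions).

    Raises ValueError if consecutive planes are closer than `min_spacing_um`.
    """
    planes = np.sort(np.asarray(z_planes_um, dtype=float))
    if np.any(np.diff(planes) < min_spacing_um):
        raise ValueError("z-planes are closer than the requested minimum spacing")
    z = np.asarray(z_centres_um, dtype=float)[:, None]
    return np.argmin(np.abs(z - planes[None, :]), axis=1)


planes = np.arange(0.0, 10.0, 0.5)  # 20 hypothetical z-planes, 0.5 um apart
print(assign_z_level([0.1, 2.6, 9.8], planes, min_spacing_um=0.5).tolist())  # [0, 5, 19]
```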
  • the cloud 130 may be a network, remote storage, or some other remote computing system separate from the sequencing system 110.
  • the connection to cloud 130 may allow access to data stored externally to the sequencing system 110 or allow for updating of software in the sequencing system 110.
  • FIG. 5C shows an exemplary embodiment of the reconfigurable logic device and the integrated circuit(s) of the sequencing system disclosed herein.
  • the sequencing system 110 may include one or more reconfigurable logic devices 120_a.
  • the sequencing system comprises a single reconfigurable logic device, i.e., a first reconfigurable logic device 120_a.
  • the sequencing system comprises multiple reconfigurable logic devices (not shown).
  • the reconfigurable logic device may comprise data processing engines 5011 configured to perform data processing in parallel. Each data processing engine may include a combination of digital logic circuits to perform its function, e.g., intensity extraction, convolution, registration, etc.
  • the sequencing system 110 may further include reconfigurable routing channels 5013 that may function as connections among the data processing engines 5011 and may also connect the data processing engines to other structural elements, e.g., the first processor and the memory device, of the sequencing system 110.
  • a neural network may be deployed at least partly on the reconfigurable logic device 120_a so that the reconfigurable logic device can be used for at least some computational tasks for generating inferences using the neural network.
  • the neural network may be pretrained using various training methods and data, for example, using the training methods and training data disclosed herein.
  • the sequencing system may further include a first processor 120_c to selectively activate or deactivate different combinations of the data processing engines 5011 and the reconfigurable routing channels 5013.
  • the FPGA(s) 120 as shown in FIG. 1 of the sequencing system 110 may include one or more of the reconfigurable logic device 120_a, the integrated circuit 120_b, and the processor 120_c.
  • the FPGA(s) 120 may include only the reconfigurable logic device 120_a and the processor 120_c, but not the integrated circuit(s) 120_b.
  • the different combinations of the data processing engines 5011 and the reconfigurable routing channels 5013 may be configured to perform operation(s) in sequencing analysis to facilitate generating the sequencing analysis result(s).
  • the sequencing analysis may include operations or steps of primary analysis.
  • Such operation(s) may include one or more of (a) obtaining sensor data from one or more sensors (in the imager 116) of the sequencing system; (b) processing the sensor data to generate a first plurality of flow cell images; (c) predicting a second plurality of flow cell images using the neural network based on the sensor data or the first plurality of flow cell images; (d) determining polonies from the second plurality of flow cell images; and (e) performing a corresponding base calling for each of the determined polonies based on the second plurality of flow cell images.
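Steps (a) through (e) form a straight pipeline. The skeleton below shows only how the stages hand data to one another; every stage body is a simplified, hypothetical stand-in (dark-level subtraction, a local-maximum polony finder, brightest-channel calling), and the neural network of step (c) is represented by a caller-supplied `model` function. Images are assumed to be stacked as (channel, row, column) with channels ordered A, C, G, T.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def to_flow_cell_images(sensor_data, dark_level: float = 0.0) -> np.ndarray:
    """(b) Stand-in preprocessing: subtract an assumed dark level, clip at zero."""
    return np.clip(np.asarray(sensor_data, dtype=float) - dark_level, 0.0, None)


def find_polonies(images: np.ndarray, threshold: float) -> np.ndarray:
    """(d) Toy polony finder: 3x3 local maxima of the channel-summed image above threshold."""
    total = images.sum(axis=0)
    padded = np.pad(total, 1, constant_values=-np.inf)
    neighbourhood_max = sliding_window_view(padded, (3, 3)).max(axis=(2, 3))
    return np.argwhere((total == neighbourhood_max) & (total > threshold))


def call_bases(images: np.ndarray, polonies: np.ndarray, bases: str = "ACGT") -> list[str]:
    """(e) Call the brightest channel at each polony location (channel order assumed)."""
    return [bases[int(np.argmax(images[:, r, c]))] for r, c in polonies]


def primary_analysis(sensor_data, model=lambda imgs: imgs, threshold: float = 1.0):
    """Chain steps (a)-(e). `model` stands in for the trained neural network of
    step (c); the identity default only keeps the sketch runnable."""
    first = to_flow_cell_images(sensor_data)        # (a) + (b)
    second = model(first)                           # (c) predicted flow cell images
    polonies = find_polonies(second, threshold)     # (d)
    return polonies, call_bases(second, polonies)   # (e)
```

Passing the model as a parameter keeps the orchestration independent of where inference runs, whether on the reconfigurable logic device or on a separate integrated circuit.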
  • the sensor data includes raw data that has been acquired from the sensor(s) of the imager without any additional image processing.
  • the sensor data includes raw flow cell images that have not been processed by the computing system 126, the dedicated processors 118, and/or the reconfigurable logic device and integrated circuit(s) 120 of the sequencing system 110.
  • the sequencing system comprises: a first reconfigurable logic device 120_a comprising a first plurality of data processing engines 5011 configured to perform data processing in parallel; first reconfigurable routing channels 5013 connecting at least some of the first plurality of data processing engines 5011; a neural network deployed at least partly on the first reconfigurable logic device 120_a; and a first processor 120_c that selectively activates or deactivates different combinations of the first plurality of data processing engines 5011 and the first reconfigurable routing channels 5013 to perform operation(s) in sequencing analysis to facilitate generating the sequencing analysis result(s).
  • the sequencing analysis may include operations or steps of primary analysis.
  • Such operation(s) may include one or more of (a) obtaining sensor data directly from one or more sensors of the sequencing system; (b) processing the sensor data to generate a first plurality of flow cell images; (c) performing a first convolution in one or more dimensions on the first plurality of flow cell images, thereby generating a first convolution result; (d) repetitively performing, for one or more times, downsampling operations comprising: (1) performing a second convolution in one or more dimensions on the first convolution result, thereby generating a second convolution result; and (2) performing a down sampling of the second convolution result by a down sampling factor thereby generating a first down-sampled result, wherein in each repetition, the second convolution comprises a corresponding number of filters, thereby generating a third convolution result after (d); (e) performing the second convolution in one or more dimensions on the third convolution result, thereby generating a fourth convolution result; (f) repetitively performing up sampling operations comprising: (3) performing an up sampling of the fourth convolution result
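Steps (c) through (e) resemble the contracting path of a U-Net-style convolutional network; the text breaks off during the up-sampling step (f), so only (c) to (e) are sketched here. This NumPy version is illustrative: random weights replace trained filters, and the 3×3 kernels, max pooling as the down-sampling method, and the filter counts are all assumptions. No activation functions are included because the text names none, and the function names are hypothetical.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

rng = np.random.default_rng(0)


def conv2d(x: np.ndarray, n_filters: int, k: int = 3) -> np.ndarray:
    """'Same'-padded 2D convolution (cross-correlation, as in most deep learning
    frameworks) with random weights standing in for learned filters; k is assumed odd.
    x: (C_in, H, W) -> (n_filters, H, W)."""
    c_in = x.shape[0]
    weights = rng.standard_normal((n_filters, c_in, k, k)) / (c_in * k * k)
    xp = np.pad(x, ((0, 0), (k // 2, k // 2), (k // 2, k // 2)))
    patches = sliding_window_view(xp, (k, k), axis=(1, 2))  # (C_in, H, W, k, k)
    return np.einsum("chwij,fcij->fhw", patches, weights)


def downsample(x: np.ndarray, factor: int = 2) -> np.ndarray:
    """Down-sample by `factor` with max pooling (an assumed choice of method)."""
    c, h, w = x.shape
    x = x[:, : h - h % factor, : w - w % factor]
    return x.reshape(c, h // factor, factor, w // factor, factor).max(axis=(2, 4))


def contracting_path(images: np.ndarray, first_filters: int = 32,
                     repeat_filters=(64, 128), factor: int = 2) -> np.ndarray:
    out = conv2d(images, first_filters)     # (c) first convolution
    for n in repeat_filters:                # (d) one repetition per filter count
        out = conv2d(out, n)                # (d)(1) second convolution with n filters
        out = downsample(out, factor)       # (d)(2) down-sampling by `factor`
    return conv2d(out, repeat_filters[-1])  # (e) convolution on the third result
```

Each repetition halves the spatial size (for `factor=2`) while the filter count grows, so a 16×16 input with two repetitions ends as a 4×4 map with as many channels as the last filter count.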
  • obtaining sensor data from one or more sensors (in the imager 116) of the sequencing system may be via a direct connection.
  • the direct connection between the first reconfigurable logic device (120 and 120_a) and the sensor(s) lacks other hardware components that may process or store the sensor data and thereby cause undesired complexity, delay, and possible errors in sensor data communication.
  • Such hardware components include the first processor 120_c, the memory device 5030, or any processors, e.g., computing system 126, e.g., CPU, of the sequencing system.
  • the direct sensor data communication herein advantageously improves data transmission efficiency from the sensor to the FPGAs 120, frees up the other hardware, e.g., CPUs and storage devices, for other data processing functions, decreases the power consumed by indirect data communication, and reduces the time spent on data communication and thus on sequencing analysis.
  • the connection between the first reconfigurable logic device (120 and 120_a) and sensor may include other hardware components that may process or store the sensor data.
  • Such hardware components may include the first processor 120_c, the memory device 5030, or any processors, e.g., CPUs 126 of the sequencing system.
  • the sensor data may be saved into the memory device 5030, and then it can be accessed by the first reconfigurable logic device using memory controller(s) 5013.
  • the reconfigurable logic device may include digital logic circuits, in the sense that it is also an integrated circuit.
  • the integrated circuit may differ from the reconfigurable logic device in various ways, e.g., the integrated circuit may not be as flexible in reconfiguration as the reconfigurable logic device.
  • the integrated circuit herein, e.g., the AI chip, NPU, etc., may not be reconfigurable.
  • the sequencing system 110 comprises at least one reconfigurable logic device but lacks any integrated circuits, e.g., AI chips, ASIC chips, or NPUs.
  • the reconfigurable logic device may perform one or more operations in sequencing analysis and may forward its output back to the CPU as end results of primary analysis, e.g., base calls. Alternatively, the reconfigurable logic device may forward its output back to the CPU so that subsequent operations may be performed based on its output by the CPU to generate the end results of sequencing analysis.
  • the sequencing system 110 comprises at least one reconfigurable logic device, and at least one integrated circuit as shown in FIG. 5C.
  • the integrated circuit may perform one or more operations in sequencing analysis and may forward its output back to the reconfigurable logic device so that subsequent operations may be performed based on its output at the reconfigurable logic device.
  • the output of the reconfigurable logic device or the integrated circuit comprises base calls of nucleotide bases in a sample immobilized on a support. In some embodiments, the output data of the reconfigurable logic device or the integrated circuit comprises identification of base calling locations in two dimensions. In some embodiments, the output data of the reconfigurable logic device or the integrated circuit comprises identification of base calling locations in three dimensions.
  • the data communication between any two of the reconfigurable logic device, the integrated circuits, the first processor, and the second processor may be direct such that the direct communication lacks any other hardware components that may process or store the data.
  • Such other hardware components may include memory device(s), and/or other processor(s) of the sequencing system.
  • Such direct communication may include DMA connections.
  • the data communication between any two of the reconfigurable logic device, the integrated circuits, the first processor, and the second processor may be direct such that the data is not used by other logic circuits or stored elsewhere before reaching its communication destination, although the data may be stored in a memory device before reaching that destination.
  • the sequencing system 110 may include a first reconfigurable logic device 120_a, e.g., an FPGA, comprising a first plurality of data processing engines 5011 configured to perform data processing in parallel; an integrated circuit 120_b, e.g., an AI chip; a neural network deployed at least partly on the integrated circuit; and a first processor to selectively activate or deactivate different combinations of the first plurality of data processing engines, alone or in combination with the first routing channels, to perform operation(s) in sequencing analysis to facilitate generating the sequencing analysis result(s).
  • the sequencing analysis may include operations or steps of primary analysis.
  • the sequencing analysis may include operations or steps of secondary analysis.
  • Such operation(s) may include one or more of: obtaining sensor data from one or more image sensors of the sequencing system; processing the sensor data to generate a first plurality of flow cell images; and communicating the sensor data, the first plurality of flow cell images, or both to the integrated circuit.
  • the sequencing system may include a second processor or the first processor to control the integrated circuit to perform one or more operations in sequencing analysis to facilitate generating the sequencing analysis result(s).
  • the sequencing analysis may include operations or steps of primary analysis and/or secondary analysis.
  • Such operation(s) may include one or more of: receiving the sensor data, the first plurality of flow cell images, or both from the first reconfigurable logic device; predicting a second plurality of flow cell images using the neural network based on the sensor data, the first plurality of flow cell images, or both; determining polonies from the second plurality of flow cell images; performing a corresponding base calling for each of the determined polonies based on the second plurality of flow cell images; and forwarding the second plurality of flow cell images, the determined polonies, and the corresponding base calls of polonies in the second plurality of flow cell images to one or more of: the first reconfigurable logic device 120_a, the first processor 120_c or the second processor, and/or one or more processors of the sequencing system 126.
  • the operation of forwarding the second plurality of flow cell images, the determined polonies, and the corresponding base calls of polonies in the second plurality of flow cell images comprises forwarding them to a memory device herein, e.g., DDR memory, so that one or more of: the first reconfigurable logic device 120_a, the first processor 120_c or the second processor, and/or one or more processors of the computing system 126 can access the data from the memory. Accessing data in the memory, including reading, writing, and editing, may be assisted by the memory controllers disclosed herein.
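By way of illustration only, the polony-determination and base-calling operations above can be sketched as follows. This is a minimal sketch, not the method of this disclosure: it assumes (hypothetically) that a polony is a pixel that exceeds a threshold and all eight neighbours in the summed four-channel image, and that each base is called as the brightest of four color channels at that pixel. The names `find_polonies`, `call_bases`, and the one-channel-per-base mapping are illustrative assumptions.

```python
from typing import Dict, List, Tuple

Image = List[List[float]]
BASES = ("A", "C", "G", "T")  # hypothetical: one color channel per base


def find_polonies(img: Image, threshold: float) -> List[Tuple[int, int]]:
    """Return (row, col) of pixels above threshold that beat all 8 neighbours."""
    rows, cols = len(img), len(img[0])
    peaks = []
    for r in range(rows):
        for c in range(cols):
            v = img[r][c]
            if v < threshold:
                continue
            neighbours = [
                img[rr][cc]
                for rr in range(max(r - 1, 0), min(r + 2, rows))
                for cc in range(max(c - 1, 0), min(c + 2, cols))
                if (rr, cc) != (r, c)
            ]
            if all(v > n for n in neighbours):
                peaks.append((r, c))
    return peaks


def call_bases(channels: Dict[str, Image], threshold: float) -> Dict[Tuple[int, int], str]:
    # Sum the channels so a polony is detected whichever base it carries.
    rows, cols = len(channels["A"]), len(channels["A"][0])
    total = [[sum(channels[b][r][c] for b in BASES) for c in range(cols)]
             for r in range(rows)]
    calls = {}
    for r, c in find_polonies(total, threshold):
        # Call the base whose channel is brightest at the polony location.
        calls[(r, c)] = max(BASES, key=lambda b: channels[b][r][c])
    return calls
```

In practice the second plurality of (predicted, higher-resolution) flow cell images would be the input here; the local-maximum rule is only one simple choice for locating polonies.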
  • the sequencing system 110 comprises at least one reconfigurable logic device, and at least one integrated circuit as shown in FIG. 5C.
  • the integrated circuit may perform one or more operations in sequencing analysis and may generate its output as the end results of primary analysis and forward its output to one or more devices including: the reconfigurable logic device, the first or second processor, the hardware processor of the sequencing system, etc., so that the end results can be saved or presented to a user.
  • the output of the reconfigurable logic device or the integrated circuit comprises base calls of nucleotide bases in a sample immobilized on a support.
  • the output data of the reconfigurable logic device or the integrated circuit comprises identification of base calling locations in two dimensions.
  • the output data of the reconfigurable logic device or the integrated circuit comprises identification of base calling locations in three dimensions.
  • the integrated circuit may perform one or more operations in sequencing analysis and generate its output as intermediate results of primary analysis, e.g., location of polonies, and may forward its output back to one or more of: the reconfigurable logic device, the first or second processor, the hardware processor of the sequencing system, etc., so that the end results can be determined based on its output.
  • the integrated circuit may forward its output, either intermediate or end results, to be stored in a memory device, so that one or more devices including: the reconfigurable logic device, the first or second processor, and the hardware processor of the sequencing system can access the stored output whenever needed.
  • the access to the output stored in a memory device can be via a memory controller of the sequencing system, e.g., 5013.
  • the sequencing system comprises: a first reconfigurable logic device comprising a first plurality of data processing engines configured to perform data processing in parallel with each other; an integrated circuit; a neural network deployed at least partly on the integrated circuit; a first processor to selectively activate or deactivate different combinations of the first plurality of data processing engines.
  • the different combinations of the first plurality of data processing engines may be configured to perform operations comprising: obtaining sensor data from one or more image sensors of the sequencing system to generate the first plurality of flow cell images; and communicating the sensor data, the first plurality of flow cell images, or both to the integrated circuit.
  • the integrated circuit may perform operations comprising: (1) receiving the sensor data, the first plurality of flow cell images, or both from the first reconfigurable logic device; (2) predicting a second plurality of flow cell images using the neural network based on the sensor data, the first plurality of flow cell images, or both; and (3) communicating the second plurality of flow cell images to the first reconfigurable logic device or one or more hardware processors of the sequencing system.
  • the sequencing system comprises: a first reconfigurable logic device comprising a first plurality of data processing engines arranged in a first pipeline and configured to perform data processing in parallel with each other; an integrated circuit; a neural network deployed at least partly on the integrated circuit; a first processor of the first reconfigurable logic device to selectively activate or deactivate different combinations of the first plurality of data processing engines to perform operations comprising: (a) obtaining sensor data from one or more sensors of the sequencing system; (b) processing the sensor data to generate a first plurality of flow cell images; and (c) communicating the sensor data, the first plurality of flow cell images, or both to the integrated circuit; wherein the integrated circuit performs operations comprising: (d) receiving the sensor data, the first plurality of flow cell images, or both from the first reconfigurable logic device; (e) performing a first convolution in one or more dimensions on the first plurality of flow cell images, thereby generating a first convolution result; (f) repetitively performing, for one or more times, down
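The division of labor between the reconfigurable logic device and the integrated circuit is essentially a two-stage producer/consumer pipeline. A minimal Python sketch, assuming a bounded queue as a stand-in for the DMA/routing-channel link; the fixed gain and the 2x nearest-neighbour upsampling are hypothetical placeholders for image formation and neural-network prediction, and all function names are illustrative.

```python
import queue
import threading
from typing import List

Frame = List[List[float]]
_DONE = object()  # sentinel marking the end of the stream


def fpga_stage(raw_frames, to_ic: "queue.Queue") -> None:
    # Stand-in for (a)-(c): turn raw sensor counts into flow cell images
    # (here just a hypothetical fixed gain) and forward them downstream.
    for raw in raw_frames:
        to_ic.put([[0.5 * v for v in row] for row in raw])
    to_ic.put(_DONE)


def ic_stage(from_fpga: "queue.Queue", results: list) -> None:
    # Stand-in for (d) onward: "predict" a higher-resolution image.
    # A real system would run the neural network here; 2x nearest-neighbour
    # upsampling is only a placeholder.
    while True:
        img = from_fpga.get()
        if img is _DONE:
            break
        up = [[v for v in row for _ in range(2)] for row in img for _ in range(2)]
        results.append(up)


def run_pipeline(raw_frames) -> list:
    link = queue.Queue(maxsize=4)  # bounded, like a finite transfer buffer
    results: list = []
    producer = threading.Thread(target=fpga_stage, args=(raw_frames, link))
    consumer = threading.Thread(target=ic_stage, args=(link, results))
    producer.start()
    consumer.start()
    producer.join()
    consumer.join()
    return results
```

The bounded queue lets the first stage keep working on the next frame while the second stage processes the previous one, blocking only when the buffer is full.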
  • the first reconfigurable routing channels comprise one or more electronic nodes, and the electronic nodes are programmable.
  • the electronic nodes here may include junction points in the circuit(s).
  • the electronic nodes may include points where two or more circuit elements are connected together.
  • the first reconfigurable routing channels comprise one or more interconnects.
  • the interconnect may include the physical wiring(s) that connects transistors and other components on an integrated circuit.
  • the reconfigurable routing channels comprise one or more memory controllers, e.g., 5013 in FIG. 5C.
  • the first reconfigurable routing channels comprise one or more networks-on-chip (NoCs), e.g., 5013 in FIG. 5C.
  • the first reconfigurable routing channels may comprise one or more of: a network-on-chip (NoC), and a memory controller.
  • the first reconfigurable routing channels may be configured to passively communicate data between components of the sequencing system.
  • the reconfigurable routing channels may be configured to communicate data bidirectionally between the data processing engines, e.g., 5011 in FIG. 5C, and the memory device, e.g., 5030 in FIG. 5C.
  • the first reconfigurable routing channels may be configured to allow data communication between the first reconfigurable logic device, e.g., 120_a, and one or more memory devices, e.g., 5030.
  • the first reconfigurable routing channels may be configured to allow data communication between the first reconfigurable logic device, e.g., 120_a, and the integrated circuit, e.g., 120_b.
  • each reconfigurable logic device herein may comprise one or more data processing engines, e.g., 5011.
  • Each data processing engine may comprise multiple digital logic circuits.
  • the first reconfigurable logic device may be configured to communicate data with one or more memory devices external thereto.
  • the first reconfigurable logic device may be configured to communicate data with one or more memory devices external thereto via the first reconfigurable routing channels.
  • the first reconfigurable logic device may comprise integrated digital circuits forming an FPGA device.
  • the FPGA device in FIG. 5C includes the first reconfigurable logic device, the DMA connections, and the first reconfigurable routing channels (e.g., NoC and memory controllers).
  • the sequencing system may further comprise one or more memory devices electrically connected for data communication with one or more components of the sequencing system, the one or more components may include one or more of: the first reconfigurable logic device; the integrated circuit; the first reconfigurable routing channels; the one or more memory controllers; the first processor; a second processor; and one or more processors of the sequencing system.
  • the sequencing system further comprises one or more direct memory access (DMA) connections, e.g., 5012 in FIG. 5C, that are in data communication with the plurality of data processing engines and the first reconfigurable routing channels, e.g., 5013 in FIG. 5C.
  • the DMA connections may be configured to actively communicate data between components of the sequencing system.
  • the DMA connections may be configured to fetch data from or send data to components connected thereto, e.g., the data processing engines (e.g., 5011 in FIG. 5C) and the reconfigurable routing channels (e.g., 5013 in FIG. 5C).
  • the DMA connections herein may be configured to actively request data from, or actively send data directly to: the first reconfigurable logic device; the first reconfigurable routing channels; the integrated circuit; or a combination thereof.
  • One or more direct memory access (DMA) connections may be in data communication with the first reconfigurable routing channels and the integrated circuit herein.
  • the DMA connections may be configured to allow data communication based on a predetermined protocol, e.g., a PCIe protocol.
  • the first reconfigurable routing channels are configured to allow data communication between the first reconfigurable logic device and one or more memory devices.
  • the one or more DMA connections and the first reconfigurable routing channels are configured to allow data communication between the first reconfigurable logic device and the integrated circuit.
  • the sequencing system further comprises an integrated circuit that is different from the first reconfigurable logic device, e.g., 120_b in FIG. 5C.
  • the integrated circuit herein may not be reconfigurable.
  • the integrated circuit may comprise an application specific integrated circuit (ASIC) chip.
  • the integrated circuit comprises a neural processing unit (NPU) or an artificial intelligence (AI) chip.
  • the integrated circuit may comprise a second plurality of data processing engines, each data processing engine comprising multiple digital logic circuits.
  • the integrated circuit may further comprise: a second plurality of data processing engines and second routing channels, each of the second routing channels connecting at least some of the second plurality of data processing engines.
  • the sequencing system further comprises a first processor.
  • the first processor may be configured to selectively activate or deactivate different combinations of the first plurality of data processing engines and the first reconfigurable routing channels to perform the operations disclosed herein.
  • the sequencing system further comprises a second processor.
  • the second processor may be configured to control digital circuits of the integrated circuit herein.
  • the first processor, or a second processor (e.g., of the integrated circuit), is configured to selectively activate or deactivate different combinations of the second plurality of data processing engines and the second reconfigurable routing channels to perform the operations.
  • the first processor or a second processor may be configured to selectively activate or deactivate different combinations of the second plurality of data processing engines and the second reconfigurable routing channels to perform the operations herein.
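Selective activation of engine combinations can be pictured as a processor maintaining an enable bitmask over the parallel engines and dispatching work only to enabled ones. The following is a toy model under that assumption; the `EngineController` class, the bitmask encoding, and round-robin dispatch are hypothetical illustrations, not the disclosed control logic.

```python
from typing import Callable, Dict, List


class EngineController:
    """Toy model of a processor enabling a subset of parallel engines.

    Hypothetical: engines are indexed 0..n-1 and a configuration is a bitmask.
    """

    def __init__(self, n_engines: int) -> None:
        self.n_engines = n_engines
        self.mask = 0  # all engines start deactivated

    def activate(self, *ids: int) -> None:
        for i in ids:
            self.mask |= 1 << i

    def deactivate(self, *ids: int) -> None:
        for i in ids:
            self.mask &= ~(1 << i)

    def active(self) -> List[int]:
        return [i for i in range(self.n_engines) if (self.mask >> i) & 1]

    def dispatch(self, tasks: List[Callable[[], object]]) -> Dict[int, list]:
        # Round-robin the tasks over the currently active engines only.
        engines = self.active()
        if not engines:
            raise RuntimeError("no engines active")
        out: Dict[int, list] = {e: [] for e in engines}
        for k, task in enumerate(tasks):
            out[engines[k % len(engines)]].append(task())
        return out
```

For example, activating engines 0 and 2 and dispatching three tasks places the first and third results on engine 0 and the second on engine 2.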
  • the sequencing system may further comprise a housing that encloses the first reconfigurable logic device, the first reconfigurable routing channels, the one or more DMA connections, the integrated circuit, and the first processor therein.
  • the sequencing system further comprises: a housing that encloses at least the first reconfigurable logic device therein and the integrated circuit is external to the housing.
  • the sequencing system further comprises: a power source that is configured to supply different power levels to the first reconfigurable logic device and the integrated circuit.
  • a first power level supplied by the power source to the first reconfigurable logic device may be higher than a second power level supplied to the integrated circuit while a sequencing run and/or sequencing analysis is in progress.
  • a maximum power output of the power source of the sequencing system is 2x, 3x, 5x, 8x, 10x, or 20x lower than the maximum power output of the power source of other sequencers, e.g., traditional sequencers without the first reconfigurable logic device (e.g., FPGA), the integrated circuit (e.g., AI chip), or both.
  • the time consumption in performing a sequencing run and corresponding sequencing analysis (e.g., primary analysis) thereof using the sequencing system is 2x, 3x, 5x, 8x, 10x, or 20x lower than the time consumption in performing the same sequencing run using a sequencer without the first reconfigurable logic device, the integrated circuit, or both (e.g., a traditional sequencer without FPGA and/or AI chips).
  • Time consumption in performing a sequencing run and sequencing analysis of the sequencing run (e.g., primary analysis) using the sequencing system is 2x, 3x, 5x, 8x, 10x, or 20x lower than the time consumption in performing the same sequencing run and analysis using a sequencer without the first reconfigurable logic device, the integrated circuit, or both (e.g., a traditional sequencer without FPGA and/or AI chips).
  • a maximum power output of the power source to the sequencing system in performing a sequencing run and corresponding sequencing analysis thereof is less than 900 Watts, 800 Watts, 700 Watts, 650 Watts, 600 Watts, 550 Watts, or 500 Watts.
  • the power source may be configured to supply a first power level to the first reconfigurable logic device, wherein the first power level is less than 500 Watts, 400 Watts, 350 Watts, or 300 Watts.
  • the power source may be configured to supply a second power level to the integrated circuit, wherein the second power level is less than 450 Watts, 400 Watts, 350 Watts, or 300 Watts.
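The power limits above can be checked against a planned configuration with simple arithmetic. A minimal sketch, assuming the loosest listed bounds as defaults (900 W total, 500 W for the reconfigurable logic device, 450 W for the integrated circuit) and including the optional embodiment in which the logic device draws more than the integrated circuit; `other_w` is a hypothetical allowance for the remaining subsystems (e.g., optics and fluidics).

```python
from typing import List


def check_power_budget(fpga_w: float, ic_w: float, other_w: float,
                       max_total_w: float = 900.0,
                       max_fpga_w: float = 500.0,
                       max_ic_w: float = 450.0) -> List[str]:
    """Return the violated constraints (an empty list means the plan fits).

    Defaults are the loosest upper bounds listed above; stricter values
    (e.g. 500 W total or 300 W per device) can be passed in instead.
    """
    problems = []
    if fpga_w >= max_fpga_w:
        problems.append("FPGA level not below %.0f W" % max_fpga_w)
    if ic_w >= max_ic_w:
        problems.append("integrated circuit level not below %.0f W" % max_ic_w)
    # Optional embodiment: logic device supplied more power than the IC.
    if fpga_w <= ic_w:
        problems.append("FPGA level not higher than integrated circuit level")
    total = fpga_w + ic_w + other_w
    if total >= max_total_w:
        problems.append("total %.0f W not below %.0f W" % (total, max_total_w))
    return problems
```

For instance, 250 W + 150 W + 300 W fits the defaults, while raising `other_w` to 600 W pushes the total to 1000 W and is reported.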
  • one or more components of the first reconfigurable logic device and/or integrated circuit may include a computational performance of at least 2, 4, 8, 10, 16, 20, 30, 40, 50, 60, 70, 80, or 100 Giga-operations per second (GOPs) or more.
  • one or more processing engines of the first reconfigurable logic device and/or integrated circuit may include a computational performance of at least 12, 4, 8, 10, 16, 20, 30, 40, 50, 60, 70, 80, or 100 Giga-operations per second (GOPs), or more.
  • the first reconfigurable logic device and/or the integrated circuit includes a computational performance of at least 10, 20, 40, 50, 60, 80, or 100 Tera-operations per second (TOPs).
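To relate a GOPs/TOPs rating to the work of a single convolution layer, one can count operations and divide by throughput. A minimal sketch, assuming stride 1 with "same" padding, counting each multiply-accumulate as two operations (a common convention, though vendors differ), and ignoring bias terms and memory traffic; the function names and the `utilization` factor are hypothetical.

```python
from typing import Tuple


def conv3d_ops(m: int, c_in: int, c_out: int, out_shape: Tuple[int, int, int]) -> int:
    """Operations for one m x m x m 3D convolution layer.

    Assumes stride 1 and 'same' padding (output has out_shape voxels) and
    counts each multiply-accumulate as two operations.
    """
    d, h, w = out_shape
    macs = (m ** 3) * c_in * c_out * d * h * w
    return 2 * macs


def seconds_at(ops: int, tera_ops_per_s: float, utilization: float = 1.0) -> float:
    # utilization < 1 models the fraction of peak throughput actually achieved.
    return ops / (tera_ops_per_s * 1e12 * utilization)
```

For example, a 3x3x3 layer mapping 16 to 32 channels over a 16 x 512 x 512 volume needs 27 x 2^32 (about 1.16e11) operations, i.e. about 11.6 ms at a sustained 10 TOPs; sustained throughput on real hardware is typically below peak.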
  • one or more components are located on a first printed circuit board (PCB).
  • the one or more components may include: the first reconfigurable logic device; the first reconfigurable routing channels; the first processor; and the one or more DMA connections.
  • the integrated circuit is located on a second printed circuit board (PCB) different from the first printed circuit board, e.g., as shown in FIG. 5C.
  • the integrated circuit and the second PCB may be positioned within the same housing of the sequencing system as the first PCB or external to the housing of the sequencing system. Placing the integrated circuit on a separate PCB makes connecting the first reconfigurable logic device, e.g., an FPGA device, with various integrated circuits on a chip convenient, efficient, and easily customizable.
  • the first PCB may be a main board.
  • the second PCB may be a daughter board or edge unit.
  • the sequencing system lacks any graphics processing units (GPUs) or tensor processing units (TPUs). Instead, the sequencing system utilizes FPGAs, AI chips, NPUs, or other ASIC chips for performing the operations disclosed herein.
  • the sequencing system disclosed herein advantageously requires less power, generates less heat, and reduces the hardware complexity and costs for performing NGS sequencing runs and corresponding sequencing analysis than sequencers that use GPUs or TPUs.
  • the sequencing systems include logic devices that are not limited to reconfigurable logic devices (e.g., FPGAs) and/or other integrated circuits (e.g., Al chips, NPUs).
  • the sequencing systems include various types of processing units or processors configured for reconfigurable parallel processing.
  • the sequencing systems include various types of logic devices or integrated circuits, e.g., ASIC chips.
  • the sequencing systems include GPUs, TPUs, or other various types of processing units that are configured to perform one or more operations disclosed herein.
  • the sequencing systems include GPUs, TPUs, or other various types of processing units that are configured to perform one or more operations that can be performed by the reconfigurable logic devices (e.g., FPGAs) and/or other integrated circuits (e.g., AI chips, NPUs).
  • the first processor may be positioned on the first PCB board together with the reconfigurable logic device for convenient and efficient control of the reconfigurable logic device.
  • the first processor is a separate processor from one or more processors of the sequencing system configured to control the optical system, the fluidics of the sequencing system, etc.
  • the first processor can be configured to control only the components on the first PCB, e.g., the FPGA device, alone or in combination with components on the second PCB, e.g., the AI chip.
  • the sequencing system may comprise a second processor that is configured to separately control the AI chip.
  • the first processor or second processor of the sequencing system, e.g., 120_c, may comprise a CPU.
  • the one or more hardware processors of the sequencing system comprise a CPU.
  • the first or second processor comprises only CPU(s).
  • the sequencing system may further comprise a heat dissipator configured to maintain a system temperature in a range from 0 degrees to 120 degrees Celsius or less than 120 degrees Celsius.
  • the operation for processing the sensor data to generate the first plurality of flow cell images comprises one or more of: registering the first plurality of flow cell images to a reference coordinate system; adjusting image intensities of the first plurality of flow cell images; color correction of the first plurality of flow cell images; correcting phasing and prephasing of the first plurality of flow cell images; and subtracting background intensities from the first plurality of flow cell images.
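Two of the listed image-processing operations, background subtraction and intensity adjustment, can be sketched in a few lines. This is a minimal illustration under assumed choices: the background is estimated as a low percentile of all pixels, and intensities are scaled so that a high percentile maps to 1.0. The percentile levels and function names are hypothetical; registration, color correction, and phasing correction are not shown.

```python
from typing import List

Image = List[List[float]]


def percentile(values: List[float], q: float) -> float:
    """Lower-index percentile without interpolation, q in [0, 100]."""
    ordered = sorted(values)
    k = max(0, min(len(ordered) - 1, int(q / 100 * (len(ordered) - 1))))
    return ordered[k]


def subtract_background(img: Image, q: float = 5.0) -> Image:
    # Estimate the background as a low percentile of all pixels, clip at zero.
    flat = [v for row in img for v in row]
    bg = percentile(flat, q)
    return [[max(v - bg, 0.0) for v in row] for row in img]


def normalize_intensity(img: Image, q: float = 99.0) -> Image:
    # Scale so the q-th percentile maps to 1.0; leave all-dark images as-is.
    flat = [v for row in img for v in row]
    ref = percentile(flat, q)
    if ref <= 0:
        return img
    return [[v / ref for v in row] for row in img]
```

Applying these per color channel keeps channels comparable before base calling.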
  • each of the one or more operations performed by the first reconfigurable logic device or the integrated circuit is performed in real time. In some embodiments, each of these operations is performed within the time window of performing sequencing reactions and/or imaging of a single sequencing cycle of the sequencing run. In some embodiments, each of these operations is performed within the time window of performing sequencing reactions and/or imaging of a single z-level of a single sequencing cycle.
  • FIG. 5D shows an exemplary embodiment of performing sequencing analysis in parallel with performing a sequencing run.
  • the sequencing run includes multiple sequencing cycles; only part of a single cycle is shown herein.
  • flow cell images are acquired at multiple z-levels from different color channels of an in situ sample.
  • the sequencing reactions are repeatedly performed for each z-level in each cycle within a time window 5601.
  • the operations of the integrated circuit are performed within a processing window 5602 that lies within the time window 5609 of a single sequencing cycle and also within the time window 5601 for sequencing reactions and imaging at a single z-level.
  • the operations of the first reconfigurable logic device are also performed within a processing window 5603 that is within the time window 5609 of each sequencing cycle.
  • the processing windows 5602 and 5603 may be of identical or different duration depending on various factors such as sequencing data, primary analysis algorithms, etc.
  • the operations are not only performed within the processing windows but also completed within them with respect to the data of the current cycle, e.g., the data of a preceding z-level of the current cycle for which sensor data has already been acquired.
  • the operations are completed within the processing windows with respect to the data of a preceding cycle, e.g., the cycle immediately preceding the current cycle.
  • the operations are performed for a single z level in each cycle within a predetermined time window, e.g., 5602, 5603.
  • the predetermined time window is for a single z level in a single sequencing cycle.
  • the predetermined time window is less than 1000 ms, 900 ms, 800 ms, 700 ms, 600 ms, 500 ms, 400 ms, 300 ms, 250 ms, 200 ms, or 100 ms.
  • each of the one or more operations is performed within the predetermined time window and in parallel while the sequencing run is in progress.
  • each of the one or more operations is performed in parallel within the time window in which sequencing, imaging, or both of a subsequent sequencing cycle are completed.
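Whether processing keeps pace with acquisition can be checked with a small pipeline timing model. A minimal sketch, assuming (hypothetically) that z-level k is acquired during [k*w, (k+1)*w], that its data can be processed once acquired and once the previous job has finished, and that "real time" means each job ends before the next z-level finishes acquiring. Function names and the timing model are illustrative.

```python
from typing import List


def pipelined_finish_times(n_zlevels: int, window_ms: float,
                           proc_ms: List[float]) -> List[float]:
    """Finish time of processing for each z-level in a two-stage pipeline."""
    finish = []
    prev = 0.0
    for k in range(n_zlevels):
        # Start when z-level k is fully acquired and the processor is free.
        start = max((k + 1) * window_ms, prev)
        prev = start + proc_ms[k]
        finish.append(prev)
    return finish


def keeps_up(n_zlevels: int, window_ms: float, proc_ms: List[float]) -> bool:
    # Real time if every job finishes within one window after its data arrived,
    # i.e. before acquisition of the following z-level is complete.
    f = pipelined_finish_times(n_zlevels, window_ms, proc_ms)
    return all(t <= (k + 2) * window_ms for k, t in enumerate(f))
```

With a 500 ms window, 200 ms of processing per z-level finishes at 700, 1200, and 1700 ms and keeps up; 600 ms per z-level does not.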
  • the first plurality of flow cell images herein may be obtained from a single z level of a 2D or 3D sample. In some embodiments, the first plurality of flow cell images herein may be obtained from multiple z levels covering at least part of an in situ sample, e.g., of cells or tissue(s). The first plurality of flow cell images may be obtained from one or more color channels at each z level of the multiple z levels covering at least part of the in situ sample. In some embodiments, the first plurality of flow cell images are from a single color channel. In some embodiments, the first plurality of flow cell images are from multiple color channels. In some embodiments, the first plurality of flow cell images are from a single sequencing cycle.
  • the first plurality of flow cell images are from multiple sequencing cycles.
  • the first plurality of flow cell images may be of a first spatial resolution in x, y, and/or z directions.
  • the second plurality of flow cell images may be generated based on the first plurality of flow cell images.
  • the second plurality of flow cell images may be of a second spatial resolution in x, y, and/or z directions.
  • the first spatial resolution may be lower than the second spatial resolution, and a higher resolution herein indicates that a pixel size is smaller so that the polonies in the flow cell images are of finer spatial details.
  • the first spatial resolution may be 2x, 4x, 6x, 8x, 10x, 16x, 24x, 32x, or 48x lower than the second spatial resolution in x, y, and/or z directions.
  • the first spatial resolution may be at least 2x, 4x, 6x, 8x, 10x, 16x, 24x, 32x, or 48x lower than the second spatial resolution in x, y, and/or z directions.
  • the first and second resolutions are in 3D.
  • the first resolution is in a range of 0.1 µm to 5 µm.
  • the second resolution is in a range of 0.01 µm to 2 µm.
  • the second resolution is at least 4, 6, or 8 times greater than the first resolution in all three dimensions.
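The relationship between the first and second resolutions is simple per-axis arithmetic: upsampling by an integer factor divides the pixel size and multiplies the array dimension by that factor. A minimal sketch; the function name, the (z, y, x) ordering, and the example values are illustrative assumptions chosen inside the ranges above.

```python
from typing import Tuple

Vec3 = Tuple[float, float, float]  # (z, y, x)


def upsampled_grid(pixel_um: Vec3, shape: Tuple[int, int, int],
                   factor: Tuple[int, int, int]):
    """Pixel size (µm) and array shape after upsampling by a factor per axis.

    Higher resolution here means a smaller pixel size, as defined above.
    """
    new_pixel = tuple(p / f for p, f in zip(pixel_um, factor))
    new_shape = tuple(n * f for n, f in zip(shape, factor))
    return new_pixel, new_shape
```

For example, 1.0 x 0.4 x 0.4 µm voxels on a 10 x 256 x 256 grid upsampled 4x in all three dimensions become 0.25 x 0.1 x 0.1 µm voxels on a 40 x 1024 x 1024 grid, i.e. 64 times as many voxels to store and process.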
  • the sequencing system further comprises one or more image sensors configured to receive optical signals generated from sequencing reactions of a sample immobilized on a support.
  • the support may comprise a glass or plastic substrate.
  • the support may be included in a flow cell device.
  • the one or more image sensors may be configured to generate sensor data based on the optical signals.
  • the sequencing system further comprises: one or more hardware processors; one or more data storage devices storing instructions executable by the one or more hardware processors to cause the one or more hardware processors to perform operations disclosed herein.
  • the one or more data storage devices may include one or more memory devices.
  • the one or more memory devices may be accessible by the one or more processors, the first processor, the second processor, the first reconfigurable logic device, and the integrated circuit.
  • the one or more processors are separate from the first or second processors.
  • the operations performed by the one or more processors may include one or more of 1) recording sensor data generated in the sequencing system in one or more flow cycles; 2) optionally processing the recorded sensor data; 3) sending the recorded sensor data or the optionally processed data to the first reconfigurable logic device or the integrated circuit; 4) receiving the output of the first reconfigurable logic device or the integrated circuit; and 5) generating sequencing analysis results based on the received output.
  • the operations performed by the one or more processors may include one or more of 1) receiving the output of the first reconfigurable logic device or the integrated circuit; and 2) generating sequencing analysis results based on the received output.
  • the sequencing analysis results comprise primary analysis results.
  • the sequencing analysis results comprise a data file in a predetermined data format.
  • the sequencing analysis results comprise base calls of nucleotide bases in a sample immobilized on a support.
  • the sequencing analysis results comprise quality measurements of base calls of nucleotide bases in a sample immobilized on a support.
  • the sequencing analysis results comprise quality scores corresponding to base calls of nucleotide bases in a sample immobilized on a support.
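One common way to package base calls and their quality scores in a predetermined data format is a FASTQ record with Phred-scaled qualities. A minimal sketch, assuming FASTQ (the disclosure does not name the format), the standard Phred relation Q = -10·log10(P_error), ASCII offset 33, and a hypothetical cap of Q41; the read identifier is illustrative.

```python
import math
from typing import List


def phred(p_error: float, q_max: int = 41) -> int:
    # Q = -10 * log10(P_error), rounded and clamped to [0, q_max].
    if p_error <= 0:
        return q_max
    return max(0, min(q_max, round(-10 * math.log10(p_error))))


def fastq_record(read_id: str, bases: str, p_errors: List[float]) -> str:
    # Four-line FASTQ record; qualities encoded as chr(Q + 33).
    if len(bases) != len(p_errors):
        raise ValueError("one error probability per base is required")
    quals = "".join(chr(phred(p) + 33) for p in p_errors)
    return "@%s\n%s\n+\n%s\n" % (read_id, bases, quals)
```

For example, error probabilities of 0.001, 0.01, and 0.1 map to Q30, Q20, and Q10, encoded as `?`, `5`, and `+`.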
  • the sequencing system further comprises: a sample immobilized on a support; and an optical system comprising: an illumination system; an objective lens and the one or more image sensors.
  • the optical system is configured to emit light to the sample and to collect optical signals emitted from the sample, thereby generating the first plurality of flow cell images.
  • the support may be comprised in a flow cell device.
  • the operation(s) performed by the first reconfigurable routing channels or the integrated circuit using the neural network comprises one or more of: generating quality measurements of the base callings; and generating a data output file based on the base callings.
  • the neural network herein comprises a convolutional neural network (CNN).
  • the neural network comprises a U-Net.
  • the neural network has been pretrained.
  • the neural network has been trained using the first reconfigurable logic device or the integrated circuit.
  • the neural network is a 3D neural network.
  • the first convolution comprises a 3D convolution with a convolution kernel.
  • the convolutional kernel has at least four dimensions.
  • the convolutional kernel is m x m x m x n, wherein m is an integer in a range from 3 to 30, wherein n is an integer.
  • n is an integer from 1 to 16384.
  • the second convolution in operation (1) comprises a corresponding number of n, 2*n, 4*n, and 8*n filters in a first, second, third, and fourth repetition, respectively.
  • the second convolution in (4) comprises a corresponding number of 2*n, 2*n, 4*n, 8*n filters in a last repetition, last minus one, last minus two, and last minus three repetition, respectively.
  • n is in a range from 4 to 1024.
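The filter schedule n, 2*n, 4*n, 8*n together with an m x m x m kernel fixes the size of the encoder's weight tensors. A minimal sketch that counts parameters, assuming (hypothetically) one convolution per repetition, a single input channel, and one bias per filter; real U-Nets usually stack two convolutions per level, so these counts are a lower bound for such designs. The `dims` argument lets the same arithmetic cover the 2D (m x m x n) variant.

```python
from typing import List


def encoder_filters(n: int, depth: int = 4) -> List[int]:
    # n, 2n, 4n, 8n, ... filters in successive repetitions.
    return [n * 2 ** i for i in range(depth)]


def conv_params(m: int, c_in: int, c_out: int, dims: int = 3) -> int:
    # An m^dims kernel per (input, output) channel pair, plus one bias per filter.
    return (m ** dims) * c_in * c_out + c_out


def encoder_param_count(n: int, m: int = 3, in_channels: int = 1, dims: int = 3) -> int:
    """Parameters of a hypothetical encoder with one convolution per repetition."""
    total, c_in = 0, in_channels
    for c_out in encoder_filters(n):
        total += conv_params(m, c_in, c_out, dims)
        c_in = c_out
    return total
```

With n = 16 and m = 3, the filter counts are 16, 32, 64, and 128, giving 290,976 parameters for the 3D encoder versus 97,152 for the 2D one.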
  • the neural network has been trained using the first reconfigurable logic device or the integrated circuit.
  • the neural network is a 2D neural network.
  • the first convolution comprises a 2D convolution with a convolution kernel.
  • the convolutional kernel has at least three dimensions.
  • the convolutional kernel is m x m x n, wherein m is an integer in a range from 3 to 30, wherein n is an integer. In some embodiments, n is an integer from 1 to 16384.
  • the second convolution in operation (1) comprises a corresponding number of n, 2*n, 4*n, and 8*n filters in a first, second, third, and fourth repetition, respectively.
  • the second convolution in (4) comprises a corresponding number of 2*n, 2*n, 4*n, 8*n filters in a last repetition, last minus one, last minus two, and last minus three repetition, respectively.
  • n is in a range from 4 to 1024.
  • the second convolution in operation (1) comprises a corresponding number of n, 2*n, and 4*n filters in the first, second, and third repetitions, respectively.
  • the second convolution in (4) comprises a corresponding number of 2*n, 2*n, and 4*n filters in the last, last minus one, and last minus two repetitions, respectively.
  • n is in a range from 4 to 1024.
  • the neural network is pretrained with 2D flow cell images at multiple z-levels that encompass the 3D volume of the volumetric sample(s).
  • the neural networks pretrained with 2D flow cell images are less complex and require less computational effort in making predictions or inferences, thereby providing higher efficiency and saving time and computational effort in predicting polony locations.
  • the neural network pretrained with 2D flow cell images may predict polony locations per tile per cycle in a time window that is 10x, 50x, 80x, 100x, 200x, 400x, 600x, 800x, 1000x, 1500x, or 2000x shorter than that required to make identical predictions using neural networks trained on 3D volumes of flow cell images.
  • the neural network pretrained with 2D flow cell images may predict polony locations per tile per cycle using the reconfigurable logic device and/or other integrated circuits, e.g., FPGA and/or AI chips, in a time window that is 5x, 10x, 20x, 40x, 50x, 80x, 100x, 200x, 400x, 600x, 800x, or 1000x shorter than that required by an identical neural network running on CPUs or other processors.
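Part of the reason a 2D network is cheaper can be seen from the kernel alone: applying an m x m kernel to every z-slice costs a factor of m fewer multiply-accumulates than applying an m x m x m kernel over the same volume with the same channel widths. The sketch below captures only this per-layer kernel-size factor, assuming stride 1 and "same" padding; it does not attempt to reproduce the speedups listed above, which depend on the full networks and hardware. Function names are hypothetical.

```python
def macs_2d_stack(m: int, c_in: int, c_out: int, z: int, h: int, w: int) -> int:
    # An m x m kernel applied independently to each of z slices.
    return m * m * c_in * c_out * z * h * w


def macs_3d_volume(m: int, c_in: int, c_out: int, z: int, h: int, w: int) -> int:
    # An m x m x m kernel applied once over the whole z x h x w volume
    # (stride 1, 'same' padding, so the output has z * h * w voxels).
    return m * m * m * c_in * c_out * z * h * w
```

For a 3x3 versus 3x3x3 kernel the 3D layer costs exactly three times as much per layer.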
  • the operation (e) performing a corresponding base calling for each of the determined polonies based on the second plurality of flow cell images comprises: performing a corresponding base calling for each of the determined polonies based on the second plurality of flow cell images and based on a fourth plurality of flow cell images, wherein the fourth plurality of images are predicted using a second neural network based on a third plurality of flow cell images.
  • the third plurality of flow cell images are acquired from one or more color channels that are different from the single channel, and the third plurality of flow cell images is of the first resolution.
  • the fourth plurality of flow cell images is of the second resolution.
  • the first plurality of flow cell images are from one or more color channels. In some embodiments, the first plurality of flow cell images are of unbalanced nucleotide diversity. In some embodiments, the first plurality of flow cell images comprises: an unbalanced diversity of nucleotide bases of A, G, C and T/U among concatemer molecules immobilized on the support in one or more flow cycles. In some embodiments, the first plurality of flow cell images comprises: a balanced diversity of nucleotide bases of A, G, C and T/U among concatemer molecules immobilized on the support in one or more cycles. In some embodiments, two or more different concatemer molecules among the concatemer molecules have different insert sequences.
  • the first plurality of flow cell images comprises optical signals emitted from nucleotide reagents bound to a balanced diversity of nucleotide bases of A, G, C and T/U among the concatemer molecules immobilized on the support. In some embodiments, the first plurality of flow cell images comprises optical signals emitted from nucleotide reagents bound to an unbalanced diversity of nucleotide bases of A, G, C and T/U among the concatemer molecules immobilized on the support in the one or more subsequent cycles.
  • the unbalanced diversity of nucleotide bases of A, G, C and T/U among the concatemer molecules comprises: a percentage of (1) a number of one or more types of nucleotide bases to (2) a total number of bases is less than 20%, 15%, 10%, or 5% in the one or more cycles.
  • the balanced diversity of nucleotide bases of A, G, C and T/U among the plurality of concatemer molecules comprises: a percentage of (1) a number of each type of nucleotide bases to (2) a total number of bases in the one or more cycles is more than 10%, 15%, or 20%.
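The two diversity definitions above both reduce to per-cycle base fractions. The sketch below is a hypothetical illustration, not the disclosed method: it assumes the input is a string of base identities observed across polonies in one cycle, and the thresholds (5% for unbalanced, 10% for balanced) are picked from the listed ranges for illustration only.

```python
from collections import Counter

BASES = "ACGT"  # U in RNA templates is counted as T


def base_fractions(calls):
    """Fraction of each base among the calls made across polonies in one cycle."""
    normalized = calls.upper().replace("U", "T")
    counts = Counter(b for b in normalized if b in BASES)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no A/C/G/T/U calls in this cycle")
    return {b: counts[b] / total for b in BASES}


def classify_cycle(calls, unbalanced_below=0.05, balanced_above=0.10):
    """Label a cycle by its least frequent base (thresholds are hypothetical).

    'unbalanced' if any base falls below `unbalanced_below`,
    'balanced' if every base exceeds `balanced_above`, else 'intermediate'.
    """
    lowest = min(base_fractions(calls).values())
    if lowest < unbalanced_below:
        return "unbalanced"
    if lowest > balanced_above:
        return "balanced"
    return "intermediate"
```

Because the stated ranges overlap (for example, "less than 20%" versus "more than 10%"), a concrete pipeline has to fix one threshold pair; the "intermediate" label makes the gap between them explicit.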
  • the cellular sample comprises overloaded concatemer molecules with a spatial density in a range of 10² to 10¹⁵ per mm². In some embodiments, the cellular sample comprises overloaded concatemer molecules with a spatial density in a range of 10³ to 10¹⁰ per mm².
  • the down-sampling factor is 2, 4, or 8. In some embodiments, the up-sampling factor is 2, 4, or 8. In some embodiments, the downsampling factor is 2, 4, 8, 16, 32, 64, or more. In some embodiments, the up-sampling factor is 2, 4, 8, 16, 32, 64, or more.
  • one or more of operations (a) to (k) are performed while a sequencing run is being performed.
  • the first plurality of flow cell images are acquired in sequencing cycles ranging from 1 to 500.
  • the one or more cycles comprises a current cycle N.
  • N is in a range from 1 to 500.
  • the one or more cycles comprises a single cycle ranging from 1 to 500.
  • the one or more cycles comprises multiple cycles ranging from 1 to 500.
  • one or more operations, e.g., operations (a) to (j), are performed while the sequencing reactions in cycles subsequent to the current cycle N are yet to be performed or are currently being performed.
  • the training data set of flow cell images comprises z-stacks of flow cell images taken at different z-locations, and each z-stack is used as a 3D volume for training the neural network.
  • the training data set of flow cell images comprises 2D flow cell images taken at different z-locations, and individual 2D flow cell images at multiple z-levels are used as 2D images for training the neural network.
  • the training data set of flow cell images comprises simulated flow cell images of in situ samples at different z-locations. In some embodiments, the training data set of flow cell images comprises actual flow cell images acquired from in situ samples at different z-locations.
  • performing the first convolution in one or more dimensions on the first plurality of flow cell images comprises: performing a first convolution in 3D on the first plurality of flow cell images, thereby generating a first convolution result.
  • performing a second convolution in one or more dimensions on the first convolution result, thereby generating a second convolution result comprises: performing the second convolution in 3D on the first convolution result, thereby generating a second convolution result.
  • performing the first convolution in one or more dimensions on the first plurality of flow cell images comprises: performing a first convolution in 2D on the first plurality of flow cell images, thereby generating a first convolution result.
  • performing a second convolution in one or more dimensions on the first convolution result, thereby generating a second convolution result comprises: performing the second convolution in 2D on the first convolution result, thereby generating a second convolution result.
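The 3D and 2D variants above differ only in the kernel's extent along z. The following pure-Python sketch (hypothetical helper names; a ReLU after each convolution is an assumption, since the text does not name an activation) applies a first convolution to a z-stack and a second convolution to the first result. A kernel with depth 1 convolves each z-plane independently, which is the 2D case.

```python
def conv3d_same(volume, kernel):
    """Zero-padded 'same' convolution of a [z][y][x] volume with a [kz][ky][kx] kernel.

    As in most CNN frameworks this is a cross-correlation (the kernel is not flipped).
    A kernel with kz == 1 convolves each z-plane independently, i.e. a 2D convolution.
    """
    depth, height, width = len(volume), len(volume[0]), len(volume[0][0])
    kz, ky, kx = len(kernel), len(kernel[0]), len(kernel[0][0])
    if kz % 2 == 0 or ky % 2 == 0 or kx % 2 == 0:
        raise ValueError("kernel sizes must be odd for 'same' padding")
    rz, ry, rx = kz // 2, ky // 2, kx // 2
    out = [[[0.0] * width for _ in range(height)] for _ in range(depth)]
    for z in range(depth):
        for y in range(height):
            for x in range(width):
                acc = 0.0
                for dz in range(kz):
                    zz = z + dz - rz
                    if not 0 <= zz < depth:
                        continue  # zero padding outside the stack
                    for dy in range(ky):
                        yy = y + dy - ry
                        if not 0 <= yy < height:
                            continue
                        for dx in range(kx):
                            xx = x + dx - rx
                            if 0 <= xx < width:
                                acc += kernel[dz][dy][dx] * volume[zz][yy][xx]
                out[z][y][x] = acc
    return out


def relu(volume):
    """Element-wise ReLU activation (an assumed nonlinearity)."""
    return [[[max(v, 0.0) for v in row] for row in plane] for plane in volume]


def two_convolutions(z_stack, first_kernel, second_kernel):
    """First convolution on the images, second convolution on the first result."""
    first_result = relu(conv3d_same(z_stack, first_kernel))
    second_result = relu(conv3d_same(first_result, second_kernel))
    return first_result, second_result
```

A 3D kernel lets each output voxel draw on neighboring z-levels, which is what allows out-of-focus signal from adjacent planes to inform the prediction; the 2D variant treats each z-level as an independent image.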
  • repetitively performing up-sampling operations comprises: (3) performing an up-sampling of the fourth convolution result by an up-sampling factor, thereby generating a first up-sampled result; (4) concatenating the first up-sampled result in a current up-sampling repetition with the first down-sampled result in a previous down-sampling repetition, wherein the first up-sampled result has the same size as the first down-sampled result in the previous down-sampling repetition; and (5) performing the second convolution in one or more dimensions on the first up-sampled result, thereby generating a fifth convolution result.
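One up-sampling repetition, steps (3) to (5), can be sketched as below. Feature maps are lists of channels, each a [z][y][x] nested list. Two simplifications are assumptions, not the disclosed design: nearest-neighbor repetition stands in for the up-sampling operator (a transposed convolution is a common alternative), and a 1x1x1 convolution stands in for the second convolution, which may use larger m x m x m kernels.

```python
def shape(channel):
    """(depth, height, width) of one [z][y][x] channel."""
    return (len(channel), len(channel[0]), len(channel[0][0]))


def upsample_nearest(channel, factor):
    """Repeat every voxel `factor` times along z, y and x."""
    return [
        [[v for v in row for _ in range(factor)] for row in plane for _ in range(factor)]
        for plane in channel
        for _ in range(factor)
    ]


def pointwise_conv(channels, weights):
    """1x1x1 convolution: output channel o is sum_i weights[o][i] * channels[i]."""
    d, h, w = shape(channels[0])
    out = []
    for row_weights in weights:
        if len(row_weights) != len(channels):
            raise ValueError("one weight per input channel is required")
        out.append([[[sum(wt * ch[z][y][x] for wt, ch in zip(row_weights, channels))
                      for x in range(w)] for y in range(h)] for z in range(d)])
    return out


def decoder_step(features, skip, factor, weights):
    """(3) up-sample, (4) concatenate with the skip connection, (5) convolve."""
    up = [upsample_nearest(c, factor) for c in features]
    if shape(up[0]) != shape(skip[0]):
        raise ValueError("up-sampled result and down-sampled result differ in size")
    merged = up + skip  # concatenation along the channel axis
    return pointwise_conv(merged, weights)
```

The size check mirrors the requirement that the up-sampled result match the down-sampled result it is concatenated with; in a U-Net this skip connection reintroduces fine spatial detail lost during down-sampling.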
  • the different combinations of the first plurality of data processing engines are configured to perform operations further comprising: (a) receiving the second plurality of flow cell images from the integrated circuit; (b) determining polonies from the second plurality of flow cell images; (c) performing a corresponding base calling for each of the determined polonies based on the second plurality of flow cell images; and (d) forwarding the second plurality of flow cell images, the determined polonies, the corresponding base callings, or a combination thereof to the first processor or one or more hardware processors of the sequencing system.
  • the one or more operations performed by the first reconfigurable logic device further comprises: forwarding the second plurality of flow cell images, the determined polonies, the corresponding base callings, or a combination thereof to the first processor or one or more hardware processors of the sequencing system.
  • the one or more operations performed by the integrated circuit further comprises forwarding the second plurality of flow cell images, the corresponding base callings, or both to the first reconfigurable logic device, the first processor or one or more hardware processors of the sequencing system.
  • the operations performed by the first reconfigurable logic device or the integrated circuit further comprising: registering the second plurality of flow cell images to a common coordinate system.
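Registration to a common coordinate system can be illustrated with a brute-force integer-translation search. This is a hypothetical sketch: production pipelines typically use FFT-based phase correlation and sub-pixel refinement, and `max_shift` is an assumed search bound. The shift that maximizes the cross-correlation against a reference image is estimated, then used to resample the moving image into the reference frame.

```python
def estimate_shift(reference, moving, max_shift=3):
    """Integer (dy, dx) maximizing the cross-correlation of `moving` against `reference`."""
    height, width = len(reference), len(reference[0])
    best_score, best_shift = None, (0, 0)
    for dy in range(-max_shift, max_shift + 1):
        for dx in range(-max_shift, max_shift + 1):
            score = 0.0
            for y in range(height):
                yy = y + dy
                if not 0 <= yy < height:
                    continue
                for x in range(width):
                    xx = x + dx
                    if 0 <= xx < width:
                        score += reference[y][x] * moving[yy][xx]
            if best_score is None or score > best_score:
                best_score, best_shift = score, (dy, dx)
    return best_shift


def apply_shift(moving, dy, dx):
    """Resample `moving` into the reference frame; pixels from outside the image are 0."""
    height, width = len(moving), len(moving[0])
    return [[moving[y + dy][x + dx] if 0 <= y + dy < height and 0 <= x + dx < width else 0
             for x in range(width)] for y in range(height)]
```

Usage: `dy, dx = estimate_shift(ref, img)` followed by `apply_shift(img, dy, dx)` yields an image whose polonies sit at the reference coordinates, so intensities from different cycles can be compared pixel by pixel.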
  • the operations performed by the integrated circuit further comprising one or more of: determining polonies from the second plurality of flow cell images; performing a corresponding base call for each of the determined polonies based on the second plurality of flow cell images; and forwarding the second plurality of flow cell images, the corresponding base callings, or both to the first reconfigurable logic device, the first processor, or one or more hardware processors of the sequencing system.
  • the operation (d) or (i) of determining polonies from the second plurality of flow cell images comprises: generating a 3D polony map comprising spatial location of polonies based on the determined polonies.
  • the operation of generating a 3D polony map comprising spatial location of polonies based on the determined polonies may further comprise: deleting duplicate polonies from the determined polonies, wherein the duplicate polonies are out-of-focus.
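Duplicate removal can be sketched as a greedy suppression across z-levels. The assumptions here are hypothetical: each detection is an `(x, y, z, intensity)` tuple, an out-of-focus copy of a polony appears at nearly the same (x, y) on a nearby z-level, and the in-focus copy is the brightest. `xy_radius` and `max_dz` are illustrative parameters.

```python
import math


def deduplicate_polonies(detections, xy_radius=1.5, max_dz=2):
    """Keep the brightest detection among laterally overlapping detections on nearby z-levels.

    detections: list of (x, y, z, intensity). Detections on the same z-level are never
    merged, because only out-of-focus copies on other z-levels count as duplicates.
    """
    kept = []
    for x, y, z, intensity in sorted(detections, key=lambda d: d[3], reverse=True):
        duplicate = any(
            math.hypot(x - kx, y - ky) <= xy_radius and 0 < abs(z - kz) <= max_dz
            for kx, ky, kz, _ in kept
        )
        if not duplicate:
            kept.append((x, y, z, intensity))
    return kept
```

Visiting detections from brightest to dimmest means each suppressed detection is dimmer than the polony it duplicates, which matches the premise that duplicates are the out-of-focus copies.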
  • the operation of determining polonies from the second plurality of flow cell images comprises: superimposing the second plurality of flow cell images with corresponding cell staining images; and generating the polony map by only including polonies that are within cell boundaries in the corresponding cell staining images.
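Restricting the polony map to cell interiors amounts to a lookup in a segmentation mask. In this sketch the cell-staining image is assumed to be already registered to the flow cell images and converted into a label mask (0 for background, a positive cell ID inside a cell). This representation is a hypothetical choice; a 3D polony map would use the same check against a 3D mask.

```python
def polonies_in_cells(polonies, label_mask):
    """Keep polonies whose nearest pixel lies inside a segmented cell.

    polonies: list of (x, y) positions in image coordinates.
    label_mask[y][x]: 0 for background, positive cell ID otherwise (hypothetical format).
    Returns (x, y, cell_id) tuples, so each kept polony is also assigned to a cell.
    """
    height, width = len(label_mask), len(label_mask[0])
    kept = []
    for x, y in polonies:
        row, col = int(round(y)), int(round(x))
        if 0 <= row < height and 0 <= col < width:
            cell_id = label_mask[row][col]
            if cell_id:
                kept.append((x, y, cell_id))
    return kept
```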
  • Exemplary embodiments of methods for generating the polony maps are disclosed in U.S. Patent Application No. 18/078,820 and PCT Application No. PCT/US2023/076125, which are incorporated by reference in their entireties.
  • sequencing methods comprising operations herein. Such operations may include one or more of: (a) obtaining, by a first reconfigurable logic device of a sequencing system, sensor data from one or more sensors of the sequencing system; (b) processing, by the first reconfigurable logic device, the sensor data to generate a first plurality of flow cell images; (c) predicting, by the first reconfigurable logic device, a second plurality of flow cell images using a neural network at least partly deployed on the first reconfigurable logic device and based on the sensor data or the first plurality of flow cell images; (d) determining, by the first reconfigurable logic device, polonies from the second plurality of flow cell images; (e) performing, by the first reconfigurable logic device, a corresponding base calling for each of the determined polonies based on the second plurality of flow cell images; and (f) optionally forwarding, by the first reconfigurable logic device, the second plurality of flow cell images, the corresponding base calling,
  • sequencing methods comprising operations herein. Such operations may include one or more of (a) obtaining, by the first reconfigurable logic device, sensor data from one or more image sensors of the sequencing system; (b) processing, by the first reconfigurable logic device, the sensor data to generate a first plurality of flow cell images; (c) communicating, by the first reconfigurable logic device to an integrated circuit, the sensor data, the first plurality of flow cell images, or both; (d) receiving, by the integrated circuit and from the first reconfigurable logic device, the sensor data, the first plurality of flow cell images, or both; (e) predicting, by the integrated circuit, a second plurality of flow cell images using the neural network based on the sensor data, the first plurality of flow cell images, or both; (f) determining, by the integrated circuit, polonies from the second plurality of flow cell images; and (g) performing, by the integrated circuit, a corresponding base calling for each of the determined polonies based on
  • sequencing methods comprising operations herein.
  • Such operations may include one or more of: (a) obtaining, by the first reconfigurable logic device of a sequencing system, sensor data from one or more image sensors of the sequencing system to generate the first plurality of flow cell images; (b) communicating, by the first reconfigurable logic device, the sensor data, the first plurality of flow cell images, or both to the integrated circuit; (c) receiving, by the integrated circuit of the sequencing system, the sensor data, the first plurality of flow cell images, or both from the first reconfigurable logic device; (d) predicting, by the integrated circuit, a second plurality of flow cell images using a neural network deployed at least partly on the integrated circuit and based on the sensor data, the first plurality of flow cell images, or both; and (e) communicating, by the integrated circuit, the second plurality of flow cell images to the first reconfigurable logic device or one or more hardware processors of the sequencing system.
  • the first reconfigurable routing channels comprise one or more electronic nodes, and the electronic nodes are programmable.
  • the electronic nodes here may include junction points in the circuit(s).
  • the electronic nodes may include points where two or more circuit elements are connected together.
  • the first reconfigurable routing channels comprise one or more interconnects.
  • the interconnects may include the physical wiring that connects transistors and other components on an integrated circuit.
  • the reconfigurable routing channels comprise one or more memory controllers, e.g., 5013 in FIG. 5C.
  • the first reconfigurable routing channels comprise one or more networks-on-chip (NoCs), e.g., 5013 in FIG. 5C.
  • the first reconfigurable routing channels may comprise one or more of a network-on-chip (NoC) and a memory controller.
  • the first reconfigurable routing channels may be configured to passively communicate data between components of the sequencing system.
  • the reconfigurable routing channels may be configured to communicate data bidirectionally between the data processing engines, e.g., 5011 in FIG. 5C, and the memory device, e.g., 5030 in FIG. 5C.
  • the first reconfigurable routing channels may be configured to allow data communication between the first reconfigurable logic device and one or more memory devices.
  • the first reconfigurable routing channels may be configured to allow data communication between the first reconfigurable logic device and the integrated circuit.
  • the reconfigurable logic devices herein may each comprise one or more data processing engines.
  • Each data processing engine may comprise multiple digital logic circuits.
  • the first reconfigurable logic device may be configured to communicate data with one or more memory devices external thereto.
  • the first reconfigurable logic device may be configured to communicate data with one or more memory devices external thereto via the first reconfigurable routing channels.
  • the first reconfigurable logic device may comprise a first integrated circuit forming an FPGA device.
  • the FPGA device in FIG. 5C includes the first reconfigurable logic device, the DMA connections, and the first reconfigurable routing channels (e.g., NoC and memory controllers).
  • the sequencing system may further comprise one or more memory devices electrically connected for data communication with one or more components of the sequencing system; the one or more components may include one or more of: the first reconfigurable logic device; the integrated circuit; the first reconfigurable routing channels; the one or more memory controllers; the first processor; a second processor; and one or more processors of the sequencing system.
  • the sequencing system further comprises one or more direct data access (DMA) connections, e.g., 5012 in FIG. 5C, that are in data communication with the plurality of data processing engines and the first reconfigurable routing channels, e.g., 5013 in FIG. 5C.
  • the DMA connections may be configured to actively communicate data between components of the sequencing system.
  • the DMA connections may be configured to fetch data or send data to components that are connected thereto, e.g., the data processing engines, e.g., 5011 in FIG. 5C and the reconfigurable routing channels, e.g., 5013 in FIG. 5C.
  • the DMA connections herein may be configured to actively request data from, or actively send data directly to: the first reconfigurable logic device; the first reconfigurable routing channels; the integrated circuit; or a combination thereof.
  • One or more direct data access (DMA) connections may be in data communication with the first reconfigurable routing channels and the integrated circuit herein.
  • the DMA connections may be configured to allow data communication based on a predetermined protocol, e.g., a PCIe protocol.
  • the first reconfigurable routing channels are configured to allow data communication between the first reconfigurable logic device and one or more memory devices.
  • the one or more DMA connections and the first reconfigurable routing channels are configured to allow data communication between the first reconfigurable logic device and the integrated circuit.
  • the sequencing system further comprises an integrated circuit that is different from the first reconfigurable logic device, e.g., 120_b in FIG. 5C.
  • the integrated circuit herein may not be reconfigurable.
  • the integrated circuit may comprise an application specific integrated circuit (ASIC) chip.
  • the integrated circuit comprises a neural processing unit (NPU) or an artificial intelligence (AI) chip.
  • the integrated circuit may comprise a second plurality of data processing engines, each data processing engine comprising multiple digital logic circuits.
  • the integrated circuit may further comprise: a second plurality of data processing engines; and second routing channels, each connecting at least some of the second plurality of data processing engines.
  • the sequencing system further comprises a first processor.
  • the first processor may be configured to selectively activate or deactivate different combinations of the first plurality of data processing engines and the first reconfigurable routing channels to perform the operations disclosed herein.
  • the first processor or a second processor is configured to selectively activate or deactivate different combinations of the second plurality of data processing engines and the second reconfigurable routing channels to perform the operations.
  • the sequencing system may further comprise a housing that encloses the first reconfigurable logic device, the first reconfigurable routing channels, the one or more DMA connections, the integrated circuit, and the first processor therein.
  • the sequencing system further comprises: a housing that encloses at least the first reconfigurable logic device therein and the integrated circuit is external to the housing.
  • the sequencing system further comprises: a power source that is configured to supply different power levels to the first reconfigurable logic device and the integrated circuit.
  • a first power level supplied by the power source to the first reconfigurable logic device may be higher than a second power level supplied to the integrated circuit while a sequencing run and/or sequencing analysis is in progress.
  • a maximum power output of the power source of the sequencing system is 2x, 3x, 5x, 8x, 10x, or 20x lower than the maximum power output of the power source of sequencers, e.g., traditional sequencers without the first reconfigurable logic device (e.g., FPGA), the integrated circuit (e.g., AI chip), or both.
  • the time consumption in performing a sequencing run and the sequencing analysis thereof (e.g., primary analysis) using the sequencing system is 2x, 3x, 5x, 8x, 10x, or 20x lower than the time consumption in performing the same sequencing run and analysis using a sequencer without the first reconfigurable logic device, the integrated circuit, or both (e.g., a traditional sequencer without FPGA and/or AI chips).
  • a maximum power output of the power source to the sequencing system in performing a sequencing run and corresponding sequencing analysis thereof is less than 900 Watts, 800 Watts, 700 Watts, 650 Watts, 600 Watts, 550 Watts, or 500 Watts.
  • the sequencing system further comprises a power source configured to supply a first power level to the first reconfigurable logic device, the first power level is less than 500 Watts, 400 Watts, 350 Watts, or 300 Watts.
  • the sequencing system further comprises a power source configured to supply a second power level to the integrated circuit, the second power level is less than 450 Watts, 400 Watts, 350 Watts, or 300 Watts.
  • one or more components are located on a first printed circuit board (PCB).
  • the one or more components may include: the first reconfigurable logic device; the first reconfigurable routing channels; the first processor; and the one or more DMA connections.
  • the integrated circuit is located on a second printed circuit board (PCB) different from the first printed circuit board, e.g., as shown in FIG. 5C.
  • the integrated circuit and the second PCB may be positioned within the same housing of the sequencing system as the first PCB or external to the housing of the sequencing system. Being on a separate PCB makes connecting the first reconfigurable logic device (e.g., an FPGA device) with various integrated circuit chips convenient, efficient, and easily customizable.
  • the first PCB may be a main board.
  • the second PCB may be a daughter board.
  • the sequencing system lacks any graphics processing units (GPUs) or tensor processing units (TPUs). Instead, the sequencing system utilizes FPGAs, AI chips, NPUs, or other ASIC chips for performing the operations disclosed herein.
  • the sequencing system disclosed herein advantageously requires less power, generates less heat, and reduces the hardware costs for performing NGS sequencing runs and corresponding sequencing analysis.
  • the first processor may be positioned on the first PCB board together with the reconfigurable logic device for convenient and efficient control of the reconfigurable logic device.
  • the first processor is a separate processor from one or more processors of the sequencing system configured to control the optical system, the fluidics of the sequencing system, etc.
  • the first processor can be configured to only control the components on the first PCB board, e.g., the FPGA device, alone or in combination with components on the second PCB board, e.g., the AI chip.
  • the sequencing system may comprise a second processor that is configured to separately control the AI chip.
  • the first processor or second processor of the sequencing system e.g., 120_c, may comprise a CPU.
  • the one or more hardware processors of the sequencing system comprise a CPU.
  • the sensor data at the imager 116 can be communicated directly to the data processing engine(s) 5011 of the first reconfigurable logic device 120_a.
  • the sensor data may be saved into a memory device, e.g., 5030 so that it can be accessed by the data processing engine.
  • the first processor 120_c may control operation of the data processing engines and the routing channels to process the sensor data and generate the first plurality of flow cell images.
  • the processing may include operations disclosed herein such as intensity normalization, color correction, phasing and prephasing correction, background subtraction, etc.
  • the first plurality of flow cell images may then be communicated from the processing engines through the routing channels to the memory device 5030 so that the integrated circuit may be controlled by the first processor or a second processor to access the first plurality of flow cell images for subsequent steps in primary analysis.
  • the first plurality of flow cell images may be directly communicated to the integrated circuit 120_b via DMA connections 5012.
  • the integrated circuit is only used for predicting higher-resolution polony locations using a pretrained CNN, thereby generating the second plurality of flow cell images with a resolution that is at least 8 times higher than the resolution of the first plurality of flow cell images.
  • the CNN may be pretrained using simulated images or real flow cell images.
  • the second plurality of flow cell images are transmitted back from the integrated circuit to the first reconfigurable logic device for subsequent processing steps such as base calling.
  • the base calls, along with quality information, may then be saved into a FASTQ data file.
  • Other information, including cell segmentation and staining, may also be saved in the same file or in another FASTQ file with a compatible data format.
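Saving base calls with quality information in FASTQ means converting each per-base error probability into a Phred score, Q = -10 log10(P_error), and encoding it as an ASCII character offset by 33. The sketch below shows that conversion; the read identifier and the cap of 41 are hypothetical choices, not values from the text.

```python
import math


def phred_from_error(p_error, max_q=41):
    """Phred quality Q = -10 * log10(P_error), rounded and capped at max_q (assumed cap)."""
    if not 0 < p_error <= 1:
        raise ValueError("error probability must be in (0, 1]")
    return min(max_q, round(-10 * math.log10(p_error)))


def fastq_record(read_id, bases, qualities):
    """Format one FASTQ record using Phred+33 (Sanger) quality encoding."""
    if len(bases) != len(qualities):
        raise ValueError("one quality score is needed per base")
    quality_string = "".join(chr(q + 33) for q in qualities)
    return f"@{read_id}\n{bases}\n+\n{quality_string}\n"
```

For example, a call with a 1-in-1000 error probability gets Q30, and `fastq_record("polony_1", "ACGT", [40, 30, 20, 10])` produces the quality line `I?5+`.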
  • the sequencing system may further comprise a heat dissipator configured to maintain a system temperature in a range from 0 degrees to 120 degrees Celsius or less than 120 degrees Celsius.
  • the operation for processing the sensor data to generate the first plurality of flow cell images comprises one or more of registering the first plurality of flow cell images to a reference coordinate system; adjusting image intensities of the first plurality of flow cell images; color correction of the first plurality of flow cell images; correcting phasing and prephasing of the first plurality of flow cell images; and subtracting background intensities from the first plurality of flow cell images.
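Three of the listed preprocessing steps can be sketched on flattened pixel lists: background subtraction, intensity normalization, and two-channel color (crosstalk) correction. All parameter choices are hypothetical. The background is taken as a low percentile, the scale as a high percentile, and crosstalk is modeled as observed = M @ true for an assumed 2x2 mixing matrix M. Registration and phasing/prephasing correction are not shown here.

```python
def percentile(values, q):
    """Simple order-statistic percentile, q in [0, 1]."""
    ordered = sorted(values)
    return ordered[int(q * (len(ordered) - 1))]


def subtract_background(pixels, q=0.05):
    """Subtract a low-percentile background level and clip at zero."""
    background = percentile(pixels, q)
    return [max(p - background, 0.0) for p in pixels]


def normalize_intensity(pixels, q=0.99):
    """Scale intensities so the q-th percentile maps to 1.0."""
    scale = percentile(pixels, q)
    if scale <= 0:
        return list(pixels)
    return [p / scale for p in pixels]


def correct_crosstalk(obs1, obs2, mixing):
    """Invert 2-channel spectral crosstalk, assuming observed = mixing @ true."""
    (a, b), (c, d) = mixing
    det = a * d - b * c
    if det == 0:
        raise ValueError("mixing matrix is singular")
    true1 = [(d * o1 - b * o2) / det for o1, o2 in zip(obs1, obs2)]
    true2 = [(-c * o1 + a * o2) / det for o1, o2 in zip(obs1, obs2)]
    return true1, true2
```

In practice the mixing matrix would be estimated from polonies known to carry a single dye. The closed-form 2x2 inverse keeps the sketch dependency-free; with more channels a general linear solve would replace it.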
  • each of the one or more operations performed by the first reconfigurable logic device or the integrated circuit is performed within the time window of performing a single sequencing cycle of the sequencing run.
  • FIG. 5D shows an exemplary embodiment of performing sequencing analysis in parallel with performing a sequencing run.
  • the sequencing run includes multiple sequencing cycles. For each cycle, flow cell images are acquired at multiple z-levels from different color channels. The sequencing reactions are repeatedly performed for each z-level in each cycle within a time window 5601. The operations of the integrated circuit are performed within a processing window 5602 that lies within the time window 5609 of a single sequencing cycle and also within the time window 5601 for sequencing reactions and imaging at a single z-level.
  • the operations of the first reconfigurable logic device are also performed within a processing window 5603 that is within the time window 5609 of each sequencing cycle.
  • the processing windows 5602 and 5603 may be identical or different depending on various factors such as sequencing data, primary analysis algorithms, etc.
  • the operations are not just performed within the processing windows but completed within the processing windows with respect to the data of the current cycle, e.g., a preceding z-level for which sensor data has already been acquired.
  • the operations are completed within the processing windows with respect to the data of a preceding cycle, e.g., the cycle immediately preceding the current cycle.
  • the operations are performed for a single z level in each cycle within a predetermined time window, e.g., 5602, 5603.
  • the predetermined time window is for a single z level in a single sequencing cycle.
  • the predetermined time window is less than 1000 ms, 900 ms, 800 ms, 700 ms, 600 ms, 500 ms, 400 ms, 300 ms, 250 ms, 200 ms, or 100 ms.
  • each of the one or more operations is performed within the predetermined time window and in parallel while the sequencing run is in progress.
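Whether analysis keeps pace with acquisition can be checked with a simple pipeline model. The model is a hypothetical simplification: imaging of z-level k finishes at (k + 1) * acquire_ms, and a single processing stage handles one z-level at a time. If processing fits within the per-z-level window, each level finishes a constant lag after acquisition; otherwise a backlog accumulates over the run.

```python
def pipeline_finish_times(n_levels, acquire_ms, process_ms):
    """Processing finish time (ms) for each z-level under the simplified pipeline model."""
    finish = []
    busy_until = 0.0
    for k in range(n_levels):
        acquired_at = (k + 1) * acquire_ms      # sensor data for z-level k is available
        start = max(acquired_at, busy_until)    # wait for data and for the previous level
        busy_until = start + process_ms
        finish.append(busy_until)
    return finish


def keeps_up(n_levels, acquire_ms, process_ms):
    """True if every z-level is processed within one acquisition window of being imaged."""
    finish = pipeline_finish_times(n_levels, acquire_ms, process_ms)
    return all(done - (k + 1) * acquire_ms <= acquire_ms for k, done in enumerate(finish))
```

With a 500 ms window per z-level, 300 ms of processing finishes each level 300 ms after it is imaged, whereas 700 ms of processing falls further behind with every level. This is why the text requires each operation to complete within the predetermined time window.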
  • the first plurality of flow cell images herein may be obtained from multiple z levels covering at least part of an in situ sample, e.g., of cells or tissue(s).
  • the first plurality of flow cell images may be obtained from one or more color channels at each z level of the multiple z levels covering at least part of the in situ sample.
  • the first plurality of flow cell images are from a single color channel.
  • the first plurality of flow cell images may be of a first spatial resolution in x, y, and/or z directions.
  • the second plurality of flow cell images may be generated based on the first plurality of flow cell images.
  • the second plurality of flow cell images may be of a second spatial resolution in x, y, and/or z directions.
  • the first spatial resolution may be lower than the second spatial resolution, and a higher resolution herein indicates that a pixel size is smaller so that the polonies in the flow cell images are of finer spatial details.
  • the first spatial resolution may be 2x, 4x, 6x, 8x, 10x, 16x, 24x, 32x, or 48x lower than the second spatial resolution in x, y, and/or z directions.
  • the first spatial resolution may be at least 2x, 4x, 6x, 8x, 10x, 16x, 24x, 32x, or 48x lower than the second spatial resolution in x, y, and/or z directions.
  • the first and second resolutions are in 3D.
  • the first resolution is in a range of 0.1 µm to 5 µm.
  • the second resolution is in a range of 0.01 µm to 2 µm.
  • the second resolution is at least 4, 6, or 8 times greater than the first resolution in all three dimensions.
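Because a higher resolution here means a smaller pixel (voxel) size, up-sampling by an integer factor multiplies the image dimensions by that factor and divides the voxel size by it. The sketch below makes this relationship explicit. The 10 x 64 x 64 stack and 0.8 µm voxel are hypothetical example values.

```python
def upsampled_geometry(shape_zyx, voxel_um_zyx, factor):
    """Shape and voxel size (um) after up-sampling every axis by an integer factor."""
    if factor < 1:
        raise ValueError("factor must be >= 1")
    new_shape = tuple(n * factor for n in shape_zyx)
    new_voxel = tuple(v / factor for v in voxel_um_zyx)
    return new_shape, new_voxel


def within_stated_ranges(first_um, second_um):
    """Check voxel sizes against the 0.1-5 um and 0.01-2 um ranges given above."""
    return 0.1 <= first_um <= 5 and 0.01 <= second_um <= 2
```

An 8x factor in all three dimensions turns a 10 x 64 x 64 stack at 0.8 µm into an 80 x 512 x 512 stack at 0.1 µm, 512 times as many voxels. That growth is one reason the text assigns this prediction step to dedicated hardware.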
  • the sequencing system further comprises one or more image sensors configured to receive optical signals generated from sequencing reactions of a sample immobilized on a support.
  • the support may comprise a glass or plastic substrate.
  • the support may be comprised in a flow cell device.
  • the one or more image sensors may be configured to generate sensor data based on the optical signals.
  • the sequencing system further comprises: one or more hardware processors; one or more data storage devices storing instructions executable by the one or more hardware processors to cause the one or more hardware processors to perform operations disclosed herein.
  • the one or more data storage devices may include one or more memory devices.
  • the one or more memory devices may be accessible by the one or more processors, the first processor, the second processor, the first reconfigurable logic device, and the integrated circuit.
  • the one or more processors are separate from the first or second processors.
  • the operations performed by the one or more processors may include one or more of: 1) recording sensor data generated in the sequencing system in one or more flow cycles; 2) optionally processing the recorded sensor data; 3) sending the recorded sensor data or the optionally processed data to the first reconfigurable logic device or the integrated circuit; 4) receiving outcome from the first reconfigurable logic device or integrated circuit; and 5) generating sequencing analysis results based on the received outcome.
  • the operations performed by the one or more processors may include one or more of: 1) receiving outcome from the first reconfigurable logic device or integrated circuit; and 2) generating sequencing analysis results based on the received outcome.
  • the sequencing analysis results comprise primary analysis results.
  • the sequencing analysis results comprise a data file in a predetermined data format.
  • the sequencing analysis results comprise base calls of nucleotide bases in a sample immobilized on a support.
  • the sequencing analysis results comprise quality measurements of base calls of nucleotide bases in a sample immobilized on a support.
  • the sequencing analysis results comprise quality scores corresponding to base calls of nucleotide bases in a sample immobilized on a support.
  • the sequencing system further comprises: a sample immobilized on a support; and an optical system comprising: an illumination system; an objective lens and the one or more image sensors.
  • the optical system is configured to emit light to the sample and to collect optical signals emitted from the sample, thereby generating the first plurality of flow cell images.
  • the support may be comprised in a flow cell device.
  • the output data comprises base calls of nucleotide bases in a sample immobilized on a support. In some embodiments, the output data comprises identification of base calling locations in two dimensions. In some embodiments, the output data comprises identification of base calling locations in three dimensions.
  • the operation(s) performed by the first reconfigurable routing channels or the integrated circuit using the neural network comprises one or more of: generating quality measurements of the base callings; and generating a data output file based on the base callings.
  • the neural network comprises a convolutional neural network (CNN).
  • the neural network comprises a U-Net.
  • the neural network has been trained using the first reconfigurable logic device or the integrated circuit.
  • the first convolution comprises a 3D convolution with a convolution kernel.
  • the convolutional kernel has at least four dimensions.
  • the convolutional kernel is m x m x m x n, wherein m is an integer in a range from 3 to 30, wherein n is an integer. In some embodiments, n is an integer from 1 to 16384.
  • the second convolution in operation (1) comprises a corresponding number of n, 2*n, 4*n, and 8*n filters in a first, second, third, and fourth repetition, respectively.
  • the second convolution in (4) comprises a corresponding number of 2*n, 2*n, 4*n, 8*n filters in a last repetition, last minus one, last minus two, and last minus three repetition, respectively.
  • n is in a range from 4 to 1024.
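The filter schedule and kernel shape above determine the encoder's size. In this sketch, the m x m x m x n kernel is read as m³ spatial taps for each of n filters, with an input-channel dimension implied. That reading, the bias terms, and the single convolution per repetition are assumptions made for illustration.

```python
def encoder_filters(n, repetitions=4):
    """Filter counts n, 2n, 4n, 8n, ... for successive down-sampling repetitions."""
    return [n * 2 ** i for i in range(repetitions)]


def conv3d_param_count(m, in_channels, filters, bias=True):
    """Trainable parameters of `filters` kernels, each m x m x m x in_channels."""
    weights = m ** 3 * in_channels * filters
    return weights + (filters if bias else 0)


def encoder_param_count(m, n, in_channels=1, repetitions=4):
    """Parameters of one m x m x m convolution per repetition (a simplification)."""
    total, channels = 0, in_channels
    for filters in encoder_filters(n, repetitions):
        total += conv3d_param_count(m, channels, filters)
        channels = filters
    return total
```

With m = 3 and n = 16 on a single-channel z-stack, the four encoder convolutions hold 448 + 13,856 + 55,360 + 221,312 = 290,976 parameters. Doubling n roughly quadruples the deeper layers, so the choice of n within the stated 4 to 1024 range largely sets the memory a hardware deployment must hold.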
  • the operation (e) performing a corresponding base calling for each of the determined polonies based on the second plurality of flow cell images comprises: performing a corresponding base calling for each of the determined polonies based on the second plurality of flow cell images and based on a fourth plurality of flow cell images, wherein the fourth plurality of images are predicted using a second neural network based on a third plurality of flow cell images.
  • the third plurality of flow cell images are acquired from one or more color channels that are different from the single channel, and wherein the third plurality of flow cell images has the first resolution.
  • the fourth plurality of flow cell images has the second resolution.
  • the first plurality of flow cell images are from one or more color channels. In some embodiments, the first plurality of flow cell images are of unbalanced nucleotide diversity. In some embodiments, the first plurality of flow cell images comprises: an unbalanced diversity of nucleotide bases of A, G, C and T/U among concatemer molecules immobilized on the support in one or more flow cycles. In some embodiments, the first plurality of flow cell images comprises: a balanced diversity of nucleotide bases of A, G, C and T/U among concatemer molecules immobilized on the support in one or more cycles. In some embodiments, two or more different concatemer molecules among the concatemer molecules have different insert sequences.
  • the first plurality of flow cell images comprises optical signals emitted from nucleotide reagents bound to a balanced diversity of nucleotide bases of A, G, C and T/U among the concatemer molecules immobilized on the support. In some embodiments, the first plurality of flow cell images comprises optical signals emitted from nucleotide reagents bound to an unbalanced diversity of nucleotide bases of A, G, C and T/U among the concatemer molecules immobilized on the support in the one or more subsequent cycles.
  • the unbalanced diversity of nucleotide bases of A, G, C and T/U among the concatemer molecules comprises: a percentage of (1) a number of one or more types of nucleotide bases to (2) a total number of bases that is less than 20%, 15%, 10%, or 5% in the one or more cycles.
  • the balanced diversity of nucleotide bases of A, G, C and T/U among the plurality of concatemer molecules comprises: a percentage of (1) a number of each type of nucleotide bases to (2) a total number of bases in the one or more cycles is more than 10%, 15%, or 20%.
  • the cellular sample comprises overloaded concatemer molecules with a spatial density in a range of 10^2 to 10^15 per mm^2. In some embodiments, the cellular sample comprises overloaded concatemer molecules with a spatial density in a range of 10^3 to 10^10 per mm^2.
  • the down-sampling factor is 2, 4, or 8. In some embodiments, the up-sampling factor is 2, 4, or 8. In some embodiments, the down-sampling factor is 2, 4, 8, 16, 32, or 64. In some embodiments, the up-sampling factor is 2, 4, 8, 16, 32, or 64.
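A down-sampling or up-sampling factor such as 2, 4, or 8 can be illustrated with the NumPy sketch below. It assumes max pooling for down-sampling and nearest-neighbor repetition for up-sampling and, as a further assumption, resamples only the x-y plane of a (Z, Y, X) stack. Strided or transposed convolutions and interpolation are equally possible choices; the helper names are hypothetical.

```python
import numpy as np


def downsample_max(volume, factor):
    """Max-pool a (Z, Y, X) volume by `factor` in y and x.

    Assumption: z is left unchanged because z-stacks are often only a few
    slices deep; a 3D U-Net could also pool along z.
    """
    z, y, x = volume.shape
    if y % factor or x % factor:
        raise ValueError("Y and X must be divisible by the down-sampling factor")
    # Split each of y and x into (blocks, factor) and take the max per block
    blocks = volume.reshape(z, y // factor, factor, x // factor, factor)
    return blocks.max(axis=(2, 4))


def upsample_nearest(volume, factor):
    """Nearest-neighbor up-sampling of a (Z, Y, X) volume by `factor` in y and x."""
    return volume.repeat(factor, axis=1).repeat(factor, axis=2)


stack = np.arange(2 * 8 * 8, dtype=float).reshape(2, 8, 8)
small = downsample_max(stack, 4)        # shape (2, 2, 2)
restored = upsample_nearest(small, 4)   # shape (2, 8, 8)
```

Pairing the same factor on the way down and on the way up returns feature maps to their original x-y size, which is what allows the later concatenation with earlier down-sampled results.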
  • one or more of operations of (a) to (k) are performed while a sequencing run is being performed.
  • the first plurality of flow cell images are acquired in sequencing cycles ranging from 1 to 500.
  • the one or more cycles comprises a current cycle N.
  • N is in a range from 1 to 500.
  • the one or more cycles comprises a single cycle ranging from 1 to 500.
  • the one or more cycles comprises multiple cycles ranging from 1 to 500.
  • one or more of the operations, e.g., operations (a) to (j), are performed while the sequencing reactions in cycles subsequent to the current cycle N are yet to be performed or are currently being performed.
  • performing the first convolution in one or more dimensions on the first plurality of flow cell images comprises: performing a first convolution in 3D on the first plurality of flow cell images, thereby generating a first convolution result.
  • performing a second convolution in one or more dimensions on the first convolution result, thereby generating a second convolution result comprises: performing the second convolution in 3D on the first convolution result, thereby generating a second convolution result.
  • performing the first convolution in one or more dimensions on the first plurality of flow cell images comprises: performing a first convolution in 2D on the first plurality of flow cell images, thereby generating a first convolution result.
  • performing a second convolution in one or more dimensions on the first convolution result, thereby generating a second convolution result comprises: performing the second convolution in 2D on the first convolution result, thereby generating a second convolution result.
  • repetitively performing up-sampling operations comprises: (3) performing an up-sampling of the fourth convolution result by an up-sampling factor, thereby generating a first up-sampled result; (4) concatenating the first up-sampled result in a current up-sampling repetition with the first down-sampled result in a previous down-sampling repetition, wherein the first up-sampled result has the same size as the first down-sampled result in the previous down-sampling repetition; and (5) performing the second convolution in one or more dimensions on the first up-sampled result, thereby generating a fifth convolution result.
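Operations (3) to (5) of one up-sampling repetition can be sketched as follows. This is a simplified, hypothetical illustration: features are stored as (Z, Y, X, C) arrays, up-sampling is nearest-neighbor in x and y only, and the second convolution is reduced to a pointwise (1 x 1 x 1) convolution plus ReLU, applied to the concatenated features as is usual in U-Net designs, so that the concatenation step stays easy to follow.

```python
import numpy as np


def upsample_nearest_yx(features, factor):
    """Nearest-neighbor up-sampling of (Z, Y, X, C) features in y and x."""
    return features.repeat(factor, axis=1).repeat(factor, axis=2)


def decoder_repetition(deep, skip, weights, factor=2):
    """One up-sampling repetition of a U-Net-style decoder (illustrative only).

    deep:    (Z, Y, X, C1) features from the previous (deeper) level
    skip:    (Z, factor*Y, factor*X, C2) down-sampled result saved on the way down
    weights: (C1 + C2, C_out) pointwise filter weights (hypothetical; a real
             decoder would typically use m x m x m kernels here)
    """
    up = upsample_nearest_yx(deep, factor)              # (3) up-sample
    if up.shape[:3] != skip.shape[:3]:
        raise ValueError("up-sampled result must match the skip connection size")
    merged = np.concatenate([up, skip], axis=-1)        # (4) concatenate on channels
    return np.maximum(merged @ weights, 0.0)            # (5) simplified: 1x1x1 conv + ReLU


rng = np.random.default_rng(1)
deep = rng.standard_normal((3, 16, 16, 64))
skip = rng.standard_normal((3, 32, 32, 32))
out = decoder_repetition(deep, skip, rng.standard_normal((96, 32)))
print(out.shape)  # (3, 32, 32, 32)
```

The size check mirrors the requirement above that the up-sampled result have the same size as the down-sampled result it is concatenated with.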
  • the different combinations of the first plurality of data processing engines are configured to perform operations further comprising: (a) receiving the second plurality of flow cell images from the integrated circuit; (b) determining polonies from the second plurality of flow cell images; (c) performing a corresponding base calling for each of the determined polonies based on the second plurality of flow cell images; and (d) forwarding the second plurality of flow cell images, the determined polonies, the corresponding base callings, or a combination thereof to the first processor or one or more hardware processors of the sequencing system.
  • the one or more operations performed by the first reconfigurable logic device further comprises: forwarding the second plurality of flow cell images, the determined polonies, the corresponding base callings, or a combination thereof to the first processor or one or more hardware processors of the sequencing system.
  • the one or more operations performed by the integrated circuit further comprises forwarding the second plurality of flow cell images, the corresponding base callings, or both to the first reconfigurable logic device, the first processor or one or more hardware processors of the sequencing system.
  • the operations performed by the integrated circuit further comprise one or more of: determining polonies from the second plurality of flow cell images; performing a corresponding base call for each of the determined polonies based on the second plurality of flow cell images; and forwarding the second plurality of flow cell images, the corresponding base callings, or both to the first reconfigurable logic device, the first processor, or one or more hardware processors of the sequencing system.
  • the operations performed by the first reconfigurable logic device or the integrated circuit further comprising: registering the second plurality of flow cell images to a common coordinate system.
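The registration method is not detailed at this point. One common choice for translation-only alignment of images from different cycles to a common coordinate system is FFT-based phase correlation; the NumPy sketch below assumes the frames differ only by an integer x-y shift, and `phase_correlation_shift` is a hypothetical helper rather than the disclosed implementation.

```python
import numpy as np


def phase_correlation_shift(reference, moving):
    """Estimate the integer (dy, dx) shift that aligns `moving` to `reference`.

    Returns the shift to pass to np.roll(moving, shift, axis=(0, 1)).
    Handles translation only; rotation, scaling and sub-pixel offsets
    would need additional steps.
    """
    cross = np.fft.fft2(reference) * np.conj(np.fft.fft2(moving))
    cross /= np.abs(cross) + 1e-12           # keep phase information only
    corr = np.fft.ifft2(cross).real          # sharp peak at the shift
    peak = np.unravel_index(np.argmax(corr), corr.shape)
    # Peaks past the midpoint correspond to negative (wrapped-around) shifts
    return tuple(int(p - n) if p > n // 2 else int(p) for p, n in zip(peak, corr.shape))


rng = np.random.default_rng(2)
reference = rng.random((64, 64))                     # synthetic cycle-1 image
moving = np.roll(reference, (3, -5), axis=(0, 1))    # synthetic drifted later cycle
shift = phase_correlation_shift(reference, moving)
print(shift)  # (-3, 5)
```

Applying the returned shift to each later-cycle image would place all cycles in the coordinate system of the reference image.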
  • the operation (d) or (i) of determining polonies from the second plurality of flow cell images comprises: generating a 3D polony map comprising spatial location of polonies based on the determined polonies.
  • the operation of generating a 3D polony map comprising spatial location of polonies based on the determined polonies may further comprise: deleting duplicate polonies from the determined polonies, wherein the duplicate polonies are out-of-focus.
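One possible way to build such a 3D polony map and drop out-of-focus duplicates is sketched below. It assumes that each z-slice has already been spot-detected, that each detection carries a focus score (e.g., peak sharpness), and that copies of the same polony seen in neighboring slices lie within a small lateral radius; `build_polony_map`, `xy_radius`, and `max_dz` are hypothetical names and parameters. With this greedy rule, two genuinely distinct polonies stacked within `max_dz` slices at the same x-y position would be merged, so the thresholds would need tuning for a given sample.

```python
import numpy as np


def build_polony_map(detections, xy_radius=2.0, max_dz=2):
    """Merge per-slice detections into a 3D polony map (illustrative only).

    detections: array-like of (z, y, x, focus_score) rows, one per spot found
    in any z-slice. Detections are visited from sharpest to blurriest; one is
    treated as an out-of-focus duplicate (and dropped) if an already-kept
    polony lies within `xy_radius` pixels laterally and `max_dz` slices
    axially. Returns an (M, 3) array of (z, y, x) polony locations.
    """
    det = np.asarray(detections, dtype=float).reshape(-1, 4)
    kept = []
    for z, y, x, _ in det[np.argsort(-det[:, 3])]:      # sharpest first
        duplicate = any(
            abs(z - kz) <= max_dz and (y - ky) ** 2 + (x - kx) ** 2 <= xy_radius ** 2
            for kz, ky, kx in kept
        )
        if not duplicate:
            kept.append((z, y, x))
    return np.array(kept, dtype=float).reshape(-1, 3)


# Hypothetical detections: one polony seen in three slices, plus a second polony
dets = [(0, 10.0, 10.0, 0.3), (1, 10.5, 10.0, 0.9), (2, 10.0, 11.0, 0.4), (1, 30.0, 30.0, 0.8)]
polony_map = build_polony_map(dets)
```

For these detections, the sharpest copy of the first polony (z = 1) and the separate polony at (30, 30) are kept, while the blurrier copies at z = 0 and z = 2 are dropped as duplicates.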
  • the operation of determining polonies from the second plurality of flow cell images comprises: superimposing the second plurality of flow cell images with corresponding cell staining images; and generating the polony map by only including polonies that are within cell boundaries in the corresponding cell staining images.
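Restricting the polony map to cell interiors can be sketched with a labeled segmentation of the cell-staining image. The sketch assumes that segmentation has already produced an integer label image (0 for background) registered to the same coordinates as the polony centers; the function name and inputs are hypothetical.

```python
import numpy as np


def filter_polonies_by_cells(polonies_yx, cell_labels):
    """Keep polonies that fall inside segmented cells (illustrative only).

    polonies_yx: (N, 2) polony centers in the same registered coordinate
                 system as `cell_labels`.
    cell_labels: 2D integer label image derived from a cell-staining image,
                 0 = background, k > 0 = cell k (hypothetical segmentation output).
    Returns the kept centers and the label of the cell each one falls in.
    """
    pts = np.asarray(polonies_yx, dtype=float).reshape(-1, 2)
    rows = np.rint(pts[:, 0]).astype(int)
    cols = np.rint(pts[:, 1]).astype(int)
    h, w = cell_labels.shape
    in_bounds = (rows >= 0) & (rows < h) & (cols >= 0) & (cols < w)
    labels = np.zeros(len(pts), dtype=cell_labels.dtype)
    labels[in_bounds] = cell_labels[rows[in_bounds], cols[in_bounds]]
    inside = labels > 0                   # drop background and out-of-image points
    return pts[inside], labels[inside]


cells = np.zeros((10, 10), dtype=int)
cells[2:5, 2:5] = 1                       # cell 1
cells[6:9, 6:9] = 2                       # cell 2
kept, owner = filter_polonies_by_cells([(3, 3), (7.2, 6.8), (0, 9), (12, 1)], cells)
```

In this example, the polonies at (3, 3) and (7.2, 6.8) are kept and assigned to cells 1 and 2, while the one on background and the one outside the image are discarded.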
  • Exemplary embodiments of methods for generating 3D polony map are disclosed in U.S. Patent Application No. 18/078,820 and PCT Application No. PCT/US2023/076125, which are incorporated by reference in their entireties.
  • the method further comprises: providing the cellular sample harboring a plurality of RNA which comprises the first target RNA molecule and the second target RNA molecule. In some embodiments, the method further comprises: generating inside the cellular sample a plurality of cDNA molecules which include a first target cDNA molecule that corresponds to the first target RNA molecule and a second target cDNA molecule that corresponds to the second target RNA molecule. In some embodiments, the method further comprises: contacting the plurality of cDNA molecules in the cellular sample with a plurality of target-specific padlock probes which includes at least a first plurality of first target-specific padlock probes and a second plurality of second target-specific padlock probes.
  • the method further comprises: contacting the plurality of RNA molecules in the cellular sample with a plurality of target-specific padlock probes which includes at least a first plurality of target-specific padlock probes and a second plurality of target-specific padlock probes.
  • individual padlock probes in the first plurality of first target-specific padlock probes comprise: first and second terminal regions, wherein the first terminal region selectively hybridizes to a first region of the first target cDNA molecule or the first target RNA molecule, and the second terminal region selectively hybridizes to a second region of the first target cDNA molecule or the first target RNA molecule.
  • contacting the plurality of RNA molecules in the cellular sample with the plurality of target-specific padlock probes comprises: hybridizing the first and second terminal regions of the first target-specific padlock probes to proximal positions on the first target cDNA molecule or the first target RNA molecule to form a circularized first target-specific padlock probe having a nick or gap between the hybridized first and second terminal regions.
  • the first target-specific padlock probe comprises a first target barcode sequence that corresponds to and uniquely identifies the first target cDNA sequence or the first target RNA sequence.
  • the first target-specific padlock probe comprises a first target barcode sequence that is located adjacent to one of the regions of the first target-specific padlock probe that selectively hybridizes to the first target cDNA molecule or the first target RNA sequence.
  • the first target-specific padlock probe comprises at least one universal adaptor sequence.
  • the first target-specific padlock probe comprises a universal primer binding site for a rolling circle amplification primer or a complementary sequence thereof.
  • the first target-specific padlock probe comprises a universal compaction oligonucleotide binding site or a complementary sequence thereof.
  • the method further comprises: closing the nick or gap in the at least first and second circularized target-specific padlock probes by conducting an enzymatic reaction, thereby generating at least a first covalently closed circular padlock probe and a second covalently closed circular padlock probe inside the cellular sample.
  • the method further comprises: conducting a rolling circle amplification reaction inside the cellular sample using the first and second covalently closed circular padlock probes as template molecules, thereby generating a plurality of concatemer molecules including at least the first concatemer molecule that corresponds to the first target RNA molecule, and the second concatemer molecule that corresponds to the second target RNA molecule.
  • the first concatemer comprises: tandem repeat units of: a first target barcode sequence that uniquely identifies the first target RNA or the first target cDNA sequence, a first insert sequence that corresponds to the first target RNA or the first target cDNA, and a first sequencing primer binding site or a complementary sequence thereof.
  • the first concatemer further comprises: a universal binding site for an amplification primer or a complementary sequence thereof, and a universal binding site for a compaction oligonucleotide or a complementary sequence thereof.
  • the second concatemer comprises: tandem repeat units of: a second target barcode sequence that uniquely identifies the second target RNA or the second target cDNA sequence, a second insert sequence that corresponds to the second target RNA or the second target cDNA, and a second sequencing primer binding site or a complementary sequence thereof.
  • the second concatemer further comprises: a universal binding site for an amplification primer or a complementary sequence thereof, and a universal binding site for a compaction oligonucleotide or a complementary sequence thereof.
  • conducting the one or more cycles of sequencing reactions comprises: contacting the plurality of concatemer molecules inside the cellular sample with (i) a plurality of universal sequencing primers, (ii) a plurality of sequencing polymerases, and (iii) a plurality of nucleotide reagents, under a condition suitable for hybridizing the plurality of universal sequencing primers to their respective universal sequencing primer binding sites on the concatemers.
  • the plurality of nucleotide reagents comprise: multivalent molecules, nucleotides, nucleotide analogs, or their combinations.
  • individual nucleotides or nucleotide analogs are detectably labeled or non-labeled.
  • the detectably labeled individual nucleotides or nucleotide analogs comprise a different detectable color label that corresponds with each different type of nucleotide base of A, G, C, and T/U.
  • an individual multivalent molecule comprises a core attached with multiple nucleotide arms and each arm of the individual multivalent molecule comprises the same type of nucleotide base.
  • generating the first plurality of flow cell images comprises: in each cycle, imaging, by an optical system, optical color signals emitted from the nucleotide reagents that are bound to the plurality of concatemer molecules.
  • the first plurality of flow cell images comprises optical color signals emitted from the nucleotide reagents that are bound to the plurality of concatemer molecules.
  • conducting the one or more cycles of sequencing reactions comprises: sequencing only the first target barcode sequence region of the first concatemer, thereby generating the first sequencing read product.
  • conducting the one or more cycles of sequencing reactions comprises: sequencing the first target barcode sequence region and at least a portion of the first insert sequence of the first concatemer, thereby generating the first sequencing read product.
  • conducting the one or more cycles of sequencing reactions comprises: sequencing only the second target barcode sequence region of the second concatemer, thereby generating the second sequencing read product. In some embodiments, conducting the one or more cycles of sequencing reactions comprises: sequencing the second target barcode sequence region and at least a portion of the second insert sequence of the second concatemer, thereby generating the second sequencing read product.
  • the method further comprises: removing a first sequencing read product from the first concatemer molecule and retaining the first concatemer molecule in the cellular sample, and removing a second sequencing read product from the second concatemer molecule and retaining the second concatemer molecule in the cellular sample.
  • the method further comprises: reiteratively sequencing the plurality of concatemers by repeating the following operations at least once: generating the first plurality of flow cell images of a cellular sample immobilized on a support by conducting one or more cycles of sequencing reactions thereby generating the first sequencing read product and the second sequencing read product, the cellular sample comprising a plurality of concatemer molecules therewithin, wherein a first concatemer molecule of the plurality of concatemer molecules corresponds to a first target RNA molecule of the cellular sample, and a second concatemer molecule of the plurality of concatemer molecules corresponds to a second target RNA molecule of the cellular sample, wherein the first plurality of flow cell images; and removing a first sequencing read product from the first concatemer molecule and retaining the first concatemer molecule in the cellular sample, and removing a second sequencing read product from the second concatemer molecule and retaining the second concatemer molecule in the cellular sample.
  • the first sequencing read product comprises some or all of: a first target barcode sequence in one or more tandem units of the first concatemer molecule; a first insert sequence in one or more tandem units of the first concatemer molecule; or their combinations.
  • the method further comprises: confirming presence of the first target RNA molecule, the second target RNA molecule, or both molecules in the cellular sample based on the performed base calling of the second plurality of flow cell images at the base calling locations in the base calling template.
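Because each target barcode uniquely identifies a target RNA or cDNA, one simple way to confirm target presence from base calls is to match each called barcode against a codebook of expected barcodes. The sketch below is an assumption-laden illustration, not the disclosed decoding scheme: the codebook, the one-mismatch tolerance, and the `min_reads` presence threshold are all hypothetical.

```python
from collections import Counter


def hamming(a, b):
    """Number of mismatched positions between two equal-length sequences."""
    return sum(x != y for x, y in zip(a, b))


def decode_barcode(called, codebook, max_mismatches=1):
    """Return the target whose barcode uniquely matches `called`, else None."""
    hits = [target for barcode, target in codebook.items()
            if len(barcode) == len(called) and hamming(barcode, called) <= max_mismatches]
    return hits[0] if len(hits) == 1 else None


def detected_targets(called_barcodes, codebook, min_reads=1, max_mismatches=1):
    """Count decoded reads per target and report targets seen at least min_reads times."""
    counts = Counter(
        t for t in (decode_barcode(c, codebook, max_mismatches) for c in called_barcodes)
        if t is not None
    )
    return {t: n for t, n in counts.items() if n >= min_reads}


# Hypothetical codebook and base-called barcodes
codebook = {"ACGTAC": "TARGET_1", "TTGCAG": "TARGET_2"}
calls = ["ACGTAC", "ACGTAA", "TTGCAG", "GGGGGG"]
print(detected_targets(calls, codebook))  # {'TARGET_1': 2, 'TARGET_2': 1}
```

Requiring a unique match within the mismatch tolerance avoids assigning a read to a target when two codebook barcodes are equally close.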
  • the method further comprises: generating, by the sequencing system, the second plurality of flow cell images of the cellular sample immobilized on the support by conducting subsequent cycles of sequencing reactions after the one or more cycles.
  • generating the first plurality of flow cell images of the cellular sample immobilized on the support comprises: sequencing at least the first concatemer inside the cellular sample under a condition that inhibits sequencing the second concatemer.
  • sequencing at least the first concatemer inside the cellular sample comprises: generating a plurality of first sequencing read products, and wherein the sequences of the first sequencing read products are aligned with a first target reference sequence to confirm presence of the first target RNA in the cellular sample.
  • generating the first plurality of flow cell images of the cellular sample immobilized on the support comprises: sequencing at least the second concatemer inside the cellular sample under a condition that inhibits sequencing the first concatemer.
  • sequencing at least the second concatemer inside the cellular sample comprises: generating a plurality of second sequencing read products, and wherein sequences of the second sequencing read products are aligned with a second target reference sequence to confirm presence of the second target RNA in the cellular sample.
Predicting high resolution flow cell images
  • FIG. 5A shows a flow chart of a computer-implemented method 500 for predicting high resolution flow cell images thereby improving detectable polony density in the flow cell images.
  • the method 500 can include some or all of the operations disclosed herein. The operations may be performed in, but are not limited to, the order described herein.
  • the method 500 can be performed by one or more processors disclosed herein.
  • the processor can include one or more of: a computing system comprising a processing unit 118, a reconfigurable logic device 120, an integrated circuit that is not reconfigurable 120, or their combinations.
  • the processing unit can include a central processing unit (CPU).
  • the reconfigurable logic device can include one or more FPGA devices.
  • the integrated circuit can include a chip such as an AI chip or an ASIC chip.
  • the one or more processors can include the computer system 400 disclosed herein.
  • some or all operations in method 500, 600, 700, 2800, and 2900 can be performed by the reconfigurable logic device, e.g., the FPGA(s), and/or the integrated circuit, e.g., the AI chip.
  • the data produced by the reconfigurable logic device and/or integrated circuit, e.g., the FPGA(s) after performing one or more operations can be communicated to various hardware elements of the system 100, e.g., CPU(s) or GPU(s), so that subsequent operation(s) in method 500, 600, 700, 2800, and 2900 can be performed by such various hardware using the communicated data.
  • data can also be communicated in the opposite direction from various hardware e.g., CPU(s), to the reconfigurable logic device or the integrated circuit for processing.
  • all the operations in method 500, 600, 700, 2800, and 2900 can be performed by CPU(s).
  • the operations performed by CPU(s) can be performed by other processors such as the dedicated processors, or GPU(s).
  • all the operations in method 500, 600, 700, 2800, and 2900 can be performed by the reconfigurable logic device and/or the integrated circuit, e.g., FPGA(s) and/or the AI chip(s).
  • the sensor data acquired by the imager 116 may be directly communicated to the reconfigurable logic device and/or the integrated circuit, e.g., via DMA connections. In some embodiments, the sensor data acquired by the imager 116 may be directly communicated to the reconfigurable logic device and/or the integrated circuit without being routed first to a CPU, a GPU, or any other processing units before reaching the reconfigurable logic device and/or the integrated circuit.
  • predicting high resolution flow cell images using the method 500 herein with the reconfigurable logic device, e.g., the FPGA, and/or other integrated circuit, e.g., AI chips, may require at least 2x, 8x, 10x, 15x, 20x, 40x, 50x, or 100x less power than making the same prediction(s) using other computing hardware including but not limited to CPUs or GPUs.
  • the sequencing system herein further comprises: a power source that is configured to supply identical or different power levels to the reconfigurable logic device and the integrated circuit.
  • a maximum power output of the power source to the sequencing system in performing methods 500, 600, 700, 2800, and/or 2900 is less than 2000 Watts, 1000 Watts, 900 Watts, 800 Watts, 700 Watts, 650 Watts, 600 Watts, 550 Watts, 500 Watts, 400 Watts, 300 Watts, 200 Watts, or 100 Watts.
  • the sequencing system herein comprises: a first reconfigurable logic device, e.g., a FPGA unit, comprising a plurality of data processing engines configured to perform data processing in parallel; first reconfigurable routing channels, each connecting at least some of the first plurality of data processing engines; a neural network deployed at least partly on the first reconfigurable logic device; a first processor to selectively activate or deactivate different combinations of the first plurality of data processing engines and the first reconfigurable routing channels to perform one or more operations in methods herein (e.g., methods 500, 2800) to make predictions.
  • the sequencing system herein comprises: a first reconfigurable logic device comprising a first plurality of data processing engines arranged in a first pipeline and configured to perform data processing in parallel with each other; an integrated circuit in data communication with the first reconfigurable logic device; a neural network deployed at least partly on the integrated circuit and/or the first reconfigurable logic device; a first processor of the first reconfigurable logic device to selectively activate or deactivate different combinations of the first plurality of data processing engines to perform one or more operations in methods herein (e.g., methods 500, 2800) to make predictions using the neural network.
  • the first reconfigurable logic device and the integrated circuit are within the same physical housing as the other elements of the sequencing system, as shown in FIG. 1.
  • the first reconfigurable logic device and the integrated circuit are not physically external to the sequencing system 110 as shown in FIG. 1, e.g., not in the cloud 130.
  • the method 500 can comprise an operation 510 of (i) generating, by the sequencing system 110, a first plurality of flow cell images of sample(s) immobilized on a support by conducting one or more cycles of sequencing reactions.
  • the sample(s) may comprise concatemer molecules therewithin.
  • the sample(s) may include concatemer molecules from one or more different sample sources.
  • the sample(s) may include a thickness along the z-axis so that the first plurality of flow cell images may be acquired at a z-stack of different z-locations with a first resolution to cover the sample in 3D.
  • the first plurality of flow cell images may be acquired from a single z-location of a 2D or 3D sample.
  • the sample can be in situ.
  • the sample can be a 3D sample.
  • the sample can be a volumetric sample that may contain different biological information at the same x-y location but different z levels.
  • the sample can be a cellular sample including multiple cells, tissue, or their combination.
  • the sample can be any biological sample that has a thickness that is greater than a predetermined threshold along the z axis. For example, the thickness can be greater than 2 um, 3 um, 4 um, 5 um, 10 um, 20 um, or more.
  • the z axis is orthogonal to the image plane defined by the x and y axes.
  • the sample can be a traditional 2D sequencing sample.
  • such computer-implemented method comprises an operation (i) of generating, by a sequencing system, a first plurality of flow cell images of a sample immobilized on a support by conducting one or more cycles of sequencing reactions, wherein the first plurality of flow cell images are acquired with a first resolution.
  • Such operation is similar to operation 510 in FIG. 5A except that the sample may be a 2D or 3D sample.
  • the sample comprises concatemer molecules therewithin.
  • the sample comprises template molecules therewithin.
  • the flow cell images can be acquired using the optical system of the imager 116 disclosed herein, from 1, 2, 3, 4, or more color channels.
  • Each flow cell image can include at least a portion of one or more tiles (e.g., imaging areas). Each tile can be divided into multiple subtiles.
  • Each tile or subtile can include a plurality of polonies or clusters. Each subtile can include multiple regions with each region including a number of polonies or clusters.
  • the flow cell image as disclosed herein can be an image that is acquired from a flow cell 112 as shown in FIG. 1 or 2712 as shown in FIG. 27.
  • the flow cell images are acquired from a single color channel, and subsequent prediction uses a pretrained neural network corresponding to that single channel.
  • the flow cell images are acquired from 2, 3, 4, or more color channels, and subsequent prediction uses a pretrained neural network corresponding to the multiple color channels.
  • a flow cell image herein can be an image of one or more tiles, one or more subtiles, one or more segmented regions within tile(s) or subtile(s), or their combinations.
  • Each flow cell image can comprise a field of view (FOV).
  • the FOV can be orthogonal to the z axis.
  • the FOV can be within the x-y plane.
  • the FOV of different flow cell images at different z levels can be identical within the x-y plane.
  • the FOV of different flow cell images at different z levels can have at least an overlapping portion within the x-y plane.
  • the image resolution of different flow cell images at different z levels can be about identical or exactly identical.
  • FIGS. 3A and 3D show two exemplary flow cell images acquired at two different z levels along the z axis of a same 3D sample within a same sequencing cycle.
  • the FOV can be in 3D and be of various sizes to cover the volumetric sample to be imaged.
  • the FOV along x, y, and/or z direction can be in a range from 10 um to 5 mm.
  • the FOV along x, y, and/or z direction can be in a range from about 0.1 um to about 2 mm.
  • the FOV along x, y, and/or z direction can be in a range from 0.5 um to 1 mm.
  • the FOV can be about 0.5 mm by 0.5 mm by 20 um for certain cellular samples along the x, y, and z direction, respectively.
  • the flow cell images herein may be of various sizes; the number of pixels along the x, y, and/or z axis may be any integer greater than 64 or 128.
  • the flow cell images herein may be of various sizes; the number of pixels along the x, y, and/or z axis may be in a range from 2 to 65536.
  • a single flow cell image can be separated into a different number of regions, for example, 4, 8, 16, or even more regions, and each region may have a size of 256 by 256 by 1, 512 by 512 by 3, or other sizes.
  • the number of pixels along x, y, and/or z direction may be adjusted to maintain a particular spatial resolution in a given FOV. For example, with a spatial resolution of 0.2 um, to cover a FOV of 0.8 mm, the number of pixels may be 4000.
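The pixel-count arithmetic and the division of an image into regions can be checked with a short sketch. The helper names are hypothetical; the example reproduces the 0.2 um / 0.8 mm case above and splits a 3-slice, 1024 x 1024 stack into four regions of 512 by 512 by 3.

```python
import numpy as np


def pixels_to_cover(fov_um, pixel_size_um):
    """Pixels needed along one axis to span `fov_um` at `pixel_size_um` sampling."""
    # Work in integer nanometers to avoid floating-point round-off (e.g. 800 / 0.2)
    fov_nm = round(fov_um * 1000)
    pixel_nm = round(pixel_size_um * 1000)
    return -(-fov_nm // pixel_nm)  # ceiling division


def split_into_regions(stack, tile):
    """Split a (Z, Y, X) image stack into non-overlapping (Z, tile, tile) regions."""
    z, y, x = stack.shape
    if y % tile or x % tile:
        raise ValueError("Y and X must be multiples of the tile size")
    return [stack[:, r:r + tile, c:c + tile]
            for r in range(0, y, tile) for c in range(0, x, tile)]


print(pixels_to_cover(800, 0.2))                       # 4000 pixels for a 0.8 mm FOV
regions = split_into_regions(np.zeros((3, 1024, 1024)), 512)
print(len(regions), regions[0].shape)                  # 4 (3, 512, 512)
```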
  • Each flow cell image at a specific z level may include intensities generated by polonies or clusters at the corresponding z level.
  • signals from polonies or clusters appear as small bright spots within the images.
  • Each bright spot can be of various sizes, often only a few pixels or less, e.g., less than a pixel, about a pixel, or about 2, 3, 4, 5, or more pixels.
  • each signal spot of the polonies or clusters can be any number of pixels in the range from 0.01 pixel to about 100 pixels.
  • each signal spot of the polonies or clusters can be any number of pixels in the range from 0.1 pixel to about 16 pixels.
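Because in-focus polony signals appear as small bright spots a few pixels across, a basic way to locate candidate spots is to look for pixels that exceed an intensity threshold and are brighter than all eight of their neighbors. The NumPy sketch below is a simplified illustration with a hypothetical threshold, not the detection method of the disclosure; practical pipelines usually add background subtraction, smoothing, and sub-pixel localization.

```python
import numpy as np


def find_spots(image, threshold):
    """Return (row, col) of pixels brighter than `threshold` and than all 8 neighbors.

    A simple local-maximum detector: plateaus (equal neighboring maxima) are
    not reported, and blurry out-of-focus spots may still pass if they peak
    above the threshold.
    """
    img = np.asarray(image, dtype=float)
    h, w = img.shape
    # Pad with -inf so border pixels are compared only against real neighbors
    padded = np.pad(img, 1, mode="constant", constant_values=-np.inf)
    is_peak = img > threshold
    for dy in (-1, 0, 1):
        for dx in (-1, 0, 1):
            if dy == 0 and dx == 0:
                continue
            is_peak &= img > padded[1 + dy:1 + dy + h, 1 + dx:1 + dx + w]
    return np.argwhere(is_peak)


img = np.zeros((10, 10))
img[2, 3] = 5.0                      # isolated spot
img[7, 7], img[7, 8] = 3.0, 1.0      # spot spread over two pixels
print(find_spots(img, threshold=0.5).tolist())  # [[2, 3], [7, 7]]
```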
  • Each flow cell image can also include intensities generated by the cell and its structural elements. Such structural elements can be background objects or components, e.g., in FIG. 3A. Each flow cell image can also include noise and/or artifacts that are not from the polonies or cellular structures.
  • when the depth of field of the optical system spans a range along the z axis, e.g., 0.1 um, 0.2 um, 0.3 um, 0.5 um, 0.6 um, 0.8 um, 1 um, 2 um, 3 um, 4 um, 5 um, etc., polonies or clusters that are within the depth of field can appear in focus or approximately in focus in the flow cell image.
  • Flow cell images at a specific z level can also include signals from polonies or clusters that are not within the focus range of the image; such polonies or clusters are out of focus. As shown in FIG. 3A, larger and blurrier signal spots represent out-of-focus polonies or clusters. Some of the out-of-focus polonies or clusters are circled in FIG. 3A.
  • Each flow cell image at a specific z level can also include noise caused by the optical system and/or undesired signal from the sample.
  • the undesired signal can be signal coming from components of the sample such as membrane, cytosol, and mitochondria.
  • Such background objects can be any objects that are relatively larger in size than the polonies or clusters. As shown in FIG. 3A, there is a blurry cellular contour (at the arrows) in the flow cell image, and most of the signal spots are contained within the blurry contour.
  • background objects can include any objects within the 3D sample that are not polonies or clusters.
  • the flow cell images are from multiple color channels. In some embodiments, the flow cell images are of unbalanced nucleotide diversity. In some embodiments, the flow cell images comprise: an unbalanced diversity of nucleotide bases of A, G, C and T/U among concatemer molecules immobilized on the support in one or more sequencing cycles. In some embodiments, the flow cell images comprise: a balanced diversity of nucleotide bases of A, G, C and T/U among concatemer molecules immobilized on the support in one or more cycles. In some embodiments, two or more different concatemer molecules among the concatemer molecules have different insert sequences. In some embodiments, different insert sequences correspond to different target RNA molecules or target cDNA molecules.
  • each location of the determined polonies corresponds to a location of the concatemer molecules.
  • the flow cell images comprise optical signals emitted from nucleotide reagents bound to a balanced diversity of nucleotide bases of A, G, C and T/U among the concatemer molecules immobilized on the support.
  • the flow cell images comprise optical signals emitted from nucleotide reagents bound to an unbalanced diversity of nucleotide bases of A, G, C and T/U among the concatemer molecules immobilized on the support in the one or more subsequent cycles.
  • the unbalanced diversity of nucleotide bases of A, G, C and T/U among the concatemer molecules comprises: a percentage of (1) a number of one or more types of nucleotide bases to (2) a total number of bases that is less than 20%, 15%, 10%, or 5% in the one or more sequencing cycles.
  • the balanced diversity of nucleotide bases of A, G, C and T/U among the plurality of concatemer molecules comprises: a percentage of (1) a number of each type of nucleotide bases to (2) a total number of bases in the one or more cycles is more than 10%, 15%, or 20%.
  • base calls from the polonies include 4 different bases, and the percentage of polonies for each of the 4 different bases can be greater than about 10%, so that the data are of balanced diversity.
  • bases called from the plurality of polonies include 4 or fewer different bases, and the percentage of polonies for one or more bases can be less than about 10%; such data can be considered of unbalanced diversity.
  • bases called from the plurality of polonies include 4 or fewer different bases, and the percentage of polonies for some of the bases can be less than about 5%, about 2%, or even about 1%; such data can be considered of unbalanced diversity.
  • the unbalanced diversity data include bases A, T, C, G in the plurality of polonies, and their percentages of the total base calls are about 1%, about 2%, about 1%, and about 95%, respectively.
  • plexity can also be a factor: when plexity is lower than a threshold number, e.g., 8 or 16, the signal is of unbalanced diversity.
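The balanced/unbalanced distinction above can be expressed as a simple check on the base composition of one cycle. This is a minimal sketch, assuming the base calls for a cycle are given as a string of A/C/G/T characters (T standing in for T/U) and using the about-10% threshold from the text as the default; `base_fractions` and `is_balanced` are hypothetical helper names, and the plexity criterion is not modeled.

```python
from collections import Counter
from typing import Dict


def base_fractions(base_calls: str) -> Dict[str, float]:
    """Fraction of each base among the A/C/G/T calls of one cycle (other symbols are ignored)."""
    counts = Counter(base_calls)
    total = sum(counts[b] for b in "ACGT")
    if total == 0:
        raise ValueError("no A/C/G/T calls to summarize")
    return {b: counts[b] / total for b in "ACGT"}


def is_balanced(base_calls: str, min_fraction: float = 0.10) -> bool:
    """Balanced diversity: every one of the 4 bases exceeds min_fraction of all calls."""
    return all(f > min_fraction for f in base_fractions(base_calls).values())


# Usage: a heavily G-skewed cycle, similar in spirit to the ~1%/2%/1%/95% example.
skewed = "A" * 10 + "T" * 20 + "C" * 10 + "G" * 960
even = "ACGT" * 25
```

Treating a cycle as unbalanced as soon as any single base falls below the threshold mirrors the "one or more bases" wording above; a stricter or looser rule would only change the `all(...)` test.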
  • the method 500 is configured to predict high resolution flow cell images even if the polonies in the acquired flow cell images are of unbalanced diversity in one or more sequencing cycles.
  • the method 500 comprises an operation 520 of (ii) providing, by a processor, the first plurality of flow cell images as an input to a neural network (e.g., CNN), wherein the neural network is pre-trained using a training data set of training flow cell images using a training method 600 herein.
  • the neural network is pretrained so that the values of parameters of the neural network have been optimized based on the training.
  • the neural network may be retrained when needed, for example, for predicting flow cell images from different cellular samples.
  • the computer-implemented method 500 may be used to predict high resolution flow cell images that are at higher resolution than the first plurality of flow cell images (e.g., 2x, 4x, or 6x along at least one spatial dimension) acquired by the imager 116.
  • the high resolution flow cell images may be post image-processing images of the first plurality of flow cell images, e.g., by going through the image processing part 3120 of the neural network in FIG. 31.
  • Image processing herein may include various image processing steps including but not limited to: background removal, background reduction, artifact removal, artifact suppression, adjusting signal to noise ratio, adjusting contrast to noise ratio, intensity normalization, intensity offset correction, noise reduction, color correction, phasing or dephasing correction, image registration, and deconvolution.
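As a point of comparison, two of the listed steps, background removal and intensity normalization, can be sketched with conventional, non-learned operations. This is a minimal NumPy sketch, assuming a flat background estimated as a low percentile of pixel values; the function names and the 10th-percentile default are illustrative assumptions, not the processing learned by the image processing part 3120.

```python
import numpy as np


def subtract_background(image: np.ndarray, percentile: float = 10.0) -> np.ndarray:
    """Remove a flat background estimated as a low percentile of the pixel values.

    Assumption: the background is roughly uniform across the field of view;
    real flow cell images may need a spatially varying estimate.
    """
    background = np.percentile(image, percentile)
    # Clip so pixels darker than the background estimate do not go negative.
    return np.clip(image - background, 0.0, None)


def normalize_intensity(image: np.ndarray) -> np.ndarray:
    """Scale intensities into [0, 1] by the brightest pixel (all-zero images are returned unchanged)."""
    peak = image.max()
    return image / peak if peak > 0 else image


# Usage: a dim uniform background with one bright spot.
frame = np.full((8, 8), 100.0)
frame[4, 4] = 600.0
processed = normalize_intensity(subtract_background(frame))
```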
  • the neural network in operation 520 is a first neural network that can be trained using method 700 disclosed herein.
  • the method 500 comprises an operation 520’ in place of the operation 520.
  • the operation 520’ includes (ii) providing, by a processor or a first reconfigurable logic device, the first plurality of flow cell images as an input to a neural network, wherein the neural network is pre-trained using a training data set of training flow cell images and reference base calls of the training dataset.
  • the operation 520’ is similar to the operation 520, e.g., as shown in FIG. 5 A, with the exception of a different neural network.
  • the operation 520’ may replace the operation 520 in method 500.
  • the neural network in operation 520’ is a different neural network from that in operation 520.
  • the neural network in operation 520 is a first neural network
  • the neural network in operation 520’ is a second neural network that is different from the first neural network in operation 520.
  • the differences between the first and second neural networks may include but are not limited to: different types of neural networks, differences in values of parameters, number of parameters, number of convolutional layers, number of layers, or a combination thereof.
  • the second neural network in operation 520’ is a different neural network that is pretrained using the same training data set of flow cell images as that used for training the first neural network in operation 520.
  • the second neural network in operation 520’ is a different neural network that is pretrained using a training data set of flow cell images different from that used for training the first neural network in operation 520. In some embodiments, the second neural network in operation 520’ is pre-trained using reference base calls of the training dataset.
  • the first neural network in operation 520 is pretrained using reference images or reference intensities as ground truths, e.g., reference high resolution images or reference intensities in high resolution images, and the second neural network in operation 520’ is pre-trained using reference base calls of the training flow cell images in the training datasets as ground truths.
  • the reference base calls may be generated using various base calling methods, including those methods disclosed herein in relation to training methods for predicting base calls. In some embodiments, the reference base calls may be generated using methods that do not use a neural network. Exemplary embodiments of generating base calls from flow cell images are disclosed in U.S. Patent Application No. 18/078,820 and PCT Application No. PCT/US2023/076125, which are incorporated by reference in their entireties.
  • the second neural network in operation 520’ may be trained using a training method similar to method 700 in FIG. 5E. In such embodiments, the reference intensities are not used in operations, e.g., operations 725, 730, and 755. Instead, reference base calls are used in such operations, e.g., operations 725’, 730’, and 755.
  • the loss function for training the second neural network in operation 520’ may be different from the loss function used in training the first neural network in the operation 520. In some embodiments, various loss functions may be used for training the second neural network in operation 520’.
  • the second neural network is pre-trained using one or more loss functions based on comparing training base calls of the training flow cell images to the reference base calls of the training flow cell images.
  • the loss function for training the second neural network in operation 520’ may be based on comparison of training outputs, e.g., base calls, to the reference base calls.
  • training of the second neural network in operation 520’ may be completed when the loss function satisfies a predetermined criterion.
  • the predetermined criterion can be customized to include various aspects of training outputs.
  • the predetermined criterion is determined based on the comparison of training base calls to reference base calls.
  • the predetermined criterion is based on the correctness of the training base calls in comparison to the reference base calls.
  • the predetermined criterion is based at least on the training time that has elapsed.
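The stopping criteria listed above (loss value, agreement with reference base calls, elapsed training time) can be combined into one check. This is a minimal sketch under stated assumptions: the threshold values are hypothetical, and stopping as soon as *any* criterion is met is an illustrative choice rather than something the text specifies.

```python
import time
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class StopCriterion:
    """Hypothetical stopping rule for training against reference base calls."""
    max_loss: float = 0.05           # stop once the loss is this low
    min_call_accuracy: float = 0.99  # or once training calls agree this well with references
    max_seconds: float = 3600.0      # or once the training-time budget is spent

    def should_stop(
        self,
        loss: float,
        training_calls: Sequence[str],
        reference_calls: Sequence[str],
        start_time: float,
        now: Optional[float] = None,
    ) -> bool:
        now = time.monotonic() if now is None else now
        if len(training_calls) != len(reference_calls):
            raise ValueError("calls must be aligned polony by polony")
        # Correctness of training base calls relative to the reference base calls.
        matches = sum(t == r for t, r in zip(training_calls, reference_calls))
        accuracy = matches / len(reference_calls) if reference_calls else 0.0
        return (
            loss <= self.max_loss
            or accuracy >= self.min_call_accuracy
            or (now - start_time) >= self.max_seconds
        )
```

In a training loop, `start_time` would be taken once from `time.monotonic()` before the first epoch and `should_stop` evaluated after each validation pass.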
  • FIG. 31 is a block diagram showing an exemplary embodiment of the first and second neural networks and the method for training such neural networks.
  • neural network 3110 may be any artificial intelligence-based or machine learning-based model that can include an image processing part 3120 and a base calling part 3130.
  • each of the image processing part 3120 and the base calling part 3130 may be any artificial intelligence-based or machine learning-based model that achieves functions similar to its neural network-based equivalent.
  • the neural network 3110 may be the first neural network in operation 520 or the second neural network in operation 520’.
  • the method for training the neural network 3110 may be method 700 as an example.
  • the neural network may include two separate parts: the first part is the image processing part 3120, and the second part is the base calling part 3130.
  • the image processing part 3120 is configured to perform one or more image processing steps disclosed herein, e.g., in relation to method 500, on the flow cell images herein, e.g., the first or second plurality of flow cell images.
  • the one or more image processing steps may include but are not limited to: background removal, background reduction, artifact removal, artifact suppression, adjusting signal to noise ratio, adjusting contrast to noise ratio, intensity normalization, intensity offset correction, noise reduction, color correction, phasing or dephasing correction, image registration, intensity extraction, and deconvolution.
  • the base calling part 3130 is configured to perform base calling using the output images 3150 from the image processing part 3120 of the neural network.
  • the base calling part 3130 may be configured to perform some image processing steps including but not limited to intensity extraction, color correction, and/or phasing or dephasing correction in embodiments where such image processing steps are not performed in the image processing part 3120 of the neural network.
  • the first or second part of the neural network 3120, 3130 may each include one or more structural elements of the neural network such as a convolutional layer.
  • the first or second part of the neural network 3120, 3130 may include one or more embedding layers of the neural network.
  • the first part of the neural network 3120 may include at least part of an encoder of the neural network
  • the second part of the neural network may include at least part of a decoder of the neural network.
  • the second part of the neural network may include at least part of: a convolutional layer, a pooling layer, a fully connected layer, a SoftMax layer, an input layer, an output layer, an embedding layer, an encoder, and a decoder of the neural network.
  • the base calling part 3130 may lack any structural element of a neural network, e.g., a convolutional layer or a pooling layer. In some embodiments, the base calling part 3130 may lack any artificial-intelligence based algorithm. In some embodiments, the base calling part 3130 may lack any convolutional layers of the neural network. In some embodiments, the base calling part 3130 may lack any part of an embedding layer or a decoder of the neural network. In some embodiments, the base calling part 3130 may lack any part of: a convolutional layer, a pooling layer, a fully connected layer, a SoftMax layer, an input layer, an output layer, an embedding layer, an encoder, and a decoder of the neural network.
  • the base calling part 3130 may only comprise non-neural network base calling algorithm(s).
  • the neural network 3110 that generates the output images 3150 is the second neural network in operation 520’.
  • the neural network 3110 that generates the output base calls 3160 is the neural network disclosed in relation to method 2800.
  • training of the neural network may include training of one or more parameters of the base calling part 3130.
  • the one or more parameters may include a feature size.
  • the back propagation for finding adjustments of values for parameters of the neural network 3110 goes through the base calling part 3130, adjusting its parameters, and then through the image processing part 3120, also adjusting its parameters, as shown by the solid gray line with arrow in FIG. 31.
  • training of the neural network does not include training of any parameters of the base calling part 3130.
  • the back propagation for finding adjustments of values for parameters of the neural network 3110 may go through the base calling part 3130 without adjusting any of its parameters, and then through the image processing part 3120, whose parameters are adjusted, as shown by the solid gray line with arrow in FIG. 31.
  • training of the neural network may include training of the base calling part 3130 and the image processing part 3120, including adjusting parameters from both parts, as shown in FIG. 31 as the solid gray line with arrow.
  • training of the neural network may only include training of the image processing part 3120 but not training of any of the parameters in the base calling part 3130.
  • the back propagation for updating, and thereby training, the parameters of the neural network 3110 goes directly to the image processing part 3120 without going through the base calling part 3130, as shown by the dotted gray line with arrow in FIG. 31.
  • the base calling part 3130 is not trained and the parameters in the base calling part 3130 are fixed.
  • the loss function may be based on the output of the base calling part 3130, and the value of the loss function may be determined based on the output of the base calling part.
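The frozen-base-caller variant can be illustrated with a toy two-part model in which the gradient of the loss still flows through the base calling part to reach the image processing part, while the base calling part's own parameter is left unchanged. This is a hand-derived scalar sketch, not the neural network 3110; the gain `w`, score weight `v`, and squared-error loss are assumptions chosen to keep the chain rule visible. In a framework such as PyTorch, a similar effect is commonly obtained by setting `requires_grad` to `False` on the base calling part's parameters.

```python
def train_step(w, v, x, target, lr=0.1, train_base_caller=False):
    """One gradient-descent step on a toy two-part model.

    image processing part:  y = w * x       (w is always trainable)
    base calling part:      s = v * y       (v trainable only if train_base_caller)
    loss:                   L = (s - target) ** 2
    """
    y = w * x
    s = v * y
    loss = (s - target) ** 2
    dL_ds = 2.0 * (s - target)
    # Chain rule: the gradient for w passes *through* the base calling part via v.
    dL_dw = dL_ds * v * x
    dL_dv = dL_ds * y
    new_w = w - lr * dL_dw
    # Frozen base calling part: v is used in the forward and backward pass but never updated.
    new_v = v - lr * dL_dv if train_base_caller else v
    return new_w, new_v, loss


w, v = 0.5, 2.0
for _ in range(50):
    w, v, loss = train_step(w, v, x=1.0, target=3.0)
# v is unchanged; w has converged so that v * w * x is close to the target.
```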
  • the input 3140 may go through the image processing part 3120 to generate the output images 3150.
  • the output images 3150 comprise the second plurality of flow cell images, e.g., disclosed herein in relation to methods 500.
  • the output images 3150 comprise high resolution post-processing images corresponding to the input images 3140.
  • the output images may go through the base calling part 3130 to generate the base calls 3160.
  • the output images 3150 may instead be processed by various base calling algorithms, e.g., traditional non-neural network-based base calling algorithms, rather than by the base calling part 3130, to generate the base calls.
  • the neural network 3110 may advantageously reduce the time required to make predictions, and reduce the computational burden and power required to make the predictions, compared with existing neural networks that predict base calls.
  • the input images 3140 comprise raw flow cell images acquired at the imager 116. In some embodiments, the input images 3140 comprise the first plurality of flow cell images disclosed herein. In some embodiments, the input images 3140 may be from multiple color channels and multiple sequencing cycles. In some embodiments, the input images 3140 may be from multiple color channels and a single sequencing cycle. In some embodiments, the input images 3140 may be from a single color channel and multiple sequencing cycles. In some embodiments, the input images 3140 may be from a single z level or multiple z levels.
  • references or ground truths 3180 can be used for comparison of the output base calls 3160, and the value of the loss function 3170 can be calculated based on such comparison.
  • the value of the loss function can then be used during training for back propagation into the neural network 3110 to adjust the values of the parameters of the neural network 3110, e.g., based on gradients.
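One common way to compare output base calls 3160 with reference base calls 3180 is a cross-entropy loss over the four bases. The text does not name the loss function, so cross-entropy is an assumption here; the sketch also assumes the base calling part emits one score (logit) per base for each polony, and `base_call_loss` is a hypothetical name.

```python
import math
from typing import List, Sequence

BASES = "ACGT"


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over one polony's four base scores."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def base_call_loss(per_polony_logits: Sequence[Sequence[float]], reference_calls: str) -> float:
    """Mean cross-entropy between predicted base probabilities and reference base calls."""
    if len(per_polony_logits) != len(reference_calls):
        raise ValueError("one logit vector per reference call is required")
    total = 0.0
    for logits, ref in zip(per_polony_logits, reference_calls):
        probs = softmax(logits)
        total -= math.log(probs[BASES.index(ref)])
    return total / len(reference_calls)
```

A confident, correct prediction drives the loss toward zero, while a uniform guess over four bases costs log(4) per polony; the gradient of this value is what back propagation would carry into the network.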
  • adjusting parameters of the neural network may include adjusting parameters of both the base calling part 3130 and the image processing part 3120. In other words, both parts 3120, 3130 are trained during training of the neural network, e.g., using training methods 700 and 2900 herein.
  • the value of the loss function then can be back propagated into the neural network 3110 for adjusting values of the parameters of only the image processing part 3120, but not the base calling part 3130.
  • only the image processing part 3120, but not the base calling part 3130, is trained during training of the neural network, e.g., using training methods 700 and 2900 herein.
  • the neural network 3110 that is trained only on the image processing part 3120 is the second neural network in operation 520’.
  • the neural network 3110 that is trained on both the image processing part 3120 and the base calling part 3130 is the second neural network in operation 520’.
  • the second neural network in operation 520’ comprises a convolutional neural network. In some embodiments, the second neural network in operation 520’ comprises a recurrent neural network. In some embodiments, the second neural network in operation 520’ comprises a U-Net, residual U-Net, ResNet (residual neural network), and/or an LSTM (long short-term memory) neural network.
  • the training flow cell images are acquired only from a same color channel.
  • each of the training flow cell images comprises flow cell images of a same field of view from a plurality of sequencing cycles stacked along a time dimension.
  • the plurality of sequencing cycles may be of a same sequencing run.
  • the plurality of sequencing cycles may be consecutive sequencing cycles in the sequencing run.
  • each of the training flow cell images comprises flow cell images of a same field of view from one or more sequencing cycles.
  • each of the training flow cell images comprises flow cell images of the sample at one or more z-levels.
  • the training flow cell images comprise flow cell images of multiple different fields of view of the same sample.
  • the training flow cell images comprise flow cell images of multiple different fields of view of one or more samples.
  • the different fields of view may be at the same x, y, or z location of the same sample.
  • the different fields of view may be different subtiles of the sample at the same z location but at different x, y locations.
  • the multiple different fields of view may be adjacent to each other, with no spatial overlap or with at least some spatial overlap with other fields of view.
  • each of the training flow cell images comprises flow cell images of different fields of view (e.g., adjacent FOVs of the same sample) from a plurality of sequencing cycles stacked along one or two spatial dimensions.
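The single-channel training stacks described above can be assembled with ordinary array operations. This is a minimal NumPy sketch, assuming images are held in a dictionary keyed by a hypothetical `(fov, channel, cycle)` tuple: images of one field of view from consecutive cycles are stacked along a leading time axis, and stacks from adjacent, non-overlapping fields of view are joined along one spatial axis. Overlapping fields of view would need cropping or blending, which is not shown.

```python
import numpy as np


def cycle_stack(images, fov, channel, cycles):
    """Stack one FOV's single-channel images from the given cycles -> array of shape (T, H, W)."""
    return np.stack([images[(fov, channel, c)] for c in cycles], axis=0)


def join_adjacent_fovs(stacks):
    """Place adjacent, non-overlapping FOV stacks side by side along x -> (T, H, W * n)."""
    return np.concatenate(stacks, axis=-1)


# Usage: two adjacent FOVs, one color channel, three consecutive cycles of 4 x 4 images.
images = {
    (fov, "ch1", cycle): np.full((4, 4), 10.0 * fov + cycle)
    for fov in (0, 1)
    for cycle in (1, 2, 3)
}
left = cycle_stack(images, fov=0, channel="ch1", cycles=[1, 2, 3])
right = cycle_stack(images, fov=1, channel="ch1", cycles=[1, 2, 3])
sample = join_adjacent_fovs([left, right])
```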
  • the training dataset for the neural network may include only flow cell images of the same color channel, so that the neural network is not trained on variations across different color channels that may be caused by differences in optical elements in response to different colored light signals (e.g., emission filter, illumination, etc.), differences in fluorescent dyes, or other factors of the sequencing system.
  • such variations may cause, for example, different background levels, different signal to noise ratios, different artifacts in the field of view, different full width at half maximum (FWHM) of emission light signals, different point spread functions (PSF), etc.
  • Training the neural network using flow cell images of the same color channel may advantageously avoid fitting to variations across different color channels, simplify and speed up training of the neural network, and avoid possible errors in training.
  • a different neural network is trained for each color channel using flow cell images of that channel.
  • the neural network is trained to be a channel-specific neural network.
  • 3 different neural networks are trained using corresponding flow cell images of the corresponding color channels.
  • Each channel-specific neural network is used for prediction of high resolution flow cell images of the corresponding color channel.
  • when two or more channels share the same color, a single neural network may be trained using flow cell images from those channels. Such a neural network may be used to predict, or make inferences of, high resolution flow cell images for the two or more channels of the same color.
  • a different neural network may be trained using flow cell images of a single channel.
  • Each different neural network is a channel specific neural network that may be used for prediction or inference only of the corresponding channel.
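The per-channel arrangement can be organized as a lookup from channel to model, with channels of the same color sharing one model. This is a minimal sketch; the channel names, colors, and the `make_model` factory are hypothetical placeholders for whatever training routine produces each channel-specific network.

```python
from typing import Any, Callable, Dict


def build_channel_models(
    channel_colors: Dict[str, str], make_model: Callable[[str], Any]
) -> Dict[str, Any]:
    """Return a model for every channel, building one model per distinct color."""
    models_by_color: Dict[str, Any] = {}
    models_by_channel: Dict[str, Any] = {}
    for channel, color in channel_colors.items():
        if color not in models_by_color:
            # Hypothetical: make_model trains a network on images of this color only.
            models_by_color[color] = make_model(color)
        models_by_channel[channel] = models_by_color[color]
    return models_by_channel


# Usage with placeholder "models": two channels share a color, so three models are built.
channel_colors = {"ch1": "green", "ch2": "green", "ch3": "red", "ch4": "blue"}
models = build_channel_models(channel_colors, make_model=lambda color: {"trained_on": color})
```

At inference time, an image from a given channel is simply routed to `models[channel]`, so each model only ever sees the color it was trained on.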
  • the method 500 comprises an operation 530 of (iii) predicting, by the first reconfigurable logic device or an integrated circuit, a second plurality of flow cell images using the neural network, wherein each of the second plurality of flow cell images has a second resolution and corresponds to a corresponding image of the first plurality of flow cell images, and wherein the second resolution is at least 2 to 32 times greater than the first resolution in one or more spatial dimensions.
  • the operation 530 of (iii) predicting the second plurality of flow cell images using the neural network comprises predicting high resolution postprocessing images corresponding to the first plurality of flow cell images, and wherein the processing comprises various image processing or intensity processing steps.
  • the processing steps may comprise one or more of: noise reduction; background reduction; background removal; artifact removal; artifact suppression; intensity offset correction; intensity normalization; adjusting signal to noise ratio; adjusting contrast to noise ratio; color correction; phasing and/or dephasing correction; image registration; and deconvolution.
  • predicting high resolution post-processing images corresponding to the first plurality of flow cell images may advantageously allow a higher resolution and higher image quality version of the first plurality of flow cell images to be generated, and the higher resolution, higher quality version may be used for generating more accurate and reliable base calls.
  • the second neural network of method 500, e.g., in operation 520’, may be trained using reference base calls, rather than reference flow cell images or reference intensities, as ground truths.
  • the reference base calls may be generated using various methods, including methods disclosed herein in relation to training a neural network for predicting base calls.
  • the training may optimize at least some of the parameters of the neural network to produce training base calls that are sufficiently similar to the reference base calls (e.g., as determined by the value of the loss function satisfying a predetermined criterion).
  • the trained neural network may be used to predict high resolution post-processing images corresponding to the first plurality of flow cell images, and such high resolution post-processing images may be used to produce accurate and reliable base calls.
  • the embodiments of method 500 with the operation 520’ may improve base calling accuracy and reliability, reduce computational complexity in the prediction, free up storage space, and save power and time compared with methods that predict base calls directly.
  • training a separate second neural network in operation 520’ for each color channel, compared with training the first neural network in operation 520 with flow cell images from multiple color channels, may require fewer computations, less power, and less memory or data storage, reduce training time, and avoid possible training failures.
  • FIG. 30A shows an exemplary flow cell image of the first plurality of flow cell images.
  • the exemplary flow cell image is of a 2D sequencing sample, and is acquired from one of the 4 different color channels.
  • the image size is 608 pixels by 608 pixels.
  • FIG. 30B is a high resolution image of the flow cell image in FIG. 30A, predicted using method 500 with the second neural network in operation 520’ herein.
  • the neural network is pretrained using a training data set comprising training flow cell images.
  • the neural network is pretrained using the training data set and reference base calls instead of reference intensities.
  • the high resolution image has a size of 1216 by 1216 pixels, which is twice the resolution of the flow cell image in FIG. 30A in both the x and y directions.
  • the detectable polony density in the high resolution image is increased by at least 2x, 4x, or more relative to that in the first plurality of flow cell images, e.g., FIG. 30A.
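The image sizes quoted for FIG. 30A and FIG. 30B can be checked with simple arithmetic: doubling resolution along both x and y multiplies the pixel count over the same field of view by four, which is consistent with (though not by itself proof of) the higher detectable polony density described above. The helper name is illustrative.

```python
def upscaled_shape(shape, factor):
    """Image shape after scaling every spatial dimension by an integer factor."""
    return tuple(dim * factor for dim in shape)


low_res = (608, 608)                   # FIG. 30A
high_res = upscaled_shape(low_res, 2)  # FIG. 30B -> (1216, 1216)
pixel_ratio = (high_res[0] * high_res[1]) / (low_res[0] * low_res[1])  # -> 4.0
```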
  • the neural network predicts the high resolution image with less background noise, blurriness, bright artifacts, etc.
  • the high resolution image, in combination with the corresponding high resolution images from the other 3 color channels, can then be used to determine base calls.
  • the error rate in determining polonies, and the error rate in making base calls can be lower using the high resolution image in FIG. 30B than using the flow cell image in FIG. 30A.
  • the prediction of high resolution images rather than prediction of base calls directly may advantageously reduce computational complexity, computation burden, power consumption, storage usage for performing base calls while maintaining or improving accuracy and reliability.
  • the method 500 includes an operation 540 of (iv) determining, by the processor, the first reconfigurable logic device, or the integrated circuit, polonies from the second plurality of flow cell images.
  • determining the polonies comprises determining locations of the polonies, locations of the center of the polonies, size of the polonies, or a combination thereof.
  • the location of the polonies, or the locations of the center of the polonies may be 2D or 3D.
  • the determined polonies exclude duplicate polonies.
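Duplicate polonies, for example the same polony seen in focus at one z level and out of focus at another, could be collapsed with a greedy non-maximum suppression pass that keeps the brightest candidate within a lateral radius. This is a swapped-in, illustrative technique rather than the method described here; the dictionary layout and the 1.5-pixel radius are assumptions.

```python
import math
from typing import Dict, List


def dedupe_polonies(polonies: List[Dict[str, float]], xy_radius: float = 1.5) -> List[Dict[str, float]]:
    """Greedy non-maximum suppression over candidate polony centers.

    polonies: dicts with 'x', 'y', 'z', 'intensity' (hypothetical layout).
    Candidates closer than xy_radius in x-y are treated as one polony, even if
    they were detected at different z levels; the brightest one is kept.
    """
    kept: List[Dict[str, float]] = []
    for p in sorted(polonies, key=lambda q: q["intensity"], reverse=True):
        if all(math.hypot(p["x"] - k["x"], p["y"] - k["y"]) > xy_radius for k in kept):
            kept.append(p)
    return kept


# Usage: the first two candidates are the same polony at two z levels.
candidates = [
    {"x": 10.0, "y": 10.0, "z": 0, "intensity": 5.0},
    {"x": 10.5, "y": 10.2, "z": 1, "intensity": 9.0},
    {"x": 20.0, "y": 20.0, "z": 0, "intensity": 3.0},
]
unique = dedupe_polonies(candidates)
```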
  • the method 500 comprises an operation 550 of (v) performing, by the processor, the first reconfigurable logic device, or the integrated circuit, a corresponding base calling for each of the determined polonies based on the second plurality of flow cell images.
  • the second neural network in operation 520’, while it is being trained, may include one or more layers, e.g., convolutional layers, for generating base calls from the high resolution post image-processing images.
  • the second neural network, after it is trained, in operation 520’ may utilize only a subset of the layers of the second neural network being trained since the neural network only predicts the high resolution post image processing images but not the base calls.
  • the neural network, after it is trained, in operation 520’ may utilize only a subset of the layers of the neural network in operation 520’ since the neural network only predicts the high resolution post image processing images but not the base calls.
  • the pretrained second neural network using reference base calls may have a first number of layers, while a second number of the layers in the pretrained second neural network is used in operation 530 in predicting the high resolution flow cell images.
  • the second number of layers is less than the first number of layers.
  • the pretrained second neural network may have 5 layers, with the first 4 layers for predicting high resolution post image-processing images and the last layer for predicting base calls. In operation 530, only the first 4 layers of the pretrained second neural network are used.
  • the operation 550 may use the last layer of the pretrained second neural network.
  • the second neural network in operation 520’ utilizes the same number of layers as the second neural network being trained.
  • the neural network in operation 520’ may lack any neural network layers that are specific for generating base calls based on the high resolution post image-processing images.
  • the neural network in operation 520’ may rely on other non-neural network-based algorithms or software for base calling of the high resolution post image-processing images.
  • the pretrained second neural network may have 5 layers, with the first 4 layers for predicting high resolution post image-processing images and the last layer for predicting base calls. In operation 530, only the first 4 layers of the pretrained second neural network are used.
  • the operation 550 may use non-neural network based algorithms or software for base calling.
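The 5-layer example can be mimicked with a toy stack of callables to show how the same trained layers are split between operation 530 and operation 550. Each "layer" here is just a hypothetical per-value scaling applied to one polony's four channel intensities, not a convolution over an image, and `brightest_channel_call` is a hypothetical non-neural stand-in for the base calling used in operation 550 when the last layer is not used.

```python
from typing import Callable, List

Layer = Callable[[List[float]], List[float]]


def make_layer(scale: float) -> Layer:
    """Toy 'layer' that scales every value (stands in for a trained layer)."""
    return lambda values: [scale * v for v in values]


def run(layers: List[Layer], values: List[float]) -> List[float]:
    """Apply layers in order."""
    for layer in layers:
        values = layer(values)
    return values


def brightest_channel_call(intensities: List[float], bases: str = "ACGT") -> str:
    """Non-neural stand-in for base calling: pick the base whose channel is brightest."""
    return bases[max(range(len(intensities)), key=intensities.__getitem__)]


# Hypothetical 5-layer network: layers[:4] produce processed intensities,
# layers[4] maps them to base-call scores.
layers = [make_layer(s) for s in (1.0, 2.0, 0.5, 3.0, 10.0)]
per_channel = [1.0, 4.0, 2.0, 0.5]  # one polony's intensity in 4 color channels

processed = run(layers[:4], per_channel)  # operation 530: first 4 layers only
scores = layers[4](processed)             # operation 550 variant using the last layer
call = brightest_channel_call(processed)  # operation 550 variant without the last layer
```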
  • the neural network in operation 520’ has fewer convolutional layers than the neural network in operation 520. In some embodiments, the second neural network in operation 520’ has the same number of layers as the first neural network in operation 520.
  • the neural network herein has no more than 18, 15, 12, 10, 8, 7, 6, 5, 4, 3, or 2 layers. In some embodiments, the neural network herein has 6, 5, 4, 3, or 2 layers. In some embodiments, the neural network has fewer than 256, 128, 96, 80, 64, or 32 features.
  • cycle N may be one of the reference cycle(s) for generating the polony map.
  • cycle N may be a cycle different from the reference cycle(s).
  • the polony map can be generated in the reference cycle(s) as a subsequent operation after the methods herein have improved the detectable polony density in flow cell images. Polonies from one or more channels within the reference cycle(s) can be included in the polony map in a reference coordinate system, while base calling of cycle N is yet to be performed.
  • cycle N is the current cycle.
  • N can be any non-zero integer.
  • N can be any integer from 1 to 150, from 1 to 200, or from 1 to 1000.
  • the polony map disclosed herein can include individual regions within a subtile or tile. Each polony map can include a plurality of polonies therein. In some embodiments, the polony map can be about the same size as a flow cell image so that all the polonies, from different tiles and from multiple channels, can be registered to the same polony map. However, to reduce computational burden without sacrificing accuracy, some polonies in such a polony map may not be used in at least some operations described herein. In some embodiments, more than one polony map can be generated, each corresponding to at least part of a subtile of a flow cell image from a channel. The multiple polony maps may be tiled together to cover the entire sample region of the flow cell device.
  • the polony map disclosed herein can include polonies that are within individual cells or tissue, or on the membrane thereof. In some embodiments, the polony map disclosed herein can exclude polonies or signal spots that are outside cell boundaries. In some embodiments, the polony map disclosed herein can exclude duplicate polonies; such duplication may occur at different z-locations, with one or more copies in focus and/or out of focus in the flow cell images. The duplicate polonies may be within the same flow cell image or in different flow cell images.
  • The polony map herein can be initialized as a virtual image that has a black or dark background with no signals from polonies. For example, the polony map can be initialized to zero or to otherwise minimal image intensity at all pixels.
  • the intensity of the polony can be added to the polony map at the location determined by the coordinates and with the size and shape determined based on registration.
  • the polony map can be a virtual image that combines image intensity from polonies obtained from 2, 3, 4, or even more channels at the reference cycle.
  • the pixels of the template that contain no polonies remain black or dark, so that the polony map can have a cleaner background without the noise that appears in actual flow cell images.
  • the polony map includes a list of entries, and each entry corresponds to information for identifying a corresponding polony.
  • each entry can include spatial coordinates of the corresponding polony center in the reference coordinate system, and image intensity of the polony.
  • the entry may also include a unique identification number of the polony.
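A polony map that is both a list of entries and a virtual image can be sketched as follows: the image starts as all zeros (dark), and each entry's intensity is added around its center, so polonies from several channels accumulate on one canvas. This is a minimal sketch under stated simplifications: a fixed square footprint stands in for the size and shape that the text determines from registration, and `PolonyEntry` and `render_polony_map` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List, Tuple

import numpy as np


@dataclass
class PolonyEntry:
    polony_id: int    # unique identification number
    x: float          # center in the reference coordinate system (pixels)
    y: float
    intensity: float


def render_polony_map(entries: List[PolonyEntry], shape: Tuple[int, int], radius: int = 1) -> np.ndarray:
    """Start from a dark (all-zero) virtual image and add each polony's intensity
    over a square footprint around its rounded center, clipped to the image."""
    canvas = np.zeros(shape, dtype=float)
    h, w = shape
    for e in entries:
        cy, cx = int(round(e.y)), int(round(e.x))
        y0, y1 = max(cy - radius, 0), min(cy + radius + 1, h)
        x0, x1 = max(cx - radius, 0), min(cx + radius + 1, w)
        canvas[y0:y1, x0:x1] += e.intensity
    return canvas


# Usage: one interior polony and one at the image corner (footprint clipped to 2 x 2).
entries = [PolonyEntry(1, 5.0, 5.0, 2.0), PolonyEntry(2, 0.0, 0.0, 1.0)]
polony_map = render_polony_map(entries, (10, 10))
```

Pixels untouched by any entry stay at zero, which reflects the cleaner, noise-free background described above.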
  • the polonies can be from a subtile of flow cell images within a reference cycle, and more specifically, from one or more selected regions of the subtile.
  • the flow cell images can be from 1, 2, 3, 4, or more channels of the system 100.
  • a reference cycle can be any cycle of the first 5 or 6 cycles. In some embodiments, the reference cycle can be any cycle that is greater than 0. In some embodiments, the reference cycle is the first cycle.
  • the operation 540 comprises performing image processing step(s) to adjust image intensities of polonies.
  • the image processing steps comprise one or more of the following: background subtraction; image sharpening; intensity offset adjustment; color correction; intensity normalization; phasing and prephasing correction; image registration; quality score estimation; or the like.
  • the image registration is configured to align images from different cycles and/or different channels, for example, with respect to a template image (i.e., a polony map) or a reference coordinate system.
  • the image registration herein is configured to register polonies or clusters from different cycles and different channels, to a template image or a reference coordinate system.
  • the second plurality of flow cell images may be the output of the neural network.
  • the second resolution may be 2 to 32 times greater than the first resolution in one or more spatial dimensions.
  • the second resolution may be 4 to 32 times greater than the first resolution in 2D or 3D.
  • the operation 540 is based on a polony map that has been generated.
  • the polony map may be 2D or 3D.
  • the polony map has the second resolution.
  • the operation 540 comprises generating a polony map and determining the polonies based on the generated polony map. The details of generating a 2D or 3D polony map have been disclosed in U.S. Patent Application Nos. 18/078,820 and 18/078,797, which are incorporated herein by reference in their entirety.
  • the base calling can be performed using polony locations in the second plurality of flow cell images from different channels in cycle N, after the second plurality of flow cell images from different channels are registered relative to the polony map disclosed herein.
  • Various existing 2D base calling algorithms can be used.
  • the base calling results can be saved with their 3D coordinates. Such 3D coordinates can be used to register the base calls across different cycles and at different z levels.
  • the method 500 can comprise an operation 550 of (v) performing, by the processor, a corresponding base calling for each of the determined polonies.
  • the operation 550 of performing base calling may be based on the second plurality of images generated in operation 530.
  • the operation 550 may be further based on the polony map determined in operation 540.
  • the base calling can be performed using intensity of the polonies from different channels per cycle per z level.
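Per-polony base calling from the channel intensities of one cycle at one z level can be illustrated with a deliberately simple caller. This is a minimal sketch, not the patent's base caller: it assumes four channels mapped one-to-one to A, C, G, and T (a hypothetical mapping; real chemistries may use fewer channels with combinatorial encoding), calls the brightest channel, and reports that channel's share of the total signal as a rough purity score.

```python
# Assumed channel-to-base mapping; hypothetical, for illustration only.
CHANNEL_BASES = ("A", "C", "G", "T")


def call_base(intensities):
    """Return (base, purity) for one polony in one cycle at one z level."""
    if len(intensities) != len(CHANNEL_BASES):
        raise ValueError("expected one intensity per channel")
    # Negative values (e.g., after background subtraction) carry no signal.
    clipped = [max(0.0, value) for value in intensities]
    total = sum(clipped)
    if total == 0:
        return "N", 0.0  # no signal in any channel: no call
    best = max(range(len(clipped)), key=clipped.__getitem__)
    return CHANNEL_BASES[best], clipped[best] / total


def call_read(per_cycle_intensities):
    """Call one base per cycle for a single polony; returns (sequence, purities)."""
    calls = [call_base(values) for values in per_cycle_intensities]
    return "".join(base for base, _ in calls), [purity for _, purity in calls]


print(call_read([[1, 9, 0, 0], [0, 0, 0, 4], [10, 2, 3, 5]]))  # ('CTA', [0.9, 1.0, 0.5])
```

Argmax calling ignores phasing, crosstalk, and neighboring polonies; it is only meant to make the "intensity per channel per cycle per z level" input concrete.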
  • the method 500 may include an operation of saving the base calls obtained in operation 550 in a predetermined format, e.g., in a FASTQ file, compatible with subsequent operations so that subsequent analysis such as adaptor trimming and secondary analysis can be performed.
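Saving base calls in FASTQ format amounts to writing four lines per read. A minimal sketch, assuming integer Phred quality scores are already available for each base (how they are estimated is not specified here) and the widely used Phred+33 encoding; `fastq_record`, `write_fastq`, and the read identifier are hypothetical.

```python
def fastq_record(read_id, bases, qualities):
    """Format one FASTQ record: header, sequence, separator, and Phred+33 qualities."""
    if len(bases) != len(qualities):
        raise ValueError("bases and qualities must have the same length")
    # Phred+33: quality q is stored as the ASCII character with code q + 33,
    # clamped to the printable range 0..93 ('!'..'~').
    encoded = "".join(chr(min(max(q, 0), 93) + 33) for q in qualities)
    return f"@{read_id}\n{bases}\n+\n{encoded}\n"


def write_fastq(path, reads):
    """Write (read_id, bases, qualities) tuples to a FASTQ file."""
    with open(path, "w", encoding="ascii") as handle:
        for read_id, bases, qualities in reads:
            handle.write(fastq_record(read_id, bases, qualities))


print(fastq_record("polony_1", "ACGT", [40, 30, 20, 2]), end="")
```

For Q40, Q30, Q20, and Q2 this prints the quality line `I?5#`.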
  • the neural network is a convolutional neural network (CNN).
  • the neural network is a U-Net.
  • the neural network comprises a U-Net with a first predetermined repetition of down-sampling and convolution operations and then a second predetermined repetition of up-sampling, concatenation, and convolution operations.
  • the first and second predetermined repetition can have an identical quantity, e.g., 3 or 4.
  • the neural network is a U-Net with a first predetermined number of filters in each repetition of down sampling, and then a second predetermined number of filters in each repetition of up sampling and/or concatenation.
  • the first predetermined number of filters can be 32, 64, 128, and 256 filters in three repetitions and the second predetermined number can be 128, 64, 64, and 32 filters in the corresponding three repetitions.
  • the first predetermined number of filters can be 32, 64, 128, and 256 filters in three repetitions and the second predetermined number can be 256, 128, 64, and 32 filters in the corresponding three repetitions.
  • the operation 530 may comprise: performing, by the processor, a first convolution in one or more dimensions on the first plurality of flow cell images, thereby generating a first convolution result; repetitively performing, for one or more times, down-sampling operations comprising: (a) performing, by the processor, a second convolution in one or more dimensions on the first convolution result, thereby generating a second convolution result; and (b) performing, by the processor, a down sampling of the second convolution result by a down sampling factor thereby generating a first down-sampled result.
  • the second convolution may comprise a corresponding number of filters, thereby generating a third convolution result after the repetitions.
  • the operation 530 may further comprise: performing, by the processor, the second convolution in one or more dimensions on the third convolution result, thereby generating a fourth convolution result; repetitively performing, for one or more times, up-sampling operations comprising: (c) performing, by the processor, an up-sampling of the fourth convolution result by an up-sampling factor, thereby generating a first up-sampled result; and (d) performing, by the processor, the second convolution in one or more dimensions on the first up-sampled result, thereby generating a fifth convolution result.
  • the second convolution may comprise a corresponding number of filters, thereby generating a sixth convolution result after the repetitions.
  • the first convolution comprises a 3D convolution with a convolution kernel.
  • the convolutional kernel may have 4 dimensions.
  • the convolutional kernel is m*m*m for the first three spatial dimensions and the size of its fourth dimension is determined by the filter number in the corresponding repetition.
  • m can be an integer in the range of 2 to 20.
  • the input can be 512x512 flow cell images, and the z-stack can have 12 slices.
  • the first convolution can include 32 filters and each filter has one kernel that is 3x3x3x1.
  • the output from that convolutional block is 512x512x12x32.
  • a double convolutional block, i.e., the second convolution, consists of two convolutions of the same form as the first convolution, each with 32 filters.
  • the input to both of those blocks is 512x512x12x32 and the output is 512x512x12x32.
  • Each filter uses a kernel sized 3x3x3x32. The number of filters may correspond to features of the input.
  • the second convolution comprises two 3D convolutional layers, e.g., as shown in the pseudo code.
  • the second convolution comprises two repetitions, or blocks, of the first convolution in 3D; the output shape and the number of filters change from block to block because the convolution process increases the depth of the image.
  • the depth of image may increase as the number of features or filters increases.
  • the first and second resolutions are in 2D or 3D.
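The tensor shapes and kernel sizes quoted above follow from standard stride-1 convolution arithmetic. A minimal sketch, assuming "same" padding (so x, y, and z are preserved) and one bias per filter, a common convention that the text does not state; each filter's kernel spans the full channel depth of its input, so a single-channel input gives 3x3x3x1 kernels and a 32-channel input gives 3x3x3x32 kernels. `conv3d_same` is a hypothetical helper.

```python
def conv3d_same(input_shape, filters, kernel=(3, 3, 3)):
    """Output shape and parameter count of a stride-1, 'same'-padded 3D convolution.

    input_shape is (x, y, z, channels). Each filter holds one kernel of size
    kernel + (channels,) plus one bias term (assumed).
    """
    x, y, z, channels = input_shape
    kx, ky, kz = kernel
    params = filters * (kx * ky * kz * channels + 1)
    return (x, y, z, filters), params


# First convolution: 512x512 images, 12-slice z-stack, 32 filters of 3x3x3x1.
shape1, params1 = conv3d_same((512, 512, 12, 1), filters=32)
# Double convolutional block: two 32-filter layers, each with 3x3x3x32 kernels.
shape2, params2 = conv3d_same(shape1, filters=32)
shape3, params3 = conv3d_same(shape2, filters=32)
print(shape1, params1)            # (512, 512, 12, 32) 896
print(shape3, params2 + params3)  # (512, 512, 12, 32) 55360
```

The spatial size never changes inside a convolutional block; only the channel (feature) depth does, which is what the "depth of the image" remarks refer to.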
  • the first convolution comprises a 2D convolution with a convolution kernel.
  • the convolutional kernel may have 3 dimensions.
  • the convolutional kernel is m x m for the first two spatial dimensions and the size of its third dimension is determined by the filter number in the corresponding repetition.
  • m can be an integer in the range of 2 to 20.
  • the input can be flow cell images with a size of 512x512x1.
  • the first convolution can include 64 filters and each filter has one kernel that is 3x3x1.
  • the output from that convolutional block is 512x512x64.
  • a double convolutional block, i.e., the second convolution, consists of two convolutions of the same form as the first convolution, each with 32 filters.
  • the input to both of those blocks is 512x512x64 and the output is 512x512x32.
  • Each filter can use a kernel sized 3x3x32.
  • the second convolution comprises two convolutional layers, e.g., as shown in the pseudo code.
  • the second convolution comprises two repetitions, or blocks, of the first convolution; the output shape and the number of filters change from block to block because the convolution process increases the depth of the image.
  • the depth of image may increase as the number of features or filters increases.
  • the first and second resolutions are in 2D or 3D.
  • the second convolution in operation (a) comprises a corresponding number of n, 2*n, 4*n, and 8*n filters in a first, second, third, and fourth repetition, respectively.
  • the second convolution in operation (c) comprises a corresponding number of 2*n, 2*n, 4*n, 8*n filters in a last repetition, last minus one, last minus two, and last minus three repetition, respectively.
  • n can be an integer in the range from 8 to 256.
  • operation (a) comprises 32, 64, 128, and 256 filters in three repetitions
  • operation (c) comprises 128, 64, 64, and 32 filters in the corresponding three repetitions.
  • the second convolution in operation (c) comprises a corresponding number of n, 2*n, 4*n, 8*n filters in a last repetition, last minus one, last minus two, and last minus three repetition, respectively.
  • operation (a) comprises 32, 64, 128, and 256 filters in four repetitions
  • operation (c) comprises 256, 128, 64, and 32 filters in the corresponding four repetitions.
  • the second convolution in operation (c) comprises a corresponding number of n, 2*n, and 4*n filters in a last, last minus one, and last minus two repetition, respectively.
  • operation (a) comprises 32, 64, 128 filters in three repetitions and operation (c) comprises 128, 64, and 32 filters in the corresponding three repetitions.
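The doubling filter schedule described above can be generated from a single base count n. A minimal sketch, assuming the decoder simply mirrors the encoder: this reproduces the 32, 64, 128, 256 / 256, 128, 64, 32 and the 32, 64, 128 / 128, 64, 32 variants, whereas variants such as 128, 64, 64, 32 would need an explicit list. `unet_filter_schedule` is a hypothetical name.

```python
def unet_filter_schedule(n, repetitions):
    """Encoder filters n, 2n, 4n, ... and the mirrored decoder filters.

    n is restricted to the 8-256 range given in the text.
    """
    if not 8 <= n <= 256:
        raise ValueError("n is expected to be in the range 8 to 256")
    encoder = [n * 2 ** i for i in range(repetitions)]
    decoder = encoder[::-1]  # decoder mirrors the encoder, deepest level first
    return encoder, decoder


print(unet_filter_schedule(32, 4))  # ([32, 64, 128, 256], [256, 128, 64, 32])
print(unet_filter_schedule(32, 3))  # ([32, 64, 128], [128, 64, 32])
```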
  • the operation 530 may further comprise: performing, by the processor, the first convolution in one or more dimensions on the sixth convolution result, thereby generating a seventh convolution result; and predicting, by the processor, the second plurality of flow cell images based on the seventh convolution result.
  • Each of the second plurality of flow cell images may correspond to the corresponding flow cell image of the first plurality of flow cell images with a second resolution that is 2, 4, 6, 8, 10, 12, or 16 times greater than the first resolution in one or more spatial dimensions.
  • the second resolution is at least 4, 6, or 8 times greater than the first resolution in all three dimensions.
  • the first plurality of flow cell images are from a single color channel. In some embodiments, the first plurality of flow cell images are from one or more color channels. In some embodiments, the first plurality of flow cell images are of unbalanced nucleotide diversity in one or more sequencing cycles.
  • the cellular sample comprises overloaded concatemer molecules with a spatial density in a range of 10^2 to 10^15 per mm^2. In some embodiments, the cellular sample comprises overloaded concatemer molecules with a spatial density in a range of 10^3 to 10^10 per mm^2.
  • the first resolution is in a range of 0.1 um to 5 um. In some embodiments, the first resolution is in a range of 0.01 um to 10 um. In some embodiments, the second resolution is in a range of 0.02 um to 2 um. In some embodiments, the second resolution is in a range of 0.001 um to 3 um. In some embodiments, the down-sampling factor is 2, 4, 6, 8, 16, or more. In some embodiments, the up-sampling factor is 2, 4, 6, 8, 16, or more.
  • one or more of operations (ii) to (v) are performed while a sequencing run is being performed. In some embodiments, one or more operations (ii) to (v) are performed in parallel as the corresponding sequencing run to reduce sequencing analysis time.
  • the one or more cycles comprises a current cycle N.
  • N may be in a range from 1 to 150, 1 to 300, 1 to 500, or 1 to 1000.
  • one or more of operations (ii) to (v) are performed while the sequencing reactions in cycles subsequent to the current cycle N are yet to be performed or are currently being performed.
  • the training data set of training flow cell images comprises z-stacks of training flow cell images taken at different z-locations.
  • Each z-stack may represent an individual FOV of cellular sample(s).
  • the z-axis is orthogonal to image planes of the flow cell images.
  • the training data set of training flow cell images comprises flow cell images from multiple sequencing cycles.
  • One or more sequencing cycles may be of unbalanced diversity, so that their images appear dimmer or contain fewer polonies than images from sequencing cycles of high nucleotide diversity.
  • the number of polonies in the training flow cell images in a particular cycle may vary from 1% to 99% of a total number of polonies within a FOV of that cycle.
  • when the number of polonies in the training flow cell image of a particular cycle is from 1% to 5%, or from 1% to 10%, of the total number of polonies within that cycle, the cycle is of low or unbalanced diversity.
  • when the number of polonies in the training flow cell image of a particular cycle is greater than 10% or 15% of the total number of polonies within that cycle, the cycle is of high or balanced diversity.
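The diversity labels above can be expressed as a threshold on the fraction of a FOV's polonies that appear in a cycle's image. A minimal sketch, assuming a single cutoff at 10% (the text also mentions 5% and 15%, so the cutoff is a parameter); `classify_cycle_diversity` is a hypothetical name.

```python
def classify_cycle_diversity(polonies_in_image, total_polonies, low_max_fraction=0.10):
    """Label a cycle 'low' or 'high' diversity from the fraction of visible polonies."""
    if total_polonies <= 0:
        raise ValueError("total_polonies must be positive")
    fraction = polonies_in_image / total_polonies
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("polonies_in_image cannot exceed total_polonies")
    return "low" if fraction <= low_max_fraction else "high"


print(classify_cycle_diversity(50, 1000))   # low  (5% of polonies visible)
print(classify_cycle_diversity(300, 1000))  # high (30% of polonies visible)
```

Such labels could be used to keep both balanced and unbalanced cycles represented in the training data set.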
  • the training data set of training flow cell images comprises flow cell images from multiple samples and multiple sequencing cycles, and the training flow cell images include a subset of flow cell images with unbalanced diversity in multiple sequencing cycles and another subset of flow cell images with balanced diversity in multiple sequencing cycles.
  • the training flow cell images from one or more cycles may be transformed from other training flow cell images from different cycle(s) to simulate the transformation that may occur across cycles within a same color channel.
  • the operation of performing, by the processor, the first convolution in one or more dimensions on the first plurality of flow cell images comprises: performing, by the processor, a first convolution in 3D on the first plurality of flow cell images, thereby generating a first convolution result.
  • operation (a) comprises performing, by the processor, the second convolution in 3D on the first convolution result, thereby generating a second convolution result.
  • the operation of performing, by the processor, the first convolution in one or more dimensions on the first plurality of flow cell images comprises: performing, by the processor, a first convolution in 2D on the first plurality of flow cell images, thereby generating a first convolution result.
  • operation (a) comprises performing, by the processor, the second convolution in 2D on the first convolution result, thereby generating a second convolution result.
  • repetitively performing, for one or more times, operations comprising (c) and (d) comprises: repetitively performing, for one or more times, operations comprising (c), (d), and (e), wherein (e) is after operation (c) and before operation (d), and wherein (e) comprises: concatenating, by the processor, the first up-sampled result in a current up-sampling repetition with the first down-sampled result in a previous down-sampling repetition, wherein the first up-sampled result has the same size as the first down-sampled result in the previous down-sampling repetition.
  • operation (e) is in each repetition.
  • repetitively performing, for one or more times, operations comprising (c) and (d) comprises: repetitively performing operations comprising (c), (d), and (e) in each repetition of one or more repetitions.
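The down-sampling, up-sampling, and concatenation steps (a) through (e) can be checked by tracing tensor shapes, which also shows why operation (e) needs the up-sampled result to match the size of the earlier down-sampling result. A minimal sketch, assuming stride-1 "same"-padded convolutions (so only the sampling steps change x, y, and z), a per-axis sampling factor, and a decoder that mirrors the encoder filters; it does not model how the higher second resolution is produced, and `trace_unet_shapes` is a hypothetical name.

```python
def trace_unet_shapes(input_shape, filters, factors=(2, 2, 2)):
    """Trace tensor shapes through a U-Net-style encoder/decoder.

    input_shape: (x, y, z, channels); filters: encoder filter counts per level,
    e.g. [32, 64, 128]; factors: down/up-sampling factor per spatial axis.
    Returns a list of (stage, shape) tuples.
    """
    spatial = tuple(input_shape[:-1])
    trace, skips = [], []
    # Down-sampling path: convolve, keep the result as a skip, then down-sample.
    for f in filters[:-1]:
        skips.append(spatial + (f,))
        trace.append(("down_conv", skips[-1]))
        if any(s % k for s, k in zip(spatial, factors)):
            raise ValueError(f"{spatial} is not divisible by {factors}")
        spatial = tuple(s // k for s, k in zip(spatial, factors))
    channels = filters[-1]
    trace.append(("bottleneck", spatial + (channels,)))
    # Up-sampling path: up-sample (c), concatenate with the skip (e), convolve (d).
    for f, skip in zip(reversed(filters[:-1]), reversed(skips)):
        spatial = tuple(s * k for s, k in zip(spatial, factors))
        if spatial != skip[:-1]:
            raise ValueError("up-sampled result and skip differ in size")
        trace.append(("concat", spatial + (channels + skip[-1],)))
        channels = f
        trace.append(("up_conv", spatial + (channels,)))
    return trace


for stage, shape in trace_unet_shapes((512, 512, 12, 1), [32, 64, 128]):
    print(stage, shape)
```

With four levels and a factor of 2 on every axis, a 12-slice z-stack cannot be halved three times (12, 6, 3, then 1.5), so the sketch raises an error; passing `factors=(2, 2, 1)` keeps z fixed and lets the same trace succeed.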
  • the kernel may take any size that is smaller than the size of the flow cell image undergoing the convolution.
  • the kernel can be 2 by 2 by 2, 3 by 3 by 3, 4 by 4 by 4, 5 by 5 by 5, or 6 by 6 by 6 in the first three spatial dimensions.
  • the kernel size can be customized to remove at least some of the noise and unwanted signal that are larger than the kernel size.
  • the kernel can be circular.
  • the kernel can be in various other shapes.
  • the focus of the optical system can span a range along the z axis, e.g., 0.1 um, 0.2 um, 0.3 um, 0.5 um, 0.6 um, 0.8 um, 1 um, 2 um, 3 um, 4 um, 5 um, etc.
  • Polonies or clusters that are within the range of focus can appear in-focus or about in-focus in the flow cell image.
  • Flow cell images at a specific z level can also include signals from polonies or clusters that are not within the focus range of the image, but at different z levels.
  • Such polonies or clusters are out-of-focus.
  • bigger and blurred signal spots represent out-of-focus polonies or clusters.
  • Some of the out-of-focus polonies or clusters are circled in FIG. 3A.
  • Each flow cell image at a specific z level can also include noise caused by the optical system and/or undesired signal from the sample.
  • the undesired signal can be signal coming from components of the sample such as membrane, cytosol, and mitochondria.
  • Such background objects can be any objects that are relatively larger in size than the polonies or clusters. As shown in FIG. 3A, there is a blurry cellular contour (at the arrows) in the flow cell image, and most of the signal spots are contained within the blurry contour.
  • background objects can include any objects within the 3D sample that are not polonies or clusters.
  • the method 500 includes an operation of registering the second plurality of flow cell images.
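Separating small polony spots from larger background objects, such as a blurry cellular contour, can be illustrated with a classical morphological white top-hat filter, which removes structures larger than its window while keeping smaller bright spots. This is a swapped-in illustration, not the neural-network approach of the text; the square window `size` is an assumed parameter, and the pure-Python loops are for clarity, not speed.

```python
def _rank_filter(image, size, reducer):
    """Apply min or max over a size x size window (clipped at the borders)."""
    height, width, r = len(image), len(image[0]), size // 2
    return [
        [
            reducer(
                image[yy][xx]
                for yy in range(max(0, y - r), min(height, y + r + 1))
                for xx in range(max(0, x - r), min(width, x + r + 1))
            )
            for x in range(width)
        ]
        for y in range(height)
    ]


def white_top_hat(image, size):
    """Image minus its grayscale opening (erosion followed by dilation).

    Structures narrower than the window (e.g., polony spots) survive;
    broader background objects are largely removed.
    """
    opened = _rank_filter(_rank_filter(image, size, min), size, max)
    return [[p - o for p, o in zip(row, orow)] for row, orow in zip(image, opened)]


# A 7x7 background object of intensity 10 with one bright polony on top of it.
image = [[10 if 1 <= x <= 7 and 1 <= y <= 7 else 0 for x in range(9)] for y in range(9)]
image[4][4] = 110
result = white_top_hat(image, size=3)
print(result[4][4], sum(map(sum, result)))  # 100 100
```

Only the spot's excess intensity above the background object survives; the broad object itself is removed.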
  • the images are registered across channels and/or across different cycles.
  • the images are registered before any base calling is performed in operation 550.
  • the images are registered across channels and different cycles before generating or obtaining the polony maps.
  • the images are registered across channels and different cycles before one or more primary analysis steps herein.
  • the images can be registered after one or more preprocessing operations disclosed herein are performed.
  • Various image registration techniques can be used to register the images.
  • the images can be registered using 2D or 3D registration techniques.
  • the operation of registering the flow cell images is with respect to a reference coordinate system. In some embodiments, the operation of registering the flow cell images is with respect to one or more template images.
  • the operation of registering the images can comprise generating the one or more template images in a reference coordinate system. In some embodiments, the operation of registering the images can comprise registering polonies to template polonies in the one or more template images.
  • the operation of registering the images can comprise determining a plurality of transformations based on the one or more template images. Each of the plurality of transformations can correspond to a corresponding subtile of the flow cell images, the processed images, or the filtered images, and can be configured to register that subtile to the one or more template images. Each transformation can be used to register a corresponding subtile or tile to the one or more template images.
  • the plurality of transformations can comprise one or more affine transformations.
  • the operation of registering the images can comprise performing image registration of the polonies based on fiducial markers.
  • the fiducial markers can be located on the flow cell. Alternatively, the fiducial markers can be external to the flow cell.
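An affine transformation for a subtile can be estimated from matched fiducial markers by least squares. A minimal sketch, assuming fiducial coordinates have already been detected and paired (image fiducials in `src`, template or reference fiducials in `dst`); the helper names are hypothetical, and the 3x3 normal equations are solved directly for illustration rather than with a library solver.

```python
def _solve3(a, b):
    """Solve the 3x3 system a @ x = b by Gaussian elimination with partial pivoting."""
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("degenerate fiducial layout (e.g., collinear points)")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, 3):
            factor = m[r][col] / m[col][col]
            m[r] = [v - factor * p for v, p in zip(m[r], m[col])]
    x = [0.0, 0.0, 0.0]
    for r in (2, 1, 0):  # back-substitution
        x[r] = (m[r][3] - sum(m[r][c] * x[c] for c in range(r + 1, 3))) / m[r][r]
    return x


def fit_affine(src, dst):
    """Least-squares 2D affine ((a, b, c), (d, e, f)): x' = a*x + b*y + c, y' = d*x + e*y + f."""
    if len(src) != len(dst) or len(src) < 3:
        raise ValueError("need at least 3 matched fiducial pairs")
    rows = [(x, y, 1.0) for x, y in src]
    # Normal equations (A^T A) p = A^T t, solved once for x' and once for y'.
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    params = []
    for k in range(2):
        atb = [sum(r[i] * t[k] for r, t in zip(rows, dst)) for i in range(3)]
        params.append(tuple(_solve3(ata, atb)))
    return tuple(params)


def apply_affine(params, point):
    """Map one (x, y) point, e.g., a polony center, through the affine transform."""
    (a, b, c), (d, e, f) = params
    x, y = point
    return a * x + b * y + c, d * x + e * y + f


true = ((1.01, -0.02, 5.0), (0.02, 0.99, -3.0))
fiducials = [(0.0, 0.0), (100.0, 0.0), (0.0, 100.0), (100.0, 100.0)]
observed = [apply_affine(true, p) for p in fiducials]
params = fit_affine(fiducials, observed)
x, y = apply_affine(params, (50.0, 50.0))
print(round(x, 6), round(y, 6))  # 54.5 47.5
```

Using more than three well-spread fiducials lets the least-squares fit average out localization error in individual markers.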
  • the image registration as an image processing step herein is configured to align images from different cycles and/or different channels, for example, with respect to a template image or a reference coordinate system.
  • the image registration herein is configured to register polonies or clusters from different cycles and/or different channels, e.g., in the filtered image, to a template image or a reference coordinate system.
  • the base calling can be performed using the filtered images from different channels in cycle N after the filtered images from different channels are registered relative to the corresponding template image disclosed herein.
  • the operation 540 can comprise an operation of extracting polony intensities based on the polony map.
  • the location information of such polony can be obtained from the polony map, e.g., 2D coordinates of the polony and the z level.
  • the corresponding flow cell image and its pixel(s) can be determined. Image intensity of such pixels can be extracted from the corresponding processed image after one or more image processing steps as intensity of such pixel for performing base calling.
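Extracting intensities for base calling from the polony map, as described above, can be sketched as a lookup into registered, processed image stacks. A minimal sketch, assuming the stacks are already registered to the polony map's coordinate system and indexed as `stack[z][y][x]`, and that an optional hypothetical `radius` averages a small window in the polony's z slice; `extract_intensities` and its arguments are illustrative names.

```python
def extract_intensities(polony_map, channel_stacks, radius=0):
    """Return {polony_id: [intensity per channel]} for (polony_id, x, y, z) entries.

    channel_stacks[c][z][y][x] holds processed, registered image intensities.
    With radius > 0, the mean over a (2*radius+1)^2 window in slice z is used.
    """
    result = {}
    for polony_id, x, y, z in polony_map:
        xi, yi = round(x), round(y)  # nearest pixel to the polony center
        values = []
        for stack in channel_stacks:
            plane = stack[z]
            window = [plane[yy][xx]
                      for yy in range(yi - radius, yi + radius + 1)
                      for xx in range(xi - radius, xi + radius + 1)
                      if 0 <= yy < len(plane) and 0 <= xx < len(plane[0])]
            values.append(sum(window) / len(window))
        result[polony_id] = values
    return result


# Two channels, each a z-stack of two 3x3 planes; the polony is bright at z=1, x=1, y=1.
channel_0 = [[[0.0] * 3 for _ in range(3)] for _ in range(2)]
channel_1 = [[[0.0] * 3 for _ in range(3)] for _ in range(2)]
channel_0[1][1][1] = 9.0
channel_1[1][1][1] = 3.0
polony_map = [(7, 1.2, 0.8, 1)]  # (polony_id, x, y, z level)
print(extract_intensities(polony_map, [channel_0, channel_1]))  # {7: [9.0, 3.0]}
```

The resulting per-channel vectors are the kind of input a per-polony base caller consumes.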
  • the operation of registering the flow cell images may be based on background objects in the flow cell images.
  • the background objects can be used to align the flow cell image to the cell images by using one or more transformation(s).
  • the cell staining images herein are staining images of the sample(s) immobilized on the support, with possible transformation (e.g., translation) from the sample(s) in the flow cell images.
  • the transformation may be represented by a single transformation of the whole image or be separated into multiple transformations, each representing a portion of the whole image.
  • the method 500 may include an operation of registering the base calls from operation 550 to the cell staining images.
  • registration may be based on fiducial markers.
  • fiducial markers can also be included in the cell staining images. Aligning the fiducial markers can generate the transformation(s) between the flow cell images or between flow cell images and cell staining images. The transformation(s) can be used to register or align polonies or clusters between the sequencing images and the cell images.
  • the simulated z-stack is 2048x2048x3, and each cell may include 200 to 2000 polonies.
  • the spatial resolution can be about 0.1 um.
  • Prediction is performed independently for each 512x512 region of the simulated z-stack.
  • the predicted high-resolution z-stack is 8192x8192x12.
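The per-region prediction arithmetic above can be checked with a small planning helper: sixteen 512x512 regions of the 2048x2048x3 z-stack, each up-scaled 4x in x, y, and z, stitch into an 8192x8192x12 volume. A minimal sketch, assuming non-overlapping regions and no blending at region seams (neither is specified in the text); `tile_prediction_plan` is a hypothetical name.

```python
def tile_prediction_plan(stack_shape, tile_xy=512, scale=4):
    """Plan independent per-region prediction and stitching for a low-resolution z-stack.

    stack_shape is (x, y, z). Assumes non-overlapping regions and the same
    up-scaling factor along x, y, and z.
    """
    x, y, z = stack_shape
    if x % tile_xy or y % tile_xy:
        raise ValueError("x and y must be multiples of the region size")
    # (input origin, output origin) per region; outputs land at scaled offsets.
    placements = [((tx, ty), (tx * scale, ty * scale))
                  for ty in range(0, y, tile_xy)
                  for tx in range(0, x, tile_xy)]
    tile_output = (tile_xy * scale, tile_xy * scale, z * scale)
    stitched = (x * scale, y * scale, z * scale)
    return placements, tile_output, stitched


placements, tile_output, stitched = tile_prediction_plan((2048, 2048, 3))
print(len(placements), tile_output, stitched)  # 16 (2048, 2048, 12) (8192, 8192, 12)
```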
  • FIGS. 2A-2C show simulated flow cell images, and two different predicted flow cell images with 4x resolution at different z-locations.
  • FIGS. 3A and 3D show two actual flow cell images at different z-locations in a 512x512x3 z-stack.
  • the predicted high resolution flow cell images (2048x2048) in FIGS. 3B-3C are at two different z-locations corresponding to the low resolution image in FIG. 3A.
  • the neural network may be used to predict polony locations using z-stack(s) of flow cell images comprising flow cell images from multiple z-levels forming 3D volume(s).
  • m is in a range from 2 to 10
  • filters can be in a range from 8 to 1024
  • the fourth dimension of k_size can match the number of filters in the corresponding repetition.
  • the input flow cell images can have various sizes in 3D as disclosed herein, e.g., 1024 by 1024 by 4.
  • b_1 = conv_block(inputs, filters, k_size)
  • for n = 1 : m-1
  • u_n = upsampling3D(b_{m+n})
  • cats_n = concatenate(u_n, b_{m+1-n})
  • b_{m+n+1} = double_conv(cats_n, filters * 2^(m-n-1), k_size)
  • b_{2m+1} = conv_block(b_{2m}, filters, k_size)
  • the neural network may be used for predicting polony locations based 2D flow cell images at different z-levels.
  • m is in a range from 2 to 10
  • filters can be in a range from 8 to 1024
  • k_size can be in 3 dimensions
  • the third dimension of k_size can match the number of filters in the corresponding repetition.
  • the input flow cell images can have various sizes in 2D as disclosed herein, e.g., 1024 by 1024, and there can be 3, 4, 5, or other numbers of z-levels.
  • for n = 2 : m-1
  • u_n = upsampling2D(b_{m+n-2})
  • cats_n = concatenate(u_n, b_{m-n+1})
  • b_{m+n-1} = double_conv(cats_n, filters * 2^(m-n-1), k_size)
  • model = tf.keras.Model(inputs, outputs)
  • the methods and systems herein can be used to predict base calls for some or all polonies of the flow cell images.
  • the systems and methods herein advantageously use a neural network that is pretrained for predicting the base calls for polonies of flow cell images.
  • the same neural network may also be advantageously used, without additional training, to generate a polony map or a template image so that the locations of the predicted base calls can be determined.
  • the embodiments herein use a convolutional neural network as an example; however, it is understood that various other neural networks or machine learning models may also be used to achieve prediction of base calls using the systems and methods herein.
  • the methods for predicting base calls may include one or more operations here. When there are multiple operations involved, such operations may or may not be performed in the order that is described herein.
  • FIG. 28 shows a flow chart of a computer-implemented method 2800 for predicting base calls for flow cell images of biological samples, e.g., cellular samples, thereby enabling efficient and accurate primary analysis.
  • the method 2800 can include some or all of the operations disclosed herein. The operations may be performed in, but are not limited to, the order described herein.
  • the method 2800 can be performed by one or more processors disclosed herein.
  • the processor can include one or more of: a processing unit, e.g., a CPU, a reconfigurable logic device, an integrated circuit that is not reconfigurable, or their combinations.
  • the processing unit can include a central processing unit (CPU).
  • the reconfigurable logic device can include one or more FPGA devices.
  • the integrated circuit can include a chip such as an AI chip or an ASIC chip.
  • the processor can include the computing system 400.
  • some or all operations in method 2800 can be performed by the reconfigurable logic device, e.g., the FPGA(s), and/or the integrated circuit, e.g., the AI chip(s).
  • the data produced by the reconfigurable logic device and/or integrated circuit, e.g., the FPGA(s), after performing one or more operations can be communicated to various hardware elements of the system 100, e.g., CPU(s) or GPU(s), so that subsequent operation(s) in methods 500, 600, 700, 2800, and 2900 can be performed by such hardware using the communicated data.
  • data can also be communicated in the opposite direction from various hardware e.g., CPU(s), to the reconfigurable logic device or the integrated circuit for processing.
  • all the operations in the methods herein can be performed by CPU(s).
  • the operations performed by CPU(s) can be performed by other processors such as the dedicated processors, or GPU(s).
  • all the operations in the methods herein can be performed by the reconfigurable logic device and/or the integrated circuit, e.g., FPGA(s) and/or the AI chip(s).
  • the sensor data acquired by the imager 116 may be directly communicated to the reconfigurable logic device and/or the integrated circuit, e.g., via DMA connections. In some embodiments, the sensor data acquired by the imager 116 may be directly communicated to the reconfigurable logic device and/or the integrated circuit without being routed first to a CPU, a GPU, or any other processing units before reaching the reconfigurable logic device and/or the integrated circuit.
  • making predictions or inferences using the method 2800 herein with the reconfigurable logic device, e.g., the FPGA, and/or other integrated circuit, e.g., AI chips, may require at least 2x, 8x, 10x, 15x, 20x, 40x, 50x, or 100x less power than making prediction(s) or inference(s) with the same neural network(s) and identical training images using other computing hardware, including but not limited to CPUs or GPUs.
  • the sequencing system herein further comprises: a power source that is configured to supply identical or different power levels to the reconfigurable logic device and the integrated circuit.
  • a maximum power output of the power source to the sequencing system in performing methods 500, 600, 700, 2800, and/or 2900 is less than 2000 Watts, 1000 Watts, 900 Watts, 800 Watts, 700 Watts, 650 Watts, 600 Watts, 550 Watts, 500 Watts, 400 Watts, 300 Watts, 200 Watts, or 100 Watts.
  • the method 2800 can comprise an operation 2810 of (i) generating, by the sequencing system 110, a first plurality of flow cell images of sample(s) immobilized on a support by conducting one or more cycles of sequencing reactions.
  • the sample(s) may be traditional 2D sequencing samples containing biological analytes.
  • the sample(s) may be cellular or tissue samples.
  • the samples may comprise concatemer molecules therewithin.
  • the sample(s) may include concatemer molecules from one or more different sample sources.
  • the sample(s) may include a thickness along the z-axis so that the first plurality of flow cell images may be acquired at a z-stack of different z-locations with a first resolution to cover the cellular sample in 3D.
  • the sample can be in situ.
  • the sample can be a 3D sample.
  • the sample can be a volumetric sample that may contain different biological information at the same x-y location but different z level.
  • the sample can include multiple cells, tissue, or their combinations.
  • the 3D sample can be any biological sample that has a thickness that is greater than a predetermined threshold along the z axis. For example, the thickness can be greater than 1 um, 2 um, 3 um, 4 um, 5 um, 10 um, 20 um, or more.
  • the z axis is orthogonal to the image plane defined by the x and y axes.
  • the sample can be traditional 2D sequencing samples.
  • the flow cell images can be acquired using the optical system of the imager 116 disclosed herein, from the 1, 2, 3, 4, or more channels.
  • Each flow cell image can include at least a portion of one or more tiles (e.g., imaging areas), and each tile can be divided into multiple subtiles.
  • Each tile or subtile can include a plurality of polonies or clusters.
  • Each subtile can include multiple regions with each region including a number of polonies.
  • the flow cell image as disclosed herein can be an image that is acquired from a flow cell 112 as shown in FIG. 1 or 2712 as shown in FIG. 27.
  • the flow cell images are acquired from a single color channel, and subsequent prediction uses a pretrained neural network corresponding to that single channel.
  • the flow cell images are acquired from 2, 3, 4, or more color channels, and subsequent prediction uses a pretrained neural network corresponding to the multiple color channels.
  • a flow cell image herein can be an image of one or more tiles, one or more subtiles, one or more segmented regions within tile(s) or subtile(s), or their combinations.
  • Each flow cell image can comprise a field of view (FOV).
  • the FOV can be orthogonal to the z axis.
  • the FOV can be within the x-y plane.
  • the FOV of different flow cell images at different z levels can be identical within the x-y plane.
  • the FOV of different flow cell images at different z levels can have at least an overlapping portion within the x-y plane.
  • the image resolution of different flow cell images at different z levels can be about identical or exactly identical.
  • FIGS. 3A and 3D show two exemplary flow cell images acquired at two different z levels along the z axis of a same 3D sample within a same sequencing cycle.
  • the FOV can be in 3D and be of various sizes to cover the volumetric sample to be imaged.
  • the FOV along x, y, and/or z direction can be in a range from 10 um to 5 mm.
  • the FOV along x, y, and/or z direction can be in a range from about 0.1 um to about 2 mm.
  • the FOV along x, y, and/or z direction can be in a range from 0.5 um to 1 mm.
  • the FOV can be about 0.5 mm by 0.5 mm by 20 um for certain cellular samples along the x, y, and z direction, respectively.
  • the flow cell images herein may be of various sizes; the pixel number along the x, y, and/or z axis may be any integer greater than 64 or 128.
  • the flow cell images herein may be of various sizes; the pixel number along the x, y, and/or z axis may be in a range from 2 to 65536.
  • a single flow cell image can be separated into different number of regions, for example, 4, 8, 16, or even more regions, and each region may include a size of 256 by 256 by 1, 512 by 512 by 3, or other sizes.
  • the number of pixels along x, y, and/or z direction may be adjusted to maintain a particular spatial resolution in a given FOV. For example, with a spatial resolution of 0.2 um, to cover a FOV of 0.8 mm, the number of pixels may be 4000.
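The pixel-count arithmetic above (a FOV of 0.8 mm, i.e., 800 um, divided by 0.2 um per pixel gives 4000 pixels) and the splitting of an image into fixed-size regions can be sketched as follows. The function names are hypothetical, and rounding up to whole pixels and whole regions is an assumption rather than something the text specifies.

```python
import math

def pixels_needed(fov_um: float, resolution_um: float) -> int:
    """Pixels needed along one axis to cover a field of view at a given pixel size."""
    if resolution_um <= 0:
        raise ValueError("resolution must be positive")
    # Round up so the whole FOV is covered; the tiny tolerance absorbs float error.
    return math.ceil(fov_um / resolution_um - 1e-9)

def regions_per_axis(n_pixels: int, region_size: int) -> int:
    """Equally sized regions needed along one axis (the last one may be partial)."""
    return math.ceil(n_pixels / region_size)

n = pixels_needed(800.0, 0.2)      # 0.8 mm FOV at 0.2 um per pixel
print(n)                           # 4000
print(regions_per_axis(n, 512))    # 8 regions of 512 pixels along that axis
```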
  • Each flow cell image at a specific z level may include intensities generated by polonies or clusters at the corresponding z level.
  • signals from polonies or clusters appear as small bright spots within the images.
  • Each bright spot can be of various sizes, often no more than a few pixels, e.g., less than a pixel, about 1 pixel, or about 2, 3, 4, 5, or more pixels.
  • each signal spot of the polonies or clusters can be any number of pixels in the range from 0.01 pixel to about 100 pixels.
  • each signal spot of the polonies or clusters can be any number of pixels in the range from 0.1 pixel to about 16 pixels.
  • Each flow cell image can also include intensities generated by the cell and its structural elements. Such structural elements can be background objects or components, e.g., in FIG. 3A. Each flow cell image can also include noise and/or artifacts that are not from the polonies or cellular structures.
  • the depth of field of the optical system spans a range along the z axis, e.g., 0.1 um, 0.2 um, 0.3 um, 0.5 um, 0.6 um, 0.8 um, 1 um, 2 um, 3 um, 4 um, 5 um, etc.
  • Polonies or clusters that are within the range of depth of field can appear in-focus or about in-focus in the flow cell image.
  • Flow cell images at a specific z level can also include signals from polonies or clusters that are not within the focus range of the image. Such polonies or clusters are out of focus. As shown in FIG. 3A, larger, blurrier signal spots represent out-of-focus polonies or clusters, some of which are circled in FIG. 3A.
  • Each flow cell image at a specific z level can also include noise caused by the optical system and/or undesired signal from the sample.
  • the undesired signal can be signal coming from components of the sample such as membrane, cytosol, and mitochondria.
  • Such background objects can be any objects that are relatively large compared with the polonies or clusters. As shown in FIG. 3A, there is a blurry cellular contour (at the arrows) in the flow cell image, and most of the signal spots are contained within the blurry contour.
  • background objects can include any objects within the 3D sample that are not polonies or clusters.
  • base calls from the polonies include 4 different bases, and the percentage of polonies for each of the 4 different bases can be greater than about 10%, so that the data are relatively diverse.
  • bases called from the plurality of polonies include 4 or fewer different bases, and the percentage of polonies for one or more bases can be less than about 10%; such data can be considered as data of unbalanced diversity.
  • bases called from the plurality of polonies include 4 or fewer different bases, and the percentage of polonies for some of the bases can be less than about 5%, about 2%, or even about 1%; such data can be considered as data of unbalanced diversity.
  • the percentages of polonies called as bases A, T/U, C, and G can be about 1%, about 2%, about 1%, and about 95%, respectively.
  • the percentages of polonies called as bases A, T/U, C, and G can be about 10%, about 10%, about 10%, and about 70%, respectively.
  • plexity can also be a factor: when plexity is lower than a threshold, e.g., 8 or 16, the signal could be of unbalanced diversity.
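A minimal sketch of how a cycle could be flagged as having unbalanced diversity using the example thresholds above (any base below about 10% of polonies, or plexity below, e.g., 8). The function names, the pooling of T and U into one category, and the exact decision rule are illustrative assumptions, not the method's definitive criteria.

```python
from collections import Counter

BASES = ("A", "T/U", "C", "G")

def base_fractions(calls):
    """Fraction of polonies called as each base in one cycle; T and U are pooled."""
    if not calls:
        raise ValueError("no base calls supplied")
    counts = Counter("T/U" if b in ("T", "U") else b for b in calls)
    return {b: counts.get(b, 0) / len(calls) for b in BASES}

def is_unbalanced(calls, min_fraction=0.10, plexity=None, min_plexity=8):
    """True if any base is below min_fraction of polonies, or if a known
    plexity is below min_plexity (example thresholds taken from the text)."""
    if any(f < min_fraction for f in base_fractions(calls).values()):
        return True
    return plexity is not None and plexity < min_plexity
```

For example, a cycle with about 1% A, 2% T, 1% C, and 96% G calls would be flagged, whereas an even 25% split would not (unless a low plexity is supplied).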
  • the method 2800 is configured to predict base calls of flow cell images, e.g., of a first resolution, even if the polonies in the flow cell images are of unbalanced nucleotide diversity in one or more sequencing cycles, and the base calls may be spatially aligned to the polonies of flow cell images of a second resolution.
  • the second resolution may be higher than the first resolution.
  • the method 2800 comprises an operation 2802 of (ia) generating, by a processor or a first reconfigurable logic device, a second plurality of flow cell images comprising a second resolution.
  • each of the second plurality of flow cell images corresponds to a corresponding flow cell image of the first plurality of flow cell images.
  • the second plurality of flow cell images may be generated using various up-sampling algorithms including but not limited to interpolation.
  • the second resolution may be greater than the first resolution in one or more spatial dimensions.
  • the second resolution may be at least 2 times greater than the first resolution in one or more spatial dimensions.
  • the second resolution may be 2 to 32 times greater than the first resolution in one or more spatial dimensions.
  • the second resolution may be 4 to 64 times greater than the first resolution in one or more spatial dimensions, e.g., along x, y, and/or z direction.
  • the second resolution may be at least 2 to 32 times greater than the first resolution in one or more spatial dimensions.
  • the second resolution may be at least 4 to 64 times greater than the first resolution in one or more spatial dimensions.
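Operation 2802 generates higher-resolution images, e.g., by interpolation. The sketch below uses nearest-neighbour interpolation (pixel replication), one of the simplest up-sampling choices, on a 2D image stored as nested lists. It is an illustration under that assumption; bilinear or bicubic interpolation could be substituted, and for a 3D stack the function could be applied slice by slice with z handled separately. The function name is hypothetical.

```python
def upsample_nearest(image, factor_y, factor_x):
    """Nearest-neighbour up-sampling of a 2D image given as a list of rows.

    Each pixel is replicated factor_y times vertically and factor_x times
    horizontally, so resolution grows by those integer factors.
    """
    if factor_y < 1 or factor_x < 1:
        raise ValueError("up-sampling factors must be >= 1")
    out = []
    for row in image:
        # Repeat every pixel horizontally.
        wide = [v for v in row for _ in range(factor_x)]
        # Repeat the widened row vertically (independent copies).
        out.extend(list(wide) for _ in range(factor_y))
    return out
```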
  • the method 2800 comprises an operation 2804 of (ii) providing, by a processor, the second plurality of flow cell images as an input to a neural network, e.g., a convolutional neural network (CNN), wherein the neural network is pretrained using a training data set of training flow cell images using a training method disclosed herein, e.g., 600, 700, 2900 herein.
  • the neural network is pretrained so that the values of its parameters (e.g., weights) have been optimized based on the training.
  • the neural network may be retrained when needed, for example, for predicting flow cell images from different cellular samples.
  • the method 2800 may include image processing step(s) that can be performed on the first or second plurality of flow cell images, optionally prior to providing any input to the neural network.
  • the processing step(s) may include: intensity normalization, background subtraction, background removal, artifact reduction, artifact removal, adjustment of signal to noise ratio, adjustment of contrast to noise ratio, color correction, adjusting intensity offset, image registration, phasing and prephasing, filtering, segmentation, noise reduction, deconvolution (e.g., to differentiate neighboring or at least partly overlapping signal spots), or a combination thereof.
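As an illustration of two of the listed processing steps, background subtraction and intensity normalization, the sketch below estimates a flat background as a low percentile of all pixel values, subtracts it (clipping at zero), and rescales the result to [0, 1]. The percentile-based background estimate and the function name are assumptions for illustration only; spatially varying backgrounds would need a local estimate instead.

```python
def subtract_background_and_normalize(image, percentile=10.0):
    """Flat background subtraction followed by max-normalization to [0, 1]."""
    values = sorted(v for row in image for v in row)
    # Low-percentile pixel value used as the background level.
    idx = min(len(values) - 1, int(len(values) * percentile / 100.0))
    background = values[idx]
    corrected = [[max(0.0, v - background) for v in row] for row in image]
    peak = max(max(row) for row in corrected)
    if peak == 0:
        return corrected  # nothing above background
    return [[v / peak for v in row] for row in corrected]
```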
  • the method 2800 comprises an operation 2804’ of providing, by the processor, the first reconfigurable logic device, or the integrated circuit, the first or the second plurality of flow cell images to a polony map generation algorithm or a base calling algorithm.
  • the polony map generation algorithm and the base calling algorithm do not include a trained neural network or an artificial intelligence-based algorithm.
  • Exemplary polony map generation algorithms for generating 2D or 3D polony maps, and base calling algorithms for generating base calls, have been disclosed in U.S. Application Nos. 18/078,797 and 18/078,820 and U.S. Patent No. 10,266,888, which are incorporated herein by reference in their entireties.
  • the method 2800 comprises an operation 2806 of (iia) determining, by the first reconfigurable device or the integrated circuit, the polony map based on the second plurality of flow cell images.
  • the operation 2806 can be based on operation 2804 in some embodiments and on operation 2804’ in some other embodiments.
  • the polony map is 3D. In some embodiments, the 3D polony map includes multiple 2D polony maps at different z levels. In some embodiments, the 3D polony map has the second resolution. In some embodiments, the polony map is generated using a polony map generation algorithm. In some embodiments, the polony map generation algorithm lacks any neural network or artificial intelligence-based algorithm. In some embodiments, the polony map generation algorithm lacks any neural network that has been pretrained and can predict base calls in operation 2812 without additional training for predicting the polony map. In some embodiments, the polony map generation algorithm utilizes traditional algorithms that lack artificial intelligence.
  • the neural network in operations 2804-2806 is the same pretrained neural network used in operation 2812.
  • the same pretrained neural networks may include identical parameters, layers, and neural network structures therewithin.
  • the same pretrained neural networks may include an identical number of parameters, an identical number of layers, and identical neural network structures therewithin.
  • the method 2800 may further comprise an operation to train the neural network before operations 2804 and 2806.
  • the neural network is trained before operations 2804 and 2806, e.g., using method 700 or 2900 disclosed herein.
  • the pretrained neural network may be used to predict polony locations, polony shape and/or size, polony center locations, or equivalently the polony map.
  • the method 2800 may further include one or more operations of method 500, e.g., operations 530, 540, and/or 550, for predicting locations of the polonies, thus predicting the polony map.
  • the same neural network used in operations 2804, 2806, and 2812 may be trained using identical training data including identical flow cell images of samples.
  • the identical training data may also include identical “ground truths” or references in training.
  • the same neural networks may comprise identical values for parameters, identical number of layers, and identical neural network structures.
  • the same neural network used in operations 2804, 2806, and 2812 may be trained using at least a different portion of the identical training data.
  • the same neural networks may comprise identical parameters with identical or different values for such parameters, identical layers, and identical neural network structures. Training of the same neural network may be performed before operation 2804 and does not require retraining the neural network after operation 2806 and before operation 2812. The pretrained neural network may then be used in operations 2804-2806 and operation 2812 without retraining, allowing fast and efficient prediction of the base calls using method 2800.
  • the pretrained neural network may be used in operations 2804-2806 to update an existing polony map.
  • the existing polony map may be generated in an earlier cycle of the sequencing run.
  • the predicted polony map using the pretrained neural network may be used to update the existing polony map in a later cycle of the sequencing run.
  • an initial polony map may be generated by a non-neural network algorithm in the first cycle or first several cycles, e.g., cycles 1-4, of the sequencing run.
  • the neural network may be trained using data of the first cycle or the first several cycles, e.g., cycles 1-4 or cycles 1-5.
  • the pretrained neural network then can be used to predict a second polony map that can be used to update the initial polony map.
  • the second polony map may advantageously allow reselection of more accurate and reliable polony locations for making predictions of base calls, intensities, or classifications, e.g., in operation 2812.
  • Such prediction may be repeated by training the neural network with different cycles that have been completed in the sequencing run to improve reselection of polony locations.
  • the trained neural network may be retrained using data of cycles 1-6 or 1-7 following the training using data from cycles 1-4, and may then make another prediction of the polony map.
  • the same neural network used in operations 2806 and 2816 may be trained using different reference information as the “ground truth” in training.
  • the training of the neural network for predicting polony locations may use reference intensities as the “ground truth,” while the training of the neural network for predicting base calls may use reference base calls as the “ground truth.”
  • the same neural network used in operations 2804-2806 and 2816 may be trained using identical reference information as the “ground truth” in training.
  • the training of the neural network for predicting polony locations may use reference intensities as the “ground truth,” and the training of the neural network for predicting base calls may use reference base calls that can be determined based on such reference intensities.
  • the same neural network may be trained to predict base calls using various training methods, e.g., method 2900 disclosed herein.
  • Reference base calls may be used for the training of the neural network.
  • the reference base calls used in training, e.g., using method 2900, may include spatial information thereof.
  • the reference base calls used in training, e.g., using method 2900, may be of a first resolution, a second resolution, or a third resolution. In some embodiments, the third resolution can be higher than the first and second resolutions.
  • the same neural network may be trained to predict base calls, e.g., using method 2900. After being trained, such neural network may be used to predict base calls of the second plurality of flow cell images. The prediction of base calls can then be processed for determining locations of the polonies, thereby generating the polony map.
  • the polony map may be determined as the locations at which the base calls are predicted with a probability satisfying a predetermined threshold.
  • the polony locations, and thus the polony map, may be determined as the locations at which one or more quality metrics satisfy a predetermined threshold.
  • quality metrics can include but are not limited to the maximum, median, or average intensity of the polony among different color channels, a Q score of the base call, a clarity of the base call, and a purity of the base call.
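Determining polony locations as the pixels whose predicted base-call probability satisfies a threshold can be sketched as below. The data layout (a per-base dictionary of 2D probability maps), the function name, and the default threshold of 0.9 are hypothetical; another quality metric from the list above could replace the maximum probability as the test value.

```python
def polony_map_from_probabilities(prob_maps, threshold=0.9):
    """Return (row, col) locations whose highest base-call probability
    meets the threshold.

    prob_maps: dict mapping base -> 2D list of per-pixel probabilities,
               all maps having the same shape.
    """
    bases = list(prob_maps)
    rows = len(prob_maps[bases[0]])
    cols = len(prob_maps[bases[0]][0])
    locations = []
    for r in range(rows):
        for c in range(cols):
            # Confidence of the most likely base at this pixel.
            best = max(prob_maps[b][r][c] for b in bases)
            if best >= threshold:
                locations.append((r, c))
    return locations
```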
  • the second plurality of flow cell images may be used for generating the polony map at the second resolution using operations 2804-2806 or operations 2804’-2806.
  • the method may include an operation of generating the polony map based on the first plurality of flow cell images at the first resolution, and an operation of up-sampling to generate the polony map at the second resolution after operation 2810.
  • the first plurality of flow cell images may be provided instead of the second plurality of flow cell images in operation 2804 or operation 2804’ and then the operation 2806 may be replaced by an operation of determining the polony map based on the first plurality of flow cell images.
  • cycle N may be one of the reference cycle(s) for generating the polony map.
  • cycle N may be a cycle different from the reference cycle(s).
  • the polony map can be generated in the reference cycle(s) as a subsequent operation after the methods herein have improved the detectable polony density in flow cell images. Polonies from one or more channels within the reference cycle(s) can be included in the polony map in a reference coordinate system, while base calling of cycle N is yet to be performed.
  • cycle N is the current cycle.
  • N can be any non-zero integer.
  • N can be any integer from 1 to 150.
  • N can be any integer from 1 to 20, 1 to 200, 1 to 300, 1 to 500, or 1 to 1000.
  • the polony map disclosed herein can cover individual regions within a subtile, or an entire subtile. Each polony map can include a plurality of polonies therein. In some embodiments, the polony map can be about the same size as a flow cell image so that all the polonies, from different tiles and from multiple channels, can be registered to the same polony map. However, such a polony map may contain polonies that will not be used in at least some operations described herein, to reduce computational burden without sacrificing accuracy. In some embodiments, more than one polony map can be generated, each corresponding to at least part of a subtile of a flow cell image from a channel. The multiple polony maps may be tiled together to cover the entire sample region of the flow cell device.
  • the polony map disclosed herein can include polonies that are within individual cells or tissue, or on the membrane thereof. In some embodiments, the polony map disclosed herein can exclude polonies or signal spots that are outside cell boundaries. In some embodiments, the polony map disclosed herein can exclude duplicate polonies; such duplication may occur at different z locations, with one or more copies in focus and/or out of focus in the flow cell images. The duplicate polonies may be within the same flow cell image or in different flow cell images.
  • the polony map herein can be initialized as a virtual image that has a black or dark background with no signals from polonies.
  • the polony map can be initialized to be zero or include otherwise minimal image intensity at all pixels.
  • the intensity of the polony can be added to the polony map at the location determined by the coordinates and with the size and shape determined based on registration.
  • the polony map can be a virtual image that combines image intensity from polonies obtained from 2, 3, 4, or even more channels at the reference cycle.
  • the pixels of the template that contain no polonies remain black or dark, so that the polony map has a cleaner background without the noise that appears in actual flow cell images.
  • the polony map includes a list of entries, each entry corresponding to information for identifying a corresponding polony.
  • each entry can include spatial coordinates of the corresponding polony center in the reference coordinate system, and image intensity of the polony.
  • the entry may also include a unique identification number of the polony.
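The polony map described above, a list of entries carrying an identification number, center coordinates, and intensity that can be rendered into a virtual image initialized to zero, might be represented as follows. The class and function names are hypothetical, and the fixed square footprint is a simplification; the text describes the size and shape as determined from registration.

```python
from dataclasses import dataclass

@dataclass
class PolonyEntry:
    polony_id: int    # unique identification number of the polony
    x: float          # center coordinates in the reference coordinate system
    y: float
    z: float
    intensity: float  # image intensity of the polony

def render_polony_map(entries, width, height, z_level, radius=1):
    """Rasterize entries lying at z_level into a virtual image that starts
    dark (all zeros); each polony occupies a (2*radius+1)^2 square here."""
    image = [[0.0] * width for _ in range(height)]
    for e in entries:
        if round(e.z) != z_level:
            continue  # polony belongs to another z level
        cx, cy = round(e.x), round(e.y)
        for yy in range(cy - radius, cy + radius + 1):
            for xx in range(cx - radius, cx + radius + 1):
                if 0 <= yy < height and 0 <= xx < width:
                    image[yy][xx] += e.intensity
    return image
```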
  • the polonies can be from a subtile of flow cell images within a reference cycle, and more specifically, from one or more selected regions of the subtile.
  • the flow cell images can be from different channels of 1, 2, 3, 4, or more channels of the system 100.
  • a reference cycle can be any cycle of the first 5 or 6 cycles.
  • the reference cycle can be any cycle that is greater than 0.
  • the reference cycle is the first cycle.
  • the processing steps herein comprise performing image processing step(s) to adjust image intensities of polonies.
  • the image processing steps comprise one or more of the following: background subtraction; image sharpening; intensity offset adjustment; color correction; intensity normalization; phasing and prephasing correction; image registration; quality score estimation; or the like.
  • the image registration is configured to align images from different cycles and/or different channels, for example, with respect to a template image (i.e., a polony map) or a reference coordinate system.
  • the image registration herein is configured to register polonies or clusters from different cycles and different channels, to a template image or a reference coordinate system.
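Registration is not specified in detail here. As one simple stand-in, the sketch below estimates an integer translation between two sparse spot images by exhaustively maximizing the sum of pixel products over the overlapping area. This brute-force correlation search is a swapped-in technique for illustration, not the registration method of the disclosure; practical pipelines typically also handle sub-pixel shifts, rotation, and scaling. The function name and the `max_shift` parameter are hypothetical.

```python
def estimate_shift(reference, moving, max_shift=3):
    """Integer (dy, dx) such that moving[y + dy][x + dx] best matches
    reference[y][x], found by exhaustive search over +/- max_shift."""
    h, w = len(reference), len(reference[0])
    best, best_score = (0, 0), float("-inf")
    for dy in range(-max_shift, max_shift + 1):
        for dx in range(-max_shift, max_shift + 1):
            score = 0.0
            for y in range(h):
                for x in range(w):
                    ys, xs = y + dy, x + dx
                    if 0 <= ys < h and 0 <= xs < w:
                        score += reference[y][x] * moving[ys][xs]
            if score > best_score:  # strict '>' keeps the first maximum
                best, best_score = (dy, dx), score
    return best
```

Because the score is an unnormalized sum, it works best for sparse, spot-like images such as polony signals; dense images would call for a normalized correlation.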
  • the method 2800 can comprise an operation 2812 of: (iii) predicting, by the first reconfigurable device or the integrated circuit, one or more base calls corresponding to one or more polonies of the second plurality of flow cell images using the neural network; or predicting, by the first reconfigurable device or the integrated circuit, one or more classifications corresponding to one or more pixels of the second plurality of flow cell images using the neural network.
  • the operation 2812 of performing base calling may be based on the second plurality of flow cell images.
  • the operation 2812 may be further based on the determined polony map in operation 2804 or 2804’.
  • the second plurality of flow cell images may be from one or more color channels, one or more z levels, and/or one or more cycles.
  • the prediction of base calls in operation 2812 can be performed using intensity of the polonies.
  • the second plurality of flow cell images may be from a single color channel, a single z level, and/or a single cycle.
  • the prediction of base calling can be performed using intensity of the polonies from a single color channel and one or more cycles. For example, flow cell images acquired from each color channel of the multiple color channels in multiple cycles may use a different pre-trained neural network for predicting the polony intensity of the corresponding channel.
  • the prediction in operation 2812 of base calling can then be performed using intensities of the polony from different color channels.
  • flow cell images from a single z level may require a pretrained neural network that differs from the one used for predicting base calls from a different z level in operation 2812.
  • prediction of base calling in operation 2812 can be performed using intensity of the polonies from different color channels, multiple z levels, and multiple cycles.
  • prediction of base calling in operation 2812 can be performed using intensity of the polonies from different color channels, a single z level, and multiple cycles.
  • prediction of base calling in operation 2812 can be performed using intensity of the polonies from a single color channel, one or more z levels, and one or more cycles.
  • prediction of base calling in operation 2812 can be performed using intensity of the polonies from one or more color channels, one or more z levels, and one or more cycles.
  • the operation 2812 (iii) may include generating outputs that includes base calls, e.g., A, T, C, G, and/or U for one or more pixels of the second plurality of flow cell images.
  • the one or more pixels may be determined using a polony map or a location list of polonies disclosed herein, so that each pixel of the one or more pixels is comprised in at least one polony in the polony map.
  • the operation 2812 of (iii) may comprise generating outputs that includes classifications, e.g., A, T, C, G, U, and/or background for one or more pixels of the second plurality of flow cell images.
  • the one or more pixels may include pixels that are not included in the polony map or the location list disclosed herein. For example, the one or more pixels may include all pixels within the FOV of the second plurality of flow cell images.
  • the one or more pixels include at least one pixel that is not comprised in any polony of the polony map. In some embodiments, the one or more pixels include at least one pixel that lies in the background of the polonies and comprises noise signal(s). In some embodiments, the one or more pixels include at least one pixel that is not comprised in any polony in the polony map and at least one pixel that is comprised in at least one polony in the polony map. In some embodiments, the one or more pixels include at least one pixel that is neither within a cell membrane nor on the cell membrane.
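Per-pixel classification with a background class, as described above, amounts to choosing the highest-scoring class for every pixel, including pixels outside any polony. The sketch assumes the network outputs one score per class in the order A, C, G, T, background; that ordering, the dictionary layout, and the function names are illustrative assumptions.

```python
CLASSES = ("A", "C", "G", "T", "background")

def classify_pixels(scores):
    """scores: dict (row, col) -> sequence of 5 scores ordered as CLASSES.
    Returns dict (row, col) -> arg-max class label."""
    return {
        pix: CLASSES[max(range(len(CLASSES)), key=lambda i: s[i])]
        for pix, s in scores.items()
    }

def foreground_pixels(labels):
    """Pixels whose predicted class is a base rather than background."""
    return sorted(p for p, lab in labels.items() if lab != "background")
```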
  • FIGS. 3E-3F show a comparison of the accuracy of identifying transcripts (corresponding to polonies) using the neural network methods herein (“new algorithm”), e.g., 2800, and a traditional non-neural network based algorithm (“POR-YOLO”).
  • simulated flow cell images of an in situ sample with multiple cells are used. Each area may include a number of targets ranging from 0 to 4000. Such targets can be transcripts.
  • the neural network herein, e.g., in method 2800, and a classic non-neural network based algorithm are used to predict/detect transcripts in such cells. And the prediction/determination is then compared with ground truths (or equivalently, the reference polony map) for accuracy.
  • the correct number of targets per area is higher using the neural network and method disclosed herein than using the non-neural network based algorithm.
  • the number of detected targets per area using the neural network and methods herein is much higher (2x or 3x) than that detected by the non-neural network based algorithm when the target density exceeds 2000 per area.
  • FIG. 3F shows the false negatives per cell for both the neural network (“new algorithm”) and the non-neural network based algorithm (“POR-YOLO”).
  • the false negatives per area using the neural network and methods herein are much lower (10x or more) than those of the non-neural network based algorithm when the target density exceeds 1000 per area.
  • FIG. 3G shows a comparison of the accuracy of identifying transcripts using the methods herein, e.g., 2800, and a traditional non-neural network based algorithm.
  • simulated flow cell images of an in situ sample with multiple cells are used. Each cell may include a number of transcripts ranging from 0 to 6000.
  • the neural network herein, e.g., in method 2800, and a classic non-neural network based algorithm are used to predict/detect transcripts in such cells. And the prediction/determination is then compared with ground truths (or equivalently, the reference polony map) for accuracy.
  • the R² values show correlations of the prediction/determination with the references.
  • the neural network and method herein, e.g., method 2800, showed consistently higher correlation than the classic non-neural network based algorithm across all the different numbers of transcripts per cell, thereby indicating higher accuracy in identifying polonies or clusters (e.g., transcripts) in flow cell images of in situ samples.
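The R² values above are correlation statistics between predicted and reference counts. A minimal sketch of the squared Pearson correlation is given below; whether the figure reports the squared Pearson coefficient or a coefficient of determination against the identity line is not stated, so this choice, like the function name, is an assumption.

```python
def r_squared(predicted, reference):
    """Squared Pearson correlation between two equal-length sequences."""
    n = len(predicted)
    if n != len(reference) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mp = sum(predicted) / n
    mr = sum(reference) / n
    cov = sum((p - mp) * (r - mr) for p, r in zip(predicted, reference))
    vp = sum((p - mp) ** 2 for p in predicted)
    vr = sum((r - mr) ** 2 for r in reference)
    if vp == 0 or vr == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov * cov / (vp * vr)
```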
  • the method 500 or 2800 may include an operation of determining a biological analyte, including but not limited to a morphological feature, a transcript, an RNA, an mRNA, a protein, or their combinations, based on the base calling or classification of the polony in one or more sequencing cycles.
  • a base calling or classification sequence of ATTCGA for a polony over 6 consecutive sequencing cycles may indicate a cellular protein labeled by the unique barcode “ATTCGA.”
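The barcode example above, where six consecutive calls ATTCGA identify a labeled protein, can be sketched as a lookup into a codebook. The codebook contents, the analyte name `protein_X`, and treating any background call as an unreadable barcode are hypothetical choices for illustration.

```python
def read_barcode(per_cycle_calls):
    """Join one polony's calls across consecutive cycles into a barcode;
    any 'background' call makes the barcode unreadable (None)."""
    if any(c == "background" for c in per_cycle_calls):
        return None
    return "".join(per_cycle_calls)

def decode_analyte(per_cycle_calls, codebook):
    """Map a polony's per-cycle calls to an analyte name via a codebook."""
    barcode = read_barcode(per_cycle_calls)
    return codebook.get(barcode) if barcode else None

codebook = {"ATTCGA": "protein_X"}  # hypothetical codebook
print(decode_analyte(list("ATTCGA"), codebook))  # protein_X
```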
  • the method 2800 further includes an operation (iv) of, in response to determining that a first pixel of the one or more pixels has a predicted classification that is different from a background classification (e.g., the classifications may include A, T, C, G, U, or background), determining a first morphological feature, a first RNA or mRNA, or a first protein based on the one or more predicted classifications.
  • the method 2800 further includes an operation (v) of, in response to determining that a second pixel of the one or more pixels has a predicted classification that is different from the background classification (e.g., the classifications may include A, T, C, G, U, or background), determining a second morphological feature, a second RNA or mRNA, or a second protein based on the one or more predicted classifications.
  • the method 2800 further includes an operation (iv) of determining a first morphological feature, a first RNA or mRNA, or a first protein based on predicted base calls of a first pixel in one or more cycles. In some embodiments, the method 2800 further includes an operation (v) of determining a second morphological feature, a second RNA or mRNA, or a second protein based on predicted base calls of a second pixel in one or more cycles.
  • the method 2800 further includes an operation of determining a spatial relationship of the first pixel and the second pixel, which may include one or more of: visualizing the first and second pixels within a common coordinate system; calculating a spatial distance in 2D or 3D between the first and second pixels; and determining whether the first and second pixels are within the same polony.
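The spatial-relationship step above can be sketched as a Euclidean distance in physical units plus a check of whether two pixels map to the same polony. The voxel sizes, the pixel-to-polony lookup table, and the function names are assumptions for illustration.

```python
import math

def spatial_distance(p1, p2, voxel_size=(1.0, 1.0, 1.0)):
    """Euclidean distance between two 2D or 3D pixel coordinates, scaled by
    the physical pixel/voxel size along each axis (zip truncates for 2D)."""
    if len(p1) != len(p2):
        raise ValueError("coordinates must have the same dimensionality")
    return math.sqrt(sum(((a - b) * s) ** 2 for a, b, s in zip(p1, p2, voxel_size)))

def same_polony(p1, p2, polony_of):
    """polony_of: dict mapping pixel coordinate -> polony id (absent = no polony)."""
    a, b = polony_of.get(tuple(p1)), polony_of.get(tuple(p2))
    return a is not None and a == b
```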
  • the method 2800 further comprises: (iv) in response to determining that a first pixel of the one or more pixels has a predicted classification that is different from a background classification, determining at least a first target from: a first morphological feature; a first RNA or mRNA; and a first protein, based on the one or more predicted classifications; and (v) in response to determining that a second pixel of the one or more pixels has a predicted classification that is different from the background classification, determining at least a second target different from the first target from: the first morphological feature; the first RNA or mRNA; and the first protein, based on the one or more predicted classifications.
  • the second target is of a different type from the first target (e.g., a protein vs. a morphological feature), thereby advantageously enabling multi-omics analysis and research of the biological analyte(s) of interest using the methods herein.
  • the first target and the second target correspond to the biological analyte(s) of the sample.
  • the method 2800 further comprises: spatially aligning the location of the first and the second targets based on the one or more predicted classifications; and determining a biological analyte of the sample immobilized on the support based on the spatial alignment.
  • the method 2800 herein advantageously allows spatial alignment, or in other words, co-localization, of two or more different biological analytes using the neural network disclosed herein.
  • Such different biological analytes may be of different types.
  • a first biological analyte may be a morphological feature
  • a second biological analyte may be a protein or mRNA.
  • Such different biological analytes may be sequenced within a same sequencing run in same or different sequencing cycles.
  • Exemplary embodiments of staining and sequencing different target analytes within cells or tissue are disclosed in PCT Application No. PCT/US2025/10310, filed January 3, 2025, the contents of which are incorporated by reference in their entirety.
  • the number of different biological analytes may be limited by the availability of unique barcodes that may be used to differentiate the biological analyte from others.
  • the number of different biological analytes can be in a range from 2 to 100, 4 to 350, 10 to 500, 50 to 1000, or more.
  • protein A may be localized to be within the nucleus of a specific cell type, while protein B may be localized to be adjacent to a certain transcript within the mitochondria but not within the cytosol based on the prediction of intensities, base calling, and/or classification in one or more cycles using methods 500 or 2800.
  • Identification of such different biological analytes may advantageously provide more information, e.g., spatial relationships, which may facilitate biological, physiological, or pathological analysis of the sample(s) being sequenced.
  • the biological analytes herein may be any physical features of the sample(s) or source of sample(s).
  • the detection, localization, and spatial alignment of the biological analytes may correspond to various physiological, biological, pathological characteristics of cells or tissue which may advantageously provide information that may advance understanding of cellular function, regulation, and interactions which in turn may advance existing biomedical research, including but not limited to, more effective disease modeling and drug discovery efforts.
  • the method 2800 further comprises an operation of (iv) determining a location of one or more of a first morphological feature, a first RNA or mRNA, a first transcript, and a first protein based on the corresponding location of the one or more predicted base calls or predicted classifications. In some embodiments, the method 2800 further comprises an operation of (v) determining a location of one or more of: a second morphological feature, a second RNA or mRNA, a second transcript, and a second protein based on the corresponding location of one or more second predicted base calls or predicted classifications.
  • the method 2800 further comprises an operation of (vi) spatially aligning the location of one or more of: a second morphological feature, a second RNA or mRNA, and a second protein with the location of one or more of: the first morphological feature, the first RNA or mRNA, and the first protein; and an operation of (vii) determining a biological character of the sample immobilized on the support based on the spatial alignment.
  • the method 2800 may include an operation of saving the base calls obtained in operation 2812 in a predetermined format, e.g., in a FASTQ file compatible with subsequent operations, so that subsequent analysis such as adaptor trimming and secondary analysis can be performed.
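One simple way to picture the spatial alignment in operations (iv) through (vi) is a proximity test between the locations of two analytes. The sketch below is an assumption, not the disclosed method: it treats two analytes as co-localized when their locations fall within a user-chosen radius, and the function name `colocalized_pairs` and the radius parameter are hypothetical.

```python
import math


def colocalized_pairs(first_locations, second_locations, radius):
    """Return index pairs (i, j) whose locations lie within `radius` of each other.

    first_locations / second_locations: sequences of (x, y) or (x, y, z)
    coordinates, e.g., derived from predicted base calls or classifications
    for a first and a second analyte (hypothetical inputs).
    """
    pairs = []
    for i, a in enumerate(first_locations):
        for j, b in enumerate(second_locations):
            # Euclidean distance in pixel (or micron) units
            if math.dist(a, b) <= radius:
                pairs.append((i, j))
    return pairs
```

For example, `colocalized_pairs([(10.0, 10.0), (50.0, 50.0)], [(11.0, 10.5), (80.0, 80.0)], 2.0)` pairs only the first location of each analyte. A brute-force double loop is used for clarity; a spatial index (e.g., a k-d tree) would be the usual choice for millions of polonies.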
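A FASTQ record consists of four lines: an `@`-prefixed identifier, the sequence, a `+` separator, and one quality character per base. A minimal sketch of formatting one polony's base calls this way, assuming per-base Phred quality scores are available from some upstream step (this section does not say how they are obtained) and using a hypothetical read identifier:

```python
def to_fastq(read_id, bases, qualities):
    """Format one read as a four-line FASTQ record.

    bases: string of base calls, one per sequencing cycle (e.g., "ACGT").
    qualities: integer Phred scores, one per base, encoded with the
    common Phred+33 offset (an assumption about the target pipeline).
    """
    if len(bases) != len(qualities):
        raise ValueError("need one quality score per base call")
    quality_string = "".join(chr(q + 33) for q in qualities)
    return f"@{read_id}\n{bases}\n+\n{quality_string}\n"
```

Writing one such record per polony to a file yields input that standard adaptor-trimming and alignment tools accept.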
  • the method 2800 may include an operation 2812 of (iii) performing, by the processor, a corresponding base calling for each of the determined polonies.
  • the operation 2812 comprises extracting a plurality of patches from the second plurality of flow cell images based on the polony map.
  • the polony map may be generated using various algorithms, for example, from operation 2804 or 2804’.
  • the operation 2812 further comprises providing input to the neural network, the input comprising the plurality of patches, wherein each patch comprises one or more patch images from the multiple color channels, and wherein each patch comprises at least a portion of the second plurality of flow cell images; and predicting a plurality of base calls using the neural network and based on the input, wherein each base call corresponds to a corresponding patch.
  • each corresponding patch comprises a polony located at or in close vicinity to a center of the corresponding patch.
  • the polony may be no more than 1 to 10 pixels away from the center of the corresponding patch.
  • each patch comprises 3 to 128 pixels along a spatial dimension, e.g., along x or y direction.
  • the patches are kept relatively small compared to the flow cell images, e.g., smaller than the flow cell image by a factor of 10x, 20x, 50x, 100x, 500x, 1000x, or more.
  • the plurality of patches comprises 100 to 10^8 patches.
  • each patch may contain more than one, two, three, five, or ten polonies, but only the pixel(s) of the single polony at its center are used for generating the base call(s) corresponding to the patch.
  • a first patch may include pixels 1-32 in both x and y directions to cover a polony centered at pixels (16, 16) of the flow cell images
  • a second patch may include pixels 2-33 in both x and y directions to cover a second polony centered at pixels (17, 17.5)
  • a third patch may include pixels 5-36 in both x and y directions to cover a third polony centered at pixels (19, 19) of the flow cell images.
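The patch examples above can be sketched as cropping a fixed-size window around each polony listed in a polony map. This is a minimal sketch under stated assumptions: 0-based (x, y) pixel coordinates, sub-pixel centers rounded to the nearest pixel, zero padding at image borders, and a 32-pixel patch to match the examples; `extract_patches` is a hypothetical name.

```python
import numpy as np


def extract_patches(image, centers, size=32):
    """Cut a size x size patch around each polony center from a 2D image.

    image: 2D array (rows = y, cols = x) for one channel and cycle.
    centers: iterable of (x, y) polony coordinates from a polony map;
    sub-pixel coordinates are rounded to the nearest pixel.
    Borders are zero-padded so every patch has the same shape, and the
    polony lands at index (size // 2, size // 2) of its patch.
    """
    half = size // 2
    padded = np.pad(image, half, mode="constant")
    patches = []
    for x, y in centers:
        cx, cy = int(round(x)), int(round(y))
        # original pixel (cy, cx) sits at (cy + half, cx + half) in `padded`,
        # so this slice spans original rows/cols c - half .. c - half + size - 1
        patches.append(padded[cy:cy + size, cx:cx + size])
    return np.stack(patches)
```

Note that Python's `round` uses round-half-to-even, so a sub-pixel coordinate such as 17.5 is assigned to pixel 18.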
  • a very limited number of polonies in each patch may be used instead of using only the single polony for generating reference base calls.
  • the very limited number of polonies can be in a range from 1 to 4, 1 to 8, 1 to 20, 1 to 50, or 1 to 100.
  • the very limited number of polonies can be 100x, 1000x, 10^4x, 10^5x, 10^6x, 10^7x, or 10^8x smaller than the total number of polonies in a corresponding flow cell image.
  • the number of pixels within each patch can be optimized to balance the computational complexity and spatial context information to be included for training the neural network(s).
  • the number of patch images within each patch can be optimized to balance the computational complexity and the spatial context information within each patch for accurate and reliable prediction using the neural network.
  • the number of pixels within each patch can be at least partly based on polony density of the sample being imaged.
  • each patch may include multiple pixels, but prediction may only be performed for a single polony at or near the center of the patch. Similarly, in training the neural network for predicting the base call, e.g., using methods 2900, the reference base calls correspond only to the single polony at or near the center of the patch.
  • a very limited number of polonies in each patch may be used for training the neural network(s) or making predictions.
  • the very limited number of polonies can be in a range from 1 to 4, 1 to 8, 1 to 20, 1 to 50, or 1 to 100.
  • the very limited number of polonies can be 100x, 1000x, 10^4x, 10^5x, 10^6x, 10^7x, or 10^8x smaller than the total number of polonies in a corresponding flow cell image.
  • each patch may comprise multiple patch images corresponding to different color channels.
  • each patch may comprise a patch image in each of three different color channels, each covering the same pixels within the x-y plane. The same pixels may be determined after registration to correct for the spatial offset across different color channels.
  • each patch may comprise multiple patch images corresponding to different cycles, e.g., continuous cycles n-1, n, n+1, within a sequencing run.
  • each patch may comprise 3 images, each from a different color channel in 4 adjacent cycles, so that each patch may comprise 12 patch images in total.
  • each patch may further include 5 different z levels, for a total of 60 patch images.
  • At least two patches of the plurality of patches comprise at least partially overlapping patch images that share some identical pixels.
  • each patch of the plurality of patches comprises pixels that at least partially overlap with another patch of the plurality of patches.
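The patch-image count above (3 channels x 4 cycles = 12 patch images, x 5 z levels = 60) can be made concrete as an array layout. The (channel, cycle, z, height, width) axis order and both function names are assumptions for illustration; the section does not specify how the patch images are ordered or fed to the network.

```python
import numpy as np


def stack_patch_images(patch_images):
    """Stack per-(channel, cycle, z) patch images into one array.

    patch_images: nested list indexed [channel][cycle][z]; each entry is a
    2D array of identical shape, already registered so pixels align.
    Returns an array of shape (channels, cycles, z_levels, H, W).
    """
    return np.asarray(patch_images)


def flatten_for_2d_network(stack):
    """Merge the channel, cycle and z axes into a single image axis."""
    c, n, z, h, w = stack.shape
    return stack.reshape(c * n * z, h, w)
```

A 3D network could instead keep z as a spatial axis and fold only channels and cycles together; which layout suits a given network is a design choice the text leaves open.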
  • the first plurality of flow cell images are acquired only from a single color channel so that flow cell images acquired from different color channels may require different neural networks for predicting high resolution intensities, base calls, classifications, etc., as disclosed herein.
  • the first plurality of flow cell images are acquired only from a single z level, so that flow cell images acquired at different z levels of 3D sample(s), e.g., in situ cells, may require different neural networks for predicting high resolution intensities, base calls, classifications, etc., as disclosed herein.
  • the first plurality of flow cell images are acquired from the one or more cycles.
  • the one or more cycles comprises a plurality of cycles in a sequencing run.
  • the one or more cycles comprises a current cycle N, and the first plurality of flow cell images are acquired from at least one cycle prior to the current cycle N.
  • the current cycle N is the cycle of the sequencing run in which sequencing is currently being performed.
  • the flow cell images may have been acquired in the current cycle N, but no flow cell images have been acquired in the next cycle N+1.
  • the operation 2802 (ii) of providing, by the processor or the first reconfigurable logic device, the second plurality of flow cell images as the input to the neural network comprises providing the second plurality of flow cell images as the input without also providing a polony map or the locations of polonies in the second plurality of flow cell images as input to the neural network.
  • the operation (ii) of method 2800 does not require the input of a polony map, a location list of polonies, or the like to be provided as input to the neural network in order to predict the base calls.
  • the spatial locations of the polonies within the flow cell images are not used in predicting the base calls using the neural network.
  • each patch may contain relative spatial information of the polony with respect to the rest of the pixels in the same patch(es), which may be used for predicting the base calls using the neural network.
  • the method 2800 may predict base calling, e.g., in operation 2812, without using the input of a polony map, a location list of polonies, or the like. Instead, the polony map, the location list of polonies, or the like may be used to extract the plurality of patches from the second plurality of flow cell images.
  • the operation of predicting the plurality of base calls using the neural network and based on the input, wherein each base call corresponds to a corresponding patch, comprises: predicting a probability map for each channel of the multiple color channels corresponding to the corresponding patch; and determining the base call of the corresponding patch based on the probability maps. For example, for flow cell images from 4 different color channels, 4 different probability maps may be generated. Each probability map may have the same size and dimensions as the flow cell images or cover at least a portion of the flow cell images. Each pixel in a probability map may have a probability value corresponding to that channel.
  • pixel (12, 12) may have probability values of 0.2, 0.01, 0.2, and 0.59 in 4 different channels representing nucleotides A, T, C, and G, and the base call of pixel (12, 12) may be determined from the largest probability among the color channels, 0.59, which corresponds to nucleotide G.
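The per-pixel decision in the example above is an argmax over channels. A minimal sketch, assuming the four probability maps are stacked in a (channel, y, x) array in the A, T, C, G order used in the example; `call_base` is a hypothetical name.

```python
import numpy as np

BASES = ("A", "T", "C", "G")  # channel order taken from the example above


def call_base(prob_maps, x, y):
    """Return (base, probability) for the largest channel at pixel (x, y).

    prob_maps: array of shape (4, H, W), one probability map per channel.
    Ties resolve to the earliest channel, as with np.argmax.
    """
    probs = prob_maps[:, y, x]
    return BASES[int(np.argmax(probs))], float(probs.max())
```

For pixel (12, 12) with values 0.2, 0.01, 0.2, and 0.59, this returns ("G", 0.59).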
  • the neural network may be trained to predict probability maps.
  • training of the neural network to predict probability maps can be based on reference polony maps or any equivalent information indicative of polony locations, e.g., a location list of polonies.
  • the neural network to predict probability maps can be trained by comparing each probability map to a corresponding reference polony map.
  • the neural network may be trained to minimize a loss function based on the comparison of the probability map and the corresponding reference polony map.
  • a probability map may be initialized to have random values in each pixel, and the neural network may be trained to produce higher values for pixel(s) corresponding to polonies than for pixels corresponding to non-polony structure(s) in the probability map.
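The section leaves the loss function open. One common choice for comparing a predicted map against a binary reference polony map is per-pixel binary cross-entropy; the sketch below uses it purely as an assumed example. It is lower when polony pixels receive higher values than non-polony pixels, matching the training goal described above. `polony_map_loss` is a hypothetical name.

```python
import numpy as np


def polony_map_loss(prob_map, reference_map, eps=1e-7):
    """Mean binary cross-entropy between a predicted map and a reference map.

    prob_map: predicted per-pixel values in [0, 1].
    reference_map: 1 where the reference polony map marks a polony, else 0.
    eps: clipping bound that keeps log() finite.
    """
    p = np.clip(prob_map, eps, 1 - eps)
    r = reference_map.astype(float)
    return float(-np.mean(r * np.log(p) + (1 - r) * np.log(1 - p)))
```

In an actual training loop this value would be minimized with respect to the network weights by gradient descent; a framework such as PyTorch provides an equivalent built-in loss.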
  • the sum of values for each pixel in all probability maps of different color channels may add up to a fixed number, e.g., 1, 10, 100, etc.
  • pixel (24, 25) in 3 probability maps corresponding to 3 different color channels may be 0.24, 0.51, and 0.25, which adds up to 1.
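Per-pixel values that add up to a fixed number across channels can be obtained with a softmax over the channel axis, optionally scaled. The section does not name the normalization, so softmax is an assumption here, and `normalize_channels` is a hypothetical name.

```python
import numpy as np


def normalize_channels(logits, total=1.0):
    """Softmax across the channel axis, scaled so each pixel sums to `total`.

    logits: array of shape (channels, H, W) of raw network outputs.
    """
    # subtracting the per-pixel max avoids overflow in exp without
    # changing the softmax result
    shifted = logits - logits.max(axis=0, keepdims=True)
    exp = np.exp(shifted)
    return total * exp / exp.sum(axis=0, keepdims=True)
```

With `total=1.0`, a pixel whose three channel values are 0.24, 0.51, and 0.25 sums to 1 as in the example; `total=100` would express the same values as percentages.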
  • each base call corresponds to a corresponding patch which includes one or more patch images.
  • the operation of predicting the plurality of base calls using the neural network and based on the input comprises: generating a first single intensity for a first channel of the multiple color channels corresponding to the corresponding patch; and determining the base call of the corresponding patch based on the single intensity.
  • a first single intensity of a first color channel may be determined using prediction by the neural network disclosed herein.
  • the first single intensity may or may not be normalized.
  • the first single intensity may correspond to the single polony of the corresponding patch containing one or multiple patch images of the same polony at adjacent cycles of a sequencing run.
  • the first single intensity may correspond to one of the adjacent cycles, e.g., a current cycle.
  • a base call may be determined based on the first single intensity of the current cycle, e.g., by comparing the first single intensity with other intensities of the same polony from other color channels.
  • the other intensities may be predicted similarly using the same or different neural networks.
  • the method further comprises an operation of predicting a second single intensity for a second channel of the multiple color channels corresponding to the corresponding patch using a second neural network; and determining the base call of the corresponding patch based on at least the first single intensity and the second single intensity.
  • the method further comprises an operation of predicting a second single intensity for a second channel of the multiple color channels corresponding to the corresponding patch using a second neural network or the same first neural network; and an operation of predicting a third single intensity for a third channel of the multiple color channels corresponding to the corresponding patch using a third neural network or the same first neural network; and determining the base call of the corresponding patch based on at least the first, second, and third single intensities.
  • the first, second, and third intensities may be predicted using different neural networks (e.g., each of the neural networks may be trained using different training data but with identical neural network layers and numbers of parameters) to be 50, 690, 80 for the same polony.
  • the base call of the polony may correspond to the nucleotide that lights up in the second color channel with an intensity of 690 but not the first or third color channel.
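The intensity-based call above amounts to picking the brightest channel for a polony. A minimal sketch, assuming the per-channel intensities are already predicted; the optional per-channel `scale` factors are a hypothetical normalization step, and mapping the winning channel to a nucleotide depends on the dye chemistry, which this section does not specify.

```python
def brightest_channel(intensities, scale=None):
    """Return the 0-based index of the channel with the highest intensity.

    intensities: per-channel intensities predicted for one polony, e.g., by
    separate neural networks. scale: optional per-channel normalization
    factors (hypothetical); each intensity is divided by its factor first.
    """
    if scale is None:
        scale = [1.0] * len(intensities)
    normalized = [value / factor for value, factor in zip(intensities, scale)]
    return max(range(len(normalized)), key=normalized.__getitem__)
```

For the intensities 50, 690, and 80, this returns index 1, i.e., the second color channel.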
  • the operation (iii) of predicting, by the first reconfigurable device or the integrated circuit, one or more base calls corresponding to one or more polonies of the second plurality of flow cell images using the neural network comprises: determining two or more pixels of the second plurality of flow cell images as duplications of a single polony; and selecting one pixel of the two or more pixels as a center of the single polony.
  • the two or more pixels may be at a same z level. In some embodiments, the two or more pixels may be at different z levels.
  • Exemplary embodiments of the operation of determining two or more pixels of the second plurality of flow cell images as duplications of a single polony and selecting one pixel of the two or more pixels as a center of the single polony are disclosed in PCT Application No. PCT/US23/76125, which is incorporated herein by reference in its entirety.
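The referenced application describes the actual procedure. Purely as an illustration, one simple rule is a greedy suppression: keep the brightest candidate pixel, and treat any pixel closer than a minimum distance to an already-kept pixel as a duplicate of the same polony. The rule, the distance threshold, and `merge_duplicate_pixels` are all assumptions.

```python
import math


def merge_duplicate_pixels(candidates, min_distance):
    """Greedily keep one center pixel per polony.

    candidates: list of (coords, intensity), coords as (x, y) or (x, y, z),
    so duplicates may lie at the same or at different z levels.
    A pixel closer than `min_distance` to an already-kept, brighter pixel is
    treated as a duplication of that polony and dropped.
    """
    kept = []
    for coords, intensity in sorted(candidates, key=lambda c: c[1], reverse=True):
        if all(math.dist(coords, k) >= min_distance for k, _ in kept):
            kept.append((coords, intensity))
    return kept
```

If z spacing differs from the x-y pixel pitch, coordinates would first need to be scaled to common units before computing distances.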
  • the methods 500 and 2800 herein may be performed using artificial intelligence-based models other than neural networks.
  • the methods 600, 700 and 2900 may be used to train artificial intelligence-based models other than neural networks for making predictions or inferences using methods 500 or 2800.
  • Some non-limiting examples of the artificial intelligence-based models include: random forest, decision tree, k-means clustering, and gradient boosted tree.
  • the artificial intelligence-based models may be used to predict intensities, classifications, or base calls by working on intensities from flow cell images and/or the high resolution flow cell images.
  • the artificial intelligence-based models other than neural networks may predict intensities, classifications, or base calls using information only including intensities, and such information may lack spatial context of the intensities, shapes of the polonies, background noise, signal from other cellular structures, etc.
  • the neural networks herein predict intensities, classifications, or base calls by advantageously using the flow cell images or high resolution flow cell images which not only include the intensities but also other information including but not limited to background noise, polony sizes and shapes, spatial relationship among polonies, etc. for more accurate predictions or inferences.
  • the neural network herein is a convolutional neural network (CNN).
  • the neural network is a 3D CNN.
  • the neural network is a 2D CNN.
  • the neural network comprises one or more convolutional layers.
  • the neural network is a recurrent neural network (RNN).
  • the neural network is a 3D RNN.
  • the neural network is a 2D RNN.
  • the neural network comprises one or more long short-term memory (LSTM) layers.
  • the neural network is a U-Net.
  • the neural network includes a residual network (ResNet).
  • the neural network can include a transformer based model like a vision transformer (ViT).
  • the neural network comprises a U-Net with a first predetermined repetition of down-sampling and convolution operations and then a second predetermined repetition of up-sampling, concatenation, and convolution operations.
  • the first and second predetermined repetition can have an identical quantity, e.g., 3 or 4.
  • the neural network is a U-Net with a first predetermined number of filters in each repetition of down sampling, and then a second predetermined number of filters in each repetition of up sampling and/or concatenation.
  • the first predetermined number of filters can be 32, 64, 128, and 256 filters in three repetitions and the second predetermined number can be 128, 64, 64, and 32 filters in the corresponding three repetitions.
  • the first predetermined number of filters can be 32, 64, 128, and 256 filters in three repetitions and the second predetermined number can be 256, 128, 64, and 32 filters in the corresponding three repetitions.
  • the operation 2812 may comprise: performing, by the processor, a first convolution in one or more dimensions on the first plurality of flow cell images, thereby generating a first convolution result; repetitively performing, for one or more times, down-sampling operations comprising: (a) performing, by the processor, a second convolution in one or more dimensions on the first convolution result, thereby generating a second convolution result; and (b) performing, by the processor, a down sampling of the second convolution result by a down sampling factor thereby generating a first down-sampled result.
  • the second convolution may comprise a corresponding number of filters, thereby generating a third convolution result after the repetitions.
  • the operation 2812 may further comprise: performing, by the processor, the second convolution in one or more dimensions on the third convolution result, thereby generating a fourth convolution result; repetitively performing, for one or more times, up sampling operations comprising: (c) performing, by the processor, an up sampling of the fourth convolution result by an up sampling factor thereby generating a first up-sampled result; and (d) performing, by the processor, the second convolution in one or more dimensions on the first up-sampled result, thereby generating a fifth convolution result.
  • the second convolution may comprise a corresponding number of filters, thereby generating a sixth convolution result after the repetitions.
  • the first convolution comprises a 3D convolution with a convolution kernel.
  • the convolutional kernel may have 4 dimensions.
  • the convolutional kernel is m*m*m for the first three spatial dimensions and the size of its fourth dimension is determined by the filter number in the corresponding repetition.
  • m can be an integer in the range of 2 to 20.
  • the input can be 512x512 flow cell images, and the z-stack can have 12 slices.
  • the first convolution can include 32 filters and each filter has one kernel that is 3x3x3x1.
  • the output from that convolutional block is 512x512x12x32.
  • the second convolution may be a double convolutional block, i.e., two first convolutions, each with 32 filters.
  • the input to both of those blocks is 512x512x12x32 and the output is 512x512x12x32.
  • Each filter uses a kernel sized 3x3x3x32. The number of filters may correspond to features of the input.
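The shapes above follow from standard convolution arithmetic: with stride 1 and "same" padding (which the unchanged 512x512x12 size implies but the text does not state), the spatial size is preserved and the output depth equals the number of filters, while each filter spans every input channel. A small bookkeeping sketch, additionally assuming one bias term per filter; both function names are hypothetical.

```python
def conv_params(kernel_spatial, in_channels, filters, bias=True):
    """Number of trainable weights in one convolutional layer.

    kernel_spatial: spatial kernel size, e.g. (3, 3, 3) for 3D or (3, 3)
    for 2D. Each filter's full kernel is kernel_spatial + (in_channels,),
    e.g. 3x3x3x1 for the first convolution and 3x3x3x32 afterwards.
    """
    weights_per_filter = in_channels
    for k in kernel_spatial:
        weights_per_filter *= k
    return filters * weights_per_filter + (filters if bias else 0)


def same_padding_output(spatial_shape, filters):
    """With 'same' padding and stride 1, spatial size is kept; depth = filters."""
    return tuple(spatial_shape) + (filters,)
```

For the 3D example, the first convolution has 32 x 27 + 32 = 896 weights and produces 512x512x12x32, and each 3x3x3x32 layer of the double block has 27 x 32 x 32 + 32 = 27,680 weights.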
  • the second convolution comprises two 3D convolutional layers, e.g., as shown in the pseudo code.
  • the second convolution comprises two repetitions, or blocks, of the first convolution in 3D, with the input and the number of filters changing between them, because the convolution process increases the depth of the image.
  • the depth of image may increase as the number of features or filters increases.
  • the first and second resolutions are in 2D or 3D.
  • the first convolution comprises a 2D convolution with a convolution kernel.
  • the convolutional kernel may have 3 dimensions.
  • the convolutional kernel is m x m for the first two spatial dimensions and the size of its third dimension is determined by the filter number in the corresponding repetition.
  • m can be an integer in the range of 2 to 20.
  • the input can be flow cell images with a size of 512x512x1.
  • the first convolution can include 64 filters and each filter has one kernel that is 3x3x1.
  • the output from that convolutional block is 512x512x64.
  • the second convolution may be a double convolutional block, i.e., two first convolutions, each with 32 filters.
  • the input to both of those blocks is 512x512x64 and the output is 512x512x32.
  • Each filter can use a kernel sized 3x3x32.
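To show concretely why the output depth equals the number of filters (512x512x1 to 512x512x64 to 512x512x32 above), here is a small NumPy sketch of a stride-1, zero-padded ("same") 2D convolution, computed as cross-correlation as in most deep-learning frameworks. It is a reference illustration, not the network's implementation, and it uses a 16x16 image in place of 512x512 to keep it fast.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def conv2d_same(image, kernels, bias=None):
    """Stride-1 2D convolution (cross-correlation) with zero 'same' padding.

    image: array (H, W, C_in); kernels: array (k, k, C_in, filters), k odd.
    Returns (H, W, filters): spatial size kept, depth equals the filter count.
    """
    k = kernels.shape[0]
    pad = k // 2
    padded = np.pad(image, ((pad, pad), (pad, pad), (0, 0)))
    # windows[h, w, c, i, j] == padded[h + i, w + j, c]; shape (H, W, C_in, k, k)
    windows = sliding_window_view(padded, (k, k), axis=(0, 1))
    # each output channel sums one filter over its window and all input channels
    out = np.einsum("hwcij,ijcf->hwf", windows, kernels)
    if bias is not None:
        out = out + bias
    return out
```

Chaining two calls mirrors the example: 64 filters of 3x3x1 turn a 16x16x1 image into 16x16x64, and a following layer with 32 filters of 3x3x64 yields 16x16x32.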
  • the second convolution comprises at least two convolutional layers or exactly two convolutional layers, e.g., as shown in the pseudo code.
  • the second convolution comprises two repetitions, or blocks, of the first convolution, with the input and the number of filters changing between them, because the convolution process increases the depth of the image.
  • the depth of image may increase as the number of features or filters increases.
  • the first and second resolutions are in 2D or 3D.
  • the second convolution in operation (a) comprises a corresponding number of n, 2*n, 4*n, and 8*n filters in a first, second, third, and fourth repetition, respectively.
  • the second convolution in operation (c) comprises a corresponding number of 2*n, 2*n, 4*n, 8*n filters in a last repetition, last minus one, last minus two, and last minus three repetition, respectively.
  • n can be an integer in the range from 8 to 256.
  • operation (a) comprises 32, 64, 128, and 256 filters in three repetitions
  • operation (c) comprises 128, 64, 64, and 32 filters in the corresponding three repetitions.
  • the second convolution in operation (c) comprises a corresponding number of n, 2*n, 4*n, 8*n filters in a last repetition, last minus one, last minus two, and last minus three repetition, respectively.
  • operation (a) comprises 32, 64, 128, and 256 filters in four repetitions
  • operation (c) comprises 256, 128, 64, and 32 filters in the corresponding four repetitions.
  • the second convolution in operation (c) comprises a corresponding number of n, 2*n, and 4*n filters in a last, last minus one, and last minus two repetition, respectively.
  • operation (a) comprises 32, 64, 128 filters in three repetitions and operation (c) comprises 128, 64, and 32 filters in the corresponding three repetitions.
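The symmetric filter schedules listed above (n = 32, with n, 2*n, 4*n, ... filters going down and the mirror image coming back up) can be generated programmatically, along with the feature-map size after each down-sampling. A sketch, assuming a down-sampling factor of 2 and only the symmetric variants; asymmetric variants such as 128, 64, 64, 32 are not covered, and both function names are hypothetical.

```python
def unet_filter_schedule(n, depth):
    """Filter counts per repetition for a symmetric U-Net.

    Down-sampling repetitions use n, 2n, 4n, ...; the up-sampling path
    mirrors them so that its last repetition uses n filters.
    """
    down = [n * 2 ** i for i in range(depth)]
    up = list(reversed(down))
    return down, up


def feature_map_sizes(size, depth, factor=2):
    """Spatial size after each down-sampling repetition."""
    sizes = []
    for _ in range(depth):
        if size % factor:
            raise ValueError("size not divisible by the down-sampling factor")
        size //= factor
        sizes.append(size)
    return sizes
```

For example, `unet_filter_schedule(32, 4)` gives ([32, 64, 128, 256], [256, 128, 64, 32]), and a 512-pixel input is reduced to 256, 128, 64, and 32 pixels over four repetitions.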
  • the method 2800 may further comprise: performing, by the processor, the first convolution in one or more dimensions on the sixth convolution result, thereby generating a seventh convolution result; and predicting, by the processor, the second plurality of flow cell images based on the seventh convolution result.
  • Each of the second plurality of flow cell images may correspond to the corresponding flow cell image of the first plurality of flow cell images with a second resolution that is 2, 4, 6, 8, 10, 12, or 16 times greater than the first resolution in one or more spatial dimensions.
  • the second resolution is at least 4, 6, or 8 times greater than the first resolution in all three dimensions.
  • the first plurality of flow cell images are from a single color channel. In some embodiments, the first plurality of flow cell images are from one or more color channels. In some embodiments, the first plurality of flow cell images are of unbalanced nucleotide diversity in one or more sequencing cycles.
  • the cellular sample comprises overloaded concatemer molecules with a spatial density in a range of 10^2 to 10^15 per mm^2. In some embodiments, the cellular sample comprises overloaded concatemer molecules with a spatial density in a range of 10^3 to 10^10 per mm^2.
  • the first resolution is in a range of 0.1 um to 5 um. In some embodiments, the first resolution is in a range of 0.01 um to 10 um. In some embodiments, the second resolution is in a range of 0.02 um to 2 um. In some embodiments, the second resolution is in a range of 0.001 um to 3 um. In some embodiments, the down-sampling factor is 2, 4, 6, 8, 16, or more. In some embodiments, the up-sampling factor is 2, 4, 6, 8, 16, or more.
  • one or more operations are performed while a sequencing run is being performed. In some embodiments, one or more operations are performed in parallel with the corresponding sequencing run to reduce sequencing analysis time.
  • the sequencing analysis time includes a total time required from when the raw flow cell images are acquired in each cycle of a sequencing run to when the base calls for each cycle of the sequencing run are generated. In some embodiments, the sequencing analysis time includes a total time required from when a sequencing run starts to when the base calls for each cycle of the sequencing run are generated.
  • the sequencing analysis time includes a first time duration to complete a sequencing run and a second time duration to generate base calls for the sequencing run.
  • the first and second time durations may overlap at least partly with each other (e.g., performing base calling while the sequencing run is still in progress) to reduce the sequencing analysis time.
  • the one or more cycles comprises a current cycle N.
  • N may be in a range from 1 to 1000.
  • one or more of the operations are performed while the sequencing reactions in cycles subsequent to the current cycle N are yet to be performed or are currently being performed.
  • the training data set of training flow cell images comprises z-stacks of training flow cell images taken at different z-locations.
  • Each z-stack may represent an individual FOV of a 3D sample(s), e.g., an in situ cellular sample.
  • the z-axis is orthogonal to image planes of the flow cell images.
  • the training data set of training flow cell images comprises flow cell images from multiple sequencing cycles.
  • One or more sequencing cycles may be of unbalanced nucleotide diversity, so that images appear dimmer or contain fewer polonies than images from sequencing cycles of high nucleotide diversity.
  • the number of polonies in the training flow cell images in a particular cycle may vary from 1% to 99% of a total number of polonies within a FOV of that cycle.
  • if the number of polonies in the training flow cell image of a particular cycle is from 1% to 5%, or from 1% to 10%, of the total number of polonies within that cycle, the cycle is of low or unbalanced diversity.
  • if the number of polonies in the training flow cell image of a particular cycle is greater than 10% or 15% of the total number of polonies within that cycle, the cycle is of high or balanced diversity.
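The diversity rule above is a threshold on the fraction of polonies visible in a cycle's image. A minimal sketch, assuming a single cutoff (10% by default, following the 1%-10% example; 0.05 would match the 1%-5% variant) and ignoring the band between 10% and 15% that the two variants leave open; `classify_diversity` is a hypothetical name.

```python
def classify_diversity(polonies_in_image, total_polonies, low_max=0.10):
    """Label a cycle's diversity from the fraction of polonies visible.

    low_max: upper bound of the low/unbalanced range as a fraction
    (assumed cutoff; the text gives 5% and 10% variants).
    """
    fraction = polonies_in_image / total_polonies
    return "low/unbalanced" if fraction <= low_max else "high/balanced"
```

Such labels could be used to check that a training set contains both unbalanced and balanced cycles, as described in the next embodiment.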
  • the training data set of training flow cell images comprises flow cell images from multiple samples and multiple sequencing cycles, and the training flow cell images include a subset of flow cell images with unbalanced diversity in multiple sequencing cycles and another subset of flow cell images with balanced diversity in multiple sequencing cycles.
  • the operation of performing, by the processor, the first convolution in one or more dimensions on the first plurality of flow cell images comprises: performing, by the processor, a first convolution in 3D on the first plurality of flow cell images, thereby generating a first convolution result.
  • operation (a) comprises: performing, by the processor, the second convolution in 3D on the first convolution result, thereby generating a second convolution result.
  • the operation of performing, by the processor, the first convolution in one or more dimensions on the first plurality of flow cell images comprises: performing, by the processor, a first convolution in 2D on the first plurality of flow cell images, thereby generating a first convolution result.
  • operation (a) comprises: performing, by the processor, the second convolution in 2D on the first convolution result, thereby generating a second convolution result.
  • repetitively performing, for one or more times, operations comprising (c) and (d) comprises: repetitively performing, for one or more times, operations comprising (c), (d), and (e), wherein (e) is performed after operation (c) and before operation (d), and wherein (e) comprises: concatenating, by the processor, the first up-sampled result in a current up-sampling repetition with the first down-sampled result in a previous down-sampling repetition, wherein the first up-sampled result has the same size as the first down-sampled result in the previous down-sampling repetition.
  • operation (e) is in each repetition.
  • repetitively performing, for one or more times, operations comprising (c) and (d) comprise: repetitively performing operations comprising (c), (d), and (e) in each repetition of one or more repetitions.
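Operation (e) is the U-Net skip connection: the up-sampled result is joined along the channel axis with the down-sampled result of matching spatial size. A sketch in NumPy, assuming (H, W, C) arrays and nearest-neighbour up-sampling by a factor of 2; the text does not specify the up-sampling method, so that choice, and both function names, are assumptions.

```python
import numpy as np


def upsample_nearest(x, factor=2):
    """Nearest-neighbour up-sampling of an (H, W, C) array."""
    return x.repeat(factor, axis=0).repeat(factor, axis=1)


def concat_skip(upsampled, skip):
    """Concatenate an up-sampled result with the matching down-sampled result.

    Both arrays are (H, W, C); spatial sizes must match, and the channel
    counts add up in the output.
    """
    if upsampled.shape[:-1] != skip.shape[:-1]:
        raise ValueError("spatial sizes differ; cannot concatenate")
    return np.concatenate([upsampled, skip], axis=-1)
```

For instance, up-sampling a 64x64x256 result gives 128x128x256, and concatenating it with a 128x128x128 down-sampled result gives 128x128x384, which the following convolution in operation (d) then reduces to its own filter count.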
  • the kernel may take any size that is smaller than the size of the flow cell image undergoing the convolution.
  • the kernel can be 2 by 2 by 2, 3 by 3 by 3, 4 by 4 by 4, 5 by 5 by 5, 6 by 6 by 6, 10 by 10 by 10 in the first three spatial dimensions.
  • the kernel size can be customized to remove at least some of the noise and unwanted signal that are larger than the kernel size.
  • the kernel can be circular.
  • the kernel can be in various other shapes.
  • the focus of the optical system spans a range extending along the z axis, e.g., 0.1 um, 0.2 um, 0.3 um, 0.5 um, 0.6 um, 0.8 um, 1 um, 2 um, 3 um, 4 um, 5 um, etc.
  • Polonies or clusters that are within the range of focus can appear in-focus or about in-focus in the flow cell image.
  • Flow cell images at a specific z level can also include signals from polonies or clusters that are not within the focus range of the image but are at different z levels; such polonies or clusters are out of focus. As shown in FIG. 3A, bigger and blurred signal spots represent out-of-focus polonies or clusters. Some of the out-of-focus polonies or clusters are circled in FIG. 3A.
  • Each flow cell image at a specific z level can also include noise caused by the optical system and/or undesired signal from the sample.
  • the undesired signal can be signal coming from components of the sample such as membrane, cytosol, and mitochondria.
  • Such background objects can be any objects that are relatively larger in size than the polonies or clusters. As shown in FIG. 3A, there is a blurry cellular contour (at the arrows) in the flow cell image, and most of the signal spots are contained within the blurry contour.
  • background objects can include any objects within the 3D sample that are not polonies or clusters.
  • the method 2800 includes an operation of registering the second plurality of flow cell images.
  • the images are registered across channels and/or across different cycles.
  • the flow cell images are registered before any base calling is performed in operation 2812, 2804, or 2804’.
  • the images are registered across channels and different cycles before generating or obtaining the 3D polony maps.
  • the flow cell images are registered across channels and different cycles before one or more primary analysis steps here.
  • the flow cell images can be registered after one or more preprocessing operations disclosed herein are performed.
  • Various image registration techniques can be used to register the flow cell images.
  • the flow cell images can be registered using 2D or 3D registration techniques.
  • the operation of registering the flow cell images is with respect to a reference coordinate system. In some embodiments, the operation of registering the flow cell images is with respect to one or more template images.
  • the operation of registering the images can comprise generating the one or more template images in a reference coordinate system. In some embodiments, the operation of registering the images can comprise registering polonies to template polonies in the one or more template images.
  • the operation of registering the images can comprise determining a plurality of transformations based on the one or more template images. Each of the plurality of transformations can correspond to a subtile of the flow cell images, the processed images, or the filtered images and can be configured to register that subtile to the one or more template images. Each transformation can be used to register a corresponding subtile or tile to the one or more template images.
  • the plurality of transformations can comprise one or more affine transformations.
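The per-subtile affine registration described above can be sketched as follows. This is a minimal illustration, assuming each subtile's transform has already been estimated as a 2x3 matrix `[A | t]` and that the subtiles form a regular grid; `register_by_subtile`, `apply_affine`, and the grid layout are hypothetical names, not the sequencing system's API.

```python
import numpy as np


def apply_affine(points, matrix):
    """Map (N, 2) points through a 2x3 affine matrix [A | t], i.e. p' = A @ p + t."""
    points = np.asarray(points, dtype=float)
    matrix = np.asarray(matrix, dtype=float)
    return points @ matrix[:, :2].T + matrix[:, 2]


def register_by_subtile(points, tile_shape, grid, transforms):
    """Register each (x, y) point with the affine transform of the subtile containing it.

    tile_shape: (height, width) of the tile in pixels.
    grid: (rows, cols) of equally sized subtiles (hypothetical layout).
    transforms: dict mapping (row, col) -> 2x3 affine matrix for that subtile.
    """
    points = np.asarray(points, dtype=float)
    height, width = tile_shape
    rows, cols = grid
    sub_h, sub_w = height / rows, width / cols
    registered = np.empty_like(points)
    for i, (x, y) in enumerate(points):
        # Locate the subtile; clamp so points on the far edge stay in range.
        r = min(int(y // sub_h), rows - 1)
        c = min(int(x // sub_w), cols - 1)
        registered[i] = apply_affine(points[i:i + 1], transforms[(r, c)])[0]
    return registered
```

Keeping one transform per subtile lets local distortions (e.g., slight optical warping toward tile edges) be corrected without forcing a single global transform on the whole tile.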
  • the operation of registering the images can comprise performing image registration of the polonies based on fiducial markers.
  • the fiducial markers can be located on the flow cell. Alternatively, the fiducial markers can be external to the flow cell.
  • the image registration herein is configured to align images from different cycles and/or different channels, for example, with respect to a template image or a reference coordinate system. In some embodiments, the image registration herein is configured to register polonies or clusters from different cycles and/or different channels, e.g., in the filtered image, to a template image or a reference coordinate system.
  • the base calling can be performed using the filtered images from different channels in cycle N after the filtered images from different channels are registered relative to the corresponding template image disclosed herein.
  • For a given polony, the location information can be obtained from the polony map, e.g., the 2D coordinates of the polony and its z level.
  • Based on this location information, the corresponding flow cell image and its pixel(s) can be determined. The image intensity of these pixels can then be extracted from the corresponding processed image (after one or more primary analysis steps) and used as the polony intensity for base calling.
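A minimal sketch of this lookup, assuming the processed images for one cycle are stored as a `(channels, z, height, width)` array, that each polony-map entry is `(x, y, z)` with sub-pixel x/y and an integer z level, and that the brightest channel is called using a hypothetical A/C/G/T channel order. Real chemistries and the neural-network base caller are not modeled here.

```python
import numpy as np

BASES = "ACGT"  # hypothetical channel-to-base assignment


def extract_polony_intensities(stack, polony_map):
    """Look up each polony's per-channel intensity at its own z level.

    stack: (n_channels, n_z, height, width) processed images for one cycle.
    polony_map: sequence of (x, y, z) entries.
    Returns an (n_polonies, n_channels) intensity array.
    """
    stack = np.asarray(stack, dtype=float)
    n_channels, n_z, height, width = stack.shape
    intensities = np.zeros((len(polony_map), n_channels))
    for i, (x, y, z) in enumerate(polony_map):
        if not 0 <= int(z) < n_z:
            raise ValueError(f"z level {z} outside the stack")
        # Nearest pixel in the image at the polony's z level, clamped to the frame.
        col = min(max(int(round(x)), 0), width - 1)
        row = min(max(int(round(y)), 0), height - 1)
        intensities[i] = stack[:, int(z), row, col]
    return intensities


def call_bases(intensities):
    """Call the brightest channel for each polony (illustrative only)."""
    return "".join(BASES[k] for k in np.argmax(intensities, axis=1))
```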
  • the operation of registering the flow cell images may be based on background objects in the flow cell images.
  • the background objects can be used to align the flow cell images to the cell staining images by using one or more transformations.
  • the cell staining images herein are staining images of the sample(s) immobilized on the support, possibly transformed (e.g., translated) relative to the sample(s) in the flow cell images.
  • the transformation may be represented by a single transformation of the whole image or be separated into multiple transformations, each representing a portion of the whole image.
  • the polonies or clusters can be registered to the cell staining images.
  • the method 2800 may further include an operation of registering the base callings, e.g., of a 3D sample, to the cell staining images containing morphological information of the sample.
  • such registration may be based on fiducial markers.
  • fiducial markers can also be included in the cell staining images. Aligning the fiducial markers can generate the transformation(s) between the flow cell images or between flow cell images and cell staining images. The transformation(s) can be used to register or align polonies or clusters between the sequencing images and the cell images.
  • the fiducial markers can be within the sample or external to the sample.
  • the fiducial markers can be biological features inherent to the sample(s).
  • the fiducial markers may be immobilized on the flow cell but external to the sample.
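One common way to obtain such a transformation from fiducial markers is a least-squares affine fit to matched fiducial centres. The sketch below assumes the fiducials have already been detected and matched between the two images, and uses hypothetical function names; it illustrates the idea rather than the method used in the sequencing system.

```python
import numpy as np


def estimate_affine(src, dst):
    """Least-squares 2x3 affine matrix [A | t] mapping src fiducials onto dst fiducials.

    src, dst: (N, 2) arrays of matched fiducial centres (x, y); N >= 3, not all collinear.
    """
    src = np.asarray(src, dtype=float)
    dst = np.asarray(dst, dtype=float)
    design = np.hstack([src, np.ones((len(src), 1))])  # rows [x, y, 1]
    # Solve design @ P ~= dst for the 3x2 parameter matrix P in the least-squares sense.
    params, *_ = np.linalg.lstsq(design, dst, rcond=None)
    return params.T


def map_points(points, matrix):
    """Apply a 2x3 affine matrix to (N, 2) points, e.g. polony coordinates."""
    points = np.asarray(points, dtype=float)
    return points @ matrix[:, :2].T + matrix[:, 2]
```

Once estimated from the fiducials, the same matrix can be applied to polony coordinates to place sequencing results onto the cell staining image.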
  • the method 2800 further comprises an operation of determining a location of one or more of: a morphological feature, an RNA or mRNA, and a protein based on the corresponding location of each predicted base call.
  • the samples may be labeled so that the base calls uniquely identify a morphological feature, an RNA or mRNA, or a protein of the sample in 3D. Such information can advantageously provide nucleotide sequencing in the spatial context of the sample.
  • the same pretrained neural network (e.g., with same parameters and neural network structure) can be advantageously used for predicting the polony map and for predicting the base calls.
  • the same neural network herein can be trained before operations 2806 and 2812 and requires no additional training between operations 2806 and 2812.
  • the operation 2806 further comprises predicting, by the first reconfigurable device or the integrated circuit, a base call corresponding to each polony of the second plurality of flow cell images using the neural network at a third resolution; and determining the polony map based on the predicted base calls and a corresponding quality index of each predicted base call at the third resolution.
  • the third resolution is 2 to 32 times greater than the first or second resolution in one or more spatial dimensions. In some embodiments, the third resolution is greater than the first and second resolutions in one or more spatial dimensions. In some embodiments, the third resolution is identical to the first or second resolution in one or more spatial dimensions.
  • the different patches may include some overlapped pixels.
  • the different patches do not include any overlapping pixels.
  • patch 1 may include 12 different patch images, each from one of the 4 different color channels and one of the three consecutive cycles in a sequencing run.
  • Patch 2 may also include 12 different patch images cropped from pixels of the same flow cell images that do not overlap with patch 1.
  • Patch 3 may include 12 different patch images, each with more than half of its pixels identical to those of the corresponding patch image of patch 1.
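The patch layout above (3 consecutive cycles x 4 channels = 12 patch images per patch) can be illustrated with a short cropping routine. This sketch assumes registered images stored as a `(cycles, channels, height, width)` array; `extract_patches` is a hypothetical helper. The stride controls overlap: `stride >= patch_size` gives non-overlapping patches (like patches 1 and 2), while a stride smaller than half the patch size gives neighbouring patches that share more than half their pixels (like patches 1 and 3).

```python
import numpy as np


def extract_patches(images, start_cycle, n_cycles, patch_size, stride):
    """Crop stacked multi-cycle, multi-channel patches.

    images: (cycles, channels, height, width) registered flow cell images.
    Returns a list of (n_cycles * channels, patch_size, patch_size) arrays.
    """
    images = np.asarray(images)
    total_cycles, channels, height, width = images.shape
    if start_cycle < 0 or start_cycle + n_cycles > total_cycles:
        raise ValueError("requested cycles fall outside the run")
    # Stack consecutive cycles and all channels on one axis: 3 cycles x 4 channels -> 12.
    block = images[start_cycle:start_cycle + n_cycles].reshape(
        n_cycles * channels, height, width)
    patches = []
    for top in range(0, height - patch_size + 1, stride):
        for left in range(0, width - patch_size + 1, stride):
            patches.append(block[:, top:top + patch_size, left:left + patch_size])
    return patches
```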
  • the sequencing system herein comprises: a first reconfigurable logic device, e.g., an FPGA unit, comprising a first plurality of data processing engines configured to perform data processing in parallel; first reconfigurable routing channels, each connecting at least some of the first plurality of data processing engines; a neural network deployed at least partly on the first reconfigurable logic device; and a first processor to selectively activate or deactivate different combinations of the first plurality of data processing engines and the first reconfigurable routing channels to perform one or more operations in methods herein (e.g., methods 600, 700, 2900) to train the neural network.
  • the sequencing system herein comprises: a first reconfigurable logic device comprising a first plurality of data processing engines arranged in a first pipeline and configured to perform data processing in parallel with each other; an integrated circuit in data communication with the first reconfigurable logic device; a neural network deployed at least partly on the integrated circuit and/or the first reconfigurable logic device; a first processor of the first reconfigurable logic device to selectively activate or deactivate different combinations of the first plurality of data processing engines to perform one or more operations in methods herein (e.g., methods 600, 700, 2900) to train the neural network.
  • the first reconfigurable logic device and the integrated circuit are within the same physical housing as the other elements of the sequencing system, as shown in FIG. 1. In some embodiments, the first reconfigurable logic device and the integrated circuit are not physically external to the sequencing system 110 shown in FIG. 1, e.g., not in the cloud 130.
  • FIG. 5B shows an exemplary method 600 for training the neural network, e.g., CNN, which can be used to predict high resolution flow cell images with improved detectable polony density.
  • training can be done onboard the sequencing system, e.g., using the FPGA or AI chips onboard the sequencing system. In such cases, training may be done using hardware elements within the physical housing of the sequencing system 110 shown in FIG. 1. In some embodiments, training can be performed external to the sequencing system 110. For example, training may be performed using hardware elements over the cloud 130. In some embodiments, training the neural network using the methods herein with the reconfigurable logic device, e.g., the FPGA, and/or other integrated circuit, e.g., AI chips, can be at least 2x, 8x, 10x, 20x, 40x, 50x, or 100x faster than training the same neural network(s) with similar training images using CPUs or GPUs.
  • training the neural network using the methods herein with the reconfigurable logic device, e.g., the FPGA, and/or other integrated circuit, e.g., AI chips, can be at least 100x, 200x, 400x, 500x, 800x, or 1000x faster than training the same neural network(s) with similar training images using CPUs or GPUs.
  • the neural network is trained with the same type of flow cell images as those on which the neural network may make predictions after being trained.
  • the neural network is trained with 2D flow cell images at multiple z levels and then may be used to predict base calls for 2D flow cell images at multiple z levels to cover a 3D in situ sample.
  • the neural network is trained with 2D flow cell images from samples of a single organ and then may be used to predict base calls for 2D flow cell images of samples extracted from the same organ, e.g., liver.
  • the neural network is trained with 2D flow cell images at multiple z-levels that encompass the 3D volume of the volumetric sample(s). Compared with training the neural network with z-stacks of flow cell images, training the neural network with 2D flow cell images reduces the computational effort, training time, and cost. Further, a neural network trained with 2D flow cell images can be less complicated than one trained with 3D training data, which makes prediction simpler and more efficient. In some embodiments, the neural network trained with 2D flow cell images may provide higher efficiency and save time and computational effort in its training and subsequent prediction of polony locations.
  • the sequencing system comprises: a first reconfigurable logic device, e.g., an FPGA unit, comprising a first plurality of data processing engines configured to perform data processing in parallel; first reconfigurable routing channels, each connecting at least some of the first plurality of data processing engines; a neural network deployed at least partly on the first reconfigurable logic device; a first processor to selectively activate or deactivate different combinations of the first plurality of data processing engines and the first reconfigurable routing channels to perform operations to train the neural network comprising: (a) generating a training set comprising a plurality of training flow cell images or receiving the training set from one or more data storage devices of the sequencing system, the plurality of training flow cell images having a first spatial resolution; (b) up-sampling the corresponding plurality of training flow cell images to generate a reference set comprising high resolution training flow cell images having a second resolution; (c) generating a training output by inputting the training set to the neural network; (d) repeatedly performing, until the output error
  • the sequencing system comprises: a first reconfigurable logic device, e.g., an FPGA unit, comprising a first plurality of data processing engines arranged in a first pipeline and configured to perform data processing in parallel with each other; an integrated circuit, e.g., an NPU chip or AI chip, comprising a second plurality of data processing engines configured to perform data processing in parallel, wherein the first reconfigurable logic device is configured to communicate data with the integrated circuit; a neural network deployed at least partly on the second reconfigurable logic device; a first processor to selectively activate or deactivate different combinations of the first plurality of data processing engines to perform operations comprising: processing sensor data to generate the first plurality of flow cell images, wherein the integrated circuit is configured to perform operations including: (a) generating a training set comprising a plurality of training flow cell images or receiving the training set from one or more data storage devices of the sequencing system, the plurality of training flow cell images having a first spatial resolution; (b) up-sampling the
  • the system herein may comprise one or more hardware processors; one or more data storage devices storing instructions executable by the one or more hardware processors to cause the one or more hardware processors to perform operations, the operations comprising: (a) generating a training set comprising a plurality of training flow cell images or receiving the training set from one or more data storage devices of the sequencing system, the plurality of training flow cell images having a first spatial resolution; (b) up-sampling the corresponding plurality of training flow cell images to generate a reference set comprising high resolution training flow cell images having a second resolution; (c) generating a training output by inputting the training set to the neural network; (d) repeatedly performing, until the output error satisfies a stopping criterion, training operations comprising: determining an output error by comparing the training output and the reference set; and adjusting current values of parameters of the neural network based on the output error; and (e) generating a trained neural network with adjusted parameters.
  • the method 600 for training the neural network comprises an operation 610 of generating a corresponding plurality of training flow cell images for one or more sample(s) with a first resolution.
  • the operation 610 may be performed by simulation, thus the corresponding plurality of training flow cell images may be simulated images of 2D or 3D samples.
  • the simulation can be based on characteristics of actual flow cell images of sample(s). Such characteristics may include, but are not limited to, image resolution, FOV, pixel size, and/or characteristics of the optical system, e.g., depth of field, point spread function, etc.
  • the operation 610 may be performed using the imager 116 of the sequencing system.
  • the corresponding plurality of training flow cell images may be real images of 2D or 3D samples with a first resolution. Notably, the training flow cell images may be generated based on the characteristics of the sample(s) for which predictions are to be made. For example, for predicting polony locations in 3D samples, the training flow cell images may only include images (simulated or real) of 3D samples with similar characteristics, e.g., liver samples, kidney samples, etc. As another example, for predicting polony locations in traditional 2D samples, the training flow cell images may only include 2D flow cell images with similar plexity and/or polony density. In some embodiments, the training flow cell images may include a combination of flow cell images of 2D or 3D samples, with or without similar characteristics.
  • the corresponding plurality of training flow cell images may include flow cell images at multiple z-levels.
  • the corresponding plurality of training flow cell images may include z-stacks of flow cell images, each z-stack may include a 3D volume made up from multiple z-levels of flow cell images comprised in the z-stack.
  • the corresponding plurality of training flow cell images may include flow cell images at multiple z-levels (2D images) but not a z-stack of flow cell images.
  • the training data set of flow cell images comprises simulated flow cell images of in situ samples at different z-locations.
  • the training data set of flow cell images comprises actual flow cell images acquired from in situ samples at different z-locations.
  • polony locations are identified in such actual flow cell images at a sub-pixel resolution to provide the high resolution “truth maps” in the training data set. Identification of polony or cluster locations at a sub-pixel resolution, e.g., at 0.02 pixel, 0.05 pixel, 0.1 pixel, 0.25 pixel, etc., may be performed using various image processing methods. For example, embodiments of identifying polony or cluster locations at a sub-pixel resolution are disclosed in U.S. Patent No. 11,200,446, which is incorporated herein by reference in its entirety.
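As an illustration only, one simple and widely used sub-pixel localization is the intensity-weighted centroid around a detected peak. This is a swapped-in technique, not the method of U.S. Patent No. 11,200,446; `subpixel_centroid` and its `radius` parameter are hypothetical.

```python
import numpy as np


def subpixel_centroid(image, peak, radius=1):
    """Refine an integer (row, col) peak to sub-pixel precision.

    Uses the intensity-weighted centroid of a (2*radius+1)^2 window after
    subtracting the window minimum as a crude local background.
    """
    image = np.asarray(image, dtype=float)
    r0, c0 = peak
    r_lo, r_hi = max(r0 - radius, 0), min(r0 + radius + 1, image.shape[0])
    c_lo, c_hi = max(c0 - radius, 0), min(c0 + radius + 1, image.shape[1])
    window = image[r_lo:r_hi, c_lo:c_hi]
    window = window - window.min()
    total = window.sum()
    if total == 0:  # flat window: keep the integer estimate
        return float(r0), float(c0)
    rows, cols = np.mgrid[r_lo:r_hi, c_lo:c_hi]
    return float((rows * window).sum() / total), float((cols * window).sum() / total)
```

A spot whose energy is split evenly between two neighbouring pixels is placed halfway between them, which is the kind of fractional-pixel position a high resolution truth map records.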
  • the method 600 comprises an operation 620 of (1) up-sampling, by the processor, the corresponding plurality of training flow cell images for each cellular sample to a second resolution to generate a reference set comprising high resolution training flow cell images or (2) generating, by the processor, a reference set of reference flow cell images at a second resolution higher than the first resolution, each reference flow cell image in the reference set corresponding to an individual image of the corresponding pluralities of training flow cell images.
  • the operation of up-sampling in (1) can be based on the imaging process. For example, the point spread function can be virtually improved by 4x if the up-sampling is to achieve 4x spatial resolution. In some embodiments, the operation of up-sampling is in 2D.
  • each corresponding plurality of training flow cell images may include a z-stack with more than one z level to cover a 3D volumetric sample.
  • the resolution in x and y may be different from the resolution in z direction.
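A minimal sketch of 2D up-sampling applied to a z-stack, assuming nearest-neighbour interpolation (other interpolation schemes could be substituted) and a voxel size given as `(dz, dy, dx)` in µm. Only x and y are up-sampled, consistent with lateral and axial resolutions being allowed to differ; `upsample_xy` is a hypothetical helper.

```python
import numpy as np


def upsample_xy(stack, factor, voxel_size_um):
    """Nearest-neighbour up-sampling of a z-stack in x and y only.

    stack: (n_z, height, width); voxel_size_um: (dz, dy, dx) in micrometres.
    Returns the up-sampled stack and its new voxel size; z is left untouched.
    """
    stack = np.asarray(stack)
    dz, dy, dx = voxel_size_um
    # Each pixel becomes a factor x factor block; the number of z levels is unchanged.
    upsampled = stack.repeat(factor, axis=1).repeat(factor, axis=2)
    return upsampled, (dz, dy / factor, dx / factor)
```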
  • FIGS. 2A and 2D-2E show exemplary flow cell images that are generated for training the neural network, e.g., CNN.
  • the simulated flow cell images with higher resolution, e.g., FIG. 2E, are used as “ground truth.”
  • such images have no signal originating from pixels other than the polonies.
  • such images have no signal originating from cellular background in the sample(s).
  • such images may include features that are specific to polonies in flow cell images during sequencing runs, such as polony intensity, polony shape, pattern of distribution (e.g., within regions determined by the cell boundaries).
  • the method includes generating simulated flow cell images with low resolution, e.g., FIG. 2D, which mimic the real flow cell images that a user would acquire during sequencing of cells and are included in a training set.
  • Such simulated flow cell images may have polony features, cell features, background, noise, etc.
  • the low resolution simulated flow cell images may then be up-sampled to be at the high resolution.
  • the simulated images may include a z-stack of flow cell images taken at different z-locations to simulate flow cell images of a volumetric sample.
  • generating simulated images may add computational load to the training process and may require specific criteria in order to mimic polony features and other information that may be contained within real flow cell images acquired during sequencing.
  • simulated images may be free of imaging artifacts, e.g., those caused by vibration, over-heating, bubbles, etc., which avoids training on such distracting features; these features are not part of the polonies in the sample and may reduce the accuracy and reliability of training the neural network.
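The simulated training pair described above (a low resolution image with background and noise, and a high resolution ground truth containing polony signal only) can be sketched with Gaussian spots standing in for the point spread function. The Gaussian PSF, constant background, and Gaussian noise are simplifying assumptions, and the function names are hypothetical; a fuller simulation would also model cell geometry and polony intensity distributions.

```python
import numpy as np


def render_spots(positions, shape, sigma):
    """Sum of unit-amplitude Gaussian spots at sub-pixel (row, col) positions."""
    rows, cols = np.mgrid[0:shape[0], 0:shape[1]]
    image = np.zeros(shape)
    for r, c in positions:
        image += np.exp(-((rows - r) ** 2 + (cols - c) ** 2) / (2.0 * sigma ** 2))
    return image


def simulate_pair(positions, low_shape, factor, sigma, background, noise_sd, seed=0):
    """Return (low-resolution training image, high-resolution ground truth).

    positions: polony centres in low-resolution pixel units (row, col).
    The low-resolution image has PSF width `sigma` pixels, a constant background
    and Gaussian noise. The ground truth has `factor` times more pixels per axis
    and keeps `sigma` in its own pixel units, so its PSF is `factor` times narrower
    physically; it contains polony signal only (no background, no noise).
    """
    rng = np.random.default_rng(seed)
    low = render_spots(positions, low_shape, sigma) + background
    low = low + rng.normal(0.0, noise_sd, size=low_shape)
    high_shape = (low_shape[0] * factor, low_shape[1] * factor)
    # Low-res pixel centre k maps to high-res coordinate k * factor + (factor - 1) / 2.
    offset = (factor - 1) / 2.0
    high_positions = [(r * factor + offset, c * factor + offset) for r, c in positions]
    high = render_spots(high_positions, high_shape, sigma)
    return low, high
```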
  • the training set may include flow cell images from different cell geometries, different in situ samples, different image intensities, different polony densities, different nucleotide diversities, etc.
  • the method 600 comprises an operation 630 of providing, by the processor, the training set as inputs to the neural network to generate corresponding training outputs.
  • Each corresponding training output may include output flow cell images, e.g., a z-stack of output images.
  • the method 600 comprises the operation 640 of repeatedly training the neural network, e.g., CNN, by performing one or more operations until the output error satisfies a stopping criterion.
  • the training operation 640 comprises one or more operations including: the operation 655 of determining an output error by comparing the training output and the reference set; and the operation 660 of adjusting current values of parameters of the convolutional neural network based on the output error. Determining the output error can be based on various metrics.
  • the metrics can include the mean square error (to be minimized) of image intensities from some or all of the pixels of the training output relative to the corresponding z-stack in the reference set.
  • Values of the parameters of the neural network can be adjusted based on the output error or one or more previous output errors.
  • the stopping criterion can be customized based on but not limited to training time, computational complexity, required accuracy, power consumption, and/or convergence rate.
  • the stopping criterion can be (1) stopping after 10 epochs to reduce training time.
  • the stopping criterion can be (2) stopping when the value of the loss function (or the output error) is less than a predetermined value close to 0.
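The loop formed by operations 630 to 670 can be sketched as below. To stay self-contained, a two-parameter linear model `pred = w * x + b` trained by gradient descent stands in for the CNN (an assumption; a real implementation would use a deep-learning framework and a U-Net). Both stopping criteria above are included: a maximum epoch count and a loss threshold.

```python
import numpy as np


def train(inputs, references, lr=0.1, max_epochs=10, loss_tol=1e-6):
    """Fit a toy stand-in model pred = w * x + b by gradient descent on MSE.

    inputs, references: arrays of the same shape (e.g., up-sampled training
    images and high resolution reference images). Stops after max_epochs
    epochs or once the loss drops below loss_tol, whichever comes first.
    """
    x = np.asarray(inputs, dtype=float)
    y = np.asarray(references, dtype=float)
    w, b = 0.0, 0.0
    history = []
    for _ in range(max_epochs):                 # stopping criterion (1): epoch cap
        pred = w * x + b                        # training output (operation 630)
        err = pred - y
        loss = float(np.mean(err ** 2))         # output error (operation 655)
        history.append(loss)
        if loss < loss_tol:                     # stopping criterion (2): loss threshold
            break
        w -= lr * float(np.mean(2.0 * err * x)) # parameter update (operation 660)
        b -= lr * float(np.mean(2.0 * err))
    return (w, b), history                      # trained parameters (operation 670)
```

With the epoch cap alone, training time is bounded but the loss may still be far from zero; the loss threshold ends training as soon as the fit is good enough.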
  • z-stacks of training flow cell images from a same color channel can be used to train the neural network, e.g., CNN, for that particular channel.
  • a certain percentage, e.g., 80%, of the training set may be used for training, and the rest of the training set, e.g., 20%, may be used for validation.
  • Batch size can be one.
  • Epochs can be about 10, 12, 15, 20, or more.
  • Various optimizers can be used.
  • the convolutional neural network comprises one or more U-Net units.
  • comparing the training output to the reference set comprises: calculating mean square error in image intensity of one or more pixels in each pair of an image from the reference set and a corresponding image from the training output. In some embodiments, comparing the training output to the reference set comprises: determining one or more values of a loss function. In some embodiments, each pair of the image from the reference set and the corresponding image from the training output comprises a same image size, a same field of view, a same resolution, or a combination thereof. In some embodiments, the one or more pixels exclude pixels that are outside of cell boundaries. In some embodiments, the cell boundaries are determined based on image segmentation of cell boundaries of the high resolution flow cell images in the reference set.
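A minimal sketch of the masked comparison, assuming the cell-boundary mask is a boolean array produced by segmenting the high resolution reference image (the segmentation itself is not shown); `masked_mse` is a hypothetical name.

```python
import numpy as np


def masked_mse(reference, output, cell_mask):
    """Mean squared intensity error over pixels inside cell boundaries only.

    reference, output: images with the same size, field of view and resolution.
    cell_mask: boolean array, True inside segmented cells; other pixels are excluded.
    """
    reference = np.asarray(reference, dtype=float)
    output = np.asarray(output, dtype=float)
    mask = np.asarray(cell_mask, dtype=bool)
    if reference.shape != output.shape or reference.shape != mask.shape:
        raise ValueError("reference, output and mask must share one shape")
    if not mask.any():
        raise ValueError("mask selects no pixels")
    diff = reference[mask] - output[mask]
    return float(np.mean(diff ** 2))
```

Excluding pixels outside the cells keeps large errors in irrelevant background regions from dominating the loss.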
  • the method 600 includes an operation 670 of generating a trained neural network with the adjusted parameter values obtained in operation 660.
  • the trained neural network may be used to predict high resolution intensities that can be used to determine high resolution base calls of flow cell images, e.g., using method 500.
  • FIG. 5E shows an exemplary method 700 for training the neural network, e.g., CNN, which can be used to predict high resolution flow cell images with improved detectable polony density.
  • training of the neural networks using the methods 600, 700 may utilize training images that are real flow cell images of samples, simulated flow cell images, or a combination thereof.
  • Training with real flow cell images may advantageously eliminate the need for generating simulated images that mimic the characteristics of polonies of different samples, which simplifies the training process, especially when the samples include heterogeneous intensities and polony densities across the flow cell image(s) and may include various types of cells or tissue.
  • Training with real flow cell images may advantageously improve training results (e.g., the trained neural network can make improved predictions) compared with training using only simulated images, at similar computational cost and neural network complexity.
  • the prediction quality can be measured based on various metrics including but not limited to error rate in base calls, error rate in intensity values, density of base calls, density of polonies, etc.
  • the values of the metrics can be determined relative to results produced using existing primary analysis methods without neural network(s). For example, the error rate of base calling using a first neural network trained on simulated flow cell images can be determined in comparison with base calling using an existing primary analysis method without neural networks. The error rate of base calling using a second neural network trained on real images of in situ samples can also be obtained in comparison with base calling using the existing primary analysis method without neural networks.
  • the error rate in base calling using the first neural network can be higher than the error rate in base calling using the second neural network.
  • the error rate in base calling using the first neural network can be 2x, 3x, 4x, 5x, 6x, 10x, or higher than the error rate in base calling using the second neural network.
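Assuming base calls from a neural-network caller and from the baseline primary analysis have already been matched polony by polony and written as equal-length strings, with `N` marking no-calls (both assumptions), the error rates and their ratio could be computed as follows; `base_call_error_rate` is a hypothetical helper.

```python
def base_call_error_rate(calls, reference_calls, no_call="N"):
    """Fraction of called positions that disagree with the reference calls.

    calls, reference_calls: equal-length strings, one base per position.
    Positions called as `no_call` are skipped; returns NaN if none remain.
    """
    if len(calls) != len(reference_calls):
        raise ValueError("call strings must have equal length")
    compared = [(a, b) for a, b in zip(calls, reference_calls) if a != no_call]
    if not compared:
        return float("nan")
    return sum(a != b for a, b in compared) / len(compared)
```

For example, if the first network's calls disagree with the baseline at 2 of 4 positions and the second network's at 1 of 4, the first error rate is 2x the second.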
  • training of the neural networks herein can be completed using only the sequencing system, e.g., the FPGA or AI chips onboard the sequencing system 100.
  • training can be performed at least partly external to the sequencing system. For example, at least part of the training may be performed using hardware over the cloud.
  • the sequencing system 110 comprises: a first reconfigurable logic device, e.g., an FPGA unit, comprising a first plurality of data processing engines arranged in a first pipeline and configured to perform data processing in parallel with each other; an integrated circuit, e.g., an NPU chip or AI chip, comprising a second plurality of data processing engines configured to perform data processing in parallel, wherein the first reconfigurable logic device is configured to communicate data with the integrated circuit; a neural network (e.g., a trained neural network) deployed at least partly on the second reconfigurable logic device; a first processor to selectively activate or deactivate different combinations of the first plurality of data processing engines to perform one or more operations of the sequencing methods 600, 700; and a second processor or the first processor to control the integrated circuit to perform one or more operations of the sequencing methods 600, 700 to facilitate generating the sequencing analysis result(s).
  • the operations performed by the first reconfigurable logic device may comprise processing or receiving sensor data to generate the first plurality of flow cell images after operation 705.
  • the first reconfigurable logic device or the integrated circuit is configured to perform operations including operation 715 of up-sampling the corresponding plurality of training flow cell images to generate high resolution training flow cell images having a second resolution.
  • the sequencing system herein may comprise one or more hardware processors; one or more data storage devices storing instructions executable by the one or more hardware processors to cause the one or more hardware processors to perform one or more operations of the methods 600, 700, 2800, and/or 2900.
  • training the neural network using the methods 600, 700, and/or 2900 herein with the reconfigurable logic device, e.g., the FPGA, and/or other integrated circuit, e.g., AI chips, can be at least 2x, 8x, 10x, 20x, 40x, 50x, or 100x faster than training the same neural network(s) with similar training images using CPUs or GPUs.
  • training the neural network using the methods herein with the reconfigurable logic device, e.g., the FPGA, and/or other integrated circuit, e.g., AI chips, can be at least 20x, 40x, 60x, 80x, 100x, 200x, 400x, 500x, 800x, or 1000x faster than training the same neural network(s) with similar training images using CPUs or GPUs.
  • training the neural network using the methods 600, 700, and/or 2900 herein with the reconfigurable logic device, e.g., the FPGA, and/or other integrated circuit, e.g., AI chips, may require at least 2x, 8x, 10x, 15x, 20x, 40x, 50x, or 100x less power than training the same neural network(s) with identical training images using CPUs or GPUs.
  • the sequencing system further comprises: a power source that is configured to supply identical or different power levels to the first reconfigurable logic device and the integrated circuit.
  • a maximum power output of the power source to the sequencing system in training the neural network using methods 600, 700, and/or 2900 is less than 1000 Watts, 900 Watts, 800 Watts, 700 Watts, 650 Watts, 600 Watts, 550 Watts, 500 Watts, 400 Watts, or 300 Watts.
  • the neural network is trained with traditional 2D flow cell images at a single z-level.
  • each neural network is trained with 2D flow cell images at a single z-level, and multiple neural networks may be trained to cover a 3D volumetric sample, e.g., in situ sample.
  • the neural network is trained with 2D flow cell images at multiple z-levels that encompass the 3D volume of the volumetric sample(s). Compared with training the neural network with 3D flow cell images (3D volumetric images), training the neural network with 2D flow cell images reduces the amount of computation, training time, and training cost. Further, a neural network trained with 2D flow cell images can be less complicated than one trained with 3D training data, which makes prediction simpler and more efficient. In some embodiments, the neural network trained with 2D flow cell images may provide higher efficiency and save time and computational effort in its training and subsequent prediction of polony locations.
  • the sequencing method 700 comprises an operation 705 of acquiring, by the imager 116 of the sequencing system 110, a training set comprising a corresponding plurality of training flow cell images with a first resolution.
  • the first resolution can be a standard resolution that can be achieved using the imager disclosed herein.
  • the first resolution can be within the range from 0.01 µm to 15 µm.
  • the first resolution can be within the range from 0.1 µm to 5 µm.
  • the plurality of training flow cell images in the training set can be from one or more color channels.
  • the plurality of training flow cell images in the training set can be from 2, 3, 4, or more color channels.
  • the plurality of training flow cell images in the training set can be from one or more cycles.
  • the one or more cycles can be any number ranging from 1 to 10, 1 to 20, 1 to 30, 1 to 50, 1 to 100, 1 to 200, or 1 to 500.
  • the plurality of flow cell images can be at a single z level or multiple z levels.
  • the sequencing method 700 comprises an operation 715 of up-sampling, by the sequencing system, the corresponding plurality of training flow cell images to generate high-resolution training flow cell images having a second resolution.
  • the second resolution can be 2x, 4x, 8x, 16x, or higher than the first resolution.
  • the first resolution is in the range from 0.01 µm to 5 µm.
  • the corresponding second resolution, 4x higher than the first resolution, can be in the range from 0.0025 µm to 1.25 µm.
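Reading "resolution" here as the sampled pixel size in µm (an interpretation: a finer pixel pitch means higher resolution), a k-fold increase divides the pixel size by k, which reproduces the quoted range for k = 4:

```latex
\Delta_{2} = \frac{\Delta_{1}}{k}, \qquad
k = 4:\quad \frac{0.01\,\mu\mathrm{m}}{4} = 0.0025\,\mu\mathrm{m}, \qquad
\frac{5\,\mu\mathrm{m}}{4} = 1.25\,\mu\mathrm{m}
```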
  • Various up-sampling methods can be used for generating the high-resolution training flow cell images.
  • Each high-resolution training flow cell image corresponds to a training flow cell image at the first resolution.
  • the operation 715 is optional.
  • the high resolution images may be directly generated via computer simulation or acquisition using the sequencing system disclosed herein.
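As a minimal sketch of operation 715, assuming nearest-neighbour interpolation as the up-sampling method (bilinear, bicubic, or learned up-sampling are equally possible choices) and assuming the resolution is expressed as a pixel size in um, the following up-samples a small image by an integer factor and converts a pixel size to its up-sampled value. The function names are hypothetical; the 4x factor matches the example ranges above.

```python
from typing import List

Image = List[List[float]]


def upsample_nearest(image: Image, factor: int) -> Image:
    """Nearest-neighbour up-sampling: each pixel becomes a factor x factor block."""
    return [
        [value for value in row for _ in range(factor)]  # repeat each value along x
        for row in image
        for _ in range(factor)                           # repeat each row along y
    ]


def upsampled_pixel_size(pixel_size_um: float, factor: int) -> float:
    """Pixel size (um) after up-sampling by an integer factor."""
    return pixel_size_um / factor


low_res = [[1.0, 2.0],
           [3.0, 4.0]]
high_res = upsample_nearest(low_res, 4)
print(len(high_res), len(high_res[0]))  # 8 8
print(upsampled_pixel_size(5.0, 4))     # 1.25
```

Each high-resolution pixel maps back to exactly one pixel at the first resolution, which keeps the correspondence between each high-resolution training image and its source image explicit.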
  • the sequencing method 700 comprises determining, by the sequencing system, a location list of polonies in the plurality of flow cell images; and extracting, by the sequencing system, intensities in the plurality of flow cell images based on the location list. In some embodiments, the sequencing method 700 comprises determining, by the sequencing system, a location list of polonies in the high resolution training flow cell images; and extracting, by the sequencing system, intensities in the high resolution training flow cell images based on the location list.
  • the sequencing method 700 comprises an operation of processing the high resolution training flow cell images to determine a location list of the polonies (e.g., bright spots in the image) and their processed intensities. The intensities may be processed using standard image processing steps such as background noise reduction, filtering, and intensity normalization.
  • the operation of processing the training flow cell images or the high resolution training flow cell images can include polony map generation using the methods disclosed in detail in U.S. Patent No.
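The processing and extraction steps above can be sketched as follows. This is a minimal illustration only: the background estimate (median pixel value), the normalization (division by the brightest corrected pixel), and the function name `extract_intensities` are assumptions, since the disclosure names these processing steps without fixing a particular algorithm.

```python
from statistics import median
from typing import List, Tuple

Image = List[List[float]]


def extract_intensities(image: Image, locations: List[Tuple[int, int]]) -> List[float]:
    """Return background-subtracted, normalized intensities at listed polony pixels.

    Background = median pixel value and normalization = divide by the brightest
    corrected pixel; both are illustrative assumptions.
    """
    background = median(value for row in image for value in row)
    corrected = [[max(value - background, 0.0) for value in row] for row in image]
    peak = max(max(row) for row in corrected) or 1.0  # avoid dividing by zero
    return [corrected[r][c] / peak for r, c in locations]  # (row, col) pixel indices


image = [[10.0, 10.0, 10.0],
         [10.0, 50.0, 10.0],
         [10.0, 30.0, 10.0]]
location_list = [(1, 1), (2, 1)]
print(extract_intensities(image, location_list))  # [1.0, 0.5]
```

Only pixels named in the location list are read out, so background pixels never contribute to the extracted intensities.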
  • the method 700 comprises an operation 725 of generating, by the sequencing system, reference intensities corresponding to the intensities (e.g., processed intensities) in the high resolution training flow cell images based on base calls of the high resolution training flow cell images.
  • the operation 725 may be based on the location list so that only signals from the identified polonies are used for generating the reference intensities, while other signals, including background noise and possible artifacts from cellular structures in the images, can be excluded.
  • the operation 725 may be based on one or more image processing steps of the training flow cell images (e.g., cell segmentation, cell contouring, noise removal) so that only signals from polonies that are within an area of interest (e.g., within cells) are used for generating the reference intensities.
  • At least part of the one or more samples comprises predetermined bases in the one or more cycles.
  • the base calls for at least some of the polonies in the flow cell images in cycle(s) are predetermined.
  • the base calls can be predetermined by sequencing known barcode sequences in the one or more cycles.
  • the operation of generating the reference intensities in the high resolution training flow cell images based on the base calls of the high resolution training flow cell images comprises: performing color correction on each extracted intensity in the high resolution training flow cell image thereby generating the corresponding reference intensity.
  • the intensities may undergo color correction, phasing/dephasing, normalization, and/or other corrections to reach the reference intensities.
  • the intensities may undergo de-noising to generate the reference intensities. As a nonlimiting example, as shown in FIG.
  • the intensities of the high resolution training flow cell images from two different channels are plotted against each other. Each polony is plotted as a dot at its corresponding intensities in a pair of channels, e.g., channels 1 and 2 in the left panel and channels 3 and 4 in the right panel. Based on the predetermined base calls, the polonies within area 790 have a base call of A; thus, the reference intensity of each polony having a base call of A can be obtained by projecting its dot onto the fitted line in region 790, e.g., by the projection with the shortest distance. The vertical coordinate of the projected point on the line may then be the reference intensity of the polony in channel 2, and the horizontal coordinate may be its reference intensity in channel 1.
  • the reference intensity of each polony in area 791 can be obtained by projecting the dots onto the fitted line in region 791, e.g., by the projection with the shortest distance. The horizontal coordinate of the projected point on the fitted line may then be the reference intensity of the corresponding polony in area 791 in channel 1, and the vertical coordinate may be its reference intensity in channel 2. A similar projection may be performed for polonies plotted in the right panel for channels 3 and 4.
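The shortest-distance projection can be sketched for one cluster of polonies sharing a predetermined base call. The disclosure does not say how the line is fitted; this sketch assumes an ordinary least-squares fit y = a + b*x with channel 1 on the horizontal axis and channel 2 on the vertical axis, and the function names and data are hypothetical.

```python
from typing import List, Tuple

Point = Tuple[float, float]  # (channel-1 intensity, channel-2 intensity)


def fit_line(points: List[Point]) -> Tuple[float, float]:
    """Ordinary least-squares fit y = a + b*x; returns (a, b)."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    b = sxy / sxx
    return mean_y - b * mean_x, b


def project_onto_line(points: List[Point], a: float, b: float) -> List[Point]:
    """Shortest-distance (orthogonal) projection of each point onto y = a + b*x."""
    projected = []
    for x, y in points:
        x_p = (x + b * (y - a)) / (1.0 + b * b)
        projected.append((x_p, a + b * x_p))  # (reference ch-1, reference ch-2)
    return projected


cluster = [(0.0, 0.0), (2.0, 2.0), (1.0, 1.5), (1.0, 0.5)]
a, b = fit_line(cluster)
print(project_onto_line(cluster, a, b))
```

Points already on the fitted line keep their coordinates, while off-line points are pulled to the nearest point on the line, e.g., (1.0, 1.5) becomes (1.25, 1.25). A total-least-squares fit or a line through the origin would be reasonable alternatives.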
  • the projection may reduce noises and artifacts, such as noise correlated with different channels, e.g., noise arising from channel optics, illumination, etc.
  • reference intensity determination can be based on various methods for noise reduction and is not limited to the shortest distance projection in FIG. 5F.
  • the algorithm for determining the reference intensity may be iterative such that the reference intensities obtained in earlier iteration(s) can be improved based on customized quality criteria in later iterations.
  • the number of iterations can be in a range from 1 to 10, from 1 to 100, or more.
  • later iterations can use a different projection method that generates a smaller total distance to the fitted line as shown in FIG. 5F than the projection method that was used in earlier iteration(s).
  • the sequencing method 700 may include an operation 730 of providing the reference intensities for comparison to training output(s) of the neural network.
  • the reference intensities may be provided as flow cell image(s).
  • the reference intensities may be provided as a list of intensities corresponding to their locations in the flow cell images, e.g., as an array with a first column of reference intensity values and a second column with the corresponding spatial coordinates of each reference intensity value. Using the list of intensities is advantageous because it saves storage space, reduces data size, and allows efficient data communication.
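The storage advantage of the list format can be estimated with simple arithmetic. This sketch assumes 32-bit intensity values and 32-bit (row, col) coordinates, a 6000 x 8000 pixel image (the subtile size mentioned elsewhere in this disclosure), and a hypothetical 100,000 polonies; the helper names are illustrative.

```python
from typing import List, Tuple


def to_location_list(image: List[List[float]],
                     locations: List[Tuple[int, int]]) -> List[Tuple[float, Tuple[int, int]]]:
    """Keep only (reference intensity, (row, col)) pairs for listed polonies."""
    return [(image[r][c], (r, c)) for r, c in locations]


def storage_bytes(height: int, width: int, n_polonies: int,
                  value_bytes: int = 4, coord_bytes: int = 4) -> Tuple[int, int]:
    """Bytes for a dense reference image versus a location list."""
    dense = height * width * value_bytes                   # one value per pixel
    sparse = n_polonies * (value_bytes + 2 * coord_bytes)  # value + (row, col) per polony
    return dense, sparse


reference_image = [[0.0, 0.9],
                   [0.4, 0.0]]
print(to_location_list(reference_image, [(0, 1), (1, 0)]))
print(storage_bytes(6000, 8000, 100_000))
```

Under these assumptions the dense image needs 192,000,000 bytes and the list 1,200,000 bytes; the ratio shrinks as polony density rises.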
  • the input to the neural network may also include the location list.
  • the operation 730 comprises an operation of providing the reference intensities in a plurality of patches for comparison to training output(s) of the neural network, wherein each patch comprises one or more patch images from one or more color channels, one or more cycles, one or more z-levels, or a combination thereof.
  • the patches of the flow cell images may be used for training.
  • Each patch may comprise one or more patch images cropped from the flow cell images (e.g., the second plurality of flow cell images).
  • the training method 700 is configured to train the neural network for predicting one or more base calls within each individual patch, e.g., a single base call at or close to the center of the patch.
  • the one or more base calls may be much fewer than the total number of base calls in the flow cell images.
  • the one or more base calls may be 10x, 100x, 500x, 1000x, 5000x, 10⁴x, 10⁵x, 10⁶x, or more times fewer than the total number of base calls in the corresponding flow cell images.
  • the method of training using patches of flow cell images does not require training of a large number of polonies (e.g., 1000 polonies) within a patch, thus may advantageously reduce computational complexity and increase training efficiency and accuracy.
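A patch as described above can be assembled by cutting the same pixel window out of every flow cell image in a run. This minimal sketch assumes images are stored in a dictionary keyed by a hypothetical (color channel, cycle, z-level) tuple; the helper names and the 4 x 2 x 3 run are illustrative.

```python
from typing import Dict, List, Tuple

Image = List[List[float]]
Key = Tuple[int, int, int]  # (color channel, cycle, z-level)


def crop(image: Image, top: int, left: int, size: int) -> Image:
    """Return the size x size window whose top-left pixel is (top, left)."""
    if top < 0 or left < 0 or top + size > len(image) or left + size > len(image[0]):
        raise ValueError("patch window falls outside the image")
    return [row[left:left + size] for row in image[top:top + size]]


def build_patch(images: Dict[Key, Image], top: int, left: int, size: int) -> Dict[Key, Image]:
    """One patch = one patch image per (channel, cycle, z-level), all cut from the same pixels."""
    return {key: crop(img, top, left, size) for key, img in images.items()}


# Hypothetical run: 4 color channels x 2 cycles x 3 z-levels of 64 x 64 images.
images = {
    (ch, cy, z): [[float(100 * ch + 10 * cy + z)] * 64 for _ in range(64)]
    for ch in range(4) for cy in range(2) for z in range(3)
}
patch = build_patch(images, top=10, left=20, size=16)
print(len(patch), len(patch[(0, 0, 0)]), len(patch[(0, 0, 0)][0]))
```

The patch holds 24 patch images of 16 x 16 pixels, giving the network spectral (channel), temporal (cycle), and axial (z-level) context for the same small region.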
  • the sequencing method 700 herein includes an operation 740 of repeatedly performing, until the output error satisfies a stopping criterion, one or more training operations comprising: an operation 755 of determining an output error by comparing the training output to the reference intensities; and an operation 760 of adjusting current values of parameters of the neural network, e.g., CNN, based on the output error.
  • the operation 740 repeats itself using its output (e.g., adjusted parameters of the neural network) from the previous iteration as input to the current iteration.
  • the output error may be based on a comparison between the reference intensities and the predicted intensities during an iteration of training.
  • the comparison may be limited to those intensities and locations included in the location list. In some embodiments, the comparison may be limited to only a subset of intensities and corresponding locations in the location list.
  • operation 740 may stop when a stopping criterion is met.
  • the stopping criterion can be customized.
  • the stopping criterion can be customized based on training time, computational complexity, convergence rate, and/or various other metrics.
  • Exemplary stopping criteria may include a fixed number of iterations, a fixed duration of training time, or a loss function below a threshold.
  • the stopping criterion can be (1) stop after 10 epochs to reduce training time.
  • the stopping criterion can be (2) stop when the value of the loss function (or the output error) is less than a predetermined value close to 0. Determining the output error can be based on various metrics, e.g., a loss function.
  • Nonlimiting examples of the loss function can include: the sum of root mean square of the difference between the predicted intensities and the corresponding reference intensities based on the location list, or the sum of mean square errors.
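The repeat-until-stop structure of operation 740 can be sketched with a toy model. Here a two-parameter linear model stands in for the CNN, the loss is the mean squared error over polonies in the location list, and the learning rate, epoch limit, and loss threshold are hypothetical values; this illustrates the loop and its two stopping criteria, not the disclosed network.

```python
from typing import List, Tuple


def train(inputs: List[float], references: List[float], lr: float = 0.1,
          max_epochs: int = 10, tol: float = 1e-6) -> Tuple[float, float, float, int]:
    """Toy stand-in for operation 740: fit prediction = w * x + c to reference intensities.

    inputs/references hold one value per polony in the location list, so the
    comparison is restricted to listed locations. Returns (w, c, last_loss, epochs_run).
    """
    w, c = 0.0, 0.0                      # current parameter values
    n = len(inputs)
    loss = float("inf")
    epochs = 0
    while epochs < max_epochs:           # criterion (1): fixed number of epochs
        errors = [w * x + c - r for x, r in zip(inputs, references)]
        loss = sum(e * e for e in errors) / n  # operation 755: output error (MSE)
        if loss < tol:                   # criterion (2): loss close to 0
            break
        # operation 760: adjust parameters along the negative MSE gradient
        w -= lr * 2.0 * sum(e * x for e, x in zip(errors, inputs)) / n
        c -= lr * 2.0 * sum(errors) / n
        epochs += 1
    return w, c, loss, epochs


print(train([0.0, 1.0, 2.0], [0.0, 2.0, 4.0], max_epochs=1000))
```

With the default `max_epochs=10` the loop stops on the epoch limit; with a larger limit it stops once the loss falls below `tol`. Taking `loss ** 0.5` would give an RMSE-style error instead, with the threshold adjusted accordingly.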
  • the method 700 may further comprise an operation 770 of generating the trained neural network with the adjusted parameters obtained in operation 760, e.g., in the last iteration or in any other iteration during the repetition of operation 740.
  • the trained neural network may then be used to predict high resolution intensities that can be used to determine high resolution base calls of flow cell images, e.g., using methods 500.
  • FIG. 29 shows an exemplary method 2900 for training the neural network, e.g., CNN, which can be used to predict polony locations (e.g., in operations 2804-2806), intensities of polonies, base calls, and/or classifications of one or more pixels.
  • the prediction using the neural network trained by method 2900 may advantageously allow improved detectable polony density in the sample(s).
  • predicting base calls using method 2800 (with operation 2804’, without predicting the polony map using the neural network in operation 2804, and with predicting the base calls using the neural network in operation 2812) at a polony density of 300,000/mm² or greater, e.g., 750,000/mm², produces an error rate in base calling that is lower than the error rate of base calling using the classic non-neural network based algorithm.
  • predicting base calls using method 2800 (with operation 2804’, without predicting the polony map using the neural network in operation 2804, and with predicting the base calls using the neural network in operation 2812) at a polony density of 750,000/mm² produces an error rate in base calling that is 40%, 50%, 60%, 70% or less of the error rate of base calling using the classic non-neural network based algorithm.
  • predicting base calls using method 2800 (with predicting the polony map using the neural network in operation 2804 and predicting the base calls using the neural network in operation 2812) at a polony density of 300,000/mm² or greater, e.g., 750,000/mm², produces an error rate in base calling that is lower than the error rate of base calling using the classic non-neural network based algorithm.
  • predicting base calls using method 2800 (with predicting the polony map using the neural network in operation 2804 and predicting the base calls using the neural network in operation 2812) at a polony density of 750,000/mm² produces an error rate in base calling that is 50%, 40%, 30%, 20%, 10%, 5% or less of the error rate of base calling using the classic non-neural network based algorithm.
  • training of the neural networks using the methods 600, 700, or 2900 may use training images that are real flow cell images of samples, simulated flow cell images with distribution of signal spots and noise level that is similar to real flow cell images, or a combination thereof.
  • Training with real flow cell images may advantageously eliminate the need for generating simulated images that mimic the characteristics of polonies of different samples, which simplifies the training process, especially when the samples include heterogeneous intensities and polony densities across the flow cell image(s) and may include various types of cells or tissue.
  • Training with real flow cell images may advantageously improve training results (e.g., the trained neural network can make improved predictions) compared with training using only simulated images, at similar computational cost and neural network complexity.
  • the prediction quality can be measured based on various metrics including but not limited to error rate in base calls, error rate in intensity values, density of base calls, density of polonies, etc.
  • the values of metrics can be determined relative to results produced using existing primary analysis methods without using neural network(s). For example, the error rate of base calling using a first neural network trained on simulated flow cell images can be determined in comparison with base calling using an existing primary analysis method without using any neural networks.
  • the error rate in base calling using a second neural network trained using real flow cell images of in situ sample can also be obtained in comparison with base calling using the same existing primary analysis method without using any neural networks.
  • the error rate in base calling using the first neural network can be higher than the error rate in base calling using the second neural network.
  • the error rate in base calling using the first neural network can be 2x, 3x, 4x, 5x, 6x, 10x, or more times higher than the error rate in base calling using the second neural network.
  • training of the neural networks herein can be done using only the sequencing system, e.g., the FPGA or AI chips onboard the sequencing system 110. In such cases, training may be done using hardware elements within the physical housing of the sequencing system 110 shown in FIG. 1. In some embodiments, training can be performed at least partly external to the sequencing system. For example, at least part of the training may be performed using hardware over the cloud 130.
  • the sequencing system 110 comprises: a first reconfigurable logic device, e.g., an FPGA unit, comprising a first plurality of data processing engines arranged in a first pipeline and configured to perform data processing in parallel with each other; an integrated circuit, e.g., an NPU chip or AI chip, comprising a second plurality of data processing engines configured to perform data processing in parallel, wherein the first reconfigurable logic device is configured to communicate data with the integrated circuit; a neural network (e.g., a trained neural network) deployed at least partly on the second reconfigurable logic device; a first processor to selectively activate or deactivate different combinations of the first plurality of data processing engines to perform one or more operations of the sequencing method 600, 700, or 2900; and a second processor or the first processor to control the integrated circuit to perform one or more operations of the sequencing methods 600, 700, or 2900 to facilitate generating the sequencing analysis result(s).
  • the sequencing system herein may comprise one or more hardware processors; one or more data storage devices storing instructions executable by the one or more hardware processors to cause the one or more hardware processors to perform one or more operations of the sequencing method 600, 700, or 2900.
  • training the neural network using the methods 600, 700, or 2900 herein with the reconfigurable logic device, e.g., the FPGA, and/or other integrated circuit, e.g., AI chips, can be at least 2x, 8x, 10x, 20x, 40x, 50x, 100x or more times faster than training the same neural network(s) with similar training images using CPUs or GPUs.
  • training the neural network using the methods herein with the reconfigurable logic device, e.g., the FPGA, and/or other integrated circuit, e.g., AI chips, can be at least 20x, 40x, 60x, 80x, 100x, 200x, 400x, 500x, 800x, 1000x or more times faster than training the same neural network(s) with similar training images using CPUs or GPUs.
  • the neural network is trained with the same type of flow cell images as those on which the neural network may make predictions after being trained.
  • the neural network is trained with 2D flow cell images at multiple z levels and then may be used to predict base calls for 2D flow cell images at multiple z levels to cover a 3D in situ sample.
  • the neural network is trained with 2D flow cell images from a single organ origin and then may be used to predict base calls for 2D flow cell images of samples extracted from the same organ, e.g., liver.
  • the neural network is trained with traditional 2D flow cell images at a single z-level.
  • each neural network is trained with 2D flow cell images at a single z-level, and multiple neural networks may be trained to cover a 3D volumetric sample, e.g., in situ sample.
  • the sequencing method 2900 comprises an operation 705 of acquiring, by the imager 116 of the sequencing system 110, a training set comprising a plurality of training flow cell images with a first resolution.
  • the plurality of training flow cell images may be real images that are acquired using a sequencing system disclosed herein.
  • the plurality of training flow cell images may be real images of one or more samples immobilized on a support, e.g., a flow cell device.
  • the training flow cell images may be of 2D or 3D samples as disclosed herein in operation 705 relative to methods 700.
  • the plurality of training flow cell images may include simulated flow cell images disclosed herein.
  • the training flow cell images may be generated based on the characteristics of the sample(s) on which predictions are to be made. For example, for predicting polony locations in cellular samples, the training flow cell images may only include images (simulated or real images) of 3D samples with similar characteristics, e.g., liver samples, kidney samples, etc. As another example, for predicting polony locations in traditional 2D samples, the training flow cell images may only include 2D flow cell images with similar plexity and/or sample density. In some embodiments, the training flow cell images may include a combination of flow cell images, of either 2D or 3D samples, with or without similar characteristics.
  • the training flow cell images may be generated at multiple z-locations in order to cover characteristics of the sample at different z levels.
  • the corresponding plurality of training flow cell images (simulated or real images) may include flow cell images at multiple z-levels.
  • the corresponding plurality of training flow cell images may include z-stacks of flow cell images, each z-stack may include a 3D volume made up from multiple z-levels of flow cell images comprised in the z-stack.
  • the corresponding plurality of training flow cell images may include flow cell images at multiple z-levels (2D images) but not a z-stack of flow cell images (e.g., a 3D volume).
  • the systems and methods can be used to train neural networks to predict base calls for flow cell images acquired from one or more color channels, one or more cycles, and/or one or more z-levels in a sequence run.
  • the training data used to train the neural networks herein may be generated using real flow cell images, and the reference intensities of the training data are advantageously determined after removing errors therein that may be caused by various sources, including but not limited to color cross-talk, spatial misalignment of polonies, phasing and dephasing, and/or blurriness of out-of-focus polonies, thereby allowing more reliable training.
  • the training data used to train the neural network herein does not include full flow cell images. Instead, the training data include patches (e.g., 16 pixels by 16 pixels patches) of the flow cell images from one or more color channels, one or more cycles, and/or one or more z-levels to provide spatial and temporal context for training.
  • the training using method 700 or 2900 may be training per polony, as each patch only contains a very limited number of polonies, e.g., a single polony. The very limited number of polonies can be in a range from 1 to 4, 1 to 8, 1 to 20, 1 to 50, or 1 to 100.
  • the very limited number of polonies can be 100x, 1000x, 10⁴x, 10⁵x, 10⁶x, 10⁷x, or 10⁸x fewer than the total number of polonies in a corresponding flow cell image.
  • Each patch may include a patch image per color channel, per cycle, and per z-level.
  • Each patch image may share the same pixels of the corresponding portion of the flow cell images.
  • Each patch image may include a single polony at or near the center of the patch image or a very limited number of polonies.
  • Such training data may advantageously allow less complicated and more reliable training than training using flow cell images of one or more subtiles (e.g., 6000 pixels by 8000 pixels).
  • training with real flow cell images may advantageously allow a neural network of reduced complexity to achieve the predetermined prediction quality, compared with a neural network trained using simulated data.
  • the methods 600, 700, 2900 may be used to train a neural network or any other artificial intelligence-based models using various references or ground truth that are not limited to reference base calls or reference intensities, e.g., in the second resolution.
  • the methods 2900 may include an operation 2925’ of generating, by the sequencing system, references corresponding to the intensities in the high resolution training flow cell images.
  • the references have the same spatial resolution as the high resolution training flow cell images.
  • the plurality of training flow cell images are acquired from one or more color channels, and the references comprise reference base calls. Each reference base call may correspond to a polony in the plurality of high resolution training flow cell images.
  • the references may be generated using various algorithms. The references may be based on existing datasets that are publicly available.
  • the plurality of training flow cell images are acquired from one or more color channels, and the references comprise reference classifications.
  • Each reference classification may correspond to a pixel in the plurality of high resolution training flow cell images from the one or more color channels.
  • Exemplary classifications may include nucleotides A, T, C, G, U, and background.
  • the classification of background can be for pixels that are not classified as any type of nucleotides, e.g., not classified as A, T, C, G, or U.
  • the plurality of training flow cell images are acquired from one or more color channels in one or more cycles at one or more z-levels, and the references comprise reference classifications.
  • a first reference classification may correspond to a pixel of a polony and may have a classification that is the base call of that polony.
  • a second reference classification may correspond to a pixel outside any polony in the plurality of high resolution training flow cell images from multiple color channels, e.g., a background classification.
  • the background classification may or may not be within a cell boundary of in situ cellular sample(s).
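A per-pixel reference classification map of the kind described above can be sketched as follows, assuming a polony map already assigns each polony pixel to a polony identifier and each polony has a reference base call (e.g., from predetermined barcode bases). The `polony_pixels` and `base_calls` mappings and the function name are hypothetical, and whether a background pixel lies inside a cell boundary is not modelled.

```python
from typing import Dict, List, Tuple

Pixel = Tuple[int, int]  # (row, col)


def reference_classifications(height: int, width: int,
                              polony_pixels: Dict[Pixel, int],
                              base_calls: Dict[int, str]) -> List[List[str]]:
    """Label polony pixels with that polony's base call and all other pixels 'background'."""
    labels = [["background"] * width for _ in range(height)]
    for (r, c), polony_id in polony_pixels.items():
        labels[r][c] = base_calls[polony_id]
    return labels


# Hypothetical 4 x 4 patch: polony 0 covers two pixels, polony 1 covers one.
pixels = {(1, 1): 0, (1, 2): 0, (3, 0): 1}
calls = {0: "A", 1: "G"}
labels = reference_classifications(4, 4, pixels, calls)
for row in labels:
    print(row)
```

Pixels of the same polony share its base call as their label, and every pixel not covered by a polony receives the background label.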
  • the plurality of training flow cell images are acquired from a single color channel from one or more sequencing cycles at one or more different z-levels.
  • the references comprise reference polony maps.
  • Each reference polony map may correspond to at least a portion of an image of the plurality of high resolution training flow cell images in a sequencing cycle.
  • each reference polony map may correspond to a patch extracted from the high resolution training flow cell images so that each pixel in the polony map corresponds to a corresponding pixel of the patch, and the reference polony map indicates which pixel(s) are within a polony, and which pixel(s) are not.
  • the reference polony maps are generated using various algorithms for polony map generation.
  • Exemplary polony map generation algorithms for generating 2D or 3D polony maps have been disclosed in U.S. Application Nos. 18/078,797 and 18/078,820, and U.S. Patent No. 10,266,888, which are incorporated herein by reference in their entireties.
  • the first resolution can be a standard resolution that can be achieved using the imager disclosed herein.
  • the first resolution can be within the range from 0.01 um to 15 um.
  • the first resolution can be within the range from 0.01 um to 5 um.
  • the plurality of training flow cell images in the training set can be from one or more color channels.
  • the plurality of training flow cell images in the training set can be from 4 color channels.
  • the plurality of training flow cell images in the training set can be from one or more cycles.
  • the one or more cycles can be any number ranging from 1 to 10, 1 to 20, 1 to 30, 1 to 50, 1 to 100, 1 to 200, or 1 to 500.
  • the plurality of flow cell images can be at a single z level or multiple z levels.
  • the sequencing method 2900 comprises an operation 715 of up-sampling, by the sequencing system, the plurality of training flow cell images to generate high-resolution training flow cell images having a second resolution.
  • the second resolution can be 2x, 4x, 8x, 16x, or higher than the first resolution.
  • when the first resolution is in the range from 0.01 um to 5 um, the corresponding second resolution that is 4x higher than the first resolution can be in the range from 0.0025 um to 1.25 um.
  • Various up-sampling methods can be used for generating the high-resolution training flow cell images. Each high-resolution training flow cell image corresponds to a training flow cell image at the first resolution.
  • the sequencing method 2900 comprises an operation of determining, by the sequencing system, locations of polonies in the plurality of flow cell images (e.g., a polony map containing locations of polonies or a polony map containing a location list of the polonies); and optionally extracting, by the sequencing system, intensities in the plurality of flow cell images based on the location list.
  • the sequencing method 2900 comprises an operation of determining, by the sequencing system, locations of polonies (e.g., a polony map containing locations of polonies) in the high resolution training flow cell images; and optionally extracting, by the sequencing system, intensities in the high resolution training flow cell images based on the location list.
  • the sequencing method 2900 comprises an operation of processing the high resolution training flow cell images to determine location of the polonies (e.g., bright spots in the image) and their processed intensities.
  • The intensities may be processed using image processing steps including but not limited to background removal, noise reduction, filtering, intensity normalization, intensity offset adjustment, phasing and dephasing, image registration, color correction, and deconvolution.
  • the operation of processing the training flow cell images or the high resolution training flow cell images can include polony map generation.
  • Exemplary polony map generation embodiments are disclosed in detail in U.S. Patent No. 11,200,446 and U.S. patent application Nos. 18/078,820 and 18/078,797, which are incorporated herein by reference in their entireties.
  • the method 2900 comprises an operation 2925 of generating, by the sequencing system, reference base calls of the high resolution training flow cell images.
  • the operation 2925 may be based on locations of polonies (e.g., the polony map) so that only signals from polonies identified in the polony map are used for generating the reference base calls, while other signals, including background noise and possible artifacts from cellular structures in the images, can be excluded.
  • the reference base calls of the high resolution training flow cell images may be generated based on multiple patches, and each patch comprises one or more patch images from one or more color channels, and wherein each patch comprises at least a portion of the second plurality of flow cell images.
  • the patches may be generated based on the location list or the polony map so that each patch image has a single polony at or near its center pixel(s).
  • Each patch image corresponds to a reference base call of the single polony at or near its center pixels.
  • each patch image corresponds to a very limited number of reference base calls in the patch image, e.g., in a range from 1 to 10 or from 1 to 100.
  • the method 2900 comprises an operation 2925’ (which replaces operation 2925) of generating, by the sequencing system, references, instead of reference base calls, of the high resolution training flow cell images.
  • the operation 2925’ may be based on locations of polonies (e.g., the polony map) so that only signals from polonies identified in the polony map are used for generating the references, while other signals, including background noise and possible artifacts from cellular structures in the images, can be excluded.
  • the operation 2925’ may generate the references for some or all of the pixels of the flow cell images without requiring locations of the polonies or the polony map.
  • the operation 2925’ may generate the references as reference classifications of A, T, C, G, U, or background for each pixel of the flow cell images after aligning flow cell images from different color channels.
  • the patches extracted from the training flow cell images have properties similar to those of the patches for which predictions are going to be generated, e.g., using method 2800.
  • such properties can include patch size, location of the single polony within the patch, and range of intensities for pixels within the patches.
  • each patch comprises a single polony located at or in close vicinity to a center of the corresponding patch.
  • the polony may be no more than 1 to 10 pixels away from the center of the corresponding patch.
  • each patch comprises 3 to 128 pixels along a spatial dimension, e.g., along x or y direction.
  • the size of the patches is kept relatively small compared to the size of the flow cell images, e.g., 10x, 20x, 50x, 100x, 500x, 1000x or more smaller than the flow cell image.
  • the plurality of patches comprises 100 to 10^8 patches.
  • each patch may or may not contain more than one, two, three, five, or ten polonies therewithin, but only the pixel(s) of the single polony at its center is used for generating base call(s) corresponding to the patch.
  • a first patch may include pixels 1-32 in both x and y directions to cover a polony centered at pixels (16, 16) of the flow cell images
  • a second patch may include pixels 2-33 in both x and y directions to cover a second polony centered at pixels (17, 17.5)
  • a third patch may include pixels 5-36 in both x and y directions to cover a third polony centered at pixels (19, 19) of the flow cell images.
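The patch placement in the examples above can be sketched as follows. This is a minimal illustration, assuming a single 2D intensity image stored as a nested list indexed `[y][x]`, a patch size of 32, sub-pixel polony locations floored to a whole pixel, and zero padding at the image borders; the function name `extract_patch` is hypothetical and not part of the disclosed method.

```python
import math


def extract_patch(image, center, size=32):
    """Crop a size x size patch around a polony location.

    image  -- 2D list of intensities indexed as image[y][x]
    center -- (x, y) polony location; sub-pixel values are floored
    Returns (patch, (x_first, x_last), (y_first, y_last)); ranges are inclusive.
    Pixels that fall outside the image are zero-padded.
    """
    cx, cy = math.floor(center[0]), math.floor(center[1])
    half = size // 2
    # For an even size the polony pixel sits just before the geometric middle,
    # e.g. a center of 16 gives the inclusive range 1..32 for size 32.
    x0, y0 = cx - half + 1, cy - half + 1
    height, width = len(image), len(image[0])
    patch = [
        [image[y][x] if 0 <= y < height and 0 <= x < width else 0
         for x in range(x0, x0 + size)]
        for y in range(y0, y0 + size)
    ]
    return patch, (x0, x0 + size - 1), (y0, y0 + size - 1)
```

With this convention, a polony at (16, 16) yields pixels 1-32 and a polony at (17, 17.5) yields pixels 2-33, matching the first two examples; the third example places its polony a pixel or two away from the geometric center, which is consistent with the polony being at or near the center rather than with one fixed offset.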
  • the number of pixels within each patch can be optimized to balance the computational complexity and spatial context information to be included for training the neural network(s).
  • the number of pixels within each patch can be at least partly based on polony density of the sample being imaged.
  • the number of patch images within each patch can be optimized to balance the computational complexity and the spatial context information within each patch for accurate and reliable prediction using the neural network.
  • each patch may comprise multiple patch images corresponding to different color channels.
  • each patch may comprise a patch image covering the same pixels within the x-y plane in each of three different color channels. The same pixels may be pixels determined after registration to correct for the spatial offset across different color channels.
  • each patch may comprise multiple patch images corresponding to different cycles, e.g., continuous cycles n-1, n, n+1, within a sequencing run.
  • each patch may comprise 3 patch images per cycle, one from each of three different color channels, in each of 4 adjacent cycles, so that each patch may comprise 12 patch images in total.
  • each patch may further include 5 different z levels, making the total number of patch images 60 (3 channels × 4 cycles × 5 z levels).
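The 3 × 4 × 5 arrangement described above can be illustrated by assembling the per-image crops of one patch into a single ordered stack. The sketch assumes the crops are already registered and stored in a dictionary keyed by hypothetical `(channel, cycle, z)` tuples; the cycle-then-z-then-channel ordering is an arbitrary choice for illustration, not an ordering specified by the method.

```python
from itertools import product


def stack_patch_images(patch_images, channels, cycles, z_levels):
    """Order the per-(channel, cycle, z) images of one patch into a flat list.

    patch_images -- dict mapping (channel, cycle, z) to a 2D patch image
    Returns a list ordered by cycle, then z level, then channel.
    Raises KeyError if any (channel, cycle, z) combination is missing.
    """
    return [patch_images[(ch, cy, z)]
            for cy, z, ch in product(cycles, z_levels, channels)]
```

For three channels, four adjacent cycles (e.g., n-1 through n+2), and five z levels, the stack holds 60 patch images, which could serve as 60 input planes of the neural network.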
  • At least two patches of the plurality of patches comprise at least partially overlapping patch images that share some identical pixels.
  • each patch of the plurality of patches comprises pixels that at least partially overlap with another patch of the plurality of patches.
  • the training flow cell images are acquired only from a single color channel in one or more sequencing cycles and/or one or more z-levels, so that training flow cell images acquired from different color channels may be used to train different neural networks for predicting high resolution intensities, base calls, classifications, etc., as disclosed herein, for a single color channel.
  • the training flow cell images are acquired only from a single z level from one or more color channels in one or more sequencing cycles, so that training flow cell images acquired at different z levels of 3D sample(s), e.g., in situ cells, may be used to train different neural networks for predicting high resolution intensities, base calls, classifications, etc., as disclosed herein.
  • the training flow cell images are acquired from the one or more cycles from one or more color channels and at one or more z-levels.
  • the one or more cycles comprises a plurality of consecutive cycles in a sequencing run.
  • the operation 2925 or 2925’ of generating the reference base calls or references of the high resolution training flow cell images is performed for each patch of the plurality of patches.
  • reference intensities of the high resolution training flow cell images may be determined using an operation similar to operation 725 disclosed herein.
  • the operation of generating the reference intensities in the high resolution training flow cell images based on the base calls of the high resolution training flow cell images comprises: performing color correction on each extracted intensity in the high resolution training flow cell image thereby generating the corresponding reference intensity.
  • the intensities may undergo color correction, phasing/dephasing, normalization, and/or other corrections to reach the reference intensities.
  • In FIG. 5F, the intensities of the high resolution training flow cell images from two different channels are plotted. Each polony is plotted as a dot with its corresponding intensity in channels 1, 2, 3, and 4.
  • the polonies within region 790 would have a base call of A; thus, the corresponding reference intensity of each polony having a base call of A can be obtained by projecting its dot onto the fitted line in region 790, e.g., by a shortest-distance projection.
  • the vertical-axis coordinate of the projected point on the line may be the reference intensity, and its horizontal-axis coordinate would be close to zero. It is understood that the reference intensity determination is not limited to the shortest-distance projection in FIG. 5F.
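The shortest-distance projection used to derive reference intensities in FIG. 5F can be sketched as follows. How the line in region 790 is fitted is not specified here, so the sketch assumes a total-least-squares fit (the principal axis of the dots in one base-call cluster); the function names are hypothetical, and each dot is treated as a pair (horizontal-axis intensity, vertical-axis intensity).

```python
import math


def fit_line(points):
    """Total-least-squares line fit to 2D points.

    Returns (centroid, unit_direction); the direction is the principal axis
    of the points' 2x2 scatter matrix.
    """
    n = len(points)
    cx = sum(x for x, _ in points) / n
    cy = sum(y for _, y in points) / n
    sxx = sum((x - cx) ** 2 for x, _ in points)
    syy = sum((y - cy) ** 2 for _, y in points)
    sxy = sum((x - cx) * (y - cy) for x, y in points)
    theta = 0.5 * math.atan2(2 * sxy, sxx - syy)  # angle of the major axis
    return (cx, cy), (math.cos(theta), math.sin(theta))


def project_onto_line(point, centroid, direction):
    """Orthogonal (shortest-distance) projection of a point onto the line."""
    t = ((point[0] - centroid[0]) * direction[0]
         + (point[1] - centroid[1]) * direction[1])
    return (centroid[0] + t * direction[0], centroid[1] + t * direction[1])


def reference_intensities(cluster):
    """Project every dot of one cluster onto its fitted line and keep the
    vertical-axis coordinate of each projection as the reference intensity."""
    centroid, direction = fit_line(cluster)
    return [project_onto_line(p, centroid, direction)[1] for p in cluster]
```

For a cluster lying close to the vertical axis, the projected points keep horizontal coordinates near zero and their vertical coordinates serve as the reference intensities. A later iteration could swap in a different fit or projection and keep it only if the total distance to the line decreases.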
  • the algorithm for determining the reference intensity may be iterative such that the reference intensities obtained in earlier iteration(s) can be improved based on customized quality criteria in later iterations.
  • the number of iterations can range from 1 to 10, from 1 to 100, or more.
  • later iterations can use a different projection method that generates a smaller total distance to the fitted line as shown in FIG. 5F than the projection method that was used in earlier iteration(s).
  • the plurality of patches can be extracted from the high resolution training flow cell images after reference intensities are generated.
  • Each patch may include a patch image corresponding to a different color channel, and reference base calls may be determined based on the reference intensities from all color channels.
  • reference classifications of the patch may be determined similarly, except that patches satisfying certain customized conditions are classified as background rather than as any type of nucleotide. For example, for a patch with 4 different patch images each corresponding to a color channel, if the reference intensities from all 4 channels are very similar to each other and all below a predetermined signal level, the patch can be given a background classification.
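The background rule in the example above can be written as a small decision function. The thresholds `signal_level` and `max_spread` are hypothetical parameters standing in for the "predetermined signal level" and "very similar" conditions, and the one-to-one mapping of four channels to A, C, G, and T (calling the brightest channel) is an assumption made only to complete the example.

```python
def classify_patch(ref_intensities, signal_level, max_spread,
                   bases=("A", "C", "G", "T")):
    """Assign a reference classification to one patch.

    ref_intensities -- one reference intensity per color channel, in the same
                       order as `bases` (assumed one channel per base)
    signal_level    -- hypothetical threshold below which a channel is dim
    max_spread      -- hypothetical tolerance for channels being "very similar"
    """
    if len(ref_intensities) != len(bases):
        raise ValueError("need exactly one reference intensity per channel")
    brightest = max(ref_intensities)
    # Background: all channels close to each other and all below the signal level.
    if brightest - min(ref_intensities) <= max_spread and brightest < signal_level:
        return "background"
    # Otherwise call the base whose channel is brightest.
    return bases[list(ref_intensities).index(brightest)]
```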
  • each patch may only include a single patch image from a single color channel.
  • the operation 2925 or 2925’ may comprise generating a first single reference intensity for a first channel of the multiple color channels for the corresponding patch in a single sequencing cycle.
  • each patch may include multiple patch images from the same single color channel but from different sequencing cycles.
  • the operation 2925 or 2925’ may comprise generating reference intensities for a first channel of the multiple color channels for the corresponding patch in one or more sequencing cycles.
  • the operation 2930 or 2930’ may include providing the reference base calls or references so that they are available for comparison to training output(s) of the neural network, depending on how the user wants to train the neural network, which may include: a single intensity of a single color channel at one sequencing cycle for each patch (for training a different neural network for each color channel), multiple intensities of a single color channel at multiple sequencing cycles (for training a different neural network for each color channel), or multiple intensities of multiple different color channels at one or more sequencing cycles (for training a single neural network for different color channels).
  • patches of reference base calls or references may also be separated based on z-levels of a 3D sample in order to train different neural networks at different z levels.
  • a single neural network may be trained using patches from different z levels.
  • the method 2900 may include an operation 2930’ of providing, by the processor, the references for comparison to training output(s) of the neural network.
  • the high resolution training flow cell images are also provided in operation 2930 or 2930’ for comparison to training output(s) of the neural network.
  • the method 2900 may include an operation 2955’ of determining an output error by comparing the training output and the references, instead of operation 2955.
  • At least part of the one or more samples comprises predetermined nucleotide bases in the one or more cycles.
  • the base calls for at least some of the polonies in the flow cell images in cycle(s) are predetermined.
  • the base calls can be predetermined by sequencing known barcode sequences in the one or more cycles.
  • the operation of generating the reference base calls in the high resolution training flow cell images comprises: performing color correction on each extracted intensity in the high resolution training flow cell image thereby generating the corresponding reference intensity used for generating reference base calls.
  • the algorithm for determining the reference base calls is based on determination of the reference intensities as disclosed herein, e.g., in method 700.
  • the sequencing method 2900 may include an operation 2930 of providing the reference base calls so that they can be compared to the training output(s) of the neural network for training.
  • the operation 2930 is similar to operation 730 in method 700.
  • the reference base calls may be provided as flow cell image(s) or, alternatively, as patches, where each patch may comprise one or more patch images and each patch image has a polony at or near its center.
  • the reference base calls may be provided as a list of base calls corresponding to their locations in the flow cell images.
  • the operation 2930 comprises an operation of providing the reference base calls in a plurality of patches for comparison during training to the training output(s) of the neural network, wherein each patch comprises one or more patch images from multiple color channels.
  • the patches of the flow cell images may be used for training per polony.
  • Each patch may comprise one or more patch images cropped from the flow cell images (e.g., the second plurality of flow cell images).
  • the training method 2900 may be configured to train the neural network, e.g., a CNN, to predict a single base call at or close to the center of the patch.
  • the training method 2900 using patches of flow cell images does not require training on a large number of polonies within a patch and thus may advantageously reduce computational complexity and increase training efficiency and accuracy.
  • the input to the neural network may also include the location list, e.g., a polony map.
  • the sequencing method 2900 herein includes an operation 740 of repeatedly performing, until the output error satisfies a stopping criterion, one or more training operations comprising: an operation 2955 of determining an output error by comparing the training output and the reference base calls; and an operation 760 of adjusting current values of parameters of the convolutional neural network based on the output error.
  • the operation 740 repeats itself using its output (e.g., adjusted parameters of the neural network) from the previous iteration as input to the current iteration.
  • the output error may be based on a comparison between the reference base calls and the predicted base calls during an iteration of training.
  • the comparison may only include locations in the location list, e.g., the polony map.
  • the comparison may include a subset of locations in the location list.
  • operation 740 may stop when a stopping criterion is met.
  • the stopping criterion can be customized.
  • the stopping criterion can be customized based on training time, computational complexity, and convergence rate.
  • Exemplary stopping criteria include a fixed number of iterations, a fixed duration of training time, or a minimized loss function.
  • the stopping criterion can be (1) stopping after 10 epochs to reduce training time.
  • the stopping criterion can be (2) stopping when the value of the loss function (or the output error) is less than a predetermined value close to 0. Determining the output error can be based on various metrics, e.g., a loss function.
  • Nonlimiting examples of the loss function can include: the sum of the root mean square of the differences between the predicted intensities and the corresponding reference base calls based on the location list, or the sum of mean square errors.
  • the method 2900 may further comprise an operation 770 of generating the trained neural network with the adjusted parameters obtained in operation 760.
  • Values of the parameters of the neural network can be adjusted based on the output error or one or more previous output errors.
  • z-stacks of training flow cell images from a same channel can be acquired, e.g., in operation 705, to train the neural network, e.g., CNN, for that particular channel.
  • a certain percentage, e.g., 80%, of the training set may be used for training, and the rest of the training set, e.g., 20%, may be used for validation.
  • Batch size can be one
  • Epochs can be about 10, 12, 15, 20, or more.
  • various optimizers can be used.
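The training loop of operations 740, 2955, 760, and 770, together with the 80%/20% split, a batch size of one, and the two stopping criteria, can be sketched as below. To keep the sketch self-contained, a one-weight linear model trained by plain stochastic gradient descent stands in for the neural network; the function name `train`, the learning rate, and the tolerance are hypothetical choices, not values from the method.

```python
import random


def train(samples, lr=0.05, max_epochs=10, tol=1e-4, train_frac=0.8, seed=0):
    """Toy stand-in for the training loop (operations 740, 2955, 760, 770).

    samples -- list of (input, reference) pairs. A linear model y = w*x + b
    replaces the neural network purely to illustrate the control flow.
    Returns ((w, b), epochs_run, validation_mse).
    """
    data = list(samples)
    random.Random(seed).shuffle(data)
    split = int(train_frac * len(data))          # e.g. 80% training / 20% validation
    train_set, val_set = data[:split], data[split:]
    if not train_set or not val_set:
        raise ValueError("need non-empty training and validation sets")
    w, b = 0.0, 0.0                               # current parameter values
    val_mse, epochs_run = float("inf"), 0
    for _ in range(max_epochs):                   # criterion (1): fixed number of epochs
        epochs_run += 1
        for x, ref in train_set:                  # batch size of one
            err = (w * x + b) - ref               # output error for this sample
            w -= lr * 2 * err * x                 # adjust parameters along the
            b -= lr * 2 * err                     # gradient of the squared error
        val_mse = sum(((w * x + b) - ref) ** 2 for x, ref in val_set) / len(val_set)
        if val_mse < tol:                         # criterion (2): loss close to 0
            break
    return (w, b), epochs_run, val_mse
```

In practice the model, loss, and optimizer would be a CNN or U-Net with a deep learning framework's optimizer; only the control flow (output error, parameter update, stopping check) carries over from this sketch.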
  • the neural network comprises one or more convolutional neural networks. In some embodiments, the neural network comprises one or more U-Net units.
  • comparing the training output to the reference set comprises: calculating mean square error in the predicted intensities generated by the neural network being trained and the corresponding reference intensities based on the location list.
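The mean square error restricted to the location list can be sketched as follows, assuming predicted and reference intensities are stored as nested lists indexed `[y][x]` and the location list holds integer `(x, y)` pixel coordinates; the function name `masked_mse` is hypothetical.

```python
def masked_mse(predicted, reference, location_list):
    """Mean square error evaluated only at listed polony locations.

    predicted, reference -- 2D lists of intensities indexed as [y][x]
    location_list        -- iterable of (x, y) integer pixel locations,
                            e.g. taken from a polony map
    """
    locations = list(location_list)
    if not locations:
        raise ValueError("location list is empty")
    total = sum((predicted[y][x] - reference[y][x]) ** 2 for x, y in locations)
    return total / len(locations)
```

Pixels that are not in the location list, such as background or artifacts from cellular structures, do not contribute to the error.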
  • the sequencing system is configured to acquire one or more cell images that may include images of the cell and/or tissue with various types of staining, e.g., fluorescent staining, configured to show morphological information of the sample.
  • the one or more cell images can comprise staining of cellular structures that help locate polonies or clusters relative to the stained structures.
  • staining can be of cellular structures or components including but not limited to membranes, nuclei, and mitochondria. Different staining colors may be used to stain different components of the cell.
  • the cell membrane can be permeabilized after sequencing analysis and imaging using the sequencing system and reactions.
  • the one or more cell images can comprise staining of lipids, such as lipids comprised in the cell membrane.
  • the one or more cell images can comprise staining of one or more transmembrane proteins.
  • the transmembrane proteins can be proteins embedded in the permeabilized membrane.
  • the one or more cell images comprise fluorescence or luminescence signals from cell membranes.
  • the one or more cell images can be microscopic images.
  • the one or more images can be fluorescent images.
  • different fluorescent colors can be included in the cell images.
  • the nuclei and the cell membrane can be stained with different colors.
  • the one or more cell images can comprise segments of: cells, membranes, nuclei, and/or other morphological structures.
  • the edge(s) of each segment encompass the entire membrane of the cell within the segment. There can be only one cell in each segment. Some segments may not have any cell in them. In some embodiments, adjacent segments do not overlap with each other. In some embodiments, adjacent segments only overlap with each other by sharing one or more edges. In some embodiments, various segmentation algorithms can be used for segmenting the cells.
  • the cell images disclosed herein are stained.
  • the staining can occur after acquiring flow cell images using the sequencing system 110. In some embodiments, the staining can occur before acquiring sequencing images.
  • the methods of staining the 3D sample, such as cells or tissue, can include one or more operations disclosed herein.
  • the staining of the 3D sample can use various methods that can specifically label one or more cell protein(s) that are located mostly in the membrane but with negligible occurrence in other regions of the cell (e.g., less than 10%, 5%, 2% in amount or concentration).
  • the cell images may be acquired using the sequencing system 100 herein without moving the sample(s) from its position during sequencing. It is advantageous to stain the sample after sequencing and acquire the cell images while keeping the samples immobilized on the sample stage of the sequencing system. Some transformation, e.g., rotation, translation, or shearing, may still occur, so there is a need to register the flow cell images acquired during sequencing to the cell images acquired after sequencing and staining.
  • the cell images may be acquired using optical device(s) external to the sequencing system 100 after the sequencing run has been completed and after moving the sample away from the sequencing system 100.
  • the sequencing system including optical system advantageously enables sequencing and imaging of target analyte(s) or features while they remain intact inside the cell or tissue.
  • the cell or tissue and the targets e.g., target analytes, structure elements, organelles, etc.
  • the one or more samples being imaged using the optical systems herein can be 2D or 3D samples.
  • the 2D sample(s) may include traditional nucleic acid molecules extracted from various sources.
  • the 3D samples can include various samples in which the polonies within the sample do not fit into a single z level while keeping the polonies in focus.
  • the 3D samples may include in situ samples such as cells and/or tissues.
  • the cells or tissue samples are immobilized on the flow cell device or another substrate for sequencing and/or imaging without modifying the spatial locations of targets within the cells or tissue.
  • the cells or tissue samples are immobilized on the flow cell device or another substrate for sequencing or imaging without modifying the spatial relationship of targets or target analytes within the cells or tissue.
  • the cells and/or tissue are immobilized with the morphological features, RNA, mRNA, and protein targets of the samples intact inside the cell(s) or tissue during sequencing and/or imaging.
  • the spatial locations or relationships of the target analytes or targets remain intact during sequencing and/or imaging.
  • the spatial locations or relationships of the target analytes or targets during sequencing and/or imaging are not manually reconstructed using artificially added structure or features in the sample.
  • the nucleus, cell membrane, mitochondria, and extracellular matrix can retain their relative spatial relationship to each other in the sample(s) during imaging and/or sequencing.
  • the one or more samples include target analyte(s) that are located inside the sample(s) or on the membrane of the sample(s). In some embodiments, the one or more samples include target analyte(s) that are on the exterior or interior surface of the cell. In some embodiments, the one or more samples include target analyte(s) that are on the exterior or interior surface of the cell membrane. In some embodiments, the one or more samples include target analyte(s) that are part of the extracellular matrix. In some embodiments, the one or more samples include target analyte(s) that are part of and/or located on one or more organelles within the cell or tissue. In some embodiments, the one or more samples include target analytes that are on or in the glycocalyx or belong to part of the glycocalyx.
  • the target analyte(s) comprise at least one polypeptide, lipid, nucleic acid or polysaccharide. In some embodiments, the target analyte(s) comprise at least one polypeptide, enzyme or lipid located anywhere in the sample(s) including the cytoplasm and nucleus. In some embodiments, the target analyte(s) comprise at least one polypeptide, enzyme or lipid located in or on a cellular structure including without limits any cellular membrane, nucleus, nucleolus, mitochondria, chloroplast, Golgi apparatus, ribosome, endoplasmic reticulum, microtubules, peroxisome and lysosome.
  • the methods, devices, and systems disclosed herein allow sequencing and analysis of various samples and sources.
  • the samples may include nucleic acids extracted from any of a variety of biological samples, e.g., blood samples, saliva samples, urine samples, cell samples, tissue samples, and the like.
  • the samples here may include a variety of different cell, tissue, or sample types known to those of skill in the art.
  • the sample(s) may be from eukaryotes (such as animals, plants, fungi, protista), archaebacteria, or eubacteria.
  • the sample(s) may include prokaryotic or eukaryotic cells, such as adherent or non-adherent eukaryotic cells.
  • the sample(s) may be from, for example, primary or immortalized rodent, porcine, feline, canine, bovine, equine, primate, or human cell lines.
  • the sample(s) may include a variety of different cell, organ, or tissue types (e.g., white blood cells, red blood cells, platelets, epithelial cells, endothelial cells, neurons, glial cells, astrocytes, fibroblasts, skeletal muscle cells, smooth muscle cells, gametes, or cells from the heart, lungs, brain, liver, kidney, spleen, pancreas, thymus, bladder, stomach, colon, or small intestine).
  • the sample(s) may include normal or healthy cells.
  • the sample(s) may include diseased cells, such as cancerous cells, or from pathogenic cells that are infecting a host.
  • the sample(s) may include a distinct subset of cell types, e.g., immune cells (such as T cells, cytotoxic (killer) T cells, helper T cells, alpha beta T cells, gamma delta T cells, T cell progenitors, B cells, B-cell progenitors, lymphoid stem cells, myeloid progenitor cells, lymphocytes, granulocytes, Natural Killer cells, plasma cells, memory cells, neutrophils, eosinophils, basophils, mast cells, monocytes, dendritic cells, and/or macrophages, or any combination thereof), undifferentiated human stem cells, human stem cells that have been induced to differentiate, rare cells (e.g., circulating tumor cells (CTCs), circulating epithelial cells, circulating endothelial cells, circulating tumor cells (CTCs), circulating epi
  • the methods disclosed herein may comprise an operation of registering, by the reconfigurable logic device and/or the integrated circuit, the one or more cell images (e.g., with staining) to sequencing images or results of the sample, e.g., base calls of the determined polonies.
  • such operation is performed by the different combinations of the first plurality of data processing engines and the first reconfigurable routing channels after the operation of determining polonies from the second plurality of flow cell images or after the operation of performing a corresponding base calling for each of the determined polonies based on the second plurality of flow cell images.
  • such an operation of registering the cell images to flow cell images or base calls may be performed by the integrated circuits, and the registration results, e.g., the transformation(s), may be communicated from the integrated circuit to the reconfigurable logic device or the one or more processors of the sequencing system.
  • the methods herein include saving the registration results, by the reconfigurable logic device, the integrated circuit, or the one or more processors into a predetermined file format, e.g., a FastQ data file, so that it can be accessed using similar software that is configured to access sequencing results such as base calls.
  • the methods further include an operation of accessing both the registration results of the cell images and other sequencing results to present sequencing results in correspondence with the morphological information of the sample, e.g., to a user.
  • the methods may include an operation of displaying base calling results in color that are spatially registered to cellular features, e.g., the nucleus, so that the aligned results can conveniently allow the user to identify base calls in relation to the morphological information of cells.
  • saving and accessing the registration results of the cell images and other sequencing results may be performed by the one or more processors, the reconfigurable logic device, and/or the integrated circuit.
  • the registration results of the cell images and other sequencing results may be saved into a memory device that is within the housing of the sequencing system. In some embodiments, the registration results of the cell images and other sequencing results may be saved into a memory device that is on the cloud 130 external to the sequencing system.
  • the fiducial markers can be internal or external to the sample.
  • internal fiducial markers can include at least some of the polonies or clusters or background objects in the sample.
  • external fiducial markers can be microspheres coated on the flow cell so that the signal from the microspheres can function similarly to internal fiducial markers for registration.
  • the same fiducial markers can appear in sequencing images, e.g., the flow cell image(s), the cell images so that transformation(s) can be derived from aligning the fiducial markers in different images.
  • the transformation(s) can be used for registering or aligning the sequencing image(s) and cell image(s) and objects that appear in them. Exemplary embodiments of image registration methods are described in PCT patent application No. PCT/US2023/067931, the contents of which are hereby incorporated by reference in their entirety.
  • a polony or other object (e.g., a background object) used as a fiducial marker, with image intensity $I$ centered at location $(x_1, y_1)$ in a sequencing image, can appear at location $(x_2, y_2)$ with intensity $I'$ in a cell image, where $(x_2, y_2) = M_T \cdot (x_1, y_1)$ and $M_T$ is the transformation matrix.
  • the inverse transformation matrix $M_T^{-1}$ can be determined such that $(x_1, y_1) = M_T^{-1} \cdot (x_2, y_2)$.
  • the registration of images can be in 2D and can include translation, scaling, rotation, and/or shearing of flow cell images among different channels. Multiple points in the sequencing image and their corresponding points in the cell image can be used to determine the transformation. The minimum number of points that is needed can be determined by the degree of freedom in the transformation.
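The forward and inverse mappings between sequencing-image and cell-image coordinates can be sketched with 3 × 3 matrices in homogeneous coordinates. The sketch assumes points are `(x, y)` pairs and matrices are nested lists; the helper names `apply_transform` and `invert_3x3` are hypothetical.

```python
def apply_transform(m, point):
    """Map (x, y) through a 3x3 matrix using homogeneous coordinates."""
    x, y = point
    u = m[0][0] * x + m[0][1] * y + m[0][2]
    v = m[1][0] * x + m[1][1] * y + m[1][2]
    w = m[2][0] * x + m[2][1] * y + m[2][2]
    return (u / w, v / w)


def invert_3x3(m):
    """Inverse of a 3x3 matrix via the adjugate; raises if it is singular."""
    (a, b, c), (d, e, f), (g, h, i) = m
    det = a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)
    if abs(det) < 1e-12:
        raise ValueError("transformation matrix is singular")
    adj = [
        [e * i - f * h, c * h - b * i, b * f - c * e],
        [f * g - d * i, a * i - c * g, c * d - a * f],
        [d * h - e * g, b * g - a * h, a * e - b * d],
    ]
    return [[value / det for value in row] for row in adj]
```

Mapping a point with the matrix and then with its inverse returns the original location, which is how an object found in a cell image can be traced back to the sequencing image.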
  • the image registration can be 3D with coordinates in x, y, and z axes.
  • an image, e.g., a flow cell image, a cell image, etc.
  • a transformation can be determined for each subtile to represent the transformation of the whole image.
  • the image transformation of each subtile can be uniquely represented by a transformation matrix.
  • the transformation matrix can be determined as below:
    $$M = \begin{bmatrix} M_{11} & M_{12} & M_{13} \\ M_{21} & M_{22} & M_{23} \\ M_{31} & M_{32} & M_{33} \end{bmatrix} \qquad (2)$$
  • the transformation matrix can be defined as the inverse matrix of $M$, i.e., $M^{-1}$, so that equation (1) can be expressed differently as
  • the transformation matrix M is an estimation in equations (1) and (3) based on the 2D shifts.
  • the value of n may affect the accuracy of the estimation.
  • more than one region can be selected within a subtile for cross correlation calculation, and more than one 2D shift can be calculated for each subtile and used for estimating the transformation of the subtile.
  • n in equation (1) can be replaced by a larger number, e.g., 2*n when 2 regions are selected per subtile, and the transformation matrix M can be estimated using equations (1) and (2).
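Each 2D shift used above can be estimated by cross-correlating a selected region of one image with the corresponding region of the other. The brute-force integer search below is a minimal sketch, assuming equally sized regions stored as nested lists and a small maximum shift; practical implementations often use FFT-based or sub-pixel cross-correlation instead, and the function name `best_shift` is hypothetical.

```python
def best_shift(ref, mov, max_shift):
    """Integer 2D shift (dx, dy) maximizing the cross-correlation of two regions.

    ref, mov -- equally sized 2D lists indexed [y][x]; mov is assumed to be
    ref displaced by (dx, dy), i.e. mov[y + dy][x + dx] ~ ref[y][x].
    Only overlapping pixels contribute; each score is averaged over the overlap.
    """
    h, w = len(ref), len(ref[0])
    best, best_score = (0, 0), float("-inf")
    for dy in range(-max_shift, max_shift + 1):
        for dx in range(-max_shift, max_shift + 1):
            score, count = 0.0, 0
            # Restrict to pixels where both ref[y][x] and mov[y+dy][x+dx] exist.
            for y in range(max(0, -dy), min(h, h - dy)):
                for x in range(max(0, -dx), min(w, w - dx)):
                    score += ref[y][x] * mov[y + dy][x + dx]
                    count += 1
            if count and score / count > best_score:
                best, best_score = (dx, dy), score / count
    return best
```

Adding the shift of a region to the coordinates of its center pixel gives one matched pair, (x_i, y_i) before and (a_i, b_i) after transformation, for estimating the subtile transformation.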
  • $(a_1, b_1), \ldots, (a_n, b_n)$ in equations (1)-(3) are coordinates of the selected region(s) after transformation (e.g., coordinates of a center pixel of the corresponding region(s)), and $(x_1, y_1), \ldots, (x_n, y_n)$ are coordinates of the selected region(s) before transformation, e.g., coordinates of a center pixel.
  • n is a number that is no less than 3. The larger the n, the more information can be used to estimate the transformation matrix M. In some embodiments, n is not greater than 9.
  • the transformation of one or more subtiles is linear. In some embodiments, the transformation of all subtiles is linear. In some embodiments, the transformation matrix is a matrix in which $M_{31}$ and $M_{32}$ are equal to 0 and $M_{33}$ is 1. In some embodiments, one or more of the transformations per subtile is an affine transformation and the transformation matrix of the entire flow cell image is an affine matrix.
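With n >= 3 matched region centers, an affine subtile transformation (M31 = M32 = 0 and M33 = 1) can be estimated by linear least squares. The sketch below assumes this least-squares formulation, since the exact form of equations (1) and (3) is not reproduced here; the names `solve3` and `estimate_affine` are hypothetical.

```python
def solve3(a, rhs):
    """Solve a 3x3 linear system by Gaussian elimination with partial pivoting."""
    m = [list(row) + [r] for row, r in zip(a, rhs)]
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("points are degenerate (e.g. collinear)")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, 3):
            factor = m[r][col] / m[col][col]
            for c in range(col, 4):
                m[r][c] -= factor * m[col][c]
    sol = [0.0] * 3
    for r in (2, 1, 0):  # back substitution
        sol[r] = (m[r][3] - sum(m[r][c] * sol[c] for c in range(r + 1, 3))) / m[r][r]
    return sol


def estimate_affine(src, dst):
    """Least-squares affine matrix (last row 0, 0, 1) mapping src onto dst.

    src -- n >= 3 region centers (x_i, y_i) before transformation
    dst -- matching centers (a_i, b_i) after transformation
    """
    if len(src) != len(dst) or len(src) < 3:
        raise ValueError("need n >= 3 matching point pairs")
    rows = [(x, y, 1.0) for x, y in src]
    # Normal equations (A^T A) p = A^T t, solved once per output coordinate.
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    top = solve3(ata, [sum(r[i] * a for r, (a, _) in zip(rows, dst)) for i in range(3)])
    mid = solve3(ata, [sum(r[i] * b for r, (_, b) in zip(rows, dst)) for i in range(3)])
    return [top, mid, [0.0, 0.0, 1.0]]
```

Using more than three point pairs (e.g., 2*n when two regions are selected per subtile) over-determines the system, which is why additional regions can improve the estimate.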
  • the transformation matrix M is an estimation in equations (1) and (3) based on the size of the selected region(s).
  • the size of selected region may affect the accuracy of the estimation.
  • the size of the selected region can be about 128 x 128 pixels.
  • the size of the selected region can be about 32 x 32, 48 x 48, 64 x 64, 96 x 96, 160 x 160, 196 x 196, 256 x 256, or of various different sizes.
  • the transformations per subtile as disclosed herein can be calculated using a selected region within a subtile, the selected region can be equal to or smaller than the subtile.
  • the transformation estimated using the region can be used to estimate the transformation of the entire subtile given the intrinsic characteristics of image transformation across sequencing cycles.
  • the image transformation between cycles and/or between neighboring pixels can be relatively small, e.g., with less than about 8%, 5% or less than about 1% of scaling, rotation, and/or shearing.
  • the transformations disclosed herein can include an image translation with greater than about 5% difference between cycles and/or between neighboring pixels.
  • the transformation of the entire flow cell image can be accurately and reliably estimated by transforming individual subtiles using the plurality of transformations and combining the transformed subtiles into a transformed flow cell image.
  • the techniques disclosed herein advantageously estimate the transformation of the flow cell image by determining a plurality of transformations of its individual subtiles.
  • the plurality of transformations can be linear and yet accurately and reliably estimate the transformation of the flow cell image even if the transformation is non-linear.
  • the techniques disclosed herein advantageously eliminate the need to calculate the transformation of the entire images to be registered or aligned, which can be more computationally intensive, time-consuming, and prone to failure than estimating a plurality of transformations for the corresponding subtiles of the entire images.
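The per-subtile approach can be illustrated by mapping each point with the affine matrix of the subtile that contains it, so that a set of linear pieces approximates a possibly non-linear whole-image transformation. The sketch assumes equally sized subtiles keyed by hypothetical `(column, row)` indices and applies no blending at subtile boundaries; the function name is hypothetical.

```python
def transform_point_by_subtile(point, subtile_size, transforms):
    """Map a point with the affine matrix of the subtile that contains it.

    point        -- (x, y) pixel coordinate in the full image
    subtile_size -- (width, height) of every subtile, in pixels
    transforms   -- dict {(column, row): 3x3 affine matrix}, one per subtile
    Raises KeyError if no matrix is stored for the containing subtile.
    """
    x, y = point
    key = (int(x // subtile_size[0]), int(y // subtile_size[1]))
    m = transforms[key]
    # Affine matrices have last row (0, 0, 1), so no homogeneous division is needed.
    return (m[0][0] * x + m[0][1] * y + m[0][2],
            m[1][0] * x + m[1][1] * y + m[1][2])
```

Because each subtile only needs a small, locally valid linear model, the pieces together can follow a smoothly varying distortion that a single global matrix could not represent.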
  • Various aspects of the methods described herein, such as methods 500, 600, 700, 2800 and 2900, may be implemented, for example, using one or more computer systems, such as computer system 400 shown in FIG. 4.
  • One or more computer systems 400 may be used, for example, to implement any of the aspects discussed herein, as well as combinations and sub-combinations thereof.
  • Computer system 400 may include one or more hardware processors 404.
  • the hardware processor 404 may be a central processing unit (CPU), a graphics processing unit (GPU), or a combination thereof.
  • Processor 404 may be connected to a bus or communication infrastructure 406.
  • Computer system 400 may also include user input/output device(s) 403, such as monitors, keyboards, pointing devices, etc., which may communicate with communication infrastructure 406 through user input/output interface(s) 402.
  • the user input/output devices 403 may be coupled to the user interface 124 in FIG. 1.
  • One or more units of processors 404 may be a graphics processing unit (GPU).
  • a GPU may be a processor that is a specialized electronic circuit designed to process mathematically intensive applications.
  • the GPU may have a parallel structure that is efficient for parallel processing of large blocks of data, such as mathematically intensive data common to computer graphics applications, images, videos, vector processing, array processing, etc., as well as cryptography (including brute-force cracking), generating cryptographic hashes or hash sequences, solving partial hash-inversion problems, and/or producing results of other proof-of-work computations for some blockchain-based applications, for example.
  • the GPU may be particularly useful in at least the image recognition and machine learning aspects described herein.
  • processors 404 may include a coprocessor or other implementation of logic for accelerating cryptographic calculations or other specialized mathematical functions, including hardware-accelerated cryptographic coprocessors. Such accelerated processors may further include instruction set(s) for acceleration using coprocessors and/or other logic to facilitate such acceleration.
  • Computer system 400 may also include a data storage device such as a main or primary memory 408, e.g., random access memory (RAM).
  • Main memory 408 may include one or more levels of cache.
  • Main memory 408 may have stored therein control logic (i.e., computer software) and/or data.
  • Computer system 400 may also include one or more secondary data storage devices or secondary memory 410.
  • Secondary memory 410 may include, for example, a main storage drive 412 and/or a removable storage device or drive 414.
  • Main storage drive 412 may be a hard disk drive or solid-state drive, for example.
  • Removable storage drive 414 may be a floppy disk drive, a magnetic tape drive, a compact disk drive, an optical storage device, tape backup device, and/or any other storage device/drive.
  • Removable storage drive 414 may interact with a removable storage unit 418.
  • Removable storage unit 418 may include a computer usable or readable storage device having stored thereon computer software and/or data.
  • the software may include control logic.
  • the software may include instructions executable by the hardware processor(s) 404.
  • Removable storage unit 418 may be a floppy disk, magnetic tape, compact disk, DVD, optical storage disk, and/or any other computer data storage device.
  • Removable storage drive 414 may read from and/or write to removable storage unit 418.
  • Secondary memory 410 may include other means, devices, components, instrumentalities or other approaches for allowing computer programs and/or other instructions and/or data to be accessed by computer system 400.
  • Such means, devices, components, instrumentalities or other approaches may include, for example, a removable storage unit 422 and an interface 420.
  • Examples of the removable storage unit 422 and the interface 420 may include a program cartridge and cartridge interface (such as that found in video game devices), a removable memory chip (such as an EPROM or PROM) and associated socket, a memory stick and USB port, a memory card and associated memory card slot, and/or any other removable storage unit and associated interface.
  • Computer system 400 may further include a communication or network interface 424.
  • Communication interface 424 may enable computer system 400 to communicate and interact with any combination of external devices, external networks, external entities, etc. (individually and collectively referenced by reference number 428).
  • communication interface 424 may allow computer system 400 to communicate with external or remote devices 428 over communication path 426, which may be wired and/or wireless (or a combination thereof), and which may include any combination of LANs, WANs, the Internet, etc.
  • Control logic and/or data may be transmitted to and from computer system 400 via communication path 426.
  • communication path 426 is the connection to the cloud 130, as depicted in FIG. 1.
  • the external devices, etc. referred to by reference number 428 may be devices, networks, entities, etc. in the cloud 130.
  • Computer system 400 may also be any of a personal digital assistant (PDA), desktop workstation, laptop or notebook computer, netbook, tablet, smart phone, smart watch or other wearable, appliance, part of the Internet of Things (IoT), and/or embedded system, to name a few non-limiting examples, or any combination thereof.
  • the framework described herein may be implemented as a method, process, apparatus, system, or article of manufacture such as a non-transitory computer-readable medium or device.
  • the present framework may be described in the context of distributed ledgers being publicly available, or at least available to untrusted third parties.
  • One example of a modern use case is blockchain-based systems.
  • the present framework may also be applied in other settings where sensitive or confidential information may need to pass by or through the hands of untrusted third parties; this technology is in no way limited to distributed ledgers or blockchain uses.
  • Computer system 400 may be a client or server, accessing or hosting any applications and/or data through any delivery paradigm, including but not limited to remote or distributed cloud computing solutions; local or on-premises software (e.g., "on-premises" cloud-based solutions); "as a service" models (e.g., content as a service (CaaS), digital content as a service (DCaaS), software as a service (SaaS), managed software as a service (MSaaS), platform as a service (PaaS), desktop as a service (DaaS), framework as a service (FaaS), backend as a service (BaaS), mobile backend as a service (MBaaS), infrastructure as a service (IaaS), database as a service (DBaaS), etc.); and/or a hybrid model including any combination of the foregoing examples or other services or delivery paradigms.
  • Any applicable data structures, file formats, and schemas may be derived from standards including but not limited to JavaScript Object Notation (JSON), Extensible Markup Language (XML), Yet Another Markup Language (YAML), Extensible Hypertext Markup Language (XHTML), Wireless Markup Language (WML), MessagePack, XML User Interface Language (XUL), or any other functionally similar representations alone or in combination.
  • Any pertinent data, files, and/or databases may be stored, retrieved, accessed, and/or transmitted in human-readable formats such as numeric, textual, graphic, or multimedia formats, further including various types of markup language, among other possible formats.
  • the data, files, and/or databases may be stored, retrieved, accessed, and/or transmitted in binary, encoded, compressed, and/or encrypted formats, or any other machine-readable formats.
  • Interfacing or interconnection among various systems and layers may employ any number of mechanisms, such as any number of protocols, programmatic frameworks, floorplans, or application programming interfaces (API), including but not limited to Document Object Model (DOM), Discovery Service (DS), NSUserDefaults, Web Services Description Language (WSDL), Message Exchange Pattern (MEP), Web Distributed Data Exchange (WDDX), Web Hypertext Application Technology Working Group (WHATWG) HTML5 Web Messaging, Representational State Transfer (REST or RESTful web services), Extensible User Interface Protocol (XUP), Simple Object Access Protocol (SOAP), XML Schema Definition (XSD), XML Remote Procedure Call (XML-RPC), or any other mechanisms, open or proprietary, that may achieve similar functionality and results.
  • Such interfacing or interconnection may also make use of uniform resource identifiers (URI), which may further include uniform resource locators (URL) or uniform resource names (URN).
  • Other forms of uniform and/or unique identifiers, locators, or names may be used, either exclusively or in combination with forms such as those set forth above.
  • Any of the above protocols or APIs may interface with or be implemented in any programming language, procedural, functional, or object-oriented, and may be compiled or interpreted.
  • Non-limiting examples include C, C++, C#, Objective-C, Java, Scala, Clojure, Elixir, Swift, Go, Perl, PHP, Python, Ruby, JavaScript, WebAssembly, or virtually any other language, with any other libraries or schemas, in any kind of framework, runtime environment, virtual machine, interpreter, stack, engine, or similar mechanism, including but not limited to Node.js, V8, Knockout, jQuery, Dojo, Dijit, OpenUI5, AngularJS, Express.js, Backbone.js, Ember.js, DHTMLX, Vue, React, Electron, and so on, among many other non-limiting examples.
  • a tangible, non-transitory apparatus or article of manufacture comprising a tangible, non-transitory computer useable or readable medium having control logic (software) stored thereon may also be referred to herein as a computer program product or program storage device.
  • such control logic, when executed by one or more data processing devices (such as computer system 400), may cause such data processing devices to operate as described herein.
  • the RNA is not extracted from the cellular sample and sequencing information does not need to be tracked and mapped back to an image of the cellular sample. Rather, RNA is retained inside the cellular sample to permit direct imaging of the spatial location of target RNAs within the cells. Additionally, RNA within the cellular sample is not fragmented and enrichment of target RNA is not necessary.
  • Use of target-specific and/or random-sequence reverse transcription primers enables detection of both poly-A and non-poly-A RNAs in either uni-plex or multi-plex modes.
  • the methods comprise repeatedly conducting a small number of sequencing cycles on the same region of the template molecules (e.g., concatemer molecules).
  • the RNA content of the cellular sample can be discovered.
  • the reiterative short sequencing cycles described herein use a reduced amount of sequencing reagents which reduces cost and saves time.
  • Methods for conducting reiterative short sequencing cycles have many uses, including but not limited to detecting specific RNAs of interest, mutant RNA sequences, splice variants, and their abundance levels.
  • the concatemers carry tandem repeat units of a cDNA-of-interest, the universal sequencing primer binding site, and the target barcode sequence.
  • the concatemers are sequenced inside the cellular sample, where a small number of sequencing cycles are conducted in each round and multiple rounds of short-read sequencing are conducted.
  • the full length of the target barcode and cDNA region are not sequenced. Instead, at least a portion of the target barcode region is reiteratively sequenced. In some embodiments, it is not necessary to sequence the cDNA region. In some embodiments, the target barcode and a portion of the cDNA region are reiteratively sequenced. It is not necessary to sequence the entire length of the cDNA region.
  • a short portion of the cDNA region in the concatemer is resequenced at least once (e.g., reiterative sequencing) from the same start position to generate overlapping sequencing reads that can be aligned to a reference sequence.
  • the same portion of the concatemer molecule can be sequenced at least two, three, four, five, or up to 50 times.
  • the start sequencing site can be any location of the concatemer and is dictated by the sequencing primers which are designed to anneal to a selected position within the concatemer.
  • the reiterative short sequencing reads increase the redundancy of sequencing information for individual bases in the cDNA region. Reiteratively sequencing one strand of the concatemer template molecule provides enough base coverage to reveal the presence of target RNAs in the cellular sample so that pairwise sequencing of the complementary strand is not necessary.
  • a concatemer template molecule includes multiple sequencing primer binding sites along the same concatemer molecule which can be used to generate multiple usable sequencing reads for increased sequencing depth. Together, reiteratively sequencing one strand of the concatemer templates increases sequencing base coverage and sequencing depth compared to sequencing a one-copy template molecule.
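The redundancy argument above (reads from the same start position covering each base several times) can be illustrated with a per-position majority vote. This is a hypothetical sketch: the disclosure aligns overlapping reads to a reference sequence and does not specify a consensus rule, so the `consensus` and `depth` helpers, the `min_fraction` threshold, and the example reads are all assumptions for illustration.

```python
from collections import Counter


def consensus(reads: list[str], min_fraction: float = 0.5) -> str:
    """Per-position majority vote over reads that share the same start position.

    A base is called only if it accounts for more than `min_fraction` of the
    non-'N' calls at that position; otherwise the position is reported as 'N'.
    """
    length = max(len(r) for r in reads)
    calls_out = []
    for i in range(length):
        calls = [r[i] for r in reads if i < len(r) and r[i] != "N"]
        if not calls:
            calls_out.append("N")
            continue
        base, count = Counter(calls).most_common(1)[0]
        calls_out.append(base if count / len(calls) > min_fraction else "N")
    return "".join(calls_out)


def depth(reads: list[str]) -> list[int]:
    """Number of non-'N' base calls covering each position (the redundancy)."""
    length = max(len(r) for r in reads)
    return [sum(1 for r in reads if i < len(r) and r[i] != "N") for i in range(length)]


# Four short reads from reiteratively sequencing the same region (hypothetical data).
reads = ["ACGTAC", "ACGTAC", "ACCTAC", "ACGTTC"]
print(consensus(reads), depth(reads))  # ACGTAC [4, 4, 4, 4, 4, 4]
```

A single erroneous call in one read (position 3 in the third read, position 5 in the fourth) is outvoted because every position is covered four times.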
  • the methods described herein can be conducted in uni-plex or multi-plex modes. Two or more different target RNAs can be detected and imaged simultaneously inside a cellular sample using different reverse transcription primers, different target-specific padlock probes, and universal sequencing primers. For example, the presence of a housekeeping RNA and at least one target RNA in a cellular sample can be simultaneously detected and imaged using any of the reiterative short read sequencing methods described herein.
  • the present disclosure provides methods for detecting in situ at least two different target RNA molecules in a cellular sample comprising step (a): providing a cellular sample harboring a plurality of RNA which comprises at least a first target RNA molecule and a second target RNA molecule.
  • the cellular sample is fixed and permeabilized.
  • the cellular sample harbors 2-25 different target RNA molecules, or harbors 25-50 different target RNA molecules, or harbors 50-75 different target RNA molecules, or harbors 75-100 different target RNA molecules.
  • the cellular sample harbors more than 100 different target RNA molecules, or more than 250 different target RNA molecules, or more than 500 different target RNA molecules, or more than 1000 different target RNA molecules, or more. In some embodiments, the cellular sample harbors more than 10,000 different target RNA molecules. In some embodiments, the cellular sample comprises a whole cell, a plurality of whole cells, an intact tissue or an intact tumor. In some embodiments, the cellular sample comprises a fresh cellular sample, a freshly-frozen cellular sample, a sectioned cellular sample, an FFPE cellular sample, or a sectioned FFPE cellular sample. In some embodiments, the cellular sample is deposited onto a solid support.
  • the cellular sample is deposited onto a solid support which is passivated with a coating that promotes cell adhesion. In some embodiments, the cellular sample is deposited on a support that lacks immobilized capture oligonucleotides. In some embodiments, the cellular sample is cultured before or after depositing the cellular sample onto the solid support. In some embodiments, the cellular sample is cultured prior to conducting step (b) which is described below. In some embodiments, the cellular sample comprises an expanded cellular sample that has been cultured in a simple or complex cell culture media. In some embodiments, the cellular sample is not cultured or expanded prior to conducting step (b).
  • methods for detecting at least two different target RNA molecules in a cellular sample further comprise step (b): generating inside the cellular sample a plurality of cDNA molecules which include at least a first target cDNA molecule that corresponds to the first target RNA molecule, and the plurality of cDNA molecules includes a second target cDNA molecule that corresponds to the second target RNA molecule.
  • the method comprises generating at least 2-10,000 different target cDNA molecules that correspond to 2-10,000 different target RNA molecules.
  • the generating of step (b) comprises contacting the plurality of RNA inside the cellular sample with (i) a plurality of reverse transcription primers, (ii) a plurality of reverse transcriptase enzymes, and (iii) a plurality of nucleotides, under a condition suitable for conducting a reverse transcription reaction to generate a plurality of cDNA molecules (e.g., a plurality of first strand cDNA molecules) in the cellular sample (e.g., FIG. 7).
  • the plurality of reverse transcription primers comprises a first sub-population of target-specific reverse transcription primers that hybridize selectively to the first target RNA, and a second sub-population of target-specific reverse transcription primers that hybridize selectively to the second target RNA.
  • the first and second sub-populations of target-specific reverse transcription primers have the same sequence or different sequences.
  • the first sub-population of target-specific reverse transcription primers hybridize along their entire length to a first target RNA molecule.
  • the first sub-population of target-specific reverse transcription primers comprise tailed primers having a portion that hybridizes to a first target RNA molecule and a portion that does not hybridize to a first target RNA molecule.
  • the first sub-population of target-specific reverse transcription primers comprise at least a portion having a poly-T sequence.
  • the first sub-population of target-specific reverse transcription primers comprise at least a portion having a random sequence and/or at least a portion having a target-specific sequence.
  • the second sub-population of target-specific reverse transcription primers hybridize along their entire length to a second target RNA molecule.
  • the second sub-population of target-specific reverse transcription primers comprise tailed primers having a portion that hybridizes to a second target RNA molecule and a portion that does not hybridize to a second target RNA molecule.
  • the second sub-population of target-specific reverse transcription primers comprise at least a portion having a poly-T sequence.
  • the second sub-population of target-specific reverse transcription primers comprise at least a portion having a random sequence and/or at least a portion having a target-specific sequence.
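Where a reverse transcription primer anneals on an RNA follows from antiparallel Watson-Crick pairing. The sketch below assumes perfect complementarity and ignores mismatches, RNA secondary structure, and hybridization conditions. It shows why a poly-T portion anneals along a poly-A tail while a target-specific portion anneals only at its complementary site. The transcript and primer sequences are hypothetical.

```python
# DNA primer base -> RNA base it pairs with (Watson-Crick; RNA uses U).
PAIRS_WITH = {"A": "U", "C": "G", "G": "C", "T": "A"}


def rna_site(primer: str) -> str:
    """RNA sequence (5'->3') that a DNA primer (5'->3') anneals to, antiparallel."""
    return "".join(PAIRS_WITH[b] for b in reversed(primer))


def anneal_positions(rna: str, primer: str) -> list[int]:
    """0-based start positions in `rna` with perfect complementarity to `primer`."""
    site = rna_site(primer)
    return [i for i in range(len(rna) - len(site) + 1) if rna[i:i + len(site)] == site]


# Hypothetical transcript: a short body followed by a poly-A tail.
rna = "GGCU" + "A" * 8
print(anneal_positions(rna, "TTTT"))  # [4, 5, 6, 7, 8]  poly-T: anywhere along the tail
print(anneal_positions(rna, "AGCC"))  # [0]              target-specific: GGCU only
```

Any single random-sequence primer behaves like the target-specific case, annealing wherever its particular sequence happens to be complementary; a pool of such primers therefore covers transcripts with or without a poly-A tail.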
  • a target RNA molecule that is hybridized to a cDNA molecule can be subjected to enzymatic degradation using a ribonuclease under a condition suitable for degrading RNA in an RNA/DNA duplex.
  • a target RNA molecule that is hybridized to a cDNA molecule is not subjected to enzymatic degradation.
  • methods for detecting at least two different target RNA molecules in a cellular sample further comprise step (c): contacting the plurality of cDNA molecules in the cellular sample with a plurality of target-specific padlock probes which includes at least a first plurality of target-specific padlock probes and a second plurality of target-specific padlock probes.
  • the method comprises contacting the plurality of cDNA molecules in the cellular sample with at least 2-10,000 different target-specific padlock probes.
  • cDNA is not generated from RNA inside the cellular sample.
  • methods for detecting at least two different target RNA molecules in a cellular sample further comprise contacting RNA inside the cell with a plurality of target-specific padlock probes and generating circularized padlock probes.
  • methods for detecting at least two different target RNA molecules in a cellular sample further comprise step (c): contacting the plurality of RNA molecules in the cellular sample with a plurality of target-specific padlock probes which includes at least a first plurality of target-specific padlock probes and a second plurality of target-specific padlock probes.
  • the method comprises contacting the plurality of cDNA molecules in the cellular sample with at least 2-10,000 different target-specific padlock probes.
  • a target RNA molecule can be subjected to enzymatic degradation using a ribonuclease. In some embodiments, a target RNA molecule is not subjected to enzymatic degradation.
  • individual padlock probes in the plurality of first target-specific padlock probes comprise first and second terminal regions (e.g., first and second padlock binding arms), wherein the first terminal region selectively hybridizes to a first region of the first target cDNA molecule (or the first target RNA molecule), and the second terminal region selectively hybridizes to a second region of the first target cDNA molecule (or the first target RNA molecule).
  • the contacting of step (c) comprises: hybridizing the first and second terminal regions of the first target-specific padlock probes to proximal positions on the first target cDNA molecule (or the first target RNA molecule) to form a circularized first target-specific padlock probe having a nick or gap between the hybridized first and second terminal regions (e.g., FIG. 7, left).
  • the first target-specific padlock probe comprises a first target barcode sequence (target BC-1) that corresponds to and uniquely identifies the first target cDNA sequence (or the first target RNA sequence).
  • the first target-specific padlock probe comprises a first target barcode sequence that is located adjacent to one of the regions of the first target-specific padlock probe that selectively hybridizes to the first target cDNA molecule (or the first target RNA sequence).
  • the first target-specific padlock probe comprises at least one universal adaptor sequence, such as for example a universal sequencing primer binding site (or a complementary sequence thereof).
  • the first target-specific padlock probe comprises a universal primer binding site for a rolling circle amplification primer (or a complementary sequence thereof).
  • the first target-specific padlock probe comprises a universal compaction oligonucleotide binding site (or a complementary sequence thereof).
  • individual padlock probes in the plurality of second target-specific padlock probes comprise first and second terminal regions (e.g., first and second padlock binding arms), wherein the first terminal region selectively hybridizes to a first region of the second target cDNA molecule (or the second target RNA molecule), and the second terminal region selectively hybridizes to a second region of the second target cDNA molecule (or the second target RNA molecule).
  • the contacting of step (c) comprises: hybridizing the first and second terminal regions of the second target-specific padlock probes to proximal positions on the second target cDNA molecule (or the second target RNA molecule) to form a circularized second target-specific padlock probe having a nick or gap between the hybridized first and second terminal regions (e.g., FIG. 7, right).
  • the second target-specific padlock probe comprises a second target barcode sequence (target BC-2) that corresponds to and uniquely identifies the second target cDNA sequence (or the second target RNA sequence).
  • the second target-specific padlock probe comprises a second target barcode sequence that is located adjacent to one of the regions of the second target-specific padlock probe that selectively hybridizes to the second target cDNA molecule (or the second target RNA sequence).
  • the second target-specific padlock probe comprises at least one universal adaptor sequence, such as for example a universal sequencing primer binding site (or a complementary sequence thereof).
  • the second target-specific padlock probe comprises a universal primer binding site for a rolling circle amplification primer (or a complementary sequence thereof).
  • the second target-specific padlock probe comprises a universal compaction oligonucleotide binding site (or a complementary sequence thereof).
  • the first target barcode sequence (target BC-1) and the second target barcode sequence (target BC-2) have different sequences and can be used to conduct multiplex RNA detection and sequencing. In some embodiments, the first target barcode sequence (target BC-1) and the second target barcode sequence (target BC-2) have the same sequence and can be used to conduct uni-plex RNA detection and sequencing.
  • the first and second target-specific padlock probes comprise a universal sequencing primer binding site and a target barcode sequence that are adjacent to each other so that the target barcode region of the concatemer is sequenced first.
  • the target barcode sequence can be any length, for example 3-15 bases, or 15-25 bases, or 25-40 bases, or longer.
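Two quantitative points about barcodes can be sketched briefly. First, with four bases, a barcode of length L has 4^L possible sequences, so distinguishing up to 10,000 targets needs at least 7 bases (4^7 = 16,384, while 4^6 = 4,096). That sits inside the 3-15 base range above and leaves length available for barcode sets whose members differ at several positions. Second, a barcode-first read can be assigned to a target by nearest barcode. The Hamming-distance decoder below is an assumed, swapped-in technique, not the disclosed one; the panel and barcode sequences are hypothetical.

```python
from typing import Optional


def min_barcode_length(n_targets: int, alphabet: int = 4) -> int:
    """Shortest barcode length giving every target a distinct sequence (alphabet**L >= n)."""
    length = 1
    while alphabet ** length < n_targets:
        length += 1
    return length


def hamming(a: str, b: str) -> int:
    """Mismatches over the overlapping (prefix) length of two sequences."""
    return sum(x != y for x, y in zip(a, b))


def decode(read: str, barcodes: dict[str, str], max_mismatch: int = 1) -> Optional[str]:
    """Assign a barcode-first read to the target whose barcode is uniquely closest.

    Returns None if the best match exceeds `max_mismatch` or is tied.
    """
    scored = sorted((hamming(read, bc), target) for target, bc in barcodes.items())
    best_dist, best_target = scored[0]
    if best_dist > max_mismatch:
        return None
    if len(scored) > 1 and scored[1][0] == best_dist:
        return None  # ambiguous: two targets equally close
    return best_target


# Hypothetical two-plex panel: a housekeeping RNA and one target RNA.
barcodes = {"GAPDH": "ACGTAC", "TARGET1": "TTGCAA"}
print(min_barcode_length(10_000))  # 7
print(decode("ACGTTC", barcodes))  # GAPDH (one mismatch)
```

Setting both target barcodes to the same sequence makes every read ambiguous under this decoder, which corresponds to the uni-plex mode described above, where identical barcodes report detection rather than target identity.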
  • methods for detecting at least two different target RNA molecules in a cellular sample further comprise step (d): closing the nick or gap in the at least first and second circularized target-specific padlock probes by conducting an enzymatic reaction, thereby generating at least a first covalently closed circular padlock probe and a second covalently closed circular padlock probe inside the cellular sample.
  • closing the nick in the first and second circularized padlock probes comprises conducting an enzymatic ligation reaction.
  • closing the gap in the first and second circularized padlock probes comprises conducting a polymerase-catalyzed fill-in reaction using the first or second target cDNA molecule (or the first or second RNA molecule) as a template, and conducting an enzymatic ligation reaction.
  • the method comprises closing the nick or gap in at least 2-10,000 circularized target-specific padlock probes by conducting one or more enzymatic reactions, thereby generating at least 2-10,000 covalently closed circular padlock probes inside the cellular sample.
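Whether a hybridized padlock probe leaves a nick (ligation only) or a gap (polymerase fill-in, then ligation) depends on the spacing between the two arm sites on the template. The sketch below assumes the first and second terminal regions correspond to the probe's 5' and 3' arms, requires perfect complementarity, and treats the template as a DNA string. Because hybridization is antiparallel, the 5' arm's site lies upstream (toward the template's 5' end) of the 3' arm's site. The sequences and helper names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def padlock_gap(target: str, arm_5p: str, arm_3p: str) -> tuple[int, str]:
    """Gap length and fill-in bases between a padlock probe's two hybridized arms.

    Arms are written 5'->3'. On the target (5'->3'), the 5' arm's site must lie
    upstream of the 3' arm's site; a gap of 0 means the arms leave a nick.
    """
    site_5p = target.find(revcomp(arm_5p))
    site_3p = target.find(revcomp(arm_3p))
    if site_5p < 0 or site_3p < 0:
        raise ValueError("an arm has no perfectly complementary site on the target")
    end_5p = site_5p + len(arm_5p)
    gap = site_3p - end_5p
    if gap < 0:
        raise ValueError("arm sites overlap or are in the wrong order")
    # A polymerase extending the 3' arm copies the gap; the added bases (5'->3')
    # are the reverse complement of the target's gap region.
    return gap, revcomp(target[end_5p:site_3p])


def closing_reactions(gap: int) -> list[str]:
    """Enzymatic steps needed to covalently close the circle."""
    return ["ligation"] if gap == 0 else ["polymerase fill-in", "ligation"]


# Hypothetical target cDNA with arm sites GATTACA and TTGGCC separated by "CA".
target = "AA" + "GATTACA" + "CA" + "TTGGCC" + "AA"
gap, fill = padlock_gap(target, arm_5p=revcomp("GATTACA"), arm_3p=revcomp("TTGGCC"))
print(gap, fill, closing_reactions(gap))  # 2 TG ['polymerase fill-in', 'ligation']
```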
  • methods for detecting at least two different target RNA molecules in a cellular sample further comprise step (e): conducting a rolling circle amplification reaction inside the cellular sample using the first and second covalently closed circular padlock probes as template molecules, thereby generating a plurality of concatemer molecules including at least a first concatemer molecule that corresponds to a first target RNA molecule, and the plurality of concatemer molecules includes at least a second concatemer molecule that corresponds to a second target RNA molecule.
  • the first concatemer molecule comprises tandem repeat units, wherein a unit comprises a sequence that corresponds to the first target cDNA (or the first target RNA), the first target barcode sequence, and the universal sequencing primer binding site (or a complementary sequence thereof).
  • the second concatemer molecule comprises tandem repeat units, wherein a unit comprises a sequence that corresponds to the second target cDNA (or the second target RNA), the second target barcode sequence, and the universal sequencing primer binding site (or a complementary sequence thereof).
  • the rolling circle amplification reaction of step (e) comprises contacting the covalently closed circularized padlock probes with an amplification primer (e.g., a universal rolling circle amplification primer), a strand-displacing DNA polymerase, and a plurality of nucleotides, under a condition suitable for hybridizing individual amplification primers to a covalently closed padlock probe, and under a condition suitable for conducting primer extension using the covalently closed padlock probe as a template molecule to generate a nucleic acid concatemer.
  • the method comprises conducting a rolling circle amplification reaction inside the cellular sample using the at least 2-10,000 covalently closed circular padlock probes as template molecules, thereby generating at least 2-10,000 concatemer molecules that correspond to at least 2-10,000 target RNA molecules.
  • the plurality of concatemers that are generated inside the cellular sample collapse into a DNA nanoball having a shape and size that is more compact compared to a non-collapsed concatemer.
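The concatemer structure described above (tandem repeat units, each carrying a universal sequencing primer binding site, a target barcode, and a cDNA-derived sequence) explains why one molecule yields several short reads, one per primer binding site, with the barcode read first when it sits next to the primer site. The sketch below is an idealized string model: the unit order, sequences, and helper names are hypothetical, and strand complementarity is ignored by writing the concatemer in the orientation the reads report.

```python
def rolling_circle_product(unit: str, copies: int) -> str:
    """Concatemer of tandem repeat units, as generated by rolling circle amplification."""
    return unit * copies


def short_reads(concatemer: str, primer_site: str, n_cycles: int) -> list[str]:
    """Read n_cycles bases immediately downstream of every primer binding site.

    Strand complementarity is ignored: reads are reported in the concatemer's
    own sequence, and reads that would run off the end are dropped.
    """
    reads = []
    start = concatemer.find(primer_site)
    while start != -1:
        read_start = start + len(primer_site)
        read = concatemer[read_start:read_start + n_cycles]
        if len(read) == n_cycles:
            reads.append(read)
        start = concatemer.find(primer_site, start + 1)
    return reads


# Hypothetical unit: primer binding site, then target barcode, then cDNA-derived sequence.
unit = "CCTTGG" + "ACGTAC" + "GATTACA"
concatemer = rolling_circle_product(unit, copies=3)
print(short_reads(concatemer, "CCTTGG", n_cycles=6))  # ['ACGTAC', 'ACGTAC', 'ACGTAC']
```

Raising `n_cycles` past the barcode extends each read into the cDNA-derived region, matching embodiments in which the barcode and a portion of the cDNA region are both sequenced.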
  • methods for detecting at least two different target RNA molecules in a cellular sample further comprise step (f): sequencing the plurality of concatemer molecules inside the cellular sample, which comprises sequencing the first concatemer molecule by conducting no more than 2-30 sequencing cycles to generate a plurality of first sequencing read products, and sequencing the second concatemer molecule by conducting no more than 2-30 sequencing cycles to generate a plurality of second sequencing read products (FIG. 8).
  • the sequencing of step (f) comprises sequencing no more than 2-30 bases of the first concatemer molecules to generate a plurality of first sequencing read products, and sequencing no more than 2-30 bases of the second concatemer molecules to generate a plurality of second sequencing read products.
  • the method comprises sequencing the at least 2-10,000 concatemer molecules inside the cellular sample, which comprises conducting no more than 2-30 sequencing cycles on the 2-10,000 concatemer molecules to generate a plurality of sequencing read products.
  • only the first target barcode region of the first concatemer molecules is sequenced (e.g., FIG. 8, top). In some embodiments, at least a portion or the full length of the first target barcode of the first concatemer molecules is sequenced (e.g., FIG. 8, top). In some embodiments, the first target barcode and a portion of the first cDNA region (or the first RNA region) of the first concatemer molecules are sequenced. In some embodiments, at least a portion of the first cDNA region (or the first RNA region) of the first concatemer molecules is sequenced.
  • only the second target barcode region of the second concatemer molecules is sequenced (e.g., FIG. 8, bottom). In some embodiments, at least a portion or the full length of the second target barcode of the second concatemer molecules is sequenced (e.g., FIG. 8, bottom). In some embodiments, the second target barcode and a portion of the second cDNA region (or the second RNA region) of the second concatemer molecules are sequenced. In some embodiments, at least a portion of the second cDNA region (or the second RNA region) of the second concatemer molecules is sequenced.
  • the sequencing of step (f) comprises contacting the plurality of concatemer molecules inside the cellular sample with (i) a plurality of universal sequencing primers, (ii) a plurality of sequencing polymerases, and (iii) a plurality of nucleotide reagents, under a condition suitable for hybridizing the plurality of universal sequencing primers to their respective universal sequencing primer binding sites on the concatemers.
  • the sequencing of step (f) further comprises conducting no more than 2-30 sequencing cycles to generate at least a first plurality of sequencing read products by sequencing at least the first target barcode region (Target BC-1), and optionally conducting no more than 2-30 sequencing cycles to generate at least a second plurality of sequencing read products by sequencing at least the second target barcode region (Target BC-2).
  • the nucleotide reagents comprise multivalent molecules, nucleotides and/or nucleotide analogs.
  • the sequencing of step (f) comprises sequencing at least a portion of the first and second nucleic acid concatemers using an optical imaging system comprising a field-of-view (FOV) greater than 1.0 mm².
  • the plurality of first and second sequencing read products are detectable by imaging, and wherein the sequencing comprises decoding the plurality of first and second sequencing read products from the images obtained during the no more than 2-30 sequencing cycles.
  • the plurality of the first and second sequencing read products are detectable by imaging, and wherein the sequencing comprises simultaneously imaging the plurality of first and second detectable sequencing read products in the cellular sample (co-localization of the first and second sequencing read products).
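Decoding read products from the images can be sketched as follows. The sketch assumes that each spot has already been located and registered across cycles, and that each of the four bases is reported in its own color channel. The channel order `ACGT`, the function names and the brightest-channel rule are illustrative assumptions, not the decoding method of this disclosure.

```python
import numpy as np

# Hypothetical channel-to-base mapping; the actual fluorophore assignment may differ.
BASES = np.array(list("ACGT"))


def call_bases(intensities: np.ndarray) -> list[str]:
    """Decode one read per spot from per-cycle, per-channel intensities.

    `intensities` has shape (n_spots, n_cycles, 4) and holds background-corrected
    signal for each spot, sequencing cycle and color channel.
    """
    if intensities.ndim != 3 or intensities.shape[2] != 4:
        raise ValueError("expected shape (n_spots, n_cycles, 4)")
    channel = intensities.argmax(axis=2)  # brightest channel per spot and cycle
    return ["".join(row) for row in BASES[channel]]


def call_purity(intensities: np.ndarray) -> np.ndarray:
    """Brightest channel divided by the channel sum, a simple per-call confidence."""
    total = intensities.sum(axis=2)
    return intensities.max(axis=2) / np.where(total > 0, total, 1.0)


# Example: one spot imaged over three cycles.
demo = np.zeros((1, 3, 4))
demo[0, 0, 2] = 1.0  # cycle 1: G channel
demo[0, 1, 0] = 1.0  # cycle 2: A channel
demo[0, 2, 3] = 0.8  # cycle 3: T channel...
demo[0, 2, 1] = 0.2  # ...with some C crosstalk
print(call_bases(demo), call_purity(demo))
```

A purity score like this is one common way to flag low-confidence calls caused by channel crosstalk. Real pipelines usually also correct for crosstalk and phasing before calling.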
  • methods for detecting at least two different target RNA molecules in a cellular sample further comprising step (g): removing the plurality of first sequencing read products from the first concatemer molecules and retaining the first concatemer molecules in the cellular sample, and removing the plurality of second sequencing read products from the second concatemer molecules and retaining the second concatemer molecules in the cellular sample.
  • methods for detecting at least two different target RNA molecules in a cellular sample further comprising step (h): reiteratively sequencing the plurality of concatemers by repeating steps (f) and (g) at least once, wherein the sequences of the plurality of first sequencing read products confirm the presence of the first target RNA molecules in the cellular sample, and wherein the sequences of the plurality of second sequencing read products confirm the presence of the second target RNA molecules in the cellular sample.
  • reiteratively sequencing at least one region of the concatemer comprises repeating steps (f) - (g) at least 2 times, at least 3 times, at least 4 times, at least 5 times, at least 6 times, at least 7 times, at least 8 times, at least 9 times, or at least 10 times.
  • reiteratively sequencing at least one region of the concatemer comprises repeating steps (f) - (g) up to 10 times, up to 20 times, up to 30 times, up to 40 times, or up to 50 times.
  • Examples of reiterative sequencing are shown schematically in FIGS. 9-12.
  • the concatemer includes tandem repeat units where each unit comprises: (i) a universal sequencing primer binding site (Seq), (ii) universal compaction oligonucleotide binding site (CO), (iii) an insert sequence that corresponds to a given target cDNA, and (iv) a target barcode sequence that corresponds to the given target cDNA (BC).
  • universal sequencing primers (solid arrows) hybridize to the universal sequencing primer binding sites and no more than 30 sequencing cycles are conducted to generate a plurality of first sequencing read products (dashed arrows), where the first sequencing read products include only the target barcode sequence.
  • the plurality of first sequencing read products are removed from the concatemer, and the sequencing is repeated where no more than 30 sequencing cycles are conducted to generate another plurality of first sequencing read products (dashed arrows), where the first sequencing read products include only the target barcode sequence.
  • the plurality of first sequencing read products are removed from the concatemer, and the sequencing is once again repeated where no more than 30 sequencing cycles are conducted to generate another plurality of first sequencing read products (dashed arrows), where the first sequencing read products include only the target barcode sequence.
  • the reiterative sequencing can be conducted up to 50 times.
  • the sequences of all of the first sequencing read products can be determined and aligned with a first reference sequence (e.g., reference barcode sequence) to confirm the presence of the first target RNA molecules inside the cellular sample.
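The barcode confirmation described above can be illustrated with a minimal sketch. It assigns each short read to the closest reference barcode by Hamming distance and then takes a majority vote across the reiterative rounds of steps (f)-(g). The mismatch tolerance, the tie handling and the target names are hypothetical choices for illustration.

```python
from collections import Counter
from typing import Dict, List, Optional


def hamming(a: str, b: str) -> int:
    """Number of mismatched positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def assign_barcode(read: str, reference: Dict[str, str], max_mismatches: int = 1) -> Optional[str]:
    """Return the target whose reference barcode is uniquely closest to `read`, else None."""
    scored = sorted((hamming(read, bc), target) for target, bc in reference.items())
    best_dist, best_target = scored[0]
    if best_dist > max_mismatches:
        return None  # too many sequencing errors to trust the call
    if len(scored) > 1 and scored[1][0] == best_dist:
        return None  # two barcodes are equally close: ambiguous
    return best_target


def consensus_target(rounds: List[str], reference: Dict[str, str]) -> Optional[str]:
    """Majority vote over one spot's reads from repeated rounds of steps (f)-(g)."""
    votes = Counter(t for t in (assign_barcode(r, reference) for r in rounds) if t is not None)
    ranked = votes.most_common()
    if not ranked or (len(ranked) > 1 and ranked[0][1] == ranked[1][1]):
        return None  # no confident round, or a tie between targets
    return ranked[0][0]


# Hypothetical 6-base barcodes for two targets.
REFERENCE = {"TARGET_1": "ACGTAC", "TARGET_2": "TTGCAA"}
print(consensus_target(["ACGTAC", "ACGTAA", "TTGCAA"], REFERENCE))
```

Repeating the short reads and voting is one way the redundancy from reiterative sequencing can offset single-round errors.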
  • the concatemer includes tandem repeat units where each unit comprises: (i) a universal sequencing primer binding site (Seq), (ii) universal compaction oligonucleotide binding site (CO), (iii) an insert sequence that corresponds to a given target cDNA, and (iv) a target barcode sequence that corresponds to the given target cDNA (BC).
  • universal sequencing primers (solid arrows) hybridize to the universal sequencing primer binding sites and no more than 30 sequencing cycles are conducted to generate a plurality of first sequencing read products (dashed arrows), where the first sequencing read products include the target barcode sequence and a portion of the insert sequence.
  • the plurality of first sequencing read products are removed from the concatemer, and the sequencing is repeated where no more than 30 sequencing cycles are conducted to generate another plurality of first sequencing read products (dashed arrows), where the first sequencing read products include the target barcode sequence and a portion of the insert sequence.
  • the plurality of first sequencing read products are removed from the concatemer, and the sequencing is once again repeated where no more than 30 sequencing cycles are conducted to generate another plurality of first sequencing read products (dashed arrows), where the first sequencing read products include the target barcode sequence and a portion of the insert sequence.
  • the reiterative sequencing can be conducted up to 50 times.
  • the sequences of all of the first sequencing read products can be determined and aligned with a first reference sequence (e.g., reference barcode sequence and the insert sequence that corresponds to the target RNA) to confirm the presence of the first target RNA molecules inside the cellular sample.
  • the concatemer includes tandem repeat units where each unit comprises: (i) a universal sequencing primer binding site (Seq), (ii) universal compaction oligonucleotide binding site (CO), and (iii) an insert sequence that corresponds to a given target cDNA.
  • universal sequencing primers (solid arrows) hybridize to the universal sequencing primer binding sites and no more than 30 sequencing cycles are conducted to generate a plurality of first sequencing read products (dashed arrows), where the first sequencing read products include a portion of the insert sequence.
  • the plurality of first sequencing read products are removed from the concatemer, and the sequencing is repeated where no more than 30 sequencing cycles are conducted to generate another plurality of first sequencing read products (dashed arrows), where the first sequencing read products include a portion of the insert sequence.
  • the plurality of first sequencing read products are removed from the concatemer, and the sequencing is once again repeated where no more than 30 sequencing cycles are conducted to generate another plurality of first sequencing read products (dashed arrows), where the first sequencing read products include a portion of the insert sequence.
  • the reiterative sequencing can be conducted up to 50 times.
  • the sequences of all of the first sequencing read products can be determined and aligned with a first reference sequence (e.g., the insert sequence that corresponds to the target RNA) to confirm the presence of the first target RNA molecules inside the cellular sample.
  • the concatemer includes tandem repeat units where each unit comprises: (i) a universal sequencing primer binding site (Seq) and (ii) an insert sequence that corresponds to a given target cDNA.
  • universal sequencing primers (solid arrows) hybridize to the universal sequencing primer binding sites and no more than 30 sequencing cycles are conducted to generate a plurality of first sequencing read products (dashed arrows), where the first sequencing read products include a portion of the insert sequence.
  • the plurality of first sequencing read products are removed from the concatemer, and the sequencing is repeated where no more than 30 sequencing cycles are conducted to generate another plurality of first sequencing read products (dashed arrows), where the first sequencing read products include a portion of the insert sequence.
  • the plurality of first sequencing read products are removed from the concatemer, and the sequencing is once again repeated where no more than 30 sequencing cycles are conducted to generate another plurality of first sequencing read products (dashed arrows), where the first sequencing read products include a portion of the insert sequence.
  • the reiterative sequencing can be conducted up to 50 times.
  • the sequences of all of the first sequencing read products can be determined and aligned with a first reference sequence (e.g., the insert sequence that corresponds to the target RNA) to confirm the presence of the first target RNA molecules inside the cellular sample.
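When the read products cover only a portion of the insert, a read can fall anywhere along the insert reference. A minimal ungapped sketch slides each read along the reference and keeps the offset with the fewest mismatches. It treats the target as confirmed once enough rounds produce a hit. The mismatch limit and the `min_hits` threshold are assumptions. A production pipeline would more likely use an established aligner.

```python
from typing import List, Optional, Tuple


def best_ungapped_hit(read: str, reference: str, max_mismatches: int = 2) -> Optional[Tuple[int, int]]:
    """Slide `read` along `reference`; return (offset, mismatches) of the best hit, or None."""
    best: Optional[Tuple[int, int]] = None
    for offset in range(len(reference) - len(read) + 1):
        window = reference[offset:offset + len(read)]
        mismatches = sum(a != b for a, b in zip(read, window))
        if best is None or mismatches < best[1]:
            best = (offset, mismatches)  # keep the leftmost of equally good offsets
    if best is None or best[1] > max_mismatches:
        return None
    return best


def target_confirmed(reads: List[str], insert_reference: str, min_hits: int = 2) -> bool:
    """Treat the target as present when at least `min_hits` rounds align to the insert."""
    hits = sum(best_ungapped_hit(r, insert_reference) is not None for r in reads)
    return hits >= min_hits


INSERT_REF = "AAACCCGGGTTTACGT"  # hypothetical insert sequence
print(best_ungapped_hit("CCAGGG", INSERT_REF))
print(target_confirmed(["CCCGGG", "CCAGGG", "TTTTTT"], INSERT_REF))
```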
  • At least one concatemer is sequenced by conducting step (f) once (non-reiterative sequencing). In some embodiments, at least one concatemer is sequenced by conducting steps (f) - (g) once. In some embodiments, at least one concatemer is reiteratively sequenced by conducting steps (f) - (g) at least twice.
  • the plurality of universal sequencing primers can be hybridized to concatemer template molecules with a hybridization reagent comprising a saline-sodium citrate (SSC) buffer (e.g., 2X SSC) with formamide (e.g., 10-20% formamide).
  • the hybridization conditions comprise a temperature of about 20-30 °C, for about 10-60 minutes.
  • the plurality of sequencing read products can be removed from the concatemers, while the plurality of concatemers is retained inside the cellular sample, using a de-hybridization reagent comprising a saline-sodium citrate (SSC) buffer, with or without formamide, at a temperature that promotes nucleic acid denaturation, for example 30-90 °C.
  • the plurality of nucleotide reagents of step (f) comprise a plurality of nucleotides that are detectably labeled or non-labeled.
  • individual nucleotides are linked to a detectable reporter moiety.
  • the detectable reporter moiety comprises a fluorophore.
  • the plurality of detectably labeled nucleotide analogs comprise a plurality of chain terminating nucleotides, where the chain terminating moiety is linked to the 3’ nucleotide sugar position to form a 3’ blocked nucleotide analog.
  • the chain terminating moiety can be removed to convert the 3’ blocked nucleotide analog to an extendible nucleotide having a 3’ OH group on the sugar.
  • each labeled nucleotide analog is linked to a different fluorophore that corresponds to its nucleobase (adenine, cytosine, guanine, thymine or uracil), where the different fluorophores emit fluorescent signals during the sequencing of step (f).
  • a sequencing cycle comprises (1) contacting the concatemer/sequencing primer duplex with a sequencing polymerase and a detectably labeled chain terminating nucleotide under a condition suitable for polymerase-catalyzed incorporation of the detectably labeled chain terminating nucleotide into the terminal end of the sequencing primer, (2) detecting and imaging the fluorescent signal and color emitted by the incorporated chain terminating nucleotide, and (3) removing the chain terminating moiety (e.g., unblocking) and the fluorophore from the incorporated nucleotide and retaining the concatemer/sequencing primer duplex.
  • no more than 2-30 sequencing cycles are conducted on the plurality of concatemers inside the cellular sample to generate a plurality of sequencing read products.
  • the sequence of the first sequencing read product can be determined and aligned with a first reference sequence to confirm the presence of the first target RNA molecules inside the cellular sample.
  • the sequence of the second sequencing read product can be determined and aligned with a second reference sequence to confirm the presence of the second target RNA molecules inside the cellular sample.
  • the sequences of the first and second sequencing read products, each no more than 30 bases in length, can be aligned either after each round of generating the first and second sequencing read products, or after generating a set of reiterative sequencing read products.
  • the sequencing reactions are conducted on a sequencing apparatus having a detector that captures fluorescent signals from the sequencing reactions inside the cellular sample.
  • the sequencing apparatus can be configured to relay the fluorescent signal data captured by the detector to a computer system that is programmed to display images of different fluorescent spots which are co-located in the cellular sample, where individual fluorescent spots correspond to different target RNA molecules.
  • when the sequencing is conducted using different fluorescently-labeled nucleotide reagents that correspond to different nucleobases (e.g., A, G, C, T/U), the images can have different color fluorescent spots co-located in the same cellular sample at different sequencing cycles.
  • out-of-sync phasing and/or pre-phasing events can occur during synchronized sequencing reactions on clonally amplified template amplicons, where the sequencing reactions comprise polymerase-catalyzed sequencing reactions employing detectably labeled chain terminator nucleotides.
  • a sequencing reaction on one template molecule within the clonally-amplified template molecules moves ahead of (e.g., pre-phasing) or falls behind (e.g., phasing) the sequencing of the other template molecules.
  • a fluorescent signal is typically detected which corresponds to incorporation of a labeled chain terminator nucleotide.
  • phasing and pre-phasing events can be detected and monitored using incorporation of a labeled chain terminator nucleotide.
  • the plurality of nucleotide reagents of step (f) comprise a plurality of multivalent molecules each comprising a core attached to a plurality of nucleotide-arms, wherein each nucleotide-arm is attached to a nucleotide unit.
  • individual multivalent molecules are labeled with a detectable reporter moiety.
  • the detectable reporter moiety comprises a fluorophore.
  • the core of the multivalent molecule is labeled with a fluorophore, and wherein the fluorophore which is attached to a given core of the multivalent molecule corresponds to the nucleotide base (e.g., adenine, guanine, cytosine, thymine or uracil) of the nucleotide arm.
  • At least one of the nucleotide arms of the multivalent molecule comprises a linker and/or nucleotide base that is attached to a fluorophore, and wherein the fluorophore which is attached to a given nucleotide base corresponds to the nucleotide base (e.g., adenine, guanine, cytosine, thymine or uracil) of the nucleotide arm.
  • a sequencing cycle comprises (1) contacting the concatemer/sequencing primer duplex with a first sequencing polymerase to form a complexed polymerase, (2) contacting the complexed polymerase with a detectably labeled multivalent molecule under a condition suitable for binding a complementary nucleotide unit of the multivalent molecule to the complexed polymerase thereby forming a multivalent-binding complex, and the condition is suitable for inhibiting incorporation of the complementary nucleotide unit into the terminal end of the sequencing primer, (3) detecting and imaging the fluorescent signal and color emitted by the bound detectably labeled multivalent molecule, (4) removing the first sequencing polymerase and the bound detectably labeled multivalent molecule, and retaining the concatemer/sequencing primer duplex, (5) contacting the retained concatemer/sequencing primer duplex with a second sequencing polymerase and a non-labeled chain terminating nucleotide under a condition suitable for polymerase-cata
  • no more than 2-30 sequencing cycles are conducted on the plurality of concatemers inside the cellular sample to generate a plurality of sequencing read products.
  • the sequence of the first sequencing read product can be determined and aligned with a first reference sequence to confirm the presence of the first target RNA molecules inside the cellular sample.
  • the sequence of the second sequencing read product can be determined and aligned with a second reference sequence to confirm the presence of the second target RNA molecules inside the cellular sample.
  • the sequences of the first and second sequencing read products, each no more than 30 bases in length, can be aligned either after each round of generating the first and second sequencing read products, or after generating a set of reiterative sequencing read products.
  • the sequencing reactions are conducted on a sequencing apparatus having a detector that captures fluorescent signals from the sequencing reactions inside the cellular sample.
  • the sequencing apparatus can be configured to relay the fluorescent signal data captured by the detector to a computer system that is programmed to display images of different fluorescent spots which are co-located in the cellular sample, where individual fluorescent spots correspond to different target RNA molecules.
  • individual cycle times can be achieved in less than 30 minutes.
  • the field of view (FOV) can exceed 1 mm² and the cycle time for scanning a large area (> 10 mm²) can be less than 5 minutes.
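Here is a rough check of these figures. With non-overlapping tiles, a 10 mm² area imaged with a 1 mm² FOV needs 10 tiles. A 5-minute (300 s) scan therefore leaves about 30 s per tile for imaging and stage movement. The helper below is a hypothetical sketch that ignores tile overlap, autofocus and fluidics time.

```python
import math
from typing import Tuple


def scan_plan(area_mm2: float, fov_mm2: float, scan_budget_s: float) -> Tuple[int, float]:
    """Return (tiles needed, seconds available per tile) for one scan of `area_mm2`.

    Assumes non-overlapping tiles; overlap, autofocus and stage settling are ignored.
    """
    if fov_mm2 <= 0:
        raise ValueError("field of view must be positive")
    tiles = math.ceil(area_mm2 / fov_mm2)
    return tiles, scan_budget_s / tiles


print(scan_plan(10.0, 1.0, 5 * 60))  # (10, 30.0)
```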
  • steps (2) and (3) can be conducted at a gentle temperature of about 35 - 45 °C, or about 39 - 42 °C.
  • steps (2) and (3) can be conducted at a gentle temperature, which can help retain the compact size and shape of a DNA nanoball during multiple sequencing cycles (e.g., up to 30 cycles) and can improve the full width at half maximum (FWHM) of a spot image of the DNA nanoball inside a cellular sample.
  • the DNA nanoball does not unravel during multiple sequencing cycles.
  • the spot image of the DNA nanoball does not enlarge during multiple sequencing cycles.
  • the spot image of the DNA nanoball remains a discrete spot during multiple sequencing cycles.
  • the spot image can be represented as a Gaussian spot and the size can be measured as a FWHM.
  • a smaller spot size as indicated by a smaller FWHM typically correlates with an improved image of the spot.
  • the FWHM of a nanoball spot can be about 10 µm or smaller.
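For a Gaussian spot with standard deviation σ, the FWHM is 2√(2 ln 2) σ ≈ 2.355 σ. The sketch below estimates σ from the intensity-weighted second moment of a single spot and converts it to a FWHM in µm. It assumes the spot is background-subtracted and roughly isotropic. The pixel size and the synthetic spot are assumptions for illustration.

```python
import numpy as np

FWHM_PER_SIGMA = 2.0 * np.sqrt(2.0 * np.log(2.0))  # about 2.355 for a Gaussian profile


def gaussian_spot(size: int, sigma_px: float) -> np.ndarray:
    """Synthetic, background-free isotropic Gaussian spot centered in a size x size image."""
    coords = np.arange(size) - (size - 1) / 2.0
    yy, xx = np.meshgrid(coords, coords, indexing="ij")
    return np.exp(-(xx ** 2 + yy ** 2) / (2.0 * sigma_px ** 2))


def fwhm_um(spot: np.ndarray, pixel_um: float) -> float:
    """Estimate FWHM in micrometers from the intensity-weighted second moment of one spot."""
    total = spot.sum()
    ys, xs = np.indices(spot.shape)
    cy = (ys * spot).sum() / total
    cx = (xs * spot).sum() / total
    # For an isotropic 2D Gaussian, E[r^2] = 2 * sigma^2, so divide by 2 for per-axis variance.
    var = (((ys - cy) ** 2 + (xs - cx) ** 2) * spot).sum() / (2.0 * total)
    return float(FWHM_PER_SIGMA * np.sqrt(var) * pixel_um)


# Hypothetical optics: 0.5 um pixels, spot with sigma = 3 pixels (1.5 um).
estimate = fwhm_um(gaussian_spot(41, 3.0), pixel_um=0.5)
print(round(estimate, 2), estimate <= 10.0)
```

Moment-based estimates are sensitive to background and neighboring spots. Fitting a 2D Gaussian is a common, more robust alternative.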
  • out-of-sync phasing and/or pre-phasing events can occur during synchronized polymerase-catalyzed sequencing reactions employing detectably labeled multivalent molecules.
  • a fluorescent signal can be detected which corresponds to binding of a complementary nucleotide unit of a multivalent molecule to the complexed polymerase, thereby forming a multivalent-binding complex.
  • phasing and pre-phasing events can be detected and monitored using binding of labeled multivalent molecules.
  • the phasing and/or pre-phasing rate when conducting up to 30 sequencing cycles with detectably labeled multivalent molecules can be less than about 5%, or less than about 1%, or less than about 0.01%, or less than about 0.001%.
  • the phasing and/or pre-phasing rates for conducting up to 30 sequencing cycles using labeled chain terminator nucleotides can be about 5%.
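A simple model illustrates the effect of these rates. It assumes that in every cycle, each strand independently falls one position behind with probability p (phasing) or jumps one position ahead with probability q (pre-phasing). Under that assumption, the fraction of strands that never slipped after n cycles is (1 - p - q)^n. A 5% per-cycle rate therefore leaves roughly 21% of strands fully in phase after 30 cycles, while a 1% rate leaves roughly 74%. The convolution below also counts strands that drift away and back into phase.

```python
from typing import Dict

import numpy as np


def phase_offsets(p_lag: float, q_lead: float, n_cycles: int) -> Dict[int, float]:
    """Fraction of strands at each phase offset after `n_cycles`.

    Assumes each strand, independently in every cycle, lags one position (p_lag),
    jumps one position ahead (q_lead), or stays in step (1 - p_lag - q_lead).
    """
    step = np.array([p_lag, 1.0 - p_lag - q_lead, q_lead])  # offsets -1, 0, +1
    dist = np.array([1.0])  # all strands start in phase
    for _ in range(n_cycles):
        dist = np.convolve(dist, step)
    offsets = range(-n_cycles, n_cycles + 1)
    return {o: float(f) for o, f in zip(offsets, dist)}


def never_slipped(p_lag: float, q_lead: float, n_cycles: int) -> float:
    """Fraction of strands that stayed in step in every cycle: (1 - p - q)^n."""
    return (1.0 - p_lag - q_lead) ** n_cycles


for rate in (0.05, 0.01):
    in_phase = phase_offsets(rate, 0.0, 30)[0]
    print(f"{rate:.0%} per cycle -> {in_phase:.1%} of strands in phase after 30 cycles")
```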
  • the present disclosure provides methods for conducting in situ multiplex and multi-omics detection and identification using coded padlock probes.
  • the padlock probes are designed to selectively detect target RNA.
  • the RNA-specific padlock probes selectively hybridize to cDNA that corresponds to target RNA.
  • the RNA-specific probes carry barcodes that uniquely identify the cDNA.
  • the RNA-specific padlock probes also carry batch-specific sequencing primer binding sites.
  • Both types of padlock probes are used to generate concatemers having multiple copies of batch-specific sequencing primer binding sites and barcodes.
  • the concatemers can collapse into DNA nanoballs having compact shape and size that produce increased signal intensity and color differentiation during sequencing.
  • the limit of optical resolution impedes the ability to perform highly multiplex sequencing.
  • the batch-specific sequencing primer binding sites on the padlock probes enable sequencing a desired subset (e.g., a batch) of the concatemers using selected batch-specific sequencing primers, which reduces over-crowding of signals in the images.
  • the use of batch-specific sequencing primers produces optical images that are intense and resolvable. Conducting multiple rounds of sequencing on the same cellular sample using different batch-specific sequencing primers enables multiplex sequencing that reveals numerous target RNAs.
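A small geometric sketch shows why batching helps. For every spot, its nearest neighbor among the spots in one batch can only be farther away than its nearest neighbor among all spots. The total number of resolvable spots summed over the batch rounds is therefore never lower than in a single all-at-once round. The spot positions, the four-batch split and the 2 µm separation limit below are hypothetical.

```python
import numpy as np


def resolvable_count(points: np.ndarray, min_sep: float) -> int:
    """Count spots whose nearest neighbor lies farther than `min_sep` (brute force)."""
    if len(points) < 2:
        return len(points)  # zero or one spot: nothing can overlap
    diff = points[:, None, :] - points[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    np.fill_diagonal(dist, np.inf)  # ignore each spot's distance to itself
    return int((dist.min(axis=1) > min_sep).sum())


def compare_batching(points: np.ndarray, batch: np.ndarray, min_sep: float) -> tuple:
    """Resolvable spots in one all-at-once round vs. summed over per-batch rounds."""
    all_at_once = resolvable_count(points, min_sep)
    per_batch = sum(resolvable_count(points[batch == b], min_sep) for b in np.unique(batch))
    return all_at_once, per_batch


rng = np.random.default_rng(0)
spots = rng.uniform(0.0, 100.0, size=(600, 2))  # hypothetical spot centers in um
batches = rng.integers(0, 4, size=len(spots))   # hypothetical Batch Seq-1..4 assignment
print(compare_batching(spots, batches, min_sep=2.0))
```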
  • the batch-specific sequencing methods described herein have many uses. For example, the number of spots that are imaged and associated with sequencing can be counted. The counted spots can be used as a measure of RNA levels in a cellular sample.
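Spot counting can be sketched in two steps: find local intensity maxima above a threshold, then tally the decoded target for each spot. The 3×3 local-maximum rule and the threshold are illustrative assumptions rather than the detection method of this disclosure.

```python
from collections import Counter
from typing import Iterable, List, Optional, Tuple

import numpy as np


def find_spots(img: np.ndarray, threshold: float) -> List[Tuple[int, int]]:
    """(row, col) of pixels above `threshold` that are strictly brighter than all 8 neighbors."""
    h, w = img.shape
    padded = np.pad(img.astype(float), 1, mode="constant", constant_values=-np.inf)
    neighbors = [
        padded[1 + dy:1 + dy + h, 1 + dx:1 + dx + w]
        for dy in (-1, 0, 1)
        for dx in (-1, 0, 1)
        if (dy, dx) != (0, 0)
    ]
    is_peak = (img > threshold) & np.all([img > n for n in neighbors], axis=0)
    rows, cols = np.nonzero(is_peak)
    return [(int(r), int(c)) for r, c in zip(rows, cols)]


def count_targets(decoded: Iterable[Optional[str]]) -> Counter:
    """Tally decoded spots per target; undecodable spots (None) are skipped."""
    return Counter(t for t in decoded if t is not None)


image = np.zeros((7, 7))
image[2, 2], image[5, 4] = 5.0, 3.0  # two hypothetical spots
print(find_spots(image, threshold=1.0))
print(count_targets(["TARGET_1", "TARGET_2", "TARGET_1", None]))
```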
  • the present disclosure provides methods for detecting in situ at least two different target RNA molecules, comprising step (a): providing a cellular sample deposited on a solid support, wherein the cellular sample harbors (i) a first plurality of DNA amplicons (e.g., first concatemers) that correspond to a first target cDNA or RNA molecule, and (ii) a second plurality of DNA amplicons (e.g., second concatemers) that correspond to a second target cDNA or RNA molecule.
  • the method further comprises step (b): sequencing the first plurality of DNA amplicons inside the cellular sample under a condition that inhibits sequencing the second plurality of DNA amplicons, wherein sequencing the first plurality of DNA amplicons inside the cellular sample comprises generating a plurality of first sequencing read products, wherein the sequences of the first sequencing read products are aligned with a first target reference sequence to confirm the presence of the first target RNA in the cellular sample.
  • the first amplicons can be reiteratively sequenced by conducting no more than 2-30 sequencing cycles, or can be reiteratively sequenced by conducting 1-250 sequencing cycles.
  • the method further comprises step (c): sequencing the second plurality of DNA amplicons inside the cellular sample under a condition that inhibits sequencing the first plurality of DNA amplicons, wherein sequencing the second plurality of DNA amplicons inside the cellular sample comprises generating a plurality of second sequencing read products, wherein the sequences of the second sequencing read products are aligned with a second target reference sequence to confirm the presence of the second target RNA in the cellular sample.
  • the second amplicons can be reiteratively sequenced by conducting no more than 2-30 sequencing cycles, or can be reiteratively sequenced by conducting 1-250 sequencing cycles.
  • the present disclosure provides methods for detecting in situ at least two different target RNA molecules, comprising step (a): providing a cellular sample deposited on a solid support, wherein the cellular sample harbors a first plurality of target RNA and a second plurality of target RNA.
  • the first plurality of target RNA encode a first polypeptide.
  • the second plurality of target RNA encode a second polypeptide.
  • the cellular sample is fixed and permeabilized.
  • the cellular sample harbors 2-25 different target RNA molecules, or harbors 25-50 different target RNA molecules, or harbors 50-75 different target RNA molecules, or harbors 75-100 different target RNA molecules. In some embodiments, the cellular sample harbors more than 100 different target RNA molecules, or more than 250 different target RNA molecules, or more than 500 different target RNA molecules, or more than 1000 different target RNA molecules, or more. In some embodiments, the cellular sample harbors more than 10,000 different target RNA molecules. In some embodiments, the cellular sample comprises a whole cell, a plurality of whole cells, an intact tissue or an intact tumor.
  • the cellular sample comprises a fresh cellular sample, a freshly-frozen cellular sample, a sectioned cellular sample, or an FFPE cellular sample.
  • the cellular sample is deposited onto a solid support.
  • the cellular sample is deposited onto a solid support which is passivated with a coating that promotes cell adhesion.
  • the cellular sample is deposited on a support that lacks immobilized capture oligonucleotides.
  • the cellular sample is cultured prior to conducting step (b) which is described below.
  • the cellular sample harbors 2-25 different target polypeptide molecules, or harbors 25-50 different target polypeptide molecules, or harbors 50-75 different target polypeptide molecules, or harbors 75-100 different target polypeptide molecules. In some embodiments, the cellular sample harbors more than 100 different target polypeptide molecules, or more than 250 different target polypeptide molecules, or more than 500 different target polypeptide molecules, or more than 1000 different target polypeptide molecules, or more. In some embodiments, the cellular sample harbors more than 10,000 different target polypeptide molecules.
  • the target polypeptide molecules are encoded by the target RNA molecules.
  • the methods comprise step (b): generating inside the cellular sample a plurality of cDNA by (i) generating at least a first plurality of target cDNA from the first plurality of target RNA, and (ii) generating at least a second plurality of target cDNA from the second plurality of target RNA (e.g., FIG. 13).
  • the first target cDNAs correspond to the first target RNA molecules.
  • the second target cDNAs correspond to the second target RNA molecules.
  • the method comprises generating at least 2-10,000 different target cDNA molecules that correspond to 2-10,000 different target RNA molecules.
  • the generating of step (b) comprises contacting the plurality of RNA inside the cellular sample with (i) a plurality of reverse transcription primers, (ii) a plurality of reverse transcriptase enzymes, and (iii) a plurality of nucleotides, under a condition suitable for conducting a reverse transcription reaction to generate a plurality of cDNA molecules (e.g., a plurality of first strand cDNA molecules) in the cellular sample.
  • the plurality of reverse transcription primers comprises a first sub-population of target-specific reverse transcription primers that hybridize selectively to the first target RNA, and/or comprises a second sub-population of target-specific reverse transcription primers that hybridize selectively to the second target RNA. In some embodiments, the plurality of reverse transcription primers comprises a first sub-population of random-sequence reverse transcription primers that hybridize to the first target RNA, and/or comprises a second sub-population of random-sequence reverse transcription primers that hybridize to the second target RNA.
  • the first padlock probe comprises (i) a first target barcode sequence (target BC-1) that uniquely identifies the first target RNA, (ii) a first batch-specific sequencing primer binding site (Batch Seq-1) (or a complementary sequence thereof), (iii) a universal binding site for an amplification primer (universal RCA) (or a complementary sequence thereof), and (iv) a universal binding site for a compaction oligonucleotide (or a complementary sequence thereof).
  • the second padlock probe comprises (i) a second target barcode sequence (target BC-2) that uniquely identifies the second target RNA, (ii) a second batch-specific sequencing primer binding site (Batch Seq-2) (or a complementary sequence thereof), (iii) a universal binding site for an amplification primer (universal RCA) (or a complementary sequence thereof), and (iv) a universal binding site for a compaction oligonucleotide (or a complementary sequence thereof).
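Here is a minimal sketch of how the padlock elements end up as tandem repeats. The probe is written as a linear sequence and assumed to be circularized. Rolling circle amplification is then modeled as copying the complement of the circle repeatedly. The element order and all sequences are hypothetical. The disclosure lists the elements, but this sketch does not claim their arrangement.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence (A/C/G/T only)."""
    return seq.translate(COMPLEMENT)[::-1]


def build_padlock(arm5: str, barcode: str, batch_seq: str, rca_site: str,
                  compaction_site: str, arm3: str) -> str:
    """Linear padlock probe written 5'->3'; the backbone element order is hypothetical."""
    return arm5 + barcode + batch_seq + rca_site + compaction_site + arm3


def rca_concatemer(circle: str, n_copies: int) -> str:
    """Tandem repeats complementary to the closed circle, modeling rolling circle amplification.

    The true start point depends on where the amplification primer binds; here it is arbitrary.
    """
    return revcomp(circle) * n_copies


# Hypothetical, very short elements for illustration only.
probe = build_padlock("AACG", "CCGGTT", "GATC", "TTTA", "GGCC", "ACAC")
concatemer = rca_concatemer(probe, n_copies=3)
print(len(probe), len(concatemer), concatemer.count(revcomp("CCGGTT")))
```

Each repeat carries one copy of the complement of every element. This is why a single concatemer offers many barcode and batch-specific primer binding sites for the later sequencing steps.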
  • the methods comprise step (c): generating inside the cellular sample a plurality of DNA concatemers which correspond to the first and second plurality of target RNA molecules, comprising: (1) generating a first plurality of covalently closed circular padlock probes by contacting the first plurality of target cDNA with a first plurality of padlock probes, wherein the contacting is conducted under a condition suitable for hybridizing the first and second binding arms of the first padlock probes to proximal positions on their respective first target cDNA molecules to form a first plurality of circularized padlock probes each having a nick or gap between the hybridized first and second binding arms, wherein the first padlock probes include a (i) a first target barcode sequence (target BC-1) that uniquely identifies the first target RNA or cDNA, (ii) a first batch-specific sequencing primer binding site (Batch Seq-1) (or a complementary sequence thereof), and (iii) a universal binding site for an amplification primer
  • the rolling circle amplification reaction can be conducted in the presence or absence of a plurality of compaction oligonucleotides.
  • the method comprises contacting the plurality of cDNA molecules in the cellular sample with at least 2-10,000 different target-specific padlock probes.
  • the first padlock probe further comprises a universal compaction oligonucleotide binding site (or a complementary sequence thereof).
  • the closing the nick in the first circularized padlock probes comprises conducting an enzymatic ligation reaction.
  • closing the gap in the first circularized padlock probes comprises conducting a polymerase-catalyzed fill-in reaction using the first target cDNA molecule as a template, and conducting an enzymatic ligation reaction.
  • the method comprises closing the nick or gap in at least 2-10,000 circularized target-specific padlock probes by conducting an enzymatic reaction, thereby generating at least 2-10,000 covalently closed circular padlock probes inside the cellular sample.
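Closing a hybridized padlock needs either ligation alone (a nick) or polymerase fill-in followed by ligation (a gap). Which one applies depends on how far apart the two binding arms land on the target cDNA. The sketch below assumes perfectly complementary arms and writes the cDNA 5'→3'. It places the 5'-arm binding site upstream of the 3'-arm site, so that extension from the probe's 3' end runs toward its 5' end. The function name and example sequences are hypothetical.

```python
from typing import Optional

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence (A/C/G/T only)."""
    return seq.translate(COMPLEMENT)[::-1]


def gap_fill(target: str, arm5: str, arm3: str) -> Optional[str]:
    """Bases a polymerase must add to the padlock's 3' end before ligation.

    Returns "" for a nick (ligation only), the fill-in sequence (5'->3') for a gap,
    or None if either arm does not hybridize or the arms overlap or are misordered.
    """
    site5 = target.find(revcomp(arm5))  # where the 5' arm binds on the cDNA
    site3 = target.find(revcomp(arm3))  # where the 3' arm binds on the cDNA
    if site5 < 0 or site3 < 0:
        return None
    gap_start = site5 + len(arm5)
    if gap_start > site3:
        return None
    # Extension copies the template between the arms, so the added bases are its reverse complement.
    return revcomp(target[gap_start:site3])


cdna = "ACGTTGCAGGAT"  # hypothetical cDNA, 5'->3'
print(repr(gap_fill(cdna, arm5="CAACGT", arm3="ATCC")))          # 2-base gap
print(repr(gap_fill("ACGTTGGGAT", arm5="CAACGT", arm3="ATCC")))  # nick
```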
  • each concatemer molecule in the first plurality comprises tandem repeat units, wherein a unit comprises the sequence of the first target cDNA and (i) the first target barcode sequence (target BC-1) that uniquely identifies the first target RNA, (ii) the first batch-specific sequencing primer binding site (Batch Seq-1) (or a complementary sequence thereof), and (iii) the universal binding site for an amplification primer (universal RCA) (or a complementary sequence thereof).
  • the unit further comprises the universal compaction oligonucleotide binding site (or a complementary sequence thereof).
  • step (c) further comprises: generating inside the cellular sample a plurality of DNA concatemers which correspond to the second plurality of target RNA molecules, comprising: (1) generating a second plurality of covalently closed circular padlock probes by contacting the second plurality of target cDNA molecules with a second plurality of padlock probes, wherein the contacting is conducted under a condition suitable for hybridizing the first and second binding arms of the second padlock probes to proximal positions on their respective second target cDNA molecules to form a second plurality of circularized padlock probes each having a nick or gap between the hybridized first and second binding arms, wherein the second padlock probes include (i) a second target barcode sequence (target BC-2) that uniquely identifies the second target cDNA or RNA, (ii) a second batch-specific sequencing primer binding site (Batch Seq-2) (or a complementary sequence thereof) wherein the sequence of the second batch-specific sequencing primer binding site differs from the sequence of the
  • the rolling circle amplification reaction can be conducted in the presence or absence of a plurality of compaction oligonucleotides.
  • the method comprises contacting the plurality of cDNA molecules in the cellular sample with at least 2-10,000 different target-specific padlock probes.
  • the second padlock probe further comprises a universal compaction oligonucleotide binding site (or a complementary sequence thereof).
  • closing the nick in the second circularized padlock probes comprises conducting an enzymatic ligation reaction.
  • closing the gap in the second circularized padlock probes comprises conducting a polymerase-catalyzed fill-in reaction using the second target cDNA molecule as a template, and conducting an enzymatic ligation reaction.
  • the method comprises closing the nick or gap in at least 2-10,000 circularized target-specific padlock probes by conducting an enzymatic reaction, thereby generating at least 2-10,000 covalently closed circular padlock probes inside the cellular sample.
  • each concatemer molecule in the second plurality comprises tandem repeat units, wherein a unit comprises the sequence of the second target cDNA and (i) the second target barcode sequence (target BC-2) that uniquely identifies the second target cDNA or RNA, (ii) the second batch-specific sequencing primer binding site (Batch Seq-2) (or a complementary sequence thereof), and (iii) the universal binding site for an amplification primer (universal RCA) (or a complementary sequence thereof).
  • the unit further comprises the universal compaction oligonucleotide binding site (or a complementary sequence thereof).
  • the methods further comprise step (d): sequencing the first plurality of concatemer molecules inside the cellular sample under a condition that inhibits sequencing the second plurality of concatemers (e.g., FIG. 14).
  • in step (d), sequencing the first plurality of concatemers inside the cellular sample comprises conducting no more than 2-30 sequencing cycles to generate a plurality of first sequencing read products, wherein the sequences of the first sequencing read products are aligned with a first target reference sequence to confirm the presence of the first target RNA in the cellular sample.
  • in step (d), sequencing the first plurality of concatemers inside the cellular sample comprises conducting 1-250 sequencing cycles to generate a plurality of first sequencing read products, wherein the sequences of the first sequencing read products are aligned with a first target reference sequence to confirm the presence of the first target RNA in the cellular sample.
  • the first and second concatemers are subjected to a first sequencing workflow using first batch-specific sequencing primers, sequencing polymerases, and a plurality of nucleotide reagents.
  • the first concatemers undergo reiterative sequencing but the second concatemers do not.
  • the first and second concatemers are subjected to a second sequencing workflow using second batch-specific sequencing primers, sequencing polymerases, and a plurality of nucleotide reagents.
  • the second concatemers undergo reiterative sequencing but the first concatemers do not.
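The batch-specific gating described above can be illustrated with a toy model. This is a minimal sketch, not the described chemistry: it assumes each batch-specific primer hybridizes only to its own binding site (perfect specificity), represents binding sites by hypothetical labels such as `SEQ1` and `SEQ2` rather than by sequences, and treats each read as the first `n_cycles` bases of the target barcode. The `Concatemer` class and `sequencing_workflow` function are illustrative names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Concatemer:
    """A concatemer reduced to the two features that matter for gating."""
    target_barcode: str  # e.g. the target BC-1 or target BC-2 sequence
    batch_site: str      # label of its batch-specific primer binding site


def sequencing_workflow(concatemers, primer_site, n_cycles):
    """Sequence only the concatemers whose batch site matches the primer.

    Returns {index: read} for the concatemers that were sequenced; all other
    concatemers are left untouched (no primer bound, no read product).
    """
    reads = {}
    for i, concatemer in enumerate(concatemers):
        if concatemer.batch_site == primer_site:  # primer hybridizes here
            reads[i] = concatemer.target_barcode[:n_cycles]
    return reads


# Usage: two batches sharing one sample (hypothetical sequences and labels).
pool = [
    Concatemer("ACGTAC", "SEQ1"),
    Concatemer("TTGCAA", "SEQ2"),
    Concatemer("GGATCC", "SEQ1"),
]
first_workflow = sequencing_workflow(pool, "SEQ1", n_cycles=4)
second_workflow = sequencing_workflow(pool, "SEQ2", n_cycles=4)
```

In this model the first workflow returns reads only for the two `SEQ1` concatemers and the second only for the `SEQ2` concatemer, mirroring how each batch undergoes sequencing while the other batch does not.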
  • in step (d), only the first target barcode region (target BC-1) of the first concatemer molecules is sequenced. In some embodiments, at least a portion or the full length of the first target barcode (target BC-1) of the first concatemer molecules is sequenced. In some embodiments, the first target barcode (target BC-1) and a portion of the first cDNA region of the first concatemer molecules are sequenced.
  • sequencing the first concatemers in step (d) comprises step (1): contacting the first plurality of concatemer molecules inside the cellular sample with (i) a plurality of first batch-specific sequencing primers, (ii) a plurality of sequencing polymerases, and (iii) a plurality of nucleotide reagents, under a condition suitable for hybridizing the plurality of first batch-specific sequencing primers to their respective first batch-specific sequencing primer binding sites on the first concatemers.
  • the sequencing further comprises step (2) conducting no more than 2-30 sequencing cycles to generate a first plurality of sequencing read products using the first concatemers as template molecules.
  • the sequencing of step (d) comprises sequencing at least a portion of the first nucleic acid concatemers using an optical imaging system comprising a field-of-view (FOV) greater than 1.0 mm².
  • the plurality of first sequencing read products are detectable by imaging, and the sequencing comprises decoding the plurality of first sequencing read products from the images obtained during the no more than 2-30 sequencing cycles, or from the images obtained during the 1-250 sequencing cycles.
  • the methods further comprise step (e): removing the plurality of first sequencing read products from the first concatemer molecules and retaining the first concatemer molecules inside the cellular sample.
  • a 3’ blocking moiety can be added to the first sequencing read products to inhibit further sequencing reactions.
  • a nucleotide analog can be incorporated where the nucleotide analog inhibits incorporation of a subsequent nucleotide.
  • exemplary blocking nucleotide analogs include dideoxynucleotides or nucleotides having a 2’ or 3’ chain terminating moiety.
  • the methods further comprise step (f): reiteratively sequencing the plurality of first concatemers by repeating steps (d) and (e) at least once. In some embodiments, reiterative sequencing of step (f) is optional.
  • sequencing the first concatemers in step (f) comprises step (1): contacting the first plurality of concatemer molecules inside the cellular sample with (i) a plurality of first batch-specific sequencing primers, (ii) a plurality of sequencing polymerases, and (iii) a plurality of nucleotide reagents, under a condition suitable for hybridizing the plurality of first batch-specific sequencing primers to their respective first batch-specific sequencing primer binding sites on the first concatemers.
  • the sequencing further comprises step (2) conducting no more than 2-30 sequencing cycles to generate a first plurality of sequencing read products using the first concatemers as template molecules.
  • the sequencing further comprises step (3) removing the first plurality of sequencing read products from the first concatemers and retaining the plurality of first concatemers inside the cellular sample.
  • the sequencing further comprises step (4) repeating steps (1) - (3) at least once (e.g., FIG. 14).
  • step (4) comprises repeating steps (1) - (3) at least 2 times, at least 3 times, at least 4 times, at least 5 times, at least 6 times, at least 7 times, at least 8 times, at least 9 times, or at least 10 times.
  • step (4) comprises repeating steps (1) - (3) up to 10 times, up to 20 times, up to 30 times, up to 40 times, or up to 50 times.
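Steps (1)-(4) can be sketched as a small state model in which the read product must be stripped before a new primer is hybridized, while the concatemer itself is never removed. The `ConcatemerSite` class and `reiterative_sequencing` function are hypothetical, and the sketch assumes that every round reads the same barcode bases from the same primer site, with at most 30 cycles per round (the upper end of the "no more than 2-30 cycles" range).

```python
MAX_CYCLES_PER_ROUND = 30  # upper end of the "no more than 2-30 cycles" range


class ConcatemerSite:
    """One concatemer retained inside the cellular sample (toy model)."""

    def __init__(self, barcode):
        self.barcode = barcode    # bases readable from the sequencing primer
        self.read_product = None  # extended sequencing primer, if present

    def hybridize_primer(self):
        """Step (1): a fresh sequencing primer anneals to its binding site."""
        if self.read_product is not None:
            raise RuntimeError("previous read product was not removed")
        self.read_product = ""

    def run_cycles(self, n):
        """Step (2): extend the primer by n bases, one base per cycle."""
        start = len(self.read_product)
        self.read_product += self.barcode[start:start + n]

    def strip_read_product(self):
        """Step (3): de-hybridize the read product; the concatemer stays."""
        product, self.read_product = self.read_product, None
        return product


def reiterative_sequencing(site, cycles_per_round, rounds):
    """Step (4): repeat steps (1)-(3) for the requested number of rounds."""
    if not 1 <= cycles_per_round <= MAX_CYCLES_PER_ROUND:
        raise ValueError("cycles per round must be between 1 and 30")
    reads = []
    for _ in range(rounds):
        site.hybridize_primer()
        site.run_cycles(cycles_per_round)
        reads.append(site.strip_read_product())
    return reads
```

Modeling the read product as state that blocks re-hybridization makes explicit why step (3) has to precede the next round: the same template is re-read only after the previous read product is removed.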
  • the reiterative sequencing of the first concatemers of step (f) can be conducted using a sequencing-by-binding procedure, labeled and/or non-labeled chain-terminating nucleotides, or multivalent molecules. These three sequencing methods are described below.
  • the plurality of universal sequencing primers can be hybridized to the concatemer template molecules with a hybridization reagent comprising a saline-sodium citrate (SSC) buffer (e.g., 2X SSC) with formamide (e.g., 10-20% formamide).
  • the hybridization conditions comprise a temperature of about 20-30 °C, for about 10-60 minutes.
  • the plurality of sequencing read products can be removed from the concatemers, while the plurality of concatemers is retained inside the cellular sample, using a de-hybridization reagent comprising a saline-sodium citrate (SSC) buffer, with or without formamide, at a temperature that promotes nucleic acid denaturation, for example 30-90 °C.
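The example conditions above can be captured as a simple parameter check. The thresholds come from the ranges in the text (about 20-30 °C and 10-60 minutes with 10-20% formamide for primer hybridization; 30-90 °C for de-hybridization), but treating "about" as hard limits is an assumption of this sketch, and `WashStep` and the check functions are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WashStep:
    temp_c: float         # incubation temperature in °C
    minutes: float        # incubation time
    formamide_pct: float  # % formamide in the SSC buffer (0 if omitted)


def check_primer_hybridization(step):
    """List the settings outside the example hybridization ranges."""
    issues = []
    if not 20 <= step.temp_c <= 30:
        issues.append("temperature")
    if not 10 <= step.minutes <= 60:
        issues.append("time")
    if not 10 <= step.formamide_pct <= 20:
        issues.append("formamide")
    return issues


def check_dehybridization(step):
    """De-hybridization: SSC with or without formamide at 30-90 °C."""
    return [] if 30 <= step.temp_c <= 90 else ["temperature"]
```

An empty list means the proposed step falls inside the example ranges; a non-empty list names the parameters to revisit.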
  • the methods further comprise step (g): sequencing the second plurality of concatemer molecules inside the cellular sample under a condition that inhibits sequencing the first plurality of concatemers (e.g., FIG. 14).
  • in step (g), sequencing the second plurality of concatemers inside the cellular sample comprises conducting no more than 2-30 sequencing cycles to generate a plurality of second sequencing read products, wherein the sequences of the second sequencing read products are aligned with a second target reference sequence to confirm the presence of the second target RNA in the cellular sample.
  • in step (g), sequencing the second plurality of concatemers inside the cellular sample comprises conducting 1-250 sequencing cycles to generate a plurality of second sequencing read products, wherein the sequences of the second sequencing read products are aligned with a second target reference sequence to confirm the presence of the second target RNA in the cellular sample.
  • in step (g), only the second target barcode region (target BC-2) of the second concatemer molecules is sequenced. In some embodiments, at least a portion or the full length of the second target barcode (target BC-2) of the second concatemer molecules is sequenced. In some embodiments, the second target barcode (target BC-2) and a portion of the second cDNA region of the second concatemer molecules are sequenced.
  • sequencing the second concatemers in step (g) comprises step (1): contacting the second plurality of concatemer molecules inside the cellular sample with (i) a plurality of second batch-specific sequencing primers, (ii) a plurality of sequencing polymerases, and (iii) a plurality of nucleotide reagents, under a condition suitable for hybridizing the plurality of second batch-specific sequencing primers to their respective second batch-specific sequencing primer binding sites on the second concatemers.
  • the sequencing further comprises step (2) conducting no more than 2-30 sequencing cycles to generate a second plurality of sequencing read products using the second concatemers as template molecules.
  • the sequencing of step (g) comprises sequencing at least a portion of the second nucleic acid concatemers using an optical imaging system comprising a field-of-view (FOV) greater than 1.0 mm².
  • the plurality of second sequencing read products are detectable by imaging, and wherein the sequencing comprises decoding the plurality of second sequencing read products from the images obtained during the no more than 2-30 sequencing cycles, or from the images obtained during the 1-250 sequencing cycles.
  • the methods further comprise step (h): removing the plurality of second sequencing read products from the second concatemer molecules and retaining the second concatemer molecules inside the cellular sample.
  • a 3’ blocking moiety can be added to the second sequencing read products to inhibit further sequencing reactions.
  • a nucleotide analog can be incorporated where the nucleotide analog inhibits incorporation of a subsequent nucleotide.
  • exemplary blocking nucleotide analogs include dideoxynucleotides or nucleotides having a 2’ or 3’ chain terminating moiety.
  • the methods further comprise step (i): reiteratively sequencing the plurality of second concatemers by repeating steps (g) and (h) at least once. In some embodiments, reiterative sequencing of step (i) is optional.
  • sequencing the second concatemers in step (i) comprises step (1): contacting the second plurality of concatemer molecules inside the cellular sample with (i) a plurality of second batch-specific sequencing primers, (ii) a plurality of sequencing polymerases, and (iii) a plurality of nucleotide reagents, under a condition suitable for hybridizing the plurality of second batch-specific sequencing primers to their respective second batch-specific sequencing primer binding sites on the second concatemers.
  • the sequencing further comprises step (2) conducting no more than 2-30 sequencing cycles to generate a second plurality of sequencing read products using the second concatemers as template molecules.
  • the sequencing further comprises step (3) removing the second plurality of sequencing read products from the second concatemers and retaining the plurality of second concatemers inside the cellular sample.
  • the sequencing further comprises step (4) repeating steps (1) - (3) at least once (e.g., FIG. 14).
  • step (4) comprises repeating steps (1) - (3) at least 2 times, at least 3 times, at least 4 times, at least 5 times, at least 6 times, at least 7 times, at least 8 times, at least 9 times, or at least 10 times.
  • step (4) comprises repeating steps (1) - (3) up to 10 times, up to 20 times, up to 30 times, up to 40 times, or up to 50 times.
  • the reiterative sequencing of the second concatemers of step (i) can be conducted using a sequencing-by-binding procedure, labeled and/or non-labeled chain-terminating nucleotides, or multivalent molecules. These three sequencing methods are described below.
  • the plurality of nucleotide reagents of steps (d) and (g) comprise a plurality of nucleotides that are detectably labeled or non-labeled.
  • individual nucleotides are linked to a detectable reporter moiety.
  • the detectable reporter moiety comprises a fluorophore.
  • the plurality of detectably labeled nucleotide analogs comprise a plurality of chain terminating nucleotides, where the chain terminating moiety is linked to the 3’ nucleotide sugar position to form a 3’ blocked nucleotide analog.
  • the chain terminating moiety can be removed to convert the 3’ blocked nucleotide analog to an extendible nucleotide having a 3’ OH group on the sugar.
  • each labeled nucleotide analog is linked to a fluorophore that corresponds to its nucleobase (adenine, cytosine, guanine, thymine or uracil), with a different fluorophore for each nucleobase, where each fluorophore emits a fluorescent signal.
  • a sequencing cycle comprises (1) contacting the concatemer/sequencing primer duplex with a sequencing polymerase and a detectably labeled chain terminating nucleotide under a condition suitable for polymerase-catalyzed incorporation of the detectably labeled chain terminating nucleotide into the terminal end of the sequencing primer, (2) detecting and imaging the fluorescent signal and color emitted by the incorporated chain terminating nucleotide, and (3) removing the chain terminating moiety (e.g., unblocking) and the fluorophore from the incorporated nucleotide and retaining the concatemer/sequencing primer duplex.
  • no more than 2-30 sequencing cycles are conducted on the plurality of concatemers inside the cellular sample to generate a plurality of sequencing read products.
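One labeled chain-terminator cycle (incorporate, image, unblock) and the base call it supports can be illustrated with a minimal simulation of a single fluorescent spot. The four-channel layout, the fixed crosstalk value, and the brightest-channel base call are simplifying assumptions, not the instrument's actual image analysis; all names are hypothetical.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}
CHANNELS = ("A", "C", "G", "T")  # assumed: one dye color per nucleobase


def image_spot(incorporated_base, crosstalk=0.1):
    """Simulated intensities for one spot in one cycle: the dye of the
    incorporated nucleotide is bright; other channels show fixed crosstalk."""
    return {ch: 1.0 if ch == incorporated_base else crosstalk for ch in CHANNELS}


def call_base(intensities):
    """Call the base whose channel is brightest."""
    return max(intensities, key=intensities.get)


def sequence_spot(template_bases, n_cycles):
    """Run n_cycles of labeled chain-terminator sequencing on one spot.

    template_bases: template bases in the order the polymerase copies them
    (read 3'->5' along the template from the primer's 3' end).
    """
    if n_cycles > 30:
        raise ValueError("this sketch runs no more than 30 cycles")
    read = []
    for i in range(n_cycles):
        # (1) a single blocked, labeled nucleotide is incorporated
        incorporated = COMPLEMENT[template_bases[i]]
        # (2) the spot is imaged and the brightest channel is called
        read.append(call_base(image_spot(incorporated)))
        # (3) unblocking and dye cleavage: nothing carries over between cycles
    return "".join(read)
```

Because the 3’ block allows exactly one incorporation per cycle, each image corresponds to one base position, which is what lets the cycle index map directly to the read position.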
  • the sequence of the first sequencing read product can be determined and aligned with a first reference sequence to confirm the presence of the first target RNA molecules inside the cellular sample.
  • the sequence of the second sequencing read product can be determined and aligned with a second reference sequence to confirm the presence of the second target RNA molecules inside the cellular sample.
  • the sequences of the first and second sequencing read products can be aligned after each round that generates first and second sequencing read products of no more than 30 bases in length, or after generating a set of reiterative sequencing read products in which each of the first and second sequencing read products is no more than 30 bases in length.
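For reads of no more than 30 bases, the alignment step can be approximated by a mismatch-tolerant comparison against each target reference. The sketch below substitutes a simple Hamming-distance prefix match for a full aligner; the reference names, the one-mismatch threshold, and the rule that ties return no call are assumptions made for illustration.

```python
def hamming(a, b):
    """Number of mismatched positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def confirm_target(read, references, max_mismatches=1):
    """Return the name of the single best-matching target, or None.

    Each reference is compared over its first len(read) bases; a target is
    confirmed only if it is within max_mismatches and not tied with another.
    """
    if len(read) > 30:
        raise ValueError("reads in this workflow are no more than 30 bases")
    scored = sorted(
        (hamming(read, ref[:len(read)]), name)
        for name, ref in references.items()
        if len(ref) >= len(read)
    )
    if not scored or scored[0][0] > max_mismatches:
        return None
    if len(scored) > 1 and scored[1][0] == scored[0][0]:
        return None  # ambiguous: two targets match equally well
    return scored[0][1]


# Hypothetical barcode-bearing reference sequences.
references = {"target-1": "ACGTACGTAA", "target-2": "TTGCAAGGCC"}
```

Short barcode reads are well suited to this kind of exact-length comparison because the reference set is small and known in advance; a production pipeline would typically use a proper aligner or barcode decoder.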
  • the sequencing reactions are conducted on a sequencing apparatus having a detector that captures fluorescent signals from the sequencing reactions inside the cellular sample.
  • the sequencing apparatus can be configured to relay the fluorescent signal data captured by the detector to a computer system that is programmed to display images of different fluorescent spots which are co-located in the cellular sample, where individual fluorescent spots correspond to different target RNA molecules.
  • when the sequencing is conducted using different fluorescently-labeled nucleotide reagents that correspond to different nucleobases (e.g., A, G, C, T/U), the images can have different color fluorescent spots co-located in the same cellular sample at different sequencing cycles.
  • out-of-sync phasing and/or pre-phasing events can occur during synchronized sequencing reactions on clonally amplified template amplicons, where the sequencing reactions comprise polymerase-catalyzed sequencing reactions employing detectably labeled chain terminator nucleotides.
  • a sequencing reaction on one template molecule within the clonally-amplified template molecules can move ahead of (pre-phasing) or fall behind (phasing) the sequencing of the other template molecules.
  • a fluorescent signal is typically detected which corresponds to incorporation of a labeled chain terminator nucleotide.
  • phasing and pre-phasing events can be detected and monitored using incorporation of a labeled chain terminator nucleotide.
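The effect of phasing and pre-phasing on a clonal spot can be illustrated with a simple probabilistic model. It assumes that in each cycle every template copy advances by one base with probability 1 - p - q, lags (phasing) with probability p, or runs one extra base ahead (pre-phasing) with probability q; real rates vary with cycle and chemistry, and the function name is hypothetical. The returned values are the fraction of copies that are in phase after each cycle, i.e. the copies whose signal reports the intended position.

```python
def phasing_profile(n_cycles, phasing, prephasing):
    """In-phase fraction of template copies after each of n_cycles cycles."""
    if phasing < 0 or prephasing < 0 or phasing + prephasing > 1:
        raise ValueError("rates must be non-negative and sum to at most 1")
    dist = [1.0]  # dist[k] = fraction of copies extended by k bases
    in_phase = []
    for cycle in range(1, n_cycles + 1):
        new = [0.0] * (len(dist) + 2)
        for k, frac in enumerate(dist):
            new[k] += frac * phasing                         # falls behind
            new[k + 1] += frac * (1 - phasing - prephasing)  # in step
            new[k + 2] += frac * prephasing                  # runs ahead
        dist = new
        in_phase.append(dist[cycle])
    return in_phase
```

With no pre-phasing the in-phase fraction after n cycles is (1 - p)^n, so even a 1% phasing rate leaves roughly 74% of copies in phase after 30 cycles; this compounding is one reason short reads of no more than 2-30 cycles are attractive.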
  • the plurality of nucleotide reagents of steps (d) and (g) comprise a plurality of multivalent molecules each comprising a core attached to a plurality of nucleotide-arms, wherein the nucleotide-arms are attached to a nucleotide unit.
  • individual multivalent molecules are labeled with a detectable reporter moiety.
  • the detectable reporter moiety comprises a fluorophore.
  • the core of the multivalent molecule is labeled with a fluorophore, and the fluorophore attached to a given core corresponds to the nucleotide base (e.g., adenine, guanine, cytosine, thymine or uracil) of its nucleotide arms.
  • at least one of the nucleotide arms of the multivalent molecule comprises a linker and/or nucleotide base that is attached to a fluorophore, and the fluorophore attached to a given nucleotide base corresponds to that nucleotide base (e.g., adenine, guanine, cytosine, thymine or uracil).
  • a sequencing cycle comprises (1) contacting the concatemer/sequencing primer duplex with a first sequencing polymerase to form a complexed polymerase, (2) contacting the complexed polymerase with a detectably labeled multivalent molecule under a condition suitable for binding a complementary nucleotide unit of the multivalent molecule to the complexed polymerase thereby forming a multivalent-binding complex, and the condition is suitable for inhibiting incorporation of the complementary nucleotide unit into the terminal end of the sequencing primer, (3) detecting and imaging the fluorescent signal and color emitted by the bound detectably labeled multivalent molecule, (4) removing the first sequencing polymerase and the bound detectably labeled multivalent molecule, and retaining the concatemer/sequencing primer duplex, (5) contacting the retained concatemer/sequencing primer duplex with a second sequencing polymerase and a non-labeled chain terminating nucleotide under a condition suitable for polymerase-cata
  • no more than 2-30 sequencing cycles are conducted on the plurality of concatemers inside the cellular sample to generate a plurality of sequencing read products.
  • the sequence of the first sequencing read product can be determined and aligned with a first reference sequence to confirm the presence of the first target RNA molecules inside the cellular sample.
  • the sequence of the second sequencing read product can be determined and aligned with a second reference sequence to confirm the presence of the second target RNA molecules inside the cellular sample.
  • the sequences of the first and second sequencing read products can be aligned after each round that generates first and second sequencing read products of no more than 30 bases in length, or after generating a set of reiterative sequencing read products in which each of the first and second sequencing read products is no more than 30 bases in length.
  • the sequencing reactions are conducted on a sequencing apparatus having a detector that captures fluorescent signals from the sequencing reactions inside the cellular sample.
  • the sequencing apparatus can be configured to relay the fluorescent signal data captured by the detector to a computer system that is programmed to display images of different fluorescent spots which are co-located in the cellular sample, where individual fluorescent spots correspond to different target RNA molecules.
  • individual sequencing cycles can be completed in less than 30 minutes.
  • the field of view (FOV) can exceed 1 mm², and the cycle time for scanning a large area (> 10 mm²) can be less than 5 minutes.
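The relationship between field of view, scan area, and scan time can be worked through with a short calculation. This sketch assumes non-overlapping tiles and ignores stage movement and stitching overlap, so the per-tile time is only an upper budget; `scan_budget` is a hypothetical helper.

```python
import math


def scan_budget(area_mm2, fov_mm2, total_time_s):
    """Tiles needed to cover area_mm2 and the time available per tile."""
    tiles = math.ceil(area_mm2 / fov_mm2)
    return tiles, total_time_s / tiles


tiles_at_1mm2, seconds_per_tile = scan_budget(10.0, 1.0, 5 * 60)
tiles_at_1_2mm2, _ = scan_budget(10.0, 1.2, 5 * 60)
```

For example, covering 10 mm² with a 1 mm² field of view takes 10 tiles, leaving 30 seconds per tile within a 5-minute scan; a 1.2 mm² field of view reduces this to 9 tiles, which is why a FOV larger than 1 mm² helps keep large-area cycle times short.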
  • the plurality of RNA or cDNA inside the cellular sample can be amplified to generate amplicons of the RNA or cDNA where the amplicons comprise concatemers.
  • the plurality of RNA or cDNA molecules inside the cellular sample can be amplified by conducting a padlock probe circularization and rolling circle amplification workflow.
  • the methods comprise contacting the plurality of RNA or cDNA molecules inside the cellular sample with a plurality of padlock probes, including a first plurality of target-specific padlock probes that hybridize with first target RNA or cDNA molecules, and a second plurality of target-specific padlock probes that hybridize with second target RNA or cDNA molecules.
  • the padlock probes comprise single-stranded oligonucleotides.
  • the padlock probes comprise DNA, RNA, or DNA and RNA.
  • individual padlock probes comprise an internal region between the first and second terminal regions, where the internal region comprises at least one universal adaptor sequence including a sample barcode sequence, an amplification primer binding site, a sequencing primer binding site, a compaction oligonucleotide binding site and/or a surface capture primer binding site (FIG. 6).
  • the padlock probes comprise at least one target barcode sequence that corresponds to a given target RNA or target cDNA to which the padlock probe binds.
  • the padlock probes comprise at least one unique identification sequence (e.g., unique molecular index (UMI)).
  • the padlock probes comprise at least one restriction enzyme recognition sequence.
  • a padlock probe comprises a single-stranded nucleic acid molecule having two terminal regions (e.g., first and second binding arms) and an internal region.
  • the first terminal region of an individual padlock probe has a first target-specific sequence that selectively hybridizes to a first region of a target RNA or target cDNA molecule.
  • the second terminal region of the individual padlock probe has a second target-specific sequence that selectively hybridizes to a second region of the same target RNA or target cDNA molecule.
  • the internal region of a padlock probe comprises a target barcode sequence (e.g., Target BC-1 or Target BC-2, left and right schematics respectively) which corresponds to a given target RNA or target cDNA.
  • the target barcode sequence uniquely identifies the target RNA or target cDNA.
  • the internal region of a padlock probe comprises a universal primer binding site for a sequencing primer (or a complementary sequence thereof).
  • the internal region of a padlock probe comprises a universal primer binding site for a rolling circle amplification primer (or a complementary sequence thereof).
  • the internal region of a padlock probe comprises a universal binding site for a compaction oligonucleotide (or a complementary sequence thereof).
  • the internal region of a padlock probe includes a target barcode sequence and at least one universal primer binding site (e.g., for binding a sequencing primer, for binding a rolling circle amplification primer and/or for binding a compaction oligonucleotide) in any arrangement and orientation (FIG. 6, top and bottom).
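The arm and internal-region layout can be sketched as string assembly, with the target and probe both written 5'→3'. In this minimal sketch the 5’ arm pairs with the bases just upstream of a chosen junction on the target and the 3’ arm pairs with the bases downstream of it, so the hybridized probe ends meet at a nick (or leave a gap when `gap` > 0). The internal-region sequences and their order are hypothetical placeholders, since the text allows any arrangement; `design_padlock` is an illustrative name.

```python
COMP = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Reverse complement of a DNA sequence written 5'->3'."""
    return seq.translate(COMP)[::-1]


def design_padlock(target, junction, arm_len, internal, gap=0):
    """Build a linear padlock probe (5'->3') for a target cDNA (5'->3').

    5' arm: pairs with target[junction - arm_len:junction].
    3' arm: pairs with arm_len bases starting `gap` bases after junction.
    """
    start, end = junction - arm_len, junction + gap + arm_len
    if start < 0 or end > len(target):
        raise ValueError("binding arms fall outside the target")
    upstream = target[start:junction]
    downstream = target[junction + gap:end]
    return revcomp(upstream) + internal + revcomp(downstream)


# Hypothetical internal-region parts: target barcode, sequencing primer
# site, RCA primer site, compaction oligonucleotide site (order arbitrary).
BARCODE, SEQ_SITE, RCA_SITE, COMPACTION_SITE = "ACGTAC", "GGTTCC", "TTAGGA", "CACACA"
INTERNAL = BARCODE + SEQ_SITE + RCA_SITE + COMPACTION_SITE
```

Placing the reverse complement of the upstream region at the 5’ end and of the downstream region at the 3’ end is what brings the two probe termini next to each other on the target, forming the nick or gap that is later closed.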
  • individual padlock probes comprise first and second terminal regions (e.g., first and second binding arms) that hybridize to portions of target RNA or target cDNA molecules to form a plurality of RNA-padlock probe complexes or a plurality of cDNA-padlock probe complexes, wherein individual complexes have the first and second terminal probe regions hybridized to proximal regions of an RNA or cDNA molecule to form a nick or gap between the first and second terminal probe ends.
  • the first terminal region of an individual padlock probe has a first target-specific sequence that selectively hybridizes to a first region of a target RNA or cDNA molecule.
  • the second terminal region of the individual padlock probe has a second target-specific sequence that selectively hybridizes to a second region of the same target RNA or cDNA molecule, where a nick or gap is formed between the hybridized first and second terminal regions, thereby circularizing the padlock probe (e.g., FIG. 7).
  • the first padlock probe comprises (i) a first target barcode sequence (target BC-1) that uniquely identifies the first target RNA or the first target cDNA, (ii) a first sequencing primer binding site (or a complementary sequence thereof), (iii) a universal binding site for an amplification primer (universal RCA) (or a complementary sequence thereof), and (iv) a universal binding site for a compaction oligonucleotide (or a complementary sequence thereof).
  • the second padlock probe comprises (i) a second target barcode sequence (target BC-2) that uniquely identifies the second target RNA or the second target cDNA, (ii) a second sequencing primer binding site (or a complementary sequence thereof), (iii) a universal binding site for an amplification primer (universal RCA) (or a complementary sequence thereof), and (iv) a universal binding site for a compaction oligonucleotide (or a complementary sequence thereof).
  • the padlock probes comprise canonical nucleotides and/or nucleotide analogs.
  • the padlock probes are modified to confer resistance to nuclease degradation (e.g., ribonuclease degradation).
  • the padlock probes comprise at least one phosphorothioate diester bond at their 5’ ends which can render the padlock probes resistant to nuclease degradation.
  • the padlock probes comprise 2-5 or more consecutive phosphorothioate diester bonds at their 5’ ends.
  • the padlock probes comprise at least one ribonucleotide and/or at least one 2’-O-methyl, 2’-O-methoxyethyl (MOE), or 2’-fluoro nucleotide.
  • the padlock probes comprise phosphorylated 3’ ends.
  • the padlock probes comprise at least one locked nucleic acid (LNA) base.
  • the padlock probes comprise a phosphorylated 5’ end (e.g., using a polynucleotide kinase).
  • individual padlock probes in a set of padlock probes comprise first and second terminal regions that hybridize to the same target regions of the target RNA or cDNA molecules to form a plurality of RNA-padlock probe complexes or a plurality of cDNA-padlock probe complexes having the same RNA or cDNA sequence.
  • a set of padlock probes (e.g., a plurality of padlock probes) comprise at least two sub-sets of padlock probes.
  • individual padlock probes in a first sub-set of padlock probes comprise first and second terminal regions that hybridize to the same target regions (e.g., a first target region) of the target RNA or cDNA molecules to form a first plurality of RNA-padlock probe complexes or a first plurality of cDNA-padlock probe complexes having the same RNA or cDNA sequence.
  • individual padlock probes in a second sub-set of padlock probes comprise first and second terminal regions that hybridize to the same target regions (e.g., a second target region) of the target RNA or cDNA molecules to form a second plurality of RNA-padlock probe complexes or a second plurality of cDNA-padlock probe complexes having the same RNA or cDNA sequence.
  • the first and second sub-sets of padlock probes hybridize to different target regions of the same target RNA or cDNA molecules.
  • the first and second subsets of padlock probes hybridize to different target regions of different target RNA or cDNA molecules.
  • the set of padlock probes comprise 2-10 subsets of padlock probes, or 10-25 sub-sets of padlock probes, or 25-50 sub-sets of padlock probes, or up to 100 sub-sets of padlock probes. In some embodiments, the set of padlock probes comprise at least 100 sub-sets of padlock probes, at least 500 sub-sets of padlock probes, at least 1000 sub-sets of padlock probes, at least 10,000 sub-sets of padlock probes, or more sub-sets of padlock probes.
  • the nicks can be enzymatically ligated to generate covalently closed circular padlock probes.
  • the ligase enzyme can discriminate between matched and mis-matched hybridized ends to ensure target-specific hybridization.
  • the ligation reaction comprises use of a ligase enzyme, including a T3, T4, T7 or Taq DNA ligase enzyme.
  • the size of the gap between the hybridized first and second terminal regions is 1-25 bases.
  • the 3’-OH end of the hybridized padlock probe can serve as an initiation site for a polymerase-catalyzed fill-in reaction (e.g., gap fill-in reaction) using the target cDNA molecule (or the target RNA molecule) as a template. After the fill-in reaction, the remaining nick can be enzymatically ligated to generate covalently closed circular padlock probes.
  • the gap-filling reaction comprises contacting the circularized padlock probe with a DNA polymerase and a plurality of nucleotides.
  • the DNA polymerase comprises E. coli DNA polymerase I, Klenow fragment of E. coli DNA polymerase I, T7 DNA polymerase, or T4 DNA polymerase.
  • the ligase enzyme can discriminate between matched and mismatched hybridized ends to ensure target-specific hybridization.
  • the ligation reaction comprises use of a ligase enzyme, including a T3, T4, T7 or Taq DNA ligase enzyme.
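As an illustration only, the gap fill-in and ligation steps described above can be modeled as string operations. This is a sketch under our own assumptions, not a protocol from the source: sequences are DNA written 5’ to 3’, the target reads 5’-[5’ arm site][gap][3’ arm site]-3’, both arms have the same length, and the hypothetical `backbone` argument stands in for whatever universal adaptor, unique molecular index or restriction sequences the probe carries. The names `gap_fill_and_ligate` and `revcomp` are illustrative.

```python
# Simplified string model of padlock-probe gap fill-in and ligation.
# Assumptions (ours, not the source's): DNA written 5'->3'; the target reads
# 5'-[5' arm site][gap][3' arm site]-3'; both arms have the same length;
# `backbone` is a hypothetical stand-in for adaptor/UMI/restriction sequences.

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def gap_fill_and_ligate(target: str, arm5_start: int, arm_len: int,
                        gap_len: int, backbone: str) -> str:
    """Return the covalently closed circle, linearized at the probe 5' end.

    The probe 5' arm pairs with the target segment on the 5' side of the
    gap and the 3' arm with the segment on its 3' side. A polymerase extends
    the probe 3' end across the gap using the target as template, and a
    ligase then seals the remaining nick against the probe 5' end.
    """
    if not 1 <= gap_len <= 25:
        raise ValueError("this sketch only models gaps of 1-25 bases")
    gap_start = arm5_start + arm_len
    site3_start = gap_start + gap_len
    site5 = target[arm5_start:gap_start]
    gap = target[gap_start:site3_start]
    site3 = target[site3_start:site3_start + arm_len]
    if arm5_start < 0 or len(site3) != arm_len:
        raise ValueError("probe arms do not fit within the target")
    probe = revcomp(site5) + backbone + revcomp(site3)  # before fill-in
    fill = revcomp(gap)  # bases added to the probe 3' end by the polymerase
    return probe + fill  # ligation joins `fill` to the probe 5' end


circle = gap_fill_and_ligate("TTGACCATGCAAGTCGATCCGA", 2, 5, 4, "CTGAAGG")
print(circle)  # TGGTCCTGAAGGCGACTTGCA
```

In this model the closed circle always contains, read across the ligation junction, the reverse complement of the whole target span (5’ arm site, gap and 3’ arm site), which shows why the filled-in gap bases end up captured in the circle.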
  • the plurality of covalently closed circular padlock probes can be subjected to a rolling circle amplification reaction to generate a plurality of concatemer molecules, each having two or more tandem copies of a unit, wherein the unit comprises a target sequence that corresponds to a target RNA molecule and any additional sequence(s) carried by the padlock probes, including universal adaptor sequence(s), unique molecular index sequence(s) and/or restriction enzyme recognition sequence(s).
  • the rolling circle amplification reaction comprises contacting the covalently closed circularized padlock probes with an amplification primer (e.g., a universal rolling circle amplification primer), a strand-displacing DNA polymerase, and a plurality of nucleotides, under a condition suitable for hybridizing individual amplification primers to a covalently closed padlock probe, and under a condition suitable for conducting primer extension using the covalently closed padlock probe as a template molecule to generate a nucleic acid concatemer.
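The rolling circle amplification step can likewise be sketched as a string model. The assumptions are ours, not the source's: the circle is written 5’ to 3’ and read cyclically, the primer is perfectly complementary to `circle[site:site + k]`, and the strand-displacing polymerase simply keeps copying the circle until an arbitrary product length is reached (real concatemer lengths vary). `rca_product` is a hypothetical helper name.

```python
# String model of rolling circle amplification (RCA) on a closed circle.
# Assumptions (ours): the circle is written 5'->3' and read cyclically; the
# primer is perfectly complementary to circle[site:site + k]; synthesis
# simply continues until `length` bases (real RCA product lengths vary).

COMP = {"A": "T", "C": "G", "G": "C", "T": "A"}


def rca_product(circle: str, site: int, k: int, length: int) -> str:
    """Return the concatemer 5'->3', beginning with the primer itself.

    The primer 5' end pairs with circle[site + k - 1] and its 3' end with
    circle[site]. Extension proceeds toward lower circle indices, wrapping
    around, so the product is the circle's reverse complement in tandem.
    """
    n = len(circle)
    start = site + k - 1
    return "".join(COMP[circle[(start - j) % n]] for j in range(length))


circle = "ACGTTGCAAGG"
product = rca_product(circle, site=3, k=4, length=3 * len(circle))
print(product[:4])  # GCAA (the primer sequence)
```

Because the product is the complement of the circle read in reverse, every stretch of `len(circle)` bases is a rotation of the circle’s reverse complement. Applied to a padlock circle that carries the reverse complement of a target span, each tandem unit in this model therefore carries the target span in its original orientation.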
  • the plurality of nucleotides in the rolling circle amplification reaction comprise any mixture of two or more of dATP, dGTP, dCTP, dTTP and/or dUTP.
  • any of the rolling circle amplification reactions described herein can be conducted in the presence or in the absence of a plurality of compaction oligonucleotides.
  • when the rolling circle amplification reaction includes a plurality of nucleotides that includes dUTP, the resulting concatemer can be cross-linked to a cross-linking reactive group by treating the cellular sample with a succinimide ester (NHS), maleimide (Sulfo-SMCC), imidoester (DMP), carbodiimide (DCC, EDC) or phenyl azide.
  • polymerization of the cross-linking reactive group can be initiated with light or UV light.
  • the resulting concatemer can be cross-linked to a matrix by treating the cellular sample with a cross-linked agarose, cross-linked dextran or cross-linked polyethylene glycol (PEG), polyacrylamide, cellulose alginate or polyamide.
  • the PEG comprises a sulfo-NHS ester moiety at one or both ends, for example a PEGylated bis(sulfosuccinimidyl)suberate (e.g., BS(PEG)9 from Thermo Fisher Scientific, catalog No. 21582).
  • the rolling circle amplification reaction can be conducted at a constant temperature (e.g., isothermal) wherein the constant temperature is at room temperature to about 30 °C, or about 30 - 40 °C, or about 40 - 50 °C, or about 50 - 65 °C.
  • the DNA polymerase having a strand-displacing activity can be selected from a group consisting of phi29 DNA polymerase, large fragment of Bst DNA polymerase, large fragment of Bsu DNA polymerase, Bca (exo-) DNA polymerase, Klenow fragment of E. coli DNA polymerase, T5 polymerase, M-MuLV reverse transcriptase, HIV viral reverse transcriptase, and Deep Vent DNA polymerase.
  • the phi29 DNA polymerase can be wild type phi29 DNA polymerase (e.g., MagniPhi from Expedeon), variant EquiPhi29 DNA polymerase (e.g., from Thermo Fisher Scientific), or chimeric QualiPhi DNA polymerase (e.g., from 4basebio).
  • the rolling circle amplification primers can be modified to increase resistance to nuclease degradation.
  • the rolling circle amplification primers comprise at least one phosphorothioate diester bond at their 5’ ends which can render the amplification primers resistant to exonuclease degradation.
  • the rolling circle amplification primers comprise 2-5 or more consecutive phosphorothioate diester bonds at their 5’ ends.
  • the rolling circle amplification primers comprise at least one ribonucleotide and/or at least one 2’-O-methyl or 2’-O-methoxyethyl (MOE) nucleotide.
  • the rolling circle amplification reaction can be conducted in the presence of a plurality of compaction oligonucleotides which, when hybridized to a concatemer molecule, compacts the size and/or shape of the concatemer to form a compact nanoball.
  • the compaction oligonucleotides comprise single stranded oligonucleotides having a first region at one end that hybridizes to a portion of a concatemer molecule and a second region at the other end that hybridizes to another portion of the same concatemer molecule, where hybridization of the compaction oligonucleotide to a given concatemer compacts the size and/or shape of the concatemer.
  • the compaction oligonucleotides include a 5’ region, an optional internal region (intervening region), and a 3’ region.
  • the 5’ and 3’ regions of the compaction oligonucleotide can hybridize to any portions of the concatemer.
  • the 5’ and 3’ regions of the compaction oligonucleotide can hybridize to different portions of the concatemer to pull together distal portions of the concatemer causing compaction of the concatemer to form a DNA nanoball.
  • the 5’ region of the compaction oligonucleotide is designed to hybridize to a first portion of the concatemer molecule (e.g., a universal compaction oligonucleotide binding site), and the 3’ region of the compaction oligonucleotide is designed to hybridize to a second portion of the concatemer molecule (e.g., a universal compaction oligonucleotide binding site).
  • Inclusion of compaction oligonucleotides during RCA can promote formation of DNA nanoballs having tighter size and shape compared to concatemers generated in the absence of the compaction oligonucleotides.
  • the compact and stable characteristics of the DNA nanoballs improve in situ sequencing accuracy by increasing signal intensity, and the nanoballs retain their shape and size during multiple sequencing cycles.
  • the compaction oligonucleotides comprise single stranded oligonucleotides comprising DNA, RNA, or a combination of DNA and RNA.
  • the compaction oligonucleotides can be any length, including 20-150 nucleotides, or 30-100 nucleotides, or 40-80 nucleotides in length.
  • the compaction oligonucleotides comprise a 5’ region and a 3’ region, and optionally an intervening region between the 5’ and 3’ regions.
  • the intervening region can be any length, for example about 2-20 nucleotides in length.
  • the intervening region comprises a homopolymer having consecutive identical bases (e.g., AAA, GGG, CCC, TTT or UUU).
  • the intervening region comprises a non-homopolymer sequence.
  • the 5’ region of the compaction oligonucleotides can be wholly complementary or partially complementary along its length to a first portion of a concatemer molecule.
  • the 3’ region of the compaction oligonucleotides can be wholly complementary or partially complementary along its length to a second portion of a concatemer molecule.
  • the 5’ region of the compaction oligonucleotides can hybridize to a first universal sequence portion of a concatemer molecule.
  • the 3’ region of the compaction oligonucleotides can hybridize to a second universal sequence portion of a concatemer molecule.
  • the 5’ region of the compaction oligonucleotide can have the same sequence as the 3’ region.
  • the 5’ region of the compaction oligonucleotide can have a sequence that is different from the 3’ region.
  • the 3’ region of the compaction oligonucleotide can have a sequence that is a reverse sequence of the 5’ region.
  • the 5’ region of the compaction oligonucleotide can have a sequence that is a reverse sequence of the 3’ region.
  • the 3’ region of any of the compaction oligonucleotides can include an additional three bases at the terminal 3’ end which comprise 2’-O-methyl RNA bases (e.g., designated mUmUmU), or the terminal 3’ end lacks additional 2’-O-methyl RNA bases.
  • the compaction oligonucleotides comprise one or more modified bases or linkages at their 5’ or 3’ ends to confer certain functionalities. In some embodiments, the compaction oligonucleotides comprise at least one phosphorothioate linkage at their 5’ and/or 3’ ends to confer exonuclease resistance. In some embodiments, at least one nucleotide at or near the 3’ end comprises a 2’-fluoro base which confers exonuclease resistance. In some embodiments, the 3’ end of the compaction oligonucleotides comprises at least one 2’-O-methyl RNA base which blocks polymerase-catalyzed extension.
  • the 3’ end of the compaction oligonucleotide comprises three 2’-O-methyl RNA bases (e.g., designated mUmUmU).
  • the compaction oligonucleotides comprise a 3’ inverted dT at their 3’ ends which blocks polymerase-catalyzed extension.
  • the compaction oligonucleotides comprise 3’ phosphorylation which blocks polymerase-catalyzed extension.
  • the internal region of the compaction oligonucleotides comprise at least one locked nucleic acid (LNA) which increases the thermal stability of duplexes formed by hybridizing a compaction oligonucleotide to a concatemer molecule.
  • the compaction oligonucleotides comprise a phosphorylated 5’ end (e.g., using a polynucleotide kinase).
  • the compaction oligonucleotide comprises the sequence 5’-CATGTAATGCACGTACTTTCAGGGTAAACATGTAATGCACGTACTTT
  • the compaction oligonucleotides include an additional three bases at the terminal 3’ end which comprise 2’-O-methyl RNA bases (e.g., designated mUmUmU), or the terminal 3’ end lacks additional 2’-O-methyl RNA bases.
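The example compaction oligonucleotide sequence given in this section can be checked against the length ranges listed for compaction oligonucleotides with a short script. Splitting it into 5’, intervening and 3’ regions by taking the longest identical terminal segment is an assumption made for illustration; the source does not state where the regions begin and end, and `split_repeated_ends` is a hypothetical helper.

```python
# Inspect the example compaction oligonucleotide quoted in the text.
# The region split is an assumption for illustration: the longest identical
# terminal segment (without overlap) is taken as the 5' and 3' regions.

EXAMPLE = "CATGTAATGCACGTACTTTCAGGGTAAACATGTAATGCACGTACTTT"


def split_repeated_ends(seq: str) -> tuple[str, str, str]:
    """Return (5' region, intervening region, 3' region)."""
    for k in range(len(seq) // 2, 0, -1):  # longest candidate first
        if seq[:k] == seq[-k:]:
            return seq[:k], seq[k:len(seq) - k], seq[-k:]
    raise ValueError("no identical terminal segments found")


five, middle, three = split_repeated_ends(EXAMPLE)
print(len(EXAMPLE), len(five), len(middle), len(three))  # 47 19 9 19
print(20 <= len(EXAMPLE) <= 150, 2 <= len(middle) <= 20)  # True True
```

Under that reading, the 19-nucleotide 5’ and 3’ regions have the same sequence, consistent with the embodiment in which the 5’ region has the same sequence as the 3’ region, and the 9-nucleotide intervening region falls within the stated 2-20 nucleotide range.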
  • the compaction oligonucleotides can include at least one region having consecutive guanines.
  • the compaction oligonucleotides can include at least one region having 2, 3, 4, 5, 6 or more consecutive guanines.
  • the compaction oligonucleotides comprise four consecutive guanines which can form a guanine tetrad structure (see FIG. 25).
  • the guanine tetrad structure can be stabilized via Hoogsteen hydrogen bonding.
  • the guanine tetrad structure can be stabilized by a central cation including potassium, sodium, lithium, rubidium or cesium.
  • At least one compaction oligonucleotide can form a guanine tetrad (FIG. 25) and hybridize to the universal binding sequences in a concatemer which can cause the concatemer to fold to form an intramolecular G-quadruplex structure (FIG. 26).
  • the concatemers can self-collapse to form compact nanoballs. Formation of the guanine tetrads and G-quadruplexes in the nanoballs may increase the stability of the nanoballs to retain their compact size and shape which can withstand changes in pH, temperature and/or repeated flows of reagents during sequencing inside the cellular sample.
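A design check for runs of consecutive guanines, such as the four consecutive guanines described above, can be written as a regular-expression scan. `g_runs` is a hypothetical helper, and its default minimum run length of four is taken from the guanine tetrad embodiment; whether a given run actually forms a tetrad depends on conditions (for example the central cation) that a sequence scan cannot capture.

```python
import re


def g_runs(seq: str, min_len: int = 4) -> list[tuple[int, int]]:
    """Return (start, length) for each run of at least min_len guanines."""
    if min_len < 1:
        raise ValueError("min_len must be at least 1")
    pattern = re.compile(f"G{{{min_len},}}")  # e.g. "G{4,}" for min_len=4
    return [(m.start(), m.end() - m.start()) for m in pattern.finditer(seq.upper())]


print(g_runs("ACGGGGTTGGGA"))  # [(2, 4)]
print(g_runs("ACGGGGTTGGGA", min_len=3))  # [(2, 4), (8, 3)]
```

The greedy pattern reports each maximal run once, so a run of five guanines appears as a single `(start, 5)` entry rather than as two overlapping runs of four.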
  • the plurality of compaction oligonucleotides in the rolling circle amplification reaction have the same sequence.
  • the plurality of compaction oligonucleotides in the rolling circle amplification reaction comprise a mixture of two or more different populations of compaction oligonucleotides having different sequences.
  • the immobilized concatemer template molecule can self-collapse into a compact nucleic acid nanoball.
  • the nanoballs can be imaged and a FWHM measurement can be obtained to give the shape/size of the nanoballs.
  • inclusion of compaction oligonucleotides in the rolling circle amplification reaction can promote collapsing of a concatemer into a DNA nanoball.
  • Conducting RCA with compaction oligonucleotides helps retain the compact size and shape of a DNA nanoball during multiple sequencing cycles which can improve FWHM (full width half maximum) of a spot image of the DNA nanoball inside a cellular sample.
  • the DNA nanoball does not unravel during multiple sequencing cycles.
  • the spot image of the DNA nanoball does not enlarge during multiple sequencing cycles.
  • the spot image of the DNA nanoball remains a discrete spot during multiple sequencing cycles.
  • the spot image can be represented as a Gaussian spot and the size can be measured as a FWHM.
  • a smaller spot size as indicated by a smaller FWHM typically correlates with an improved image of the spot.
  • the FWHM of a nanoball spot can be about 10 µm or smaller.
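For a Gaussian spot, the FWHM relates to the standard deviation σ by FWHM = 2√(2 ln 2)·σ ≈ 2.355σ. The sketch below estimates FWHM from a one-dimensional, background-subtracted intensity profile through the spot center. It assumes a single Gaussian spot on a flat background and estimates σ from the intensity-weighted second moment rather than from a least-squares fit; the helper names are hypothetical, and moment estimates are sensitive to noise in the profile tails.

```python
import math

# FWHM of a Gaussian: 2 * sqrt(2 * ln 2) * sigma, about 2.355 * sigma.
FWHM_PER_SIGMA = 2.0 * math.sqrt(2.0 * math.log(2.0))


def fwhm_from_profile(xs: list[float], intensities: list[float]) -> float:
    """Estimate FWHM (units of xs) from a background-subtracted 1-D profile.

    Sigma is taken from the intensity-weighted second moment, which assumes
    a single Gaussian spot with negligible noise in the tails.
    """
    total = sum(intensities)
    if total <= 0:
        raise ValueError("profile has no positive signal")
    mean = sum(x * w for x, w in zip(xs, intensities)) / total
    var = sum(w * (x - mean) ** 2 for x, w in zip(xs, intensities)) / total
    return FWHM_PER_SIGMA * math.sqrt(var)


def spot_is_compact(fwhm_um: float, max_fwhm_um: float = 10.0) -> bool:
    """Compare a spot against the 'about 10 um or smaller' example."""
    return fwhm_um <= max_fwhm_um


# Synthetic spot: Gaussian with sigma = 1.5 um sampled every 0.01 um.
xs = [i * 0.01 for i in range(-1500, 1501)]
profile = [math.exp(-x * x / (2 * 1.5 ** 2)) for x in xs]
print(round(fwhm_from_profile(xs, profile), 2))  # 3.53
```

On this synthetic profile the estimate is about 3.53 µm, well under the 10 µm example threshold; a least-squares Gaussian fit would be the more robust choice for noisy images.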
  • each nanoball carries numerous tandem copies of a polynucleotide unit along its length, where the polynucleotide unit includes a sequence-of-interest (e.g., that corresponds to target RNA or target cDNA) and at least a universal sequencing primer binding site.
  • Each polynucleotide unit can bind a sequencing primer, a sequencing polymerase and a detectably-labeled nucleotide reagent (e.g., detectably labeled multivalent molecules), to form a detectable sequencing complex (e.g., a detectable ternary complex).
  • Each nanoball carries numerous detectable sequencing complexes.
  • the compact nature of the nanoballs increases the local concentration of the detectably-labeled nucleotide reagents used during the sequencing workflow. This increases the signal intensity emitted from a nanoball, giving a discrete detectable signal that can be imaged as a fluorescent spot inside the cellular sample.
  • Each spot corresponds to a concatemer and each concatemer corresponds to a target RNA molecule in the cellular sample. Multiple spots can be detected and imaged simultaneously in the cellular sample.
  • the DNA nanoballs have a compact shape and size that produce increased signal intensity and color differentiation during sequencing.
  • the cellular sample comprises a whole cell, a plurality of whole cells, an intact tissue or an intact tumor.
  • the cellular sample comprises a fresh cellular sample, a freshly-frozen cellular sample, a sectioned cellular sample, or an FFPE cellular sample.
  • the cellular sample comprises one or more living cells or non-living cells.
  • the cellular sample can be obtained from a virus, fungus, prokaryote or eukaryote. In some embodiments, the cellular sample can be obtained from an animal, insect or plant. In some embodiments, the cellular sample comprises one or more virally-infected cells.
  • the cellular sample can be obtained from any organism including human, simian, ape, canine, feline, bovine, equine, murine, porcine, caprine, lupine, ranine, piscine, plant, insect or bacteria.
  • the cellular sample can be obtained from any organ including head, neck, brain, breast, ovary, cervix, colon, rectum, endometrium, gallbladder, intestines, bladder, prostate, testicles, liver, lung, kidney, esophagus, pancreas, thyroid, pituitary, thymus, skin, heart, larynx, or other organs.
  • the cellular sample harbors a plurality of RNA which include target RNA and non-target RNA.
  • cells typically produce RNA by gene expression which includes transcription of DNA (e.g., genomic DNA) into RNA molecules.
  • the transcribed RNA can undergo splicing or may not be spliced.
  • the transcribed RNA can be translated into a polypeptide (e.g., coding RNA), or may not undergo translation but instead be processed into tRNA or rRNA (e.g., noncoding RNA).
  • the plurality of RNA harbored by the cellular sample includes target and non-target RNA.
  • the plurality of RNA harbored by the cellular sample comprises wild type RNA, mutant RNA or splice variant RNA.
  • the plurality of RNA harbored by the cellular sample comprises pre-spliced RNA, partially spliced RNA, or fully spliced RNA.
  • the plurality of RNA harbored by the cellular sample comprises coding RNA, non-coding RNA, mRNA, tRNA, rRNA, microRNA (miRNA), mature microRNA, or immature microRNA.
  • the plurality of RNA harbored by the cellular sample comprises housekeeping RNA, cell-specific RNA, tissue-specific RNA or disease-specific RNA. In some embodiments, the plurality of RNA harbored by the cellular sample comprises RNA expressed by one or more cells in response to a stimulus such as heat, light, a chemical or a drug. In some embodiments, the plurality of RNA harbored by the cellular sample comprises RNA found in healthy cells or diseased cells. In some embodiments, the plurality of RNA harbored by the cellular sample comprises RNA transcribed from transgenic DNA sequences that are introduced into the cellular sample using recombinant DNA procedures.
  • the RNA can be transcribed from a transgenic DNA sequence that is controlled by an inducible or constitutive promoter sequence.
  • the plurality of RNA harbored by the cellular sample comprises RNA that is transcribed from DNA sequences that are not transgenic.
  • the cellular sample can be cultured on the support.
  • the methods comprise culturing the cellular sample on the support under a condition suitable for expanding the cellular sample for 2-10 generations or more.
  • the cultured cellular sample can generate a colony of cells.
  • the methods comprise culturing the cellular sample to confluence or nonconfluence.
  • the methods comprise culturing the cellular sample on the support in a simple or complex cell culture media.
  • the cell culture media comprises D-MEM high glucose (e.g., from Thermo Fisher Scientific, catalog No.
  • fetal bovine serum e.g., 10% FBS; for example from Thermo Fisher Scientific, catalog No. A3160402
  • MEM non-essential amino acids e.g., 0.1 mM MEM, for example from Thermo Fisher Scientific, catalog No. 11140050
  • L-glutamine e.g., 6 mM L-glutamine, for example from Thermo Fisher Scientific, catalog No. A2916801
  • MEM sodium pyruvate e.g., 1 mM sodium pyruvate, for example from Thermo Fisher Scientific, catalog No.
  • the methods comprise culturing the cellular sample at a humidity and temperature that is suitable for culturing the cell(s) on the support.
  • exemplary suitable conditions comprise approximately 37 °C with a humidified atmosphere of approximately 5-10% carbon dioxide in air.
  • the cellular sample can be cultured with suitable aeration with oxygen and/or nitrogen.
  • simple cell media refers to a cell media that typically lacks ingredients to support cell growth and/or proliferation in culture.
  • Simple cell media can be used for example to wash, suspend, or dilute the cellular sample.
  • Simple cell media can be mixed with certain ingredients to prepare a cell media that can support cell growth and/or proliferation in culture.
  • a simple cell media comprises any one or any combination of two or more of a buffer, a phosphate compound, a sodium compound, a potassium compound, a calcium compound, a magnesium compound and/or glucose.
  • the simple cell media comprises PBS (phosphate buffered saline), DPBS (Dulbecco’s phosphate-buffered saline), HBSS (Hanks’ balanced salt solution), DMEM (Dulbecco’s Modified Eagle’s Medium), EMEM (Eagle’s Minimum Essential Medium), and/or EBSS (Earle’s balanced salt solution).
  • the cellular sample can be placed in a simple cell media prior to or during the step of conducting any of the nucleic acid methods described herein.
  • complex cell media refers to a cell media that can be used to support cell growth and/or proliferation in culture without supplementation or additives.
  • Complex cell media can include any combination of two or more of a buffering system (e.g., HEPES), inorganic salt(s), amino acid(s), protein(s), polypeptide(s), carbohydrate(s), fatty acid(s), lipid(s), purine(s) and their derivatives (e.g., hypoxanthine), pyrimidine(s) and their derivatives, and/or trace element(s).
  • Complex cell media includes fluids obtained from a fluid or tissue extract
  • complex cell media can be a serum-containing media, for example complex cell media includes fluids such as fetal bovine serum, blood plasma, blood serum, lymph fluid, human placental cord serum and amniotic fluid.
  • complex cell media can be a serum-free media, which are typically (but not necessarily) defined cell culture media.
  • complex cell media can be a chemically-defined media which typically (but not necessarily) include recombinant polypeptides, and ultra-pure inorganic and/or organic compounds.
  • complex cell media can be a protein-free media, which include for example MEM (minimum essential medium) and RPMI-1640 (Roswell Park Memorial Institute medium).
  • the complex cell media comprises IMDM (Iscove’s Modified Dulbecco’s Medium). In some embodiments, the complex cell media comprises DMEM (Dulbecco’s Modified Eagle’s Medium). In some embodiments, the cellular sample can be placed in a complex cell media prior to or during the step of conducting any of the nucleic acid methods described herein.
  • the cellular sample comprises a fixed cellular sample.
  • the cellular sample can be treated with a fixation reagent (e.g., a fixing reagent) that preserves the cell and its contents to inhibit degradation and can inhibit cell lysis.
  • the fixation reagent can preserve RNA harbored by the cellular sample.
  • the fixation reagent inhibits loss of nucleic acids from the cellular sample.
  • the fixation reagent can cross-link the RNA to prevent the RNA from escaping the cellular sample.
  • a cross-linking fixation reagent comprises any combination of an aldehyde, formaldehyde, paraformaldehyde, formalin, glutaraldehyde, imidoesters, N-hydroxysuccinimide esters (NHS) and/or glyoxal (a bifunctional aldehyde).
  • the fixation reagent comprises at least one alcohol, including methanol or ethanol. In some embodiments, the fixation reagent comprises at least one ketone, including acetone. In some embodiments, the fixation reagent comprises acetic acid, glacial acetic acid and/or picric acid. In some embodiments, the fixation reagent comprises mercuric chloride. In some embodiments, the fixation reagent comprises a zinc salt comprising zinc sulphate or zinc chloride. In some embodiments, the fixation reagent can denature polypeptides.
  • the fixation reagent comprises 4% w/v of paraformaldehyde to water/PBS. In some embodiments, the fixation reagent comprises 10% of 35% formaldehyde at a neutral pH. In some embodiments, the fixation reagent comprises 2% v/v of glutaraldehyde to water/PBS. In some embodiments, the fixation reagent comprises 25% of 37% formaldehyde solution, 70% picric acid and 5% acetic acid.
  • the cellular sample can be fixed on the support with 4% paraformaldehyde for about 30-60 minutes and washed with PBS.
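The fixative concentrations above follow from simple dilution arithmetic (C1·V1 = C2·V2). In this sketch, reading “10% of 35% formaldehyde” as a one-in-ten dilution of a 35% stock is our interpretation, percentages are assumed to be in the same unit (w/v or v/v) on both sides, and the 16% paraformaldehyde stock and the volumes are hypothetical examples; `stock_volume` is an illustrative helper name.

```python
def stock_volume(stock_pct: float, final_pct: float, final_volume_ml: float) -> float:
    """Volume of stock (mL) needed for final_pct in final_volume_ml (C1*V1 = C2*V2)."""
    if not 0 < final_pct <= stock_pct:
        raise ValueError("final concentration must be > 0 and <= the stock")
    return final_pct * final_volume_ml / stock_pct


# "10% of 35% formaldehyde", read as 1 part stock in 10: final concentration.
print(35.0 * 10.0 / 100.0)  # 3.5
# 4% w/v paraformaldehyde: grams of powder for 50 mL of water/PBS.
print(4.0 * 50.0 / 100.0)  # 2.0
# Hypothetical 16% paraformaldehyde stock diluted to 10 mL of 4% fixative.
print(stock_volume(16.0, 4.0, 10.0))  # 2.5
```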
  • the cellular sample can be stained, de-stained or unstained.
  • the cellular sample comprises a permeabilized cellular sample.
  • the methods comprise treating the cellular sample with a permeabilization reagent that alters the cell membrane to permit penetration of experimental reagents into the cells.
  • the permeabilization reagent removes membrane lipids from the cell membrane.
  • the cellular sample can be treated with a permeabilization reagent which comprises any combination of an organic solvent, detergent, chemical compound, cross-linking agent and/or enzyme.
  • the organic solvents comprise acetone, ethanol, and methanol.
  • the detergents comprise saponin, Triton X-100, Tween-20, sodium dodecyl sulfate (SDS), an N-lauroylsarcosine sodium salt solution, or a nonionic polyoxyethylene surfactant (e.g., NP40).
  • the crosslinking agent comprises paraformaldehyde.
  • the enzyme comprises trypsin, pepsin or protease (e.g., proteinase K).
  • the cells can be permeabilized using an alkaline condition, or an acidic condition with a protease enzyme.
  • the permeabilization reagent comprises water and/or PBS.
  • the fixed cells can be permeabilized with 70% ethanol for about 30-60 minutes, and the permeabilizing reagent can be exchanged with PBS-T (e.g., PBS with 0.05% Tween-20).
  • the cells can be post-fixed with 3% paraformaldehyde and 0.1% glutaraldehyde for about 30-60 minutes, and washed with PBS-T multiple times.
  • the cellular sample is infused with a swellable polyelectrolyte hydrogel (U.S. patent No. 10,309,879 and Chen 2015 Science 347:543, the contents of these documents are incorporated by reference in their entireties).
  • a fixed and permeabilized cellular sample can be infused with sodium acrylate, acrylamide and the cross-linker N,N’-methylenebisacrylamide.
  • ammonium persulfate (APS) initiator and tetramethylethylenediamine (TEMED) accelerator can be infused to achieve polymerization.
  • the cellular sample can be infused with proteinase K for proteolysis and incubated in a digestion buffer.
  • the gel inside the cellular sample can be swollen by the addition of water.
  • the plurality of RNAs inside cellular sample can be converted to cDNA.
  • the methods comprise contacting the plurality of RNA inside the fixed and permeabilized cellular sample with (i) a plurality of reverse transcription primers, (ii) a plurality of reverse transcriptase enzymes, and (iii) a plurality of nucleotides, under a condition suitable for conducting a reverse transcription reaction to generate a plurality of cDNA molecules (e.g., a plurality of first strand cDNA molecules) in the cellular sample.
  • synthesis of second strand cDNA molecules is omitted.
  • the RNA inside the cellular sample is not converted into cDNA, and the RNA is instead hybridized to target-specific padlock probes.
  • the reverse transcriptase enzyme exhibits RNA-dependent DNA polymerase activity.
  • the reverse transcriptase enzyme comprises a reverse transcriptase enzyme from AMV (avian myeloblastosis virus), M-MuLV (Moloney murine leukemia virus), or HIV (human immunodeficiency virus).
  • the reverse transcriptase enzyme comprises a recombinant enzyme that exhibits reduced RNase H activity, for example REVERTAID (e.g., from Thermo Fisher Scientific, catalog No. EP0441).
  • the reverse transcriptase can be a commercially-available enzyme, including MULTISCRIBE (e.g., from Thermo Fisher Scientific, catalog # 4311235), THERMOSCRIPT (e.g., from Thermo Fisher Scientific, catalog # 12236-014), or ARRAYSCRIPT (e.g., from Ambion, catalog No. AM2048).
  • the reverse transcriptase enzyme comprises SUPERSCRIPT II (e.g., catalog No. 18064014), SUPERSCRIPT III (e.g., catalog No. 18080044), or SUPERSCRIPT IV enzymes (e.g., catalog No. 18090010) (all SUPERSCRIPT enzymes from Invitrogen).
  • the reverse transcription reaction can include an RNase inhibitor.
  • the reverse transcription primers comprise a single-stranded oligonucleotide comprising DNA, RNA, or chimeric DNA/RNA.
  • the reverse transcription primers comprise any combination of adenine (A), thymine (T), guanine (G), cytosine (C), uracil (U) and/or inosine (I).
  • the reverse transcription primers can be any length, for example 5-25 bases, or 25-50 bases, or 50-75 bases, or 75-100 bases in length or longer.
  • the reverse transcription primers each comprise a 5’ end and 3’ end.
  • the 3’ end of the reverse transcription primers can include a 3’ OH moiety which serves as a nucleotide polymerization initiation site in a polymerase-catalyzed primer extension reaction.
  • the 3’ end of the reverse transcription primers has a chain-terminating moiety which blocks a polymerase-catalyzed primer extension reaction. The chain-terminating moiety can be removed to convert the 3’ sugar position to an extendible 3’-OH.
  • the reverse transcription primers are modified to confer resistance to nuclease degradation (e.g., ribonuclease degradation).
  • the reverse transcription primers comprise at least one phosphorothioate diester bond at their 5’ ends which can render the reverse transcription primers resistant to nuclease degradation.
  • the reverse transcription primers comprise 2-5 or more consecutive phosphorothioate diester bonds at their 5’ ends.
  • the plurality of reverse transcription primers comprise at least one ribonucleotide and/or at least one 2’-O-methyl, 2’-O-methoxyethyl (MOE), or 2’-fluoro nucleotide.
  • the reverse transcription primers comprise phosphorylated 3’ ends. In some embodiments, the reverse transcription primers comprise locked nucleic acid (LNA) bases. In some embodiments, the reverse transcription primers comprise a phosphorylated 5’ end (e.g., using a polynucleotide kinase).
  • the entire length of a reverse transcription primer can hybridize to a portion of an RNA molecule.
  • individual reverse transcription primers comprise a 3’ region having a sequence that hybridizes to a portion of an RNA molecule and a 5’ region that carries a tail that does not hybridize to an RNA molecule.
  • the 5’ tail comprises a universal adaptor sequence including any one or any combination of two or more of a sample barcode sequence, an amplification primer binding site, a sequencing primer binding site, a compaction oligonucleotide binding site and/or a surface capture primer binding site.
  • the 5’ tail comprises a unique identification sequence (e.g., a unique molecular index (UMI)).
  • the 5’ tail comprises a restriction enzyme recognition sequence.
  • individual reverse transcription primers comprise at least a portion of the 3’ region having a homopolymer sequence, for example poly-A, poly-T, poly-C, poly-G or poly-U.
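The tailed reverse transcription primer architecture described above can be sketched as string assembly: a 5’ tail (here a universal adaptor followed by a random UMI) joined to a 3’ region that is the DNA reverse complement of the targeted RNA segment. The layout, the default UMI length of 8 and all sequences are hypothetical, and `rt_primer` is an illustrative name; a restriction site or other tail element would be concatenated in the same way.

```python
import random
from typing import Optional

# DNA complement of RNA bases (U pairs with A).
RNA_TO_DNA_COMPLEMENT = str.maketrans("ACGU", "TGCA")


def rt_primer(rna_segment: str, adaptor: str, umi_len: int = 8,
              rng: Optional[random.Random] = None) -> str:
    """Return a hypothetical 5'-[adaptor][UMI][3' region]-3' DNA primer.

    The 3' region is the reverse complement of rna_segment (given 5'->3'),
    so it anneals antiparallel to the RNA and its 3' end can be extended.
    """
    rng = rng or random.Random()
    umi = "".join(rng.choice("ACGT") for _ in range(umi_len))
    hybridizing = rna_segment.upper().translate(RNA_TO_DNA_COMPLEMENT)[::-1]
    return adaptor + umi + hybridizing
```

Passing a poly-A RNA segment yields an oligo-dT 3’ region, matching the homopolymer embodiment above: `rt_primer("AAAAAAAAAA", "", umi_len=0)` returns `"TTTTTTTTTT"`.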
  • the reverse transcription primers can hybridize to any portion of an RNA molecule, including the 5’ or the 3’ end of the RNA molecule, or an internal portion of the RNA molecule.
  • the plurality of reverse transcription primers comprises a first sub-population of target-specific reverse transcription primers that hybridize selectively to the first target RNA (e.g., targeted transcriptomics). In some embodiments, the plurality of reverse transcription primers further comprise a second sub-population of target-specific reverse transcription primers that hybridize selectively to the second target RNA. In some embodiments, the target-specific reverse transcription primers comprise a pre-determined sequence at the 3’ region which hybridizes to a target RNA molecule. In some embodiments, the pre-determined sequence portion of the reverse transcription primers can be 4-20 bases, or 20-40 bases, or 40-50 bases in length.
  • the first sub-population of target-specific reverse transcription primers can selectively hybridize to an RNA transcribed in the cellular sample by a housekeeping gene.
  • selection of the housekeeping gene may be dependent upon the type of cellular sample to be used for the in situ methods described herein.
  • Exemplary housekeeping genes include glyceraldehyde-3-phosphate dehydrogenase (GAPDH), beta-actins (ACTB), tubulins, PPIA (peptidyl-prolyl cis-trans isomerase), NME4 (NME/NM23 nucleoside diphosphate kinase 4), SMARCAL1 (SWI/SNF related matrix associated actin dependent regulator of chromatin, subfamily A like 1), and POMK (protein-O-mannose kinase).
  • the second sub-population of target-specific reverse transcription primers can selectively hybridize to an RNA transcribed from a gene that is expressed in the cellular sample being examined (e.g., a cell-specific or tissue-specific RNA).
  • the plurality of reverse transcription primers comprises a first sub-population of random-sequence reverse transcription primers that hybridize to the first target RNA (e.g., whole transcriptomics).
  • the plurality of reverse transcription primers further comprises a second sub-population of random-sequence reverse transcription primers that hybridize to the second target RNA.
  • the reverse transcription primers comprise a random and/or degenerate sequence at the 3’ region which hybridizes to an RNA molecule.
  • the random-sequence or the degenerate-sequence portion of the reverse transcription primers can be 4-20 bases, or 20-40 bases, or 40-50 bases in length.
  • sequencing polymerases can be used for conducting sequencing reactions.
  • the sequencing polymerase(s) is/are capable of binding and incorporating a complementary nucleotide opposite a nucleotide in a concatemer template molecule.
  • the sequencing polymerase(s) is/are capable of binding a complementary nucleotide unit of a multivalent molecule opposite a nucleotide in a concatemer template molecule.
  • the plurality of sequencing polymerases comprise recombinant mutant polymerases.
  • suitable polymerases for use in sequencing with nucleotides and/or multivalent molecules include but are not limited to: Klenow DNA polymerase; Thermus aquaticus DNA polymerase I (Taq polymerase); KlenTaq polymerase; Candidatus altiarchaeales archaeon; Candidatus Hadarchaeum Yellowstonense; Hadesarchaea archaeon; Euryarchaeota archaeon; Thermoplasmata archaeon; Thermococcus polymerases such as Thermococcus litoralis; bacteriophage T7 DNA polymerase; human alpha, delta and epsilon DNA polymerases; bacteriophage polymerases such as T4, RB69 and phi29 bacteriophage DNA polymerases; Pyrococcus furiosus DNA polymerase (Pfu polymerase); Bacillus subtilis DNA polymerase III; E. coli DNA polymerase III alpha and epsilon; 9 degree N polymerase; reverse transcriptases such as HIV type M or O reverse transcriptases; avian myeloblastosis virus reverse transcriptase; Moloney Murine Leukemia Virus (MMLV) reverse transcriptase.
  • DNA polymerases include those from various Archaea genera, such as Aeropyrum, Archaeoglobus, Desulfurococcus, Pyrobaculum, Pyrococcus, Pyrolobus, Pyrodictium, Staphylothermus, Stetteria, Sulfolobus, Thermococcus, and Vulcanisaeta and the like or variants thereof, including such polymerases as are known in the art such as 9 degrees N, VENT, DEEP VENT, THERMINATOR, Pfu, KOD, Pfx, Tgo and RB69 polymerases.
  • the sequencing comprises conducting sequencing-by-binding (SBB) reactions inside the cellular sample, where the cDNA amplicons are the concatemer molecules.
  • the sequencing-by-binding (SBB) procedure employs non-labeled chain-terminating nucleotides.
  • a cycle of sequencing-by-binding comprises the steps of (a) sequentially contacting a primed concatemer (e.g., a concatemer annealed to a plurality of sequencing primers) with at least two separate mixtures under ternary complex stabilizing conditions, wherein the at least two separate mixtures each include a polymerase and a nucleotide, whereby the sequentially contacting results in the primed concatemer being contacted, under the ternary complex stabilizing conditions, with nucleotide cognates for first, second and third base types in the template; (b) examining the at least two separate mixtures to determine whether a ternary complex formed; and (c) identifying the next correct nucleotide for the primed concatemer, wherein the next correct nucleotide is identified as a cognate of the first, second or third base type if a ternary complex is detected in step (b), and wherein the next correct nucleotide is imputed to be
  • any of the sequencing methods described herein can employ at least one nucleotide.
  • the nucleotides comprise a base, sugar and at least one phosphate group.
  • at least one nucleotide in the plurality comprises an aromatic base, a five carbon sugar (e.g., ribose or deoxyribose), and one or more phosphate groups (e.g., 1-10 phosphate groups).
  • the plurality of nucleotides can comprise at least one type of nucleotide selected from a group consisting of dATP, dGTP, dCTP, dTTP and dUTP.
  • the plurality of nucleotides can comprise a mixture of any combination of two or more types of nucleotides selected from a group consisting of dATP, dGTP, dCTP, dTTP and/or dUTP.
  • at least one nucleotide in the plurality is not a nucleotide analog.
  • at least one nucleotide in the plurality comprises a nucleotide analog.
  • At least one nucleotide in the plurality of nucleotides comprises a chain of one, two or three phosphorus atoms where the chain is typically attached to the 5’ carbon of the sugar moiety via an ester or phosphoramide linkage.
  • at least one nucleotide in the plurality is an analog having a phosphorus chain in which the phosphorus atoms are linked together with intervening O, S, NH, methylene or ethylene.
  • the phosphorus atoms in the chain include substituted side groups including O, S or BH3.
  • the chain includes phosphate groups substituted with analogs including phosphoramidate, phosphorothioate, phosphordithioate, and O-methylphosphoroamidite groups.
  • At least one nucleotide in the plurality of nucleotides comprises a terminator nucleotide analog having a chain terminating moiety (e.g., blocking moiety) at the sugar 2’ position, at the sugar 3’ position, or at the sugar 2’ and 3’ position.
  • the chain terminating moiety can inhibit polymerase-catalyzed incorporation of a subsequent nucleotide unit or free nucleotide in a nascent strand during a primer extension reaction.
  • the chain terminating moiety is attached to the 3’ sugar hydroxyl position where the sugar comprises a ribose or deoxyribose sugar moiety. In some embodiments, the chain terminating moiety is removable/cleavable from the 3’ sugar hydroxyl position to generate a nucleotide having a 3’-OH sugar group which is extendible with a subsequent nucleotide in a polymerase-catalyzed nucleotide incorporation reaction.
  • the chain terminating moiety comprises an alkyl group, alkenyl group, alkynyl group, allyl group, aryl group, benzyl group, azide group, amine group, amide group, keto group, isocyanate group, phosphate group, thio group, disulfide group, carbonate group, urea group, or silyl group.
  • the chain terminating moiety is cleavable/removable from the nucleotide, for example by reacting the chain terminating moiety with a chemical agent, pH change, light or heat.
  • the chain terminating moieties alkyl, alkenyl, alkynyl and allyl are cleavable with tetrakis(triphenylphosphine)palladium(0) (Pd(PPh3)4) with piperidine, or with 2,3-dichloro-5,6-dicyano-1,4-benzoquinone (DDQ).
  • the chain terminating moieties aryl and benzyl are cleavable with H2 Pd/C.
  • the chain terminating moieties amine, amide, keto, isocyanate, phosphate, thio, disulfide are cleavable with phosphine or with a thiol group including beta-mercaptoethanol or dithiothreitol (DTT).
  • the chain terminating moiety carbonate is cleavable with potassium carbonate (K2CO3) in MeOH, with triethylamine in pyridine, or with Zn in acetic acid (AcOH).
  • the chain terminating moieties urea and silyl are cleavable with tetrabutylammonium fluoride, pyridine-HF, with ammonium fluoride, or with triethylamine trihydrofluoride.
  • the chain terminating moiety comprises an azide, azido or azidomethyl group.
  • the chain terminating moiety comprises a 3’-O-azido or 3’-O-azidomethyl group.
  • the chain terminating moieties azide, azido and azidomethyl group are cleavable/removable with a phosphine compound.
  • the phosphine compound comprises a derivatized tri-alkyl phosphine moiety or a derivatized tri-aryl phosphine moiety.
  • the phosphine compound comprises Tris(2-carboxyethyl)phosphine (TCEP) or bis-sulfo triphenyl phosphine (BS-TPP) or Tris(hydroxypropyl)phosphine (THPP).
  • the cleaving agent comprises 4-dimethylaminopyridine (4-DMAP).
  • the nucleotide comprises a chain terminating moiety which is selected from a group consisting of 3’-deoxy nucleotides, 2’,3’-dideoxynucleotides, 3’-methyl, 3’-azido, 3’-azidomethyl, 3’-O-azidoalkyl, 3’-O-ethynyl, 3’-O-aminoalkyl, 3’-O-fluoroalkyl, 3’-fluoromethyl, 3’-difluoromethyl, 3’-trifluoromethyl, 3’-sulfonyl, 3’-malonyl, 3’-amino, 3’-O-amino, 3’-sulfhydryl, 3’-aminomethyl, 3’-ethyl, 3’-butyl, 3’-tert-butyl
  • the plurality of nucleotides comprises a plurality of nucleotides labeled with detectable reporter moiety.
  • the detectable reporter moiety comprises a fluorophore.
  • the fluorophore is attached to the nucleotide base.
  • the fluorophore is attached to the nucleotide base with a linker which is cleavable/removable from the base.
  • at least one of the nucleotides in the plurality is not labeled with a detectable reporter moiety.
  • a particular detectable reporter moiety (e.g., fluorophore)
  • the nucleotide base (e.g., dATP, dGTP, dCTP, dTTP or dUTP)
  • the cleavable linker on the nucleotide base comprises a cleavable moiety comprising an alkyl group, alkenyl group, alkynyl group, allyl group, aryl group, benzyl group, azide group, amine group, amide group, keto group, isocyanate group, phosphate group, thio group, disulfide group, carbonate group, urea group, or silyl group.
  • the cleavable linker on the base is cleavable/removable from the base by reacting the cleavable moiety with a chemical agent, pH change, light or heat.
  • the cleavable moieties alkyl, alkenyl, alkynyl and allyl are cleavable with tetrakis(triphenylphosphine)palladium(0) (Pd(PPh3)4) with piperidine, or with 2,3-dichloro-5,6-dicyano-1,4-benzoquinone (DDQ).
  • the cleavable moieties aryl and benzyl are cleavable with H2 Pd/C.
  • the cleavable moieties amine, amide, keto, isocyanate, phosphate, thio, disulfide are cleavable with phosphine or with a thiol group including beta-mercaptoethanol or dithiothreitol (DTT).
  • the cleavable moiety carbonate is cleavable with potassium carbonate (K2CO3) in MeOH, with triethylamine in pyridine, or with Zn in acetic acid (AcOH).
  • the cleavable moieties urea and silyl are cleavable with tetrabutylammonium fluoride, pyridine-HF, with ammonium fluoride, or with triethylamine trihydrofluoride.
  • the cleavable linker on the nucleotide base comprises cleavable moiety including an azide, azido or azidomethyl group.
  • the cleavable moieties azide, azido and azidomethyl group are cleavable/removable with a phosphine compound.
  • the phosphine compound comprises a derivatized tri-alkyl phosphine moiety or a derivatized tri-aryl phosphine moiety.
  • the phosphine compound comprises Tris(2-carboxyethyl)phosphine (TCEP) or bis-sulfo triphenyl phosphine (BS-TPP) or Tris(hydroxypropyl)phosphine (THPP).
  • the cleaving agent comprises 4-dimethylaminopyridine (4-DMAP).
  • the chain terminating moiety (e.g., at the sugar 2’ and/or sugar 3’ position) and the cleavable linker on the nucleotide base have the same or different cleavable moieties.
  • the chain terminating moiety (e.g., at the sugar 2’ and/or sugar 3’ position) and the detectable reporter moiety linked to the base are chemically cleavable/removable with the same chemical agent.
  • the chain terminating moiety (e.g., at the sugar 2’ and/or sugar 3’ position) and the detectable reporter moiety linked to the base are chemically cleavable/removable with different chemical agents.
  • the sequencing employs at least one multivalent molecule which comprises a plurality of nucleotide arms attached to a core and having any configuration including a starburst, helter skelter, or bottle brush configuration (e.g., FIG. 16).
  • the multivalent molecule comprises: (1) a core; and (2) a plurality of nucleotide arms which comprise (i) a core attachment moiety, (ii) a spacer comprising a PEG moiety, (iii) a linker, and (iv) a nucleotide unit, wherein the core is attached to the plurality of nucleotide arms, wherein the spacer is attached to the linker, wherein the linker is attached to the nucleotide unit.
  • the nucleotide unit comprises a base, sugar and at least one phosphate group, and the linker is attached to the nucleotide unit through the base.
  • the linker comprises an aliphatic chain or an oligo ethylene glycol chain, where both linker chains have 2-6 subunits. In some embodiments, the linker also includes an aromatic moiety.
  • An exemplary nucleotide arm is shown in FIG. 20. Exemplary multivalent molecules are shown in FIGS. 16-19. An exemplary spacer is shown in FIG. 21 (top) and exemplary linkers are shown in FIG. 21 (bottom) and FIG. 22. Exemplary nucleotides attached to a linker are shown in FIGS. 23A-23D. An exemplary biotinylated nucleotide arm is shown in FIG. 24.
  • a multivalent molecule comprises a core attached to multiple nucleotide arms, and wherein the multiple nucleotide arms have the same type of nucleotide unit which is selected from a group consisting of dATP, dGTP, dCTP, dTTP and dUTP.
  • a multivalent molecule comprises a core attached to multiple nucleotide arms, where each arm includes a nucleotide unit.
  • the nucleotide unit comprises an aromatic base, a five carbon sugar (e.g., ribose or deoxyribose), and one or more phosphate groups (e.g., 1-10 phosphate groups).
  • the plurality of multivalent molecules can comprise one type of multivalent molecule having one type of nucleotide unit selected from a group consisting of dATP, dGTP, dCTP, dTTP and dUTP.
  • the plurality of multivalent molecules can comprise a mixture of any combination of two or more types of multivalent molecules, where individual multivalent molecules in the mixture comprise nucleotide units selected from a group consisting of dATP, dGTP, dCTP, dTTP and/or dUTP.
  • the nucleotide unit comprises a chain of one, two or three phosphorus atoms where the chain is typically attached to the 5’ carbon of the sugar moiety via an ester or phosphoramide linkage.
  • at least one nucleotide unit is a nucleotide analog having a phosphorus chain in which the phosphorus atoms are linked together with intervening O, S, NH, methylene or ethylene.
  • the phosphorus atoms in the chain include substituted side groups including O, S or BH3.
  • the chain includes phosphate groups substituted with analogs including phosphoramidate, phosphorothioate, phosphordithioate, and O-methylphosphoroamidite groups.
  • the multivalent molecule comprises a core attached to multiple nucleotide arms, and wherein individual nucleotide arms comprise a nucleotide unit which is a nucleotide analog having a chain terminating moiety (e.g., blocking moiety) at the sugar 2’ position, at the sugar 3’ position, or at the sugar 2’ and 3’ position.
  • the nucleotide unit comprises a chain terminating moiety (e.g., blocking moiety) at the sugar 2’ position, at the sugar 3’ position, or at the sugar 2’ and 3’ position.
  • the chain terminating moiety can inhibit polymerase-catalyzed incorporation of a subsequent nucleotide unit or free nucleotide in a nascent strand during a primer extension reaction.
  • the chain terminating moiety is attached to the 3’ sugar hydroxyl position where the sugar comprises a ribose or deoxyribose sugar moiety.
  • the chain terminating moiety is removable/cleavable from the 3’ sugar hydroxyl position to generate a nucleotide having a 3’-OH sugar group which is extendible with a subsequent nucleotide in a polymerase-catalyzed nucleotide incorporation reaction.
  • the chain terminating moiety comprises an alkyl group, alkenyl group, alkynyl group, allyl group, aryl group, benzyl group, azide group, amine group, amide group, keto group, isocyanate group, phosphate group, thio group, disulfide group, carbonate group, urea group, or silyl group.
  • the chain terminating moiety is cleavable/removable from the nucleotide unit, for example by reacting the chain terminating moiety with a chemical agent, pH change, light or heat.
  • the chain terminating moieties alkyl, alkenyl, alkynyl and allyl are cleavable with tetrakis(triphenylphosphine)palladium(0) (Pd(PPh3)4) with piperidine, or with 2,3-dichloro-5,6-dicyano-1,4-benzoquinone (DDQ).
  • the chain terminating moieties aryl and benzyl are cleavable with H2 Pd/C.
  • the chain terminating moieties amine, amide, keto, isocyanate, phosphate, thio, disulfide are cleavable with phosphine or with a thiol group including beta-mercaptoethanol or dithiothreitol (DTT).
  • the chain terminating moiety carbonate is cleavable with potassium carbonate (K2CO3) in MeOH, with triethylamine in pyridine, or with Zn in acetic acid (AcOH).
  • the chain terminating moieties urea and silyl are cleavable with tetrabutylammonium fluoride, pyridine-HF, with ammonium fluoride, or with triethylamine trihydrofluoride.
  • the chain terminating moiety comprises an azide, azido or azidomethyl group.
  • the chain terminating moiety comprises a 3’-O-azido or 3’-O-azidomethyl group.
  • the chain terminating moieties azide, azido and azidomethyl group are cleavable/removable with a phosphine compound.
  • the phosphine compound comprises a derivatized tri-alkyl phosphine moiety or a derivatized tri-aryl phosphine moiety.
  • the phosphine compound comprises Tris(2-carboxyethyl)phosphine (TCEP) or bis-sulfo triphenyl phosphine (BS-TPP) or Tris(hydroxypropyl)phosphine (THPP).
  • the cleaving agent comprises 4-dimethylaminopyridine (4-DMAP).
  • the nucleotide unit comprises a chain terminating moiety which is selected from a group consisting of 3’-deoxy nucleotides, 2’,3’-dideoxynucleotides, 3’-methyl, 3’-azido, 3’-azidomethyl, 3’-O-azidoalkyl, 3’-O-ethynyl, 3’-O-aminoalkyl, 3’-O-fluoroalkyl, 3’-fluoromethyl, 3’-difluoromethyl, 3’-trifluoromethyl, 3’-sulfonyl, 3’-malonyl, 3’-amino, 3’-O-amino, 3’-sulfhydryl, 3’-aminomethyl, 3’-ethyl, 3’-butyl, 3’-tert-butyl, 3’-fluorenylmethyloxycarbonyl,
  • the multivalent molecule comprises a core attached to multiple nucleotide arms, wherein the nucleotide arms comprise a spacer, linker and nucleotide unit, and wherein the core, linker and/or nucleotide unit is labeled with detectable reporter moiety.
  • the detectable reporter moiety comprises a fluorophore.
  • a particular detectable reporter moiety (e.g., fluorophore) that is attached to the multivalent molecule can correspond to the base (e.g., dATP, dGTP, dCTP, dTTP or dUTP) of the nucleotide unit to permit detection and identification of the nucleotide base.
  • At least one nucleotide arm of a multivalent molecule has a nucleotide unit that is attached to a detectable reporter moiety.
  • the detectable reporter moiety is attached to the nucleotide base.
  • the detectable reporter moiety comprises a fluorophore.
  • a particular detectable reporter moiety (e.g., fluorophore) that is attached to the multivalent molecule can correspond to the base (e.g., dATP, dGTP, dCTP, dTTP or dUTP) of the nucleotide unit to permit detection and identification of the nucleotide base.
  • the core of a multivalent molecule comprises an avidin-like or streptavidin-like moiety and the core attachment moiety comprises biotin.
  • the core comprises a streptavidin-type or avidin-type moiety which includes an avidin protein, as well as any derivatives, analogs and other non-native forms of avidin that can bind to at least one biotin moiety.
  • Other forms of avidin moieties include native and recombinant avidin and streptavidin as well as derivatized molecules, e.g. non-glycosylated avidin and truncated streptavidins.
  • avidin moiety includes de-glycosylated forms of avidin, bacterial streptavidin produced by Streptomyces (e.g., Streptomyces avidinii), as well as derivatized forms, for example, N-acyl avidins, e.g., N-acetyl, N-phthalyl and N-succinyl avidin, and the commercially- available products EXTRAVIDIN, CAPTAVIDIN, NEUTRAVIDIN and NEUTRALITE AVIDIN.
  • any of the methods for sequencing nucleic acid molecules described herein can include forming a binding complex, where the binding complex comprises (i) a polymerase, a nucleic acid concatemer molecule duplexed with a primer, and a nucleotide, or the binding complex comprises (ii) a polymerase, a nucleic acid concatemer molecule duplexed with a primer, and a nucleotide unit of a multivalent molecule.
  • the binding complex has a persistence time of greater than about 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9 or 1 second.
  • the binding complex has a persistence time of greater than about 0.1-0.25 seconds, or about 0.25-0.5 seconds, or about 0.5-0.75 seconds, or about 0.75-1 second, or about 1-2 seconds, or about 2-3 seconds, or about 3-4 seconds, or about 4-5 seconds, and/or wherein the method is or may be carried out at a temperature of at or above 15 °C, at or above 20 °C, at or above 25 °C, at or above 35 °C, at or above 37 °C, at or above 42 °C, at or above 55 °C, at or above 60 °C, or at or above 72 °C, or at or above 80 °C, or within a range defined by any of the foregoing.
  • the binding complex (e.g., ternary complex) remains stable until subjected to a condition that causes dissociation of interactions between any of the polymerase, template molecule, primer and/or the nucleotide unit or the nucleotide.
  • a dissociating condition comprises contacting the binding complex with any one or any combination of a detergent, EDTA and/or water.
  • the present disclosure provides said method wherein the binding complex is deposited on, attached to, or hybridized to, a surface showing a contrast-to-noise ratio in the detecting step of greater than 20.
  • the present disclosure provides said method wherein the contacting is performed under a condition that stabilizes the binding complex when the nucleotide or nucleotide unit is complementary to a next base of the template nucleic acid, and destabilizes the binding complex when the nucleotide or nucleotide unit is not complementary to the next base of the template nucleic acid.
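Step (c) of the sequencing-by-binding cycle listed above reduces to a small decision rule over the outcomes of step (b). The sketch below assumes that each examined mixture is tagged with the base type whose cognate nucleotide it contains and that step (b) yields one boolean "ternary complex detected" per mixture. The source text is truncated where it states what is imputed when no complex is detected, so this sketch does not guess that value: the hypothetical `imputed` parameter lets the caller supply it. Raising an error when more than one mixture forms a complex is also an assumption, not something the text specifies.

```python
from typing import Optional, Sequence, Tuple


def call_next_nucleotide(
    observations: Sequence[Tuple[str, bool]],
    imputed: Optional[str] = None,
) -> Optional[str]:
    """Identify the next correct nucleotide from one SBB cycle (sketch).

    observations: (base_type, complex_detected) pairs, one per mixture
        examined in step (b).
    imputed: hypothetical caller-supplied call used when no ternary complex
        is detected for any of the tested base types.
    """
    detected = [base for base, found in observations if found]
    if len(detected) == 1:
        # Exactly one mixture formed a ternary complex: that cognate is the call.
        return detected[0]
    if len(detected) > 1:
        # Assumption: treat multiple detections as ambiguous instead of
        # picking one arbitrarily.
        raise ValueError(f"ambiguous cycle: complexes detected for {detected}")
    # No complex for any tested base type: fall back to the imputed value.
    return imputed


print(call_next_nucleotide([("A", False), ("C", True), ("G", False)]))  # C
```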
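The contrast-to-noise ratio threshold listed above (greater than 20) is not defined in this excerpt. A common definition, assumed here, is the mean signal intensity minus the mean background intensity, divided by the standard deviation of the background. The minimal sketch below computes that quantity from two lists of pixel intensities; the function name, the use of the population standard deviation, and the example pixel values are all illustrative assumptions.

```python
import statistics
from typing import Sequence


def contrast_to_noise_ratio(signal: Sequence[float], background: Sequence[float]) -> float:
    """CNR = (mean(signal) - mean(background)) / stdev(background).

    Uses the population standard deviation of the background pixels, one
    common convention assumed here rather than taken from the source.
    """
    noise = statistics.pstdev(background)
    if noise == 0:
        raise ValueError("background has zero variance; CNR is undefined")
    return (statistics.fmean(signal) - statistics.fmean(background)) / noise


# Hypothetical pixel intensities for one polony and its local background.
signal_px = [1200.0, 1180.0, 1220.0, 1210.0]
background_px = [100.0, 110.0, 90.0, 100.0]
cnr = contrast_to_noise_ratio(signal_px, background_px)
print(f"CNR = {cnr:.1f}, exceeds 20: {cnr > 20}")
```

Other conventions (for example, pooling signal and background variance) give different numbers, so a threshold such as 20 is only meaningful together with the definition used to compute it.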

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Computing Systems (AREA)
  • Software Systems (AREA)
  • Artificial Intelligence (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • Biophysics (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Medical Informatics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Multimedia (AREA)
  • Biotechnology (AREA)
  • Evolutionary Biology (AREA)
  • Analytical Chemistry (AREA)
  • Proteomics, Peptides & Aminoacids (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Databases & Information Systems (AREA)
  • Chemical & Material Sciences (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Nuclear Medicine, Radiotherapy & Molecular Imaging (AREA)
  • Radiology & Medical Imaging (AREA)
  • Quality & Reliability (AREA)
  • Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)
  • Investigating Or Analysing Biological Materials (AREA)

Abstract

Disclosed herein are sequencing systems and sequencing methods for training neural networks and for utilizing the trained neural networks for sequencing analysis after acquiring flow cell images using the sequencing systems. The sequencing systems disclosed herein can include Field-Programmable Gate Arrays (FPGAs), artificial intelligence (AI) chips, or a combination thereof.

Description

THREE-DIMENSIONAL BASE CALLING IN NEXT GENERATION SEQUENCING ANALYSIS
CROSS-REFERENCE
[0001] This application claims the benefit of U.S. Provisional Application No. 63/549,327, filed Feb. 2, 2024, U.S. Provisional Application No. 63/549,333, filed Feb. 2, 2024, U.S. Provisional Application No. 63/570,038, filed Mar. 26, 2024, U.S. Provisional Application No. 63/661,332, filed Jun. 18, 2024, U.S. Provisional Application No. 63/724,712, filed Nov. 25, 2024, and U.S. Provisional Application No. 63/736,743, filed Dec. 20, 2024, each of which is incorporated herein by reference in its entirety.
TECHNICAL FIELD
[0002] Embodiments of this disclosure relate generally to image processing and base calling in sequencing data analysis, and particularly to three-dimensional (3D) images of in situ samples.
BACKGROUND
[0003] In next-generation sequencing (NGS) or NGS-like applications such as sequencing by synthesis, sequencing by binding, or sequencing by avidity, a new strand is synthesized one nucleotide base at a time in order to identify the sequence of a target nucleic acid. During each sequencing cycle, one base attaches to any given strand. At the imaging step of each cycle, one or more images are recorded. A base-calling algorithm is applied to the image(s) to “read” the successive signals from each cluster or polony and convert the optical signals into an identification of the nucleotide base sequence added to each DNA fragment. Traditional sequencing data analysis relies on two-dimensional (2D) flow cell images. In situ samples such as cells or tissue, however, have a thickness along the z direction orthogonal to the image plane. As a result, a flow cell image at a selected z level can include signals from out-of-focus polonies located at adjacent z levels, as well as other undesired signals, e.g., from the cell membrane. There is therefore a need for fast and accurate three-dimensional (3D) flow cell image processing and base calling to ensure reliable base calling and sequencing analysis of 3D samples such as cells and tissue.
BRIEF SUMMARY
[0004] Provided herein are system, apparatus, method, and/or computer program product embodiments, and/or combinations and sub-combinations thereof, that enable fast and accurate flow cell image processing for reliable and accurate base calling of samples such as in situ cells or tissue. The flow cell images can come from different sequencing cycles and/or different channels.
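For orientation, the sketch below shows the conventional step that consumes such multi-cycle, multi-channel data once a per-polony intensity has been extracted from each channel image: a naive base caller that, in every cycle, calls the base whose channel is brightest. The four-channel A/C/G/T layout, the nested-list input format, and the function name are assumptions for illustration. It performs no crosstalk, phasing, or out-of-focus correction and is not the neural-network base calling of this disclosure; it also shows why stray signal from a polony at an adjacent z level, if bright enough in another channel, could flip such a call.

```python
from typing import List, Sequence

BASES = ("A", "C", "G", "T")  # assumed channel order, one channel per base


def call_bases(intensities: Sequence[Sequence[Sequence[float]]]) -> List[str]:
    """Return one read per polony by picking the brightest channel each cycle.

    intensities[p][c][k] is the extracted intensity of polony p in
    sequencing cycle c and channel k (k indexes BASES).
    """
    reads = []
    for polony in intensities:
        calls = []
        for cycle in polony:
            # Index of the channel with the highest intensity in this cycle.
            brightest = max(range(len(BASES)), key=lambda k: cycle[k])
            calls.append(BASES[brightest])
        reads.append("".join(calls))
    return reads


# Two hypothetical polonies imaged over three cycles.
example = [
    [[900, 40, 30, 20], [50, 60, 800, 40], [30, 700, 20, 10]],
    [[20, 30, 40, 950], [880, 20, 30, 40], [10, 20, 600, 30]],
]
print(call_bases(example))  # ['AGC', 'TAG']
```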
[0005] As a particular application, provided herein are embodiments of methods, systems, and media for image processing of flow cell images of 3D volumetric samples, e.g., cells or tissue, so that the image intensity, location, and/or size of clusters or polonies can be relied upon for accurate base calling. The image processing methods herein may function to reverse the imaging process of an optical system and thereby virtually improve (i.e., narrow) the full width at half maximum (FWHM) of the optical system. As such, the image processing methods disclosed herein may advantageously increase the detectable density of polonies or clusters in 3D samples or in traditional 2D samples. By computationally increasing the spatial resolution of the flow cell images, the methods herein may also advantageously lessen the impact of color mixing of polonies that may be caused by neighboring polonies in 2D or 3D.
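For intuition about the FWHM statement above: if the point spread function is approximated as a Gaussian with standard deviation σ (an assumption; real optical point spread functions are not exactly Gaussian), its FWHM is 2√(2 ln 2)·σ ≈ 2.355σ. The sketch below computes that closed form and also measures the FWHM of a sampled 1D intensity profile by linearly interpolating the two half-maximum crossings, one way a narrower effective FWHM could be checked on a line profile through a polony before and after processing. Both function names are illustrative.

```python
import math
from typing import Sequence


def gaussian_fwhm(sigma: float) -> float:
    """FWHM of a Gaussian with standard deviation sigma: 2*sqrt(2 ln 2)*sigma."""
    return 2.0 * math.sqrt(2.0 * math.log(2.0)) * sigma


def measured_fwhm(profile: Sequence[float], spacing: float = 1.0) -> float:
    """Width at half maximum of a single-peaked, sampled 1D profile.

    Linearly interpolates the half-maximum crossing on each side of the peak.
    """
    peak = max(range(len(profile)), key=profile.__getitem__)
    half = profile[peak] / 2.0

    # Walk outward from the peak to the first sample below half maximum.
    i = peak
    while i > 0 and profile[i] >= half:
        i -= 1
    j = peak
    while j < len(profile) - 1 and profile[j] >= half:
        j += 1
    if profile[i] >= half or profile[j] >= half:
        raise ValueError("profile does not drop below half maximum on both sides")

    # Interpolate between the bracketing samples on each side.
    left = i + (half - profile[i]) / (profile[i + 1] - profile[i])
    right = j - (half - profile[j]) / (profile[j - 1] - profile[j])
    return (right - left) * spacing


sigma = 4.0
samples = [math.exp(-((x - 40) ** 2) / (2 * sigma ** 2)) for x in range(81)]
print(f"closed form: {gaussian_fwhm(sigma):.3f}, measured: {measured_fwhm(samples):.3f}")
```

A smaller FWHM means two neighboring polonies can sit closer together before their half-maximum footprints overlap, which is the intuition behind the density and color-mixing benefits described above.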
[0006] In some embodiments, a neural network, e.g., a convolutional neural network, is used to generate a high-resolution z-stack of flow cell images of the 3D sample from the low-resolution z-stack acquired by the sequencing system, and subsequent primary analysis can be performed on the high-resolution flow cell images instead of the low-resolution flow cell images. In some embodiments, the neural network, e.g., a convolutional neural network, is used in image processing of the high-resolution z-stacks of flow cell images of the samples to generate the base calls.
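To make the input and output of that step concrete: a z-stack can be held as a list of 2D frames indexed by z level, and a higher-resolution stack has more z levels. The sketch below is a non-learned baseline that densifies a stack along z only, by linear interpolation between adjacent acquired planes. It is not the convolutional neural network described here, which would be trained to recover detail that interpolation cannot; the function name, the nested-list frame format, and the integer upsampling factor are hypothetical.

```python
from typing import List

Frame = List[List[float]]  # one 2D image at a single z level, indexed [row][col]


def upsample_z(stack: List[Frame], factor: int) -> List[Frame]:
    """Insert (factor - 1) linearly interpolated planes between each pair of
    adjacent z levels, so n planes become (n - 1) * factor + 1 planes."""
    if factor < 1:
        raise ValueError("factor must be >= 1")
    if not stack:
        return []
    out: List[Frame] = []
    for lower, upper in zip(stack, stack[1:]):
        for step in range(factor):
            w = step / factor  # weight given to the upper plane
            out.append([
                [(1 - w) * a + w * b for a, b in zip(row_lo, row_hi)]
                for row_lo, row_hi in zip(lower, upper)
            ])
    out.append([row[:] for row in stack[-1]])  # keep the last acquired plane
    return out


low_res = [[[0.0, 10.0]], [[4.0, 30.0]]]  # 2 z levels, 1x2 pixels each
high_res = upsample_z(low_res, 2)
print(len(high_res), high_res[1])  # 3 [[2.0, 20.0]]
```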
[0007] Embodiments of these aspects include corresponding computer systems, apparatus, and computer program products recorded on one or more computer storage devices, each configured, alone or in combination, to perform the operations of the methods. For a computer system configured or to be configured to perform operations, the computer system has installed on it software, firmware, hardware, or a combination thereof that in operation causes the computer system to perform the operations or actions. For a computer program product configured or to be configured to perform operations or actions, the computer program product includes instructions that, when executed by a hardware processor, cause the hardware processor to perform the operations or actions.
[0008] Further embodiments, features, and advantages of the present disclosure, as well as the structure and operation of the various embodiments of the present disclosure, are described in detail below with reference to the accompanying drawings.
BRIEF DESCRIPTION OF THE DRAWINGS
[0009] The accompanying drawings, which are incorporated herein and form a part of the specification, illustrate embodiments of the present disclosure and, together with the description, further serve to explain the principles of the disclosure and to enable a person skilled in the art(s) to make and use the embodiments.
[0010] FIG. 1 illustrates a block diagram of a sequencing system for performing sequencing, flow cell image processing, and/or primary analysis operations including base calling using flow cell images, according to some embodiments.
[0011] FIGS. 2A-2C show an exemplary simulated flow cell image (FIG. 2A, of an in situ cell sample) and two different images (FIGS. 2B-2C) predicted using the systems and methods herein and corresponding to the image in FIG. 2A, according to some embodiments. The predicted images are at different z levels.
[0012] FIGS. 2D-2E show exemplary simulated flow cell images in the reference set. The simulated flow cell images are generated using the methods herein at the first (FIG. 2D) and second (FIG. 2E) resolutions, according to some embodiments.
[0013] FIGS. 3A-3D show two exemplary flow cell images (FIGS. 3A and 3D) with multiple cells at two different z levels, and two different predicted images at different z levels (FIGS. 3B-3C) generated from the image in FIG. 3A using the systems and methods herein according to some embodiments.
[0014] FIGS. 3E-3F show improved detection of targets per cell in the same imaging area (FIG. 3E) and fewer false positives (FIG. 3F) using the methods herein when compared with non-artificial-intelligence-based methods; in this case, the targets are polonies or clusters within the cells.
[0015] FIG. 3G shows improved detection of targets in simulated flow cell images of sample(s) using the neural network herein, which produces a higher R² value than a traditional method.
[0016] FIG. 4 illustrates a block diagram of a computer system for performing image processing, sequencing analysis, training of neural network(s), predicting base calls, image intensities, high-resolution images, and/or classifications using the pre-trained neural networks, and/or base calling, according to some embodiments.
[0017] FIG. 5A is a flow chart of an exemplary method of predicting 3D flow cell images of sequencing sample(s) and performing base calling using the 3D flow cell images, according to some embodiments.
[0018] FIG. 5B is a flow chart of an exemplary method of training a neural network that can be used to predict higher resolution flow cell images of sequencing sample(s), according to some embodiments.
[0019] FIG. 5C is a schematic showing an exemplary embodiment of the first reconfigurable logic device, the integrated circuit, and their connection(s) to the processor of the sequencing system.
[0020] FIG. 5D is a schematic showing an exemplary embodiment of using the first reconfigurable logic device and the integrated circuit in parallel with a sequencing run in progress within a predetermined time window.
[0021] FIG. 5E is a flow chart of an exemplary method of training a neural network, thereby generating a pre-trained neural network that can be used to predict higher resolution flow cell images of sequencing sample(s), base calls, intensities, and/or classifications, according to some embodiments.
[0022] FIG. 5F shows scatter plots for an exemplary embodiment of generating reference intensities from high resolution training flow cell images.
[0023] FIG. 6 is a schematic showing exemplary embodiments of padlock probes.
[0024] FIG. 7 is a schematic showing a workflow for generating circularized padlock probes inside a cell, comprising generating first and second cDNAs from first and second target RNA molecules (respectively), and hybridizing first and second padlock probes to the first and second cDNA molecules (respectively) to generate first and second circularized padlock probes (respectively).
[0025] FIG. 8 is a schematic showing a rolling circle and sequencing workflow inside a cell, comprising generating first and second concatemers by conducting rolling circle amplification using first and second covalently closed circular molecules (respectively). The first and second concatemers are subjected to a sequencing workflow using universal sequencing primers, sequencing polymerases, and a plurality of nucleotide reagents.
[0026] FIG. 9 is a schematic showing an exemplary workflow for sequencing a concatemer that is generated inside the cell.
[0027] FIG. 10 is a schematic showing an exemplary workflow for sequencing a concatemer that is generated inside the cell.
[0028] FIG. 11 is a schematic showing an exemplary workflow for sequencing a concatemer that is generated inside the cell.
[0029] FIG. 12 is a schematic showing an exemplary workflow for sequencing a concatemer that is generated inside the cell.
[0030] FIG. 13 is a schematic showing a workflow for generating circularized padlock probes, comprising generating first and second cDNAs from first and second target RNA molecules (respectively), hybridizing first and second padlock probes to the first and second cDNA molecules (respectively) to generate first and second circularized padlock probes (respectively).
[0031] FIG. 14 is a schematic showing a rolling circle and sequencing workflow comprising generating first and second concatemers by conducting rolling circle amplification using first and second covalently closed circular molecules (respectively).
[0032] FIG. 15 is a schematic of an exemplary low binding support comprising a glass substrate and alternating layers of hydrophilic coatings which are covalently or non-covalently adhered to the glass, and which further comprises chemically-reactive functional groups that serve as attachment sites for oligonucleotide primers (e.g., capture oligonucleotides).
[0033] FIG. 16 is a schematic of various exemplary configurations of multivalent molecules. Left (Class I): schematics of multivalent molecules having a “starburst” or “helter-skelter” configuration. Center (Class II): a schematic of a multivalent molecule having a dendrimer configuration. Right (Class III): a schematic of multiple multivalent molecules formed by reacting streptavidin with 4-arm or 8-arm PEG-NHS with biotin and dNTPs. Nucleotide units are designated ‘N’, biotin is designated ‘B’, and streptavidin is designated ‘SA’.
[0034] FIG. 17 is a schematic of an exemplary multivalent molecule comprising a generic core attached to a plurality of nucleotide-arms.
[0035] FIG. 18 is a schematic of an exemplary multivalent molecule comprising a dendrimer core attached to a plurality of nucleotide-arms.
[0036] FIG. 19 shows a schematic of an exemplary multivalent molecule comprising a core attached to a plurality of nucleotide-arms, where the nucleotide-arms comprise biotin, a spacer, a linker, and a nucleotide unit.
[0037] FIG. 20 is a schematic of an exemplary nucleotide-arm comprising a core attachment moiety, spacer, linker and nucleotide unit.
[0038] FIG. 21 shows the chemical structure of an exemplary spacer (top), and the chemical structures of various exemplary linkers, including an 11-atom Linker, 16-atom Linker, 23-atom Linker and an N3 Linker (bottom).
[0039] FIG. 22 shows the chemical structures of various exemplary linkers, including Linkers 1-9.
[0040] FIG. 23A shows the chemical structures of various exemplary linkers joined/attached to nucleotide units.
[0041] FIG. 23B shows the chemical structures of various exemplary linkers joined/attached to nucleotide units.
[0042] FIG. 23C shows the chemical structures of various exemplary linkers joined/attached to nucleotide units.
[0043] FIG. 23D shows the chemical structures of various exemplary linkers joined/attached to nucleotide units.
[0044] FIG. 24 shows the chemical structure of an exemplary biotinylated nucleotide-arm.
[0045] FIG. 25 is a schematic of a guanine tetrad (i.e., G-tetrad).
[0046] FIG. 26 is a schematic of an exemplary intramolecular G-quadruplex structure.
[0047] FIG. 27 shows an exemplary support with multiple tiles for immobilizing 2D or 3D sample(s) thereon for sequencing, including the cellular sample(s), according to some aspects.
[0048] FIG. 28 shows a flow chart of an exemplary method of predicting base calls of the flow cell images (e.g., of in situ samples) using the neural network disclosed herein, according to some embodiments.
[0049] FIG. 29 shows a flow chart of an exemplary method of training the neural network that can be used to predict base calls or high resolution flow cell images, according to some embodiments.
[0050] FIGS. 30A-30B show a flow cell image (FIG. 30A) and its high resolution image predicted using the neural network that is pre-trained using reference base calls. In this case, base calls are determined from the high resolution image using non-neural network based algorithm(s).
[0051] FIG. 31 shows a block diagram of an exemplary method of training the neural network(s) and an exemplary method of predicting high resolution flow cell images and/or predicting base calls using such pretrained neural network(s).
[0052] In the drawings, like reference numbers generally indicate identical or similar elements. Additionally, generally, the left-most digit(s) of a reference number identifies the drawing in which the reference number first appears.
DETAILED DESCRIPTION
[0053] Provided herein are system, apparatus, method, and/or computer program product embodiments, and/or combinations and sub-combinations thereof, which enable image processing of flow cell images, e.g., flow cell images obtained from in situ samples or traditional 2D samples in a sequencing run, to: 1) generate images with improved spatial resolution and improved detectable density of polonies or clusters, and perform base calling or other subsequent sequencing analysis using such images; or 2) predict intensities, base call(s), or classifications of polonies or clusters. The techniques herein can be used while a sequencing run is still in progress to improve the efficiency of sequencing and sequencing analysis, reduce the data storage required during sequencing and sequencing analysis, and improve the accuracy and reliability of sequencing analysis. The techniques herein can be used on flow cell images of volumetric 3D samples and/or traditional 2D samples obtained using various imaging and/or sequencing techniques and/or various sequencing systems, e.g., next generation sequencing (NGS) systems. The techniques disclosed herein are useful for base calling in NGS, and NGS flow cell images are used as the primary example herein to describe the application of these techniques. However, such image analysis techniques may also be useful in other applications where spot detection and/or CCD imaging is used.
[0054] Traditional flow cell images can show clusters or polonies from 2D samples, and base calling can be performed using their corresponding image intensities. Existing base calling algorithms may not be able to differentiate polonies or clusters from background noise or from other signal spots of similar shape or size. There is a need to identify polony or cluster locations for accurate and reliable base calling, preferably at a higher spatial resolution than that of the flow cell images, especially for samples with a higher polony or cluster density than traditional 2D samples. There is also a need to generate accurate and reliable intensities for polonies or clusters so that they can be used for accurate and reliable base calling. Further, there is a need to accurately predict base calls using information included in the flow cell images, such as polony locations, shapes, and intensities.
[0055] The techniques herein can be used to process flow cell images (e.g., 2D or 3D) to generate accurate and reliable image intensities for polonies or clusters with improved spatial resolution, and thus a higher maximum polony or cluster density detected in the sample(s), for accurate and reliable sequencing analysis. The technologies disclosed herein may advantageously function to reverse the imaging process of an optical system and virtually improve (reduce) the effective full width at half maximum (FWHM) of the imager, so that the density of polony locations is not limited by the optical design of the sequencing system. As such, the disclosed technologies may advantageously increase the detected density of polonies, e.g., by 2x, 4x, 8x, 16x, 27x, 40x, 50x, 100x or more relative to the polony density detectable using traditional optical systems and image processing methods. In some embodiments, the disclosed technologies may advantageously increase the spatial resolution of flow cell images in each of one or more spatial dimensions by 2x, 4x, 8x, 16x, 27x, 40x, 50x, 100x or more relative to flow cell images acquired using traditional optical systems and/or image processing methods. By computationally increasing the spatial resolution of flow cell images, the methods herein may also advantageously lessen the color mixing of polonies that may be caused by neighboring polonies or clusters.
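One way to read the listed density gains, offered only as an illustrative assumption: if the detectable polony density is limited by the minimum resolvable separation between spots, shrinking that separation by a factor k along each of d spatial axes raises the maximum density by roughly k^d. Under that reading, 2x per axis gives 4x in 2D or 8x in 3D, and 3x per axis in 3D gives 27x. The sketch also computes the FWHM of a Gaussian point-spread function, a common (assumed) model of imager blur; the function names are hypothetical.

```python
import math

def gaussian_fwhm(sigma: float) -> float:
    """FWHM of a Gaussian point-spread function with standard deviation sigma:
    FWHM = 2 * sqrt(2 * ln 2) * sigma, i.e. about 2.3548 * sigma."""
    return 2.0 * math.sqrt(2.0 * math.log(2.0)) * sigma

def density_gain(per_axis_gain: float, n_dims: int) -> float:
    """Approximate gain in resolvable spot density when the minimum resolvable
    separation shrinks by `per_axis_gain` along each of `n_dims` axes.
    Assumption: density scales as 1 / separation**n_dims."""
    return per_axis_gain ** n_dims

print(round(gaussian_fwhm(1.0), 4))  # 2.3548
print(density_gain(2, 2))            # 4  (2x per axis, 2D image)
print(density_gain(2, 3))            # 8  (2x per axis, 3D volume)
print(density_gain(3, 3))            # 27 (3x per axis, 3D volume)
```

Under this model, halving the effective FWHM along x and y alone would be expected to quadruple the density ceiling for a 2D sample.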
[0056] In situ samples such as cells or tissue can have a thickness along the axial (z) direction that cannot be kept in focus within a single 2D image. A z-stack of multiple 2D flow cell images may be acquired to cover clusters or polonies at different z levels, e.g., in a 3D cellular sample. Interference may occur in the z-stack of flow cell images, for example from out-of-focus polonies and from background signal of cellular components. For example, a polony located at a first z level can appear in focus in a first flow cell image taken at the first z level, and it may also generate a blob of signal in a second 2D flow cell image taken at an adjacent z level where it is out of focus. The blob of signal may interfere with the intensities of polonies at or near the same x-y location in the second flow cell image, thus deteriorating the accuracy and reliability of base calling. As another example, color mixing from neighboring polonies may interfere with the polony intensity or limit the polony density that can be detected for subsequent base calling. There is a need to identify polony or cluster locations for accurate and reliable 3D base calling. There is also a need to generate accurate and reliable intensities for polonies or clusters from 3D volumetric samples so that they can be used for accurate and reliable base calling.
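The out-of-focus interference described in this paragraph can be illustrated with a toy z-stack model. This sketch assumes Gaussian spots whose width grows with defocus distance while the total signal is conserved; the defocus parameter `z_r`, the pixel distances, and the function names are hypothetical and are not taken from this disclosure.

```python
import numpy as np

def gaussian_spot(shape, center, sigma):
    """2D Gaussian spot with unit peak, centred at (row, col)."""
    rows, cols = np.indices(shape)
    r0, c0 = center
    return np.exp(-((rows - r0) ** 2 + (cols - c0) ** 2) / (2.0 * sigma ** 2))

def defocused_spot(shape, center, sigma0, dz, z_r):
    """Toy defocus model (assumption): the spot width grows as
    sigma0 * sqrt(1 + (dz / z_r)**2) and total energy is conserved,
    so the peak drops by (sigma0 / sigma)**2."""
    sigma = sigma0 * np.sqrt(1.0 + (dz / z_r) ** 2)
    return (sigma0 / sigma) ** 2 * gaussian_spot(shape, center, sigma)

shape = (21, 21)
# Polony A is in focus at z level 0; polony B is in focus at z level 1,
# 3 pixels away from A in x-y.
a_blob_at_z1 = defocused_spot(shape, (10, 10), sigma0=1.0, dz=1.0, z_r=0.8)
b_at_z1 = gaussian_spot(shape, (10, 13), sigma=1.0)

# The image observed at z level 1 contains B plus the out-of-focus blob of A.
image_z1 = b_at_z1 + a_blob_at_z1
contamination = a_blob_at_z1[10, 13] / b_at_z1[10, 13]
print(f"Blob of A adds {contamination:.1%} to B's peak intensity at z level 1")
```

Even in this simplified model, the blob from the neighboring z level biases the intensity read at B's location, which is the effect that degrades base calling.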
[0057] The techniques disclosed herein advantageously train a neural network to efficiently and accurately predict polony or cluster locations in the sample(s). The techniques disclosed herein advantageously train a neural network to efficiently and accurately predict high resolution intensities, base calls, and/or classifications for polonies or clusters in the sample(s). The samples herein are not limited to 3D samples, e.g., in situ cells and/or tissue. The samples herein may also include traditional 2D samples.
[0058] The techniques disclosed herein may advantageously utilize the reconfigurable logic device, e.g., FPGAs, and other integrated circuits, e.g., AI chips or neural processing units (NPUs), to: 1) predict high-resolution polony or cluster locations based on low-resolution flow cell images; or 2) predict intensities, base calls, and/or classifications at the high resolution for the polonies or clusters in the sample(s). The utilization of the reconfigurable logic device, e.g., FPGAs, and other integrated circuits, e.g., AI chips or NPUs, on board the sequencing system may advantageously reduce computational time, reduce energy consumption, improve sequencing analysis efficiency, reduce the data storage space required, and reduce sequencing system cost in the analysis of flow cell images when compared with sequencing analysis using existing sequencing systems.
[0059] The techniques disclosed herein advantageously train a neural network based on a loss function determined by comparison to reference base calls as ground truth, while the trained neural network may be used to accurately and reliably predict high resolution, post-image-processing flow cell images from the flow cell images acquired from the sample(s). The techniques disclosed herein advantageously allow a mismatch between the training outputs and the prediction outputs. For example, the neural network may be trained by generating training base calls as training outputs and comparing the training outputs to reference base calls as ground truth. The trained neural network may then be used to predict high resolution flow cell images or to predict base calls. Such a mismatch between training and prediction outputs may advantageously allow reference base calls to be taken into account when training the parameters of the neural network, while the network predicts higher resolution, higher quality versions of the flow cell images that can be used to improve base calling accuracy and reliability. Such training and prediction advantageously enable the use of a simplified neural network, which imposes a lower computational burden and reduces computational time and power consumption when making predictions.
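A minimal sketch of a loss computed against reference base calls, assuming the network (or a head attached to its predicted high resolution image) emits four per-base scores (logits) for each polony in a cycle, and that the reference base calls are given as a string. Cross-entropy is an assumed, commonly used choice; this disclosure does not specify it. NumPy only evaluates the loss value here; gradients for training would come from an automatic-differentiation framework. The names `base_call_loss` and `BASES` are hypothetical.

```python
import numpy as np

BASES = "ACGT"  # assumed order of the four logit columns

def base_call_loss(logits: np.ndarray, reference_calls: str) -> float:
    """Mean cross-entropy between predicted per-base logits (N x 4) and
    reference base calls used as ground truth."""
    targets = np.array([BASES.index(b) for b in reference_calls])
    # Numerically stable log-softmax over the four base classes.
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return float(-log_probs[np.arange(len(targets)), targets].mean())

# Hypothetical training outputs for three polonies in one cycle.
logits = np.array([[4.0, 0.1, 0.2, 0.0],   # favours A
                   [0.0, 3.0, 0.5, 0.1],   # favours C
                   [0.2, 0.1, 0.3, 2.5]])  # favours T
print(base_call_loss(logits, "ACT"))  # small: predictions agree with reference
print(base_call_loss(logits, "GGG"))  # larger: predictions disagree
```

Because the loss depends only on base calls, the intermediate representation (here, the predicted high resolution image) is free to be whatever best supports accurate calls, which is one way to view the training/prediction mismatch described above.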
[0060] The samples herein are not limited to 3D samples, e.g., in situ cells and/or tissue. The samples herein may also include traditional 2D samples. The techniques disclosed herein may advantageously utilize the reconfigurable logic device, e.g., FPGAs, and other integrated circuits, e.g., AI chips or neural processing units (NPUs), to perform one or more operations in the training and/or the prediction.
[0061] In DNA sequencing, identifying the centers of clusters or polonies is sometimes referred to as part of primary analysis. Primary analysis can include some or all of the operations and/or steps needed to perform base calling and compute quality scores for the base calls. Primary analysis can involve the formation of a template image for at least part of the flow cell. The template image can include the estimated locations of all detected clusters or polonies in a common coordinate system. The template image can include a polony map that is 2D or 3D. Template images are generated by identifying cluster or polony locations in all images in the first cycle or the first few cycles of the sequencing process. Generating the template image may require sufficient spatial resolution to differentiate the polonies from background features, neighboring polonies, and/or duplicate polonies that are out of focus.
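Template generation can be sketched as spot finding on the first-cycle images. The example is a simple local-maximum detector with an intensity threshold, used as a stand-in rather than the disclosed method; it assumes the channel images are already registered to a common coordinate system, and the function names and threshold value are hypothetical.

```python
import numpy as np

def local_maxima(image: np.ndarray) -> np.ndarray:
    """Boolean mask of pixels >= all 8 neighbours (expects a float image)."""
    padded = np.pad(image, 1, mode="constant", constant_values=-np.inf)
    h, w = image.shape
    is_max = np.ones(image.shape, dtype=bool)
    for dr in (-1, 0, 1):
        for dc in (-1, 0, 1):
            if dr == 0 and dc == 0:
                continue
            neighbour = padded[1 + dr:1 + dr + h, 1 + dc:1 + dc + w]
            is_max &= image >= neighbour
    return is_max

def find_polony_centers(image: np.ndarray, threshold: float) -> np.ndarray:
    """(row, col) coordinates of local maxima brighter than `threshold`."""
    return np.argwhere(local_maxima(image) & (image > threshold))

def build_template(first_cycle_images, threshold: float) -> np.ndarray:
    """Merge registered channel images of the first cycle (pixel-wise max)
    and detect polonies once, giving one polony map in common coordinates."""
    combined = np.max(np.stack(first_cycle_images), axis=0)
    return find_polony_centers(combined, threshold)

rng = np.random.default_rng(0)
img_a = rng.normal(0.0, 0.05, (32, 32)); img_a[8, 8] = 1.0
img_c = rng.normal(0.0, 0.05, (32, 32)); img_c[20, 25] = 1.0
template = build_template([img_a, img_c], threshold=0.5)
print(template.tolist())  # [[8, 8], [20, 25]]
```

Taking the pixel-wise maximum across channels lets a polony that lights up in any channel of the first cycle enter the template exactly once.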
Sequencing systems
[0062] FIG. 1 illustrates a block diagram of a computer-implemented system 100, according to one or more embodiments disclosed herein. The system 100 has a sequencing system 110 that includes a flow cell 112, a sequencer 114, an imager 116, data storage 122, and a user interface 124. The sequencing system 110 may be connected to a cloud 130. The sequencing system 110 may include one or more dedicated processors 118, a first reconfigurable logic device, e.g., one or more Field-Programmable Gate Arrays (FPGAs) 120, and a computing system 126.
[0063] In some embodiments, the flow cell 112 is configured to capture DNA fragments and form DNA sequences for base-calling on the flow cell. The flow cell 112 can include a support as disclosed herein. The support can be a solid support. The support can include a surface coating thereon as disclosed herein. The surface coating can be a polymer coating as disclosed herein.
[0064] A flow cell 112 can include multiple tiles or imaging areas thereon, and each tile may be separated into a grid of subtiles. Each subtile can include a plurality of clusters or polonies immobilized thereon. As a nonlimiting example, a flow cell can have 424 tiles, and each tile can be divided into a 6 x 9 grid, i.e., 54 subtiles. The flow cell image as disclosed herein can be an image including signals of a plurality of clusters or polonies. The flow cell image can include one or more tiles of signals or one or more subtiles of signals. In some embodiments, a flow cell image can be an image that includes all the tiles and approximately all signals thereon. The flow cell image can be acquired from a channel during an imaging or sequencing cycle using the imager 116. In some embodiments, each tile may include millions of polonies or clusters. As a nonlimiting example, a tile can include about 1 to 10 million clusters or polonies. Each polony can be a collection of many copies of DNA fragments.
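The tile and subtile bookkeeping in the nonlimiting example can be sketched with NumPy, assuming a tile is stored as a 2D image array. The 600 x 900 pixel tile size and the function name are hypothetical; the 6 x 9 grid and the 424 tiles come from this paragraph.

```python
import numpy as np

def split_into_subtiles(tile: np.ndarray, rows: int = 6, cols: int = 9):
    """Split a tile image into a rows x cols grid of subtile arrays in
    row-major order. np.array_split also handles sizes that do not divide evenly."""
    return [
        block
        for band in np.array_split(tile, rows, axis=0)
        for block in np.array_split(band, cols, axis=1)
    ]

tile = np.zeros((600, 900))        # hypothetical tile size in pixels
subtiles = split_into_subtiles(tile)
print(len(subtiles))               # 54
print(subtiles[0].shape)           # (100, 100)
print(424 * len(subtiles))         # 22896 subtiles across the flow cell
```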
[0065] Depending on the sample(s) immobilized on the support (e.g., a flow cell), the flow cell images may be acquired using the imager 116 at a single z level or at multiple z levels along a z axis orthogonal to the image plane of the flow cell images. In particular, for three-dimensional samples, e.g., cells, tissues, or other in situ samples, the flow cell images can span multiple z levels in order to cover the whole sample(s) in 3D. The z axis can extend from the objective lens of the imager 116 disclosed herein to the support, e.g., flow cell 112. Each z level of flow cell images may be separated from the adjacent z level(s) by a predetermined distance, for example, ranging from about 0.1 µm to about 15 µm, or from 0.02 µm to 10 µm. Each z level of flow cell images may be separated from the adjacent z level(s) by a distance ranging from 0.5 µm to 10 µm, from 0.01 µm to 5 µm, or from 0.1 µm to 15 µm. At each z level, flow cell images can be acquired from one or more sequencing cycles and/or one or more channels. Each flow cell image may include in its field of view at least part of one or more tiles or subtiles of the flow cell. FIG. 27 shows a portion of a flow cell 2712 with multiple tiles 2710. The image plane is defined by the x and y axes, and the z direction (i.e., z axis) is orthogonal to the x-y plane. Although the flow cell images, samples, and the z axis are described in a Cartesian coordinate system as shown in FIG. 27, any other coordinate system can be used to define the spatial locations and relationships herein. Other coordinate systems include but are not limited to polar, cylindrical, or spherical coordinate systems.
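As a worked example of z-stack planning, assuming the stack should span the full sample thickness with both end planes included, the number of z levels is ceil(thickness / spacing) + 1. The 10 µm thickness and 0.5 µm spacing are hypothetical values within the ranges listed in this paragraph, and `z_levels` is a hypothetical helper.

```python
import math

def z_levels(thickness_um: float, spacing_um: float, z_start_um: float = 0.0):
    """Z positions (in µm) that span a sample of the given thickness with a
    fixed spacing between adjacent z levels, including both end planes."""
    n = math.ceil(thickness_um / spacing_um) + 1
    return [round(z_start_um + i * spacing_um, 6) for i in range(n)]

levels = z_levels(thickness_um=10.0, spacing_um=0.5)
print(len(levels))               # 21
print(levels[:3], levels[-1])    # [0.0, 0.5, 1.0] 10.0
```

Each of these z levels would then be imaged for every channel and cycle, so the data volume grows linearly with the number of levels.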
[0066] The sequencer 114 may be configured to flow a nucleotide mixture onto the flow cell 112, cleave blockers from the nucleotides in between flowing steps, and perform other steps for the formation of the DNA sequences on the flow cell 112. The nucleotides may have fluorescent elements attached that emit light or energy at a wavelength that indicates the type of nucleotide. Each type of fluorescent element may correspond to a particular nucleotide base (e.g., A, G, C, T). The fluorescent elements may emit light in visible wavelengths. In some embodiments, the sequencer 114 and the flow cell 112 may be configured to perform various sequencing methods disclosed herein, for example, sequencing-by-avidite.
[0067] For example, each nucleotide base may be assigned a color. Different types of nucleotides can have different colors. Adenine (A) may be red, cytosine (C) may be blue, guanine (G) may be green, and thymine (T) may be yellow, for example. The color or wavelength of the fluorescent element for each nucleotide may be selected so that the nucleotides are distinguishable from one another based on the wavelengths of light emitted by the fluorescent elements.
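With one dye color per base, the simplest base call for a polony in a given cycle is the base whose channel is brightest. This sketch assumes the example color assignment in this paragraph, one imaging channel per color, and already-extracted, crosstalk-corrected intensities; it is a baseline illustration, not the neural-network base calling disclosed herein, and the names are hypothetical.

```python
import numpy as np

# Example color assignment from this paragraph; one channel per color (assumed).
CHANNEL_TO_BASE = {"red": "A", "blue": "C", "green": "G", "yellow": "T"}
CHANNELS = list(CHANNEL_TO_BASE)  # column order of the intensity array

def call_bases(intensities: np.ndarray) -> str:
    """intensities: (n_polonies, 4) array with columns ordered as CHANNELS.
    Each polony is called as the base whose channel is brightest."""
    brightest = intensities.argmax(axis=1)
    return "".join(CHANNEL_TO_BASE[CHANNELS[i]] for i in brightest)

cycle_intensities = np.array([
    [0.90, 0.10, 0.05, 0.08],   # red dominates   -> A
    [0.10, 0.20, 0.85, 0.15],   # green dominates -> G
    [0.05, 0.70, 0.10, 0.20],   # blue dominates  -> C
])
print(call_bases(cycle_intensities))  # AGC
```

Color mixing from neighboring polonies or out-of-focus blobs raises the wrong channels in such an intensity row, which is why cleaner, higher resolution intensities translate directly into more reliable calls.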
[0068] The imager 116 may be configured to capture images of the flow cell 112 after each flowing step. In some embodiments, the imager 116 includes a camera configured to capture digital images, such as a CMOS or a CCD camera. The camera may be configured to capture images at the wavelengths of the fluorescent elements bound to the nucleotides. The images of the sample(s) immobilized on at least a portion of the flow cell acquired by the imager can be called flow cell images.
[0069] In some embodiments, the imager 116 can include one or more optical systems disclosed herein. The optical system(s) can be configured to capture optical signals from the flow cell and generate corresponding flow cell images thereof. The flow cell images can then be used for base calling.
[0070] In an embodiment, the images of the flow cell may be captured in groups, where each image in the group is taken at a wavelength or in a spectrum that matches or includes only one of the fluorescent elements. In another embodiment, the images may be captured as single images that capture all of the wavelengths of the fluorescent elements.
[0071] The resolution of the imager 116 can control the level of detail in the flow cell images, including pixel size. In existing systems, this resolution is very important, as it controls the accuracy with which a spot-finding algorithm identifies the polony or cluster centers. In some embodiments, the image resolution of flow cell images disclosed herein can be from about 10 nanometers (nm) to a few hundred nm or greater. In some embodiments, the image resolution of flow cell images can be in a range from 0.1 nm to 1000 nm. In some embodiments, the image resolution of flow cell images can be in a range from 1 nm to 500 nm. In some embodiments, the image resolution of flow cell images can be in a range from 5 nm to 300 nm. One way to increase the accuracy of polony or cluster finding is to improve the resolution of the imager 116, or to improve the processing performed on images taken by the imager 116. Polony or cluster centers can also be detected in pixels other than those detected by a spot-finding algorithm. These methods can allow for improved accuracy in detection of polony or cluster centers without increasing the resolution of the imager 116. The resolution of the imager 116 may even be lower than that of existing systems with comparable performance, which may reduce the cost of the sequencing system 110.
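One common way to locate polony or cluster centers at positions finer than the pixel grid is an intensity-weighted centroid around each detected peak. This technique is named here as an illustrative substitute; the disclosure does not state that it is used. The window size, the minimum-subtraction background step, and the function name are assumptions.

```python
import numpy as np

def subpixel_center(image: np.ndarray, peak, half_window: int = 2):
    """Refine an integer peak (row, col) to sub-pixel precision using the
    intensity-weighted centroid of a small window around it."""
    r, c = peak
    r0, r1 = max(r - half_window, 0), min(r + half_window + 1, image.shape[0])
    c0, c1 = max(c - half_window, 0), min(c + half_window + 1, image.shape[1])
    patch = image[r0:r1, c0:c1].astype(float)
    patch = np.clip(patch - patch.min(), 0.0, None)  # crude local background removal
    rows, cols = np.indices(patch.shape)
    total = patch.sum()
    return (r0 + (rows * patch).sum() / total, c0 + (cols * patch).sum() / total)

# A Gaussian spot whose true centre (10.3, 7.6) falls between pixel centres.
rows, cols = np.indices((20, 20))
spot = np.exp(-((rows - 10.3) ** 2 + (cols - 7.6) ** 2) / (2 * 1.2 ** 2))
r_est, c_est = subpixel_center(spot, (10, 8))
print(round(float(r_est), 2), round(float(c_est), 2))  # within ~0.1 px of (10.3, 7.6)
```

A small window keeps neighboring polonies out of the centroid but truncates the spot, which introduces a slight bias toward the window center; larger windows trade that bias for more interference from neighbors.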
[0072] The image quality of the flow cell images can control the base calling quality. One way to increase the accuracy of base calling is to improve the imager 116, or to improve the processing performed on images taken by the imager 116 to achieve better image quality. The methods described herein may predict higher resolution versions of the flow cell images (2x, 4x, or more than the existing flow cell image resolution, in a common coordinate system) so that the detectable polony or cluster density can be improved, with reduced or eliminated interference from neighboring polonies, cellular background signal, color mixing, and/or other noise in the flow cell images. As a result, 3D base calling using the methods herein can be more accurate than existing methods that do not use such high resolution flow cell images. Such methods can allow for accurate and efficient base calling. These methods can advantageously be performed in parallel with a sequencing run in the computer-implemented system 100, without interfering with or delaying the existing sequencing workflow of the sequencing system 110. The predicted high resolution flow cell images can be available for base calling in the current sequencing cycle of the sequencing workflow. Further, some or all of the operations disclosed herein can advantageously be performed by the first reconfigurable logic device, e.g., FPGA(s), or by the integrated circuit, e.g., an application-specific integrated circuit (ASIC) chip, neural processing unit (NPU), or artificial intelligence (AI) chip, and data can be communicated between the CPU(s) and the first reconfigurable logic device or integrated circuit to reduce the total operational time relative to methods that use only CPUs.
[0073] The sequencing system 110 may be configured to perform operations or actions for image processing of the flow cell images across different cycles and/or channels. The operations or actions disclosed herein may be performed by the dedicated processors 118, the reconfigurable logic device(s) and/or integrated circuit(s) 120, the computing system 126, or a combination thereof. One or more operations or actions in the methods 500, 600, 700, 2800, 2900 disclosed herein may be performed by the dedicated processors 118, the reconfigurable logic device(s) and/or integrated circuit(s) 120, the computing system 126, or a combination thereof. In some embodiments, which operations or actions are to be performed by the dedicated processors 118, the reconfigurable logic device(s) and/or integrated circuit(s) 120, the computing system 126, or their combinations can be determined based on one or more of a computation time for the specific operation(s), the complexity of computation in the specific operation(s), the need for data transmission between the hardware devices, the power required for the specific operation(s), or their combinations. Image processing operations or actions of the flow cell images can be performed after the corresponding flow cell images are acquired but before base calling of the flow cell images is performed.
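The device-assignment criteria listed in this paragraph (computation time, data transmission, and power) can be combined into a simple cost model. The sketch is hypothetical: the `Estimate` fields, the weighting, and all numbers are illustrative assumptions, not measured values or the system's actual scheduling logic.

```python
from dataclasses import dataclass

@dataclass
class Estimate:
    compute_ms: float   # estimated execution time on the device
    transfer_ms: float  # estimated time to move inputs/outputs to/from the device
    power_w: float      # estimated average power draw while running

def choose_device(estimates: dict, power_weight: float = 0.0) -> str:
    """Pick the device with the lowest cost, where cost is total time in ms
    plus `power_weight` (ms per joule) times the energy used in joules."""
    def cost(e: Estimate) -> float:
        total_ms = e.compute_ms + e.transfer_ms
        energy_j = e.power_w * total_ms / 1000.0
        return total_ms + power_weight * energy_j
    return min(estimates, key=lambda name: cost(estimates[name]))

# Hypothetical per-device estimates for one image-processing operation.
registration = {
    "cpu":  Estimate(compute_ms=120.0, transfer_ms=0.0,  power_w=65.0),
    "fpga": Estimate(compute_ms=15.0,  transfer_ms=8.0,  power_w=20.0),
    "npu":  Estimate(compute_ms=10.0,  transfer_ms=30.0, power_w=10.0),
}
print(choose_device(registration))  # fpga
```

Raising `power_weight` shifts the choice toward lower-energy devices: with these numbers, `choose_device(registration, power_weight=1000.0)` selects the NPU despite its higher data-transfer time.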
[0074] In some embodiments, the data storage 122 is used to store information used in the methods herein. This information may include the flow cell images themselves or information and/or images derived from the flow cell images captured by the imager 116. The DNA sequences determined from the base calling may be stored in the data storage 122. Parameters identifying polony or cluster locations may also be stored in the data storage 122. Raw and/or processed image intensities of each polony or cluster may be stored in the data storage 122. The region and/or subtile that each polony or cluster corresponds to may also be stored in the data storage 122. The transformation matrix of each region and/or subtile for different cycle(s) and/or channel(s) may also be stored in the data storage 122. Cell images may be stored in the data storage 122. The flow cell images, the processed images, and/or the filtered images may be stored in the data storage. Other information or images that can facilitate 3D base calling of the sample can be saved in the data storage.
[0075] The user interface 124 may be used by a user to operate the sequencing system or access data stored in the data storage 122 or the computing system 126.
[0076] The computing system 126 may control the general operation of the sequencing system and may be coupled to the user interface 124. It may also perform steps in image processing, base calling, their preceding operations, and/or subsequent operations including but not limited to predicting high resolution flow cell images. In some embodiments, the computing system 126 is a computer system 400, as described in more detail in FIG. 4. The computing system 126 may store information regarding the operation(s) of the sequencing system 110, such as configuration information, instructions for operating the sequencing system 110, or user information. The computing system 126 may be configured to pass information between the sequencing system 110 and the cloud 130.
[0077] The computing system 126 can include one or more general purpose computers that provide interfaces to run a variety of programs in an operating system, such as Windows™ or Linux™. Such an operating system typically provides great flexibility to a user. In some embodiments, the computing system 126 may include one or more processors, e.g., CPUs, which may be configured for artificial intelligence algorithm development and training (e.g., neural network training), either alone or in combination with the reconfigurable logic device and/or integrated circuit 120.
[0078] In some embodiments, the sequencing system may include one or more reconfigurable logic devices 120 and/or one or more other integrated circuits 120. The reconfigurable logic device 120 can include one or more FPGA devices. The integrated circuit 120 herein may or may not be reconfigurable, and it may include an AI chip, an application-specific integrated circuit (ASIC) chip, a neural processing unit (NPU), or a combination thereof. In some embodiments, the reconfigurable logic device and/or integrated circuit 120 may be configured for artificial intelligence algorithm development and training (e.g., training of a neural network), either alone or in combination with the CPU and/or GPU.
[0079] In some embodiments, the reconfigurable logic device and/or integrated circuit 120 include a main unit and an edge unit. For example, the main unit may be an FPGA device and the edge unit may be an ASIC or AI chip. In some embodiments, the edge unit is an additional hardware processing module that may be individually installed and/or uninstalled on the system 110. The edge unit may be configured for artificial intelligence algorithm development and training. The edge unit may be configured for making inferences or predictions using deployed AI algorithm(s), e.g., neural networks. The edge unit may communicate electronically with the main unit, e.g., for data communication via DMA connections. The edge unit may exchange data electronically with other parts of the system 100 via various connections, such as a chip2chip connection. As an example, the edge unit may include a neural processing unit (NPU) chip, an AI chip, or any other integrated circuit(s).
[0080] In some embodiments, the dedicated processors 118 may be configured to perform operations in the methods disclosed herein. The dedicated processors 118 may include one or more reconfigurable logic devices and/or integrated circuits disclosed herein. The dedicated processors 118 may not include general-purpose processors, but instead custom processors with specific hardware or instructions for performing those steps. Dedicated processors directly run specific software without an operating system. The lack of an operating system reduces overhead, at the cost of the flexibility in what the processor may perform. A dedicated processor may make use of a custom programming language, which may be designed to operate more efficiently than the software run on general-purpose computers. This may increase the speed at which the steps are performed and allow for real time processing.
[0081] In some embodiments, the reconfigurable logic device and/or the integrated circuit 120, e.g., FPGA and/or AI chip, may be configured to perform some or all of the operations in the methods herein. The reconfigurable logic device and/or the integrated circuit may be programmed as hardware that can perform specific task(s). A special programming language may be used to transform software steps into hardware componentry. Each software step may correspond to at least one operation or action in the methods disclosed herein. Each software step may include at least a part of the operation or action in the methods disclosed herein. Once the reconfigurable logic device is programmed, the hardware directly processes the digital data provided to it without running software; instead, the reconfigurable logic device and/or integrated circuit uses logic gates and registers to process the digital data. Because no operating system overhead is incurred, the reconfigurable logic device and/or integrated circuit generally processes data faster than a general-purpose computer. As with dedicated processors, this may come at the cost of flexibility. The lack of software overhead may also allow the reconfigurable logic device and/or the integrated circuit to operate faster than a dedicated processor, although this will depend on the exact processing to be performed and on the specific reconfigurable logic device and/or integrated circuit and dedicated processor.
[0082] A group of the reconfigurable logic devices and/or integrated circuits 120 may be configured to perform the steps in parallel. In some embodiments, a number of processing engines of the FPGA(s) may be configured to perform one or more identical image processing steps for an image, a set of images, a subtile, or a select region in one or more images. Each FPGA 120 may perform its own part of the image processing step(s) in parallel, reducing the time needed to process data. This may allow the image processing step(s) to be completed in real time. For example, a number of processing engines of a first FPGA may be configured to generate a polony map for a tile of the flow cell. Each processing engine may be responsible for generating a portion, e.g., a non-overlapping portion, of the polony map for a different subtile within the tile, e.g., in parallel. A second FPGA may be configured to perform intensity normalization in parallel with the generation of the polony map. As another example, a number of FPGAs and integrated circuits, e.g., AI chips, may be configured to perform one or more image processing steps for the flow cell images. Each FPGA 120 may perform its own part of the processing step(s) in parallel, reducing the time needed to process data, while each AI chip may perform polony or cluster prediction after receiving data from its corresponding FPGA. This may allow the image processing steps to be completed in real time. For example, a first and a second FPGA may each perform image registration, in parallel, for a different subtile or tile of the flow cell. A corresponding AI chip may then predict a high-resolution flow cell image of that subtile or tile once its FPGA has completed image registration. Further discussion of the use of FPGAs is provided below.
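The subtile-parallel polony-map generation described above can be sketched in software as follows. This is a minimal illustration under stated assumptions, not the disclosed FPGA design: worker threads stand in for processing engines, and the hypothetical `detect_polonies` helper (a thresholded 4-neighbour local-maximum test) stands in for whatever polony-detection logic the hardware actually implements.

```python
from concurrent.futures import ThreadPoolExecutor

import numpy as np


def split_into_subtiles(tile, rows, cols):
    """Yield (row_offset, col_offset, view) for a rows x cols grid of non-overlapping subtiles."""
    h, w = tile.shape
    row_edges = np.linspace(0, h, rows + 1, dtype=int)
    col_edges = np.linspace(0, w, cols + 1, dtype=int)
    for r0, r1 in zip(row_edges[:-1], row_edges[1:]):
        for c0, c1 in zip(col_edges[:-1], col_edges[1:]):
            yield int(r0), int(c0), tile[r0:r1, c0:c1]


def detect_polonies(subtile, threshold):
    """Hypothetical detector: pixels above `threshold` that are >= their 4 neighbours."""
    sub = subtile.astype(float)
    padded = np.pad(sub, 1, mode="constant", constant_values=-np.inf)
    center = padded[1:-1, 1:-1]
    is_peak = (
        (center >= padded[:-2, 1:-1]) & (center >= padded[2:, 1:-1])
        & (center >= padded[1:-1, :-2]) & (center >= padded[1:-1, 2:])
    )
    return is_peak & (sub > threshold)


def polony_map(tile, rows=2, cols=2, threshold=0.5, workers=4):
    """Boolean polony map for one tile; each worker handles one subtile."""
    out = np.zeros(tile.shape, dtype=bool)

    def work(job):
        r0, c0, sub = job
        return r0, c0, detect_polonies(sub, threshold)

    with ThreadPoolExecutor(max_workers=workers) as pool:
        for r0, c0, mask in pool.map(work, split_into_subtiles(tile, rows, cols)):
            # Each subtile maps to a disjoint slice of `out`.
            out[r0:r0 + mask.shape[0], c0:c0 + mask.shape[1]] = mask
    return out


tile = np.zeros((8, 8))
tile[1, 1] = 1.0  # polony in the top-left subtile
tile[6, 5] = 1.0  # polony in the bottom-right subtile
pm = polony_map(tile)
print(np.argwhere(pm).tolist())  # [[1, 1], [6, 5]]
```

Because each worker writes only to its own non-overlapping slice of the output map, no locking is needed; a polony straddling a subtile boundary could, however, be reported in both subtiles, which this sketch does not handle.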
[0083] The reconfigurable logic device and/or the integrated circuit may be configured to perform some or all of the operations or actions in the methods disclosed herein in real time. Performing the operations or actions in real time may allow the system 110 to use less memory and/or data storage, as the data may be processed as it is received. This is an improvement over conventional systems that may need to store the data before it can be processed and consequently require more memory/data storage or access to a computer system located in the cloud 130. Further, performing the operations or actions in real time may allow more efficient sequencing analysis, as the analysis is performed in parallel while a sequencing run is still in progress. Furthermore, performing the processing steps using the FPGAs and AI chips may reduce power consumption, e.g., by 2x, 5x, 10x, 20x, or more, thus producing less heat than performing the same processing steps using CPUs and/or GPUs. Further discussion of the use of FPGAs is provided below.
[0084] As discussed above, the sequencing system 110 may have dedicated processors 118, the reconfigurable logic device and/or integrated circuit 120, or the computing system 126. The sequencing system may use one, two, or all of these elements to accomplish one or more operations or actions in the methods disclosed herein. In some embodiments, when these hardware elements are present together, the image processing tasks are split among them. For example, the reconfigurable logic device 120 may be used to perform some or all of: the preprocessing operations, color correction, polony map generation, image registration, predicting high-resolution flow cell images, training a neural network, generating the training flow cell images, base calling, and any subsequent operations, while the computing system 126 may perform other processing functions for the sequencing system 110, such as intensity normalization and registering images for base calling with cell staining image(s). Those skilled in the art will understand that various combinations of these elements will allow various system embodiments that balance efficiency and speed of processing with cost of processing elements.
[0085] In some embodiments, one or more reconfigurable logic devices and/or integrated circuits 120 can accelerate base calling and/or any primary analysis steps for flow cell images acquired from 2D or 3D sample(s). In some embodiments, the reconfigurable logic devices and/or integrated circuits can accelerate primary analysis of 2D sample(s) or 3D volumetric sample(s) by 2x, 4x, 5x, 10x, 15x, 20x, 25x, 30x, 40x, 50x, 100x, 200x, 400x, 500x, 800x, 1000x, or more relative to traditional primary analysis methods using only CPUs and/or GPUs. In some embodiments, one or more reconfigurable logic devices and/or integrated circuits 120 herein can accelerate sequencing and sequencing analysis (including at least primary analysis) of the flow cell images acquired from 2D or 3D sample(s). In some embodiments, the reconfigurable logic devices and/or integrated circuits herein can accelerate sequencing and sequencing analysis (including at least primary analysis) of the flow cell images acquired from 2D or 3D sample(s) by 2x, 4x, 5x, 10x, 15x, 20x, 25x, 30x, 40x, 50x, 100x, 200x, 400x, 500x, 800x, 1000x, or more relative to traditional sequencing systems with only CPUs and/or GPUs. In some embodiments, making inferences or predictions of high-resolution images, of base calls, or of classifications using the neural network disclosed herein and the reconfigurable logic devices and/or integrated circuits can take less than 800 ms, 500 ms, 400 ms, 300 ms, 200 ms, 100 ms, 50 ms, 20 ms, or less per tile per cycle. The tile size can vary between flow cells. The tile size may be at least 0.001 mm2, 0.01 mm2, 0.05 mm2, 0.1 mm2, 0.5 mm2, 1 mm2, 2 mm2, 3 mm2, or more.
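To relate the per-tile, per-cycle latency above to the compute time of a whole run, the latency can be multiplied out over tiles, cycles, and z-levels. The sketch below is a back-of-the-envelope estimate under stated assumptions (one pass per tile per cycle per z-level, perfectly linear scaling across devices); the hypothetical `inference_time_hours` helper and the example inputs are illustrative and are not figures from this disclosure.

```python
def inference_time_hours(ms_per_tile_cycle, tiles, cycles, z_levels=1, devices=1):
    """Rough total inference time in hours.

    Assumes every tile is processed once per cycle at each z-level and that the work
    divides perfectly across `devices` (an idealization; real scaling is rarely linear).
    """
    total_ms = ms_per_tile_cycle * tiles * cycles * z_levels / devices
    return total_ms / 3_600_000  # 3.6 million ms per hour


# Illustrative inputs only: 100 ms per tile per cycle, 50 tiles, 15 cycles, 20 z-levels.
hours = inference_time_hours(100, tiles=50, cycles=15, z_levels=20)
print(f"{hours:.2f} h")  # 0.42 h
```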
[0086] In some embodiments, one or more reconfigurable logic devices and/or integrated circuits 120 can enable primary analysis (base calling) of polonies for flow cell images at multiple z-levels. For example, processing time using reconfigurable logic devices can be less than 400 hours for at least 50 flow cell images (e.g., covering 50 tiles and from two or more color channels) with a FOV of at least 1 mm2 and a resolution of 1 µm or better in three dimensions for one or more flow cycles, e.g., 1-15 cycles. The flow cell images can be from multiple z-levels to cover some or all of the volumetric 3D samples (e.g., completely covering at least two samples).
[0087] In some embodiments, one or more reconfigurable logic devices and/or integrated circuits 120 can be used to accelerate primary analysis of 3D samples that involves training neural network(s) and using the trained neural networks to make predictions or inferences. For example, neural network(s) can be used to predict polony locations and/or predict cell boundaries, thereby identifying polonies within the cell(s). Using the reconfigurable logic device and/or integrated circuits 120 for computations associated with neural networks can reduce the training and/or prediction time needed in comparison with using GPUs or other computer processors, thereby accelerating sequence analysis and enabling sequence analysis of earlier flow cycles while subsequent flow cycles of the sequencing run are pending or in progress. In some embodiments, the reconfigurable logic device(s) and/or integrated circuits 120 can accelerate training and/or prediction by 10x, 20x, 50x, 80x, 100x, 200x, 500x, 600x, 800x, 1000x, or more relative to training and/or prediction using CPUs and/or GPUs. In some embodiments, the reconfigurable logic devices and/or integrated circuits 120 can be used to achieve optimal acceleration in sequencing analysis. For example, one or more FPGA chips can be used in combination with an integrated circuit specific for computations corresponding to artificial intelligence (AI) algorithms, e.g., an NPU. The integrated circuit(s) can be specific circuits for AI functions. The integrated circuit(s) can include application-specific integrated circuits (ASICs). Computational tasks can be distributed between the FPGA(s) and the integrated circuit(s) to optimize computational time, energy consumption, heat dissipation, etc.
For example, the AI chip may be used only for computations involving a neural network (e.g., predicting polony locations, predicting high-resolution flow cell images, or training the neural network), and the FPGA(s) may be used for the rest of the primary analysis steps. The primary analysis time using dual FPGA chips or a single FPGA chip in connection with the AI chip(s) can be less than 400, 300, 200, 100, 50, or 20 hours for at least 50 flow cell images (e.g., covering about 50 tiles of the flow cell and from two or more color channels) with a FOV of at least 1 mm2 and a resolution of 1 µm or better for each flow cell image in three dimensions for one or more flow cycles, e.g., 1-15 cycles. The flow cell images can be from multiple z-levels to cover some or all of the volumetric 3D samples (e.g., 10 to 20 z-locations to completely cover at least two samples). The primary analysis time may include the total time of image processing, from obtaining raw flow cell images acquired using the imager 116 to generating base calls and saving the base call results. The 3D samples herein include polonies or clusters that are centered at different z-levels spaced apart from each other by at least 0.01 µm, 0.05 µm, 0.1 µm, 0.2 µm, 0.5 µm, 1 µm, or more along the z (axial) direction.
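The division of labor in the example above, with neural-network computations on the AI chip and the remaining primary-analysis steps on the FPGA(s), amounts to a simple routing rule. The following sketch illustrates that rule; the task names and the `Device` enum are hypothetical placeholders, and a real scheduler would also have to account for data movement and pipelining between the chips.

```python
from enum import Enum


class Device(Enum):
    FPGA = "fpga"
    AI_CHIP = "ai_chip"


# Hypothetical task names; the neural-network tasks are the ones routed to the AI chip.
NEURAL_NETWORK_TASKS = {
    "predict_polony_locations",
    "predict_high_res_images",
    "train_neural_network",
}


def route(task: str) -> Device:
    """Send neural-network work to the AI chip and everything else to the FPGA(s)."""
    return Device.AI_CHIP if task in NEURAL_NETWORK_TASKS else Device.FPGA


def plan(tasks: list[str]) -> dict[Device, list[str]]:
    """Group an ordered task list by device, preserving order within each device."""
    schedule: dict[Device, list[str]] = {device: [] for device in Device}
    for task in tasks:
        schedule[route(task)].append(task)
    return schedule


primary_analysis = [
    "color_correction",
    "image_registration",
    "predict_high_res_images",
    "predict_polony_locations",
    "base_calling",
]
schedule = plan(primary_analysis)
print(schedule[Device.AI_CHIP])  # ['predict_high_res_images', 'predict_polony_locations']
print(schedule[Device.FPGA])     # ['color_correction', 'image_registration', 'base_calling']
```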
[0088] The cloud 130 may be a network, remote storage, or some other remote computing system separate from the sequencing system 110. The connection to cloud 130 may allow access to data stored externally to the sequencing system 110 or allow for updating of software in the sequencing system 110.
Reconfigurable logic devices and integrated circuits
[0089] FIG. 5C shows an exemplary embodiment of the reconfigurable logic device and the integrated circuit(s) of the sequencing system disclosed herein. In some embodiments, the sequencing system 110 may include one or more reconfigurable logic devices 120_a. In the embodiment shown in FIG. 5C, the sequencing system comprises a single reconfigurable logic device, i.e., a first reconfigurable logic device 120_a. In some embodiments, the sequencing system comprises multiple reconfigurable logic devices (not shown). The reconfigurable logic device may comprise data processing engines 5011 configured to perform data processing in parallel. Each data processing engine may include a combination of digital logic circuits to perform its function, e.g., intensity extraction, convolution, registration, etc. The sequencing system 110 may further include reconfigurable routing channels 5013 that may function as connections among the data processing engines 5011 and may also connect the data processing engines to other structural elements of the sequencing system 110, e.g., the first processor and the memory device. In some embodiments, a neural network may be deployed at least partly on the reconfigurable logic device 120_a so that the reconfigurable logic device can be used for at least some computational tasks for generating inferences using the neural network. The neural network may be pretrained using various training methods and data, for example, using the training methods and training data disclosed herein. The sequencing system may further include a first processor 120_c to selectively activate or deactivate different combinations of the data processing engines 5011 and the reconfigurable routing channels 5013. The FPGA(s) 120 of the sequencing system 110 shown in FIG. 1 may include one or more of the reconfigurable logic device 120_a, the integrated circuit 120_b, and the processor 120_c.
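One way to picture the first processor 120_c selecting combinations of data processing engines 5011 and routing channels 5013 is as choosing which engines are enabled and how their outputs are chained. The toy software model below is only an analogy under stated assumptions: the `ReconfigurableFabric` class and its engine names are hypothetical, engines are plain Python functions, and an active route is modeled as a directed link between consecutive engines.

```python
from typing import Callable, Dict, List, Set, Tuple

import numpy as np


class ReconfigurableFabric:
    """Toy model: engines are functions; routing channels are directed (src, dst) links."""

    def __init__(self, engines: Dict[str, Callable]):
        self.engines = engines
        self.chain: List[str] = []
        self.routes: Set[Tuple[str, str]] = set()

    def activate(self, chain: List[str]) -> None:
        """Enable only the engines in `chain` and the routes linking them in order."""
        unknown = set(chain) - set(self.engines)
        if unknown:
            raise KeyError(f"unknown engines: {sorted(unknown)}")
        self.chain = list(chain)
        self.routes = set(zip(chain[:-1], chain[1:]))

    def run(self, data):
        """Stream data through the active engines along the active routes."""
        for name in self.chain:
            data = self.engines[name](data)
        return data


fabric = ReconfigurableFabric({
    "normalize": lambda img: img / img.max(),
    "threshold": lambda img: (img > 0.5).astype(int),
    "count": lambda mask: int(mask.sum()),
})
image = np.array([[2.0, 8.0], [6.0, 1.0]])

fabric.activate(["normalize", "threshold", "count"])
print(fabric.run(image))  # 2

fabric.activate(["threshold", "count"])  # reconfigure: skip normalization
print(fabric.run(image))  # 4
```

The same engines produce different results depending on which combination is active, which is the point of the selective activation described above; in hardware the reconfiguration acts on logic and interconnect rather than on Python callables.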
[0090] In some embodiments, the FPGA(s) 120 may include only the reconfigurable logic device 120_a and the processor 120_c, but not the integrated circuit(s) 120_b. The different combinations of the data processing engines 5011 and the reconfigurable routing channels 5013 may be configured to perform operation(s) in sequencing analysis to facilitate generating the sequencing analysis result(s). The sequencing analysis may include operations or steps of primary analysis. Such operation(s) may include one or more of: (a) obtaining sensor data from one or more sensors (in the imager 116) of the sequencing system; (b) processing the sensor data to generate a first plurality of flow cell images; (c) predicting a second plurality of flow cell images using the neural network based on the sensor data or the first plurality of flow cell images; (d) determining polonies from the second plurality of flow cell images; and (e) performing a corresponding base calling for each of the determined polonies based on the second plurality of flow cell images.
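Operations (a) through (e) can be chained as in the sketch below. It is a minimal illustration under stated assumptions, not the disclosed implementation: the `raw` array stands in for sensor data, preprocessing is reduced to a hypothetical dark-level subtraction, nearest-neighbour upsampling stands in for the trained neural network, polonies are found by greedy non-maximum suppression on summed intensity, and each base is called from the brightest of four color channels, one per base (a four-channel encoding assumed here for simplicity; the system may use other numbers of color channels).

```python
import numpy as np

BASES = np.array(list("ACGT"))  # assumption: one color channel per base


def sensor_to_images(raw, dark_level=0.0):
    """(b) Hypothetical preprocessing: subtract a dark level and clip at zero."""
    return np.clip(raw.astype(float) - dark_level, 0.0, None)


def predict_high_res(images, factor=2):
    """(c) Stand-in for the neural network: nearest-neighbour upsampling in y and x."""
    return images.repeat(factor, axis=-2).repeat(factor, axis=-1)


def find_polonies(images, threshold, min_dist=2):
    """(d) Greedy non-maximum suppression on the channel-summed intensity."""
    total = images.sum(axis=0)
    candidates = np.argwhere(total > threshold)
    order = np.argsort(-total[tuple(candidates.T)], kind="stable")  # brightest first
    kept = []
    for r, c in candidates[order]:
        # Keep a candidate only if it is >= min_dist pixels (Chebyshev) from every kept polony.
        if all(max(abs(r - kr), abs(c - kc)) >= min_dist for kr, kc in kept):
            kept.append((int(r), int(c)))
    return kept


def call_bases(images, polonies):
    """(e) Call the base whose channel is brightest at each polony location."""
    return [str(BASES[images[:, r, c].argmax()]) for r, c in polonies]


def primary_analysis(raw, threshold=1.0):
    images = sensor_to_images(raw)                  # (a)-(b)
    hi_res = predict_high_res(images)               # (c)
    polonies = find_polonies(hi_res, threshold)     # (d)
    return polonies, call_bases(hi_res, polonies)   # (e)


# (a) Toy sensor data: 4 channels x 4 x 4 pixels with two polonies.
raw = np.zeros((4, 4, 4))
raw[2, 0, 1] = 10.0  # bright in the G channel
raw[0, 3, 2] = 8.0   # bright in the A channel
polonies, calls = primary_analysis(raw)
print(polonies, calls)  # [(0, 2), (6, 4)] ['G', 'A']
```

The non-maximum suppression step matters here because the stand-in upsampler turns each bright pixel into a 2 x 2 plateau; without it, one polony would be reported four times.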
[0091] In some embodiments, the sensor data includes raw data that has been acquired from the sensor(s) of the imager without any additional image processing. In some embodiments, the sensor data includes raw flow cell images that have not been processed by the computing system 126, the dedicated processors 118, and/or the reconfigurable logic device and integrated circuit(s) 120 of the sequencing system 110.
[0092] In some embodiments, the sequencing system comprises: a first reconfigurable logic device 120_a comprising a first plurality of data processing engines 5011 configured to perform data processing in parallel; first reconfigurable routing channels 5013 connecting at least some of the first plurality of data processing engines 5011; a neural network deployed at least partly on the first reconfigurable logic device 120_a; and a first processor 120_c that selectively activates or deactivates different combinations of the first plurality of data processing engines 5011 and the first reconfigurable routing channels 5013 to perform operation(s) in sequencing analysis to facilitate generating the sequencing analysis result(s). The sequencing analysis may include operations or steps of primary analysis. Such operation(s) may include one or more of: (a) obtaining sensor data directly from one or more sensors of the sequencing system; (b) processing the sensor data to generate a first plurality of flow cell images; (c) performing a first convolution in one or more dimensions on the first plurality of flow cell images, thereby generating a first convolution result; (d) repetitively performing, one or more times, down-sampling operations comprising: (1) performing a second convolution in one or more dimensions on the first convolution result, thereby generating a second convolution result; and (2) performing a down-sampling of the second convolution result by a down-sampling factor, thereby generating a first down-sampled result, wherein in each repetition the second convolution comprises a corresponding number of filters, thereby generating a third convolution result after (d); (e) performing the second convolution in one or more dimensions on the third convolution result, thereby generating a fourth convolution result; (f) repetitively performing up-sampling operations comprising: (3) performing an up-sampling of the fourth convolution result by an up-sampling factor, thereby generating a first up-sampled result; and (4) performing the second convolution in one or more dimensions on the first up-sampled result, thereby generating a fifth convolution result, wherein in each repetition the second convolution comprises a corresponding number of filters, thereby generating a sixth convolution result after (f); (g) performing the first convolution in one or more dimensions on the sixth convolution result, thereby generating a seventh convolution result; (h) predicting a second plurality of flow cell images based on the seventh convolution result, wherein each of the second plurality of flow cell images corresponds to the corresponding flow cell image of the first plurality of flow cell images, which has a first resolution, and has a second resolution that is at least 2, 4, 6, 8, 10, 12, 16, or 32 times greater than the first resolution in one or more spatial dimensions; (i) determining polonies in the second plurality of flow cell images; (j) performing a corresponding base calling for each of the determined polonies based on the second plurality of flow cell images; and (k) optionally forwarding the second plurality of flow cell images, the corresponding base callings, or both to the first reconfigurable logic device, the first processor, or one or more hardware processors of the sequencing system.
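The convolution, down-sampling, and up-sampling sequence in steps (c) through (h) has the shape of an encoder-decoder network. The NumPy sketch below traces that data flow under stated assumptions: "first" and "second" convolution are read as convolution stages that each get their own 3x3 filters, down-sampling is 2x2 max pooling, up-sampling is nearest-neighbour, filter counts double on the way down and halve on the way up, and performing one more up-sampling repetition than down-sampling repetitions (n_up = n_down + 1) is assumed as the way the 2x resolution gain is reached. The weights are random and untrained, so the output is meaningful only as a check of shapes and data flow; `encoder_decoder_super_res` and its parameters are hypothetical names.

```python
import numpy as np

rng = np.random.default_rng(0)


def conv2d(x, w):
    """'Same'-padded 2D convolution (cross-correlation) followed by ReLU.

    x: (C_in, H, W) feature map; w: (C_out, C_in, k, k) filters with odd k.
    """
    c_out, _, k, _ = w.shape
    p = k // 2
    h, wd = x.shape[1:]
    xp = np.pad(x, ((0, 0), (p, p), (p, p)))
    out = np.zeros((c_out, h, wd))
    for i in range(k):
        for j in range(k):
            # Contract over input channels for this kernel offset.
            out += np.einsum("oc,chw->ohw", w[:, :, i, j], xp[:, i:i + h, j:j + wd])
    return np.maximum(out, 0.0)


def downsample(x, f):
    """Max-pool by factor f (H and W must be divisible by f)."""
    c, h, w = x.shape
    return x.reshape(c, h // f, f, w // f, f).max(axis=(2, 4))


def upsample(x, f):
    """Nearest-neighbour up-sampling by factor f."""
    return x.repeat(f, axis=1).repeat(f, axis=2)


def filters(c_out, c_in, k=3):
    """Random, untrained filters."""
    return rng.normal(0.0, 0.1, size=(c_out, c_in, k, k))


def encoder_decoder_super_res(x, n_down=2, n_up=3, base=8, factor=2):
    """Steps (c)-(h); output size is input size times factor ** (n_up - n_down)."""
    c_in = x.shape[0]
    y = conv2d(x, filters(base, c_in))            # (c) first convolution
    ch = base
    for _ in range(n_down):                       # (d) second convolution, then down-sample
        y = conv2d(y, filters(2 * ch, ch))
        ch *= 2
        y = downsample(y, factor)
    y = conv2d(y, filters(ch, ch))                # (e) second convolution at the bottleneck
    for _ in range(n_up):                         # (f) up-sample, then second convolution
        y = upsample(y, factor)
        new_ch = max(ch // 2, base)
        y = conv2d(y, filters(new_ch, ch))
        ch = new_ch
    return conv2d(y, filters(c_in, ch))           # (g) first convolution -> (h) predicted images


flow_cell_images = rng.random((2, 16, 16))  # e.g., two color channels of one 16 x 16 region
predicted = encoder_decoder_super_res(flow_cell_images)
print(predicted.shape)  # (2, 32, 32)
```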
[0093] In some embodiments, the sensor data may be obtained from one or more sensors (in the imager 116) of the sequencing system via a direct connection. In some embodiments, the direct connection between the first reconfigurable logic device (120 and 120_a) and the sensor(s) lacks other hardware components that may process or store the sensor data and thereby introduce undesired complexity, delay, and possible errors in sensor data communication. Such hardware components include the first processor 120_c, the memory device 5030, or any processors of the sequencing system, e.g., the computing system 126 (e.g., a CPU). Compared with traditional sequencing systems, in which sensor data is communicated to other hardware components before it reaches the component that processes it (e.g., communicated to a CPU and then to a GPU for processing), the direct sensor data communication herein advantageously improves data transmission efficiency from the sensor to the FPGAs 120, frees up the other hardware, e.g., CPUs and storage devices, for other data processing functions, decreases the power consumed by indirect data communication, and reduces the time spent on data communication and thus on sequencing analysis.
[0094] In some embodiments, the connection between the first reconfigurable logic device (120 and 120_a) and the sensor(s) may include other hardware components that may process or store the sensor data. Such hardware components may include the first processor 120_c, the memory device 5030, or any processors of the sequencing system, e.g., CPUs 126. For example, the sensor data may be saved into the memory device 5030 and then accessed by the first reconfigurable logic device using memory controller(s) 5013.
[0095] The reconfigurable logic device may include digital logic circuits, in the sense that it is also an integrated circuit. However, the integrated circuit herein (e.g., the AI chip, NPU, etc.) may differ from the reconfigurable logic device in various ways; e.g., the integrated circuit may not be as flexible in reconfiguration as the reconfigurable logic device. For example, the integrated circuit herein, e.g., the AI chip, NPU, etc., may not be reconfigurable.
[0096] In some embodiments, the sequencing system 110 comprises at least one reconfigurable logic device but lacks any integrated circuits, e.g., AI chips, ASIC chips, or NPUs. The reconfigurable logic device may perform one or more operations in sequencing analysis and may forward its output back to the CPU as end results of primary analysis, e.g., base calls. Alternatively, the reconfigurable logic device may forward its output back to the CPU so that the CPU may perform subsequent operations based on that output to generate the end results of sequencing analysis.
[0097] In some embodiments, the sequencing system 110 comprises at least one reconfigurable logic device, and at least one integrated circuit as shown in FIG. 5C. The integrated circuit may perform one or more operations in sequencing analysis and may forward its output back to the reconfigurable logic device so that subsequent operations may be performed based on its output at the reconfigurable logic device.
[0098] In some embodiments, the output of the reconfigurable logic device or the integrated circuit comprises base calls of nucleotide bases in a sample immobilized on a support. In some embodiments, the output data of the reconfigurable logic device or the integrated circuit comprises identification of base calling locations in two dimensions. In some embodiments, the output data of the reconfigurable logic device or the integrated circuit comprises identification of base calling locations in three dimensions.
[0099] In some embodiments, the data communication between any two of the reconfigurable logic device, the integrated circuits, the first processor, and the second processor may be direct, such that the direct communication lacks any other hardware components that may process or store the data. Such other hardware components may include memory device(s) and/or other processor(s) of the sequencing system. Such direct communication may include DMA connections. In some embodiments, the data communication between any two of the reconfigurable logic device, the integrated circuits, the first processor, and the second processor may be direct such that the data is not utilized by other logic circuits, or stored other than in a memory device, before reaching its communication destination.
[0100] In some embodiments, the sequencing system 110 may include a first reconfigurable logic device 120_a, e.g., an FPGA, comprising a first plurality of data processing engines 5011 configured to perform data processing in parallel; an integrated circuit 120_b, e.g., an AI chip; a neural network deployed at least partly on the integrated circuit; and a first processor to selectively activate or deactivate different combinations of the first plurality of data processing engines, alone or in combination with the first routing channels, to perform operation(s) in sequencing analysis to facilitate generating the sequencing analysis result(s). The sequencing analysis may include operations or steps of primary analysis. The sequencing analysis may include operations or steps of secondary analysis. Such operation(s) may include one or more of: obtaining sensor data from one or more image sensors of the sequencing system; processing the sensor data to generate a first plurality of flow cell images; and communicating the sensor data, the first plurality of flow cell images, or both to the integrated circuit. The sequencing system may include a second processor, or use the first processor, to control the integrated circuit to perform one or more operations in sequencing analysis to facilitate generating the sequencing analysis result(s). The sequencing analysis may include operations or steps of primary analysis and/or secondary analysis.
Such operation(s) may include one or more of: receiving the sensor data, the first plurality of flow cell images, or both from the first reconfigurable logic device; predicting a second plurality of flow cell images using the neural network based on the sensor data, the first plurality of flow cell images, or both; determining polonies from the second plurality of flow cell images; performing a corresponding base calling for each of the determined polonies based on the second plurality of flow cell images; and forwarding the second plurality of flow cell images, the determined polonies, and/or the corresponding base callings of polonies in the second plurality of flow cell images to one or more of: the first reconfigurable logic device 120_a, the first processor 120_c or the second processor, and/or one or more processors of the computing system 126 of the sequencing system. In some embodiments, the operation of forwarding the second plurality of flow cell images, the determined polonies, and/or the corresponding base callings comprises forwarding them to a memory device herein, e.g., DDR memory, so that one or more of the first reconfigurable logic device 120_a, the first processor 120_c or the second processor, and/or one or more processors of the computing system 126 can access the data from the memory. Accessing data in the memory, including reading, writing, editing, etc., may be assisted by the memory controllers disclosed herein.
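The hand-off described in this paragraph, in which the FPGA streams flow cell images to the integrated circuit and the integrated circuit writes its predictions to shared memory, follows a producer-consumer pattern. The sketch below models it with Python threads under stated assumptions: a bounded `queue.Queue` stands in for the FPGA-to-AI-chip link, a dictionary stands in for the shared DDR region, nearest-neighbour upsampling stands in for the neural network, and all function names are hypothetical.

```python
import queue
import threading

import numpy as np

END_OF_STREAM = None


def fpga_stage(raw_frames, to_ai_chip):
    """Hypothetical FPGA role: convert sensor frames to flow cell images and stream them on."""
    for idx, raw in enumerate(raw_frames):
        to_ai_chip.put((idx, raw.astype(float)))
    to_ai_chip.put(END_OF_STREAM)


def ai_chip_stage(from_fpga, shared_memory, factor=2):
    """Hypothetical AI-chip role: predict higher-resolution images and write them to memory."""
    while (item := from_fpga.get()) is not END_OF_STREAM:
        idx, image = item
        # Stand-in for neural-network inference: nearest-neighbour upsampling.
        shared_memory[idx] = image.repeat(factor, axis=0).repeat(factor, axis=1)


def run_pipeline(raw_frames):
    link = queue.Queue(maxsize=4)  # bounded, so the FPGA stage blocks if the AI chip falls behind
    ddr = {}                       # frame index -> predicted image; read only after both stages finish
    stages = [
        threading.Thread(target=fpga_stage, args=(raw_frames, link)),
        threading.Thread(target=ai_chip_stage, args=(link, ddr)),
    ]
    for stage in stages:
        stage.start()
    for stage in stages:
        stage.join()
    return ddr


frames = [np.full((3, 3), i, dtype=np.uint16) for i in range(5)]
ddr = run_pipeline(frames)
print(sorted(ddr), ddr[4].shape)  # [0, 1, 2, 3, 4] (6, 6)
```

Because both stages run concurrently, the FPGA can prepare the next image while the AI chip is still processing the previous one, which is the overlap that makes this division of labor attractive.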
[0101] In some embodiments, the sequencing system 110 comprises at least one reconfigurable logic device, and at least one integrated circuit as shown in FIG. 5C. The integrated circuit may perform one or more operations in sequencing analysis and may generate its output as the end results of primary analysis and forward its output to one or more devices including: the reconfigurable logic device, the first or second processor, the hardware processor of the sequencing system, etc., so that the end results can be saved or presented to a user. In some embodiments, the output of the reconfigurable logic device or the integrated circuit comprises base calls of nucleotide bases in a sample immobilized on a support. In some embodiments, the output data of the reconfigurable logic device or the integrated circuit comprises identification of base calling locations in two dimensions. In some embodiments, the output data of the reconfigurable logic device or the integrated circuit comprises identification of base calling locations in three dimensions.
[0102] In some embodiments, the integrated circuit may perform one or more operations in sequencing analysis and generate its output as intermediate results of primary analysis, e.g., location of polonies, and may forward its output back to one or more of: the reconfigurable logic device, the first or second processor, the hardware processor of the sequencing system, etc., so that the end results can be determined based on its output.
[0103] In some embodiments, the integrated circuit may forward its output, either intermediate or end results, to be stored in a memory device, so that one or more devices including: the reconfigurable logic device, the first or second processor, and the hardware processor of the sequencing system can access the stored output whenever needed. The access to the output stored in a memory device can be via a memory controller of the sequencing system, e.g., 5013.
[0104] In some embodiments, the output of the reconfigurable logic device or the integrated circuit comprises base calls of nucleotide bases in a sample immobilized on a support. In some embodiments, the output data of the reconfigurable logic device or the integrated circuit comprises identification of base calling locations in two dimensions. In some embodiments, the output data of the reconfigurable logic device or the integrated circuit comprises identification of base calling locations in three dimensions.
[0105] In some embodiments, the sequencing system comprises: a first reconfigurable logic device comprising a first plurality of data processing engines configured to perform data processing in parallel with each other; an integrated circuit; a neural network deployed at least partly on the integrated circuit; and a first processor to selectively activate or deactivate different combinations of the first plurality of data processing engines. The different combinations of the first plurality of data processing engines may be configured to perform operations comprising: obtaining sensor data from one or more image sensors of the sequencing system to generate a first plurality of flow cell images; and communicating the sensor data, the first plurality of flow cell images, or both to the integrated circuit. The integrated circuit may perform operations comprising: (1) receiving the sensor data, the first plurality of flow cell images, or both from the first reconfigurable logic device; (2) predicting a second plurality of flow cell images using the neural network based on the sensor data, the first plurality of flow cell images, or both; and (3) communicating the second plurality of flow cell images to the first reconfigurable logic device or one or more hardware processors of the sequencing system.
[0106] In some embodiments, the sequencing system comprises: a first reconfigurable logic device comprising a first plurality of data processing engines arranged in a first pipeline and configured to perform data processing in parallel with each other; an integrated circuit; a neural network deployed at least partly on the integrated circuit; and a first processor of the first reconfigurable logic device to selectively activate or deactivate different combinations of the first plurality of data processing engines to perform operations comprising: (a) obtaining sensor data from one or more sensors of the sequencing system; (b) processing the sensor data to generate a first plurality of flow cell images; and (c) communicating the sensor data, the first plurality of flow cell images, or both to the integrated circuit; wherein the integrated circuit performs operations comprising: (d) receiving the sensor data, the first plurality of flow cell images, or both from the first reconfigurable logic device; (e) performing a first convolution in one or more dimensions on the first plurality of flow cell images, thereby generating a first convolution result; (f) repetitively performing, one or more times, down-sampling operations comprising: (1) performing a second convolution in one or more dimensions on the first convolution result, thereby generating a second convolution result; and (2) performing a down-sampling of the second convolution result by a down-sampling factor, thereby generating a first down-sampled result, wherein in each repetition the second convolution comprises a corresponding number of filters, thereby generating a third convolution result; (g) performing the second convolution in one or more dimensions on the third convolution result, thereby generating a fourth convolution result; (h) repetitively performing up-sampling operations comprising: (3) performing an up-sampling of the fourth convolution result by an up-sampling factor, thereby generating a first up-sampled result; and (4) performing the second convolution in one or more dimensions on the first up-sampled result, thereby generating a fifth convolution result, wherein in each repetition the second convolution comprises a corresponding number of filters, thereby generating a sixth convolution result; (i) performing the first convolution in one or more dimensions on the sixth convolution result, thereby generating a seventh convolution result; and (j) predicting a second plurality of flow cell images based on the seventh convolution result, wherein each of the second plurality of flow cell images corresponds to the corresponding flow cell image of the first plurality of flow cell images, which has a first resolution, and has a second resolution that is at least 2, 4, 6, 8, 10, 12, or 16 times greater than the first resolution in one or more spatial dimensions.
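Because the output resolution in step (j) depends on how many down-sampling and up-sampling repetitions are performed, it can help to trace feature-map shapes before building the network. The helper below is a hypothetical sketch assuming a uniform sampling factor, size-preserving ("same"-padded) convolutions, and one filter count per repetition; under those assumptions the linear resolution gain is factor ** (n_up - n_down).

```python
def trace_shapes(h, w, filters_down, filters_up, factor=2):
    """Return (stage, channels, H, W) after each repetition of steps (f) and (h).

    Assumes 'same'-padded convolutions (H and W unchanged by convolution) and a
    uniform sampling factor; filters_down/filters_up give each repetition's filter count.
    """
    if h % factor ** len(filters_down) or w % factor ** len(filters_down):
        raise ValueError("H and W must be divisible by factor ** len(filters_down)")
    stages = []
    for n in filters_down:  # (f): convolution with n filters, then down-sampling
        h, w = h // factor, w // factor
        stages.append(("down", n, h, w))
    for n in filters_up:    # (h): up-sampling, then convolution with n filters
        h, w = h * factor, w * factor
        stages.append(("up", n, h, w))
    return stages


def resolution_gain(filters_down, filters_up, factor=2):
    """Linear resolution gain of the output relative to the input."""
    return factor ** (len(filters_up) - len(filters_down))


for stage in trace_shapes(64, 64, filters_down=[16, 32], filters_up=[32, 16, 8]):
    print(stage)
print("gain:", resolution_gain([16, 32], [32, 16, 8]))  # gain: 2
```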
[0107] In some embodiments, the first reconfigurable routing channels comprise one or more electronic nodes, and the electronic nodes are programmable. The electronic nodes here may include junction points in the circuit(s). The electronic nodes may include points where two or more circuit elements are connected together. In some embodiments, the first reconfigurable routing channels comprise one or more interconnects. An interconnect may include the physical wiring that connects transistors and other components on an integrated circuit. In some embodiments, the first reconfigurable routing channels comprise one or more memory controllers, e.g., 5013 in FIG. 5C. In some embodiments, the first reconfigurable routing channels comprise one or more networks-on-chip (NoCs), e.g., 5013 in FIG. 5C. In some embodiments, the first reconfigurable routing channels may comprise one or more of: a network-on-chip (NoC) and a memory controller.
[0108] The first reconfigurable routing channels may be configured to passively communicate data between components of the sequencing system. For example, the reconfigurable routing channels may be configured to communicate data bidirectionally between the data processing engines, e.g., 5011 in FIG. 5C, and the memory device, e.g., 5030 in FIG. 5C. The first reconfigurable routing channels may be configured to allow data communication between the first reconfigurable logic device, e.g., 120_a, and one or more memory devices, e.g., 5030. The first reconfigurable routing channels may be configured to allow data communication between the first reconfigurable logic device, e.g., 120_a, and the integrated circuit, e.g., 120_b.
[0109] Each reconfigurable logic device herein may comprise one or more data processing engines, e.g., 5011. Each data processing engine may comprise multiple digital logic circuits.
[0110] The first reconfigurable logic device may be configured to communicate data with one or more memory devices external thereto, e.g., via the first reconfigurable routing channels. The first reconfigurable logic device may comprise digital circuits that are integrated to form an FPGA device. For example, the FPGA device in FIG. 5C includes the first reconfigurable logic device, the DMA connections, and the first reconfigurable routing channels (e.g., NoC and memory controllers).
[0111] The sequencing system may further comprise one or more memory devices electrically connected for data communication with one or more components of the sequencing system, which may include one or more of: the first reconfigurable logic device; the integrated circuit; the first reconfigurable routing channels; the one or more memory controllers; the first processor; a second processor; and one or more processors of the sequencing system.
[0112] In some embodiments, the sequencing system further comprises one or more direct memory access (DMA) connections, e.g., 5012 in FIG. 5C, that are in data communication with the plurality of data processing engines and the first reconfigurable routing channels, e.g., 5013 in FIG. 5C. The DMA connections may be configured to actively communicate data between components of the sequencing system. For example, the DMA connections may be configured to fetch data from, or send data to, components connected thereto, such as the data processing engines, e.g., 5011 in FIG. 5C, and the reconfigurable routing channels, e.g., 5013 in FIG. 5C. The DMA connections herein may be configured to actively request data from, or actively send data directly to: the first reconfigurable logic device; the first reconfigurable routing channels; the integrated circuit; or a combination thereof. One or more DMA connections may be in data communication with the first reconfigurable routing channels and the integrated circuit herein. The DMA connections may be configured to allow data communication based on a predetermined protocol, e.g., a PCIe protocol.
[0113] In some embodiments, the first reconfigurable routing channels are configured to allow data communication between the first reconfigurable logic device and one or more memory devices. In some embodiments, the one or more DMA connections and the first reconfigurable routing channels are configured to allow data communication between the first reconfigurable logic device and the integrated circuit.
[0114] In some embodiments, the sequencing system further comprises an integrated circuit that is different from the first reconfigurable logic device, e.g., 120_b in FIG. 5C. The integrated circuit herein may not be reconfigurable. The integrated circuit may comprise an application-specific integrated circuit (ASIC) chip. In some embodiments, the integrated circuit comprises a neural processing unit (NPU) or an artificial intelligence (AI) chip. The integrated circuit may comprise a second plurality of data processing engines, each data processing engine comprising multiple digital logic circuits. The integrated circuit may further comprise second routing channels, each connecting at least some of the second plurality of data processing engines.
[0115] In some embodiments, the sequencing system further comprises a first processor. The first processor may be configured to selectively activate or deactivate different combinations of the first plurality of data processing engines and the first reconfigurable routing channels to perform the operations disclosed herein. In some embodiments, the sequencing system further comprises a second processor. The second processor may be configured to control digital circuits of the integrated circuit herein.
[0116] In some embodiments, the first processor, or a second processor, e.g., of the integrated circuit, is configured to selectively activate or deactivate different combinations of the second plurality of data processing engines and the second reconfigurable routing channels to perform the operations herein.
[0117] The sequencing system may further comprise a housing that encloses the first reconfigurable logic device, the first reconfigurable routing channels, the one or more DMA connections, the integrated circuit, and the first processor therein. In some embodiments, the sequencing system further comprises a housing that encloses at least the first reconfigurable logic device therein, wherein the integrated circuit is external to the housing.
[0118] In some embodiments, the sequencing system further comprises a power source configured to supply different power levels to the first reconfigurable logic device and the integrated circuit. A first power level supplied by the power source to the first reconfigurable logic device may be higher than a second power level supplied to the integrated circuit while a sequencing run and/or sequencing analysis is in progress. The maximum power output of the power source of the sequencing system is 2x, 3x, 5x, 8x, 10x, or 20x lower than the maximum power output of the power source of other sequencers, e.g., traditional sequencers without the first reconfigurable logic device (e.g., an FPGA), the integrated circuit (e.g., an AI chip), or both. The time consumed in performing a sequencing run and the corresponding sequencing analysis (e.g., primary analysis) using the sequencing system is 2x, 3x, 5x, 8x, 10x, or 20x lower than the time consumed in performing the same sequencing run and analysis using a sequencer without the first reconfigurable logic device, the integrated circuit, or both (e.g., a traditional sequencer without FPGA and/or AI chips). In some embodiments, a maximum power output of the power source to the sequencing system in performing a sequencing run and corresponding sequencing analysis thereof is less than 900 Watts, 800 Watts, 700 Watts, 650 Watts, 600 Watts, 550 Watts, or 500 Watts. The power source may be configured to supply a first power level to the first reconfigurable logic device, wherein the first power level is less than 500 Watts, 400 Watts, 350 Watts, or 300 Watts. The power source may be configured to supply a second power level to the integrated circuit, wherein the second power level is less than 450 Watts, 400 Watts, 350 Watts, or 300 Watts.
[0119] In some embodiments, one or more components of the first reconfigurable logic device and/or the integrated circuit may have a computational performance of at least 2, 4, 8, 10, 16, 20, 30, 40, 50, 60, 70, 80, or 100 giga-operations per second (GOPs), or more. In some embodiments, one or more processing engines of the first reconfigurable logic device and/or the integrated circuit may have a computational performance of at least 1, 2, 4, 8, 10, 16, 20, 30, 40, 50, 60, 70, 80, or 100 GOPs, or more. In some embodiments, the first reconfigurable logic device and/or the integrated circuit has a computational performance of at least 10, 20, 40, 50, 60, 80, or 100 tera-operations per second (TOPs).
[0120] In some embodiments, one or more components are located on a first printed circuit board (PCB). The one or more components may include: the first reconfigurable logic device; the first reconfigurable routing channels; the first processor; and the one or more DMA connections. In some embodiments, the integrated circuit is located on a second PCB different from the first PCB, e.g., as shown in FIG. 5C. The integrated circuit and the second PCB may be positioned within the same housing of the sequencing system as the first PCB, or external to the housing of the sequencing system. Placing the integrated circuit on a separate PCB makes connecting the first reconfigurable logic device, e.g., an FPGA device, to various integrated circuits convenient, efficient, and easily customizable. In some embodiments, the first PCB may be a main board, and the second PCB may be a daughter board or edge unit.
[0121] In some embodiments, the sequencing system lacks any graphics processing units (GPUs) or tensor processing units (TPUs). Instead, the sequencing system uses FPGAs, AI chips, NPUs, or other ASIC chips to perform the operations disclosed herein. Compared with sequencers that use GPUs or TPUs, the sequencing system disclosed herein advantageously requires less power, generates less heat, and reduces the hardware complexity and cost of performing NGS sequencing runs and the corresponding sequencing analysis.
[0122] In some embodiments, the sequencing systems include logic devices that are not limited to reconfigurable logic devices (e.g., FPGAs) and/or other integrated circuits (e.g., AI chips, NPUs). In some embodiments, the sequencing systems include various types of processing units or processors configured for reconfigurable parallel processing. In some embodiments, the sequencing systems include various types of logic devices or integrated circuits, e.g., ASIC chips. In some embodiments, the sequencing systems include GPUs, TPUs, or other types of processing units that are configured to perform one or more operations disclosed herein.
[0123] In some embodiments, the sequencing systems include GPUs, TPUs, or other types of processing units that are configured to perform one or more operations that can be performed by the reconfigurable logic devices (e.g., FPGAs) and/or other integrated circuits (e.g., AI chips, NPUs).
[0124] The first processor may be positioned on the first PCB together with the reconfigurable logic device for convenient and efficient control of the reconfigurable logic device. In some embodiments, the first processor is separate from the one or more processors of the sequencing system that are configured to control the optical system, the fluidics of the sequencing system, etc. In some embodiments, the first processor can be configured to control only the components on the first PCB, e.g., the FPGA device, alone or in combination with components on the second PCB, e.g., the AI chip. In some embodiments, the sequencing system may comprise a second processor that is configured to separately control the AI chip. The first processor or second processor of the sequencing system, e.g., 120_c, may comprise a CPU. The one or more hardware processors of the sequencing system may comprise a CPU. In some embodiments, the first or second processor, e.g., 120_c, lacks any GPU or TPU. In some embodiments, the first or second processor, e.g., 120_c, comprises only CPU(s).
[0125] In some embodiments, the sequencing system may further comprise a heat dissipator configured to maintain a system temperature in a range from 0 to 120 degrees Celsius, or below 120 degrees Celsius.
[0126] In some embodiments, processing the sensor data to generate the first plurality of flow cell images comprises one or more of: registering the first plurality of flow cell images to a reference coordinate system; adjusting image intensities of the first plurality of flow cell images; color-correcting the first plurality of flow cell images; correcting phasing and prephasing in the first plurality of flow cell images; and subtracting background intensities from the first plurality of flow cell images.
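Two of the listed pre-processing steps, background subtraction and color correction, can be sketched in NumPy as follows. This is an illustrative sketch, not the described implementation. It assumes that background is estimated as a low per-channel percentile. It also assumes that color correction means undoing linear dye crosstalk with a known, hypothetical crosstalk matrix, where `crosstalk[i, j]` is the fraction of dye j's signal that appears in channel i.

```python
import numpy as np


def subtract_background(images, percentile=5.0):
    """images: (channels, H, W). Subtract a per-channel background estimate, clip at 0."""
    background = np.percentile(images, percentile, axis=(1, 2), keepdims=True)
    return np.clip(images - background, 0.0, None)


def correct_crosstalk(images, crosstalk):
    """Undo linear channel mixing (observed = crosstalk @ true) at every pixel."""
    channels = images.shape[0]
    observed = images.reshape(channels, -1)      # (channels, H*W)
    true = np.linalg.solve(crosstalk, observed)  # solves crosstalk @ true = observed
    return true.reshape(images.shape)


rng = np.random.default_rng(1)
true_signal = rng.random((2, 16, 16))
crosstalk = np.array([[1.0, 0.2],
                      [0.1, 1.0]])
observed = np.einsum("ij,jhw->ihw", crosstalk, true_signal)
print(np.allclose(correct_crosstalk(observed, crosstalk), true_signal))  # True
```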
[0127] In some embodiments, each of the one or more operations performed by the first reconfigurable logic device or the integrated circuit is performed in real time. In some embodiments, each of the one or more operations performed by the first reconfigurable logic device or the integrated circuit is performed within the time window for performing the sequencing reactions and/or imaging of a single sequencing cycle of the sequencing run. In some embodiments, each of the one or more operations performed by the first reconfigurable logic device or the integrated circuit is performed within the time window for performing the sequencing reactions and/or imaging of a single z-level of a single sequencing cycle.
[0128] FIG. 5D shows an exemplary embodiment of performing sequencing analysis in parallel with performing a sequencing run. In this particular embodiment, the sequencing run includes multiple sequencing cycles, of which only part of a single cycle is shown. For each cycle, flow cell images are acquired at multiple z-levels from different color channels of an in situ sample. The sequencing reactions are repeatedly performed for each z-level in each cycle within a time window 5601. The operations of the integrated circuit are performed within a processing window 5602 that lies within the time window 5609 of a single sequencing cycle and also within the time window 5601 for sequencing reactions and imaging at a single z-level. The operations of the first reconfigurable logic device (e.g., on-board primary analysis operations) are likewise performed within a processing window 5603 that lies within the time window 5609 of each sequencing cycle. The processing windows 5602 and 5603 may be of identical or different durations depending on various factors such as the sequencing data, the primary analysis algorithms, etc. In some embodiments, the operations are not only performed but also completed within the processing windows with respect to the data of the current cycle, e.g., data of a preceding z-level of the current cycle for which sensor data has already been acquired. In some embodiments, the operations are completed within the processing windows with respect to the data of a preceding cycle, e.g., the cycle immediately preceding the current cycle.
[0129] In embodiments where the sample is a 3D volumetric sample, the operations are performed for a single z-level in each cycle within a predetermined time window, e.g., 5602, 5603. The predetermined time window is for a single z-level in a single sequencing cycle. In some embodiments, the predetermined time window is less than 1000 ms, 900 ms, 800 ms, 700 ms, 600 ms, 500 ms, 400 ms, 300 ms, 250 ms, 200 ms, or 100 ms. In some embodiments, each of the one or more operations is performed within the predetermined time window and in parallel while the sequencing run is in progress. In some embodiments, each of the one or more operations is performed in parallel within the time window in which sequencing, imaging, or both of a subsequent sequencing cycle are completed.
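The real-time requirement in paragraphs [0128] and [0129] amounts to a simple budget check. This sketch makes three assumptions for illustration. First, a cycle's time window (5609) equals the sum of its per-z-level windows (5601). Second, the integrated-circuit step must fit within one z-level window. Third, the on-board (FPGA) step must fit within one cycle window. The class name and fields are hypothetical, and the numbers in the example are placeholders, not measured values.

```python
from dataclasses import dataclass


@dataclass
class CycleTiming:
    z_levels: int              # z-levels imaged per sequencing cycle
    z_window_ms: float         # reactions + imaging per z-level (cf. window 5601)
    ic_processing_ms: float    # integrated-circuit step per z-level (cf. window 5602)
    fpga_processing_ms: float  # on-board primary analysis per cycle (cf. window 5603)

    def cycle_window_ms(self) -> float:
        # Assumption: the cycle window (cf. 5609) is the sum of the z-level windows.
        return self.z_levels * self.z_window_ms

    def keeps_up(self) -> bool:
        """True if each analysis step fits inside the acquisition window it overlaps."""
        return (self.ic_processing_ms <= self.z_window_ms
                and self.fpga_processing_ms <= self.cycle_window_ms())


print(CycleTiming(10, 500.0, 250.0, 4000.0).keeps_up())  # True
print(CycleTiming(10, 500.0, 600.0, 4000.0).keeps_up())  # False
```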
[0130] The first plurality of flow cell images herein may be obtained from a single z-level of a 2D or 3D sample. In some embodiments, the first plurality of flow cell images may be obtained from multiple z-levels covering at least part of an in situ sample, e.g., of cells or tissue(s). The first plurality of flow cell images may be obtained from one or more color channels at each of the multiple z-levels. In some embodiments, the first plurality of flow cell images are from a single color channel. In some embodiments, the first plurality of flow cell images are from multiple color channels. In some embodiments, the first plurality of flow cell images are from a single sequencing cycle. In some embodiments, the first plurality of flow cell images are from multiple sequencing cycles. The first plurality of flow cell images may have a first spatial resolution in the x, y, and/or z directions. The second plurality of flow cell images may be generated based on the first plurality of flow cell images and may have a second spatial resolution in the x, y, and/or z directions. The first spatial resolution may be lower than the second spatial resolution. A higher resolution herein indicates a smaller pixel size, so that the polonies in the flow cell images are resolved with finer spatial detail. The first spatial resolution may be 2x, 4x, 6x, 8x, 10x, 16x, 24x, 32x, or 48x lower than the second spatial resolution, or at least that much lower, in the x, y, and/or z directions. In some embodiments, the first and second resolutions are in 3D. In some embodiments, the first resolution is in a range of 0.1 µm to 5 µm. In some embodiments, the second resolution is in a range of 0.01 µm to 2 µm. In some embodiments, the second resolution is at least 4, 6, or 8 times greater than the first resolution in all three dimensions.
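Because a higher resolution here means a smaller pixel size, the relationship between the two resolutions reduces to simple arithmetic. The helpers below are hypothetical illustrations. They treat each resolution as a pixel (or voxel edge) size in micrometres and assume the same improvement factor on every axis considered.

```python
def target_pixel_size_um(first_pixel_um: float, factor: float) -> float:
    """Pixel size after improving resolution by `factor` along one axis."""
    if factor <= 0:
        raise ValueError("factor must be positive")
    return first_pixel_um / factor


def sample_count_ratio(factor: int, dims: int = 3) -> int:
    """How many more pixels/voxels the output has when every axis improves by `factor`."""
    return factor ** dims


print(target_pixel_size_um(0.8, 8))  # 0.1
print(sample_count_ratio(4))         # 64
```

For example, a first resolution of 0.8 µm improved 8x gives 0.1 µm, which falls inside the 0.01 µm to 2 µm range given for the second resolution. A 4x improvement in all three dimensions produces 64 times as many voxels for the same volume.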
[0131] In some embodiments, the sequencing system further comprises one or more image sensors configured to receive optical signals generated from sequencing reactions of a sample immobilized on a support. The support may comprise a glass or plastic substrate. The support may be included in a flow cell device. The one or more image sensors may be configured to generate sensor data based on the optical signals. In some embodiments, the sequencing system further comprises: one or more hardware processors; one or more data storage devices storing instructions executable by the one or more hardware processors to cause the one or more hardware processors to perform operations disclosed herein. The one or more data storage devices may include one or more memory devices. The one or more memory devices may be accessible by the one or more processors, the first processor, the second processor, the first reconfigurable logic device, and the integrated circuit.
[0132] In some embodiments, the one or more processors are separate from the first or second processors. The operations performed by the one or more processors may include one or more of: 1) recording sensor data generated in the sequencing system in one or more flow cycles; 2) optionally processing the recorded sensor data; 3) sending the recorded sensor data or the optionally processed data to the first reconfigurable logic device or the integrated circuit; 4) receiving outputs from the first reconfigurable logic device or the integrated circuit; and 5) generating sequencing analysis results based on the received outputs.
[0133] In some embodiments, the sequencing analysis results comprise primary analysis results. In some embodiments, the sequencing analysis results comprise a data file in a predetermined data format. In some embodiments, the sequencing analysis results comprise base calls of nucleotide bases in a sample immobilized on a support. In some embodiments, the sequencing analysis results comprise quality measurements of base calls of nucleotide bases in a sample immobilized on a support. In some embodiments, the sequencing analysis results comprise quality scores corresponding to base calls of nucleotide bases in a sample immobilized on a support.
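Base-call quality scores are conventionally reported on the Phred scale, Q = -10 log10(P_error). The paragraph above does not specify a scale, so this sketch assumes that convention and shows the conversion in both directions:

```python
import math


def phred_quality(error_probability: float) -> float:
    """Phred quality score for a base call with the given error probability."""
    if not 0.0 < error_probability <= 1.0:
        raise ValueError("error probability must be in (0, 1]")
    return -10.0 * math.log10(error_probability)


def error_probability(quality: float) -> float:
    """Inverse of phred_quality: error probability for a Phred score."""
    return 10.0 ** (-quality / 10.0)


print(round(phred_quality(0.001)))         # 30
print(round(error_probability(20), 6))     # 0.01
```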
[0134] In some embodiments, the sequencing system further comprises: a sample immobilized on a support; and an optical system comprising an illumination system, an objective lens, and the one or more image sensors. The optical system is configured to direct light onto the sample and to collect optical signals emitted from the sample, thereby generating the first plurality of flow cell images. The support may be included in a flow cell device.
[0135] In some embodiments, the operation(s) performed by the first reconfigurable logic device or the integrated circuit using the neural network comprise one or more of: generating quality measurements of the base callings; and generating a data output file based on the base callings.
[0136] In some embodiments, the neural network herein comprises a convolutional neural network (CNN). In some embodiments, the neural network comprises a U-Net. In some embodiments, the neural network has been pretrained. In some embodiments, the neural network has been trained using the first reconfigurable logic device or the integrated circuit. In some embodiments, the neural network is a 3D neural network.
[0137] In some embodiments, the first convolution comprises a 3D convolution with a convolution kernel. In some embodiments, the convolution kernel has at least four dimensions. In some embodiments, the convolution kernel is m x m x m x n, wherein m is an integer in a range from 3 to 30 and n is an integer. In some embodiments, n is an integer from 1 to 16384. In some embodiments, the second convolution in operation (1) comprises n, 2*n, 4*n, and 8*n filters in the first, second, third, and fourth repetitions, respectively. In some embodiments, the second convolution in operation (4) comprises 2*n, 2*n, 4*n, and 8*n filters in the last, second-to-last, third-to-last, and fourth-to-last repetitions, respectively. In some embodiments, n is in a range from 4 to 1024.
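The filter counts in [0137] follow a simple pattern, as do the four- and three-repetition variants in [0139] and [0140]. The hypothetical helper below reproduces them. The encoder doubles the filter count each repetition, starting from n. The decoder, listed in execution order, uses 2^k * n filters in the repetition k steps before the last one, except that the last repetition (k = 0) uses 2*n. A second helper checks the stated ranges for the m x m x m x n kernel.

```python
def kernel_shape_3d(m: int, n: int) -> tuple:
    """m x m x m x n kernel shape, checked against the ranges stated in [0137]."""
    if not 3 <= m <= 30:
        raise ValueError("m must be in 3..30")
    if not 1 <= n <= 16384:
        raise ValueError("n must be in 1..16384")
    return (m, m, m, n)


def filter_schedule(n: int, repetitions: int = 4):
    """Filter counts for operation (1) and operation (4), both in execution order."""
    down = [n * 2**i for i in range(repetitions)]                  # n, 2n, 4n, 8n
    # k counts repetitions back from the last one; the last (k = 0) uses 2n.
    up = [n * 2**max(k, 1) for k in reversed(range(repetitions))]  # 8n, 4n, 2n, 2n
    return down, up


print(kernel_shape_3d(3, 16))  # (3, 3, 3, 16)
print(filter_schedule(16))     # ([16, 32, 64, 128], [128, 64, 32, 32])
print(filter_schedule(16, 3))  # ([16, 32, 64], [64, 32, 32])
```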
[0138] In some embodiments, the neural network has been trained using the first reconfigurable logic device or the integrated circuit. In some embodiments, the neural network is a 2D neural network. In some embodiments, the first convolution comprises a 2D convolution with a convolution kernel. In some embodiments, the convolution kernel has at least three dimensions. In some embodiments, the convolution kernel is m x m x n, wherein m is an integer in a range from 3 to 30 and n is an integer. In some embodiments, n is an integer from 1 to 16384.
[0139] In some embodiments, the second convolution in operation (1) comprises n, 2*n, 4*n, and 8*n filters in the first, second, third, and fourth repetitions, respectively. In some embodiments, the second convolution in operation (4) comprises 2*n, 2*n, 4*n, and 8*n filters in the last, second-to-last, third-to-last, and fourth-to-last repetitions, respectively. In some embodiments, n is in a range from 4 to 1024.
[0140] In some embodiments, the second convolution in operation (1) comprises n, 2*n, and 4*n filters in the first, second, and third repetitions, respectively. In some embodiments, the second convolution in operation (4) comprises 2*n, 2*n, and 4*n filters in the last, second-to-last, and third-to-last repetitions, respectively. In some embodiments, n is in a range from 4 to 1024.
[0141] In some embodiments, the neural network is pretrained with 2D flow cell images at multiple z-levels that encompass the 3D volume of the volumetric sample(s).
Compared with neural networks trained with 3D volumes of training data, e.g., 3D CNNs, neural networks pretrained with 2D flow cell images, e.g., 2D CNNs, are less complex and require less computation for prediction or inference. This improves efficiency and saves time and computational effort in predicting polony locations. In some embodiments, the neural network pretrained with 2D flow cell images may predict polony locations per tile per cycle in a time window that is 10x, 50x, 80x, 100x, 200x, 400x, 600x, 800x, 1000x, 1500x, or 2000x shorter than the time needed to make identical predictions using neural networks trained on 3D volumes of flow cell images.
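Part of the efficiency argument for 2D networks shows up in the cost of a single convolution layer. A 3D kernel of edge k has k times as many weights, and k times as many multiply-accumulates per output voxel, as a 2D kernel with the same channel counts. The sketch below computes both quantities. It covers one layer only and does not reproduce the overall speed-ups stated above, which also depend on network depth, input size, and hardware.

```python
from math import prod


def conv_weights(k: int, c_in: int, c_out: int, spatial_dims: int) -> int:
    """Trainable parameters of one convolution layer (weights plus biases)."""
    return k**spatial_dims * c_in * c_out + c_out


def conv_macs(k: int, c_in: int, c_out: int, output_shape) -> int:
    """Multiply-accumulate operations for one layer over the given output grid."""
    return prod(output_shape) * k**len(output_shape) * c_in * c_out


print(conv_weights(3, 32, 32, 2), conv_weights(3, 32, 32, 3))  # 9248 27680
# 16 z-slices of 512 x 512: processed one slice at a time in 2D vs. jointly in 3D.
per_slice_2d = conv_macs(3, 32, 32, (512, 512))
volume_3d = conv_macs(3, 32, 32, (16, 512, 512))
print(volume_3d // (16 * per_slice_2d))  # 3
```

For the same output size, the 3D layer therefore costs k = 3 times more per layer in this example. Any larger gap between the two approaches would have to come from other differences between the networks.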
[0142] In some embodiments, the neural network pretrained with 2D flow cell images may predict polony locations per tile per cycle using the reconfigurable logic device and/or other integrated circuits, e.g., FPGAs and/or AI chips, in a time window that is 5x, 10x, 20x, 40x, 50x, 80x, 100x, 200x, 400x, 600x, 800x, or 1000x shorter than that of an identical neural network run on CPUs or other processors.
[0143] In some embodiments, operation (e), performing a corresponding base calling for each of the determined polonies based on the second plurality of flow cell images, comprises performing a corresponding base calling for each of the determined polonies based on the second plurality of flow cell images and on a fourth plurality of flow cell images, wherein the fourth plurality of flow cell images are predicted using a second neural network based on a third plurality of flow cell images. In some embodiments, the third plurality of flow cell images are acquired from one or more color channels that are different from the single channel, and the third plurality of flow cell images have the first resolution. In some embodiments, the fourth plurality of flow cell images have the second resolution.
[0144] In some embodiments, the first plurality of flow cell images are from one or more color channels. In some embodiments, the first plurality of flow cell images are of unbalanced nucleotide diversity. In some embodiments, the first plurality of flow cell images comprise an unbalanced diversity of nucleotide bases of A, G, C and T/U among concatemer molecules immobilized on the support in one or more flow cycles. In some embodiments, the first plurality of flow cell images comprise a balanced diversity of nucleotide bases of A, G, C and T/U among concatemer molecules immobilized on the support in one or more cycles. In some embodiments, two or more different concatemer molecules among the concatemer molecules have different insert sequences. In some embodiments, different insert sequences correspond to different target RNA molecules or target cDNA molecules. In some embodiments, each location of the determined polonies corresponds to a location of the concatemer molecules. In some embodiments, the first plurality of flow cell images comprise optical signals emitted from nucleotide reagents bound to a balanced diversity of nucleotide bases of A, G, C and T/U among the concatemer molecules immobilized on the support. In some embodiments, the first plurality of flow cell images comprise optical signals emitted from nucleotide reagents bound to an unbalanced diversity of nucleotide bases of A, G, C and T/U among the concatemer molecules immobilized on the support in the one or more subsequent cycles. In some embodiments, the unbalanced diversity of nucleotide bases of A, G, C and T/U among the concatemer molecules comprises a ratio of (1) the number of one or more types of nucleotide bases to (2) the total number of bases that is less than 20%, 15%, 10%, or 5% in the one or more cycles. In some embodiments, the balanced diversity of nucleotide bases of A, G, C and T/U among the plurality of concatemer molecules comprises a ratio of (1) the number of each type of nucleotide base to (2) the total number of bases in the one or more cycles that is more than 10%, 15%, or 20%.
[0145] In some embodiments, the cellular sample comprises overloaded concatemer molecules with a spatial density in a range of 10² to 10¹⁵ per mm². In some embodiments, the cellular sample comprises overloaded concatemer molecules with a spatial density in a range of 10³ to 10¹⁰ per mm².
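The balanced/unbalanced distinction in [0144] is defined by per-base fractions within a cycle. The sketch below computes those fractions from one cycle's base calls and applies one pair of the listed thresholds: below 5% counts as unbalanced, and above 20% for every base counts as balanced. The function name, the threshold choice, the "intermediate" label for cases in between, and the treatment of U as T are assumptions for illustration.

```python
from collections import Counter


def classify_diversity(cycle_calls: str, unbalanced_below=0.05, balanced_above=0.20):
    """Classify one cycle's base calls (e.g. 'ACGTTG...') by nucleotide diversity."""
    counts = Counter(cycle_calls.upper().replace("U", "T"))
    total = sum(counts[b] for b in "ACGT")
    if total == 0:
        raise ValueError("no A/C/G/T/U calls in this cycle")
    fractions = {b: counts[b] / total for b in "ACGT"}
    if all(f > balanced_above for f in fractions.values()):
        return "balanced", fractions
    if any(f < unbalanced_below for f in fractions.values()):
        return "unbalanced", fractions
    return "intermediate", fractions


print(classify_diversity("ACGT" * 25)[0])                      # balanced
print(classify_diversity("A" * 97 + "CGT")[0])                 # unbalanced
print(classify_diversity("A" * 40 + "C" * 40 + "GT" * 10)[0])  # intermediate
```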
[0146] In some embodiments, the down-sampling factor is 2, 4, or 8. In some embodiments, the up-sampling factor is 2, 4, or 8. In some embodiments, the down-sampling factor is 2, 4, 8, 16, 32, 64, or more. In some embodiments, the up-sampling factor is 2, 4, 8, 16, 32, 64, or more.
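The down-sampling factor and the number of down-sampling repetitions together determine how small the feature maps become. Assuming each down-sampling divides every spatial axis exactly, as pooling does, they also constrain the input tile size. The hypothetical helpers below compute the per-repetition sizes and the padded tile size needed for compatibility.

```python
def downsampled_sizes(size: int, factor: int, repetitions: int) -> list:
    """Spatial size after each down-sampling repetition (size must divide evenly)."""
    required = factor ** repetitions
    if size % required:
        raise ValueError(f"size {size} is not a multiple of {required}")
    return [size // factor**r for r in range(1, repetitions + 1)]


def padded_size(size: int, factor: int, repetitions: int) -> int:
    """Smallest size >= `size` that survives all repetitions without remainder."""
    required = factor ** repetitions
    return -(-size // required) * required  # ceiling division, then scale back up


print(downsampled_sizes(128, 2, 4))  # [64, 32, 16, 8]
print(padded_size(100, 2, 4))        # 112
print(padded_size(100, 8, 2))        # 128
```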
[0147] In some embodiments, one or more of operations (a) to (k) are performed while a sequencing run is being performed. In some embodiments, the first plurality of flow cell images are acquired in sequencing cycles ranging from 1 to 500. In some embodiments, the one or more cycles comprise a current cycle N. In some embodiments, N is in a range from 1 to 500. In some embodiments, the one or more cycles comprise a single cycle ranging from 1 to 500. In some embodiments, the one or more cycles comprise multiple cycles ranging from 1 to 500. In some embodiments, one or more of the operations, e.g., operations (a) to (j), are performed while the sequencing reactions in cycles subsequent to the current cycle N are yet to be performed or are currently being performed.
[0148] In some embodiments, the training data set of flow cell images comprises z-stacks of flow cell images taken at different z-locations, and each z-stack is used as a 3D volume for training the neural network. In some embodiments, the training data set of flow cell images comprises 2D flow cell images taken at different z-locations, and individual 2D flow cell images at multiple z-levels are used as 2D images for training the neural network.
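The two ways of using the same z-stacks as training data can be sketched as follows. In the 3D case each stack is one sample; in the 2D case every z-slice becomes its own sample. The array layout (a leading channel axis, with slices indexed along axis 0 of each stack) is an assumption for illustration.

```python
import numpy as np


def as_3d_samples(z_stacks):
    """Each (Z, H, W) stack becomes one (1, Z, H, W) training volume."""
    return [stack[np.newaxis] for stack in z_stacks]


def as_2d_samples(z_stacks):
    """Each z-slice of each (Z, H, W) stack becomes one (1, H, W) training image."""
    return [stack[z][np.newaxis] for stack in z_stacks for z in range(stack.shape[0])]


stacks = [np.zeros((5, 64, 64), dtype=np.float32) for _ in range(3)]
print(len(as_3d_samples(stacks)), as_3d_samples(stacks)[0].shape)  # 3 (1, 5, 64, 64)
print(len(as_2d_samples(stacks)), as_2d_samples(stacks)[0].shape)  # 15 (1, 64, 64)
```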
[0149] In some embodiments, the training data set of flow cell images comprises simulated flow cell images of in situ samples at different z-locations. In some embodiments, the training data set of flow cell images comprises actual flow cell images acquired from in situ samples at different z-locations.
[0150] In some embodiments, performing the first convolution in one or more dimensions on the first plurality of flow cell images comprises: performing a first convolution in 3D on the first plurality of flow cell images, thereby generating a first convolution result. In some embodiments, performing a second convolution in one or more dimensions on the first convolution result, thereby generating a second convolution result comprises: performing the second convolution in 3D on the first convolution result, thereby generating a second convolution result.
[0151] In some embodiments, performing the first convolution in one or more dimensions on the first plurality of flow cell images comprises: performing a first convolution in 2D on the first plurality of flow cell images, thereby generating a first convolution result. In some embodiments, performing a second convolution in one or more dimensions on the first convolution result, thereby generating a second convolution result comprises: performing the second convolution in 2D on the first convolution result, thereby generating a second convolution result.
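The 3D and 2D variants differ only in how many spatial axes the kernel slides over. The sketch below shows a single-channel "valid" cross-correlation, which is what CNN convolution layers compute, that works for any number of dimensions. It uses NumPy's `sliding_window_view` (NumPy 1.20 or later) and omits multi-channel inputs, padding, and strides.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def conv_valid(x: np.ndarray, kernel: np.ndarray) -> np.ndarray:
    """'Valid' cross-correlation of x with kernel; works in 1D, 2D, or 3D."""
    if x.ndim != kernel.ndim:
        raise ValueError("x and kernel must have the same number of dimensions")
    windows = sliding_window_view(x, kernel.shape)  # shape: out_shape + kernel.shape
    window_axes = tuple(range(x.ndim, 2 * x.ndim))
    return np.tensordot(windows, kernel, axes=(window_axes, tuple(range(kernel.ndim))))


image_2d = np.ones((6, 6))
volume_3d = np.ones((4, 6, 6))
print(conv_valid(image_2d, np.ones((3, 3))).shape)      # (4, 4)
print(conv_valid(volume_3d, np.ones((3, 3, 3))).shape)  # (2, 4, 4)
```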
[0152] In some embodiments, repetitively performing up-sampling operations comprises: (3) performing an up-sampling of the fourth convolution result by an up-sampling factor, thereby generating a first up-sampled result; (4) concatenating the first up-sampled result in a current up-sampling repetition with the first down-sampled result in a previous down-sampling repetition, wherein the first up-sampled result has the same size as the first down-sampled result in the previous down-sampling repetition; and (5) performing the second convolution in one or more dimensions on the first up-sampled result, thereby generating a fifth convolution result.
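One up-sampling repetition of steps (3) to (5) can be sketched as follows. The sketch assumes (C, Z, Y, X) NumPy arrays and makes two simplifications that are not from the text: nearest-neighbour up-sampling stands in for whatever up-sampling the network actually uses (transposed convolutions are also common), and the convolution after concatenation is a 1 x 1 x 1 channel mix applied to the concatenated tensor, as in a standard U-Net. All names and shapes are hypothetical.

```python
import numpy as np

def upsample_nearest(x, factor):
    """Nearest-neighbour up-sampling of a (C, *spatial) array along every spatial axis."""
    for axis in range(1, x.ndim):
        x = np.repeat(x, factor, axis=axis)
    return x

def decoder_step(deep, skip, factor, weights):
    """(3) up-sample, (4) concatenate with the same-sized skip tensor saved in the
    matching down-sampling repetition, (5) convolve (here a 1x1x1 channel mix)."""
    up = upsample_nearest(deep, factor)
    if up.shape[1:] != skip.shape[1:]:
        raise ValueError(f"size mismatch: {up.shape[1:]} vs {skip.shape[1:]}")
    merged = np.concatenate([up, skip], axis=0)           # C_deep + C_skip channels
    out = np.tensordot(weights, merged, axes=([1], [0]))  # (C_out, *spatial)
    return np.maximum(out, 0.0)                           # ReLU

rng = np.random.default_rng(2)
deep = rng.random((16, 2, 8, 8))    # hypothetical coarse features (C, Z, Y, X)
skip = rng.random((8, 4, 16, 16))   # features stored during down-sampling
weights = rng.standard_normal((8, 16 + 8))
out = decoder_step(deep, skip, factor=2, weights=weights)
print(out.shape)  # (8, 4, 16, 16)
```

The explicit size check mirrors the requirement in step (4) that the up-sampled result match the stored down-sampled result before concatenation.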
[0153] In some embodiments, the different combinations of the first plurality of data processing engines are configured to perform operations further comprising: (a) receiving the second plurality of flow cell images from the integrated circuit; (b) determining polonies from the second plurality of flow cell images; (c) performing a corresponding base calling for each of the determined polonies based on the second plurality of flow cell images; and (d) forwarding the second plurality of flow cell images, the determined polonies, and the corresponding base calls to the first processor, to one or more hardware processors of the sequencing system, or to a combination thereof.
[0154] In some embodiments, the one or more operations performed by the first reconfigurable logic device further comprise forwarding the second plurality of flow cell images, the determined polonies, the corresponding base calls, or a combination thereof to the first processor or one or more hardware processors of the sequencing system. In some embodiments, the one or more operations performed by the integrated circuit further comprise forwarding the second plurality of flow cell images, the corresponding base calls, or both to the first reconfigurable logic device, the first processor, or one or more hardware processors of the sequencing system.
[0155] In some embodiments, the operations performed by the first reconfigurable logic device or the integrated circuit further comprise registering the second plurality of flow cell images to a common coordinate system.
[0156] In some embodiments, the operations performed by the integrated circuit further comprise one or more of: determining polonies from the second plurality of flow cell images; performing a corresponding base call for each of the determined polonies based on the second plurality of flow cell images; and forwarding the second plurality of flow cell images, the corresponding base calls, or both to the first reconfigurable logic device, the first processor, or one or more hardware processors of the sequencing system.
[0157] In some embodiments, the operation (d) or (i) of determining polonies from the second plurality of flow cell images comprises generating a 3D polony map comprising the spatial locations of the polonies based on the determined polonies. Generating the 3D polony map may further comprise deleting duplicate polonies from the determined polonies, wherein the duplicate polonies are out of focus. In some embodiments, determining polonies from the second plurality of flow cell images comprises: superimposing the second plurality of flow cell images with corresponding cell staining images; and generating the polony map by including only polonies that are within cell boundaries in the corresponding cell staining images. Exemplary embodiments of methods for generating polony maps are disclosed in U.S. Patent Application No. 18/078,820 and PCT Application No. PCT/US2023/076125, which are incorporated by reference in their entireties.
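The two polony-map refinements described above, dropping out-of-focus duplicates and keeping only polonies inside stained cells, can be sketched as below. This is an illustration under stated assumptions, not the method of the incorporated applications: each detection is assumed to carry a focus score (e.g., a local-contrast measure), duplicates are assumed to be detections at different z-levels lying within a hypothetical `xy_radius` of each other, and the cell boundaries are assumed to be available as a 2D boolean segmentation mask.

```python
import numpy as np

def dedupe_across_z(polonies, xy_radius=1.5):
    """polonies: (N, 4) array of (z, y, x, focus_score). When the same polony is
    detected at several z-levels (laterally within xy_radius pixels), keep only
    the sharpest detection and drop the out-of-focus copies."""
    kept = []
    for i in np.argsort(-polonies[:, 3]):             # sharpest first
        z, y, x = polonies[i, :3]
        duplicate = any(
            polonies[j, 0] != z
            and np.hypot(y - polonies[j, 1], x - polonies[j, 2]) <= xy_radius
            for j in kept
        )
        if not duplicate:
            kept.append(i)
    return polonies[sorted(kept)]

def inside_cells(polonies, cell_mask):
    """Keep polonies whose rounded (y, x) lands on a True pixel of a 2D cell mask."""
    ys = np.round(polonies[:, 1]).astype(int)
    xs = np.round(polonies[:, 2]).astype(int)
    ok = (ys >= 0) & (ys < cell_mask.shape[0]) & (xs >= 0) & (xs < cell_mask.shape[1])
    keep = np.zeros(len(polonies), dtype=bool)
    keep[ok] = cell_mask[ys[ok], xs[ok]]
    return polonies[keep]

polonies = np.array([
    [0, 10.0, 10.0, 0.3],   # blurred copy seen one z-level away
    [1, 10.2, 10.1, 0.9],   # in-focus detection of the same polony
    [1, 30.0, 5.0, 0.8],    # a different polony
    [2, 50.0, 50.0, 0.7],   # a polony outside the stained cells
])
cell_mask = np.zeros((40, 40), dtype=bool)
cell_mask[5:35, 0:20] = True                  # hypothetical segmented cell region
polony_map = inside_cells(dedupe_across_z(polonies), cell_mask)
print(polony_map[:, :3])
```

The greedy "sharpest first" order guarantees that when copies collide, the in-focus one is the survivor; the quadratic loop is fine for a sketch but would need a spatial index for millions of polonies.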
[0158] Disclosed herein, in some embodiments, are sequencing methods comprising the operations herein. Such operations may include one or more of: (a) obtaining, by a first reconfigurable logic device of a sequencing system, sensor data from one or more sensors of the sequencing system; (b) processing, by the first reconfigurable logic device, the sensor data to generate a first plurality of flow cell images; (c) predicting, by the first reconfigurable logic device, a second plurality of flow cell images using a neural network at least partly deployed on the first reconfigurable logic device and based on the sensor data or the first plurality of flow cell images; (d) determining, by the first reconfigurable logic device, polonies from the second plurality of flow cell images; (e) performing, by the first reconfigurable logic device, a corresponding base calling for each of the determined polonies based on the second plurality of flow cell images; and (f) optionally forwarding, by the first reconfigurable logic device, the second plurality of flow cell images, the corresponding base calling, or both to one or more processors of the sequencing system.
[0159] Disclosed herein, in some embodiments, are sequencing methods comprising operations herein. Such operations may include one or more of (a) obtaining, by the first reconfigurable logic device, sensor data from one or more image sensors of the sequencing system; (b) processing, by the first reconfigurable logic device, the sensor data to generate a first plurality of flow cell images; (c) communicating, by the first reconfigurable logic device to an integrated circuit, the sensor data, the first plurality of flow cell images, or both; (d) receiving, by the integrated circuit and from the first reconfigurable logic device, the sensor data, the first plurality of flow cell images, or both; (e) predicting, by the integrated circuit, a second plurality of flow cell images using the neural network based on the sensor data, the first plurality of flow cell images, or both; (f) determining, by the integrated circuit, polonies from the second plurality of flow cell images; and (g) performing, by the integrated circuit, a corresponding base calling for each of the determined polonies based on the second plurality of flow cell images.
[0160] Disclosed herein, in some embodiments, are sequencing methods comprising the operations herein. Such operations may include one or more of: (a) obtaining, by the first reconfigurable logic device of a sequencing system, sensor data from one or more image sensors of the sequencing system to generate the first plurality of flow cell images; (b) communicating, by the first reconfigurable logic device, the sensor data, the first plurality of flow cell images, or both to the integrated circuit; (c) receiving, by the integrated circuit of the sequencing system, the sensor data, the first plurality of flow cell images, or both from the first reconfigurable logic device; (d) predicting, by the integrated circuit, a second plurality of flow cell images using a neural network deployed at least partly on the integrated circuit and based on the sensor data, the first plurality of flow cell images, or both; and (e) communicating, by the integrated circuit, the second plurality of flow cell images to the first reconfigurable logic device or one or more hardware processors of the sequencing system.
[0161] In some embodiments, the first reconfigurable routing channels comprise one or more electronic nodes, and the electronic nodes are programmable. The electronic nodes here may include junction points in the circuit(s), i.e., points where two or more circuit elements are connected together. In some embodiments, the first reconfigurable routing channels comprise one or more interconnects. An interconnect may include the physical wiring that connects transistors and other components on an integrated circuit. In some embodiments, the reconfigurable routing channels comprise one or more memory controllers, e.g., 5013 in FIG. 5C. In some embodiments, the first reconfigurable routing channels comprise one or more networks-on-chip (NoCs), e.g., 5013 in FIG. 5C. In some embodiments, the first reconfigurable routing channels may comprise one or more of a network-on-chip (NoC) and a memory controller.
[0162] The first reconfigurable routing channels may be configured to passively communicate data between components of the sequencing system. For example, the reconfigurable routing channels may be configured to communicate data bidirectionally between the data processing engines, e.g., 5011 in FIG. 5C, and the memory device, e.g., 5030 in FIG. 5C. The first reconfigurable routing channels may be configured to allow data communication between the first reconfigurable logic device and one or more memory devices. The first reconfigurable routing channels may be configured to allow data communication between the first reconfigurable logic device and the integrated circuit.
[0163] Each reconfigurable logic device herein may comprise one or more data processing engines. Each data processing engine may comprise multiple digital logic circuits.
[0164] The first reconfigurable logic device may be configured to communicate data with one or more memory devices external thereto, e.g., via the first reconfigurable routing channels. The first reconfigurable logic device may comprise a first integrated circuit forming an FPGA device. For example, the FPGA device in FIG. 5C includes the first reconfigurable logic device, the DMA connections, and the first reconfigurable routing channels (e.g., NoC and memory controllers).
[0165] The sequencing system may further comprise one or more memory devices electrically connected for data communication with one or more components of the sequencing system; the one or more components may include one or more of: the first reconfigurable logic device; the integrated circuit; the first reconfigurable routing channels; the one or more memory controllers; the first processor; a second processor; and one or more processors of the sequencing system.
[0166] In some embodiments, the sequencing system further comprises one or more direct memory access (DMA) connections, e.g., 5012 in FIG. 5C, that are in data communication with the plurality of data processing engines and the first reconfigurable routing channels, e.g., 5013 in FIG. 5C. The DMA connections may be configured to actively communicate data between components of the sequencing system. For example, the DMA connections may be configured to fetch data from or send data to components that are connected thereto, e.g., the data processing engines, e.g., 5011 in FIG. 5C, and the reconfigurable routing channels, e.g., 5013 in FIG. 5C. The DMA connections herein may be configured to actively request data from, or actively send data directly to: the first reconfigurable logic device; the first reconfigurable routing channels; the integrated circuit; or a combination thereof. One or more DMA connections may be in data communication with the first reconfigurable routing channels and the integrated circuit herein. The DMA connections may be configured to allow data communication based on a predetermined protocol, e.g., a PCIe protocol.
[0167] In some embodiments, the first reconfigurable routing channels are configured to allow data communication between the first reconfigurable logic device and one or more memory devices. In some embodiments, the one or more DMA connections and the first reconfigurable routing channels are configured to allow data communication between the first reconfigurable logic device and the integrated circuit.
[0168] In some embodiments, the sequencing system further comprises an integrated circuit that is different from the first reconfigurable logic device, e.g., 120_b in FIG. 5C. The integrated circuit herein may not be reconfigurable. The integrated circuit may comprise an application-specific integrated circuit (ASIC) chip. In some embodiments, the integrated circuit comprises a neural processing unit (NPU) or an artificial intelligence (AI) chip. The integrated circuit may comprise a second plurality of data processing engines, each data processing engine comprising multiple digital logic circuits. The integrated circuit may further comprise second routing channels, each connecting at least some of the second plurality of data processing engines.
[0169] In some embodiments, the sequencing system further comprises a first processor. The first processor may be configured to selectively activate or deactivate different combinations of the first plurality of data processing engines and the first reconfigurable routing channels to perform the operations disclosed herein.
[0170] In some embodiments, the first processor or a second processor is configured to selectively activate or deactivate different combinations of the second plurality of data processing engines and the second reconfigurable routing channels to perform the operations herein.
[0171] The sequencing system may further comprise a housing that encloses the first reconfigurable logic device, the first reconfigurable routing channels, the one or more DMA connections, the integrated circuit, and the first processor therein. In some embodiments, the sequencing system further comprises: a housing that encloses at least the first reconfigurable logic device therein and the integrated circuit is external to the housing.
[0172] In some embodiments, the sequencing system further comprises a power source that is configured to supply different power levels to the first reconfigurable logic device and the integrated circuit. A first power level supplied by the power source to the first reconfigurable logic device may be higher than a second power level supplied to the integrated circuit while a sequencing run and/or sequencing analysis is in progress. A maximum power output of the power source of the sequencing system is 2x, 3x, 5x, 8x, 10x, or 20x lower than the maximum power output of the power source of other sequencers, e.g., traditional sequencers without the first reconfigurable logic device (e.g., FPGA), the integrated circuit (e.g., AI chip), or both. The time consumed in performing a sequencing run and its corresponding sequencing analysis (e.g., primary analysis) using the sequencing system is 2x, 3x, 5x, 8x, 10x, or 20x lower than the time consumed in performing the same sequencing run, or the same sequencing run and analysis, using a sequencer without the first reconfigurable logic device, the integrated circuit, or both (e.g., a traditional sequencer without FPGA and/or AI chips).
[0173] In some embodiments, a maximum power output of the power source to the sequencing system in performing a sequencing run and corresponding sequencing analysis thereof is less than 900 Watts, 800 Watts, 700 Watts, 650 Watts, 600 Watts, 550 Watts, or 500 Watts.
[0174] In some embodiments, the sequencing system further comprises a power source configured to supply a first power level to the first reconfigurable logic device, the first power level is less than 500 Watts, 400 Watts, 350 Watts, or 300 Watts.
[0175] In some embodiments, the sequencing system further comprises a power source configured to supply a second power level to the integrated circuit, the second power level is less than 450 Watts, 400 Watts, 350 Watts, or 300 Watts.
[0176] In some embodiments, one or more components are located on a first printed circuit board (PCB). The one or more components may include: the first reconfigurable logic device; the first reconfigurable routing channels; the first processor; and the one or more DMA connections. In some embodiments, the integrated circuit is located on a second PCB different from the first PCB, e.g., as shown in FIG. 5C. The integrated circuit and the second PCB may be positioned within the same housing of the sequencing system as the first PCB or external to the housing of the sequencing system. Placing the integrated circuit on a separate PCB makes connecting the first reconfigurable logic device, e.g., the FPGA device, with various integrated circuit chips convenient, efficient, and easily customizable. In some embodiments, the first PCB may be a main board, and the second PCB may be a daughter board.
[0177] In some embodiments, the sequencing system lacks any graphics processing units (GPUs) or tensor processing units (TPUs). Instead, the sequencing system utilizes FPGAs, AI chips, NPUs, or other ASIC chips for performing the operations disclosed herein. The sequencing system disclosed herein advantageously requires less power, generates less heat, and reduces the hardware costs of performing NGS sequencing runs and the corresponding sequencing analysis.
[0178] The first processor may be positioned on the first PCB together with the reconfigurable logic device for convenient and efficient control of the reconfigurable logic device. In some embodiments, the first processor is separate from the one or more processors of the sequencing system that are configured to control the optical system, the fluidics of the sequencing system, etc. In some embodiments, the first processor can be configured to control only the components on the first PCB, e.g., the FPGA device, alone or in combination with components on the second PCB, e.g., the AI chip. In some embodiments, the sequencing system may comprise a second processor that is configured to separately control the AI chip. The first processor or second processor of the sequencing system, e.g., 120_c, may comprise a CPU. The one or more hardware processors of the sequencing system may comprise a CPU.
[0179] Continuing to refer to FIG. 5C, in this particular embodiment, the sensor data at the imager 116 can be communicated directly to the data processing engine(s) 5011 of the first reconfigurable logic device 120_a. Alternatively, the sensor data may be saved to a memory device, e.g., 5030, so that it can be accessed by the data processing engines. The first processor 120_c may control operation of the data processing engines and the routing channels to process the sensor data and generate the first plurality of flow cell images. The processing may include operations disclosed herein such as intensity normalization, color correction, phasing and prephasing correction, background subtraction, etc. The first plurality of flow cell images may then be communicated from the processing engines through the routing channels to the memory device 5030, so that the integrated circuit, under the control of the first processor or a second processor, can access the first plurality of flow cell images for subsequent steps in primary analysis. Alternatively, the first plurality of flow cell images may be communicated directly to the integrated circuit 120_b via DMA connections 5012. In this embodiment, the integrated circuit is used only for predicting higher-resolution polony locations using a pretrained CNN, thereby generating the second plurality of flow cell images with a resolution that is at least 8 times higher than the resolution of the first plurality of flow cell images. The CNN may be pretrained using simulated images or real flow cell images. The second plurality of flow cell images are transmitted back from the integrated circuit to the first reconfigurable logic device for subsequent processing steps such as base calling. The base calls, along with quality information, may then be saved into a FASTQ data file. Other information, including cell segmentation and staining, may also be saved in the same file or in another FASTQ file with a compatible data format.
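The last step above, writing base calls together with quality information to a FASTQ file, can be sketched with the standard Phred relation Q = -10·log10(p_error) and the Phred+33 (Sanger) character encoding. How per-base error probabilities are estimated from the flow cell images is not specified here, so the probabilities, the read identifier, and the cap of Q = 41 below are hypothetical.

```python
import math

def phred_char(p_error, offset=33, q_max=41):
    """Phred quality Q = -10 * log10(p_error), clipped to [0, q_max] and encoded
    as one ASCII character (Phred+33, as in Sanger-style FASTQ)."""
    p = max(p_error, 10 ** (-q_max / 10))   # avoid log10(0)
    q = min(max(-10.0 * math.log10(p), 0.0), q_max)
    return chr(int(round(q)) + offset)

def fastq_record(read_id, bases, error_probs):
    """Format one read (e.g., the base calls of one polony) as a four-line FASTQ record."""
    if len(bases) != len(error_probs):
        raise ValueError("need exactly one error probability per base")
    quals = "".join(phred_char(p) for p in error_probs)
    return f"@{read_id}\n{bases}\n+\n{quals}\n"

record = fastq_record("polony_000001", "ACGT", [0.001, 0.01, 0.1, 0.5])
print(record, end="")
# @polony_000001
# ACGT
# +
# ?5+$
```

An error probability of 0.001 maps to Q30 ('?'), 0.01 to Q20 ('5'), and 0.1 to Q10 ('+'), so the quality line can be decoded back to probabilities by any FASTQ-aware tool.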
[0180] In some embodiments, the sequencing system may further comprise a heat dissipator configured to maintain a system temperature in a range from 0 degrees to 120 degrees Celsius or less than 120 degrees Celsius.
[0181] In some embodiments, the operation of processing the sensor data to generate the first plurality of flow cell images comprises one or more of: registering the first plurality of flow cell images to a reference coordinate system; adjusting image intensities of the first plurality of flow cell images; correcting the colors of the first plurality of flow cell images; correcting phasing and prephasing of the first plurality of flow cell images; and subtracting background intensities from the first plurality of flow cell images.
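Several of the listed pre-processing operations can be sketched with NumPy alone. The techniques below are common choices swapped in for illustration, not necessarily those used by the sequencing system: phase correlation for integer-pixel registration, a low percentile as a flat background estimate, a high percentile for intensity normalization, and inversion of a channel crosstalk matrix for color correction. Phasing and prephasing correction is omitted. The percentiles and the crosstalk values are hypothetical.

```python
import numpy as np

def estimate_shift(ref, img):
    """Integer (dy, dx) such that np.roll(img, (dy, dx), axis=(0, 1)) best matches
    ref, found by phase correlation."""
    cross = np.fft.fft2(ref) * np.conj(np.fft.fft2(img))
    cross /= np.abs(cross) + 1e-12
    corr = np.fft.ifft2(cross).real
    dy, dx = np.unravel_index(np.argmax(corr), corr.shape)
    if dy > ref.shape[0] // 2:     # peaks past the midpoint are negative shifts
        dy -= ref.shape[0]
    if dx > ref.shape[1] // 2:
        dx -= ref.shape[1]
    return int(dy), int(dx)

def subtract_background(img, pct=5.0):
    """Remove a flat background estimated as a low intensity percentile."""
    return np.clip(img - np.percentile(img, pct), 0.0, None)

def normalize(img, pct=99.5):
    """Scale so that the pct-th percentile maps to 1.0."""
    scale = np.percentile(img, pct)
    return img / scale if scale > 0 else img

def color_correct(channels, crosstalk):
    """channels: (C, Y, X) observed intensities; crosstalk[i, j] is the fraction of
    dye j's signal seen in channel i. Solves observed = crosstalk @ true per pixel."""
    c = channels.shape[0]
    return np.linalg.solve(crosstalk, channels.reshape(c, -1)).reshape(channels.shape)

rng = np.random.default_rng(3)
ref = rng.random((64, 64))
moved = np.roll(ref, (3, -2), axis=(0, 1))
print(estimate_shift(ref, moved))  # (-3, 2)

true = rng.random((4, 16, 16))
crosstalk = np.array([[1.0, 0.2, 0.0, 0.0],
                      [0.1, 1.0, 0.0, 0.0],
                      [0.0, 0.0, 1.0, 0.15],
                      [0.0, 0.0, 0.05, 1.0]])
observed = np.tensordot(crosstalk, true, axes=([1], [0]))
print(np.allclose(color_correct(observed, crosstalk), true))  # True
```

Solving the linear system once for all pixels (one `solve` call on a C x (Y·X) matrix) keeps the color correction vectorized, which maps naturally onto parallel hardware.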
[0182] In some embodiments, each of the one or more operations performed by the first reconfigurable logic device or the integrated circuit is performed within the time window of a single sequencing cycle of the sequencing run. FIG. 5D shows an exemplary embodiment of performing sequencing analysis in parallel with performing a sequencing run. In this particular embodiment, the sequencing run includes multiple sequencing cycles. For each cycle, flow cell images are acquired at multiple z-levels from different color channels. The sequencing reactions are repeatedly performed for each z-level in each cycle within a time window 5601. The operations of the integrated circuit are performed within a processing window 5602 that lies within the time window 5609 of a single sequencing cycle and also within the time window 5601 for sequencing reactions and imaging at a single z-level. The operations of the first reconfigurable logic device (e.g., on-board primary analysis operations) are also performed within a processing window 5603 that is within the time window 5609 of each sequencing cycle. The processing windows 5602 and 5603 may be identical or different depending on various factors such as sequencing data, primary analysis algorithms, etc. In some embodiments, the operations are not just performed within the processing windows but completed within the processing windows with respect to the data of the current cycle, e.g., a preceding z-level for which sensor data has been acquired. In some embodiments, the operations are completed within the processing windows with respect to the data of a preceding cycle, e.g., the cycle immediately preceding the current cycle.
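Whether analysis keeps pace with acquisition can be checked with a simple two-stage pipeline model: z-level k is analysed while z-level k + 1 is being imaged. This is a minimal timing sketch under hypothetical assumptions (constant per-level acquisition and processing times, back-to-back acquisition that never waits for analysis); it is not a model of the actual windows 5601, 5602, 5603, or 5609.

```python
def pipeline_schedule(n_levels, acq_ms, proc_ms):
    """Finish times for a two-stage pipeline in which z-level k is analysed while
    z-level k + 1 is being imaged. Analysis of level k starts once its images
    exist and the analysis of level k - 1 has finished."""
    acq_end, proc_end = [], []
    for k in range(n_levels):
        acq_end.append((k + 1) * acq_ms)
        start = max(acq_end[k], proc_end[k - 1] if k else 0)
        proc_end.append(start + proc_ms)
    return acq_end, proc_end

def keeps_up(n_levels, acq_ms, proc_ms):
    """True if every z-level is fully analysed before imaging of the next level
    ends, i.e. analysis never falls behind acquisition."""
    acq_end, proc_end = pipeline_schedule(n_levels, acq_ms, proc_ms)
    return all(p <= a + acq_ms for a, p in zip(acq_end, proc_end))

print(pipeline_schedule(3, acq_ms=500, proc_ms=300))  # ([500, 1000, 1500], [800, 1300, 1800])
print(keeps_up(10, acq_ms=500, proc_ms=300))          # True
print(keeps_up(10, acq_ms=500, proc_ms=700))          # False
```

In this model the condition reduces to the processing time per z-level not exceeding the acquisition time per z-level; once it does, the backlog grows every level and the run can no longer finish analysis in parallel with sequencing.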
[0183] In embodiments where the sample is a 3D volumetric sample, the operations are performed for a single z-level in each cycle within a predetermined time window, e.g., 5602, 5603. The predetermined time window is for a single z-level in a single sequencing cycle. In some embodiments, the predetermined time window is less than 1000 ms, 900 ms, 800 ms, 700 ms, 600 ms, 500 ms, 400 ms, 300 ms, 250 ms, 200 ms, or 100 ms. In some embodiments, each of the one or more operations is performed within the predetermined time window and in parallel while the sequencing run is in progress. In some embodiments, each of the one or more operations is performed in parallel within the time window in which sequencing, imaging, or both of a subsequent sequencing cycle is completed.
[0184] The first plurality of flow cell images herein may be obtained from multiple z-levels covering at least part of an in situ sample, e.g., of cells or tissue(s). The first plurality of flow cell images may be obtained from one or more color channels at each of the multiple z-levels covering at least part of the in situ sample. In some embodiments, the first plurality of flow cell images are from a single color channel. The first plurality of flow cell images may be of a first spatial resolution in x, y, and/or z directions. The second plurality of flow cell images may be generated based on the first plurality of flow cell images. The second plurality of flow cell images may be of a second spatial resolution in x, y, and/or z directions. The first spatial resolution may be lower than the second spatial resolution; a higher resolution herein indicates a smaller pixel size, so that the polonies in the flow cell images show finer spatial detail. The first spatial resolution may be 2x, 4x, 6x, 8x, 10x, 16x, 24x, 32x, or 48x lower than the second spatial resolution in x, y, and/or z directions.
The first spatial resolution may be at least 2x, 4x, 6x, 8x, 10x, 16x, 24x, 32x, or 48x lower than the second spatial resolution in x, y, and/or z directions. In some embodiments, the first and second resolutions are in 3D. In some embodiments, the first resolution is in a range of 0.1 µm to 5 µm. In some embodiments, the second resolution is in a range of 0.01 µm to 2 µm. In some embodiments, the second resolution is at least 4, 6, or 8 times greater than the first resolution in all three dimensions.
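Because a higher resolution here means a smaller pixel size, raising resolution by a factor f along an axis divides the pixel size by f and multiplies the number of pixels along that axis by f. The worked example below uses hypothetical numbers chosen within the stated ranges (first voxel size of 1.0 x 0.4 x 0.4 µm, factors of 4 in z and 8 in x and y) and assumes 2 bytes per voxel.

```python
import math

def upscaled_geometry(shape_zyx, pixel_um_zyx, factor_zyx, bytes_per_voxel=2):
    """Voxel size, array shape, and memory footprint after raising resolution by
    an integer factor along each of z, y, and x."""
    pixel = tuple(p / f for p, f in zip(pixel_um_zyx, factor_zyx))
    shape = tuple(n * f for n, f in zip(shape_zyx, factor_zyx))
    return pixel, shape, math.prod(shape) * bytes_per_voxel

pixel, shape, nbytes = upscaled_geometry(
    shape_zyx=(10, 2048, 2048), pixel_um_zyx=(1.0, 0.4, 0.4), factor_zyx=(4, 8, 8))
print(pixel)           # (0.25, 0.05, 0.05)
print(shape)           # (40, 16384, 16384)
print(nbytes / 2**30)  # 20.0 (GiB at 2 bytes per voxel)
```

The voxel count grows by 4 x 8 x 8 = 256, so a single high-resolution field of view in this example occupies about 20 GiB, which suggests processing the predicted images in tiles rather than whole fields.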
[0185] In some embodiments, the sequencing system further comprises one or more image sensors configured to receive optical signals generated from sequencing reactions of a sample immobilized on a support. The support may comprise a glass or plastic substrate. The support may be comprised in a flow cell device. The one or more image sensors may be configured to generate sensor data based on the optical signals. In some embodiments, the sequencing system further comprises: one or more hardware processors; and one or more data storage devices storing instructions executable by the one or more hardware processors to cause the one or more hardware processors to perform the operations disclosed herein. The one or more data storage devices may include one or more memory devices. The one or more memory devices may be accessible by the one or more processors, the first processor, the second processor, the first reconfigurable logic device, and the integrated circuit.
[0186] In some embodiments, the one or more processors are separate from the first or second processors. The operations performed by the one or more processors may include one or more of: 1) recording sensor data generated in the sequencing system in one or more flow cycles; 2) optionally processing the recorded sensor data; 3) sending the recorded sensor data or the optionally processed data to the first reconfigurable logic device or the integrated circuit; 4) receiving outcomes from the first reconfigurable logic device or the integrated circuit; and 5) generating sequencing analysis results based on the received outcomes. The operations performed by the one or more processors may include one or more of: 1) receiving outcomes from the first reconfigurable logic device or the integrated circuit; and 2) generating sequencing analysis results based on the received outcomes.
[0187] In some embodiments, the sequencing analysis results comprise primary analysis results. In some embodiments, the sequencing analysis results comprise a data file in a predetermined data format. In some embodiments, the sequencing analysis results comprise base calls of nucleotide bases in a sample immobilized on a support. In some embodiments, the sequencing analysis results comprise quality measurements of base calls of nucleotide bases in a sample immobilized on a support. In some embodiments, the sequencing analysis results comprise quality scores corresponding to base calls of nucleotide bases in a sample immobilized on a support.
[0188] In some embodiments, the sequencing system further comprises: a sample immobilized on a support; and an optical system comprising: an illumination system; an objective lens and the one or more image sensors. The optical system is configured to emit light to the sample and to collect optical signals emitted from the sample, thereby generating the first plurality of flow cell images. The support may be comprised in a flow cell device.
[0189] In some embodiments, the output data comprises base calls of nucleotide bases in a sample immobilized on a support. In some embodiments, the output data comprises identification of base calling locations in two dimensions. In some embodiments, the output data comprises identification of base calling locations in three dimensions.
[0190] In some embodiments, the operation(s) performed by the first reconfigurable routing channels or the integrated circuit using the neural network comprise one or more of: generating quality measurements of the base calls; and generating a data output file based on the base calls.
[0191] In some embodiments, the neural network comprises a convolutional neural network (CNN). In some embodiments, the neural network comprises a U-Net. In some embodiments, the neural network has been trained using the first reconfigurable logic device or the integrated circuit. In some embodiments, the first convolution comprises a 3D convolution with a convolution kernel. In some embodiments, the convolution kernel has at least four dimensions. In some embodiments, the convolution kernel is m x m x m x n, wherein m is an integer in a range from 3 to 30 and n is an integer. In some embodiments, n is an integer from 1 to 16384. In some embodiments, the second convolution in operation (1) comprises n, 2*n, 4*n, and 8*n filters in the first, second, third, and fourth repetitions, respectively. In some embodiments, the second convolution in operation (4) comprises 2*n, 2*n, 4*n, and 8*n filters in the last, last minus one, last minus two, and last minus three repetitions, respectively. In some embodiments, n is in a range from 4 to 1024.
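The filter schedule n, 2n, 4n, 8n and the m x m x m x n kernel determine how large the network is. The sketch below counts trainable parameters of the down-sampling half of a 3D U-Net, reading the m x m x m x n kernel as n filters of size m³ (an interpretation) and assuming two convolutions with bias terms per repetition and a single input channel (e.g., one color channel); these layer details are assumptions typical of U-Nets, not taken from the text.

```python
def conv3d_params(m, c_in, c_out, bias=True):
    """Trainable weights in one m x m x m 3D convolution mapping c_in to c_out channels."""
    return m ** 3 * c_in * c_out + (c_out if bias else 0)

def encoder_filters(n, depth=4):
    """Filter counts n, 2n, 4n, 8n, ... for successive down-sampling repetitions."""
    return [n * 2 ** i for i in range(depth)]

def encoder_param_count(m, n, in_channels=1, depth=4):
    """Parameters of an encoder with two m^3 convolutions per repetition (assumed)."""
    total, c_in = 0, in_channels
    for c_out in encoder_filters(n, depth):
        total += conv3d_params(m, c_in, c_out) + conv3d_params(m, c_out, c_out)
        c_in = c_out
    return total

print(encoder_filters(16))             # [16, 32, 64, 128]
print(conv3d_params(3, 1, 16))         # 448
print(encoder_param_count(m=3, n=16))  # 878736
```

Since the count scales roughly with m³ and n², configurations toward the upper ends of the stated ranges (m up to 30, n up to 1024) would be orders of magnitude larger than the m = 3, n = 16 case shown, which matters when the network must fit on an FPGA or AI chip.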
[0192] In some embodiments, the operation (e) of performing a corresponding base calling for each of the determined polonies based on the second plurality of flow cell images comprises performing the corresponding base calling based on the second plurality of flow cell images and on a fourth plurality of flow cell images, wherein the fourth plurality of flow cell images are predicted using a second neural network based on a third plurality of flow cell images. In some embodiments, the third plurality of flow cell images are acquired from one or more color channels that are different from the single color channel, and the third plurality of flow cell images are of the first resolution. In some embodiments, the fourth plurality of flow cell images are of the second resolution.
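Once high-resolution images exist for every color channel (the second and fourth pluralities stacked together), a base can be called at each polony by comparing channel intensities. The sketch below assumes a four-channel scheme with one base per channel, which the text does not specify (other chemistries encode bases differently), and reports a hypothetical confidence equal to the brightest intensity divided by the sum of the brightest and runner-up intensities.

```python
import numpy as np

BASES = "ACGT"  # assumed mapping: channel i reports base BASES[i]

def call_bases(channel_images, polony_zyx):
    """channel_images: (4, Z, Y, X) high-resolution intensities for one cycle.
    polony_zyx: (N, 3) integer polony coordinates.
    Returns the called bases and a confidence in [0.5, 1] for non-negative data."""
    z, y, x = polony_zyx.T
    intensities = channel_images[:, z, y, x].T        # (N, 4)
    order = np.argsort(intensities, axis=1)
    rows = np.arange(len(intensities))
    best = intensities[rows, order[:, -1]]
    runner_up = intensities[rows, order[:, -2]]
    calls = "".join(BASES[i] for i in order[:, -1])
    confidence = best / np.maximum(best + runner_up, 1e-12)
    return calls, confidence

images = np.zeros((4, 1, 4, 4))
images[2, 0, 1, 1], images[0, 0, 1, 1] = 0.9, 0.1   # polony 1: strong G, weak A
images[3, 0, 2, 3], images[1, 0, 2, 3] = 0.8, 0.2   # polony 2: strong T, weak C
calls, conf = call_bases(images, np.array([[0, 1, 1], [0, 2, 3]]))
print(calls, conf)
```

A confidence near 0.5 flags a polony whose two brightest channels are almost equal, a natural input for the quality scores written alongside the base calls.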
[0193] In some embodiments, the first plurality of flow cell images are from one or more color channels. In some embodiments, the first plurality of flow cell images are of unbalanced nucleotide diversity. In some embodiments, the first plurality of flow cell images comprises an unbalanced diversity of nucleotide bases of A, G, C and T/U among concatemer molecules immobilized on the support in one or more flow cycles. In some embodiments, the first plurality of flow cell images comprises a balanced diversity of nucleotide bases of A, G, C and T/U among concatemer molecules immobilized on the support in one or more cycles. In some embodiments, two or more different concatemer molecules among the concatemer molecules have different insert sequences. In some embodiments, different insert sequences correspond to different target RNA molecules or target cDNA molecules. In some embodiments, each location of the determined polonies corresponds to a location of the concatemer molecules. In some embodiments, the first plurality of flow cell images comprises optical signals emitted from nucleotide reagents bound to a balanced diversity of nucleotide bases of A, G, C and T/U among the concatemer molecules immobilized on the support. In some embodiments, the first plurality of flow cell images comprises optical signals emitted from nucleotide reagents bound to an unbalanced diversity of nucleotide bases of A, G, C and T/U among the concatemer molecules immobilized on the support in the one or more subsequent cycles. In some embodiments, the unbalanced diversity of nucleotide bases of A, G, C and T/U among the concatemer molecules is such that the percentage of (1) the number of one or more types of nucleotide bases relative to (2) the total number of bases is less than 20%, 15%, 10%, or 5% in the one or more cycles.
In some embodiments, the balanced diversity of nucleotide bases of A, G, C and T/U among the plurality of concatemer molecules comprises: a percentage of (1) a number of each type of nucleotide bases to (2) a total number of bases in the one or more cycles is more than 10%, 15%, or 20%.
[0194] In some embodiments, the cellular sample comprises overloaded concatemer molecules with a spatial density in a range of 10² to 10¹⁵ per mm². In some embodiments, the cellular sample comprises overloaded concatemer molecules with a spatial density in a range of 10³ to 10¹⁰ per mm².
[0195] In some embodiments, the down-sampling factor is 2, 4, or 8. In some embodiments, the up-sampling factor is 2, 4, or 8. In some embodiments, the down-sampling factor is 2, 4, 8, 16, 32, or 64. In some embodiments, the up-sampling factor is 2, 4, 8, 16, 32, or 64.
[0196] In some embodiments, one or more of operations (a) to (k) are performed while a sequencing run is being performed. In some embodiments, the first plurality of flow cell images are acquired in sequencing cycles ranging from 1 to 500. In some embodiments, the one or more cycles comprise a current cycle N. In some embodiments, N is in a range from 1 to 500. In some embodiments, the one or more cycles comprise a single cycle ranging from 1 to 500. In some embodiments, the one or more cycles comprise multiple cycles ranging from 1 to 500. In some embodiments, one or more operations, e.g., operations (a) to (j), are performed while the sequencing reactions in cycles subsequent to the current cycle N are yet to be performed or are currently being performed. In some embodiments, the z-axis is orthogonal to the image planes of the flow cell images.
[0197] In some embodiments, performing the first convolution in one or more dimensions on the first plurality of flow cell images comprises: performing a first convolution in 3D on the first plurality of flow cell images, thereby generating a first convolution result. In some embodiments, performing a second convolution in one or more dimensions on the first convolution result, thereby generating a second convolution result, comprises: performing the second convolution in 3D on the first convolution result, thereby generating a second convolution result.
[0198] In some embodiments, performing the first convolution in one or more dimensions on the first plurality of flow cell images comprises: performing a first convolution in 2D on the first plurality of flow cell images, thereby generating a first convolution result. In some embodiments, performing a second convolution in one or more dimensions on the first convolution result, thereby generating a second convolution result comprises: performing the second convolution in 2D on the first convolution result, thereby generating a second convolution result.
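The difference between the 3D convolution of paragraph [0197] and the 2D convolution of paragraph [0198] can be illustrated with a minimal NumPy sketch. It assumes "valid" padding and uses the cross-correlation form that CNN layers conventionally call convolution; the function name `conv_valid`, the array sizes, and the uniform kernel weights are hypothetical placeholders, not trained network parameters.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def conv_valid(volume: np.ndarray, kernel: np.ndarray) -> np.ndarray:
    """'Valid' cross-correlation (what CNN layers call convolution).

    Works in as many dimensions as the kernel has: a (z, y, x) kernel gives a
    3D convolution over a z-stack, a (y, x) kernel gives a 2D convolution over
    a single image plane.
    """
    if volume.ndim != kernel.ndim:
        raise ValueError("volume and kernel must have the same number of dimensions")
    # windows has shape (output dims..., kernel dims...).
    windows = sliding_window_view(volume, kernel.shape)
    kernel_axes = tuple(range(volume.ndim, 2 * volume.ndim))
    # Multiply each window by the kernel and sum over the kernel axes.
    return np.sum(windows * kernel, axis=kernel_axes)


rng = np.random.default_rng(0)
z_stack = rng.random((5, 32, 32))          # hypothetical 5 z-planes of 32 x 32 pixels

k3 = np.full((3, 3, 3), 1.0 / 27.0)        # placeholder weights, not trained values
first_3d = conv_valid(z_stack, k3)         # first convolution result, shape (3, 30, 30)
second_3d = conv_valid(first_3d, k3)       # second convolution result, shape (1, 28, 28)

k2 = np.full((3, 3), 1.0 / 9.0)
first_2d = conv_valid(z_stack[2], k2)      # 2D variant on one plane, shape (30, 30)
second_2d = conv_valid(first_2d, k2)       # shape (28, 28)
```

The 3D kernel mixes information across neighbouring z levels, whereas the 2D kernel treats each image plane independently.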
[0199] In some embodiments, repetitively performing up-sampling operations comprises: (3) performing an up-sampling of the fourth convolution result by an up-sampling factor, thereby generating a first up-sampled result; (4) concatenating the first up-sampled result in a current up-sampling repetition with the first down-sampled result in a previous down-sampling repetition, wherein the first up-sampled result has the same size as the first down-sampled result in the previous down-sampling repetition; and (5) performing the second convolution in one or more dimensions on the first up-sampled result, thereby generating a fifth convolution result.
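The size-matching requirement in operation (4) can be seen in a short U-Net-style sketch. The down-sampling and up-sampling methods are not specified in this paragraph, so the sketch assumes 2x2 max pooling and nearest-neighbour up-sampling on (channels, height, width) arrays; all function names and array sizes are hypothetical.

```python
import numpy as np


def downsample_max(x: np.ndarray, factor: int) -> np.ndarray:
    """Max-pool a (channels, height, width) array by `factor` along height and width."""
    c, h, w = x.shape
    if h % factor or w % factor:
        raise ValueError("height and width must be divisible by the factor")
    return x.reshape(c, h // factor, factor, w // factor, factor).max(axis=(2, 4))


def upsample_nearest(x: np.ndarray, factor: int) -> np.ndarray:
    """Nearest-neighbour up-sampling of a (channels, height, width) array."""
    return np.repeat(np.repeat(x, factor, axis=1), factor, axis=2)


def up_and_concat(deep: np.ndarray, skip: np.ndarray, factor: int) -> np.ndarray:
    """Up-sample the deeper result and concatenate it, along the channel axis,
    with the down-sampled result saved from the matching down-sampling repetition."""
    up = upsample_nearest(deep, factor)
    if up.shape[1:] != skip.shape[1:]:
        raise ValueError("up-sampled result must have the same size as the skip result")
    return np.concatenate([up, skip], axis=0)


features = np.random.default_rng(1).random((8, 64, 64))   # hypothetical feature maps
down1 = downsample_max(features, 2)       # (8, 32, 32), kept for the skip connection
down2 = downsample_max(down1, 2)          # (8, 16, 16), deepest level in this sketch
merged = up_and_concat(down2, down1, 2)   # (16, 32, 32), input to the next convolution
```

Because the up-sampling factor equals the down-sampling factor, each up-sampled result lines up pixel for pixel with the result stored at the corresponding down-sampling step.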
[0200] In some embodiments, the different combinations of the first plurality of data processing engines are configured to perform operations further comprising: (a) receiving the second plurality of flow cell images from the integrated circuit; (b) determining polonies from the second plurality of flow cell images; (c) performing a corresponding base calling for each of the determined polonies based on the second plurality of flow cell images; and (d) forwarding the second plurality of flow cell images, the determined polonies, the corresponding base callings, or a combination thereof to the first processor or one or more hardware processors of the sequencing system.
[0201] In some embodiments, the one or more operations performed by the first reconfigurable logic device further comprise: forwarding the second plurality of flow cell images, the determined polonies, the corresponding base callings, or a combination thereof to the first processor or one or more hardware processors of the sequencing system. In some embodiments, the one or more operations performed by the integrated circuit further comprise forwarding the second plurality of flow cell images, the corresponding base callings, or both to the first reconfigurable logic device, the first processor, or one or more hardware processors of the sequencing system.
[0202] In some embodiments, the operations performed by the integrated circuit further comprise one or more of: determining polonies from the second plurality of flow cell images; performing a corresponding base call for each of the determined polonies based on the second plurality of flow cell images; and forwarding the second plurality of flow cell images, the corresponding base callings, or both to the first reconfigurable logic device, the first processor, or one or more hardware processors of the sequencing system.
[0203] In some embodiments, the operations performed by the first reconfigurable logic device or the integrated circuit further comprise: registering the second plurality of flow cell images to a common coordinate system.
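The registration method is not detailed here. As one hypothetical approach, the sketch below estimates a translation-only, whole-pixel offset between two flow cell images by phase correlation and then shifts the moving image into the reference frame. Phase correlation is a swapped-in, commonly used technique, and `estimate_shift` and `register` are illustrative names; a real pipeline may also need sub-pixel, rotational, or non-rigid correction.

```python
import numpy as np
from typing import Tuple


def estimate_shift(reference: np.ndarray, moving: np.ndarray) -> Tuple[int, int]:
    """Estimate the integer (dy, dx) shift that maps `moving` onto `reference`
    using phase correlation."""
    cross_power = np.fft.fft2(reference) * np.conj(np.fft.fft2(moving))
    cross_power /= np.abs(cross_power) + 1e-12      # keep only the phase
    corr = np.fft.ifft2(cross_power).real
    dy, dx = np.unravel_index(np.argmax(corr), corr.shape)
    h, w = corr.shape
    # Peaks past the midpoint wrap around and correspond to negative shifts.
    if dy > h // 2:
        dy -= h
    if dx > w // 2:
        dx -= w
    return int(dy), int(dx)


def register(reference: np.ndarray, moving: np.ndarray):
    """Shift `moving` into the coordinate system of `reference`."""
    dy, dx = estimate_shift(reference, moving)
    return np.roll(moving, shift=(dy, dx), axis=(0, 1)), (dy, dx)


rng = np.random.default_rng(3)
reference = rng.random((64, 64))                          # hypothetical image
moving = np.roll(reference, (3, -5), axis=(0, 1))         # same image, offset
registered, shift = register(reference, moving)           # shift == (-3, 5)
```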
[0204] In some embodiments, the operation (d) or (i) of determining polonies from the second plurality of flow cell images comprises: generating a 3D polony map comprising spatial locations of polonies based on the determined polonies. The operation of generating the 3D polony map may further comprise: deleting duplicate polonies from the determined polonies, wherein the duplicate polonies are out-of-focus. In some embodiments, the operation of determining polonies from the second plurality of flow cell images comprises: superimposing the second plurality of flow cell images with corresponding cell staining images; and generating the polony map by only including polonies that are within cell boundaries in the corresponding cell staining images. Exemplary embodiments of methods for generating a 3D polony map are disclosed in U.S. Patent Application No. 18/078,820 and PCT Application No. PCT/US2023/076125, which are incorporated by reference in their entireties.
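A rough sketch of the two filtering steps above follows. It assumes that each detection is an (x, y, z, peak intensity) row, that detections of the same polony at different z levels lie within a small x-y radius of each other, that the brightest detection is the in-focus one, and that a binary cell mask has already been derived from the cell staining image. The helper names and the radius are hypothetical, not the referenced applications' method.

```python
import numpy as np


def dedupe_polonies(polonies: np.ndarray, xy_radius: float) -> np.ndarray:
    """Drop out-of-focus duplicates of the same polony seen at several z levels.

    `polonies` is an (N, 4) array of (x, y, z, peak_intensity). A detection within
    `xy_radius` (in x-y) of a brighter, already kept detection is treated as a duplicate.
    """
    order = np.argsort(-polonies[:, 3])          # brightest first
    kept = []
    for i in order:
        xy = polonies[i, :2]
        if all(np.hypot(*(xy - polonies[j, :2])) > xy_radius for j in kept):
            kept.append(i)
    return polonies[sorted(kept)]


def within_cells(polonies: np.ndarray, cell_mask: np.ndarray) -> np.ndarray:
    """Keep polonies whose rounded (x, y) position falls on a nonzero mask pixel."""
    cols = np.clip(np.rint(polonies[:, 0]).astype(int), 0, cell_mask.shape[1] - 1)
    rows = np.clip(np.rint(polonies[:, 1]).astype(int), 0, cell_mask.shape[0] - 1)
    return polonies[cell_mask[rows, cols] > 0]


detections = np.array([
    [10.0, 10.0, 2.0, 900.0],   # in focus at z = 2
    [10.4, 9.8, 3.0, 300.0],    # same polony, blurred at z = 3
    [40.0, 12.0, 1.0, 500.0],   # a different polony
])
mask = np.zeros((64, 64), dtype=np.uint8)
mask[:, :32] = 1                              # hypothetical cell covering x < 32
polony_map = within_cells(dedupe_polonies(detections, xy_radius=1.5), mask)
```

Here the blurred duplicate is removed by the first step and the polony outside the hypothetical cell is removed by the second, leaving one entry in the map.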
[0205] In some embodiments, the method further comprises: providing the cellular sample harboring a plurality of RNA which comprises the first target RNA molecule and the second target RNA molecule. In some embodiments, the method further comprises: generating inside the cellular sample a plurality of cDNA molecules which include a first target cDNA molecule that corresponds to the first target RNA molecule and a second target cDNA molecule that corresponds to the second target RNA molecule. In some embodiments, the method further comprises: contacting the plurality of cDNA molecules in the cellular sample with a plurality of target-specific padlock probes which includes at least a first plurality of first target-specific padlock probes and a second plurality of second target-specific padlock probes.
[0206] In some embodiments, the method further comprises: contacting the plurality of RNA molecules in the cellular sample with a plurality of target-specific padlock probes which includes at least a first plurality of target-specific padlock probes and a second plurality of target-specific padlock probes.
[0207] In some embodiments, individual padlock probes in the first plurality of first target-specific padlock probes comprise: first and second terminal regions, wherein the first terminal region selectively hybridizes to a first region of the first target cDNA molecule or the first target RNA molecule, and the second terminal region selectively hybridizes to a second region of the first target cDNA molecule or the first target RNA molecule.
[0208] In some embodiments, contacting the plurality of RNA molecules in the cellular sample with the plurality of target-specific padlock probes comprises: hybridizing the first and second terminal regions of the first target-specific padlock probes to proximal positions on the first target cDNA molecule or the first target RNA molecule to form a circularized first target-specific padlock probe having a nick or gap between the hybridized first and second terminal regions. In some embodiments, the first target-specific padlock probe comprises a first target barcode sequence that corresponds to and uniquely identifies the first target cDNA sequence or the first target RNA sequence. In some embodiments, the first target-specific padlock probe comprises a first target barcode sequence that is located adjacent to one of the regions of the first target-specific padlock probe that selectively hybridizes to the first target cDNA molecule or the first target RNA sequence. In some embodiments, the first target-specific padlock probe comprises at least one universal adaptor sequence. In some embodiments, the first target-specific padlock probe comprises a universal primer binding site for a rolling circle amplification primer or a complementary sequence thereof. In some embodiments, the first target-specific padlock probe comprises a universal compaction oligonucleotide binding site or a complementary sequence thereof. In some embodiments, the method further comprises: closing the nick or gap in the at least first and second circularized target-specific padlock probes by conducting an enzymatic reaction, thereby generating at least a first covalently closed circular padlock probe and a second covalently closed circular padlock probe inside the cellular sample.
In some embodiments, the method further comprises: conducting a rolling circle amplification reaction inside the cellular sample using the first and second covalently closed circular padlock probes as template molecules, thereby generating a plurality of concatemer molecules including at least the first concatemer molecule that corresponds to the first target RNA molecule, and the second concatemer molecule that corresponds to the second target RNA molecule. In some embodiments, the first concatemer comprises: tandem repeat units of: a first target barcode sequence that uniquely identifies the first target RNA or the first target cDNA sequence, a first insert sequence that corresponds to the first target RNA or the first target cDNA, and a first sequencing primer binding site or a complementary sequence thereof. In some embodiments, the first concatemer further comprises: a universal binding site for an amplification primer or a complementary sequence thereof, and a universal binding site for a compaction oligonucleotide or a complementary sequence thereof. In some embodiments, the second concatemer comprises: tandem repeat units of: a second target barcode sequence that uniquely identifies the second target RNA or the second target cDNA sequence, a second insert sequence that corresponds to the second target RNA or the second target cDNA, and a second sequencing primer binding site or a complementary sequence thereof. In some embodiments, the second concatemer further comprises: a universal binding site for an amplification primer or a complementary sequence thereof, and a universal binding site for a compaction oligonucleotide or a complementary sequence thereof.
[0209] In some embodiments, conducting the one or more cycles of sequencing reactions comprises: contacting the plurality of concatemer molecules inside the cellular sample with (i) a plurality of universal sequencing primers, (ii) a plurality of sequencing polymerases, and (iii) a plurality of nucleotide reagents, under a condition suitable for hybridizing the plurality of universal sequencing primers to their respective universal sequencing primer binding sites on the concatemers. In some embodiments, the plurality of nucleotide reagents comprise: multivalent molecules, nucleotides, nucleotide analogs, or their combinations. In some embodiments, individual nucleotides or nucleotide analogs are detectably labeled or non-labeled. In some embodiments, the detectably labeled individual nucleotides or nucleotide analogs comprise a different detectable color label for each different type of nucleotide base of A, G, C, and T/U. In some embodiments, an individual multivalent molecule comprises a core attached to multiple nucleotide arms, and each arm of the individual multivalent molecule comprises the same type of nucleotide base.
[0210] In some embodiments, generating the first plurality of flow cell images comprises: in each cycle, imaging, by an optical system, optical color signals emitted from the nucleotide reagents that are bound to the plurality of concatemer molecules. In some embodiments, the first plurality of flow cell images comprises optical color signals emitted from the nucleotide reagents that are bound to the plurality of concatemer molecules. In some embodiments, conducting the one or more cycles of sequencing reactions comprises: sequencing only the first target barcode sequence region of the first concatemer, thereby generating the first sequencing read product. In some embodiments, conducting the one or more cycles of sequencing reactions comprises: sequencing the first target barcode sequence region and at least a portion of the first insert sequence of the first concatemer, thereby generating the first sequencing read product.
[0211] In some embodiments, conducting the one or more cycles of sequencing reactions comprises: sequencing only the second target barcode sequence region of the second concatemer, thereby generating the second sequencing read product. In some embodiments, conducting the one or more cycles of sequencing reactions comprises: sequencing the second target barcode sequence region and at least a portion of the second insert sequence of the second concatemer, thereby generating the second sequencing read product.
[0212] In some embodiments, the method further comprises: removing a first sequencing read product from the first concatemer molecule and retaining the first concatemer molecule in the cellular sample, and removing a second sequencing read product from the second concatemer molecule and retaining the second concatemer molecule in the cellular sample. In some embodiments, the method further comprises: reiteratively sequencing the plurality of concatemers by repeating the following operations at least once: generating the first plurality of flow cell images of a cellular sample immobilized on a support by conducting one or more cycles of sequencing reactions, thereby generating the first sequencing read product and the second sequencing read product, the cellular sample comprising a plurality of concatemer molecules therewithin, wherein a first concatemer molecule of the plurality of concatemer molecules corresponds to a first target RNA molecule of the cellular sample, and a second concatemer molecule of the plurality of concatemer molecules corresponds to a second target RNA molecule of the cellular sample, wherein the first plurality of flow cell images; and removing a first sequencing read product from the first concatemer molecule and retaining the first concatemer molecule in the cellular sample, and removing a second sequencing read product from the second concatemer molecule and retaining the second concatemer molecule in the cellular sample.
[0213] In some embodiments, the first sequencing read product comprises some or all of: a first target barcode sequence in one or more tandem units of the first concatemer molecule; a first insert sequence in one or more tandem units of the first concatemer molecule; or their combinations.
[0214] In some embodiments, the method further comprises: confirming presence of the first target RNA molecule, the second target RNA molecule, or both molecules in the cellular sample based on the performed base calling of the second plurality of flow cell images at the base calling locations in the base calling template.
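One way to turn base calls into a presence call is to decode each called barcode against a table of known target barcodes. The sketch below is a hypothetical illustration: the codebook, the one-mismatch Hamming tolerance, the rejection of ties, and the `min_reads` threshold are all assumptions, not parameters taken from this disclosure.

```python
from collections import Counter
from typing import Dict, Iterable, Optional


def hamming(a: str, b: str) -> int:
    """Number of mismatched positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def assign_target(called: str, codebook: Dict[str, str], max_mismatches: int = 1) -> Optional[str]:
    """Map a called barcode to a target, or None if no unique close match exists."""
    scored = sorted(
        (hamming(called, barcode), target)
        for barcode, target in codebook.items()
        if len(barcode) == len(called)
    )
    if not scored or scored[0][0] > max_mismatches:
        return None
    if len(scored) > 1 and scored[1][0] == scored[0][0]:
        return None  # two barcodes are equally close: ambiguous
    return scored[0][1]


def detected_targets(calls: Iterable[str], codebook: Dict[str, str], min_reads: int = 1) -> Dict[str, int]:
    """Count decoded reads per target and keep targets seen at least `min_reads` times."""
    counts = Counter(t for t in (assign_target(c, codebook) for c in calls) if t is not None)
    return {target: n for target, n in counts.items() if n >= min_reads}


codebook = {"ACGTAC": "target_RNA_1", "TTGCAA": "target_RNA_2"}   # hypothetical barcodes
calls = ["ACGTAC", "ACGTAA", "TTGCAA", "GGGGGG"]                    # hypothetical base calls
present = detected_targets(calls, codebook, min_reads=2)            # {"target_RNA_1": 2}
```

The read "ACGTAA" differs from the first barcode at one position and is still assigned to it, while "GGGGGG" is too far from both barcodes and is discarded.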
[0215] In some embodiments, the method further comprises: generating, by the sequencing system, the second plurality of flow cell images of the cellular sample immobilized on the support by conducting subsequent cycles of sequencing reactions after the one or more cycles. In some embodiments, generating the first plurality of flow cell images of the cellular sample immobilized on the support comprises: sequencing at least the first concatemer inside the cellular sample under a condition that inhibits sequencing the second concatemer. In some embodiments, sequencing at least the first concatemer inside the cellular sample comprises: generating a plurality of first sequencing read products, and wherein the sequences of the first sequencing read products are aligned with a first target reference sequence to confirm presence of the first target RNA in the cellular sample. In some embodiments, generating the first plurality of flow cell images of the cellular sample immobilized on the support comprises: sequencing at least the second concatemer inside the cellular sample under a condition that inhibits sequencing the first concatemer. In some embodiments, sequencing at least the second concatemer inside the cellular sample comprises: generating a plurality of second sequencing read products, and wherein sequences of the second sequencing read products are aligned with a second target reference sequence to confirm presence of the second target RNA in the cellular sample.
Predicting high resolution flow cell images
[0216] FIG. 5A shows a flow chart of a computer-implemented method 500 for predicting high resolution flow cell images, thereby improving the detectable polony density in the flow cell images. The method 500 can include some or all of the operations disclosed herein. The operations may be performed in the order described herein, but are not limited to that order.
[0217] The method 500 can be performed by one or more processors disclosed herein. In some embodiments, the processor can include one or more of: a computing system comprising a processing unit 118, a reconfigurable logic device 120, an integrated circuit that is not reconfigurable 120, or their combinations. For example, the processing unit can include a central processing unit (CPU). The reconfigurable logic device can include one or more FPGA devices. The integrated circuit can include a chip such as an AI chip or an ASIC chip. In some embodiments, the one or more processors can include the computer system 400 disclosed herein.
[0218] In some embodiments, some or all operations in methods 500, 600, 700, 2800, and 2900 can be performed by the reconfigurable logic device, e.g., the FPGA(s), and/or the integrated circuit, e.g., the AI chip. In embodiments in which some operations are performed by the reconfigurable logic device and/or integrated circuit, e.g., FPGA(s), the data produced by the reconfigurable logic device and/or integrated circuit, e.g., the FPGA(s), after performing one or more operations can be communicated to various hardware elements of the system 100, e.g., CPU(s) or GPU(s), so that subsequent operation(s) in methods 500, 600, 700, 2800, and 2900 can be performed by such hardware using the communicated data. Similarly, data can also be communicated in the opposite direction, from various hardware, e.g., CPU(s), to the reconfigurable logic device or the integrated circuit for processing. In some embodiments, all the operations in methods 500, 600, 700, 2800, and 2900 can be performed by CPU(s). Alternatively, the operations performed by CPU(s) can be performed by other processors, such as dedicated processors or GPU(s). In some embodiments, all the operations in methods 500, 600, 700, 2800, and 2900 can be performed by the reconfigurable logic device and/or the integrated circuit, e.g., FPGA(s) and/or AI chip(s).
[0219] In some embodiments, the sensor data acquired by the imager 116 may be directly communicated to the reconfigurable logic device and/or the integrated circuit, e.g., via DMA connections. In some embodiments, the sensor data acquired by the imager 116 may be directly communicated to the reconfigurable logic device and/or the integrated circuit without being routed first to a CPU, a GPU, or any other processing units before reaching the reconfigurable logic device and/or the integrated circuit.
[0220] In some embodiments, predicting high resolution flow cell images using method 500 herein with the reconfigurable logic device, e.g., the FPGA, and/or another integrated circuit, e.g., AI chips, may require at least 2x, 8x, 10x, 15x, 20x, 40x, 50x, or 100x less power than making the same prediction(s) using other computing hardware, including but not limited to CPUs or GPUs.
[0221] In some embodiments, the sequencing system herein further comprises: a power source that is configured to supply identical or different power levels to the reconfigurable logic device and the integrated circuit. In some embodiments, a maximum power output of the power source to the sequencing system in performing methods 500, 600, 700, 2800, and/or 2900 is less than 2000 Watts, 1000 Watts, 900 Watts, 800 Watts, 700 Watts, 650 Watts, 600 Watts, 550 Watts, 500 Watts, 400 Watts, 300 Watts, 200 Watts, or 100 Watts.
[0222] In some embodiments, the sequencing system herein comprises: a first reconfigurable logic device, e.g., an FPGA unit, comprising a first plurality of data processing engines configured to perform data processing in parallel; first reconfigurable routing channels, each connecting at least some of the first plurality of data processing engines; a neural network deployed at least partly on the first reconfigurable logic device; and a first processor to selectively activate or deactivate different combinations of the first plurality of data processing engines and the first reconfigurable routing channels to perform one or more operations in the methods herein (e.g., methods 500, 2800) to make predictions.
[0223] In some embodiments, the sequencing system herein comprises: a first reconfigurable logic device comprising a first plurality of data processing engines arranged in a first pipeline and configured to perform data processing in parallel with each other; an integrated circuit in data communication with the first reconfigurable logic device; a neural network deployed at least partly on the integrated circuit and/or the first reconfigurable logic device; and a first processor of the first reconfigurable logic device to selectively activate or deactivate different combinations of the first plurality of data processing engines to perform one or more operations in the methods herein (e.g., methods 500, 2800) to make predictions using the neural network.
[0224] In some embodiments, the first reconfigurable logic device and the integrated circuit are within the same physical housing as the other elements of the sequencing system, as shown in FIG. 1. In some embodiments, the first reconfigurable logic device and the integrated circuit are not physically external to the sequencing system 110 as shown in FIG. 1, e.g., not in the cloud 130.
[0225] The method 500 can comprise an operation 510 of (i) generating, by the sequencing system 110, a first plurality of flow cell images of sample(s) immobilized on a support by conducting one or more cycles of sequencing reactions. The sample(s) may comprise concatemer molecules therewithin. The sample(s) may include concatemer molecules from one or more different sample sources. The sample(s) may have a thickness along the z-axis, so that the first plurality of flow cell images may be acquired at a z-stack of different z-locations with a first resolution to cover the sample in 3D. Alternatively, the flow cell images may be acquired from a single z-location of a 2D or 3D sample.
[0226] The sample can be in situ. The sample can be a 3D sample. The sample can be a volumetric sample that may contain different biological information at the same x-y location but at different z levels. The sample can be a cellular sample including multiple cells, tissue, or their combination. The sample can be any biological sample that has a thickness greater than a predetermined threshold along the z axis. For example, the thickness can be greater than 2 µm, 3 µm, 4 µm, 5 µm, 10 µm, 20 µm, or more. The z axis is orthogonal to the image plane defined by the x and y axes. In some embodiments, the sample can be a traditional 2D sequencing sample.
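The z-stack acquisition of operation 510 can be planned with simple arithmetic. The sketch below is an illustration only: it assumes evenly spaced focal planes that include both faces of the sample, a fixed step size (for example, chosen to match the depth of field), and a 2 µm volumetric threshold taken from the first value listed above. The function names and the 1 µm step are hypothetical.

```python
import math
from typing import List


def z_stack_positions(thickness_um: float, step_um: float, z_start_um: float = 0.0) -> List[float]:
    """Evenly spaced z positions (µm) spanning a sample of the given thickness,
    including both the bottom and the top face."""
    if thickness_um <= 0 or step_um <= 0:
        raise ValueError("thickness and step must be positive")
    # Round before ceil so float noise (e.g. 1.1 / 0.1 = 11.000000000000002)
    # does not add a spurious extra step.
    n_steps = math.ceil(round(thickness_um / step_um, 9))
    return [z_start_um + i * step_um for i in range(n_steps + 1)]


def is_volumetric(thickness_um: float, threshold_um: float = 2.0) -> bool:
    """Treat the sample as 3D when its thickness exceeds the threshold."""
    return thickness_um > threshold_um


planes = z_stack_positions(thickness_um=20.0, step_um=1.0)   # 21 planes: 0, 1, ..., 20 µm
```

A finer step gives more planes and longer acquisition per cycle; a step larger than the depth of field would leave polonies between planes out of focus in every image.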
[0227] In some embodiments, such a computer-implemented method comprises an operation (i) of generating, by a sequencing system, a first plurality of flow cell images of a sample immobilized on a support by conducting one or more cycles of sequencing reactions, wherein the first plurality of flow cell images are acquired with a first resolution. This operation is similar to operation 510 in FIG. 5A, except that the sample may be a 2D or 3D sample. In embodiments where the sample is 3D, the sample comprises concatemer molecules therewithin. In embodiments where the sample is 2D, the sample comprises template molecules therewithin.
[0228] The flow cell images can be acquired using the optical system of the imager 116 disclosed herein, from 1, 2, 3, 4, or more color channels. Each flow cell image can include at least a portion of one or more tiles (e.g., imaging areas). Each tile can be divided into multiple subtiles. Each tile or subtile can include a plurality of polonies or clusters. Each subtile can include multiple regions, with each region including a number of polonies or clusters. The flow cell image as disclosed herein can be an image that is acquired from a flow cell 112 as shown in FIG. 1 or a flow cell 2712 as shown in FIG. 27. In some embodiments, the flow cell images are acquired from a single color channel, and subsequent prediction is performed using a pretrained neural network corresponding to that single channel. In some embodiments, the flow cell images are acquired from 2, 3, 4, or more color channels, and subsequent prediction is performed using a pretrained neural network corresponding to the multiple color channels.
[0229] In some embodiments, a flow cell image herein can be an image of one or more tiles, one or more subtiles, one or more segmented regions within tile(s) or subtile(s), or their combinations. Each flow cell image can comprise a field of view (FOV). The FOV can be orthogonal to the z axis. The FOV can be within the x-y plane. The FOV of different flow cell images at different z levels can be identical within the x-y plane. The FOV of different flow cell images at different z levels can have at least an overlapping portion within the x-y plane. The image resolution of different flow cell images at different z levels can be about identical or exactly identical. In some embodiments, the image resolution of different flow cell images at different z levels is different. FIGS. 3A and 3D show two exemplary flow cell images acquired at two different z levels along the z axis of the same 3D sample within the same sequencing cycle. The FOV can be in 3D and be of various sizes to cover the volumetric sample to be imaged. The FOV along the x, y, and/or z direction can be in a range from 10 µm to 5 mm. The FOV along the x, y, and/or z direction can be in a range from about 0.1 µm to about 2 mm. The FOV along the x, y, and/or z direction can be in a range from 0.5 µm to 1 mm. For example, the FOV can be about 0.5 mm by 0.5 mm by 20 µm for certain cellular samples along the x, y, and z directions, respectively.
[0230] The flow cell images herein may be of various sizes; the number of pixels along the x, y, and/or z axis may be any integer greater than 64 or 128. In some embodiments, the number of pixels along the x, y, and/or z axis may be in a range from 2 to 65536. A single flow cell image can be separated into different numbers of regions, for example, 4, 8, 16, or even more regions, and each region may have a size of 256 by 256 by 1, 512 by 512 by 3, or other sizes. In some embodiments, the number of pixels along the x, y, and/or z direction may be adjusted to maintain a particular spatial resolution in a given FOV. For example, with a spatial resolution of 0.2 µm, covering a FOV of 0.8 mm requires 0.8 mm / 0.2 µm = 4000 pixels.
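The pixel-count arithmetic and the splitting of an image into regions can be sketched as follows. The helper names are hypothetical, and the sketch assumes square regions taken in row-major order, with smaller edge regions when the image size is not a multiple of the region size.

```python
import math
from typing import List

import numpy as np


def pixels_for_fov(fov_um: float, pixel_size_um: float) -> int:
    """Pixels needed along one axis to cover `fov_um` at `pixel_size_um` per pixel."""
    # Round before ceil to absorb float noise such as 800 / 0.2 = 3999.9999999999995.
    return math.ceil(round(fov_um / pixel_size_um, 9))


def split_into_regions(image: np.ndarray, region: int) -> List[np.ndarray]:
    """Split an (H, W) or (H, W, Z) image into region x region tiles in row-major order.
    Tiles on the right and bottom edges are smaller if H or W is not a multiple of `region`."""
    h, w = image.shape[:2]
    return [image[r:r + region, c:c + region]
            for r in range(0, h, region)
            for c in range(0, w, region)]


n_pixels = pixels_for_fov(800.0, 0.2)                             # 0.8 mm at 0.2 µm -> 4000
regions = split_into_regions(np.zeros((1024, 1024, 3)), 256)      # 16 regions of 256 x 256 x 3
```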
[0231] Each flow cell image at a specific z level may include intensities generated by polonies or clusters at the corresponding z level. As shown in FIGS. 3A and 3D, signals from polonies or clusters appear as small bright spots within the images. Each bright spot can be of various sizes, e.g., less than a pixel, about a pixel, or about 2, 3, 4, 5, or more pixels. In some embodiments, each signal spot of the polonies or clusters can be any number of pixels in the range from 0.01 pixel to about 100 pixels. In some embodiments, each signal spot of the polonies or clusters can be any number of pixels in the range from 0.1 pixel to about 16 pixels.
[0232] Each flow cell image can also include intensities generated by the cell and its structural elements. Such structural elements can be background objects or components, e.g., as in FIG. 3A. Each flow cell image can also include noise and/or artifacts that do not originate from the polonies or cellular structures.
[0233] In some embodiments, when the depth of field of the optical system spans a range along the z axis, e.g., 0.1 µm, 0.2 µm, 0.3 µm, 0.5 µm, 0.6 µm, 0.8 µm, 1 µm, 2 µm, 3 µm, 4 µm, or 5 µm, polonies or clusters that are within the depth of field can appear in focus or about in focus in the flow cell image. A flow cell image at a specific z level can also include signals from polonies or clusters that are outside the focus range of the image; such polonies or clusters are out of focus. As shown in FIG. 3A, larger and blurrier signal spots represent out-of-focus polonies or clusters. Some of the out-of-focus polonies or clusters are circled in FIG. 3A.
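Whether a polony appears in focus in a given z-level image can be approximated with a simple geometric test. The sketch below assumes the depth of field is centred on the imaged focal plane, so a polony is treated as in focus when its z distance to that plane is at most half the depth of field. The function name and the example values are hypothetical.

```python
import numpy as np


def in_focus(polony_z_um, image_z_um: float, depth_of_field_um: float) -> np.ndarray:
    """True where a polony lies within the depth of field around the imaged z level.

    Assumes the depth of field is centred on the focal plane."""
    polony_z_um = np.asarray(polony_z_um, dtype=float)
    return np.abs(polony_z_um - image_z_um) <= depth_of_field_um / 2.0


flags = in_focus([4.8, 5.4, 6.2], image_z_um=5.0, depth_of_field_um=1.0)   # True, True, False
```

Polonies flagged False at one z level would contribute the larger, blurred spots described above and would typically be in focus in an image acquired at another z level.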
[0234] Each flow cell image at a specific z level can also include noise caused by the optical system and/or undesired signal from the sample. The undesired signal can come from components of the sample such as the membrane, cytosol, and mitochondria. Such background objects can be any objects that are relatively larger in size than the polonies or clusters. As shown in FIG. 3A, there is a blurry cellular contour (at the arrows) in the flow cell image, and most of the signal spots are contained within the blurry contour. In some embodiments, background objects can include any objects within the 3D sample that are not polonies or clusters.
[0235] In some embodiments, the flow cell images are from multiple color channels. In some embodiments, the flow cell images are of unbalanced nucleotide diversity. In some embodiments, the flow cell images comprise an unbalanced diversity of nucleotide bases of A, G, C and T/U among concatemer molecules immobilized on the support in one or more sequencing cycles. In some embodiments, the flow cell images comprise a balanced diversity of nucleotide bases of A, G, C and T/U among concatemer molecules immobilized on the support in one or more cycles. In some embodiments, two or more different concatemer molecules among the concatemer molecules have different insert sequences. In some embodiments, different insert sequences correspond to different target RNA molecules or target cDNA molecules. In some embodiments, each location of the determined polonies corresponds to a location of the concatemer molecules. In some embodiments, the flow cell images comprise optical signals emitted from nucleotide reagents bound to a balanced diversity of nucleotide bases of A, G, C and T/U among the concatemer molecules immobilized on the support. In some embodiments, the flow cell images comprise optical signals emitted from nucleotide reagents bound to an unbalanced diversity of nucleotide bases of A, G, C and T/U among the concatemer molecules immobilized on the support in the one or more subsequent cycles. In some embodiments, the unbalanced diversity of nucleotide bases of A, G, C and T/U among the concatemer molecules comprises a percentage of (1) the number of one or more types of nucleotide bases to (2) the total number of bases that is less than 20%, 15%, 10%, or 5% in the one or more sequencing cycles. In some embodiments, the balanced diversity of nucleotide bases of A, G, C and T/U among the plurality of concatemer molecules comprises a percentage of (1) the number of each type of nucleotide base to (2) the total number of bases in the one or more cycles that is more than 10%, 15%, or 20%. As an example, base calls from the polonies include 4 different bases, and the percentage of polonies for each of the 4 different bases can be greater than about 10%, so that the data are of balanced diversity. As another example, bases called from the plurality of polonies include 4 or fewer different bases, and the percentage of polonies for one or more bases can be less than about 10%; such data can be considered to be of unbalanced diversity. In some embodiments, bases called from the plurality of polonies include 4 or fewer different bases, and the percentage of polonies for some of the bases can be less than about 5%, about 2%, or even about 1%; such data can be considered to be of unbalanced diversity. As yet another example, the unbalanced diversity data include bases A, T, C, and G in the plurality of polonies, with percentages of the total base calls of about 1%, about 2%, about 1%, and about 95%, respectively. In addition to base biases affecting diversity, plexity can also be a factor: when plexity is lower than a threshold, e.g., 8 or 16, the signal is of unbalanced diversity.
[0236] The method 500 is configured to predict high resolution flow cell images even if the polonies in the acquired flow cell images are of unbalanced diversity in one or more sequencing cycles.
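The balanced/unbalanced distinction above can be illustrated with a short Python sketch. It assumes the base calls for one cycle are available as single-letter codes (T standing in for T/U) and uses the 10% per-base threshold and the example plexity cut-off of 8 from the preceding paragraph; the function names are hypothetical and this is not presented as the method used in this application.

```python
from collections import Counter

BASES = "ACGT"  # T stands in for T/U


def base_fractions(base_calls):
    """Fraction of each base among the A/C/G/T calls of one sequencing cycle."""
    counts = Counter(base_calls)
    total = sum(counts[b] for b in BASES)
    if total == 0:
        raise ValueError("no A/C/G/T calls to evaluate")
    return {b: counts[b] / total for b in BASES}


def is_balanced(base_calls, min_fraction=0.10, plexity=None, min_plexity=8):
    """True only if every base exceeds min_fraction and, when given,
    plexity is not below min_plexity (both thresholds are assumptions)."""
    if plexity is not None and plexity < min_plexity:
        return False
    return all(f > min_fraction for f in base_fractions(base_calls).values())
```

For example, a cycle with 25 calls of each base is treated as balanced, while the roughly 1% / 2% / 1% / 95% split given above for A, T, C, and G is treated as unbalanced because three bases fall below the 10% threshold.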
[0237] In some embodiments, the method 500 comprises an operation 520 of (ii) providing, by a processor, the first plurality of flow cell images as an input to a neural network (e.g., a convolutional neural network (CNN)), wherein the neural network is pre-trained on a training data set of training flow cell images using a training method 600 herein. The neural network is pre-trained so that the values of its parameters have been optimized through training. The neural network may be retrained when needed, for example, for predicting flow cell images from different cellular samples.
[0238] In some embodiments, the computer-implemented method 500 may be used to predict high resolution flow cell images that have a higher resolution (e.g., 2x, 4x, or 6x along at least one spatial dimension) than the first plurality of flow cell images acquired by the imager 116. In some embodiments, the high resolution flow cell images may be post-image-processing versions of the first plurality of flow cell images, e.g., produced by the image processing part 3120 of the neural network in FIG. 31. Image processing herein may include various image processing steps including, but not limited to: background removal, background reduction, artifact removal, artifact suppression, adjusting signal to noise ratio, adjusting contrast to noise ratio, intensity normalization, intensity offset correction, noise reduction, color correction, phasing or dephasing correction, image registration, and deconvolution.
[0239] In some embodiments, the neural network in operation 520 is a first neural network that can be trained using method 700 disclosed herein.
[0240] In some embodiments, the method 500 comprises an operation 520’ in place of the operation 520. In some embodiments, the operation 520’ includes (ii) providing, by a processor or a first reconfigurable logic device, the first plurality of flow cell images as an input to a neural network, wherein the neural network is pre-trained using a training data set of training flow cell images and reference base calls of the training data set.
[0241] In some embodiments, the operation 520’ is similar to the operation 520, e.g., as shown in FIG. 5A, except that it uses a different neural network. In some embodiments, the operation 520’ may replace the operation 520 in method 500. In some embodiments, the neural network in operation 520’ is different from the neural network in operation 520.
[0242] In some embodiments, the neural network in operation 520 is a first neural network, and the neural network in operation 520’ is a second neural network that is different from the first neural network. The differences between the first and second neural networks may include, but are not limited to: different types of neural networks, different parameter values, different numbers of parameters, different numbers of convolutional layers, different numbers of layers, or a combination thereof.
[0243] In some embodiments, the second neural network in operation 520’ is a different neural network that is pretrained using the same training data set of flow cell images as that used for training the first neural network in operation 520.
[0244] In some embodiments, the second neural network in operation 520’ is a different neural network that is pretrained using a training data set of flow cell images different from that used for training the first neural network in operation 520. In some embodiments, the second neural network in operation 520’ is pre-trained using reference base calls of the training data set.
[0245] In some embodiments, the first neural network in operation 520 is pretrained using reference images or reference intensities as ground truths, e.g., reference high resolution images or reference intensities in high resolution images, and the second neural network in operation 520’ is pretrained using reference base calls of the training flow cell images in the training data sets as ground truths.
[0246] In some embodiments, the reference base calls may be generated using various base calling methods, including the methods disclosed herein in relation to training methods for predicting base calls. In some embodiments, the reference base calls may be generated using methods that do not use a neural network. Exemplary embodiments of generating base calls from flow cell images are disclosed in U.S. Patent Application No. 18/078,820 and PCT Application No. PCT/US2023/076125, which are incorporated by reference in their entireties.
[0247] In some embodiments, the second neural network in operation 520’ may be trained using a training method similar to method 700 in FIG. 5E. In such embodiments, the reference intensities are not used in operations, e.g., operations 725, 730, and 755. Instead, reference base calls are used in such operations, e.g., operations 725’, 730’, and 755.
[0248] In some embodiments, the loss function for training the second neural network in operation 520’ may differ from the loss function used in training the first neural network in operation 520. In some embodiments, various loss functions may be used for training the second neural network in operation 520’. In some embodiments, the second neural network is pre-trained using one or more loss functions based on comparing training base calls of the training flow cell images to the reference base calls of the training flow cell images. In some embodiments, the loss function for training the second neural network in operation 520’ may be based on a comparison of training outputs, e.g., base calls, to the reference base calls. In some embodiments, training of the second neural network in operation 520’ may be completed when the loss function satisfies a predetermined criterion. The predetermined criterion can be customized to include various aspects of the training outputs. In some embodiments, the predetermined criterion is determined based on the comparison of training base calls to reference base calls. In some embodiments, the predetermined criterion is based on the correctness of the training base calls in comparison to the reference base calls. In some embodiments, the predetermined criterion is based at least on the training time that has elapsed.
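A loss based on comparing training base calls to reference base calls, together with a stopping criterion combining call correctness and elapsed training time, might look like the following Python sketch. Cross-entropy is assumed here as one possible loss function (the text does not name one), and the thresholds and function names are hypothetical.

```python
import math
import time

BASES = "ACGT"


def base_call_loss(probs, reference_calls):
    """Mean cross-entropy of the reference base under each polony's predicted distribution."""
    eps = 1e-12  # guards against log(0)
    total = 0.0
    for p, ref in zip(probs, reference_calls):
        total -= math.log(max(p[BASES.index(ref)], eps))
    return total / len(reference_calls)


def call_accuracy(probs, reference_calls):
    """Fraction of polonies whose highest-probability base matches the reference call."""
    hits = 0
    for p, ref in zip(probs, reference_calls):
        best = max(range(len(BASES)), key=lambda i: p[i])
        hits += BASES[best] == ref
    return hits / len(reference_calls)


def training_done(loss, accuracy, started_at,
                  max_loss=0.05, min_accuracy=0.99, max_seconds=3600.0):
    """Hypothetical stopping rule: stop when the training calls match the references
    closely enough, or when the training-time budget has been spent."""
    converged = loss <= max_loss and accuracy >= min_accuracy
    out_of_time = time.monotonic() - started_at >= max_seconds
    return converged or out_of_time
```

Keeping the loss (a smooth quantity suitable for gradients) separate from the accuracy check (a discrete measure of correctness) mirrors the distinction above between a loss based on comparison with reference base calls and a criterion based on the correctness of those calls.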
[0249] FIG. 31 is a block diagram showing an exemplary embodiment of the first and second neural networks and the method for training such neural networks.
[0250] It is worth noting that although the neural network 3110 is disclosed in this exemplary embodiment, the neural network 3110 herein may be any artificial intelligence-based or machine learning-based model that can include an image processing part 3120 and a base calling part 3130. Similarly, the image processing part 3120 and the base calling part 3130 may each be any artificial intelligence-based or machine learning-based model that achieves functions similar to those of its neural network-based equivalent.
[0251] In some embodiments, the neural network 3110 may be the first neural network in operation 520 or the second neural network in operation 520’. The neural network 3110 may be trained, for example, using method 700. The neural network may include two separate parts: the first part is the image processing part 3120, and the second part is the base calling part 3130.
[0252] The image processing part 3120 is configured to perform one or more image processing steps disclosed herein, e.g., in relation to method 500, on the flow cell images herein, e.g., the first or second plurality of flow cell images. The one or more image processing steps may include but are not limited to: background removal, background reduction, artifact removal, artifact suppression, adjusting signal to noise ratio, adjusting contrast to noise ratio, intensity normalization, intensity offset correction, noise reduction, color correction, phasing or dephasing correction, image registration, intensity extraction, and deconvolution.
[0253] In some embodiments, the base calling part 3130 is configured to perform base calling using the output images 3150 from the image processing part 3120 of the neural network. The base calling part 3130 may be configured to perform some image processing steps including but not limited to intensity extraction, color correction, and/or phasing or dephasing correction in embodiments where such image processing steps are not performed in the image processing part 3120 of the neural network.
[0254] The first and second parts 3120, 3130 of the neural network may each include one or more structural elements of the neural network, such as a convolutional layer. In some embodiments, the first or second part 3120, 3130 may include one or more embedding layers of the neural network. The first part 3120 may include at least part of an encoder of the neural network, and the second part 3130 may include at least part of a decoder of the neural network. In some embodiments, the second part of the neural network may include at least part of: a convolutional layer, a pooling layer, a fully connected layer, a SoftMax layer, an input layer, an output layer, an embedding layer, an encoder, and a decoder of the neural network.
[0255] In some embodiments, the base calling part 3130 may lack any structural element of a neural network, e.g., a convolutional layer or a pooling layer. In some embodiments, the base calling part 3130 may lack any artificial intelligence-based algorithm. In some embodiments, the base calling part 3130 may lack any convolutional layers of the neural network. In some embodiments, the base calling part 3130 may lack any part of an embedding layer or a decoder of the neural network. In some embodiments, the base calling part 3130 may lack any part of: a convolutional layer, a pooling layer, a fully connected layer, a SoftMax layer, an input layer, an output layer, an embedding layer, an encoder, and a decoder of the neural network. In some embodiments, the base calling part 3130 may comprise only non-neural network base calling algorithm(s). As an example, the neural network 3110 that generates the output images 3150 is the second neural network in operation 520’. As another example, the neural network 3110 that generates the output base calls 3160 is the neural network disclosed in relation to method 2800.
[0256] In embodiments where the base calling part 3130 lacks any elements of a neural network or an artificial intelligence-based algorithm, training of the neural network may include training one or more parameters of the base calling part 3130. For example, the one or more parameters may include a feature size. In such embodiments, during training, the back propagation for finding adjustments to the values of the parameters of the neural network 3110, e.g., originating from the loss function, the references, and the output base calls, passes through the base calling part 3130, adjusting its parameters, and then through the image processing part 3120, also adjusting its parameters, as shown by the solid gray line with arrow in FIG. 31.
[0257] In some embodiments where the base calling part 3130 lacks any elements of a neural network or an artificial intelligence-based algorithm, training of the neural network does not include training any parameters of the base calling part 3130. During training, the back propagation for finding adjustments to the values of the parameters of the neural network 3110, e.g., originating from the loss function, the references, and the output base calls, may pass through the base calling part 3130 without adjusting any of its parameters and then reach the image processing part 3120, whose parameters are adjusted, as shown by the solid gray line with arrow in FIG. 31.
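The idea of back-propagating through a parameter-free base calling part into a trainable image processing part can be illustrated with a deliberately tiny Python sketch. Every modeling choice here is an assumption for illustration only: the "image processing part" is a per-channel gain and offset applied to four raw channel intensities of one polony, the "base calling part" is a fixed softmax with no parameters, the loss is cross-entropy against the reference base, and gradients are written out by hand.

```python
import math


def softmax(values):
    m = max(values)
    exps = [math.exp(v - m) for v in values]
    s = sum(exps)
    return [e / s for e in exps]


def image_processing(x, gains, offsets):
    """Trainable part (hypothetical): per-channel gain and offset on raw intensities."""
    return [g * v + b for v, g, b in zip(x, gains, offsets)]


def base_calling(y):
    """Fixed, parameter-free part: softmax over the 4 channel intensities (A, C, G, T)."""
    return softmax(y)


def mean_loss(data, gains, offsets):
    """Mean cross-entropy of the reference base index t for each polony."""
    return sum(-math.log(base_calling(image_processing(x, gains, offsets))[t])
               for x, t in data) / len(data)


def train(data, steps=200, lr=0.5):
    gains, offsets = [1.0] * 4, [0.0] * 4
    n = len(data)
    for _ in range(steps):
        grad_g, grad_b = [0.0] * 4, [0.0] * 4
        for x, t in data:
            p = base_calling(image_processing(x, gains, offsets))
            for c in range(4):
                # dLoss/dy_c for softmax + cross-entropy; the gradient passes through the
                # fixed base caller unchanged in form, since it has nothing to update.
                d = p[c] - (1.0 if c == t else 0.0)
                grad_g[c] += d * x[c] / n
                grad_b[c] += d / n
        # Only the image-processing parameters are updated.
        gains = [g - lr * dg for g, dg in zip(gains, grad_g)]
        offsets = [b - lr * db for b, db in zip(offsets, grad_b)]
    return gains, offsets


# Toy data: raw 4-channel intensities with a dim A channel, labeled with the reference base index.
raw = {0: [0.3, 0.2, 0.2, 0.2], 1: [0.1, 1.0, 0.2, 0.2],
       2: [0.1, 0.2, 1.0, 0.2], 3: [0.1, 0.2, 0.2, 1.0]}
data = [(x, t) for t, x in raw.items()]
```

In a real network the same effect is usually obtained with automatic differentiation, by making the fixed base caller differentiable and excluding it from the optimizer's parameter list; the hand-written gradients here just make the flow of the update explicit.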
[0258] In embodiments where the base calling part 3130 comprises at least some elements of a neural network or an artificial intelligence-based algorithm, training of the neural network may include training both the base calling part 3130 and the image processing part 3120, including adjusting parameters of both parts, as shown by the solid gray line with arrow in FIG. 31.
[0259] In other embodiments where the base calling part 3130 lacks any elements of a neural network or an artificial intelligence-based algorithm, training of the neural network may include only training of the image processing part 3120 and not training of any parameters in the base calling part 3130. In such embodiments, during training, the back propagation for updating, and thereby training, the parameters of the neural network 3110, e.g., originating from the loss function, the references, and the output base calls, goes directly to the image processing part 3120 without going through the base calling part 3130, as shown by the dotted gray line with arrow in FIG. 31. In other words, in such embodiments, the base calling part 3130 is not trained, and its parameters are fixed. In such embodiments, the loss function may be based on the output of the base calling part 3130, and the value of the loss function may be determined based on that output.
[0260] After the neural network is trained, the input images 3140 may go through the image processing part 3120 to generate the output images 3150. In some embodiments, the output images 3150 comprise the second plurality of flow cell images, e.g., as disclosed herein in relation to method 500. In some embodiments, the output images 3150 comprise high resolution post-processing images corresponding to the input images 3140. The output images may go through the base calling part 3130 to generate the base calls 3160. Alternatively, the output images 3150 may go through various base calling algorithms, e.g., traditional non-neural network based base calling algorithms, rather than the base calling part 3130, to generate the base calls. In such embodiments, after the model is trained, only the image processing part 3120 of the neural network, and not the base calling part 3130, is used for making predictions. In such embodiments, the neural network 3110 advantageously reduces the time, computational burden, and power required to make predictions compared with existing neural networks that predict base calls.
[0261] In some embodiments, the input images 3140 comprise raw flow cell images acquired at the imager 116. In some embodiments, the input images 3140 comprise the first plurality of flow cell images disclosed herein. In some embodiments, the input images 3140 may be from multiple color channels and multiple sequencing cycles. In some embodiments, the input images 3140 may be from multiple color channels and a single sequencing cycle. In some embodiments, the input images 3140 may be from a single color channel and multiple sequencing cycles. In some embodiments, the input images 3140 may be from a single z level or multiple z levels.
[0262] Continuing to refer to FIG. 31, in embodiments of training the neural network 3110, e.g., using method 700, references or ground truths 3180 can be compared with the output base calls 3160, and the value of the loss function 3170 can be calculated based on this comparison. The value of the loss function can then be back propagated into the neural network 3110 during training to adjust the values of its parameters, e.g., using gradients. In some embodiments, the adjusted parameters may include parameters of both the base calling part 3130 and the image processing part 3120. In other words, both parts 3120, 3130 are trained during training of the neural network, e.g., using the training methods 700, 2900 herein. In some embodiments, the value of the loss function can be back propagated into the neural network 3110 to adjust the values of the parameters of only the image processing part 3120, but not the base calling part 3130. In other words, in such embodiments, only the image processing part 3120 is trained during training of the neural network, e.g., using the training methods 700, 2900 herein. In some embodiments, the neural network 3110 that is trained only on the image processing part 3120 is the second neural network in operation 520’. In some embodiments, the neural network 3110 that is trained on both the image processing part 3120 and the base calling part 3130 is the second neural network in operation 520’.
[0263] In some embodiments, the second neural network in operation 520’ comprises a convolutional neural network. In some embodiments, the second neural network in operation 520’ comprises a recurrent neural network. In some embodiments, the second neural network in operation 520’ comprises a U-Net, a residual U-Net, a ResNet (residual neural network), and/or an LSTM (long short-term memory) neural network.
[0264] In some embodiments, the training flow cell images are acquired only from the same color channel. In some embodiments, each of the training flow cell images comprises flow cell images of the same field of view from a plurality of sequencing cycles stacked along a time dimension. The plurality of sequencing cycles may be from the same sequencing run. The plurality of sequencing cycles may be consecutive sequencing cycles in the sequencing run. In some embodiments, each of the training flow cell images comprises flow cell images of the same field of view from one or more sequencing cycles. In some embodiments, each of the training flow cell images comprises flow cell images of the sample at one or more z-levels. In some embodiments, the training flow cell images comprise flow cell images at multiple different fields of view of the same sample. In some embodiments, the training flow cell images comprise flow cell images at multiple different fields of view of one or more samples. The different fields of view may be at the same x, y, or z location of the same sample. For example, the different fields of view may be different subtiles of the sample at the same z location but at different x, y locations. For example, the multiple different fields of view may be adjacent to each other, with no spatial overlap or with at least some spatial overlap with other fields of view. In some embodiments, each of the training flow cell images comprises flow cell images of different fields of view (e.g., adjacent FOVs of the same sample) from a plurality of sequencing cycles stacked along one or two spatial dimensions.
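The two stacking arrangements described above (cycles of one field of view stacked along a time dimension, and adjacent fields of view placed side by side along a spatial dimension) can be sketched with NumPy. The array layout `(cycles, H, W)` and the function names are assumptions for illustration; the text does not specify a storage format.

```python
import numpy as np


def stack_cycles(images_by_cycle):
    """Stack same-FOV, same-channel 2D images from consecutive cycles along a leading time axis."""
    shapes = {img.shape for img in images_by_cycle}
    if len(shapes) != 1:
        raise ValueError(f"all cycles must share one image shape, got {shapes}")
    return np.stack(images_by_cycle, axis=0)  # (cycles, H, W)


def join_adjacent_fovs(fov_stacks, axis=-1):
    """Place cycle stacks of adjacent FOVs side by side along a spatial axis (-1 = x, -2 = y)."""
    return np.concatenate(fov_stacks, axis=axis)
```

For instance, five 8 x 8 images from consecutive cycles become one `(5, 8, 8)` training sample, and two such stacks from horizontally adjacent FOVs become a `(5, 8, 16)` sample.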
[0265] In some embodiments, the training dataset for the neural network, e.g., the first or second neural network, may include only flow cell images of the same color channel, so that the neural network is not trained on variations across different color channels. Such variations may be caused by differences in optical elements in response to different colored light signals (e.g., emission filter, illumination, etc.), differences in fluorescent dyes, or other factors of the sequencing system. In some embodiments, such variations may cause, among other things, different background levels, different signal to noise ratios, different artifacts in the field of view, different full width at half maximum (FWHM) of emission light signals, and different point spread functions (PSF). Training the neural network using flow cell images of the same color channel may advantageously avoid fitting to variations across different color channels, may simplify and speed up training of the neural network, and may avoid possible errors in training.
[0266] In some embodiments, a different neural network is trained with flow cell images of a corresponding color channel. In other words, the neural network is trained to be a channel-specific neural network. In embodiments where 3 colors are used for sequencing, 3 different neural networks are trained using corresponding flow cell images of the corresponding color channels. Each channel-specific neural network is used for prediction of high resolution flow cell images of the corresponding color channel.
[0267] In embodiments where two or more channels are of the same color but of different wavelength ranges, a single neural network may be trained using flow cell images from those two or more channels. Such a neural network may be used to predict, or make inferences of, high resolution flow cell images from the two or more channels of the same color.
[0268] In some embodiments, when two or more channels are of the same color but of different wavelength ranges, a separate neural network may be trained using the flow cell images of each single channel. Each such neural network is a channel-specific neural network that may be used for prediction or inference only for its corresponding channel.
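The choice between channel-specific models and a model shared by same-color channels amounts to a grouping rule for training data and a routing rule at inference. The Python sketch below assumes images are tagged with a channel name and that a hypothetical `shared` mapping says which channels share a model; channel names such as `"red_b"` are purely illustrative.

```python
from collections import defaultdict


def group_by_model(images, shared=None):
    """Group (channel, image) pairs into one training set per model.

    `shared` maps a channel to the model key it should share (e.g., a second channel
    of the same color); unmapped channels get their own channel-specific model.
    """
    shared = shared or {}
    groups = defaultdict(list)
    for channel, image in images:
        groups[shared.get(channel, channel)].append(image)
    return dict(groups)


def model_for(channel, models, shared=None):
    """Route an inference request to the model trained for that channel or its shared group."""
    return models[(shared or {}).get(channel, channel)]
```

With no `shared` mapping, each channel gets its own training set and model, as in the channel-specific embodiments; mapping two same-color channels to one key yields the single shared model described for that case.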
[0269] In some embodiments, subsequent to operation 520’, the method 500 comprises an operation 530 of (iii) predicting, by the first reconfigurable logic device or an integrated circuit, a second plurality of flow cell images using the neural network, wherein each of the second plurality of flow cell images has a second resolution and corresponds to a corresponding image of the first plurality of flow cell images, and wherein the second resolution is at least 2 to 32 times greater than the first resolution in one or more spatial dimensions.
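The text does not say how the network produces an output that is r times larger per spatial dimension. One common mechanism in super-resolution CNNs is sub-pixel convolution, where the last layer emits C·r² feature maps that are rearranged ("pixel shuffle" or depth-to-space) into C maps of r times the height and width. The NumPy sketch below shows only that rearrangement, as an assumed illustration rather than the method of this application.

```python
import numpy as np


def pixel_shuffle(features, r):
    """Rearrange (C*r*r, H, W) feature maps into (C, H*r, W*r) by depth-to-space.

    Output pixel (c, h*r + i, w*r + j) takes input channel c*r*r + i*r + j at (h, w).
    """
    c_rr, h, w = features.shape
    if c_rr % (r * r):
        raise ValueError("channel count must be divisible by r*r")
    c = c_rr // (r * r)
    x = features.reshape(c, r, r, h, w)   # axes: [c, i, j, h, w]
    x = x.transpose(0, 3, 1, 4, 2)        # axes: [c, h, i, w, j]
    return x.reshape(c, h * r, w * r)     # row = h*r + i, col = w*r + j
```

With r = 2, four 608 x 608 feature maps become one 1216 x 1216 image, matching the 2x-per-dimension example of FIG. 30A and FIG. 30B (four times as many pixels in total).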
[0270] In some embodiments, the operation 530 of (iii) predicting the second plurality of flow cell images using the neural network comprises predicting high resolution post-processing images corresponding to the first plurality of flow cell images, wherein the processing comprises various image processing or intensity processing steps. For example, the processing steps may comprise one or more of: noise reduction; background reduction; background removal; artifact removal; artifact suppression; intensity offset correction; intensity normalization; adjusting signal to noise ratio; adjusting contrast to noise ratio; color correction; phasing and/or dephasing correction; image registration; and deconvolution. In some embodiments, predicting high resolution post-processing images corresponding to the first plurality of flow cell images may advantageously generate a higher resolution, higher quality version of the first plurality of flow cell images, which may be used to generate more accurate and reliable base calls. To achieve more accurate and reliable base calling, the second neural network of method 500, e.g., in operation 520’, may be trained using reference base calls, rather than reference flow cell images or reference intensities, as ground truths. The reference base calls may be generated using various methods, including the methods disclosed herein in relation to training neural networks for predicting base calls. In some embodiments, the training, e.g., using method 700 or other training methods herein, may optimize at least some of the parameters of the neural network to produce training base calls that are sufficiently similar to the reference base calls (e.g., as determined by the value of the loss function satisfying a predetermined criterion). As a result, the trained neural network may be used to predict high resolution post-processing images corresponding to the first plurality of flow cell images, and such high resolution post-processing images may be used to produce accurate and reliable base calls. Compared with embodiments of method 500 using the operation 520, which predicts high resolution flow cell images, or with other existing methods that predict base calls directly, embodiments of method 500 using the operation 520’ may improve base calling accuracy and reliability, reduce computational complexity in prediction, free up storage space, and save power and time.
[0271] In some embodiments, training the second neural network in the operation 520’ separately for each color channel, compared with training the first neural network in the operation 520 with flow cell images from multiple color channels, may require fewer computations, less power, and less memory or data storage, may reduce training time, and may avoid possible training failures.
[0272] FIG. 30A shows an exemplary flow cell image of the first plurality of flow cell images. In this case, the exemplary flow cell image is of a 2D sequencing sample and is acquired from one of the 4 different color channels. The image size is 608 pixels by 608 pixels. FIG. 30B is a high resolution image of the flow cell image in FIG. 30A, predicted using method 500 with the second neural network in the operation 520’ herein. The neural network is pretrained using a training data set comprising training flow cell images and reference base calls, instead of reference intensities. The high resolution image has a size of 1216 by 1216 pixels, which provides 2x the resolution of the flow cell image in FIG. 30A in the x and y directions. The detectable polony density in the high resolution image, e.g., FIG. 30B, is increased by at least 2x, 4x, or more relative to that in the first plurality of flow cell images, e.g., FIG. 30A. In this case, the neural network predicts a high resolution image with less background noise, blurriness, bright artifacts, etc. The high resolution image, in combination with the high resolution images from the other 3 color channels, can then be used to determine base calls. In this case, the error rates in determining polonies and in making base calls can be lower using the high resolution image in FIG. 30B than using the flow cell image in FIG. 30A. Predicting high resolution images, rather than predicting base calls directly, may advantageously reduce the computational complexity, computational burden, power consumption, and storage usage of base calling while maintaining or improving accuracy and reliability.
[0273] In some embodiments, the method 500 includes an operation 540 of (iv) determining, by the processor, the first reconfigurable logic device, or the integrated circuit, polonies from the second plurality of flow cell images. In some embodiments, determining the polonies comprises determining locations of the polonies, locations of the centers of the polonies, sizes of the polonies, or a combination thereof. In some embodiments, the locations of the polonies, or the locations of their centers, may be 2D or 3D. In some embodiments, the determined polonies exclude duplicate polonies. In some embodiments, the method 500 comprises an operation 550 of (v) performing, by the processor, the first reconfigurable logic device, or the integrated circuit, a corresponding base calling for each of the determined polonies based on the second plurality of flow cell images.
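Operations 540 and 550 can be illustrated with a deliberately simple, non-neural Python sketch. It assumes polonies appear as local intensity maxima above a threshold in a 2D high resolution image, and that there are four channel images, one per base (A, C, G, T), so a polony is called as the base whose channel is brightest at its center. Both the detector and the base caller are hypothetical stand-ins, not the procedures of this application.

```python
import numpy as np

BASES = "ACGT"


def find_polonies(image, threshold):
    """Local-maximum polony finder: pixels above threshold that are >= their 8 neighbors.

    Ties are broken toward the earlier pixel in raster order, so a flat two-pixel
    peak is reported once rather than as two duplicate polonies.
    """
    image = np.asarray(image, dtype=float)
    h, w = image.shape
    padded = np.pad(image, 1, mode="constant", constant_values=-np.inf)
    is_peak = image > threshold
    for dr in (-1, 0, 1):
        for dc in (-1, 0, 1):
            if (dr, dc) == (0, 0):
                continue
            neighbor = padded[1 + dr:1 + dr + h, 1 + dc:1 + dc + w]
            # Strict comparison against earlier neighbors, non-strict against later ones.
            is_peak &= (image > neighbor) if (dr, dc) < (0, 0) else (image >= neighbor)
    return [(int(r), int(c)) for r, c in np.argwhere(is_peak)]


def call_bases(channel_images, polonies):
    """Call each polony as the base whose channel image is brightest at the polony center."""
    stack = np.stack([np.asarray(img, dtype=float) for img in channel_images])  # (4, H, W)
    return [BASES[int(np.argmax(stack[:, r, c]))] for r, c in polonies]
```

A production pipeline would also need sub-pixel center estimation, 3D handling across z levels, and correction steps such as color and phasing correction; the sketch only shows where polony locations from the predicted images feed into per-polony base calls.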
[0274] In some embodiments, the second neural network in the operation 520’, while it is being trained, may include one or more layers, e.g., convolutional layers, for generating base calls from the high resolution post-image-processing images. In such embodiments, the trained second neural network in operation 520’ may use only a subset of the layers of the second neural network being trained, since it predicts only the high resolution post-image-processing images and not the base calls. In some embodiments, the second neural network pretrained using reference base calls may have a first number of layers, while a second number of the layers of the pretrained second neural network is used in operation 530 to predict the high resolution flow cell images. The second number of layers is less than the first number of layers. For example, the pretrained second neural network may have 5 layers, with the first 4 layers for predicting high resolution post-image-processing images and the last layer for predicting base calls. In the operation 530, only the first 4 layers of the pretrained second neural network are used. The operation 550 may use the last layer of the pretrained second neural network.
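The 5-layer example above (first 4 layers for operation 530, last layer for operation 550) amounts to slicing an ordered list of layers. The Python sketch below uses hypothetical stand-in layers, simple scaling functions and an argmax base caller, purely to show the split; they are not the layers of the actual network.

```python
def scale_layer(factor):
    """Hypothetical stand-in for a trained image-processing layer."""
    return lambda values: [factor * v for v in values]


def argmax_call(values):
    """Hypothetical stand-in for the final base-calling layer (one value per base A, C, G, T)."""
    return "ACGT"[max(range(len(values)), key=values.__getitem__)]


def run_layers(layers, x):
    """Apply layers in order, feeding each output into the next layer."""
    for layer in layers:
        x = layer(x)
    return x


# A "pretrained" 5-layer network: 4 image-processing layers followed by 1 base-calling layer.
pretrained = [scale_layer(2.0), scale_layer(0.5), scale_layer(3.0), scale_layer(1.0), argmax_call]
image_layers = pretrained[:4]      # used in operation 530
base_call_layers = pretrained[4:]  # optionally used in operation 550
```

Running `image_layers` alone yields the processed intensities, which can be passed either to `base_call_layers` or to a separate non-neural base caller; running the full list gives the same call as running the two slices in sequence.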
[0275] In some embodiments, the second neural network in operation 520’ utilizes the same number of layers as the second neural network being trained. In such embodiments, the neural network in operation 520’ may lack any neural network layers that are specific to generating base calls from the high resolution post-image-processing images. In such embodiments, the neural network in operation 520’ may rely on other non-neural network based algorithms or software for base calling of the high resolution post-image-processing images. For example, the pretrained second neural network may have 5 layers, with the first 4 layers for predicting high resolution post-image-processing images and the last layer for predicting base calls. In the operation 530, only the first 4 layers of the pretrained second neural network are used. The operation 550 may use non-neural network based algorithms or software for base calling.
[0276] In some embodiments, the neural network in operation 520’ has fewer convolutional layers than the neural network in operation 520. In some embodiments, the second neural network in operation 520’ has the same number of layers as the first neural network in operation 520.
[0277] In some embodiments, the neural network herein has at most 18, 15, 12, 10, 8, 7, 6, 5, 4, 3, or 2 layers. In some embodiments, the neural network herein has 6, 5, 4, 3, or 2 layers. In some embodiments, the neural network has fewer than 256, 128, 96, 80, 64, or 32 features.
[0278] In some embodiments, the method 500 is performed during a cycle N, which may be one of the reference cycle(s) for generating the polony map. In some embodiments, cycle N may be a cycle different from the reference cycle(s). The polony map can be generated in the reference cycle(s) as a subsequent operation after the methods herein have improved the detectable polony density in the flow cell images. Polonies from one or more channels within the reference cycle(s) can be included in the polony map in a reference coordinate system, while base calling of cycle N is yet to be performed. In some embodiments, cycle N is the current cycle. N can be any non-zero integer. For example, for short read sequencing, N can be any integer from 1 to 150, from 1 to 200, or from 1 to 1000.
[0279] In some embodiments, the polony map disclosed herein can include individual regions within a subtile or tile. Each polony map can include a plurality of polonies. In some embodiments, the polony map can be about the same size as a flow cell image so that all the polonies, from different tiles and from multiple channels, can be registered to the same polony map. However, such a polony map may contain polonies that are left out of at least some operations described herein to reduce computational burden without sacrificing accuracy. In some embodiments, more than one polony map can be generated, each corresponding to at least part of a subtile of a flow cell image from a channel. The multiple polony maps may be tiled together to cover the entire sample region of the flow cell device.
[0280] In some embodiments, the polony map disclosed herein can include polonies that are within individual cells or tissue, or on their membranes. In some embodiments, the polony map can exclude polonies or signal spots that are outside cell boundaries. In some embodiments, the polony map can exclude duplicate polonies; such duplication may occur at different z-locations, with one or more copies in focus and/or out of focus in the flow cell images. The duplicate polonies may be within the same flow cell image or in different flow cell images.
[0281] The polony map herein can be initialized as a virtual image that has a black or dark background with no signals from polonies. For example, the polony map can be initialized to zero, or to an otherwise minimal image intensity, at all pixels.
[0282] After the coordinates of a polony are determined by image registration of flow cell images, e.g., across different channels, the intensity of the polony can be added to the polony map at the location given by the coordinates, with a size and shape determined based on the registration. The polony map can be a virtual image that combines image intensity from polonies obtained from 2, 3, 4, or more channels at the reference cycle. Pixels of the template that contain no polonies remain black or dark, so the polony map has a cleaner background, without the noise that appears in actual flow cell images. In some embodiments, the polony map includes a list of entries, each corresponding to information for identifying a corresponding polony. For example, each entry can include the spatial coordinates of the corresponding polony center in the reference coordinate system and the image intensity of the polony. The entry may also include a unique identification number of the polony.
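The polony map of [0281]-[0282] can be illustrated with a minimal sketch. It assumes each polony is painted as a small square of fixed radius, whereas the disclosure derives size and shape from registration; the `PolonyEntry` class and `rasterize_polony_map` function are hypothetical names. The map starts as an all-zero image, and each entry's intensity is added at its coordinates, so pixels with no polonies stay dark.

```python
from dataclasses import dataclass


@dataclass
class PolonyEntry:
    """One polony-map entry: ID, center in the reference frame, intensity."""
    polony_id: int
    x: int
    y: int
    intensity: float


def rasterize_polony_map(entries, width, height, radius=1):
    """Render polony entries into a virtual image that starts fully dark.

    Each polony is painted as a (2*radius + 1)-pixel square centered on its
    coordinates (an assumed footprint); pixels clipped by the image border are
    skipped. Pixels without polonies stay at zero, so the map has no
    background noise, and overlapping polonies add their intensities.
    """
    image = [[0.0] * width for _ in range(height)]
    for e in entries:
        for yy in range(e.y - radius, e.y + radius + 1):
            for xx in range(e.x - radius, e.x + radius + 1):
                if 0 <= yy < height and 0 <= xx < width:
                    image[yy][xx] += e.intensity
    return image


entries = [PolonyEntry(0, 2, 2, 5.0), PolonyEntry(1, 6, 1, 3.0)]
polony_map = rasterize_polony_map(entries, width=8, height=5)
```

Keeping the entry list alongside the rendered image mirrors the two representations in [0282]: the list supports per-polony lookups, and the image serves as a clean registration template.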
[0283] The polonies can be from a subtile of flow cell images within a reference cycle, and more specifically, from one or more selected regions of the subtile. The flow cell images can be from 1, 2, 3, 4, or more different channels of the system 100. As a nonlimiting example, the reference cycle can be any of the first 5 or 6 cycles. In some embodiments, the reference cycle can be any cycle number greater than 0. In some embodiments, the reference cycle is the first cycle.
[0284] In some embodiments, the operation 540 comprises performing image processing step(s) to adjust image intensities of polonies. In some embodiments, the image processing steps comprise one or more of the following: background subtraction; image sharpening; intensity offset adjustment; color correction; intensity normalization; phasing and prephasing correction; image registration; quality score estimation; or the like. In some embodiments, the image registration is configured to align images from different cycles and/or different channels, for example, with respect to a template image (i.e., a polony map) or a reference coordinate system. In some embodiments, the image registration herein is configured to register polonies or clusters from different cycles and different channels to a template image or a reference coordinate system.
[0285] In some embodiments, the second plurality of flow cell images may be the output of the neural network. The second resolution may be 2 to 32 times greater than the first resolution in one or more spatial dimensions. The second resolution may be 4 to 32 times greater than the first resolution in 2D or 3D.
[0286] In some embodiments, the operation 540 is based on a polony map that has been generated. The polony map may be 2D or 3D. In some embodiments, the polony map has the second resolution. In some embodiments, the operation 540 comprises generating a polony map and determining the polonies based on the generated polony map. The details of generating a 2D or 3D polony map have been disclosed in U.S. Patent Application Nos. 18/078,820 and 18/078,797, which are incorporated herein by reference in their entirety.
[0287] For example, the base calling can be performed using polony locations in the second plurality of flow cell images from different channels in cycle N, after the second plurality of flow cell images from different channels are registered relative to the polony map disclosed herein. Various existing 2D base calling algorithms can be used. The base calling results can be saved with their 3D coordinates, which can be used to register the base calls across different cycles and at different z levels.
[0288] The method 500 can comprise an operation 550 of (v) performing, by the processor, a corresponding base calling for each of the determined polonies. The base calling in operation 550 may be based on the second plurality of flow cell images generated in operation 530, and may be further based on the polony map determined in operation 540. The base calling can be performed using the intensities of the polonies from different channels per cycle per z level.
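A per-polony, per-cycle, per-z-level base call from channel intensities can be illustrated with the simplest possible rule: pick the brightest channel. This is a hedged sketch, not the base caller of the disclosure. The four-channel layout, the `CHANNEL_TO_BASE` order, and the share-of-signal confidence are all assumptions, and `call_base` is a hypothetical name.

```python
CHANNEL_TO_BASE = ("A", "C", "G", "T")  # hypothetical channel-to-base order


def call_base(intensities, channel_to_base=CHANNEL_TO_BASE):
    """Call one base from one polony's per-channel intensities.

    Returns (base, confidence), where confidence is the brightest channel's
    share of the summed signal: an illustrative proxy, not a calibrated
    quality score.
    """
    if len(intensities) != len(channel_to_base):
        raise ValueError("expected one intensity per channel")
    best = max(range(len(intensities)), key=lambda i: intensities[i])
    total = sum(intensities)
    confidence = intensities[best] / total if total > 0 else 0.0
    return channel_to_base[best], confidence


# One polony, one cycle, one z level: channel 0 clearly dominates.
base, conf = call_base([120.0, 15.0, 10.0, 5.0])
```

Production base callers typically also correct for channel crosstalk and phasing (see [0284]) before this step, which the sketch omits.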
[0289] In some embodiments, the method 500 may include an operation of saving the base calls obtained in operation 550 in a predetermined format, e.g., in a FASTQ file, compatible with subsequent operations so that downstream analysis such as adaptor trimming and secondary analysis can be performed.
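For reference, a FASTQ record is four lines: an `@`-prefixed identifier, the bases, a `+` separator, and one quality character per base. The sketch below assumes Phred quality scores encoded with the common +33 ASCII offset; the read identifier and the `fastq_record` helper are hypothetical.

```python
def fastq_record(read_id, bases, phred_scores):
    """Format one read as a four-line FASTQ record.

    Quality scores are Phred values encoded with the common +33 ASCII offset
    (Sanger/Illumina 1.8+ convention).
    """
    if len(bases) != len(phred_scores):
        raise ValueError("need one quality score per base")
    if any(not 0 <= q <= 93 for q in phred_scores):
        raise ValueError("Phred+33 supports scores 0 to 93")
    qualities = "".join(chr(q + 33) for q in phred_scores)
    return f"@{read_id}\n{bases}\n+\n{qualities}\n"


record = fastq_record("polony_17", "ACGT", [30, 30, 20, 10])
```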
[0290] In some embodiments, the neural network is a convolutional neural network (CNN). In some embodiments, the neural network is a U-Net. In some embodiments, the neural network comprises a U-Net with a first predetermined repetition of down-sampling and convolution operations followed by a second predetermined repetition of up-sampling, concatenation, and convolution operations. The first and second predetermined repetitions can have an identical quantity, e.g., 3 or 4. In some embodiments, the neural network is a U-Net with a first predetermined number of filters in each repetition of down-sampling, and a second predetermined number of filters in each repetition of up-sampling and/or concatenation. For example, the first predetermined number of filters can be 32, 64, 128, and 256 filters in four repetitions, and the second predetermined number can be 128, 64, 64, and 32 filters in the corresponding four repetitions. As another example, the first predetermined number of filters can be 32, 64, 128, and 256 filters in four repetitions, and the second predetermined number can be 256, 128, 64, and 32 filters in the corresponding four repetitions.
[0291] In some embodiments, the operation 530 may comprise: performing, by the processor, a first convolution in one or more dimensions on the first plurality of flow cell images, thereby generating a first convolution result; and repetitively performing, one or more times, down-sampling operations comprising: (a) performing, by the processor, a second convolution in one or more dimensions on the first convolution result, thereby generating a second convolution result; and (b) performing, by the processor, a down-sampling of the second convolution result by a down-sampling factor, thereby generating a first down-sampled result. In each repetition, the second convolution may comprise a corresponding number of filters, thereby generating a third convolution result after the repetitions.
[0292] In some embodiments, the operation 530 may further comprise: performing, by the processor, the second convolution in one or more dimensions on the third convolution result, thereby generating a fourth convolution result; and repetitively performing, one or more times, up-sampling operations comprising: (c) performing, by the processor, an up-sampling of the fourth convolution result by an up-sampling factor, thereby generating a first up-sampled result; and (d) performing, by the processor, the second convolution in one or more dimensions on the first up-sampled result, thereby generating a fifth convolution result. In each repetition, the second convolution may comprise a corresponding number of filters, thereby generating a sixth convolution result after the repetitions.
[0293] In some embodiments, the first convolution comprises a 3D convolution with a convolution kernel. In some embodiments, the convolution kernel may have 4 dimensions. In some embodiments, the convolution kernel is m*m*m in the first three spatial dimensions, and the size of its fourth dimension is determined by the number of filters in the corresponding repetition. In some embodiments, m can be an integer in the range of 2 to 20. For example, the input can be 512x512 flow cell images in a z-stack with 12 slices. The first convolution can include 32 filters, each with one kernel that is 3x3x3x1. The output from that convolutional block is 512x512x12x32. This is followed by a double convolutional block, i.e., the second convolution, which consists of two first convolutions with 32 filters each. The input to both of those blocks is 512x512x12x32 and the output is 512x512x12x32. Each filter uses a kernel sized 3x3x3x32. The number of filters may correspond to features of the input.
[0294] In some embodiments, the second convolution comprises two 3D convolutional layers, e.g., as shown in the pseudo code. In other words, the second convolution comprises two repetitions, or blocks, of the first convolution in 3D, with the output of the first block used as the input of the second and the number of filters allowed to change, because the convolution process increases the depth of the image. The depth of the image may increase as the number of features or filters increases. In some embodiments, the first and second resolutions are in 2D or 3D.
[0295] In some embodiments, the first convolution comprises a 2D convolution with a convolution kernel. In some embodiments, the convolution kernel may have 3 dimensions. In some embodiments, the convolution kernel is m x m in the first two spatial dimensions, and the size of its third dimension is determined by the number of filters in the corresponding repetition. In some embodiments, m can be an integer in the range of 2 to 20. For example, the input can be flow cell images with a size of 512x512x1. The first convolution can include 64 filters, each with one kernel that is 3x3x1. The output from that convolutional block is 512x512x64. This is followed by a double convolutional block, i.e., the second convolution, which consists of two first convolutions with 32 filters each. The input to that block is 512x512x64 and its output is 512x512x32. Each filter in the second layer of the block can use a kernel sized 3x3x32, while the kernels of the first layer span the 64 input channels.
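The shape arithmetic in [0293] and [0295] can be checked with a short helper. This sketch assumes stride 1 and "same" padding, which is what keeps 512x512x12 unchanged in the examples. It also counts trainable weights, which makes explicit that each filter's kernel spans every input channel. `conv_layer_summary` is a hypothetical name.

```python
def conv_layer_summary(in_shape, filters, k, padding="same"):
    """Output shape and trainable-parameter count of one stride-1 convolution.

    in_shape is (*spatial, channels). The kernel is k along every spatial axis
    and spans all input channels, so each filter holds k**d * channels weights
    plus one bias.
    """
    *spatial, channels = in_shape
    if padding == "same":
        out_spatial = list(spatial)
    elif padding == "valid":
        out_spatial = [s - k + 1 for s in spatial]
    else:
        raise ValueError("padding must be 'same' or 'valid'")
    params = (k ** len(spatial) * channels + 1) * filters
    return tuple(out_spatial) + (filters,), params


# 3D example from [0293]: 512x512 images, 12 z slices, one input channel.
first_3d = conv_layer_summary((512, 512, 12, 1), 32, 3)
double_3d = conv_layer_summary(first_3d[0], 32, 3)
# 2D example from [0295]: 512x512x1 input, 64 then 32 filters.
first_2d = conv_layer_summary((512, 512, 1), 64, 3)
double_2d = conv_layer_summary(first_2d[0], 32, 3)
```

For the 3D example, this gives 896 parameters in the first convolution and 27,680 in each layer of the double block. For the 2D example, the first layer of the double block sees 64 input channels, so each of its kernels is 3x3x64.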
[0296] In some embodiments, the second convolution comprises two convolutional layers, e.g., as shown in the pseudo code. In other words, the second convolution comprises two repetitions, or blocks, of the first convolution, with the output of the first block used as the input of the second and the number of filters allowed to change, because the convolution process increases the depth of the image. The depth of the image may increase as the number of features or filters increases. In some embodiments, the first and second resolutions are in 2D or 3D.
[0297] In some embodiments, the second convolution in operation (a) comprises n, 2*n, 4*n, and 8*n filters in the first, second, third, and fourth repetitions, respectively. In some embodiments, the second convolution in operation (c) comprises 2*n, 2*n, 4*n, and 8*n filters in the last, last minus one, last minus two, and last minus three repetitions, respectively. In some embodiments, n can be an integer in the range from 8 to 256. For example, operation (a) comprises 32, 64, 128, and 256 filters in four repetitions and operation (c) comprises 128, 64, 64, and 32 filters in the corresponding four repetitions.
[0298] In some embodiments, the second convolution in operation (c) comprises n, 2*n, 4*n, and 8*n filters in the last, last minus one, last minus two, and last minus three repetitions, respectively. For example, operation (a) comprises 32, 64, 128, and 256 filters in four repetitions and operation (c) comprises 256, 128, 64, and 32 filters in the corresponding four repetitions.
[0299] In some embodiments, the second convolution in operation (c) comprises n, 2*n, and 4*n filters in the last, last minus one, and last minus two repetitions, respectively. For example, operation (a) comprises 32, 64, and 128 filters in three repetitions and operation (c) comprises 128, 64, and 32 filters in the corresponding three repetitions.
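The mirrored filter schedules of [0298] and [0299] follow a doubling rule that can be generated directly. This is a minimal sketch under the assumption of a symmetric U-Net. The non-mirrored 128, 64, 64, 32 variant of [0297] is not produced by it, and `filter_schedule` is a hypothetical name.

```python
def filter_schedule(n, repetitions):
    """Filters per repetition for a symmetric U-Net: n, 2n, 4n, ... on the way
    down, mirrored on the way up so the last up-sampling repetition uses n."""
    if n < 1 or repetitions < 1:
        raise ValueError("n and repetitions must be positive")
    down = [n * 2 ** i for i in range(repetitions)]
    up = down[::-1]
    return down, up


down4, up4 = filter_schedule(32, 4)  # the four-repetition example in [0298]
down3, up3 = filter_schedule(32, 3)  # the three-repetition example in [0299]
```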
[0300] In some embodiments, the operation 530 may further comprise: performing, by the processor, the first convolution in one or more dimensions on the sixth convolution result, thereby generating a seventh convolution result; and predicting, by the processor, the second plurality of flow cell images based on the seventh convolution result. Each of the second plurality of flow cell images may correspond to a flow cell image of the first plurality of flow cell images, with a second resolution that is 2, 4, 6, 8, 10, 12, or 16 times greater than the first resolution in one or more spatial dimensions. In some embodiments, the second resolution is at least 4, 6, or 8 times greater than the first resolution in all three dimensions.
[0301] In some embodiments, the first plurality of flow cell images are from a single color channel. In some embodiments, the first plurality of flow cell images are from one or more color channels. In some embodiments, the first plurality of flow cell images are of unbalanced nucleotide diversity in one or more sequencing cycles. In some embodiments, the cellular sample comprises overloaded concatemer molecules with a spatial density in a range of 10^2 to 10^15 per mm^2. In some embodiments, the cellular sample comprises overloaded concatemer molecules with a spatial density in a range of 10^3 to 10^10 per mm^2.
[0302] In some embodiments, the first resolution is in a range of 0.1 um to 5 um. In some embodiments, the first resolution is in a range of 0.01 um to 10 um. In some embodiments, the second resolution is in a range of 0.02 um to 2 um. In some embodiments, the second resolution is in a range of 0.001 um to 3 um. In some embodiments, the down-sampling factor is 2, 4, 6, 8, 16, or more. In some embodiments, the up-sampling factor is 2, 4, 6, 8, 16, or more.
[0303] In some embodiments, one or more of operations (ii) to (v) are performed while a sequencing run is being performed. In some embodiments, one or more of operations (ii) to (v) are performed in parallel with the corresponding sequencing run to reduce sequencing analysis time.
[0304] In some embodiments, the one or more cycles comprise a current cycle N. N may be in a range from 1 to 150, 1 to 300, 1 to 500, or 1 to 1000. In some embodiments, one or more of operations (ii) to (v) are performed while the sequencing reactions in cycles subsequent to the current cycle N are yet to be performed or are being performed.
[0305] In some embodiments, the training data set of training flow cell images comprises z-stacks of training flow cell images taken at different z-locations. Each z-stack may represent an individual FOV of cellular sample(s). In some embodiments, the z-axis is orthogonal to image planes of the flow cell images.
[0306] In some embodiments, the training data set of training flow cell images comprises flow cell images from multiple sequencing cycles. One or more sequencing cycles may be of unbalanced diversity, so that the images appear dimmer or contain fewer polonies than images from sequencing cycles of high nucleotide diversity. In other words, the number of polonies in the training flow cell images in a particular cycle may vary from 1% to 99% of the total number of polonies within a FOV of that cycle. When the number of polonies in the training flow cell image of a particular cycle is from 1% to 5% or 1% to 10% of the total number of polonies within that cycle, the cycle is of low or unbalanced diversity. When the number of polonies in the training flow cell image of a particular cycle is greater than 10% or 15% of the total number of polonies within that cycle, the cycle is of high or balanced diversity.
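The diversity labels in [0306] reduce to a fraction-of-polonies threshold. This minimal sketch assumes a single cutoff, `low_max`, in place of the source's alternative bands (5% or 10% for low diversity, above 10% or 15% for high diversity); `classify_cycle_diversity` is a hypothetical name.

```python
def classify_cycle_diversity(polonies_in_image, total_polonies, low_max=0.10):
    """Label a training cycle by the fraction of the FOV's polonies it shows.

    Fractions up to low_max (e.g., 0.05 or 0.10) are treated as low,
    unbalanced diversity; larger fractions as high, balanced diversity.
    """
    if total_polonies <= 0:
        raise ValueError("total_polonies must be positive")
    if not 0 <= polonies_in_image <= total_polonies:
        raise ValueError("polonies_in_image must be between 0 and total")
    fraction = polonies_in_image / total_polonies
    label = "low" if fraction <= low_max else "high"
    return label, fraction
```

A training set such as the one in [0307] could use a label like this to keep both low- and high-diversity cycles represented.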
[0307] In some embodiments, the training data set of training flow cell images comprises flow cell images from multiple samples and multiple sequencing cycles, and the training flow cell images include a subset of flow cell images with unbalanced diversity in multiple sequencing cycles and another subset of flow cell images with balanced diversity in multiple sequencing cycles.
[0308] In some embodiments, the training flow cell images from one or more cycles may be transformed from other training flow cell images from different cycle(s) to simulate the transformations that may occur across cycles within the same color channel.
[0309] In some embodiments, the operation of performing, by the processor, the first convolution in one or more dimensions on the first plurality of flow cell images comprises: performing, by the processor, a first convolution in 3D on the first plurality of flow cell images, thereby generating a first convolution result. In some embodiments, operation (a) comprises performing, by the processor, the second convolution in 3D on the first convolution result, thereby generating a second convolution result.
[0310] In some embodiments, the operation of performing, by the processor, the first convolution in one or more dimensions on the first plurality of flow cell images comprises: performing, by the processor, a first convolution in 2D on the first plurality of flow cell images, thereby generating a first convolution result. In some embodiments, operation (a) comprises performing, by the processor, the second convolution in 2D on the first convolution result, thereby generating a second convolution result.
[0311] In some embodiments, repetitively performing, one or more times, operations comprising (c) and (d) comprises repetitively performing, one or more times, operations comprising (c), (d), and (e), wherein (e) is after operation (c) and before operation (d), and wherein (e) comprises: concatenating, by the processor, the first up-sampled result in a current up-sampling repetition with the first down-sampled result in a previous down-sampling repetition, wherein the first up-sampled result has the same size as the first down-sampled result in that previous down-sampling repetition. In some embodiments, operation (e) is performed in each repetition. In other words, repetitively performing, one or more times, operations comprising (c) and (d) comprises repetitively performing operations comprising (c), (d), and (e) in each of one or more repetitions.
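The same-size requirement for the concatenation in operation (e) can be checked by tracking spatial sizes alone. The sketch below makes three assumptions: shape-preserving convolutions, a factor of 2 for both down-sampling and up-sampling, and an encoder feature map saved at every scale. It pairs each up-sampled result with the saved map of the same size. `skip_connections` is a hypothetical name.

```python
def skip_connections(spatial, repetitions, factor=2):
    """List the encoder feature-map size each up-sampling repetition joins.

    Encoder maps are saved at every scale on the way down; after up-sampling
    (operation (c)) the result is concatenated (operation (e)) with the saved
    map of the same size, then convolved (operation (d)).
    """
    saved = [tuple(spatial)]
    for _ in range(repetitions):
        if any(s % factor for s in saved[-1]):
            raise ValueError(f"{saved[-1]} is not divisible by {factor}")
        saved.append(tuple(s // factor for s in saved[-1]))
    current = saved.pop()  # coarsest (bottleneck) map is not concatenated
    pairs = []
    for j in range(1, repetitions + 1):
        current = tuple(s * factor for s in current)  # (c) up-sample
        partner = saved.pop()                          # most recent unused map
        if current != partner:
            raise ValueError("concatenated maps must have the same size")
        pairs.append((j, current))                     # (e) concatenate here
    return pairs
```

The divisibility check matters for 3D z-stacks: a 512x512x12 stack supports two factor-2 repetitions (12 to 6 to 3) but not three.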
[0312] The kernel may take any size that is smaller than the size of the flow cell image undergoing the convolution. For example, with an opening operation, the kernel can be 2 by 2 by 2, 3 by 3 by 3, 4 by 4 by 4, 5 by 5 by 5, or 6 by 6 by 6 in the first three spatial dimensions. In some embodiments, the kernel size can be customized to remove at least some of the noise and unwanted signal that are larger than the kernel size. In some embodiments, the kernel can be circular. The kernel can also take various other shapes.
[0313] In some embodiments, the focus of the optical system spans a range along the z axis, e.g., 0.1 um, 0.2 um, 0.3 um, 0.5 um, 0.6 um, 0.8 um, 1 um, 2 um, 3 um, 4 um, 5 um, etc. Polonies or clusters that are within the focus range can appear in focus or approximately in focus in the flow cell image. Flow cell images at a specific z level can also include signals from polonies or clusters that are not within the focus range of the image but at different z levels; such polonies or clusters are out of focus. As shown in FIG. 3A, larger, blurred signal spots represent out-of-focus polonies or clusters. Some of the out-of-focus polonies or clusters are circled in FIG. 3A.
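Paragraph [0312] describes an opening whose kernel is sized to remove unwanted signal larger than the kernel. One standard way to use an opening for this purpose is the white top-hat, which subtracts the opened image from the original so that only bright spots narrower than the kernel remain. The sketch below is a 2D version with a square kernel under that assumption; the disclosure's kernels may be 3D or circular, and the function names are hypothetical.

```python
def _window_filter(image, k, reducer):
    """Apply a k-by-k min or max filter (odd k); windows are clipped at borders."""
    h, w, r = len(image), len(image[0]), k // 2
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            out[y][x] = reducer(
                image[yy][xx]
                for yy in range(max(0, y - r), min(h, y + r + 1))
                for xx in range(max(0, x - r), min(w, x + r + 1))
            )
    return out


def grey_opening(image, k):
    """Morphological opening: erosion (min filter), then dilation (max filter)."""
    return _window_filter(_window_filter(image, k, min), k, max)


def white_top_hat(image, k):
    """image - opening(image): bright spots narrower than the kernel survive,
    while structures the kernel fits inside (and flat background) cancel."""
    opened = grey_opening(image, k)
    return [[p - o for p, o in zip(row, orow)] for row, orow in zip(image, opened)]


# 9x9 test image: flat background 10, a 5x5 bright block (30), one small spot (50).
img = [[10.0] * 9 for _ in range(9)]
for yy in range(5):
    for xx in range(5):
        img[yy][xx] = 30.0
img[7][7] = 50.0
th = white_top_hat(img, 3)
```

In the test image, the 5x5 block stands in for a large background object such as the cellular contour of [0314], and the single bright pixel stands in for a polony. After the top-hat, only the spot keeps a positive value.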
[0314] Each flow cell image at a specific z level can also include noise caused by the optical system and/or undesired signal from the sample. The undesired signal can come from components of the sample such as the membrane, cytosol, and mitochondria. Such background objects can be any objects that are relatively larger than the polonies or clusters. As shown in FIG. 3A, there is a blurry cellular contour (at the arrows) in the flow cell image, and most of the signal spots are contained within the blurry contour. In some embodiments, background objects can include any objects within the 3D sample that are not polonies or clusters.
[0315] In some embodiments, the method 500 includes an operation of registering the second plurality of flow cell images. In some embodiments, the images are registered across channels and/or across different cycles. In some embodiments, the images are registered before any base calling is performed in operation 550. In some embodiments, the images are registered across channels and different cycles before generating or obtaining the polony maps. In some embodiments, the images are registered across channels and different cycles before one or more primary analysis steps herein. In some embodiments, the images can be registered after one or more preprocessing operations disclosed herein are performed. Various image registration techniques can be used to register the images. The images can be registered using 2D or 3D registration techniques.
[0316] In some embodiments, the operation of registering the flow cell images is with respect to a reference coordinate system. In some embodiments, the operation of registering the flow cell images is with respect to one or more template images. The operation of registering the images can comprise generating the one or more template images in a reference coordinate system. In some embodiments, the operation of registering the images can comprise registering polonies to template polonies in the one or more template images. The operation of registering the images can comprise determining a plurality of transformations based on the one or more template images. Each of the plurality of transformations can correspond to a corresponding subtile of the flow cell images, the processed images, or the filtered images, and can be configured to register that subtile to the one or more template images. Each transformation can be used to register a corresponding subtile or tile to the one or more template images. The plurality of transformations can comprise one or more affine transformations.
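Applying one affine transformation per subtile, as described in [0316], can be sketched as follows. This assumes 2D point coordinates and transformations that have already been estimated; the estimation step is not shown. The subtile identifiers and function names are hypothetical.

```python
def apply_affine(points, matrix, offset):
    """Map (x, y) to (a*x + b*y + tx, c*x + d*y + ty) for a 2x2 matrix and offset."""
    (a, b), (c, d) = matrix
    tx, ty = offset
    return [(a * x + b * y + tx, c * x + d * y + ty) for x, y in points]


def register_subtiles(subtile_points, transforms):
    """Register each subtile's polony coordinates with that subtile's own
    affine transformation; both arguments are dicts keyed by subtile ID."""
    return {key: apply_affine(points, *transforms[key])
            for key, points in subtile_points.items()}


transforms = {
    "tile1_sub0": (((1.0, 0.0), (0.0, 1.0)), (2.0, -1.0)),   # pure shift
    "tile1_sub1": (((0.0, -1.0), (1.0, 0.0)), (0.0, 0.0)),   # 90-degree rotation
}
registered = register_subtiles(
    {"tile1_sub0": [(10.0, 10.0)], "tile1_sub1": [(3.0, 4.0)]}, transforms)
```

Keying the transformations by subtile allows local distortions to be corrected independently, which a single whole-image transformation could not do.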
[0317] In some embodiments, the operation of registering the images can comprise performing image registration of the polonies based on fiducial markers. The fiducial markers can be located on the flow cell. Alternatively, the fiducial markers can be external to the flow cell.
[0318] In some embodiments, the image registration as an image processing step herein is configured to align images from different cycles and/or different channels, for example, with respect to a template image or a reference coordinate system. In some embodiments, the image registration herein is configured to register polonies or clusters from different cycles and/or different channels, e.g., in the filtered image, to a template image or a reference coordinate system.
[0319] For example, the base calling can be performed using the filtered images from different channels in cycle N after the filtered images from different channels are registered relative to the corresponding template image disclosed herein.
[0320] The operation 540 can comprise an operation of extracting polony intensities based on the polony map. For each polony in the polony map, its location information can be obtained from the polony map, e.g., its 2D coordinates and z level. Using the 2D coordinates and the z level, the corresponding flow cell image and its pixel(s) can be determined. The image intensity of those pixels can then be extracted from the corresponding processed image, after one or more image processing steps, and used as the polony's intensity for base calling.
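The lookup in [0320] can be sketched as indexing a processed z-stack at each polony's z level and 2D coordinates. This is a minimal version that assumes nearest-pixel rounding of sub-pixel coordinates and skips polonies that fall outside the stack; real pipelines may instead integrate over several pixels. `extract_intensities` is a hypothetical name.

```python
def extract_intensities(polony_map, z_stack):
    """Look up each polony's intensity in a registered, processed z-stack.

    polony_map: iterable of (polony_id, x, y, z) in the reference frame,
        where x and y may be sub-pixel and z indexes a z level.
    z_stack: nested lists indexed as z_stack[z][y][x].
    Coordinates are rounded to the nearest pixel; polonies that fall outside
    the stack are skipped.
    """
    intensities = {}
    for polony_id, x, y, z in polony_map:
        xi, yi = round(x), round(y)
        inside = (0 <= z < len(z_stack)
                  and 0 <= yi < len(z_stack[z])
                  and 0 <= xi < len(z_stack[z][0]))
        if inside:
            intensities[polony_id] = z_stack[z][yi][xi]
    return intensities


stack = [
    [[1, 2, 3], [4, 5, 6], [7, 8, 9]],           # z level 0
    [[10, 20, 30], [40, 50, 60], [70, 80, 90]],  # z level 1
]
polonies = [("p0", 1.2, 0.9, 0), ("p1", 2.4, 1.6, 1), ("p2", 5.0, 0.0, 0)]
result = extract_intensities(polonies, stack)
```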
[0321] In some embodiments, the operation of registering the flow cell images may be based on background objects in the flow cell images. The background objects can be used to align the flow cell images to the cell staining images using one or more transformation(s).
The cell staining images herein are staining images of the sample(s) immobilized on the support, with possible transformation (e.g., translation) from the sample(s) in the flow cell images. The transformation may be represented by a single transformation of the whole image or be separated into multiple transformations, each representing a portion of the whole image. After finding the transformation(s) of the background objects between the flow cell images and the cell staining images, the polonies or clusters can be registered to the cell staining images.
[0322] In some embodiments, the method 500 may include an operation of registering the base calls obtained in operation 550 to the cell staining images. In some embodiments, such registration may be based on fiducial markers. Such fiducial markers can also be included in the cell staining images. Aligning the fiducial markers can generate the transformation(s) between flow cell images, or between flow cell images and cell staining images. The transformation(s) can be used to register or align polonies or clusters between the sequencing images and the cell images.
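For the simplest case named in [0321], a pure translation between images, aligning matched fiducial markers has a closed-form least-squares solution: the mean coordinate difference. This sketch assumes the fiducial pairs are already matched, and `estimate_translation` is a hypothetical name. Rotation, scale, or per-subtile transformations would need a fuller affine fit.

```python
def estimate_translation(src_fiducials, dst_fiducials):
    """Least-squares translation mapping matched fiducials src -> dst.

    For a pure translation, the least-squares estimate is the mean
    coordinate difference between matched pairs.
    """
    if not src_fiducials or len(src_fiducials) != len(dst_fiducials):
        raise ValueError("need the same non-zero number of matched fiducials")
    n = len(src_fiducials)
    tx = sum(d[0] - s[0] for s, d in zip(src_fiducials, dst_fiducials)) / n
    ty = sum(d[1] - s[1] for s, d in zip(src_fiducials, dst_fiducials)) / n
    return tx, ty


# Fiducials seen in a flow cell image and in the matching cell staining image.
shift = estimate_translation([(10, 10), (50, 20), (30, 40)],
                             [(13, 8), (53, 18), (33, 38)])
```

The resulting shift can then be added to polony or base-call coordinates to place them in the cell staining image.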
[0323] As an example, the simulated z-stack is 2048x2048x3, and each cell may include 200 to 2000 polonies. The spatial resolution can be about 0.1 um. Prediction is performed independently for each 512x512 region of the simulated z-stack, and the predicted high-resolution z-stack is 8192x8192x12. FIGS. 2A-2C show a simulated flow cell image and two different predicted flow cell images with 4x resolution at different z-locations. FIGS. 3A and 3D show two actual flow cell images at different z-locations in a 512x512x3 z-stack. The predicted high resolution flow cell images (2048x2048) in FIGS. 3B-3C are at two different z-locations corresponding to the low resolution image in FIG. 3A.
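The size arithmetic of [0323] can be reproduced with a small planning helper. It assumes a 4x enlargement along every axis, inferred from the stated sizes (2048 to 8192 in x and y, 3 to 12 in z). It also assumes non-overlapping regions that keep the full z depth. `tile_plan` is a hypothetical name, and stitching of the per-region predictions is not shown.

```python
def tile_plan(shape, tile=512, factor=(4, 4, 4)):
    """Plan independent per-region prediction over an (x, y, z) z-stack.

    Returns the (x, y) origins of the tile-by-tile regions (each keeping the
    full z depth), the predicted size of one region, and the predicted size of
    the whole stack when every axis is enlarged by `factor`.
    """
    x, y, z = shape
    if x % tile or y % tile:
        raise ValueError("x and y must be multiples of the tile size")
    origins = [(i, j) for j in range(0, y, tile) for i in range(0, x, tile)]
    tile_out = (tile * factor[0], tile * factor[1], z * factor[2])
    full_out = (x * factor[0], y * factor[1], z * factor[2])
    return origins, tile_out, full_out


origins, tile_out, full_out = tile_plan((2048, 2048, 3))
```

The 2048x2048x3 stack splits into 16 regions of 512x512x3. Each region predicts to 2048x2048x12, consistent with the 2048x2048 predictions in FIGS. 3B-3C, and together they form the 8192x8192x12 output.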
Exemplary pseudo code for predicting high resolution flow cell images (3D convolutions)
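[0324] An exemplary neural network is shown below. The neural network may be used to predict polony locations using z-stack(s) of flow cell images comprising images from multiple z-levels that form 3D volume(s). In some embodiments, m is in a range from 2 to 10, filters can be in a range from 8 to 1024, and the kernel can be 4-dimensional, with its fourth dimension matching the number of filters in the corresponding repetition. The input flow cell images can have various sizes in 3D as disclosed herein, e.g., 1024 by 1024 by 4. The pseudo code is rendered below as tf.keras code. The "same" padding and the factor of 2 used for max-pooling down-sampling and for up-sampling are assumptions not fixed by the original pseudo code, and each spatial size must be divisible by 2^(m-1).

    import tensorflow as tf
    from tensorflow.keras import layers

    def conv_block(x, filters, k_size):
        # 3D convolution + ReLU; "same" padding (assumed) keeps x, y, z sizes
        x = layers.Conv3D(filters, (k_size, k_size, k_size), padding="same")(x)
        return layers.ReLU()(x)

    def double_conv(x, filters, k_size):
        # the "second convolution": two conv_blocks in a row
        x = conv_block(x, filters, k_size)
        return conv_block(x, filters, k_size)

    def u_net_3d(dim_x, dim_y, dim_z, filters, k_size, m):
        inputs = layers.Input((dim_x, dim_y, dim_z, 1))
        b = [None, conv_block(inputs, filters, k_size)]          # b[1]
        b.append(double_conv(b[1], filters * 2, k_size))          # b[2]
        # repeating a first predetermined number of down-sampling and convolutions
        for n in range(2, m + 1):
            d = layers.MaxPooling3D((2, 2, 2))(b[n])              # assumed down-sampling
            b.append(double_conv(d, filters * 2 ** n, k_size))   # b[n + 1]
        # repeating a second predetermined number of up-sampling, concatenation,
        # and convolutions
        for n in range(1, m):
            u = layers.UpSampling3D((2, 2, 2))(b[m + n])
            cat = layers.Concatenate()([u, b[m + 1 - n]])         # same-size skip
            b.append(double_conv(cat, filters * 2 ** (m - n - 1), k_size))  # b[m + n + 1]
        outputs = conv_block(b[2 * m], filters, k_size)          # b[2m + 1]
        return tf.keras.Model(inputs, outputs)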
Exemplary pseudo code for predicting high resolution flow cell images (2D convolutions)
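[0325] An exemplary neural network is shown below. The neural network may be used for predicting polony locations based on 2D flow cell images at different z-levels. In some embodiments, m is in a range from 2 to 10, filters can be in a range from 8 to 1024, and the kernel can be 3-dimensional, with its third dimension matching the number of filters in the corresponding repetition. The input flow cell images can have various sizes in 2D as disclosed herein, e.g., 1024 by 1024, and there can be 3, 4, 5, or other numbers of z-levels. As above, the pseudo code is rendered as tf.keras code. Stacking the z-levels along the channel axis, "same" padding, sampling by a factor of 2, and the final 1x1 output convolution are assumptions not fixed by the original pseudo code.

    import tensorflow as tf
    from tensorflow.keras import layers

    def conv_block(x, filters, k_size):
        # 2D convolution + ReLU; "same" padding (assumed) keeps x, y sizes
        x = layers.Conv2D(filters, (k_size, k_size), padding="same")(x)
        return layers.ReLU()(x)

    def double_conv(x, filters, k_size):
        x = conv_block(x, filters, k_size)
        return conv_block(x, filters, k_size)

    def u_net(dim_x, dim_y, n_z, filter_num, k_size, m):
        # z-levels stacked along the channel axis (assumption)
        inputs = layers.Input((dim_x, dim_y, n_z))
        b = [None, None, double_conv(inputs, filter_num, k_size)]        # b[2]
        # repeating a first predetermined number of down-sampling and convolutions
        for n in range(2, m + 1):
            d = layers.MaxPooling2D((2, 2))(b[n])
            b.append(double_conv(d, filter_num * 2 ** (n - 1), k_size))  # b[n + 1]
        # repeating a second predetermined number of up-sampling, concatenation,
        # and convolutions
        for n in range(1, m):
            u = layers.UpSampling2D((2, 2))(b[m + n])
            cat = layers.Concatenate()([u, b[m + 1 - n]])
            b.append(double_conv(cat, filter_num * 2 ** (m - n - 1), k_size))  # b[m + n + 1]
        outputs = layers.Conv2D(n_z, 1)(b[2 * m])   # assumed 1x1 output layer
        return tf.keras.Model(inputs, outputs)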
Predicting base calls using neural networks
[0326] In some embodiments, the methods and systems herein can be used to predict base calls for some or all polonies of the flow cell images. The systems and methods herein advantageously use a neural network that is pretrained for predicting the base calls for polonies of flow cell images. The same neural network may also be advantageously used, without additional training, to generate a polony map or a template image so that the locations of the predicted base calls can be determined. The embodiments herein use a convolutional neural network as an example; however, it is understood that various other neural networks or machine learning models may also be used to achieve prediction of base calls using the systems and methods herein.
[0327] In some embodiments, the methods for predicting base calls may include one or more operations here. When there are multiple operations involved, such operations may or may not be performed in the order that is described herein.
[0328] FIG. 28 shows a flow chart of a computer-implemented method 2800 for predicting base calls for flow cell images of biological samples, e.g., cellular samples, thereby enabling efficient and accurate primary analysis. The method 2800 can include some or all of the operations disclosed herein. The operations may be performed in, but are not limited to, the order described herein.
[0329] The method 2800 can be performed by one or more processors disclosed herein. In some embodiments, the processor can include one or more of: a processing unit, e.g., a CPU; a reconfigurable logic device; an integrated circuit that is not reconfigurable; or combinations thereof. For example, the processing unit can include a central processing unit (CPU). The reconfigurable logic device can include one or more FPGA devices. The integrated circuit can include a chip such as an AI chip or an ASIC chip. In some embodiments, the processor can include the computing system 400.
[0330] In some embodiments, some or all operations in method 2800 can be performed by the reconfigurable logic device, e.g., the FPGA(s), and/or the integrated circuit, e.g., the AI chip(s). In embodiments where some operations are performed by the reconfigurable logic device and/or integrated circuit, e.g., FPGA(s), the data produced by the reconfigurable logic device and/or integrated circuit, e.g., the FPGA(s), after performing one or more operations can be communicated to various hardware elements of the system 100, e.g., CPU(s) or GPU(s), so that subsequent operation(s) in methods 500, 600, 700, 2800, and 2900 can be performed by such hardware using the communicated data. Similarly, data can also be communicated in the opposite direction, from various hardware, e.g., CPU(s), to the reconfigurable logic device or the integrated circuit for processing. In some embodiments, all the operations in the methods herein can be performed by CPU(s). Alternatively, the operations performed by CPU(s) can be performed by other processors such as dedicated processors or GPU(s). In some embodiments, all the operations in the methods herein can be performed by the reconfigurable logic device and/or the integrated circuit, e.g., FPGA(s) and/or AI chip(s).
[0331] In some embodiments, the sensor data acquired by the imager 116 may be communicated directly to the reconfigurable logic device and/or the integrated circuit, e.g., via direct memory access (DMA) connections. In some embodiments, the sensor data acquired by the imager 116 reach the reconfigurable logic device and/or the integrated circuit without first being routed through a CPU, a GPU, or any other processing unit.
[0332] In some embodiments, making predictions or inferences using the method 2800 with the reconfigurable logic device, e.g., the FPGA, and/or another integrated circuit, e.g., AI chips, may require at least 2x, 8x, 10x, 15x, 20x, 40x, 50x, or 100x less power than making the same prediction(s) or inference(s) with the same neural network(s), trained with identical training images, on other computing hardware, including but not limited to CPUs or GPUs.
[0333] In some embodiments, the sequencing system herein further comprises: a power source that is configured to supply identical or different power levels to the reconfigurable logic device and the integrated circuit. In some embodiments, a maximum power output of the power source to the sequencing system in performing methods 500, 600, 700, 2800, and/or 2900 is less than 2000 Watts, 1000 Watts, 900 Watts, 800 Watts, 700 Watts, 650 Watts, 600 Watts, 550 Watts, 500 Watts, 400 Watts, 300 Watts, 200 Watts, or 100 Watts.
[0334] The method 2800 can comprise an operation 2810 of (i) generating, by the sequencing system 110, a first plurality of flow cell images of sample(s) immobilized on a support by conducting one or more cycles of sequencing reactions.
[0335] The sample(s) may be traditional 2D sequencing samples containing biological analytes. The sample(s) may be cellular or tissue samples. The sample(s) may comprise concatemer molecules, which may originate from one or more different sample sources. The sample(s) may have a thickness along the z axis, so that the first plurality of flow cell images may be acquired as a z-stack at different z-locations, with a first resolution, to cover the cellular sample in 3D.
[0336] The sample can be in situ. The sample can be a 3D sample, e.g., a volumetric sample that may contain different biological information at the same x-y location but at different z levels. The sample can include multiple cells, tissue, or combinations thereof. The 3D sample can be any biological sample whose thickness along the z axis is greater than a predetermined threshold, e.g., greater than 1 um, 2 um, 3 um, 4 um, 5 um, 10 um, 20 um, or more. The z axis is orthogonal to the image plane defined by the x and y axes. In some embodiments, the sample can be a traditional 2D sequencing sample.
[0337] The flow cell images can be acquired from 1, 2, 3, 4, or more channels using the optical system of the imager 116 disclosed herein. Each flow cell image can include at least a portion of one or more tiles (e.g., imaging areas), and each tile can be divided into multiple subtiles. Each tile or subtile can include a plurality of polonies or clusters. Each subtile can include multiple regions, each including a number of polonies. A flow cell image as disclosed herein can be an image acquired from a flow cell 112 as shown in FIG. 1 or a flow cell 2712 as shown in FIG. 27. In some embodiments, the flow cell images are acquired from a single color channel, and subsequent prediction uses a pretrained neural network corresponding to that single channel. In some embodiments, the flow cell images are acquired from 2, 3, 4, or more color channels, and subsequent prediction uses a pretrained neural network corresponding to the multiple color channels.
[0338] In some embodiments, a flow cell image herein can be an image of one or more tiles, one or more subtiles, one or more segmented regions within tile(s) or subtile(s), or combinations thereof. Each flow cell image can comprise a field of view (FOV). The FOV can be orthogonal to the z axis, i.e., lie within the x-y plane. The FOVs of flow cell images at different z levels can be identical within the x-y plane, or can at least partially overlap within the x-y plane. The image resolution of flow cell images at different z levels can be about identical or exactly identical. In some embodiments, the image resolution of flow cell images at different z levels is different. FIGS. 3A and 3D show two exemplary flow cell images acquired at two different z levels along the z axis of the same 3D sample within the same sequencing cycle. The FOV can also be 3D and of various sizes to cover the volumetric sample to be imaged. The FOV along the x, y, and/or z direction can be in a range from 10 um to 5 mm, from about 0.1 um to about 2 mm, or from 0.5 um to 1 mm. For example, for certain cellular samples the FOV can be about 0.5 mm by 0.5 mm by 20 um along the x, y, and z directions, respectively.
[0339] The flow cell images herein may be of various sizes; the number of pixels along the x, y, and/or z axis may be any integer greater than 64 or 128, or may be in a range from 2 to 65536. A single flow cell image can be separated into different numbers of regions, for example, 4, 8, 16, or more regions, and each region may have a size of 256 by 256 by 1, 512 by 512 by 3, or other sizes. In some embodiments, the number of pixels along the x, y, and/or z direction may be adjusted to maintain a particular spatial resolution in a given FOV. For example, at a spatial resolution of 0.2 um, covering a FOV of 0.8 mm requires 4000 pixels (0.8 mm / 0.2 um = 4000).
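The pixel-count arithmetic above and the division of an image into fixed-size regions can be sketched as follows. This is a minimal illustration, not the disclosed implementation; the function names are hypothetical, and lengths are expressed in integer nanometers (an assumption) so that the ceiling division is exact.

```python
def pixels_to_cover(fov_nm: int, pixel_size_nm: int) -> int:
    """Pixels needed along one axis to span `fov_nm` at `pixel_size_nm` per pixel.

    Lengths are integer nanometers so that ceiling division is exact.
    """
    return -(-fov_nm // pixel_size_nm)


def split_into_regions(width: int, height: int, region: int = 256):
    """Yield (x0, y0, x1, y1) bounds of square regions tiling a 2D image.

    Regions along the right and bottom edges are clipped to the image size.
    """
    for y0 in range(0, height, region):
        for x0 in range(0, width, region):
            yield x0, y0, min(x0 + region, width), min(y0 + region, height)


# 0.8 mm field of view at 0.2 um per pixel.
print(pixels_to_cover(800_000, 200))            # 4000
print(len(list(split_into_regions(512, 512))))  # 4
```

A 512 by 512 image split into 256 by 256 regions yields the 4-region case mentioned above; larger images simply produce more regions, with partial regions at the edges.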
[0340] Each flow cell image at a specific z level may include intensities generated by polonies or clusters at the corresponding z level. As shown in FIGS. 3A and 3D, signals from polonies or clusters appear as small bright spots within the images. Each bright spot can vary in size, e.g., less than a pixel, or about 1, 2, 3, 4, or 5 pixels, or more. In some embodiments, each signal spot of the polonies or clusters can span any number of pixels in the range from 0.01 pixel to about 100 pixels, or from 0.1 pixel to about 16 pixels.
[0341] Each flow cell image can also include intensities generated by the cell and its structural elements. Such structural elements can be background objects or components, e.g., as in FIG. 3A. Each flow cell image can also include noise and/or artifacts that do not originate from the polonies or cellular structures.
[0342] In some embodiments, the depth of field of the optical system spans a range along the z axis, e.g., 0.1 um, 0.2 um, 0.3 um, 0.5 um, 0.6 um, 0.8 um, 1 um, 2 um, 3 um, 4 um, 5 um, etc. Polonies or clusters within the depth of field can appear in focus or about in focus in the flow cell image. A flow cell image at a specific z level can also include signals from polonies or clusters outside the focus range of the image; such polonies or clusters are out of focus. As shown in FIG. 3A, larger and blurrier signal spots represent out-of-focus polonies or clusters, some of which are circled in FIG. 3A.
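The in-focus criterion described above can be sketched as a simple geometric test. This is a minimal illustration under the assumption (not stated in the text) that the depth of field is centered on the focal plane of each z level; the `Spot` type and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Spot:
    x: float
    y: float
    z_um: float  # axial position of the polony center, in micrometers


def is_in_focus(spot: Spot, focal_z_um: float, depth_of_field_um: float) -> bool:
    """True if the spot lies within half the depth of field of the focal plane.

    Assumes the depth of field is centered on the focal plane.
    """
    return abs(spot.z_um - focal_z_um) <= depth_of_field_um / 2


def split_by_focus(spots, focal_z_um, depth_of_field_um):
    """Partition spots into (in_focus, out_of_focus) lists for one z level."""
    in_focus = [s for s in spots if is_in_focus(s, focal_z_um, depth_of_field_um)]
    out_of_focus = [s for s in spots if not is_in_focus(s, focal_z_um, depth_of_field_um)]
    return in_focus, out_of_focus
```

In an actual image, out-of-focus polonies are not labeled with their z position; they appear as the larger, blurrier spots noted above, so a sketch like this is mainly useful for reasoning about simulated data.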
[0343] Each flow cell image at a specific z level can also include noise caused by the optical system and/or undesired signal from the sample. The undesired signal can come from components of the sample such as the membrane, cytosol, and mitochondria. Such background objects can be any objects relatively larger in size than the polonies or clusters. As shown in FIG. 3A, there is a blurry cellular contour (at the arrows) in the flow cell image, and most of the signal spots are contained within the blurry contour. In some embodiments, background objects can include any objects within the 3D sample that are not polonies or clusters.
[0344] In some embodiments, base calls from the polonies include 4 different bases, and the percentage of polonies for each of the 4 bases can be greater than about 10%, so that the data are relatively diverse. In some other embodiments, bases called from the plurality of polonies include 4 or fewer different bases, and the percentage of polonies for one or more bases can be less than about 10%; such data can be considered data of unbalanced diversity. In some embodiments, bases called from the plurality of polonies include 4 or fewer different bases, and the percentage of polonies for some of the bases can be less than about 5%, about 2%, or even about 1%; such data can likewise be considered data of unbalanced diversity. As an example, the bases called as A, T/U, C, and G in the plurality of polonies can be about 1%, about 2%, about 1%, and about 95%, respectively. As another example, the bases called as A, T/U, C, and G in the plurality of polonies can be about 10%, about 10%, about 10%, and about 70%, respectively. In addition to base biases, plexity can also affect diversity: when plexity is lower than a certain number, e.g., 8 or 16, the signal could be of unbalanced diversity. The method 2800 is configured to predict base calls of flow cell images, e.g., of a first resolution, even if the polonies in the flow cell images are of unbalanced nucleotide diversity in one or more sequencing cycles, and the base calls may be spatially aligned to the polonies of the flow cell images at a second resolution, which may be higher than the first resolution.
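The diversity criteria above can be expressed as a small check on one cycle's base calls. This is a hedged sketch: it pools T and U as in the text, treats a base at or below 10% as unbalanced so that both quoted examples are flagged, and uses a plexity cutoff of 8; these thresholds are parameters, and the function names are hypothetical.

```python
from collections import Counter


def base_fractions(calls):
    """Fraction of polonies called as each base in one cycle (T and U pooled)."""
    counts = Counter("T" if b == "U" else b for b in calls)
    total = sum(counts[b] for b in "ACGT")
    if total == 0:
        raise ValueError("no A/C/G/T/U calls to summarize")
    return {b: counts[b] / total for b in "ACGT"}


def is_unbalanced(calls, min_fraction=0.10, plexity=None, min_plexity=8):
    """Flag a cycle whose base composition or plexity suggests low diversity.

    Data are treated as diverse only if every base exceeds `min_fraction`;
    a known plexity below `min_plexity` is also flagged.
    """
    low_base = any(f <= min_fraction for f in base_fractions(calls).values())
    low_plexity = plexity is not None and plexity < min_plexity
    return low_base or low_plexity


print(is_unbalanced("ACGT" * 25))                          # False
print(is_unbalanced("A" * 10 + "U" * 10 + "C" * 10 + "G" * 70))  # True
```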
[0345] In some embodiments, the method 2800 comprises an operation 2802 of (ia) generating, by a processor or a first reconfigurable logic device, a second plurality of flow cell images comprising a second resolution. In some embodiments, each of the second plurality of flow cell images corresponds to a flow cell image of the first plurality of flow cell images. The second plurality of flow cell images may be generated using various up-sampling algorithms, including but not limited to interpolation. The second resolution may be greater than the first resolution in one or more spatial dimensions, e.g., along the x, y, and/or z direction. For example, the second resolution may be at least 2 times, 2 to 32 times, or 4 to 64 times greater than the first resolution in one or more spatial dimensions.
[0346] In some embodiments, the method 2800 comprises an operation 2804 of (ii) providing, by a processor, the second plurality of flow cell images as an input to a neural network, e.g., a convolutional neural network (CNN), wherein the neural network is pretrained on a training data set of training flow cell images using a training method disclosed herein, e.g., method 600, 700, or 2900. In some embodiments, the neural network is pretrained so that the values of its parameters (e.g., weights) have been optimized based on the training. The neural network may be retrained when needed, for example, for predicting flow cell images from different cellular samples.
[0347] In some embodiments, the method 2800 may include image processing step(s) performed on the first or second plurality of flow cell images, optionally prior to providing any input to the neural network. The processing step(s) may include: intensity normalization, background subtraction, background removal, artifact reduction, artifact removal, adjustment of signal-to-noise ratio, adjustment of contrast-to-noise ratio, color correction, adjustment of intensity offset, image registration, phasing and prephasing correction, filtering, segmentation, noise reduction, deconvolution (e.g., to differentiate neighboring or at least partly overlapping signal spots), or a combination thereof.
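Two of the listed processing steps, background subtraction and intensity normalization, can be sketched as follows. The flat background estimated from a low percentile and the max-based normalization are assumptions chosen for brevity; the text does not say how these steps are implemented, and the function names are hypothetical.

```python
def subtract_background(image, percentile=10.0):
    """Subtract a flat background estimated as a low percentile of all
    pixel values; negative results are clipped to zero."""
    values = sorted(v for row in image for v in row)
    k = round(percentile / 100 * (len(values) - 1))
    background = values[k]
    return [[max(v - background, 0.0) for v in row] for row in image]


def normalize(image):
    """Scale intensities to [0, 1] by the image maximum."""
    peak = max(v for row in image for v in row)
    if peak == 0:
        return [[0.0 for _ in row] for row in image]
    return [[v / peak for v in row] for row in image]


def preprocess(image):
    """Background subtraction followed by normalization."""
    return normalize(subtract_background(image))


print(preprocess([[10, 10, 10], [10, 50, 10], [10, 10, 90]]))
```

A spatially varying background (e.g., the blurry cellular contour described for FIG. 3A) would call for a local estimate, such as a large-window filter, rather than a single global value.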
[0348] In some embodiments, the method 2800 comprises an operation 2804’ of providing, by the processor, the first reconfigurable logic device, or the integrated circuit, the first or the second plurality of flow cell images to a polony map generation algorithm or a base calling algorithm. In some embodiments, the polony map generation algorithm and the base calling algorithm do not include a trained neural network or an artificial intelligence-based algorithm. Exemplary polony map generation algorithms for generating 2D or 3D polony maps and base calling algorithms for generating base calls have been disclosed in U.S. Application Nos. 18/078,797 and 18/078,820 and U.S. Patent No. 10,266,888, which are incorporated herein by reference in their entireties.
[0349] In some embodiments, the method 2800 comprises an operation 2806 of (iia) determining, by the first reconfigurable logic device or the integrated circuit, the polony map based on the second plurality of flow cell images. The operation 2806 can be based on operation 2804 in some embodiments, and on operation 2804’ in other embodiments.
[0350] In some embodiments, the polony map is 3D. In some embodiments, the 3D polony map includes multiple 2D polony maps at different z levels. In some embodiments, the 3D polony map has the second resolution. In some embodiments, the polony map is generated using a polony map generation algorithm. In some embodiments, the polony map generation algorithm lacks any neural network or artificial intelligence-based algorithm. In some embodiments, the polony map generation algorithm lacks any neural network that has been pretrained and can predict base calls in operation 2812 without additional training for predicting the polony map. In some embodiments, the polony map generation algorithm utilizes traditional algorithms that lack artificial intelligence.
[0351] In some embodiments, the neural network in operations 2804-2806 is the same pretrained neural network used in operation 2812. In some embodiments, the same pretrained neural networks may include identical parameters, layers, and neural network structures. In some embodiments, the same pretrained neural networks may include an identical number of parameters, an identical number of layers, and identical neural network structures. In some embodiments, the method 2800 may further comprise an operation to train the neural network before operations 2804 and 2806, e.g., using method 700 or 2900 disclosed herein. In some embodiments, the pretrained neural network may be used to predict polony locations, polony shape and/or size, polony center locations, or equivalently the polony map. In some embodiments, the method 2800 may further include one or more operations of method 500, e.g., operations 530, 540, and/or 550, for predicting locations of the polonies, thus predicting the polony map.
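As one example of a traditional, non-neural-network polony map generation algorithm of the kind described in [0350], the sketch below finds local intensity maxima above a threshold in each z-slice and collects them into a 3D map made of per-level 2D maps. This peak finder is a swapped-in illustration, not the algorithm of the incorporated references; function names are hypothetical, and the strict comparison means flat-topped (plateau) peaks are not reported.

```python
def find_polony_centers(image, threshold):
    """Return (row, col) of pixels that reach `threshold` and are strictly
    brighter than every one of their (up to 8) neighbors."""
    h, w = len(image), len(image[0])
    centers = []
    for r in range(h):
        for c in range(w):
            v = image[r][c]
            if v < threshold:
                continue
            neighbors = [
                image[rr][cc]
                for rr in range(max(r - 1, 0), min(r + 2, h))
                for cc in range(max(c - 1, 0), min(c + 2, w))
                if (rr, cc) != (r, c)
            ]
            if all(v > n for n in neighbors):
                centers.append((r, c))
    return centers


def polony_map_3d(z_stack, threshold):
    """A 3D polony map as one list of 2D centers per z level."""
    return {z: find_polony_centers(plane, threshold) for z, plane in enumerate(z_stack)}
```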
[0352] In some embodiments, the same neural network used in operations 2804, 2806, and 2812 may be trained using identical training data including identical flow cell images of samples. The identical training data may also include identical “ground truths” or references in training. As a result, after training, the same neural networks may comprise identical values for parameters, identical number of layers, and identical neural network structures.
[0353] In some embodiments, the same neural network used in operations 2804, 2806, and 2812 may be trained using at least a different portion of the identical training data. As a result, the same neural networks may comprise identical parameters with identical or different values for such parameters, identical layers, and identical neural network structures. Training of the same neural network may be performed before operation 2804 and does not require retraining the neural network after operation 2806 and before operation 2812. The pretrained neural network may then be used in operations 2804-2806 and operation 2812 without retraining to allow fast and efficient prediction of the base calls using methods 2800.
[0354] In some embodiments, the pretrained neural network may be used in operations 2804-2806 to update an existing polony map. The existing polony map may be generated in an earlier cycle of the sequencing run, and the polony map predicted using the pretrained neural network may be used to update it in a later cycle of the sequencing run. For example, an initial polony map may be generated by a non-neural network algorithm in the first cycle or first several cycles, e.g., cycles 1-4, of the sequencing run. The neural network may be trained using data of the first cycle or a number of cycles, e.g., cycles 1-4 or cycles 1-5. The pretrained neural network can then be used to predict a second polony map that updates the initial polony map. The second polony map may advantageously reselect more accurate and reliable polony locations for making predictions of base calls, intensities, or classifications, e.g., in operation 2812. Such prediction may be repeated by retraining the neural network with additional cycles that have been completed in the sequencing run, to further improve the reselection of polony locations. For example, the neural network trained using data from cycles 1-4 may be retrained using data of cycles 1-6 or 1-7, and then make another prediction of the polony map.
[0355] In some embodiments, the same neural network used in operations 2806 and 2816 may be trained using different reference information as the “ground truth” in training. In some embodiments, the training of the neural network for predicting polony locations may use reference intensities as the “ground truth,” while the training of the neural network for predicting base calls may use reference base calls as the “ground truth.”
[0356] In some embodiments, the same neural network used in operations 2804-2806 and 2816 may be trained using identical reference information as the “ground truth” in training. In some embodiments, the training of the neural network for predicting polony locations may use reference intensities as the “ground truth,” and the training of the neural network for predicting base call may use reference base calls that can be determined based on such reference intensities.
[0357] In some embodiments, the same neural network may be trained to predict base calls using various training methods, e.g., method 2900 disclosed herein. Reference base calls may be used for training the neural network. The reference base calls used in training, e.g., using method 2900, may include their spatial information, and may be of a first resolution, a second resolution, or a third resolution. In some embodiments, the third resolution can be higher than the first and second resolutions.
[0358] In some embodiments, the same neural network may be trained to predict base calls, e.g., using method 2900. After being trained, such a neural network may be used to predict base calls of the second plurality of flow cell images. The predicted base calls can then be processed to determine the locations of the polonies, thereby generating the polony map. For example, the polony map may be determined as the locations at which base calls are predicted with a probability satisfying a predetermined threshold. As another example, the polony locations, and thus the polony map, may be determined as the locations at which one or more quality metrics satisfy a predetermined threshold. Such quality metrics can include, but are not limited to, the maximum, median, or average intensity of the polony across different color channels, a Q score of the base call, a clarity of the base call, and a purity of the base call.
[0359] As disclosed above, the second plurality of flow cell images may be used for generating the polony map at the second resolution using operations 2804-2806 or operations 2804’-2806. Alternatively, in some embodiments, the method may include, after operation 2810, an operation of generating the polony map based on the first plurality of flow cell images at the first resolution, followed by an operation of up-sampling the polony map to the second resolution. In such embodiments, the first plurality of flow cell images may be provided instead of the second plurality of flow cell images in operation 2804 or operation 2804’, and operation 2806 may be replaced by an operation of determining the polony map based on the first plurality of flow cell images.
[0360] In some embodiments, the method 2800 is performed during a cycle N, which may be one of the reference cycle(s) used for generating the polony map. In some embodiments, cycle N may be a cycle different from the reference cycle(s). The polony map can be generated in the reference cycle(s) as a subsequent operation after the methods herein have improved the detectable polony density in flow cell images. Polonies from one or more channels within the reference cycle(s) can be included in the polony map in a reference coordinate system, while base calling of cycle N is yet to be performed. In some embodiments, cycle N is the current cycle. N can be any non-zero integer. For example, for short-read sequencing, N can be any integer from 1 to 150. In some embodiments, N can be any integer from 1 to 20, 1 to 200, 1 to 300, 1 to 500, or 1 to 1000.
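The probability-threshold rule in [0358] can be sketched as follows, with the Phred-scaled quality Q = -10 log10(1 - p) reported alongside each kept call. The input layout (a mapping from (x, y, z) to four per-base probabilities in A, C, G, T order), the 0.9 threshold, and the Q cap of 60 are assumptions; clarity and purity are not modeled because the text does not define them here.

```python
import math

BASES = "ACGT"


def phred_q(p_correct, cap=60.0):
    """Phred-scaled quality Q = -10 * log10(1 - p), capped so p == 1 is finite."""
    if p_correct >= 1.0:
        return cap
    return min(-10.0 * math.log10(1.0 - p_correct), cap)


def polony_map_from_probs(prob_map, min_prob=0.9):
    """prob_map: {(x, y, z): [pA, pC, pG, pT]}.

    Keep locations whose most probable base meets `min_prob` and
    return {location: (base, Q)} as the polony map.
    """
    kept = {}
    for loc, probs in prob_map.items():
        best = max(range(len(BASES)), key=lambda i: probs[i])
        if probs[best] >= min_prob:
            kept[loc] = (BASES[best], phred_q(probs[best]))
    return kept
```

Thresholding on Q instead of p is equivalent, since Q increases monotonically with p (e.g., p = 0.99 corresponds to Q of about 20).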
[0361] In some embodiments, the polony map disclosed herein can include individual regions within a tile or subtile. Each polony map can include a plurality of polonies. In some embodiments, the polony map can be about the same size as a flow cell image, so that all the polonies, from different tiles and from multiple channels, can be registered to the same polony map. However, such a polony map may contain polonies that are not used in at least some operations described herein, so as to reduce computational burden without sacrificing accuracy. In some embodiments, more than one polony map can be generated, each corresponding to at least part of a subtile of a flow cell image from a channel. The multiple polony maps may be tiled together to cover the entire sample region of the flow cell device.
[0362] In some embodiments, the polony map disclosed herein can include polonies that are within individual cells or tissue, or on their membranes. In some embodiments, the polony map can exclude polonies or signal spots outside cell boundaries. In some embodiments, the polony map can exclude duplicate polonies; such duplication may occur at different z-locations, with one or more copies in focus and/or out of focus in the flow cell images. The duplicate polonies may appear within the same flow cell image or in different flow cell images.
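Excluding duplicate polonies that appear at several z-locations can be sketched as a greedy suppression step. The sketch assumes that the brightest detection of a duplicate group is the in-focus copy, and treats two detections as duplicates only if they lie within `radius` in x-y and within `max_dz` z levels, so that distinct polonies stacked far apart in z are kept. Both the rule and the names are hypothetical, not taken from the source.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Detection:
    x: float
    y: float
    z: int            # z-level index at which the spot was detected
    intensity: float


def is_duplicate(a: Detection, b: Detection, radius: float, max_dz: int) -> bool:
    """Close in x-y and in nearby z levels."""
    close_xy = (a.x - b.x) ** 2 + (a.y - b.y) ** 2 <= radius ** 2
    return close_xy and abs(a.z - b.z) <= max_dz


def deduplicate(detections, radius, max_dz):
    """Visit detections from brightest to dimmest and drop any that
    duplicates one already kept."""
    kept = []
    for d in sorted(detections, key=lambda d: d.intensity, reverse=True):
        if not any(is_duplicate(d, k, radius, max_dz) for k in kept):
            kept.append(d)
    return kept
```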
[0363] The polony map herein can be initialized as a virtual image that has a black or dark background with no signals from polonies. For example, the polony map can be initialized to be zero or include otherwise minimal image intensity at all pixels.
[0364] After the coordinates (e.g., 3D coordinates) of a polony are determined by image registration of flow cell images, e.g., across different channels, the intensity of the polony can be added to the polony map at the location determined by the coordinates, with the size and shape determined based on the registration. The polony map can be a virtual image that combines the image intensities of polonies obtained from 2, 3, 4, or more channels at the reference cycle. Pixels of the template that contain no polonies remain black or dark, so the polony map has a cleaner background, without the noise that appears in actual flow cell images. In some embodiments, the polony map includes a list of entries, each corresponding to information for identifying a corresponding polony. For example, each entry can include the spatial coordinates of the corresponding polony center in the reference coordinate system and the image intensity of the polony. The entry may also include a unique identification number of the polony.
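The two representations of the polony map described in [0363] and [0364], a zero-initialized virtual image and a list of entries, can be sketched together. For brevity this assumes each polony occupies a single voxel at its registered center (the text allows a size and shape from registration) and that per-channel intensities are combined by summation; the class names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class PolonyEntry:
    polony_id: int      # unique identification number
    x: int              # center coordinates in the reference coordinate system
    y: int
    z: int
    intensity: float    # combined intensity across color channels


@dataclass
class PolonyMap:
    width: int
    height: int
    depth: int
    entries: list = field(default_factory=list)
    image: list = field(init=False)

    def __post_init__(self):
        # Virtual image initialized to zero (dark) at every voxel.
        self.image = [[[0.0] * self.width for _ in range(self.height)]
                      for _ in range(self.depth)]

    def add(self, x, y, z, channel_intensities):
        """Register one polony center, summing its per-channel intensities."""
        total = float(sum(channel_intensities))
        entry = PolonyEntry(len(self.entries), x, y, z, total)
        self.entries.append(entry)
        self.image[z][y][x] += total
        return entry
```

Voxels never touched by `add` stay at zero, which mirrors the clean, dark background that distinguishes the polony map from raw flow cell images.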
[0365] The polonies can be from a subtile of flow cell images within a reference cycle and, more specifically, from one or more selected regions of the subtile. The flow cell images can be from 1, 2, 3, 4, or more channels of the system 100. As a nonlimiting example, a reference cycle can be any of the first 5 or 6 cycles. In some embodiments, the reference cycle can be any cycle greater than 0. In some embodiments, the reference cycle is the first cycle.
[0366] In some embodiments, the processing steps herein comprise performing image processing step(s) to adjust image intensities of polonies. In some embodiments, the image processing steps comprise one or more of the following: background subtraction; image sharpening; intensity offset adjustment; color correction; intensity normalization; phasing and prephasing correction; image registration; quality score estimation; or the like. In some embodiments, the image registration is configured to align images from different cycles and/or different channels, for example, with respect to a template image (i.e., a polony map) or a reference coordinate system. In some embodiments, the image registration herein is configured to register polonies or clusters from different cycles and different channels to a template image or a reference coordinate system.
[0367] The method 2800 can comprise an operation 2812 of: (iii) predicting, by the first reconfigurable logic device or the integrated circuit, one or more base calls corresponding to one or more polonies of the second plurality of flow cell images using the neural network; or predicting, by the first reconfigurable logic device or the integrated circuit, one or more classifications corresponding to one or more pixels of the second plurality of flow cell images using the neural network. The base calling in operation 2812 may be based on the second plurality of flow cell images, and may further be based on the polony map determined in operation 2804 or 2804’.
[0368] In some embodiments, the second plurality of flow cell images may be from one or more color channels, one or more z levels, and/or one or more cycles. The prediction of base calls in operation 2812 can be performed using intensity of the polonies. In some embodiments, the second plurality of flow cell images may be from a single color channel, a single z level, and/or a single cycle. In some embodiments, the prediction of base calling can be performed using intensity of the polonies from a single color channel and one or more cycles. For example, flow cell images acquired from each color channel of the multiple color channels in multiple cycles may use a different pre-trained neural network for predicting the polony intensity of the corresponding channel. The prediction in operation 2812 of base calling can then be performed using intensities of the polony from different color channels. As another example, flow cell images from a single z level may require a different pre-trained neural network for predicting the base calls from a different z level using operation 2812. In some embodiments, prediction of base calling in operation 2812 can be performed using intensity of the polonies from different color channels, multiple z levels, and multiple cycles. In some embodiments, prediction of base calling in operation 2812 can be performed using intensity of the polonies from different color channels, a single z level, and multiple cycles. In some embodiments, prediction of base calling in operation 2812 can be performed using intensity of the polonies from a single color channel, one or more z levels, and one or more cycles. In some embodiments, prediction of base calling in operation 2812 can be performed using intensity of the polonies from one or more color channels, one or more z levels, and one or more cycles.
[0369] In some embodiments, operation 2812 (iii) may include generating outputs that include base calls, e.g., A, T, C, G, and/or U, for one or more pixels of the second plurality of flow cell images. The one or more pixels may be determined using a polony map or a location list of polonies disclosed herein, so that each of the one or more pixels is comprised in at least one polony in the polony map.
[0370] In some embodiments, operation 2812 (iii) may comprise generating outputs that include base calls, e.g., A, T, C, G, and/or U, for one or more pixels of the second plurality of flow cell images. In some embodiments, operation 2812 (iii) may comprise generating outputs that include classifications, e.g., A, T, C, G, U, and/or background, for one or more pixels of the second plurality of flow cell images. In some embodiments, the one or more pixels may include pixels that are not included in the polony map or the location list disclosed herein. For example, the one or more pixels may include all pixels within the FOV of the second plurality of flow cell images. In some embodiments, the one or more pixels include at least one pixel that is not comprised in any polony of the polony map. In some embodiments, the one or more pixels include at least one pixel in the background of the polonies that comprises noise signal(s). In some embodiments, the one or more pixels include at least one pixel that is not comprised in any polony in the polony map and at least one pixel that is comprised in at least one polony in the polony map. In some embodiments, the one or more pixels include at least one pixel that is neither within a cell membrane nor on the cell membrane.
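Per-pixel classification over A, C, G, T, and background can be sketched by applying a softmax to five network outputs per pixel and taking the most probable class. This assumes the network emits one logit per class (T standing in for T/U) for every pixel, including pixels outside the polony map; the data layout and names are hypothetical.

```python
import math

CLASSES = ("A", "C", "G", "T", "background")


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify_pixels(logit_image):
    """logit_image: {(x, y, z): [5 logits]} -> {(x, y, z): (label, probability)}."""
    out = {}
    for loc, logits in logit_image.items():
        probs = softmax(logits)
        i = max(range(len(CLASSES)), key=probs.__getitem__)
        out[loc] = (CLASSES[i], probs[i])
    return out


def foreground(classified):
    """Keep only pixels whose predicted class is a base, not background."""
    return {loc: label for loc, (label, _) in classified.items() if label != "background"}
```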
[0371] FIGS. 3E -3F show comparison of accuracy of identifying transcripts (corresponding to polonies) using the neural network methods herein (“new algorithm”), e.g., 2800, and a traditional non-neural network based algorithm (“POR-YOLO”). In this case, simulated flow cell images of in situ sample with multiple cells are used. Each area may include a number of targets ranging from 0 to 4000. Such targets can be transcripts. The neural network herein, e.g., in method 2800, and a classic non-neural network based algorithm are used to predict/detect transcripts in such cells. And the prediction/determination is then compared with ground truths (or equivalently, the reference polony map) for accuracy. The correct number of targets per area is higher using the neural network and method disclosed herein than using the non-neural network based algorithm. The detected targets per area using the neural network and methods herein are much higher (2x or 3x higher) than that detected by the non-neural network based algorithm when the target density per area is greater than 2000 per area. FIG. 3F shows the false negative per cell for both the neural network (“new algorithm”) and non- neural network based algorithm (“POR-YOLO”). The false negative per area using the neural network and methods herein are much lower (lOx or more) than that detected by the non-neural network based algorithm when the target density per area is greater than 1000 per area.
[0372] FIG. 3G shows a comparison of the accuracy of identifying transcripts using the methods herein, e.g., method 2800, and a traditional non-neural network based algorithm. In this case, simulated flow cell images of an in situ sample with multiple cells are used. Each cell may include a number of transcripts ranging from 0 to 6000. The neural network herein, e.g., in method 2800, and a classic non-neural network based algorithm are used to predict/detect transcripts in such cells, and the predictions/determinations are then compared with ground truths (or equivalently, the reference polony map) for accuracy. The R² values show the correlation of the predictions/determinations with the references. The neural network and method herein, e.g., method 2800, showed consistently higher correlation across all the different numbers of transcripts per cell than the classic non-neural network based algorithm, thereby indicating higher accuracy in identifying polonies or clusters (e.g., transcripts) in flow cell images of in situ samples.
[0373] In some embodiments, the method 500, 2800 may include an operation of determining a biological analyte, including but not limited to a morphological feature, a transcript, an RNA, an mRNA, a protein, or their combinations, based on the base calling or classification of the polony in one or more sequencing cycles. For example, a base calling or classification sequence of ATTCGA for a polony in 6 consecutive sequencing cycles may indicate a cellular protein that may be labeled by the unique barcode “ATTCGA.”
[0374] In some embodiments, the method 2800 further includes an operation (iv) of: in response to determining that a first pixel of the one or more pixels has a predicted classification that is different from a background classification (e.g., the classifications may include A, T, C, G, U, or background), determining a first morphological feature, a first RNA or mRNA, or a first protein based on the one or more predicted classifications. In some embodiments, the method 2800 further includes an operation (v) of: in response to determining that a second pixel of the one or more pixels has a predicted classification that is different from the background classification (e.g., the classifications may include A, T, C, G, U, or background), determining a second morphological feature, a second RNA or mRNA, or a second protein based on the one or more predicted classifications.
[0375] In some embodiments, the method 2800 further includes an operation (iv) of determining a first morphological feature, a first RNA or mRNA, or a first protein based on predicted base calls of a first pixel in one or more cycles. In some embodiments, the method 2800 further includes an operation (v) of determining a second morphological feature, a second RNA or mRNA, or a second protein based on predicted base calls of a second pixel in one or more cycles.
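The decoding step of paragraphs [0373]-[0375] can be sketched as a codebook lookup. This is a minimal illustration under stated assumptions, not the disclosed implementation: it assumes the per-cycle calls of one pixel are already available as letters, that a pixel classified as background in any cycle carries no target, and it uses a hypothetical codebook (`CODEBOOK`, `analyte_1`, and `analyte_2` are illustrative names only). A practical decoder might also need to tolerate miscalled cycles, which this sketch does not attempt.

```python
# Hypothetical codebook mapping barcodes to analytes (illustrative names only).
CODEBOOK = {
    "ATTCGA": "analyte_1",
    "GCTAAC": "analyte_2",
}

BACKGROUND = "background"


def decode_pixel(calls_per_cycle, codebook=CODEBOOK):
    """Join the per-cycle calls of one pixel into a barcode and look it up.

    Returns None when any cycle was classified as background or when the
    barcode is not in the codebook (no error correction is attempted).
    """
    if any(call == BACKGROUND for call in calls_per_cycle):
        return None
    barcode = "".join(calls_per_cycle)
    return codebook.get(barcode)


print(decode_pixel(["A", "T", "T", "C", "G", "A"]))  # analyte_1
```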
[0376] In some embodiments, the method 2800 further includes an operation of determining a spatial relationship between the first pixel and the second pixel, which may include one or more of: visualizing the first and second pixels within a common coordinate system; calculating a spatial distance in 2D or 3D between the first and second pixels; and determining whether or not the first and second pixels are within the same polony.
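A minimal sketch of the spatial-relationship checks in paragraph [0376], assuming pixel coordinates are (row, col) or (row, col, z) index tuples and that polony membership is available as an integer label map in which 0 marks background. The `voxel_size` scaling and the label-map representation are assumptions chosen for illustration.

```python
import math

import numpy as np


def pixel_distance(p, q, voxel_size=(1.0, 1.0, 1.0)):
    """Euclidean distance between two pixel coordinates in 2D or 3D.

    voxel_size scales each axis to physical units (an assumption); for 2D
    coordinates only its first two entries are used.
    """
    if len(p) != len(q):
        raise ValueError("coordinates must have the same dimensionality")
    return math.sqrt(sum(((a - b) * s) ** 2 for a, b, s in zip(p, q, voxel_size)))


def same_polony(label_map, p, q):
    """True if p and q carry the same nonzero polony label (0 = background)."""
    label_p = label_map[tuple(p)]
    label_q = label_map[tuple(q)]
    return bool(label_p != 0 and label_p == label_q)


labels = np.array([[1, 1, 0],
                   [0, 2, 2]])
print(pixel_distance((0, 0), (3, 4)), same_polony(labels, (0, 0), (0, 1)))  # 5.0 True
```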
[0377] In some embodiments, the method 2800 further comprises: (iv) in response to determining that a first pixel of the one or more pixels has a predicted classification that is different from a background classification, determining at least a first target of: a first morphological feature; a first RNA or mRNA; and a first protein, based on the one or more predicted classifications; and (v) in response to determining that a second pixel of the one or more pixels has a predicted classification that is different from the background classification, determining at least a second target, different from the first target, from: the first morphological feature; the first RNA or mRNA; and the first protein, based on the one or more predicted classifications. In some embodiments, the second target is a different type of target from the first target (e.g., a protein vs. a morphological feature), thereby advantageously enabling multi-omics analysis and research of the biological analyte(s) of interest using the methods herein. In some embodiments, the first target and the second target correspond to the biological analyte(s) of the sample. In some embodiments, the method 2800 further comprises: spatially aligning the locations of the first and the second targets based on the one or more predicted classifications; and determining a biological analyte of the sample immobilized on the support based on the spatial alignment.
[0378] In some embodiments, the methods 2800 herein advantageously allow spatial alignment, or in other words co-localization, of two or more different biological analytes using the neural network disclosed herein. Such different biological analytes may be of different types. For example, a first biological analyte may be a morphological feature, and a second biological analyte may be a protein or mRNA. Such different biological analytes may be sequenced within the same sequencing run in the same or different sequencing cycles. Exemplary embodiments of staining and sequencing different target analytes within cells or tissue are disclosed in PCT Application No. PCT/US2025/10310, filed January 3, 2025, the contents of which are incorporated by reference in their entirety. The number of different biological analytes may be limited by the availability of unique barcodes that can be used to differentiate each biological analyte from the others. For example, the number of different biological analytes can be in a range from 2 to 100, 4 to 350, 10 to 500, 50 to 1000, or more. For example, protein A may be localized within the nucleus of a specific cell type, while protein B may be localized adjacent to a certain transcript within the mitochondria but not within the cytosol, based on the prediction of intensities, base calling, and/or classification in one or more cycles using methods 500 or 2800. Identification of such different biological analytes may advantageously provide more information, e.g., spatial relationships, which may facilitate biological, physiological, or pathological analysis of the sample(s) being sequenced. In some embodiments, the biological analytes herein may be any physical features of the sample(s) or source of sample(s).
The detection, localization, and spatial alignment of the biological analytes may correspond to various physiological, biological, or pathological characteristics of cells or tissue, and may advantageously provide information that may advance the understanding of cellular function, regulation, and interactions, which in turn may advance existing biomedical research, including but not limited to more effective disease modeling and drug discovery efforts.
[0379] In some embodiments, the method 2800 further comprises an operation of (iv) determining a location of one or more of: a first morphological feature, a first RNA or mRNA, a first transcript, and a first protein based on the corresponding locations of the one or more predicted base calls or predicted classifications. In some embodiments, the method 2800 further comprises an operation of (v) determining a location of one or more of: a second morphological feature, a second RNA or mRNA, a second transcript, and a second protein based on the corresponding locations of one or more second predicted base calls or predicted classifications. In some embodiments, the method 2800 further comprises an operation of (vi) spatially aligning the location of one or more of: a second morphological feature, a second RNA or mRNA, and a second protein with the location of one or more of: the first morphological feature, the first RNA or mRNA, and the first protein; and an operation of (vii) determining a biological character of the sample immobilized on the support based on the spatial alignment.
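Spatial alignment of two sets of target locations can be illustrated with a nearest-neighbor co-localization test. This sketch assumes the locations are already expressed in a common coordinate system and that "co-localized" means the nearest second target lies within a chosen `radius`; both the criterion and the function name `colocalize` are illustrative, not taken from the disclosure. The brute-force distance matrix is adequate for small sets; a k-d tree would scale better.

```python
import numpy as np


def colocalize(first_xyz, second_xyz, radius):
    """Pair each first-target location with its nearest second-target location.

    Returns (i, j, distance) for every first target i whose nearest second
    target j lies within `radius` (same units as the coordinates).
    """
    a = np.asarray(first_xyz, dtype=float)
    b = np.asarray(second_xyz, dtype=float)
    if a.size == 0 or b.size == 0:
        return []
    # Pairwise Euclidean distances, shape (len(a), len(b)).
    d = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)
    nearest = d.argmin(axis=1)
    pairs = []
    for i, j in enumerate(nearest):
        if d[i, j] <= radius:
            pairs.append((i, int(j), float(d[i, j])))
    return pairs


# Hypothetical protein and transcript locations (row, col, z).
proteins = [(0, 0, 0), (10, 10, 0)]
transcripts = [(1, 0, 0), (50, 50, 0)]
print(colocalize(proteins, transcripts, radius=2.0))  # [(0, 0, 1.0)]
```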
[0380] In some embodiments, the method 2800 may include an operation of saving the base calls obtained in operation 2812 in a predetermined format, e.g., in a FASTQ file compatible with subsequent operations, so that subsequent analysis such as adaptor trimming and secondary analysis can be performed.
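FASTQ stores each read as four lines: an `@` identifier line, the base sequence, a `+` separator, and one quality character per base. A minimal writer is sketched below. The per-base quality scores are an assumption (the text does not say how they are derived) and are encoded with the common Phred+33 offset; the read identifiers are hypothetical.

```python
def phred33(q):
    """Encode one Phred quality score as a FASTQ character (Phred+33 offset)."""
    q = max(0, min(int(q), 93))  # printable range: '!' (Q0) to '~' (Q93)
    return chr(q + 33)


def fastq_record(read_id, bases, quals):
    """Format one read as the four FASTQ lines."""
    if len(bases) != len(quals):
        raise ValueError("need exactly one quality score per base")
    quality_line = "".join(phred33(q) for q in quals)
    return f"@{read_id}\n{bases}\n+\n{quality_line}\n"


def write_fastq(path, reads):
    """reads: iterable of (read_id, bases, quals) tuples, e.g. one per polony."""
    with open(path, "w", encoding="ascii") as handle:
        for read_id, bases, quals in reads:
            handle.write(fastq_record(read_id, bases, quals))


print(fastq_record("polony_000001", "ACGT", [40, 30, 20, 10]), end="")
```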
Predicting base calls using patches in flow cell images
[0381] In some embodiments, the method 2800 may include an operation 2812 of (iii) performing, by the processor, a corresponding base calling for each of the determined polonies. In some embodiments, the operation 2812 comprises extracting a plurality of patches from the second plurality of flow cell images based on the polony map. The polony map may be generated using various algorithms, for example, from operation 2804 or 2804’. In some embodiments, the operation 2812 further comprises providing input to the neural network, the input comprising the plurality of patches, wherein each patch comprises one or more patch images from the multiple color channels, and wherein each patch comprises at least a portion of the second plurality of flow cell images; and predicting a plurality of base calls using the neural network and based on the input, wherein each base call corresponds to a corresponding patch.
[0382] In some embodiments, each corresponding patch comprises a polony located at or in close vicinity to a center of the corresponding patch. For example, the polony may be no more than 1 to 10 pixels away from the center of the corresponding patch. In some embodiments, each patch comprises 3 to 128 pixels along a spatial dimension, e.g., along the x or y direction. The size of the patches is kept relatively small compared to the size of the flow cell images, e.g., 10x, 20x, 50x, 100x, 500x, or 1000x smaller than the size of the flow cell image. In some embodiments, the plurality of patches comprises 100 to 10⁸ patches. In some embodiments, two or more different patches may overlap at least partly with each other. In some embodiments, each patch may contain more than one, two, three, five, or ten polonies, but only the pixel(s) of the single polony at its center are used for generating the base call(s) corresponding to the patch. For example, when each patch includes a patch image sized 32 by 32, a first patch may include pixels 1-32 in both the x and y directions to cover a polony centered at pixel (16, 16) of the flow cell images, a second patch may include pixels 2-33 in both the x and y directions to cover a second polony centered at pixel (17, 17.5), and a third patch may include pixels 5-36 in both the x and y directions to cover a third polony centered at pixel (19, 19) of the flow cell images. In some embodiments, instead of using only the single polony for generating reference base calls, reference intensities, or predictions, a very limited number of polonies in each patch may be used. The very limited number of polonies can be in a range from 1 to 4, 1 to 8, 1 to 20, 1 to 50, or 1 to 100. The very limited number of polonies can be 100x, 1000x, 10⁴x, 10⁵x, 10⁶x, 10⁷x, or 10⁸x less than a total number of polonies in a corresponding flow cell image.
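Patch extraction around polony centers, as described in paragraphs [0381]-[0382], can be sketched with NumPy. Assumptions: the registered images of one cycle are stored as a `(channels, H, W)` array, polony centers come from the polony map as `(row, col)` pairs (sub-pixel centers are rounded with Python's `round`, which rounds ties to even), and borders are zero-padded so every patch has the same size. Overlapping patches arise naturally whenever two centers are closer than `patch_size`.

```python
import numpy as np


def extract_patches(images, centers, patch_size=32):
    """images: (channels, H, W); centers: (row, col) polony centers inside the image.

    Returns an array of shape (n_patches, channels, patch_size, patch_size) in
    which each polony center sits at index patch_size // 2 of its patch.
    """
    channels = images.shape[0]
    half = patch_size // 2
    # Zero-pad the spatial axes so patches near the border keep a fixed size.
    padded = np.pad(images, ((0, 0), (half, half), (half, half)), mode="constant")
    patches = np.empty((len(centers), channels, patch_size, patch_size), dtype=images.dtype)
    for k, (row, col) in enumerate(centers):
        r0, c0 = int(round(row)), int(round(col))
        # Original pixel (r0, c0) lives at (r0 + half, c0 + half) in `padded`,
        # so this window places it at offset `half` inside the patch.
        patches[k] = padded[:, r0:r0 + patch_size, c0:c0 + patch_size]
    return patches


imgs = np.random.default_rng(0).random((4, 256, 256))  # 4 color channels, one cycle
print(extract_patches(imgs, [(16, 16), (17, 17.5), (19, 19)]).shape)  # (3, 4, 32, 32)
```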
[0383] In some embodiments, the number of pixels within each patch can be optimized to balance the computational complexity and the spatial context information to be included for training the neural network(s). The number of patch images within each patch can be optimized to balance the computational complexity and the spatial context information within each patch for accurate and reliable prediction using the neural network. In some embodiments, the number of pixels within each patch can be at least partly based on the polony density of the sample being imaged. In some embodiments, each patch may include multiple pixels, but prediction may only be performed for a single polony at or near the center of the patch. Similarly, in training the neural network for predicting the base call, e.g., using method 2900, the reference base calls are only for a single polony at or near the center of the patch. In some embodiments, instead of the single polony, a very limited number of polonies in each patch may be used for training the neural network(s) or making predictions. The very limited number of polonies can be in a range from 1 to 4, 1 to 8, 1 to 20, 1 to 50, or 1 to 100. The very limited number of polonies can be 100x, 1000x, 10⁴x, 10⁵x, 10⁶x, 10⁷x, or 10⁸x less than a total number of polonies in a corresponding flow cell image.
[0384] In some embodiments, each patch may comprise multiple patch images corresponding to different color channels. For example, each patch may comprise patch images covering the same pixels within the x-y plane in three different color channels. The same pixels may be pixels determined after registration to correct for the spatial offset across different color channels. In some embodiments, each patch may comprise multiple patch images corresponding to different cycles, e.g., consecutive cycles n-1, n, n+1, within a sequencing run. For example, each patch may comprise 3 images, one from each of 3 different color channels, in each of 4 adjacent cycles, so that each patch comprises 12 patch images in total. When the sample is 3D, e.g., an in situ cell sample, each patch may include 5 different z levels, bringing the total number of patch images to 60.
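The patch-image counts in paragraph [0384] (3 channels × 4 cycles = 12 patch images, × 5 z levels = 60) follow from stacking the axes of a patch array. The sketch below assumes a `(z, cycle, channel, P, P)` layout, which is an illustrative choice, and shows two ways a network input might be formed: flattening everything into a stack of patch images for a 2D network, or keeping z as a spatial axis for a 3D network.

```python
import numpy as np


def flatten_patch_images(patch):
    """(z, cycle, channel, P, P) -> (z * cycle * channel, P, P) patch images."""
    z, cycles, channels, p, q = patch.shape
    return patch.reshape(z * cycles * channels, p, q)


def as_3d_input(patch):
    """(z, cycle, channel, P, P) -> (cycle * channel, z, P, P).

    z stays a spatial axis for a 3D CNN; cycles and channels become features.
    """
    z, cycles, channels, p, q = patch.shape
    return patch.transpose(1, 2, 0, 3, 4).reshape(cycles * channels, z, p, q)


patch = np.zeros((5, 4, 3, 32, 32))  # 5 z levels, 4 cycles, 3 channels
print(flatten_patch_images(patch).shape, as_3d_input(patch).shape)
# (60, 32, 32) (12, 5, 32, 32)
```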
[0385] In some embodiments, at least two patches of the plurality of patches comprise at least partially overlapping patch images that comprise some identical pixels. In some embodiments, each patch of the plurality of patches comprises pixels that at least partially overlap with another patch of the plurality of patches.
[0386] In some embodiments, the first plurality of flow cell images are acquired only from a single color channel so that flow cell images acquired from different color channels may require different neural networks for predicting high resolution intensities, base calls, classifications, etc., as disclosed herein.
[0387] In some embodiments, the first plurality of flow cell images are acquired only from a single z level, so that flow cell images acquired at different z levels of 3D sample(s), e.g., in situ cells, may require different neural networks for predicting high resolution intensities, base calls, classifications, etc., as disclosed herein.
[0388] In some embodiments, the first plurality of flow cell images are acquired from the one or more cycles. In some embodiments, the one or more cycles comprises a plurality of cycles in a sequencing run. In some embodiments, the one or more cycles comprises a current cycle N, and the first plurality of flow cell images are acquired from at least one cycle prior to the current cycle N. The current cycle N is the cycle of the sequencing run that is currently being performed. In some embodiments, the flow cell images may have been acquired in the current cycle N, but no flow cell images have yet been acquired in the next cycle N+1.
[0389] In some embodiments, the operation 2802 (ii) of providing, by the processor or the first reconfigurable logic device, the second plurality of flow cell images as the input to the neural network comprises: (ii) providing, by the processor or the first reconfigurable logic device, the second plurality of flow cell images as the input to the neural network without providing a polony map or locations of polonies in the second plurality of flow cell images as input to the neural network. In other words, the operation (ii) of method 2800 does not require a polony map, a location list of polonies, or the like to be provided as input to the neural network in order to predict the base calls. In some embodiments, the spatial locations of the polonies within the flow cell images, e.g., the second plurality of flow cell images, are not used in predicting the base calls using the neural network. Instead, each patch may contain relative spatial information of the polony with respect to the rest of the pixels in the same patch(es), which may be used for predicting the base calls using the neural network. The method 2800 may predict base calls, e.g., in operation 2812, without using the input of a polony map, a location list of polonies, or the like. Instead, the polony map, the location list of polonies, or the like may be used to extract the plurality of patches from the second plurality of flow cell images.
[0390] In some embodiments, the operation of predicting the plurality of base calls using the neural network and based on the input, wherein each base call corresponds to a corresponding patch, comprises: predicting a probability map for each channel of the multiple color channels corresponding to the corresponding patch; and determining the base call of the corresponding patch based on the probability maps. For example, for flow cell images from 4 different color channels, 4 different probability maps may be generated. Each probability map may have the same size and dimensions as the flow cell images or may cover at least a portion of the flow cell images. Each pixel in the probability map may have a probability value corresponding to the channel. As an example, pixel (12, 12) may have probability values of 0.2, 0.01, 0.2, and 0.59 in 4 different channels representing nucleotides A, T, C, and G, respectively, and the base call of pixel (12, 12) may be determined by the largest probability among the probabilities of the different color channels, which is 0.59 and corresponds to nucleotide G. The neural network may be trained to predict probability maps. In some embodiments, training of the neural network to predict probability maps can be based on reference polony maps or any equivalent information indicative of polony locations, e.g., a location list of polonies. In some embodiments, the neural network to predict probability maps can be trained by comparing each probability map to a corresponding reference polony map. In some embodiments, the neural network may be trained to minimize a loss function based on the comparison of the probability map and the corresponding reference polony map. For example, a probability map may be initialized to have random values in each pixel, and the neural network may be trained to produce higher values for pixel(s) corresponding to polonies than for pixels corresponding to non-polony structure(s) in the probability map.
In some embodiments, the values of each pixel across the probability maps of the different color channels may add up to a fixed number, e.g., 1, 10, 100, etc. As an example, pixel (24, 25) in 3 probability maps corresponding to 3 different color channels may have values of 0.24, 0.51, and 0.25, which add up to 1. In some embodiments, each base call corresponds to a corresponding patch which includes one or more patch images. In some embodiments, the operation of predicting the plurality of base calls using the neural network and based on the input comprises: generating a first single intensity for a first channel of the multiple color channels corresponding to the corresponding patch; and determining the base call of the corresponding patch based on the first single intensity. As an example, a first single intensity of a first color channel may be determined using prediction by the neural network disclosed herein. The first single intensity may or may not be normalized. The first single intensity may correspond to the single polony of the corresponding patch containing one or multiple patch images of the same polony at adjacent cycles of a sequencing run. The first single intensity may correspond to one of the adjacent cycles, e.g., a current cycle. A base call may be determined based on the first single intensity of the current cycle, e.g., by comparing the first single intensity with other intensities of the same polony from other color channels. The other intensities may be predicted similarly using the same or different neural networks.
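Paragraph [0390] can be illustrated by normalizing per-channel scores into probability maps and taking the per-pixel maximum. The sketch assumes the network emits one unnormalized score map per channel and that a softmax across channels makes each pixel's values sum to 1 (the text only says they sum to a fixed number); the channel order A, T, C, G follows the pixel (12, 12) example. `softmax` and `call_pixel` are illustrative helper names.

```python
import numpy as np

BASES = ("A", "T", "C", "G")  # channel order taken from the pixel (12, 12) example


def softmax(scores, axis=0):
    """Normalize per-channel scores so each pixel's values sum to 1."""
    shifted = scores - scores.max(axis=axis, keepdims=True)  # numerical stability
    e = np.exp(shifted)
    return e / e.sum(axis=axis, keepdims=True)


def call_pixel(prob_maps, row, col, bases=BASES):
    """prob_maps: (channels, H, W). Return (base, probability) of the top channel."""
    p = prob_maps[:, row, col]
    k = int(np.argmax(p))
    return bases[k], float(p[k])


probs = np.zeros((4, 32, 32))
probs[:, 12, 12] = [0.2, 0.01, 0.2, 0.59]
print(call_pixel(probs, 12, 12))  # ('G', 0.59)
```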
[0391] In some embodiments, the method further comprises an operation of predicting a second single intensity for a second channel of the multiple color channels corresponding to the corresponding patch using a second neural network; and determining the base call of the corresponding patch based on at least the first single intensity and the second single intensity.
[0392] In some embodiments, the method further comprises an operation of predicting a second single intensity for a second channel of the multiple color channels corresponding to the corresponding patch using a second neural network or the same first neural network; and an operation of predicting a third single intensity for a third channel of the multiple color channels corresponding to the corresponding patch using a third neural network or the same first neural network; and determining the base call of the corresponding patch based on at least the first, second, and third single intensities. For example, at a current cycle N, the first, second, and third intensities may be predicted using different neural networks (e.g., each of the neural networks may be trained using different training data but with identical neural network layers and numbers of parameters) to be 50, 690, 80 for the same polony. The base call of the polony may correspond to the nucleotide that lights up in the second color channel with an intensity of 690 but not the first or third color channel.
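The intensity-based alternative in paragraphs [0391]-[0392] reduces to choosing the channel that lights up. The sketch below assumes the per-channel intensities of the current cycle have already been predicted, and it adds a hypothetical ambiguity rule (`min_ratio`) that is not in the text. The channel-to-nucleotide mapping is also a placeholder, since the example does not name the nucleotide.

```python
def call_from_intensities(intensities, channel_to_base, min_ratio=2.0):
    """Return the base of the brightest channel, or None if the call is ambiguous.

    Hypothetical rule: a call is ambiguous when the brightest channel is not at
    least `min_ratio` times as bright as the runner-up channel.
    """
    if len(intensities) != len(channel_to_base) or len(intensities) < 2:
        raise ValueError("need one intensity per channel and at least two channels")
    order = sorted(range(len(intensities)), key=lambda i: intensities[i], reverse=True)
    best, runner_up = order[0], order[1]
    if intensities[runner_up] > 0 and intensities[best] < min_ratio * intensities[runner_up]:
        return None
    return channel_to_base[best]


# Intensities 50, 690, 80 from the example; the mapping is a placeholder.
print(call_from_intensities([50, 690, 80], ("A", "C", "G")))  # C
```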
[0393] In some embodiments, the operation (iii) of predicting, by the first reconfigurable device or the integrated circuit, one or more base calls corresponding to one or more polonies of the second plurality of flow cell images using the neural network comprises: determining two or more pixels of the second plurality of flow cell images as duplications of a single polony; and selecting one pixel of the two or more pixels as a center of the single polony. In some embodiments, the two or more pixels may be at the same z level. In some embodiments, the two or more pixels may be at different z levels. Exemplary embodiments of the operation of determining two or more pixels of the second plurality of flow cell images as duplications of a single polony and selecting one pixel of the two or more pixels as a center of the single polony are disclosed in PCT Application No. PCT/US23/76125, the contents of which are incorporated herein by reference in their entirety.
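The referenced application describes the duplicate-merging procedure, which is not reproduced here. As a generic stand-in, greedy non-maximum suppression captures the idea of keeping one pixel per polony: candidates are visited from highest to lowest score, and any candidate within `radius` of an already kept pixel is treated as a duplicate. The scores, the radius, and the use of NMS itself are assumptions. Coordinates may include a z index, so duplicates at different z levels are merged as well.

```python
import math


def suppress_duplicates(candidates, radius):
    """candidates: list of (coords, score), coords a 2D or 3D tuple.

    The highest-scoring pixel of each group becomes the polony center; any
    lower-scoring pixel within `radius` of a kept center is dropped.
    """
    kept = []
    for coords, score in sorted(candidates, key=lambda c: c[1], reverse=True):
        if all(math.dist(coords, center) > radius for center, _ in kept):
            kept.append((coords, score))
    return kept


pixels = [((10, 10, 2), 0.9), ((10, 11, 2), 0.7), ((10, 10, 3), 0.8), ((30, 30, 2), 0.6)]
print(suppress_duplicates(pixels, radius=1.5))
# [((10, 10, 2), 0.9), ((30, 30, 2), 0.6)]
```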
[0394] Although embodiments herein are disclosed with a focus on using and training neural networks, other artificial intelligence-based models may also be used for similar purposes. In some embodiments, the methods 500 and 2800 herein may be performed using artificial intelligence-based models other than neural networks. In some embodiments, the methods 600, 700 and 2900 may be used to train artificial intelligence-based models other than neural networks for making predictions or inferences using methods 500 or 2800. Some non-limiting examples of the artificial intelligence-based models include: random forest, decision tree, k-means clustering, and gradient boosted tree. In some embodiments, the artificial intelligence-based models may be used to predict intensities, classifications, or base calls from intensities of flow cell images and/or the high resolution flow cell images. In some embodiments, the artificial intelligence-based models other than neural networks may predict intensities, classifications, or base calls using information that includes only intensities, and such information may lack the spatial context of the intensities, the shapes of the polonies, background noise, signal from other cellular structures, etc. In some embodiments, the neural networks herein predict intensities, classifications, or base calls by advantageously using the flow cell images or high resolution flow cell images, which include not only the intensities but also other information, including but not limited to background noise, polony sizes and shapes, spatial relationships among polonies, etc., for more accurate predictions or inferences.
[0395] In some embodiments, the neural network herein is a convolutional neural network (CNN). In some embodiments, the neural network is a 3D CNN. In some embodiments, the neural network is a 2D CNN. In some embodiments, the neural network comprises one or more convolutional layers. In some embodiments, the neural network is a recurrent neural network (RNN). In some embodiments, the neural network is a 3D RNN. In some embodiments, the neural network is a 2D RNN. In some embodiments, the neural network comprises one or more long short-term memory (LSTM) layers. In some embodiments, the neural network is a U-Net. In some embodiments, the neural network includes a residual network (ResNet). In some embodiments, the neural network can include a transformer-based model such as a vision transformer (ViT). In some embodiments, the neural network comprises a U-Net with a first predetermined repetition of down-sampling and convolution operations and then a second predetermined repetition of up-sampling, concatenation, and convolution operations. The first and second predetermined repetitions can have an identical quantity, e.g., 3 or 4. In some embodiments, the neural network is a U-Net with a first predetermined number of filters in each repetition of down-sampling, and then a second predetermined number of filters in each repetition of up-sampling and/or concatenation. For example, the first predetermined number of filters can be 32, 64, 128, and 256 filters in three repetitions and the second predetermined number can be 128, 64, 64, and 32 filters in the corresponding three repetitions. As another example, the first predetermined number of filters can be 32, 64, 128, and 256 filters in three repetitions and the second predetermined number can be 256, 128, 64, and 32 filters in the corresponding three repetitions.
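The U-Net filter schedule in paragraph [0395] can be checked by tracing feature-map shapes. This sketch assumes 2D "same"-padded convolutions, filters doubling at each down-sampling level (32, 64, 128, then 256 at the bottleneck), and plain up-sampling that keeps the channel count before the skip concatenation; these are illustrative choices rather than the exact disclosed architecture.

```python
def unet_shape_trace(height, width, in_channels, base_filters=32, depth=3, factor=2):
    """List (stage, (H, W, C)) for a 2D U-Net with 'same'-padded convolutions."""
    trace = [("input", (height, width, in_channels))]
    skips = []
    h, w, filters = height, width, base_filters
    for level in range(depth):  # encoder: convolve, keep a skip, then down-sample
        trace.append((f"enc{level}", (h, w, filters)))
        skips.append(filters)
        if h % factor or w % factor:
            raise ValueError("spatial size must be divisible by the down-sampling factor")
        h, w, filters = h // factor, w // factor, filters * 2
    trace.append(("bottleneck", (h, w, filters)))
    for level in reversed(range(depth)):  # decoder: up-sample, concatenate, convolve
        skip_filters = skips[level]
        h, w = h * factor, w * factor
        trace.append((f"dec{level}_concat", (h, w, filters + skip_filters)))
        filters = skip_filters
        trace.append((f"dec{level}", (h, w, filters)))
    return trace


for stage, shape in unet_shape_trace(512, 512, 1):
    print(stage, shape)
```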
[0396] In some embodiments, the operation 2812 may comprise: performing, by the processor, a first convolution in one or more dimensions on the first plurality of flow cell images, thereby generating a first convolution result; and repetitively performing, for one or more times, down-sampling operations comprising: (a) performing, by the processor, a second convolution in one or more dimensions on the first convolution result, thereby generating a second convolution result; and (b) performing, by the processor, a down-sampling of the second convolution result by a down-sampling factor, thereby generating a first down-sampled result. In each repetition, the second convolution may comprise a corresponding number of filters, thereby generating a third convolution result after the repetitions.
[0397] In some embodiments, the operation 2812 may further comprise: performing, by the processor, the second convolution in one or more dimensions on the third convolution result, thereby generating a fourth convolution result; and repetitively performing, for one or more times, up-sampling operations comprising: (c) performing, by the processor, an up-sampling of the fourth convolution result by an up-sampling factor, thereby generating a first up-sampled result; and (d) performing, by the processor, the second convolution in one or more dimensions on the first up-sampled result, thereby generating a fifth convolution result. In each repetition, the second convolution may comprise a corresponding number of filters, thereby generating a sixth convolution result after the repetitions.
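Operations (a)-(d) can be made concrete with a small NumPy encoder-decoder. The sketch assumes "same" zero padding, ReLU activations, max pooling as the down-sampling, and nearest-neighbor repetition as the up-sampling; the text only specifies a down-sampling and an up-sampling factor, so these choices and the random weights are placeholders for a trained network.

```python
import numpy as np


def conv2d_same(x, kernels):
    """x: (H, W, C_in); kernels: (k, k, C_in, C_out), k odd. Zero 'same' padding, stride 1."""
    k = kernels.shape[0]
    pad = k // 2
    h, w, _ = x.shape
    xp = np.pad(x, ((pad, pad), (pad, pad), (0, 0)))
    out = np.zeros((h, w, kernels.shape[3]))
    for i in range(k):
        for j in range(k):
            # Add the contribution of kernel tap (i, j) for every pixel at once.
            out += xp[i:i + h, j:j + w, :] @ kernels[i, j]
    return out


def relu(x):
    return np.maximum(x, 0.0)


def downsample_max(x, factor=2):
    """Max pooling with a square window and stride equal to `factor`."""
    h, w, c = x.shape
    if h % factor or w % factor:
        raise ValueError("spatial size must be divisible by the factor")
    return x.reshape(h // factor, factor, w // factor, factor, c).max(axis=(1, 3))


def upsample_nearest(x, factor=2):
    """Nearest-neighbor up-sampling: repeat every pixel factor x factor times."""
    return x.repeat(factor, axis=0).repeat(factor, axis=1)


def tiny_encoder_decoder(x, n_filters=4, seed=0):
    """One down-sampling level and one up-sampling level with random weights."""
    rng = np.random.default_rng(seed)
    c_in = x.shape[-1]
    k_enc = rng.standard_normal((3, 3, c_in, n_filters))
    k_mid = rng.standard_normal((3, 3, n_filters, 2 * n_filters))
    k_dec = rng.standard_normal((3, 3, 3 * n_filters, n_filters))
    enc = relu(conv2d_same(x, k_enc))                    # (H, W, n)
    mid = relu(conv2d_same(downsample_max(enc), k_mid))  # (H/2, W/2, 2n)
    up = upsample_nearest(mid)                           # (H, W, 2n)
    merged = np.concatenate([up, enc], axis=-1)          # skip connection: (H, W, 3n)
    return relu(conv2d_same(merged, k_dec))              # (H, W, n)


print(tiny_encoder_decoder(np.ones((16, 16, 1))).shape)  # (16, 16, 4)
```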
[0398] In some embodiments, the first convolution comprises a 3D convolution with a convolution kernel. In some embodiments, the convolutional kernel may have 4 dimensions. In some embodiments, the convolutional kernel is m x m x m for the first three spatial dimensions, and the size of its fourth dimension is determined by the filter number in the corresponding repetition. In some embodiments, m can be an integer in the range of 2 to 20. For example, the input can be 512x512 flow cell images, and the z-stack can have 12 slices. The first convolution can include 32 filters, and each filter has one kernel that is 3x3x3x1. The output from that convolutional block is 512x512x12x32. Then there is a double convolutional block, i.e., the second convolution having two first convolutions with 32 filters. The input to both of those blocks is 512x512x12x32 and the output is 512x512x12x32. Each filter uses a kernel sized 3x3x3x32. The number of filters may correspond to features of the input.
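The 3D example in paragraph [0398] can be sanity-checked with simple arithmetic. Assuming stride 1, "same" padding, and one bias per filter (assumptions, since the text gives only kernel and output sizes), the first 3x3x3x1 convolution with 32 filters has 896 parameters, each 3x3x3x32 layer of the double block has 27,680, and a single 512x512x12x32 float32 feature map occupies 384 MiB, which illustrates why patch and filter sizes are balanced against computational cost.

```python
import math


def conv3d_params(kernel=(3, 3, 3), in_channels=1, filters=32, bias=True):
    """Weights (+ biases) of one 3D convolution layer."""
    weights = math.prod(kernel) * in_channels * filters
    return weights + (filters if bias else 0)


def activation_bytes(shape, bytes_per_value=4):
    """Memory for one feature map, e.g. (H, W, Z, filters) stored as float32."""
    return math.prod(shape) * bytes_per_value


first = conv3d_params(in_channels=1, filters=32)        # 3x3x3x1 kernels
double = 2 * conv3d_params(in_channels=32, filters=32)  # two 3x3x3x32 layers
print(first, double, activation_bytes((512, 512, 12, 32)) / 2**20)  # 896 55360 384.0
```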
[0399] In some embodiments, the second convolution comprises two 3D convolutional layers, e.g., as shown in the pseudo code. In other words, the second convolution comprises two repetitions, or blocks, of the first convolution in 3D, with changes in how the output is used and in the number of filters, as the convolution process increases the depth of the image. The depth of the image may increase as the number of features or filters increases. In some embodiments, the first and second resolutions are in 2D or 3D.
[0400] In some embodiments, the first convolution comprises a 2D convolution with a convolution kernel. In some embodiments, the convolutional kernel may have 3 dimensions. In some embodiments, the convolutional kernel is m x m for the first two spatial dimensions and the size of its third dimension is determined by the filter number in the corresponding repetition. In some embodiments, m can be an integer in the range of 2 to 20. For example, the input can be flow cell images with a size of 512x512x1. The first convolution can include 64 filters and each filter has one kernel that is 3x3x1. The output from that convolutional block is 512x512x64. Then there is a double convolutional block, i.e., the second convolution having two first convolutions with 32 filters. The input to both of those blocks is 512x512x64 and the output is 512x512x32. Each filter can use a kernel sized 3x3x32.
[0401] In some embodiments, the second convolution comprises at least two convolutional layers or exactly two convolutional layers, e.g., as shown in the pseudo codes. In other words, the second convolution comprises two repetitions, or blocks, of the first convolution, with changes in how the output is used and in the number of filters, as the convolution process increases the depth of the image. The depth of the image may increase as the number of features or filters increases. In some embodiments, the first and second resolutions are in 2D or 3D.
[0402] In some embodiments, the second convolution in operation (a) comprises a corresponding number of n, 2*n, 4*n, and 8*n filters in a first, second, third, and fourth repetition, respectively. In some embodiments, the second convolution in operation (c) comprises a corresponding number of 2*n, 2*n, 4*n, 8*n filters in a last repetition, last minus one, last minus two, and last minus three repetition, respectively. In some embodiments, n can be an integer in the range from 8 to 256. For example, operation (a) comprises 32, 64, 128, and 256 filters in three repetitions and operation (c) comprises 128, 64, 64, and 32 filters in the corresponding three repetitions.
[0403] In some embodiments, the second convolution in operation (c) comprises a corresponding number of n, 2*n, 4*n, 8*n filters in a last repetition, last minus one, last minus two, and last minus three repetition, respectively. For example, operation (a) comprises 32, 64, 128, and 256 filters in four repetitions and operation (c) comprises 256, 128, 64, and 32 filters in the corresponding four repetitions.
[0404] In some embodiments, the second convolution in operation (c) comprises a corresponding number of n, 2*n, and 4*n filters in a last repetition, last minus one, and last minus two repetitions, respectively. For example, operation (a) comprises 32, 64, 128 filters in three repetitions and operation (c) comprises 128, 64, and 32 filters in the corresponding three repetitions.
[0405] In some embodiments, the method 2800 may further comprise: performing, by the processor, the first convolution in one or more dimensions on the sixth convolution result, thereby generating a seventh convolution result; and predicting, by the processor, the second plurality of flow cell images based on the seventh convolution result. Each of the second plurality of flow cell images may correspond to the corresponding flow cell image of the first plurality of flow cell images with a second resolution that is 2, 4, 6, 8, 10, 12, or 16 times greater than the first resolution in one or more spatial dimensions. In some embodiments, the second resolution is at least 4, 6, or 8 times greater than the first resolution in all three dimensions.
[0406] In some embodiments, the first plurality of flow cell images are from a single color channel. In some embodiments, the first plurality of flow cell images are from one or more color channels. In some embodiments, the first plurality of flow cell images are of unbalanced nucleotide diversity in one or more sequencing cycles. In some embodiments, the cellular sample comprises overloaded concatemer molecules with a spatial density in a range of 10^2 to 10^15 per mm^2. In some embodiments, the cellular sample comprises overloaded concatemer molecules with a spatial density in a range of 10^3 to 10^10 per mm^2.
[0407] In some embodiments, the first resolution is in a range of 0.1 um to 5 um. In some embodiments, the first resolution is in a range of 0.01 um to 10 um. In some embodiments, the second resolution is in a range of 0.02 um to 2 um. In some embodiments, the second resolution is in a range of 0.001 um to 3 um. In some embodiments, the down-sampling factor is 2, 4, 6, 8, 16, or more. In some embodiments, the up-sampling factor is 2, 4, 6, 8, 16, or more.
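Reading "resolution" here as the sampling interval (pixel size) per axis, which is an assumption, the relationship between the first resolution, the up-sampling factor, and the second resolution is simple arithmetic. A minimal sketch with a hypothetical helper `upsampled_geometry` and illustrative numbers chosen from within the stated ranges:

```python
def upsampled_geometry(shape, pixel_size_um, factors):
    """Return (new_shape, new_pixel_size_um) after integer up-sampling per axis.

    Assumption: each axis is up-sampled by an integer factor, so the array grows
    by that factor and the sampling interval shrinks by the same factor.
    """
    if not (len(shape) == len(pixel_size_um) == len(factors)):
        raise ValueError("shape, pixel_size_um and factors must have the same length")
    new_shape = tuple(s * f for s, f in zip(shape, factors))
    new_pixel = tuple(p / f for p, f in zip(pixel_size_um, factors))
    return new_shape, new_pixel


# A (z, y, x) stack sampled at 0.4 um in x/y, up-sampled 4x laterally only.
print(upsampled_geometry((10, 64, 64), (0.5, 0.4, 0.4), (1, 4, 4)))
```

Here a 0.4 um first resolution (within 0.1 um to 5 um) becomes a 0.1 um second resolution (within 0.02 um to 2 um), while the z axis is left unchanged.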
[0408] In some embodiments, one or more operations, e.g., operations 2810, 2802, 2804, 2804’, 2806, and 2812, are performed while a sequencing run is in progress. In some embodiments, one or more operations are performed in parallel with the corresponding sequencing run to reduce sequencing analysis time.
[0409] In some embodiments, the sequencing analysis time includes a total time required from when the raw flow cell images are acquired in each cycle of a sequencing run to when the base calls for each cycle of the sequencing run are generated.
[0410] In some embodiments, the sequencing analysis time includes a total time required from when a sequencing run starts to when the base calls for each cycle of the sequencing run are generated.
[0411] In some embodiments, the sequencing analysis time includes a first time duration to complete a sequencing run and a second time duration to generate base calls for the sequencing run. The first and second time durations may overlap at least partly with each other (e.g., performing base calling while the sequencing run is still in progress) to reduce the sequencing analysis time.
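The benefit of overlapping the two durations can be illustrated with a two-stage pipeline model. This is a sketch under stated assumptions: each cycle has a fixed acquisition time and a fixed analysis time, analysis of cycle k may run while cycle k+1 is being acquired, and the per-cycle times below are illustrative values, not figures from the disclosure.

```python
def sequential_time(cycles, acquire_s, analyze_s):
    """Base calling starts only after the whole run has finished."""
    return cycles * acquire_s + cycles * analyze_s


def overlapped_time(cycles, acquire_s, analyze_s):
    """Analysis of cycle k overlaps acquisition of cycle k + 1 (two-stage pipeline)."""
    if cycles == 0:
        return 0
    # First acquisition, then one bottleneck stage per remaining cycle, then the last analysis.
    return acquire_s + (cycles - 1) * max(acquire_s, analyze_s) + analyze_s


# Illustrative: 100 cycles, 300 s chemistry/imaging and 120 s analysis per cycle.
print(sequential_time(100, 300, 120), overlapped_time(100, 300, 120))
```

When per-cycle analysis is faster than per-cycle acquisition, the total approaches the run time plus a single cycle of analysis.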
[0412] In some embodiments, the one or more cycles comprise a current cycle N. N may be in a range from 1 to 1000. In some embodiments, one or more operations are performed while the sequencing reactions in cycles subsequent to the current cycle N are yet to be performed or are currently being performed.
[0413] In some embodiments, the training data set of training flow cell images comprises z-stacks of training flow cell images taken at different z-locations. Each z-stack may represent an individual FOV of one or more 3D samples, e.g., an in situ cellular sample. In some embodiments, the z-axis is orthogonal to the image planes of the flow cell images.
[0414] In some embodiments, the training data set of training flow cell images comprises flow cell images from multiple sequencing cycles. One or more sequencing cycles may be of unbalanced nucleotide diversity, so that their images appear dimmer or contain fewer polonies than images from sequencing cycles of high nucleotide diversity. In other words, the number of polonies in the training flow cell images in a particular cycle may vary from 1% to 99% of the total number of polonies within a FOV of that cycle. When the number of polonies in the training flow cell image of a particular cycle is 1% to 5% or 1% to 10% of the total number of polonies within that cycle, the cycle is of low or unbalanced diversity. When the number of polonies in the training flow cell image of a particular cycle is greater than 10% or 15% of the total number of polonies within that cycle, the cycle is of high or balanced diversity.
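The diversity rule above reduces to a threshold on the fraction of polonies visible in a cycle. A minimal sketch, assuming a single cut-off at 10% (the text also mentions 5% and 15% variants); the function name `classify_diversity` is hypothetical:

```python
def classify_diversity(polonies_in_image, total_polonies, low_fraction=0.10):
    """Label a cycle 'low' (unbalanced) or 'high' (balanced) nucleotide diversity.

    Assumption: one threshold; fractions at or below `low_fraction` are 'low'.
    """
    if total_polonies <= 0:
        raise ValueError("total_polonies must be positive")
    fraction = polonies_in_image / total_polonies
    return "low" if fraction <= low_fraction else "high"


print(classify_diversity(50, 1000), classify_diversity(500, 1000))
```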
[0415] In some embodiments, the training data set of training flow cell images comprises flow cell images from multiple samples and multiple sequencing cycles, and the training flow cell images include a subset of flow cell images with unbalanced diversity in multiple sequencing cycles and another subset of flow cell images with balanced diversity in multiple sequencing cycles.
[0416] In some embodiments, the operation of performing, by the processor, the first convolution in one or more dimensions on the first plurality of flow cell images comprises: performing, by the processor, a first convolution in 3D on the first plurality of flow cell images, thereby generating a first convolution result. In some embodiments, operation (a) comprises: performing, by the processor, the second convolution in 3D on the first convolution result, thereby generating a second convolution result.
[0417] In some embodiments, the operation of performing, by the processor, the first convolution in one or more dimensions on the first plurality of flow cell images comprises: performing, by the processor, a first convolution in 2D on the first plurality of flow cell images, thereby generating a first convolution result. In some embodiments, operation (a) comprises: performing, by the processor, the second convolution in 2D on the first convolution result, thereby generating a second convolution result.
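The practical difference between the 3D convolution of [0416] and the 2D convolution of [0417] is whether the kernel mixes neighbouring z-levels. A minimal NumPy sketch, assuming "valid" padding and cross-correlation (the usual CNN convention); `conv3d_valid` and `conv2d_per_slice` are hypothetical helper names, not the disclosed implementation:

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def conv3d_valid(volume, kernel):
    """3D cross-correlation of a (z, y, x) volume with a (kz, ky, kx) kernel."""
    windows = sliding_window_view(volume, kernel.shape)
    return np.einsum("zyxabc,abc->zyx", windows, kernel)


def conv2d_per_slice(volume, kernel):
    """Apply the same 2D (ky, kx) kernel to every z-slice independently."""
    windows = sliding_window_view(volume, kernel.shape, axis=(1, 2))
    return np.einsum("zyxab,ab->zyx", windows, kernel)


# A single bright voxel at z = 2: the 3D kernel spreads it across z, the 2D one does not.
spike = np.zeros((5, 8, 8))
spike[2, 4, 4] = 1.0
print(conv3d_valid(spike, np.ones((3, 3, 3))).sum(axis=(1, 2)))
print(conv2d_per_slice(spike, np.ones((3, 3))).sum(axis=(1, 2)))
```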
[0418] In some embodiments, repetitively performing operations comprising (c) and (d) one or more times comprises repetitively performing operations comprising (c), (d), and (e) one or more times, wherein operation (e) is after operation (c) and before operation (d), and wherein operation (e) comprises: concatenating, by the processor, the first up-sampled result in a current up-sampling repetition with the first down-sampled result in a previous down-sampling repetition, wherein the first up-sampled result has the same size as the first down-sampled result in the previous down-sampling repetition. In some embodiments, operation (e) is performed in each repetition. In other words, repetitively performing operations comprising (c) and (d) one or more times comprises performing operations comprising (c), (d), and (e) in each of the one or more repetitions.
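Operation (e) is the familiar U-Net skip connection. A minimal NumPy sketch, assuming (channels, y, x) feature maps, nearest-neighbour 2x up-sampling as a stand-in for operation (c), and concatenation along the channel axis; `upsample2x` and `concat_skip` are hypothetical names:

```python
import numpy as np


def upsample2x(features):
    """Nearest-neighbour 2x up-sampling of a (channels, y, x) map (stand-in for (c))."""
    return features.repeat(2, axis=1).repeat(2, axis=2)


def concat_skip(upsampled, skip):
    """Operation (e): join the up-sampled result with the same-size down-sampled result."""
    if upsampled.shape[1:] != skip.shape[1:]:
        raise ValueError(f"spatial sizes differ: {upsampled.shape[1:]} vs {skip.shape[1:]}")
    return np.concatenate([upsampled, skip], axis=0)


deep = np.zeros((8, 8, 8))   # deeper features, half the spatial size
skip = np.ones((4, 16, 16))  # result saved from the matching down-sampling repetition
print(concat_skip(upsample2x(deep), skip).shape)
```

The size check mirrors the requirement that the up-sampled result have the same size as the down-sampled result it is concatenated with.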
[0419] The kernel may be of any size smaller than the size of the flow cell image undergoing the convolution. For example, for an opening operation, the kernel can be 2 by 2 by 2, 3 by 3 by 3, 4 by 4 by 4, 5 by 5 by 5, 6 by 6 by 6, or 10 by 10 by 10 in the first three spatial dimensions. In some embodiments, the kernel size can be customized to remove at least some of the noise and unwanted signals that are larger than the kernel size. In some embodiments, the kernel can be circular. The kernel can also take various other shapes.
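One common way an opening removes background larger than the kernel is the white top-hat: subtract the grayscale opening from the image. The sketch below assumes a box-shaped kernel and a pure-NumPy min/max filter; the helper names are hypothetical, and a library routine such as a dedicated morphology package could be used instead.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def _window_reduce(volume, size, reducer, pad_value):
    """Apply `reducer` (np.min or np.max) over a sliding box of shape `size`."""
    pad = [(s // 2, s - 1 - s // 2) for s in size]
    padded = np.pad(volume, pad, mode="constant", constant_values=pad_value)
    windows = sliding_window_view(padded, size)
    return reducer(windows, axis=tuple(range(-len(size), 0)))


def grey_opening(volume, size):
    """Grayscale opening: erosion (min filter) followed by dilation (max filter)."""
    volume = np.asarray(volume, dtype=float)
    eroded = _window_reduce(volume, size, np.min, np.inf)
    return _window_reduce(eroded, size, np.max, -np.inf)


def white_top_hat(volume, size):
    """Image minus its opening: keeps spots smaller than the box, drops larger background."""
    return np.asarray(volume, dtype=float) - grey_opening(volume, size)


vol = np.full((5, 20, 20), 10.0)  # smooth background
vol[2, 10, 10] = 100.0            # one polony-sized spot
print(white_top_hat(vol, (3, 3, 3)).max())
```

The padding values (+inf for erosion, -inf for dilation) keep the image border from artificially lowering or raising the filtered values.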
[0420] In some embodiments, the focus of the optical system spans a range along the z axis, e.g., 0.1 um, 0.2 um, 0.3 um, 0.5 um, 0.6 um, 0.8 um, 1 um, 2 um, 3 um, 4 um, 5 um, etc. Polonies or clusters within the focus range can appear in focus or approximately in focus in the flow cell image. Flow cell images at a specific z level can also include signals from polonies or clusters that are outside the focus range of the image, at different z levels; such polonies or clusters appear out of focus. As shown in FIG. 3A, larger and blurred signal spots represent out-of-focus polonies or clusters. Some of the out-of-focus polonies or clusters are circled in FIG. 3A.
[0421] Each flow cell image at a specific z level can also include noise caused by the optical system and/or undesired signals from the sample. The undesired signals can come from components of the sample such as membranes, cytosol, and mitochondria. Such background objects can be any objects that are relatively larger than the polonies or clusters. As shown in FIG. 3A, there is a blurry cellular contour (at the arrows) in the flow cell image, and most of the signal spots are contained within the blurry contour. In some embodiments, background objects can include any objects within the 3D sample that are not polonies or clusters.
[0422] In some embodiments, the method 2800 includes an operation of registering the second plurality of flow cell images. In some embodiments, the images are registered across channels and/or across different cycles. In some embodiments, the flow cell images are registered before any base calling is performed in operation 2812, 2804, or 2804’. In some embodiments, the images are registered across channels and different cycles before generating or obtaining the 3D polony maps. In some embodiments, the flow cell images are registered across channels and different cycles before one or more primary analysis steps herein. In some embodiments, the flow cell images can be registered after one or more preprocessing operations disclosed herein are performed. Various image registration techniques can be used to register the flow cell images. The flow cell images can be registered using 2D or 3D registration techniques.
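Phase correlation is one well-known registration technique that fits here; it is named as an illustration, not as the technique the disclosure uses. A minimal 2D sketch, assuming a pure integer translation with wrap-around; `estimate_translation` is a hypothetical helper:

```python
import numpy as np


def estimate_translation(reference, moving):
    """Integer (row, col) shift that, passed to np.roll, aligns `moving` to `reference`."""
    f_ref = np.fft.fft2(reference)
    f_mov = np.fft.fft2(moving)
    cross_power = f_ref * np.conj(f_mov)
    cross_power /= np.abs(cross_power) + 1e-12  # keep phase only
    correlation = np.fft.ifft2(cross_power).real
    peak = np.unravel_index(np.argmax(correlation), correlation.shape)
    # Peaks past the midpoint correspond to negative (wrapped-around) shifts.
    return tuple(int(p) if p <= n // 2 else int(p) - n for p, n in zip(peak, correlation.shape))


rng = np.random.default_rng(0)
template = rng.random((64, 64))
cycle_image = np.roll(template, (3, -5), axis=(0, 1))  # simulated stage drift
shift = estimate_translation(template, cycle_image)
print(shift)
```

Real flow cell images would additionally need sub-pixel refinement and handling of non-periodic borders, which this sketch omits.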
[0423] In some embodiments, the operation of registering the flow cell images is with respect to a reference coordinate system. In some embodiments, the operation of registering the flow cell images is with respect to one or more template images. Registering the images can comprise generating the one or more template images in a reference coordinate system. In some embodiments, registering the images can comprise registering polonies to template polonies in the one or more template images. Registering the images can comprise determining a plurality of transformations based on the one or more template images. Each of the plurality of transformations can correspond to a respective subtile of the flow cell images, the processed images, or the filtered images and can be configured to register that subtile to the one or more template images. Each transformation can be used to register a corresponding subtile or tile to the one or more template images. The plurality of transformations can comprise one or more affine transformations.
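Given matched polony and template-polony coordinates, an affine transformation per subtile can be fitted by linear least squares. A minimal sketch, assuming correspondences are already known and represented as (x, y) arrays per subtile; all function names are hypothetical:

```python
import numpy as np


def fit_affine(src_xy, dst_xy):
    """Least-squares 2x3 affine T with dst ~= src @ T[:, :2].T + T[:, 2]."""
    src = np.asarray(src_xy, dtype=float)
    dst = np.asarray(dst_xy, dtype=float)
    if src.shape != dst.shape or src.ndim != 2 or src.shape[0] < 3:
        raise ValueError("need at least 3 matched (x, y) points")
    design = np.hstack([src, np.ones((src.shape[0], 1))])  # columns: x, y, 1
    params, *_ = np.linalg.lstsq(design, dst, rcond=None)   # shape (3, 2)
    return params.T


def apply_affine(transform, xy):
    """Map (x, y) points through a 2x3 affine transform."""
    xy = np.asarray(xy, dtype=float)
    return xy @ transform[:, :2].T + transform[:, 2]


def fit_subtile_transforms(matches):
    """matches: {subtile_id: (polony_xy, template_xy)} -> {subtile_id: 2x3 affine}."""
    return {tile: fit_affine(src, dst) for tile, (src, dst) in matches.items()}
```

Fitting each subtile separately lets the transforms follow local distortions that a single whole-image transform would miss.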
[0424] In some embodiments, the operation of registering the images can comprise performing image registration of the polonies based on fiducial markers. The fiducial markers can be located on the flow cell. Alternatively, the fiducial markers can be external to the flow cell.
[0425] In some embodiments, the image registration herein is configured to align images from different cycles and/or different channels, for example, with respect to a template image or a reference coordinate system. In some embodiments, the image registration herein is configured to register polonies or clusters from different cycles and/or different channels, e.g., in the filtered image, to a template image or a reference coordinate system.
[0426] For example, the base calling can be performed using the filtered images from different channels in cycle N after the filtered images from different channels are registered relative to the corresponding template image disclosed herein.
[0427] For each polony in the polony map, e.g., a 3D polony map, the location information of the polony can be obtained from the polony map, e.g., the 2D coordinates of the polony and its z level. Using the 2D coordinates and the z level, the corresponding flow cell image and its pixel(s) can be determined. The intensities of those pixels can then be extracted from the corresponding image processed by one or more primary analysis steps and used for base calling.
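The lookup described above can be sketched as array indexing. The sketch makes these assumptions: the registered images are stacked as (channels, z, y, x), the polony map stores (x, y, z) with possibly sub-pixel x and y that are rounded to the nearest pixel, and there is one channel per base in the order A, C, G, T. The arg-max caller is a deliberately naive placeholder and not the disclosed base caller. All names are hypothetical.

```python
import numpy as np

BASES = "ACGT"  # assumed channel order: one channel per base


def extract_intensities(images, polony_xyz):
    """images: (channels, z, y, x); polony_xyz: (n, 3) -> (n, channels) intensities."""
    coords = np.rint(np.asarray(polony_xyz, dtype=float)).astype(int)
    x, y, z = coords[:, 0], coords[:, 1], coords[:, 2]
    return images[:, z, y, x].T


def naive_base_calls(intensities):
    """Call the base of the brightest channel for each polony (illustration only)."""
    return "".join(BASES[i] for i in np.argmax(intensities, axis=1))


images = np.zeros((4, 3, 8, 8))
images[2, 1, 4, 5] = 9.0  # G channel, z = 1, y = 4, x = 5
images[0, 2, 0, 0] = 7.0  # A channel, z = 2, y = 0, x = 0
print(naive_base_calls(extract_intensities(images, [(5.2, 3.8, 1), (0, 0, 2)])))
```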
[0428] In some embodiments, the operation of registering the flow cell images may be based on background objects in the flow cell images. The background objects can be used to align the flow cell images to the cell staining images using one or more transformations.
The cell staining images herein are staining images of the sample(s) immobilized on the support, possibly transformed (e.g., translated) relative to the sample(s) in the flow cell images. The transformation may be represented by a single transformation of the whole image or be separated into multiple transformations, each representing a portion of the whole image. After the transformation(s) of the background objects between the flow cell images and the cell staining images are found, the polonies or clusters can be registered to the cell staining images.
[0429] In some embodiments, the method 2800 may further include an operation of registering the base calls, e.g., of a 3D sample, to the cell staining images containing morphological information of the sample. In some embodiments, such registration may be based on fiducial markers. Such fiducial markers can also be included in the cell staining images. Aligning the fiducial markers can generate the transformation(s) between flow cell images or between flow cell images and cell staining images. The transformation(s) can be used to register or align polonies or clusters between the sequencing images and the cell staining images. The fiducial markers can be within the sample or external to the sample. For example, the fiducial markers can be biological features inherent to the sample(s). As another example, the fiducial markers may be immobilized on the flow cell but external to the sample.
[0430] In some embodiments, the method 2800 further comprises an operation of determining a location of one or more of a morphological feature, an RNA or mRNA, and a protein based on the corresponding location of each predicted base call. In some embodiments, the samples may be labeled so that the base calls uniquely identify a morphological feature, an RNA or mRNA, or a protein of the sample in 3D. Such information can advantageously be used to provide nucleotide sequencing in the spatial context of the sample.
[0431] In some embodiments, the same pretrained neural network (e.g., with the same parameters and neural network structure) can advantageously be used both for predicting the polony map and for predicting the base calls. The neural network can be trained before operations 2806 and 2812 and requires no additional training between operations 2806 and 2812.
[0432] In some embodiments, the operation 2806 further comprises predicting, by the first reconfigurable device or the integrated circuit, a base call corresponding to each polony of the second plurality of flow cell images using the neural network at a third resolution; and determining the polony map based on the predicted base calls and a corresponding quality index of each predicted base call at the third resolution. In some embodiments, the third resolution is at least 2 to 32 times greater than the first or second resolution in one or more spatial dimensions. In some embodiments, the third resolution is greater than the first and second resolutions in one or more spatial dimensions. In some embodiments, the third resolution is identical to the first or second resolution in one or more spatial dimensions.
[0433] In some embodiments, the different patches may include some overlapping pixels. In some embodiments, the different patches do not include any overlapping pixels. For example, patch 1 may include 12 different patch images, each from one of the 4 different color channels and one of three consecutive cycles in a sequencing run. Patch 2 may also include 12 different patch images cropped from non-overlapping pixels of the same flow cell images. Patch 3 may include 12 different patch images, each with more than half of its pixels identical to those of the patch images of patch 1.
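The patch layout of [0433] (4 channels x 3 cycles = 12 patch images per patch, with or without overlap) can be sketched with a sliding crop. Assumptions: square patches, images stacked as (cycles, channels, H, W), and overlap controlled by a stride smaller than the patch size; the names are hypothetical.

```python
import numpy as np


def patch_origins(height, width, patch, stride):
    """Top-left corners of square patches; stride < patch gives overlapping patches."""
    return [(r, c)
            for r in range(0, height - patch + 1, stride)
            for c in range(0, width - patch + 1, stride)]


def extract_patches(stack, patch, stride):
    """stack: (cycles, channels, H, W) -> (n_patches, cycles * channels, patch, patch)."""
    cycles, channels, height, width = stack.shape
    tiles = [stack[:, :, r:r + patch, c:c + patch].reshape(cycles * channels, patch, patch)
             for r, c in patch_origins(height, width, patch, stride)]
    return np.stack(tiles)


stack = np.zeros((3, 4, 64, 64))  # 3 consecutive cycles x 4 color channels
print(extract_patches(stack, 32, 32).shape)  # non-overlapping patches
print(extract_patches(stack, 32, 8).shape)   # neighbours share 3/4 of their pixels
```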
Training neural networks
[0434] In some embodiments, disclosed herein are methods for training the neural network, e.g., a CNN, to predict high resolution flow cell images with a higher detectable polony density than existing sequencing systems or methods can provide.
[0435] In some embodiments, the sequencing system herein comprises: a first reconfigurable logic device, e.g., an FPGA unit, comprising a first plurality of data processing engines configured to perform data processing in parallel; first reconfigurable routing channels, each connecting at least some of the first plurality of data processing engines; a neural network deployed at least partly on the first reconfigurable logic device; and a first processor to selectively activate or deactivate different combinations of the first plurality of data processing engines and the first reconfigurable routing channels to perform one or more operations in methods herein (e.g., methods 600, 700, 2900) to train the neural network.
[0436] In some embodiments, the sequencing system herein comprises: a first reconfigurable logic device comprising a first plurality of data processing engines arranged in a first pipeline and configured to perform data processing in parallel with each other; an integrated circuit in data communication with the first reconfigurable logic device; a neural network deployed at least partly on the integrated circuit and/or the first reconfigurable logic device; a first processor of the first reconfigurable logic device to selectively activate or deactivate different combinations of the first plurality of data processing engines to perform one or more operations in methods herein (e.g., methods 600, 700, 2900) to train the neural network.
[0437] In some embodiments, the first reconfigurable logic device and the integrated circuit are within the same physical housing as the other elements of the sequencing system, as shown in FIG. 1. In some embodiments, the first reconfigurable logic device and the integrated circuit are not physically external to the sequencing system 110 shown in FIG. 1, e.g., they are not in the cloud 130.
[0438] FIG. 5B shows an exemplary method 600 for training the neural network, e.g., CNN, which can be used to predict high resolution flow cell images with improved detectable polony density.
[0439] In some embodiments, training can be done onboard using the sequencing system, e.g., the FPGA or AI chips onboard the sequencing system. In such cases, training may be done using hardware elements within the physical housing of the sequencing system 110 shown in FIG. 1. In some embodiments, training can be performed external to the sequencing system 110. For example, training may be performed using hardware elements over the cloud 130. In some embodiments, training the neural network using the methods herein with the reconfigurable logic device, e.g., the FPGA, and/or other integrated circuit, e.g., AI chips, can be at least 2x, 8x, 10x, 20x, 40x, 50x, 100x or faster than training the same neural network(s) with similar training images using CPUs or GPUs. In some embodiments, training the neural network using the methods herein with the reconfigurable logic device, e.g., the FPGA, and/or other integrated circuit, e.g., AI chips, can be at least 100x, 200x, 400x, 500x, 800x, 1000x or faster than training the same neural network(s) with similar training images using CPUs or GPUs.
[0440] In some embodiments, the neural network is trained with the same type of flow cell images as those on which it will make predictions after being trained. For example, the neural network is trained with 2D flow cell images at multiple z levels and may then be used to predict base calls for 2D flow cell images at multiple z levels covering a 3D in situ sample. As another example, the neural network is trained with 2D flow cell images of samples from a single organ and may then be used to predict base calls for 2D flow cell images of samples extracted from the same organ, e.g., liver.
[0441] In some embodiments, the neural network is trained with 2D flow cell images at multiple z-levels that encompass the 3D volume of the volumetric sample(s). Compared with training the neural network with z-stacks of flow cell images, training the neural network with 2D flow cell images reduces computational effort, training time, and cost. Further, a neural network trained with 2D flow cell images can be less complicated than one trained with 3D training data, which makes prediction simpler and more efficient. In some embodiments, the neural network trained with 2D flow cell images may provide higher efficiency and save time and computational effort in both its training and its subsequent prediction of polony locations.
[0442] In some embodiments, the sequencing system comprises: a first reconfigurable logic device, e.g., an FPGA unit, comprising a first plurality of data processing engines configured to perform data processing in parallel; first reconfigurable routing channels, each connecting at least some of the first plurality of data processing engines; a neural network deployed at least partly on the first reconfigurable logic device; and a first processor to selectively activate or deactivate different combinations of the first plurality of data processing engines and the first reconfigurable routing channels to perform operations to train the neural network comprising: (a) generating a training set comprising a plurality of training flow cell images or receiving the training set from one or more data storage devices of the sequencing system, the plurality of training flow cell images having a first spatial resolution; (b) up-sampling the corresponding plurality of training flow cell images to generate a reference set comprising high resolution training flow cell images having a second resolution; (c) generating a training output by inputting the training set to the neural network; (d) repeatedly performing, until the output error satisfies a stopping criterion, training operations comprising: determining an output error by comparing the training output and the reference set; and adjusting current values of parameters of the neural network based on the output error; and (e) generating a trained neural network with adjusted parameters.
[0443] In some embodiments, the sequencing system comprises: a first reconfigurable logic device, e.g., an FPGA unit, comprising a first plurality of data processing engines arranged in a first pipeline and configured to perform data processing in parallel with each other; an integrated circuit, e.g., an NPU chip or AI chip, comprising a second plurality of data processing engines configured to perform data processing in parallel, wherein the first reconfigurable logic device is configured to communicate data with the integrated circuit; a neural network deployed at least partly on the integrated circuit; and a first processor to selectively activate or deactivate different combinations of the first plurality of data processing engines to perform operations comprising: processing sensor data to generate the first plurality of flow cell images, wherein the integrated circuit is configured to perform operations including: (a) generating a training set comprising a plurality of training flow cell images or receiving the training set from one or more data storage devices of the sequencing system, the plurality of training flow cell images having a first spatial resolution; (b) up-sampling the corresponding plurality of training flow cell images to generate a reference set comprising high resolution training flow cell images having a second resolution; (c) generating a training output by inputting the training set to the neural network; (d) repeatedly performing, until the output error satisfies a stopping criterion, training operations comprising: determining an output error by comparing the training output and the reference set; and adjusting current values of parameters of the neural network based on the output error; and (e) generating a trained neural network with adjusted parameters.
[0444] In some embodiments, the system herein may comprise one or more hardware processors; one or more data storage devices storing instructions executable by the one or more hardware processors to cause the one or more hardware processors to perform operations, the operations comprising: (a) generating a training set comprising a plurality of training flow cell images or receiving the training set from one or more data storage devices of the sequencing system, the plurality of training flow cell images having a first spatial resolution; (b) up-sampling the corresponding plurality of training flow cell images to generate a reference set comprising high resolution training flow cell images having a second resolution; (c) generating a training output by inputting the training set to the neural network; (d) repeatedly performing, until the output error satisfies a stopping criterion, training operations comprising: determining an output error by comparing the training output and the reference set; and adjusting current values of parameters of the neural network based on the output error; and (e) generating a trained neural network with adjusted parameters.
[0445] In some embodiments, the method 600 for training the neural network, e.g., a CNN, comprises an operation 610 of generating a corresponding plurality of training flow cell images for one or more samples with a first resolution. The operation 610 may be performed by simulation, in which case the corresponding plurality of training flow cell images may be simulated images of 2D or 3D samples. The simulation can be based on characteristics of actual flow cell images of the sample(s). Such characteristics may include but are not limited to: image resolution, FOV, pixel size, and/or characteristics of the optical system such as depth of field, point spread function, etc.
[0446] In some embodiments, the operation 610 may be performed using the imager 116 of the sequencing system. The corresponding plurality of training flow cell images may then be real images of 2D or 3D samples with a first resolution. It is worth noting that the training flow cell images may be generated based on the characteristics of the sample(s) on which predictions are to be made. For example, for predicting polony locations in 3D samples, the training flow cell images may only include images (simulated or real) of 3D samples with similar characteristics, e.g., liver samples, kidney samples, etc. As another example, for predicting polony locations in traditional 2D samples, the training flow cell images may only include 2D flow cell images with similar plexity and/or polony density. In some embodiments, the training flow cell images may include a combination of flow cell images of 2D or 3D samples, with or without similar characteristics.
[0447] In some embodiments, the corresponding plurality of training flow cell images (simulated or real) may include flow cell images at multiple z-levels. In some embodiments where the neural network is in 3D, the corresponding plurality of training flow cell images may include z-stacks of flow cell images, each z-stack being a 3D volume made up of the multiple z-levels of flow cell images in the z-stack.
[0448] In some embodiments where the neural network is in 2D, the corresponding plurality of training flow cell images may include flow cell images at multiple z-levels (2D images) but not a z-stack of flow cell images.
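The difference between [0447] and [0448] is only in how one acquired z-stack is packaged into training samples. A minimal sketch, assuming a (z, y, x) array per FOV; `training_samples` is a hypothetical name:

```python
import numpy as np


def training_samples(z_stack, network_dims):
    """Package one (z, y, x) z-stack for a 3D network (one volume) or a 2D network (slices)."""
    z_stack = np.asarray(z_stack)
    if z_stack.ndim != 3:
        raise ValueError("expected a (z, y, x) array")
    if network_dims == 3:
        return [z_stack]                                   # whole z-stack as one 3D sample
    if network_dims == 2:
        return [z_stack[k] for k in range(z_stack.shape[0])]  # independent 2D z-levels
    raise ValueError("network_dims must be 2 or 3")


stack = np.zeros((7, 32, 32))
print(len(training_samples(stack, 3)), len(training_samples(stack, 2)))
```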
[0449] In some embodiments, the training data set of flow cell images comprises simulated flow cell images of in situ samples at different z-locations. In some embodiments, the training data set of flow cell images comprises actual flow cell images acquired from in situ samples at different z-locations. In some embodiments, polony locations are identified in such actual flow cell images at sub-pixel resolution to provide the high resolution “truth maps” in the training data set. Identification of polony or cluster locations at sub-pixel resolution, e.g., at 0.02 pixel, 0.05 pixel, 0.1 pixel, 0.25 pixel, etc., may be performed using various image processing methods. For example, embodiments of identifying polony or cluster locations at sub-pixel resolution have been disclosed in U.S. Patent No. 11,200,446, which is incorporated herein by reference in its entirety.
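An intensity-weighted centroid around a local maximum is one simple way to localize a spot to a fraction of a pixel. It is shown here only as a generic illustration and is not the method of the incorporated patent. The helper name `subpixel_centroid` and the window half-width are assumptions.

```python
import numpy as np


def subpixel_centroid(image, peak, half_width=3):
    """Intensity-weighted centroid (row, col) in a window around an integer peak."""
    r0, c0 = int(peak[0]), int(peak[1])
    r_lo, r_hi = max(r0 - half_width, 0), min(r0 + half_width + 1, image.shape[0])
    c_lo, c_hi = max(c0 - half_width, 0), min(c0 + half_width + 1, image.shape[1])
    window = image[r_lo:r_hi, c_lo:c_hi].astype(float)
    window = window - window.min()  # crude local background removal
    total = window.sum()
    if total == 0:
        return float(r0), float(c0)
    rows, cols = np.mgrid[r_lo:r_hi, c_lo:c_hi]
    return float((rows * window).sum() / total), float((cols * window).sum() / total)


# A Gaussian spot centred between pixels.
yy, xx = np.mgrid[0:32, 0:32]
spot = np.exp(-((yy - 10.3) ** 2 + (xx - 12.7) ** 2) / 2.0)
peak = np.unravel_index(np.argmax(spot), spot.shape)
print(subpixel_centroid(spot, peak))
```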
[0450] In some embodiments, the method 600 comprises an operation 620 of (1) up-sampling, by the processor, the corresponding plurality of training flow cell images for each cellular sample to a second resolution to generate a reference set comprising high resolution training flow cell images or (2) generating, by the processor, a reference set of reference flow cell images at a second resolution higher than the first resolution, each reference flow cell image in the reference set corresponding to an individual image of the corresponding pluralities of training flow cell images. The up-sampling in (1) can be based on the imaging process. For example, the point spread function can be virtually improved by 4x if the up-sampling is to achieve 4x spatial resolution. In some embodiments, the up-sampling is in 2D. In some embodiments, the up-sampling is in 3D. Each corresponding plurality of training flow cell images may include a z-stack with more than one z level to cover a 3D volumetric sample. The resolution in x and y may differ from the resolution in z.
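Because the lateral and axial resolutions may differ, up-sampling factors can be set per axis. The sketch below uses nearest-neighbour repetition purely as the simplest placeholder. It does not model the imaging process or the point spread function improvement described above. `upsample_nearest` is a hypothetical name.

```python
import numpy as np


def upsample_nearest(volume, factors):
    """Repeat each voxel `factors[axis]` times along each axis (anisotropic allowed)."""
    volume = np.asarray(volume)
    if len(factors) != volume.ndim:
        raise ValueError("one factor per axis is required")
    for axis, factor in enumerate(factors):
        volume = np.repeat(volume, factor, axis=axis)
    return volume


stack = np.arange(18).reshape(2, 3, 3)           # (z, y, x)
print(upsample_nearest(stack, (1, 4, 4)).shape)  # 4x in x and y, unchanged in z
```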
[0451] FIGS. 2A and 2D-2E show exemplary flow cell images that are generated for training the neural network, e.g., CNN. In some embodiments, the simulated flow cell images with higher resolution, e.g., FIG. 2E, are included in the reference set. Such images are used as “ground truth.” In some embodiments, such images have no signal originating from pixels other than the polonies. In some embodiments, such images have no signal originating from cellular background in the sample(s). In some embodiments, such images may include features that are specific to polonies in flow cell images during sequencing runs, such as polony intensity, polony shape, pattern of distribution (e.g., within regions determined by the cell boundaries).
[0452] In some embodiments, the method includes generating simulated flow cell images with low resolution, e.g., FIG. 2D, which mimic the real flow cell images that a user would acquire during sequencing of cells and are included in a training set. Such simulated flow cell images may have polony features, cell features, background, noise, etc. The low resolution simulated flow cell images may then be up-sampled to be at the high resolution.
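A paired training example of the kind described in [0451] and [0452] can be simulated by placing point-like polonies on a fine grid, blurring them with a stand-in point spread function, binning to the coarse pixel grid, and adding background and shot noise. The sketch below is 2D for brevity and makes these assumptions: a Gaussian PSF, Poisson noise, a flat background, and illustrative default parameters. It omits cell features and structured background. All names are hypothetical.

```python
import numpy as np


def gaussian_kernel_1d(sigma):
    radius = int(3 * sigma) + 1
    x = np.arange(-radius, radius + 1)
    kernel = np.exp(-(x ** 2) / (2.0 * sigma ** 2))
    return kernel / kernel.sum()


def gaussian_blur(image, sigma):
    """Separable Gaussian blur used as a stand-in point spread function."""
    k = gaussian_kernel_1d(sigma)
    out = np.apply_along_axis(np.convolve, 0, image, k, mode="same")
    return np.apply_along_axis(np.convolve, 1, out, k, mode="same")


def simulate_pair(n_polonies, hr_size=128, factor=4, psf_sigma_hr=6.0,
                  background=5.0, rng=None):
    """Return (high-res truth map, low-res noisy image) for one 2D field of view."""
    if hr_size % factor:
        raise ValueError("hr_size must be divisible by factor")
    rng = np.random.default_rng(rng)
    truth = np.zeros((hr_size, hr_size))
    rows = rng.integers(0, hr_size, n_polonies)
    cols = rng.integers(0, hr_size, n_polonies)
    np.add.at(truth, (rows, cols), rng.uniform(50.0, 150.0, n_polonies))
    blurred = gaussian_blur(truth, psf_sigma_hr)                   # optical blur
    lr = hr_size // factor
    low = blurred.reshape(lr, factor, lr, factor).sum(axis=(1, 3))  # pixel binning
    low = rng.poisson(low + background).astype(float)              # background + shot noise
    return truth, low


truth, low = simulate_pair(200, rng=0)
print(truth.shape, low.shape)
```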
[0453] The simulated images, at either high or low resolution, may include a z-stack of flow cell images taken at different z-locations to simulate flow cell images of a volumetric sample. In some embodiments, generating simulated images may add computational load to the training process, and may require specific criteria to mimic polony features and other information that may be contained in real flow cell images acquired during sequencing. However, simulated images may be free of imaging artifacts, e.g., those caused by vibration, over-heating, bubbles, etc., and thus avoid training on such distracting features, which are not part of the polonies in the sample and may reduce the accuracy and reliability of the trained neural network.
[0454] The training set may include flow cell images from different cell geometries, different in situ samples, different image intensities, different polony densities, different nucleotide diversities, etc.
[0455] In some embodiments, the method 600 comprises an operation 630 of providing, by the processor, the training set as inputs to the neural network to generate corresponding training outputs. Each corresponding training output may include output flow cell images, e.g., a z-stack of output images.
[0456] In some embodiments, the method 600 comprises the operation 640 of repeatedly training the neural network, e.g., a CNN, by performing one or more operations until the output error satisfies a stopping criterion. The training operation 640 comprises one or more operations including: the operation 655 of determining an output error by comparing the training output and the reference set; and the operation 660 of adjusting current values of parameters of the convolutional neural network based on the output error. The output error can be determined based on various metrics. For example, the metrics can include the mean square error of image intensities from some or all of the pixels of the training output relative to the corresponding z-stack in the reference set. Values of the parameters of the neural network, e.g., CNN, can be adjusted based on the output error or one or more previous output errors. The stopping criterion can be customized based on, but not limited to, training time, computational complexity, required accuracy, power consumption, and/or convergence rate. For example, the stopping criterion can be (1) stopping after 10 epochs to reduce training time. As another example, the stopping criterion can be (2) stopping when the value of the loss function (or the output error) is less than a predetermined value close to 0.
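The loop of operations 630 to 660 can be illustrated with a toy model in place of the CNN. The sketch below is deliberately simple. The "network" is a single 3x3 convolution kernel trained by plain gradient descent on the MSE. Targets come from a known kernel, so convergence can be checked. Both stopping criteria from the text are included: a maximum number of epochs and a loss threshold. The model, learning rate, and names are all assumptions, not the disclosed training procedure.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def correlate_valid(image, kernel):
    """'Valid' 2D cross-correlation, i.e. the forward pass of the toy model."""
    windows = sliding_window_view(image, kernel.shape)
    return np.einsum("ijkl,kl->ij", windows, kernel)


def train_kernel(inputs, targets, kernel_size=3, lr=0.25, max_epochs=500, tol=1e-8):
    """Fit a single convolution kernel by gradient descent on the mean square error."""
    kernel = np.zeros((kernel_size, kernel_size))
    windows = sliding_window_view(inputs, kernel.shape)
    history = []
    for _ in range(max_epochs):                      # criterion (1): epoch budget
        output = np.einsum("ijkl,kl->ij", windows, kernel)
        error = output - targets                     # compare with the reference
        loss = float(np.mean(error ** 2))
        history.append(loss)
        if loss < tol:                               # criterion (2): loss near 0
            break
        grad = 2.0 * np.einsum("ij,ijkl->kl", error, windows) / error.size
        kernel -= lr * grad                          # adjust parameters
    return kernel, history


rng = np.random.default_rng(0)
x = rng.standard_normal((64, 64))
k_true = rng.uniform(-1.0, 1.0, (3, 3))
k_fit, history = train_kernel(x, correlate_valid(x, k_true))
print(len(history), history[-1])
```

A real CNN would use an automatic-differentiation framework and an optimizer, but the structure (forward pass, error against the reference, parameter update, stopping test) is the same.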
[0457] In some embodiments, z-stacks of training flow cell images from the same color channel can be used to train the neural network, e.g., CNN, for that particular channel. A certain percentage, e.g., 80%, of the training set may be used for training, and the rest of the training set, e.g., 20%, may be used for validation. The batch size can be one, and the number of epochs can be about 10, 12, 15, 20, or more. Various optimizers can be used.
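A per-channel 80/20 split of z-stacks could look like the sketch below. It assumes each training item is tagged with its color channel as a `(channel, stack)` tuple and that a seeded random shuffle is acceptable; `split_by_channel` is a hypothetical helper, not an API from this disclosure.

```python
import random


def split_by_channel(zstacks, train_fraction=0.8, seed=0):
    """Group z-stacks by color channel, then split each group into train/validation.

    zstacks: list of (channel, stack) tuples; 'stack' can be any object.
    Returns {channel: (train_list, validation_list)} so that one network
    can be trained per channel.
    """
    groups = {}
    for channel, stack in zstacks:
        groups.setdefault(channel, []).append(stack)

    rng = random.Random(seed)  # seeded so the split is reproducible
    splits = {}
    for channel, stacks in groups.items():
        shuffled = stacks[:]
        rng.shuffle(shuffled)
        n_train = int(round(train_fraction * len(shuffled)))
        splits[channel] = (shuffled[:n_train], shuffled[n_train:])
    return splits
```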
[0458] In some embodiments, the convolutional neural network comprises one or more U-Net units.
[0459] In some embodiments, comparing the training output to the reference set comprises: calculating the mean square error in image intensity of one or more pixels in each pair of an image from the reference set and a corresponding image from the training output. In some embodiments, comparing the training output to the reference set comprises: determining one or more values of a loss function. In some embodiments, each pair of the image from the reference set and the corresponding image from the training output has the same image size, the same field of view, the same resolution, or a combination thereof. In some embodiments, the one or more pixels exclude pixels that are outside of cell boundaries. In some embodiments, the cell boundaries are determined based on image segmentation of cell boundaries of the high resolution flow cell images in the reference set.
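The pixel-wise comparison that excludes pixels outside cell boundaries can be sketched as a masked mean square error. The sketch assumes the segmentation result is already available as a boolean mask of the same size as the paired images; `masked_mse` is a hypothetical name.

```python
def masked_mse(reference, output, cell_mask):
    """Mean square error over pixels inside segmented cells only.

    reference, output: 2D lists of intensities with the same size.
    cell_mask: 2D list of booleans, True for pixels inside a cell boundary.
    """
    if len(reference) != len(output) or any(
        len(a) != len(b) for a, b in zip(reference, output)
    ):
        raise ValueError("paired images must have the same size")
    total, count = 0.0, 0
    for ref_row, out_row, mask_row in zip(reference, output, cell_mask):
        for r, o, inside in zip(ref_row, out_row, mask_row):
            if inside:  # pixels outside cell boundaries are skipped
                total += (r - o) ** 2
                count += 1
    if count == 0:
        raise ValueError("mask selects no pixels")
    return total / count
```

Here a large error at a pixel outside the mask does not affect the loss, which is the intended effect of excluding non-cellular regions.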
[0460] In some embodiments, the method 600 includes an operation 670 of generating a trained neural network with the adjusted parameter values obtained in operation 660. The trained neural network may be used to predict high resolution intensities that can be used to determine high resolution base calls of flow cell images, e.g., using method 500.
[0461] FIG. 5E shows an exemplary method 700 for training the neural network, e.g., CNN, which can be used to predict high resolution flow cell images with improved detectable polony density.
[0462] In some embodiments, training of the neural networks using the methods 600, 700 may utilize training images that are real flow cell images of samples, simulated flow cell images, or a combination thereof. Training with real flow cell images may advantageously eliminate the need to generate simulated images that mimic the characteristics of polonies of different samples. This simplifies the training process, especially when the samples have heterogeneous intensities and polony densities across the flow cell image(s) and may include various types of cells or tissue. Training with real flow cell images may advantageously produce better training results (e.g., the trained neural network can make improved predictions) than training using only simulated images, at similar computational cost and neural network complexity. When a predetermined prediction quality is needed, training with real flow cell images may advantageously allow a less complex neural network to achieve that quality than a neural network trained using simulated data. In some embodiments, the prediction quality can be measured based on various metrics including, but not limited to, error rate in base calls, error rate in intensity values, density of base calls, density of polonies, etc. The values of the metrics can be determined relative to results produced using existing primary analysis methods that do not use neural network(s). For example, the error rate of base calling using a first neural network trained on simulated flow cell images can be determined in comparison with base calling using an existing primary analysis method without neural networks. The error rate in base calling using a second neural network trained using real images of in situ samples can also be obtained in comparison with base calling using the existing primary analysis method without neural networks.
The error rate in base calling using the first neural network can be higher than the error rate in base calling using the second neural network. The error rate in base calling using the first neural network can be 2x, 3x, 4x, 5x, 6x, 10x, or more than 10x the error rate in base calling using the second neural network.
[0463] In some embodiments, training of the neural networks herein can be completed using only the sequencing system, e.g., the FPGA or AI chips onboard the sequencing system 110. In some embodiments, training can be performed at least partly external to the sequencing system. For example, at least part of the training may be performed using hardware over the cloud.
[0464] In some embodiments, the sequencing system 110 comprises: a first reconfigurable logic device, e.g., an FPGA unit, comprising a first plurality of data processing engines arranged in a first pipeline and configured to perform data processing in parallel with each other; an integrated circuit, e.g., an NPU chip or AI chip, comprising a second plurality of data processing engines configured to perform data processing in parallel, wherein the first reconfigurable logic device is configured to communicate data with the integrated circuit; a neural network (e.g., trained neural network) deployed at least partly on the second reconfigurable logic device; a first processor to selectively activate or deactivate different combinations of the first plurality of data processing engines to perform one or more operations of the sequencing methods 600, 700; and a second processor or the first processor to control the integrated circuit to perform one or more operations of the sequencing methods 600, 700 to facilitate generating the sequencing analysis result(s).
[0465] In some embodiments, the operations performed by the first reconfigurable logic device may comprise processing or receiving sensor data to generate the first plurality of flow cell images after operation 705. In some embodiments, the first reconfigurable logic device or the integrated circuit is configured to perform operations including operation 715 of up-sampling the corresponding plurality of training flow cell images to generate high resolution training flow cell images having a second resolution.
[0466] In some embodiments, the sequencing system herein may comprise one or more hardware processors; and one or more data storage devices storing instructions executable by the one or more hardware processors to cause the one or more hardware processors to perform one or more operations of the methods 600, 700, 2800, and/or 2900.
[0467] In some embodiments, training the neural network using the methods 600, 700, and/or 2900 herein with the reconfigurable logic device, e.g., the FPGA, and/or other integrated circuit, e.g., AI chips, can be at least 2x, 8x, 10x, 20x, 40x, 50x, or 100x faster than training the same neural network(s) with similar training images using CPUs or GPUs. In some embodiments, training the neural network using the methods herein with the reconfigurable logic device, e.g., the FPGA, and/or other integrated circuit, e.g., AI chips, can be at least 20x, 40x, 60x, 80x, 100x, 200x, 400x, 500x, 800x, or 1000x faster than training the same neural network(s) with similar training images using CPUs or GPUs.
[0468] In some embodiments, training the neural network using the methods 600, 700, and/or 2900 herein with the reconfigurable logic device, e.g., the FPGA, and/or other integrated circuit, e.g., AI chips, may require at least 2x, 8x, 10x, 15x, 20x, 40x, 50x, or 100x less power than training the same neural network(s) with identical training images using CPUs or GPUs.
[0469] In some embodiments, the sequencing system further comprises: a power source that is configured to supply identical or different power levels to the first reconfigurable logic device and the integrated circuit. In some embodiments, a maximum power output of the power source to the sequencing system in training the neural network using methods 600, 700, and/or 2900 is less than 1000 Watts, 900 Watts, 800 Watts, 700 Watts, 650 Watts, 600 Watts, 550 Watts, 500 Watts, 400 Watts, or 300 Watts.
[0470] In some embodiments, the neural network is trained with traditional 2D flow cell images at a single z-level. In some embodiments, each neural network is trained with 2D flow cell images at a single z-level, and multiple neural networks may be trained to cover a 3D volumetric sample, e.g., in situ sample.
[0471] In some embodiments, the neural network is trained with 2D flow cell images at multiple z-levels that encompass the 3D volume of the volumetric sample(s). Compared with training the neural network with 3D flow cell images (3D volumetric images), training the neural networks with 2D flow cell images reduces the amount of computation, training time, and training cost. Further, a neural network trained with 2D flow cell images can be less complicated than one trained with 3D training data, which makes prediction simpler and more efficient. In some embodiments, the neural network trained with 2D flow cell images may provide higher efficiency and save time and computational effort in its training and in subsequent prediction of polony locations.
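The difference between feeding a 2D network single z-level images and feeding a 3D network a whole z-stack can be made concrete by regrouping the data. The sketch below assumes each z-stack is a list of 2D images ordered by z-level; the helper names are hypothetical, and the input-size functions only count values per sample, not the full training cost.

```python
def zstack_to_2d_samples(zstacks):
    """Regroup z-stacks into per-z-level 2D samples.

    zstacks: list of z-stacks; each z-stack is a list of 2D images
    (lists of rows), one image per z-level.
    Returns {z_level: [2D image, ...]}, which can feed either one 2D network
    trained on all levels or one 2D network per z-level.
    """
    by_level = {}
    for stack in zstacks:
        for z, image in enumerate(stack):
            by_level.setdefault(z, []).append(image)
    return by_level


def input_size_2d(image):
    """Number of values a 2D network sees per sample (H * W)."""
    return len(image) * len(image[0])


def input_size_3d(stack):
    """Number of values a 3D network sees per sample (Z * H * W)."""
    return len(stack) * input_size_2d(stack[0])
```

For a stack of 5 z-levels of 3 x 4 pixels, each 2D sample holds 12 values while the 3D sample holds 60.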
[0472] In some embodiments, the sequencing method 700 comprises an operation 705 of acquiring, by the imager 116 of the sequencing system 110, a training set comprising a plurality of training flow cell images with a first resolution. The first resolution can be a standard resolution that can be achieved using the imager disclosed herein. For example, the first resolution can be within the range from 0.01 µm to 15 µm. For example, the first resolution can be within the range from 0.1 µm to 5 µm. The plurality of training flow cell images in the training set can be from one or more color channels. In some embodiments, the plurality of training flow cell images in the training set can be from 2, 3, 4, or more color channels. The plurality of training flow cell images in the training set can be from one or more cycles. For example, the one or more cycles can be any number ranging from 1 to 10, 1 to 20, 1 to 30, 1 to 50, 1 to 100, 1 to 200, or 1 to 500. The plurality of flow cell images can be at a single z-level or multiple z-levels.
[0473] In some embodiments, the sequencing method 700 comprises an operation 715 of up-sampling, by the sequencing system, the corresponding plurality of training flow cell images to generate high-resolution training flow cell images having a second resolution. The second resolution can be 2x, 4x, 8x, 16x, or more times higher than the first resolution. For example, if the first resolution is in the range from 0.01 µm to 5 µm, the corresponding second resolution that is 4x higher than the first resolution can be in the range from 0.0025 µm to 1.25 µm. Various up-sampling methods can be used for generating the high-resolution training flow cell images. Each high-resolution training flow cell image corresponds to a training flow cell image at the first resolution. In some embodiments, the operation 715 is optional. For example, the high-resolution images may be generated directly via computer simulation or acquired using the sequencing system disclosed herein.
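Operation 715 and the resolution arithmetic can be illustrated together. The sketch assumes nearest-neighbour interpolation by an integer factor, which is only one of the "various up-sampling methods" mentioned. It also treats resolution as pixel spacing in µm, so a 4x up-sampling divides the spacing by 4 (0.01 µm → 0.0025 µm, 5 µm → 1.25 µm). Both helper names are hypothetical.

```python
def upsample_nearest(image, factor):
    """Nearest-neighbour up-sampling of a 2D image (list of rows) by an integer factor."""
    if factor < 1:
        raise ValueError("factor must be >= 1")
    out = []
    for row in image:
        wide = [v for v in row for _ in range(factor)]  # repeat each pixel horizontally
        out.extend([wide[:] for _ in range(factor)])    # repeat each row vertically
    return out


def upsampled_resolution(first_resolution_um, factor):
    """Pixel spacing (µm) after up-sampling by `factor`."""
    return first_resolution_um / factor
```

Note that interpolation alone only changes the sampling grid; any gain in effective detail comes from what the neural network is trained to predict on that finer grid.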
[0474] In some embodiments, the sequencing method 700 comprises determining, by the sequencing system, a location list of polonies in the plurality of flow cell images; and extracting, by the sequencing system, intensities in the plurality of flow cell images based on the location list.
[0475] In some embodiments, the sequencing method 700 comprises determining, by the sequencing system, a location list of polonies in the high resolution training flow cell images; and extracting, by the sequencing system, intensities in the high resolution training flow cell images based on the location list.
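Extracting intensities based on a location list might be sketched as follows. The sketch assumes images are keyed by `(channel, cycle, z_level)`, that locations are `(row, col)` polony centres possibly at sub-pixel precision, and that rounding to the nearest pixel is acceptable (interpolation or neighbourhood summation are alternatives). `extract_intensities` is a hypothetical helper.

```python
def extract_intensities(images, locations):
    """Read per-polony intensities from a set of flow cell images.

    images: {key: 2D image}, e.g. key = (channel, cycle, z_level).
    locations: list of (row, col) polony centres, possibly sub-pixel floats.
    Returns {key: [intensity for each location, in location-list order]}.
    Locations are rounded to the nearest pixel (an assumption).
    """
    out = {}
    for key, image in images.items():
        values = []
        for row, col in locations:
            r, c = int(round(row)), int(round(col))
            values.append(image[r][c])
        out[key] = values
    return out
```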
[0476] In some embodiments, the sequencing method 700 comprises an operation of processing the high resolution training flow cell images to determine a location list of the polonies (e.g., bright spots in the image) and their processed intensities. The processed intensities may be obtained using standard image processing such as background noise reduction, filtering, and intensity normalization. The operation of processing the training flow cell images or the high resolution training flow cell images can include polony map generation using the methods disclosed in detail in U.S. Patent No. 11,200,446 and patent application Nos. 18/078,820 and 18/078,797.
[0477] In some embodiments, the method 700 comprises an operation 725 of generating, by the sequencing system, reference intensities corresponding to the intensities (e.g., processed intensities) in the high resolution training flow cell images based on base calls of the high resolution training flow cell images. The operation 725 may be based on the location list so that only signals from identified polonies are used for generating the reference intensities, while other signals, including background noise and possible artifacts from cellular structures in the images, can be excluded. The operation 725 may be based on one or more image processing steps applied to the training flow cell images (e.g., cell segmentation, cell contouring, noise removal) so that only signals from polonies that are within an area of interest (e.g., within cells) are used for generating the reference intensities.
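Restricting operation 725 to polonies within an area of interest can be sketched as a filter on the location list. The sketch assumes cell segmentation has already produced a boolean mask at the same resolution as the images; `polonies_in_cells` is a hypothetical helper.

```python
def polonies_in_cells(locations, cell_mask):
    """Keep only polony locations that fall inside a segmented cell.

    locations: list of (row, col) polony centres.
    cell_mask: 2D list of booleans from cell segmentation, True inside cells.
    Locations outside the mask bounds or outside cells are dropped.
    """
    kept = []
    for row, col in locations:
        r, c = int(round(row)), int(round(col))
        in_bounds = 0 <= r < len(cell_mask) and 0 <= c < len(cell_mask[0])
        if in_bounds and cell_mask[r][c]:
            kept.append((row, col))
    return kept
```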
[0478] In some embodiments, at least part of the one or more samples comprises predetermined bases in the one or more cycles. In other words, the base calls for at least some of the polonies in the flow cell images in cycle(s) are predetermined. The base calls can be predetermined by sequencing known barcode sequences in the one or more cycles.
[0479] Various algorithms can be used to generate the reference intensities corresponding to the intensities in the processed high resolution training flow cell images. In some embodiments, the operation of generating the reference intensities in the high resolution training flow cell images based on the base calls of the high resolution training flow cell images comprises: performing color correction on each extracted intensity in the high resolution training flow cell image, thereby generating the corresponding reference intensity.
[0480] In some embodiments, the intensities may undergo color correction, phasing/dephasing correction, normalization, and/or other corrections to reach the reference intensities. In some embodiments, the intensities may undergo de-noising to generate the reference intensities. As a nonlimiting example, as shown in FIG. 5F, the intensities of the high resolution training flow cell images are plotted for pairs of color channels, with each polony plotted as a dot according to its intensities in channels 1 and 2 or in channels 3 and 4. Based on the predetermined base calls, the polonies within area 790 would have a base call of A; thus, the reference intensity of each polony having a base call of A can be obtained by projecting its dot onto the fitted line in region 790, e.g., by shortest-distance projection. The vertical-axis coordinate of the projected point on the line may then be the reference intensity of the polony in channel 2, and its horizontal-axis coordinate may be the reference intensity in channel 1. Similarly, based on the predetermined base calls, the reference intensity of each polony in area 791 can be obtained by projecting its dot onto the fitted line in region 791, e.g., by shortest-distance projection.
The horizontal-axis coordinate of the projected point on the fitted line in area 791 may then be the reference intensity of the corresponding polony within area 791 in channel 1, and the vertical-axis coordinate may be its reference intensity in channel 2. A similar projection may be performed for the polonies plotted in the right panel for channels 3 and 4. By using projections of intensities onto the fitted line instead of intensities from raw flow cell images, noise and artifacts, such as noise associated with different channels (e.g., channel optics, illumination, etc.), may be removed from the reference intensities, resulting in less interference and more accurate reference intensities. It is understood that the reference intensity determination can be based on various methods for noise reduction and is not limited to the shortest-distance projection shown in FIG. 5F.
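The shortest-distance projection of FIG. 5F can be sketched for one base-call group, e.g., the polonies in area 790, plotted in two channels. The disclosure does not say how the line is fitted. This sketch assumes a total-least-squares (principal-axis) fit through the group's centroid, so the orthogonal projection is the shortest-distance projection onto that line. All function names are hypothetical. The sketch would be applied separately to each area (790, 791, ...) and each channel pair.

```python
import math


def fit_line_tls(points):
    """Total-least-squares line through 2D points: returns (centroid, unit direction)."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxx = sum((x - mx) ** 2 for x, _ in points)
    syy = sum((y - my) ** 2 for _, y in points)
    sxy = sum((x - mx) * (y - my) for x, y in points)
    theta = 0.5 * math.atan2(2 * sxy, sxx - syy)  # angle of the principal axis
    return (mx, my), (math.cos(theta), math.sin(theta))


def project_to_line(point, centroid, direction):
    """Orthogonal (shortest-distance) projection of a point onto the line."""
    px, py = point[0] - centroid[0], point[1] - centroid[1]
    t = px * direction[0] + py * direction[1]  # signed position along the line
    return (centroid[0] + t * direction[0], centroid[1] + t * direction[1])


def reference_intensities(points):
    """Project each (channel 1, channel 2) intensity pair of one base-call group
    onto the group's fitted line; the projected coordinates serve as the
    reference intensities in the two channels."""
    centroid, direction = fit_line_tls(points)
    return [project_to_line(p, centroid, direction) for p in points]
```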
[0481] In some embodiments, the algorithm for determining the reference intensity may be iterative, such that the reference intensities obtained in earlier iteration(s) can be improved in later iterations based on customized quality criteria. The number of iterations can be in a range from 1 to 10, 1 to 100, or more. For example, later iterations can use a different projection method that yields a smaller total distance to the fitted line shown in FIG. 5F than the projection method used in earlier iteration(s).
[0482] In some embodiments, the sequencing method 700 may include an operation 730 of providing the reference intensities for comparison to training output(s) of the neural network. The reference intensities may be provided as flow cell image(s). In some embodiments, the reference intensities may be provided as a list of intensities corresponding to their locations in the flow cell images, e.g., as an array with a first column of reference intensity values and a second column with the corresponding spatial coordinates of each reference intensity value. Using the list of intensities can advantageously save storage space, reduce data size, and allow efficient data communication. The input to the neural network may also include the location list.
[0483] In some embodiments, the operation 730 comprises an operation of providing the reference intensities in a plurality of patches for comparison to training output(s) of the neural network, wherein each patch comprises one or more patch images from one or more color channels, one or more cycles, one or more z-levels, or a combination thereof. In some embodiments, instead of training based on full-sized flow cell images, patches of the flow cell images may be used for training. Each patch may comprise one or more patch images cropped from the flow cell images (e.g., the second plurality of flow cell images). The training method 700 is configured to train the neural network to predict one or more base calls within each individual patch, e.g., a single base call at or close to the center of the patch. The one or more base calls may be far fewer than the total number of base calls in the flow cell images. In some embodiments, the number of the one or more base calls may be 10x, 100x, 500x, 1000x, 5000x, 10⁴x, 10⁵x, 10⁶x, or more times less than the total number of base calls in the corresponding flow cell images. Training using patches of flow cell images does not require training on a large number of polonies (e.g., 1000 polonies) within a patch, and thus may advantageously reduce computational complexity and increase training efficiency and accuracy.
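Cropping a patch image centred on one polony is sketched below. It assumes square patches, that the polony centre is placed at index `size // 2` for even sizes, and that pixels falling outside the flow cell image are filled with a constant pad value. `crop_patch` and its defaults are hypothetical.

```python
def crop_patch(image, center, size=16, pad_value=0.0):
    """Crop a size x size patch image centred on a polony location.

    image: 2D list of intensities.
    center: (row, col) of the polony; rounded to the nearest pixel.
    For even sizes the polony pixel lands at index size // 2 in the patch.
    Pixels outside the image are filled with pad_value.
    """
    r0 = int(round(center[0])) - size // 2
    c0 = int(round(center[1])) - size // 2
    h, w = len(image), len(image[0])
    patch = []
    for r in range(r0, r0 + size):
        row = []
        for c in range(c0, c0 + size):
            row.append(image[r][c] if 0 <= r < h and 0 <= c < w else pad_value)
        patch.append(row)
    return patch
```

Centring the patch this way means the training target for each patch is the base call at index `[size // 2][size // 2]`.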
[0484] In some embodiments, the sequencing method 700 herein includes an operation 740 of repeatedly performing, until the output error satisfies a stopping criterion, one or more training operations comprising: an operation 755 of determining an output error by comparing the training output to the reference intensities; and an operation 760 of adjusting current values of parameters of the neural network, e.g., CNN, based on the output error.
[0485] In some embodiments, the operation 740 repeats itself, using its output from the previous iteration (e.g., adjusted parameters of the neural network) as input to the current iteration.
[0486] In some embodiments, the operation 740 uses back propagation to adjust parameters of the neural network, e.g., weights.
[0487] In some embodiments, the output error may be based on a comparison between the reference intensities and the predicted intensities during an iteration of training. The comparison may be limited to those intensities and locations included in the location list. In some embodiments, the comparison may be limited to only a subset of intensities and corresponding locations in the location list.
[0488] Operation 740 may stop when a stopping criterion is met. The stopping criterion can be customized based on training time, computational complexity, convergence rate, and/or various other metrics. Exemplary stopping criteria may include a fixed number of iterations, a fixed duration of training time, or a loss function below a threshold. For example, the stopping criterion can be (1) stop after 10 epochs to reduce training time. As another example, the stopping criterion can be (2) stop when the value of the loss function (or the output error) is less than a predetermined value close to 0. Determining the output error can be based on various metrics, e.g., a loss function. Nonlimiting examples of the loss function can include: the sum of root mean square of the difference between the predicted intensities and the corresponding reference intensities based on the location list, or the sum of mean square errors.
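The two example loss functions, computed only over polonies in the location list, can be sketched as below. The grouping is an assumption: the disclosure does not say what the "sum" runs over. Here an RMS or MSE is computed per image key (e.g., per color channel or cycle) over the listed polonies, and the per-key values are then summed. Helper names are hypothetical.

```python
import math


def per_key_mse(pred, ref):
    """pred, ref: {key: [intensity per polony in the location list]}.

    Returns {key: mean square error over the listed polonies}.
    """
    mses = {}
    for key, ref_vals in ref.items():
        pred_vals = pred[key]
        if len(pred_vals) != len(ref_vals):
            raise ValueError(f"length mismatch for {key}")
        mses[key] = sum((p - r) ** 2 for p, r in zip(pred_vals, ref_vals)) / len(ref_vals)
    return mses


def sum_of_rms(pred, ref):
    """Sum over keys of the root mean square difference."""
    return sum(math.sqrt(m) for m in per_key_mse(pred, ref).values())


def sum_of_mse(pred, ref):
    """Sum over keys of the mean square error."""
    return sum(per_key_mse(pred, ref).values())
```

Restricting the comparison to a subset of the location list, as in [0487], amounts to passing only those polonies' intensities in `pred` and `ref`.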
[0489] In some embodiments, the method 700 may further comprise an operation 770 of generating the trained neural network with the adjusted parameters obtained in operation 760, e.g., in the last iteration or any other iteration during the repetition of operation 740. The trained neural network may then be used to predict high resolution intensities that can be used to determine high resolution base calls of flow cell images, e.g., using method 500.
[0490] FIG. 29 shows an exemplary method 2900 for training the neural network, e.g., CNN, which can be used to predict polony locations (e.g., in operations 2804-2806), intensities of polonies, base calls, and/or classifications of one or more pixels. The prediction using the neural network trained by method 2900 (e.g., in operation 2812) may advantageously allow improved detectable polony density in the sample(s).
[0491] In some embodiments, predicting base calls using method 2800 (with operation 2804’, without predicting the polony map using the neural network in operation 2804, and with predicting the base calls using the neural network in operation 2812) at a polony density of 300,000/mm² or greater, e.g., 750,000/mm², produces an error rate in base calling that is lower than the error rate of base calling using the classic non-neural-network-based algorithm. In some embodiments, predicting base calls using method 2800 (with operation 2804’, without predicting the polony map using the neural network in operation 2804, and with predicting the base calls using the neural network in operation 2812) at a polony density of 750,000/mm² produces an error rate in base calling that is 40%, 50%, 60%, 70%, or less of the error rate of base calling using the classic non-neural-network-based algorithm.
[0492] In some embodiments, predicting base calls using method 2800 (with predicting the polony map using the neural network in operation 2804 and predicting the base calls using the neural network in operation 2812) at a polony density of 300,000/mm² or greater, e.g., 750,000/mm², produces an error rate in base calling that is lower than the error rate of base calling using the classic non-neural-network-based algorithm. In some embodiments, predicting base calls using method 2800 (with predicting the polony map using the neural network in operation 2804 and predicting the base calls using the neural network in operation 2812) at a polony density of 750,000/mm² produces an error rate in base calling that is 50%, 40%, 30%, 20%, 10%, 5%, or less of the error rate of base calling using the classic non-neural-network-based algorithm.
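Statements such as "50% or less of the error rate" compare two error rates against the same expected bases, e.g., known barcode sequences as in [0478]. A minimal sketch of that comparison follows. The toy strings below are illustrative only and are not data from this disclosure. The helper names are hypothetical.

```python
def error_rate(calls, truth):
    """Fraction of base calls that disagree with the expected bases."""
    if len(calls) != len(truth) or not truth:
        raise ValueError("need equal-length, non-empty sequences")
    return sum(c != t for c, t in zip(calls, truth)) / len(truth)


def relative_error(nn_calls, classic_calls, truth):
    """Neural-network error rate expressed as a fraction of the classic algorithm's rate."""
    return error_rate(nn_calls, truth) / error_rate(classic_calls, truth)
```

For example, with expected bases `"ACGTACGTAC"`, a classic call string with two mismatches and a neural-network call string with one mismatch give a relative error of 0.5, i.e., 50% of the classic error rate.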
[0493] In some embodiments, training of the neural networks using the methods 600, 700, or 2900 may use training images that are real flow cell images of samples, simulated flow cell images with a distribution of signal spots and a noise level similar to those of real flow cell images, or a combination thereof. Training with real flow cell images may advantageously eliminate the need to generate simulated images that mimic the characteristics of polonies of different samples. This simplifies the training process, especially when the samples have heterogeneous intensities and polony densities across the flow cell image(s) and may include various types of cells or tissue. Training with real flow cell images may advantageously produce better training results (e.g., the trained neural network can make improved predictions) than training using only simulated images, at similar computational cost and neural network complexity. When a predetermined prediction quality is needed, training with real flow cell images may advantageously allow a less complex neural network to achieve that quality than a neural network trained using simulated data. In some embodiments, the prediction quality can be measured based on various metrics including, but not limited to, error rate in base calls, error rate in intensity values, density of base calls, density of polonies, etc. The values of the metrics can be determined relative to results produced using existing primary analysis methods that do not use neural network(s). For example, the error rate of base calling using a first neural network trained on simulated flow cell images can be determined in comparison with base calling using an existing primary analysis method without using any neural networks.
The error rate in base calling using a second neural network trained using real flow cell images of in situ samples can also be obtained in comparison with base calling using the same existing primary analysis method without using any neural networks. The error rate in base calling using the first neural network can be higher than the error rate in base calling using the second neural network. The error rate in base calling using the first neural network can be 2x, 3x, 4x, 5x, 6x, 10x, or more than 10x the error rate in base calling using the second neural network.
[0494] In some embodiments, training of the neural networks herein can be done using only the sequencing system, e.g., the FPGA or AI chips onboard the sequencing system 110. In such cases, training may be done using hardware elements within the physical housing of the sequencing system 110 shown in FIG. 1. In some embodiments, training can be performed at least partly external to the sequencing system. For example, at least part of the training may be performed using hardware over the cloud 130.
[0495] In some embodiments, the sequencing system 110 comprises: a first reconfigurable logic device, e.g., an FPGA unit, comprising a first plurality of data processing engines arranged in a first pipeline and configured to perform data processing in parallel with each other; an integrated circuit, e.g., an NPU chip or AI chip, comprising a second plurality of data processing engines configured to perform data processing in parallel, wherein the first reconfigurable logic device is configured to communicate data with the integrated circuit; a neural network (e.g., trained neural network) deployed at least partly on the second reconfigurable logic device; a first processor to selectively activate or deactivate different combinations of the first plurality of data processing engines to perform one or more operations of the sequencing methods 600, 700, or 2900; and a second processor or the first processor to control the integrated circuit to perform one or more operations of the sequencing methods 600, 700, or 2900 to facilitate generating the sequencing analysis result(s).
[0496] In some embodiments, the sequencing system herein may comprise one or more hardware processors; and one or more data storage devices storing instructions executable by the one or more hardware processors to cause the one or more hardware processors to perform one or more operations of the sequencing methods 600, 700, or 2900.
[0497] In some embodiments, training the neural network using the methods 600, 700, or 2900 herein with the reconfigurable logic device, e.g., the FPGA, and/or other integrated circuit, e.g., AI chips, can be at least 2x, 8x, 10x, 20x, 40x, 50x, or 100x faster than training the same neural network(s) with similar training images using CPUs or GPUs. In some embodiments, training the neural network using the methods herein with the reconfigurable logic device, e.g., the FPGA, and/or other integrated circuit, e.g., AI chips, can be at least 20x, 40x, 60x, 80x, 100x, 200x, 400x, 500x, 800x, or 1000x faster than training the same neural network(s) with similar training images using CPUs or GPUs.
[0498] In some embodiments, the neural network is trained with the same type of flow cell images as those on which the neural network may make predictions after being trained. For example, the neural network is trained with 2D flow cell images at multiple z-levels and then may be used to predict base calls for 2D flow cell images at multiple z-levels to cover a 3D in situ sample. As another example, the neural network is trained with 2D flow cell images from a single organ of origin and then may be used to predict base calls for 2D flow cell images of samples extracted from the same organ, e.g., liver.
[0499] In some embodiments, the neural network is trained with traditional 2D flow cell images at a single z-level. In some embodiments, each neural network is trained with 2D flow cell images at a single z-level, and multiple neural networks may be trained to cover a 3D volumetric sample, e.g., in situ sample.
[0500] In some embodiments, the neural network is trained with 2D flow cell images at multiple z-levels that encompass the 3D volume of the volumetric sample(s). Compared with training the neural network with 3D flow cell images (3D volumetric images), training the neural networks with 2D flow cell images reduces the amount of computation, training time, and training cost. Further, a neural network trained with 2D flow cell images can be less complicated than one trained with 3D training data, which makes prediction simpler and more efficient. In some embodiments, the neural network trained with 2D flow cell images may provide higher efficiency and save time and computational effort in its training and in subsequent prediction of polony locations.
[0501] In some embodiments, the sequencing method 2900 comprises an operation 705 of acquiring, by the imager 116 of the sequencing system 110, a training set comprising a plurality of training flow cell images with a first resolution. The plurality of training flow cell images may be real images that are acquired using a sequencing system disclosed herein. The plurality of training flow cell images may be real images of one or more samples immobilized on a support, e.g., a flow cell device. The training flow cell images may be of 2D or 3D samples as disclosed herein for operation 705 with respect to method 700. Alternatively, the plurality of training flow cell images may include simulated flow cell images disclosed herein.
[0502] In some embodiments, the training flow cell images may be generated based on the characteristics of the sample(s) on which predictions are to be made. For example, for predicting polony locations in cellular samples, the training flow cell images may include only images (simulated or real) of 3D samples with similar characteristics, e.g., liver samples, kidney samples, etc. As another example, for predicting polony locations in traditional 2D samples, the training flow cell images may include only 2D flow cell images with similar plexity and/or sample density. In some embodiments, the training flow cell images may include a combination of flow cell images of 2D or 3D samples, with or without similar characteristics.
[0503] The training flow cell images may be generated at multiple z-locations in order to cover characteristics of the sample at different z-levels. In some embodiments, the corresponding plurality of training flow cell images (simulated or real images) may include flow cell images at multiple z-levels. In some embodiments where the neural network is 3D, the corresponding plurality of training flow cell images may include z-stacks of flow cell images, and each z-stack may include a 3D volume made up of the multiple z-levels of flow cell images comprised in the z-stack.
[0504] In some embodiments where the neural network is in 2D, the corresponding plurality of training flow cell images may include flow cell images at multiple z-levels (2D images) but not a z-stack of flow cell images (e.g., a 3D volume ).
[0505] In some embodiments, the systems and methods herein, e.g., method 2900, can be used to train neural networks to predict base calls for flow cell images acquired from one or more color channels, one or more cycles, and/or one or more z-levels in a sequencing run. The training data used to train the neural networks herein may be generated using real flow cell images, and the reference intensities of the training data are advantageously determined after removing errors that may be caused by various sources, including but not limited to color cross-talk, spatial misalignment of polonies, phasing and dephasing, and/or blurriness of out-of-focus polonies, thereby allowing more reliable training.
[0506] In some embodiments, the training data used to train the neural network herein do not include full flow cell images. Instead, the training data include patches (e.g., 16-pixel by 16-pixel patches) of the flow cell images from one or more color channels, one or more cycles, and/or one or more z-levels to provide spatial and temporal context for training. In some embodiments, the training using method 700 or 2900 may be performed per polony, as each patch contains only a very limited number of polonies, e.g., a single polony. The very limited number of polonies can be in a range from 1 to 4, 1 to 8, 1 to 20, 1 to 50, or 1 to 100. The very limited number of polonies can be 100x, 1000x, 10⁴x, 10⁵x, 10⁶x, 10⁷x, or 10⁸x fewer than the total number of polonies in a corresponding flow cell image. Each patch may include one patch image per color channel, per cycle, and per z-level. The patch images within a patch may cover the same pixels of the corresponding portion of the flow cell images. Each patch image may include a single polony at or near its center, or a very limited number of polonies. Such training data may advantageously allow less complicated and more reliable training than training using flow cell images of one or more subtiles (e.g., 6000 pixels by 8000 pixels).
[0507] Training with real flow cell images may advantageously eliminate the need to generate simulated images that mimic the characteristics of polonies in different samples, which simplifies the training process, especially when the samples have heterogeneous intensities and polony densities across the flow cell image(s) and may include various types of cells or tissue. Training with real flow cell images may also advantageously yield better training results (e.g., a trained neural network that makes improved predictions) than training with only simulated images, at similar computational cost and neural network complexity. When a predetermined prediction quality is needed, training with real flow cell images may advantageously allow a less complex neural network to achieve that quality than a neural network trained with simulated data. In some embodiments, the prediction quality can be measured using various metrics including but not limited to error rate in base calls, error rate in intensity values, density of base calls, density of polonies, etc. The values of the metrics can be determined relative to results produced using existing primary analysis methods that do not use neural network(s).
[0508] In some embodiments, the methods 600, 700, and 2900 may be used to train a neural network or any other artificial intelligence-based model using various references or ground truths that are not limited to reference base calls or reference intensities, e.g., at the second resolution.
[0509] In some embodiments, the method 2900 may include an operation 2925’ of generating, by the sequencing system, references corresponding to the intensities in the high resolution training flow cell images. In some embodiments, the references have the same spatial resolution as the high resolution training flow cell images. In some embodiments, the plurality of training flow cell images are acquired from one or more color channels, and the references comprise reference base calls. Each reference base call may correspond to a polony in the plurality of high resolution training flow cell images. The references may be generated using various algorithms, or may be based on existing, publicly available datasets.
[0510] In some embodiments, the plurality of training flow cell images are acquired from one or more color channels, and the references comprise reference classifications. Each reference classification may correspond to a pixel in the plurality of high resolution training flow cell images from the one or more color channels. Exemplary classifications may include nucleotides A, T, C, G, U, and background. The background classification can be assigned to pixels that are not classified as any type of nucleotide, e.g., not classified as A, T, C, G, or U.
[0511] In some embodiments, the plurality of training flow cell images are acquired from one or more color channels in one or more cycles at one or more z-levels, and the references comprise reference classifications. A first reference classification may correspond to a pixel of a polony and have a classification that is the base call of that polony, and a second reference classification may correspond to a pixel outside any polony in the plurality of high resolution training flow cell images from multiple color channels, e.g., a background classification. A pixel with the background classification may or may not be within a cell boundary of an in situ cellular sample.
[0512] In some embodiments, the plurality of training flow cell images are acquired from a single color channel in one or more sequencing cycles at one or more different z-levels, and the references comprise reference polony maps. Each reference polony map may correspond to at least a portion of an image of the plurality of high resolution training flow cell images in a sequencing cycle. For example, each reference polony map may correspond to a patch extracted from the high resolution training flow cell images, so that each pixel in the polony map corresponds to a pixel of the patch, and the reference polony map indicates which pixel(s) are within a polony and which pixel(s) are not.
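As a minimal sketch of how pixel-level reference classifications of the kind described in paragraphs [0510] and [0511] could be encoded, the Python example below rasterizes polony centers and their reference base calls into a label map, with every other pixel labeled background. The integer class encoding, the circular polony footprint, the `radius` value, and the function name `reference_label_map` are illustrative assumptions, not part of the disclosure.

```python
import numpy as np

# Hypothetical class encoding; the text lists A, T, C, G, U, and background.
CLASSES = ("A", "T", "C", "G", "U", "background")
CLASS_INDEX = {name: i for i, name in enumerate(CLASSES)}

def reference_label_map(shape, polonies, radius=1.5):
    """Build a per-pixel reference classification map.

    shape:    (height, width) of the high-resolution image or patch.
    polonies: iterable of (row, col, base) tuples, e.g., from a polony map
              combined with reference base calls.
    radius:   assumed polony radius in pixels (illustrative only).
    Pixels within `radius` of a polony center receive that polony's base call;
    all other pixels are labeled background.
    """
    labels = np.full(shape, CLASS_INDEX["background"], dtype=np.int64)
    rows, cols = np.indices(shape)
    for r, c, base in polonies:
        inside = (rows - r) ** 2 + (cols - c) ** 2 <= radius ** 2
        labels[inside] = CLASS_INDEX[base]
    return labels

labels = reference_label_map((8, 8), [(2, 2, "A"), (5, 6, "G")])
```

Where polony footprints overlap, the later polony in the list overwrites the earlier one; a practical implementation would need an explicit rule for such pixels.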
[0513] In some embodiments, the reference polony maps are generated using various polony map generation algorithms. Exemplary polony map generation algorithms for generating 2D or 3D polony maps are disclosed in U.S. Application Nos. 18/078,797 and 18/078,820 and U.S. Patent No. 10,266,888, which are incorporated herein by reference in their entireties.
[0514] The first resolution can be a standard resolution achievable using the imager disclosed herein. For example, the first resolution can be within the range from 0.01 µm to 15 µm, or within the range from 0.01 µm to 5 µm. The plurality of training flow cell images in the training set can be from one or more color channels. In some embodiments, the plurality of training flow cell images in the training set can be from 4 color channels. The plurality of training flow cell images in the training set can be from one or more cycles. For example, the number of cycles can be any number ranging from 1 to 10, 1 to 20, 1 to 30, 1 to 50, 1 to 100, 1 to 200, or 1 to 500. The plurality of flow cell images can be at a single z-level or at multiple z-levels.
[0515] In some embodiments, the sequencing method 2900 comprises an operation 715 of up-sampling, by the sequencing system, the plurality of training flow cell images to generate high-resolution training flow cell images having a second resolution. The second resolution can be 2x, 4x, 8x, 16x, or more than 16x higher than the first resolution. For example, if the first resolution is in the range from 0.01 µm to 5 µm, a second resolution that is 4x higher than the first resolution is in the range from 0.0025 µm to 1.25 µm. Various up-sampling methods can be used to generate the high-resolution training flow cell images. Each high-resolution training flow cell image corresponds to a training flow cell image at the first resolution.
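The arithmetic relating the first and second resolutions, together with one simple up-sampling method, can be sketched in Python as follows. Nearest-neighbor up-sampling via `np.kron` is used only for illustration; the text states that various up-sampling methods can be used, and interpolating methods (e.g., bilinear or bicubic) would produce smoother high-resolution images. The function names are hypothetical.

```python
import numpy as np

def second_resolution(first_resolution_um, factor):
    """Resolution (in µm) after up-sampling by an integer factor.

    A finer resolution is a smaller number, so a 4x higher resolution
    divides the first resolution by 4.
    """
    return first_resolution_um / factor

def upsample_nearest(image, factor):
    """Nearest-neighbor up-sampling: each pixel becomes a factor x factor block."""
    return np.kron(image, np.ones((factor, factor), dtype=image.dtype))

# Bounds of the example range in the text (0.01 µm to 5 µm) at 4x.
low = second_resolution(0.01, 4)   # about 0.0025 µm
high = second_resolution(5.0, 4)   # 1.25 µm

tile = np.arange(6, dtype=np.float32).reshape(2, 3)
hi_res = upsample_nearest(tile, 4)  # shape (8, 12)
```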
[0516] In some embodiments, the sequencing method 2900 comprises an operation of determining, by the sequencing system, locations of polonies in the plurality of flow cell images (e.g., a polony map containing locations of polonies or a polony map containing a location list of the polonies); and optionally extracting, by the sequencing system, intensities in the plurality of flow cell images based on the location list. This operation in method 2900 can be performed similarly to the operations for determining a polony map or a location list disclosed herein with respect to other methods, e.g., operation 2806.
[0517] In some embodiments, the sequencing method 2900 comprises an operation of determining, by the sequencing system, locations of polonies (e.g., a polony map containing locations of polonies) in the high resolution training flow cell images; and optionally extracting, by the sequencing system, intensities in the high resolution training flow cell images based on the location list.
[0518] In some embodiments, the sequencing method 2900 comprises an operation of processing the high resolution training flow cell images to determine the locations of the polonies (e.g., bright spots in the image) and their processed intensities. The processed intensities may have been obtained using image processing steps including but not limited to background removal, noise reduction, filtering, intensity normalization, intensity offset adjustment, phasing and dephasing correction, image registration, color correction, and deconvolution.
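Two of the processing steps listed above, background removal and intensity normalization, could be sketched as follows. The global-percentile background estimate, the default percentile, and the function name `preprocess` are assumptions made for illustration; the disclosure does not specify how each step is implemented.

```python
import numpy as np

def preprocess(image, background_percentile=10.0):
    """Subtract an estimated background level, clip at zero, normalize to [0, 1].

    A global percentile is used as the background estimate purely for
    illustration; real pipelines may use local (e.g., rolling-window)
    background models.
    """
    image = image.astype(np.float64)
    background = np.percentile(image, background_percentile)
    corrected = np.clip(image - background, 0.0, None)  # background removal
    peak = corrected.max()
    return corrected / peak if peak > 0 else corrected   # intensity normalization

img = np.array([[10, 10, 10],
                [10, 50, 10],
                [10, 10, 30]])
out = preprocess(img)
```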
[0519] The operation of processing the training flow cell images or the high resolution training flow cell images can include polony map generation. Exemplary polony map generation embodiments are disclosed in detail in U.S. Patent No. 11,200,446 and U.S. Patent Application Nos. 18/078,820 and 18/078,797, which are incorporated herein by reference in their entireties.
[0520] In some embodiments, the method 2900 comprises an operation 2925 of generating, by the sequencing system, reference base calls of the high resolution training flow cell images. The operation 2925 may be based on the locations of polonies (e.g., the polony map) so that only signals from polonies identified in the polony map are used for generating the reference base calls, and other signals, including background noise and possible artifacts from cellular structures in the images, can be excluded. In some embodiments, the reference base calls of the high resolution training flow cell images may be generated based on multiple patches, where each patch comprises one or more patch images from one or more color channels and at least a portion of the second plurality of flow cell images. In some embodiments, the patches may be generated based on the location list or the polony map so that each patch image has a single polony at or near its center pixel(s). Each patch image then corresponds to a reference base call of the single polony at or near its center pixel(s). In some other embodiments, each patch image corresponds to a very limited number of reference base calls, e.g., in a range from 1 to 10 or from 1 to 100.
[0521] In some embodiments, the method 2900 comprises an operation 2925’ (which replaces operation 2925) of generating, by the sequencing system, references, instead of reference base calls, of the high resolution training flow cell images. The operation 2925’ may be based on the locations of polonies (e.g., the polony map) so that only signals from polonies identified in the polony map are used for generating the references, and other signals, including background noise and possible artifacts from cellular structures in the images, can be excluded. In some embodiments, the operation 2925’ may generate the references for some or all of the pixels of the flow cell images without requiring the locations of the polonies or the polony map. For example, the operation 2925’ may generate the references as per-pixel reference classifications of A, T, C, G, U, or background after aligning the flow cell images from different color channels.
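A minimal sketch of generating polony-centered patches from a location list and pairing each patch with the reference base call of its center polony is shown below. It assumes a single 2D high-resolution image, zero padding at the image border, and rounding of sub-pixel centers to the nearest pixel; the function name `build_training_pairs` and these choices are hypothetical.

```python
import numpy as np

def build_training_pairs(image, location_list, base_calls, size=16):
    """Pair each polony-centered patch with that polony's reference base call.

    image:         2D high-resolution flow cell image (one channel/cycle/z-level).
    location_list: sequence of (row, col) polony centers, possibly sub-pixel.
    base_calls:    reference base call for each polony, in the same order.
    Returns (patches, labels) with patches of shape (N, size, size).
    """
    half = size // 2
    padded = np.pad(image, half, mode="constant")  # keeps border polonies usable
    patches = []
    for row, col in location_list:
        r = int(np.floor(row + 0.5))  # nearest pixel to the polony center
        c = int(np.floor(col + 0.5))
        # Original pixel (r, c) sits at (r + half, c + half) in `padded`,
        # so this window places it at patch index (half, half).
        patches.append(padded[r:r + size, c:c + size])
    return np.stack(patches), list(base_calls)

image = np.arange(100, dtype=np.float32).reshape(10, 10)
patches, labels = build_training_pairs(
    image, [(5.2, 3.0), (0.0, 0.0)], ["A", "G"], size=4)
```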
[0522] In some embodiments, the patches extracted from the training flow cell images have properties similar to those of the patches for which predictions are to be generated, e.g., using method 2800. As nonlimiting examples, such properties can include patch size, location of the single polony within the patch, and the range of intensities of pixels within the patches.
[0523] In some embodiments, each patch comprises a single polony located at or in close vicinity to the center of the corresponding patch. For example, the polony may be no more than 1 to 10 pixels away from the center of the corresponding patch. In some embodiments, each patch comprises 3 to 128 pixels along a spatial dimension, e.g., along the x or y direction. The size of the patches is kept relatively small compared to the size of the flow cell images, e.g., 1/10, 1/20, 1/50, 1/100, 1/500, 1/1000, or less of the size of the flow cell image. In some embodiments, the plurality of patches comprises 100 to 10⁸ patches. In some embodiments, two or more different patches may at least partly overlap with each other. In some embodiments, each patch may or may not contain more than one, two, three, five, or ten polonies, but only the pixel(s) of the single polony at its center are used for generating the base call(s) corresponding to the patch. For example, when each patch includes a patch image sized 32 by 32 pixels, a first patch may include pixels 1-32 in both the x and y directions to cover a polony centered at pixel (16, 16) of the flow cell images, a second patch may include pixels 2-33 in both the x and y directions to cover a second polony centered at pixel (17, 17.5), and a third patch may include pixels 5-36 in both the x and y directions to cover a third polony centered at pixel (19, 19) of the flow cell images.
[0524] In some embodiments, the number of pixels within each patch can be optimized to balance computational complexity against the spatial context information included for training the neural network(s). In some embodiments, the number of pixels within each patch can be based at least partly on the polony density of the sample being imaged. The number of patch images within each patch can be optimized to balance computational complexity against the spatial context information within each patch for accurate and reliable prediction using the neural network. In some embodiments, each patch may comprise multiple patch images corresponding to different color channels. For example, each patch may comprise patch images covering the same pixels within the x-y plane in three different color channels. The same pixels may be pixels determined after registration to correct for spatial offsets across the color channels. In some embodiments, each patch may comprise multiple patch images corresponding to different cycles, e.g., consecutive cycles n-1, n, and n+1 within a sequencing run. For example, each patch may comprise 3 patch images, each from a different color channel, in each of 4 adjacent cycles, so that each patch comprises 12 patch images in total. When the sample is 3D, e.g., an in situ cell sample, each patch may include 5 different z-levels, for a total of 60 patch images.
[0525] In some embodiments, at least two patches of the plurality of patches comprise at least partially overlapping patch images that share some identical pixels. In some embodiments, each patch of the plurality of patches comprises pixels that at least partially overlap with another patch of the plurality of patches.
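The patch layout in the example above (3 color channels, 4 adjacent cycles, 5 z-levels) can be illustrated with the following sketch, which crops the same registered x-y window from every z-level, cycle, and channel. The array layout `(z_levels, cycles, channels, H, W)`, the zero padding, and the function name `stack_patch` are illustrative assumptions.

```python
import numpy as np

def stack_patch(images, center, size=16):
    """Crop one x-y window from every z-level, cycle, and color channel.

    images: array of shape (z_levels, cycles, channels, H, W) holding
            registered flow cell images (same pixel grid in every channel).
    center: (row, col) of the polony, possibly sub-pixel.
    Returns an array of shape (z_levels, cycles, channels, size, size)
    with the polony's nearest pixel at index (size // 2, size // 2).
    """
    half = size // 2
    r = int(np.floor(center[0] + 0.5))
    c = int(np.floor(center[1] + 0.5))
    # Pad only the two spatial axes so border polonies still get full patches.
    pad = [(0, 0)] * (images.ndim - 2) + [(half, half), (half, half)]
    padded = np.pad(images, pad, mode="constant")
    return padded[..., r:r + size, c:c + size]

# Example from the text: 5 z-levels x 4 adjacent cycles x 3 color channels.
images = np.zeros((5, 4, 3, 64, 64), dtype=np.float32)
images[..., 30, 42] = 1.0  # a single bright pixel at the polony location
patch = stack_patch(images, center=(30.0, 41.6), size=16)
n_patch_images = int(np.prod(patch.shape[:-2]))  # 5 * 4 * 3 = 60
```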
[0526] In some embodiments, the training flow cell images are acquired from only a single color channel in one or more sequencing cycles and/or at one or more z-levels, so that training flow cell images acquired from different color channels may be used to train different neural networks, each predicting high resolution intensities, base calls, classifications, etc., as disclosed herein, for a single color channel.
[0527] In some embodiments, the training flow cell images are acquired from only a single z-level, from one or more color channels, in one or more sequencing cycles, so that training flow cell images acquired at different z-levels of 3D sample(s), e.g., in situ cells, may be used to train different neural networks for predicting high resolution intensities, base calls, classifications, etc., as disclosed herein.
[0528] In some embodiments, the training flow cell images are acquired from the one or more cycles, from one or more color channels, and at one or more z-levels. In some embodiments, the one or more cycles comprise a plurality of consecutive cycles in a sequencing run.
[0529] In some embodiments, the operation 2925 or 2925’ of generating the reference base calls or references of the high resolution training flow cell images is performed for each patch of the plurality of patches. For example, reference intensities of the high resolution training flow cell images may be determined using an operation similar to operation 725 disclosed herein.
[0530] Various algorithms can be used to generate the reference intensities corresponding to the intensities in the processed high resolution training flow cell images. In some embodiments, the operation of generating the reference intensities in the high resolution training flow cell images based on the base calls of the high resolution training flow cell images comprises performing color correction on each extracted intensity in the high resolution training flow cell images, thereby generating the corresponding reference intensity.
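One common way to perform color correction is to model cross-talk as a linear mixing matrix and invert it. The sketch below assumes that model and a known, pre-estimated cross-talk matrix; neither the model nor the function name `color_correct` is specified by the disclosure, and the example numbers are made up.

```python
import numpy as np

def color_correct(intensities, crosstalk):
    """Remove color cross-talk from extracted polony intensities.

    intensities: array (N, C) of observed intensities, one row per polony.
    crosstalk:   (C, C) matrix M such that observed = M @ true for each polony;
                 assumed to be estimated beforehand (hypothetical here).
    Returns the corrected (N, C) intensities.
    """
    # Solve M @ x = y for every polony at once instead of forming M^-1.
    return np.linalg.solve(crosstalk, intensities.T).T

crosstalk = np.array([[1.0, 0.2],
                      [0.1, 1.0]])
true_intensities = np.array([[100.0, 0.0],
                             [0.0, 50.0]])
observed = true_intensities @ crosstalk.T   # each row equals crosstalk @ true_row
corrected = color_correct(observed, crosstalk)
```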
[0531] In some embodiments, the intensities may undergo color correction, phasing/dephasing correction, normalization, and/or other corrections to obtain the reference intensities. As a nonlimiting example, FIG. 5F plots the intensities of the high resolution training flow cell images from two different color channels. Each polony is plotted as a dot according to its corresponding intensities in channels 1, 2, 3, and 4. Based on the predetermined base call, the polonies within region 790 would have a base call of A; thus, the reference intensity of each polony having a base call of A can be obtained by projecting its dot onto the line fitted in region 790, e.g., using the shortest-distance projection. The vertical-axis coordinate of the projected intensity on the line may then be the reference intensity, and the horizontal-axis coordinate of the projected intensity would be close to zero. It is understood that reference intensity determination is not limited to the shortest-distance projection shown in FIG. 5F.
[0532] In some embodiments, the algorithm for determining the reference intensities may be iterative, such that reference intensities obtained in earlier iteration(s) are refined in later iterations based on customized quality criteria. The number of iterations can be in a range from 1 to 10, 1 to 100, or more. For example, later iterations can use a projection method that produces a smaller total distance to the fitted line shown in FIG. 5F than the projection method used in earlier iteration(s).
[0533] In some embodiments, the plurality of patches can be extracted from the high resolution training flow cell images after the reference intensities are generated. Each patch may include patch images corresponding to different color channels, and reference base calls may be determined based on the reference intensities from all color channels. Reference classifications of the patch may be determined in the same way, except that patches satisfying certain customized conditions are classified as background rather than as any type of nucleotide. For example, for a patch with 4 different patch images, each corresponding to a color channel, if the reference intensities from all 4 channels are very similar to each other and all below a predetermined signal level, the patch can be given the background classification.
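The shortest-distance projection described for FIG. 5F can be sketched as an orthogonal projection onto a line. The sketch assumes the fitted line passes through the origin and is estimated by total least squares (the leading singular vector of the points); the text does not specify how the line is fitted, and the example intensities are made up.

```python
import numpy as np

def project_to_fitted_line(points):
    """Project 2-channel intensities onto a line fitted through the origin.

    points: array (N, 2) of (horizontal-axis, vertical-axis) intensities for
            polonies assigned to one base (e.g., region 790 for base A).
    The line direction is the leading right-singular vector of `points`,
    i.e., a total-least-squares fit through the origin (an assumption).
    Each point is mapped to its closest point on that line.
    """
    _, _, vt = np.linalg.svd(points, full_matrices=False)
    direction = vt[0]                   # unit vector along the fitted line
    scalar = points @ direction         # signed position along the line
    return np.outer(scalar, direction)  # projected coordinates

pts = np.array([[0.5, 100.0],
                [-0.4, 80.0],
                [0.2, 120.0]])
proj = project_to_fitted_line(pts)
reference = proj[:, 1]  # vertical coordinate used as the reference intensity
```

The sign of the singular vector is arbitrary, but it cancels in the product of `scalar` and `direction`, so the projected coordinates do not depend on it.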
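The background condition described above could be expressed as in the following sketch. The rule that a non-background patch takes the base of its brightest channel, the one-channel-per-base mapping in `CHANNEL_BASES`, and the threshold parameters are hypothetical simplifications; actual channel-to-base encodings may differ.

```python
import numpy as np

# Hypothetical one-channel-per-base mapping used only for illustration.
CHANNEL_BASES = ("A", "C", "G", "T")

def classify_patch(reference_intensities, signal_level, similarity_tol):
    """Return a reference classification for one 4-channel patch.

    reference_intensities: length-4 sequence, one reference intensity per
                           color channel for the patch's center polony.
    A patch is 'background' when all channels are below `signal_level` and
    lie within `similarity_tol` of each other; otherwise the brightest
    channel determines the base (an assumed rule).
    """
    x = np.asarray(reference_intensities, dtype=float)
    if x.max() < signal_level and x.max() - x.min() <= similarity_tol:
        return "background"
    return CHANNEL_BASES[int(np.argmax(x))]
```

For example, `classify_patch([5, 6, 5, 4], 20, 3)` returns `"background"`, while `classify_patch([5, 90, 6, 4], 20, 3)` returns `"C"` under the assumed mapping.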
[0534] In some embodiments, each patch may include only a single patch image from a single color channel. The operation 2925 or 2925’ may comprise generating a single reference intensity for a first channel of the multiple color channels for the corresponding patch in a single sequencing cycle.
[0535] In some embodiments, each patch may include multiple patch images from the same single color channel but from different sequencing cycles. The operation 2925 or 2925’ may comprise generating reference intensities for a first channel of the multiple color channels for the corresponding patch in one or more sequencing cycles.
[0536] In some embodiments, the operation 2930 or 2930’ may include providing the reference base calls or references so that they are available for comparison with the training output(s) of the neural network. Depending on how the user wants to train the neural network, the references may include: a single intensity of a single color channel at one sequencing cycle for each patch (for training a different neural network for each color channel); multiple intensities of a single color channel at multiple sequencing cycles (for training a different neural network for each color channel); or multiple intensities of multiple different color channels at one or more sequencing cycles (for training a single neural network for different color channels).
[0537] In some embodiments, patches of reference base calls or references may also be separated based on the z-levels of a 3D sample in order to train different neural networks for different z-levels. Alternatively, a single neural network may be trained using patches from different z-levels.
[0538] In embodiments where references are used as ground truth, the method 2900 may include an operation 2930’ of providing, by the processor, the references for comparison to training output(s) of the neural network. In some embodiments, the high resolution training flow cell images are also provided in operation 2930 or 2930’ for comparison to training output(s) of the neural network.
[0539] In embodiments where references are used as ground truth, the method 2900 may include an operation 2955’ of determining an output error by comparing the training output and the references, instead of operation 2955.
[0540] In some embodiments, at least part of the one or more samples comprises predetermined nucleotide bases in the one or more cycles. In other words, the base calls for at least some of the polonies in the flow cell images in cycle(s) are predetermined. The base calls can be predetermined by sequencing known barcode sequences in the one or more cycles.
[0541] Various algorithms can be used to generate the reference base calls corresponding to the intensities in the processed high resolution training flow cell images. In some embodiments, the operation of generating the reference base calls in the high resolution training flow cell images comprises: performing color correction on each extracted intensity in the high resolution training flow cell image thereby generating the corresponding reference intensity used for generating reference base calls.
[0542] In some embodiments, the algorithm for determining the reference base calls is based on the determination of the reference intensities as disclosed herein, e.g., in method 700.
[0543] In some embodiments, the sequencing method 2900 may include an operation 2930 of providing the reference base calls so that they can be compared to the training output(s) of the neural network for training. The operation 2930 is similar to operation 730 in method 700. The reference base calls may be provided as flow cell image(s) or, alternatively, as patches, where each patch may comprise one or more patch images and each patch image has a polony at or near its center. In some embodiments, the reference base calls may be provided as a list of base calls corresponding to their locations in the flow cell images.
[0544] In some embodiments, the operation 2930 comprises an operation of providing the reference base calls in a plurality of patches for comparison during training to the training output(s) of the neural network, wherein each patch comprises one or more patch images from multiple color channels. In some embodiments, instead of training based on full-sized flow cell images, the patches of the flow cell images may be used for training per polony. Each patch may comprise one or more patch images cropped from the flow cell images (e.g., the second plurality of flow cell images). The training method 2900 may be configured to train the neural network, e.g., a CNN, to predict a single base call at or close to the center of the patch. Because each patch contains only a few polonies, training with method 2900 does not require training on a large number of polonies per patch, which may advantageously reduce computational complexity and increase training efficiency and accuracy.
[0545] The input to the neural network may also include the location list, e.g., a polony map.
[0546] In some embodiments, the sequencing method 2900 herein includes an operation 740 of repeatedly performing, until the output error satisfies a stopping criterion, one or more training operations comprising: an operation 2955 of determining an output error by comparing the training output and the reference base calls; and an operation 760 of adjusting current values of parameters of the convolutional neural network based on the output error.
[0547] In some embodiments, the operation 740 repeats using its output from the previous iteration (e.g., the adjusted parameters of the neural network) as input to the current iteration.
[0548] In some embodiments, the output error may be based on a comparison between the reference base calls and the predicted base calls during an iteration of training. The comparison may include only locations in the location list, e.g., the polony map. In some embodiments, the comparison may include a subset of the locations in the location list.
[0549] The operation 740 may stop when a stopping criterion is met. The stopping criterion can be customized based on training time, computational complexity, and convergence rate. Exemplary stopping criteria include a fixed number of iterations, a fixed duration of training time, or a minimized loss function. For example, the stopping criterion can be (1) stopping after 10 epochs to reduce training time. As another example, the stopping criterion can be (2) stopping when the value of the loss function (or the output error) is less than a predetermined value close to 0. Determining the output error can be based on various metrics, e.g., a loss function. Nonlimiting examples of the loss function include the sum of the root mean square of the differences between the predicted intensities and the corresponding reference base calls based on the location list, or the sum of mean square errors.
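Restricting the comparison to the location list can be sketched as a base-call error rate evaluated only at polony pixels, as below. Representing base calls as integer class maps, and the function name `base_call_error`, are assumptions.

```python
import numpy as np

def base_call_error(predicted, reference, locations):
    """Fraction of mismatched base calls, evaluated only at polony locations.

    predicted, reference: 2D arrays of per-pixel class indices.
    locations: (row, col) integer pixels from the location list (or a subset).
    """
    rows, cols = np.asarray(locations).T
    return float(np.mean(predicted[rows, cols] != reference[rows, cols]))

reference = np.array([[0, 5, 5],
                      [5, 2, 5],
                      [5, 5, 3]])
predicted = np.array([[0, 1, 5],
                      [5, 2, 5],
                      [5, 5, 1]])
# The mismatch at (0, 1) is ignored because it is not in the location list.
error = base_call_error(predicted, reference, [(0, 0), (1, 1), (2, 2)])
```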
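The repeat-until-stop structure of operation 740, with both example stopping criteria (a fixed number of epochs and a loss threshold), is sketched below. A linear model trained by gradient descent on mean squared error stands in for the convolutional neural network purely to keep the example self-contained; all names and hyperparameters are hypothetical.

```python
import numpy as np

def train(x, y, lr=0.1, max_epochs=10, loss_tol=1e-6):
    """Repeat: predict, compute the output error, adjust parameters.

    A linear model y ~ x @ w stands in for the neural network, and mean
    squared error stands in for the loss (illustrative assumptions).
    Training stops after `max_epochs` epochs (criterion 1) or once the
    loss falls below `loss_tol` (criterion 2).
    """
    w = np.zeros(x.shape[1])
    losses = []
    for _ in range(max_epochs):
        residual = x @ w - y
        loss = float(np.mean(residual ** 2))  # output error for this epoch
        losses.append(loss)
        if loss < loss_tol:
            break
        grad = 2.0 * x.T @ residual / len(y)  # gradient of the MSE w.r.t. w
        w -= lr * grad                        # parameter adjustment
    return w, losses

rng = np.random.default_rng(0)
x = rng.normal(size=(200, 3))
true_w = np.array([1.0, -2.0, 0.5])
y = x @ true_w
w, losses = train(x, y, lr=0.1, max_epochs=500)            # stops on the loss threshold
w_capped, losses_capped = train(x, y, lr=0.1, max_epochs=10)  # stops on the epoch cap
```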
[0550] In some embodiments, the method 2900 may further comprise an operation 770 of generating the trained neural network with the adjusted parameters obtained in operation 760.
[0551] Values of the parameters of the neural network, e.g., the weights of a convolutional layer, can be adjusted based on the output error or on one or more previous output errors.
[0552] In some embodiments, z-stacks of training flow cell images from the same channel can be acquired, e.g., in operation 705, to train the neural network, e.g., a CNN, for that particular channel. A certain percentage, e.g., 80%, of the training set may be used for training, and the rest of the training set, e.g., 20%, may be used for validation. The batch size can be one, the number of epochs can be about 10, 12, 15, 20, or more, and various optimizers can be used.
[0553] In some embodiments, the neural network comprises one or more convolutional neural networks. In some embodiments, the neural network comprises one or more U-Net units.
[0554] In some embodiments, comparing the training output to the reference set comprises calculating the mean square error between the predicted intensities generated by the neural network being trained and the corresponding reference intensities, based on the location list.
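An 80%/20% split of the training set into training and validation subsets, as in the example above, might be implemented as follows; the random shuffling, the seed, and the function name are illustrative assumptions.

```python
import numpy as np

def split_train_validation(n_samples, train_fraction=0.8, seed=0):
    """Shuffle sample indices and split them into training and validation sets."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(n_samples)
    n_train = int(round(train_fraction * n_samples))
    return order[:n_train], order[n_train:]

train_idx, val_idx = split_train_validation(1000)  # 800 training, 200 validation
```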
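The mean square error computed only at locations in the location list could look like the sketch below, which assumes the location list has been rasterized into a boolean polony mask with the same shape as the images; the function name `masked_mse` is hypothetical.

```python
import numpy as np

def masked_mse(predicted, reference, polony_mask):
    """Mean squared error restricted to pixels flagged in the polony map.

    predicted, reference: arrays of intensities with identical shapes.
    polony_mask: boolean array of the same shape; True marks polony pixels.
    """
    diff = predicted[polony_mask] - reference[polony_mask]
    return float(np.mean(diff ** 2))

predicted = np.array([[1.0, 2.0], [3.0, 4.0]])
reference = np.array([[1.0, 0.0], [3.0, 1.0]])
mask = np.array([[True, False], [False, True]])
loss = masked_mse(predicted, reference, mask)  # squared diffs 0 and 9 -> 4.5
```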
Cell images and staining
[0555] In some embodiments, the sequencing system is configured to acquire one or more cell images that may include images of the cell and/or tissue with various types of staining, e.g., fluorescent staining, configured to show morphological information of the sample. In some embodiments, the one or more cell images can comprise staining of cellular structures that help locate polonies or clusters relative to the stained structures. For example, staining can be of cellular structures or components including but not limited to membranes, nuclei, and mitochondria. Different staining colors may be used to stain different components of the cell.
[0556] In some embodiments, the cell membrane can be permeabilized after sequencing analysis and imaging using the sequencing system and reactions. In some embodiments, the one or more cell images can comprise staining of lipids, such as lipids in the cell membrane. In some embodiments, instead of labeling the lipids, the one or more cell images can comprise staining of one or more transmembrane proteins. The transmembrane proteins can be proteins embedded in the permeabilized membrane.
[0557] In some embodiments, the one or more cell images comprise fluorescence or luminescence signals from cell membranes. The one or more cell images can be microscopic images. The one or more images can be fluorescent images. In some embodiments, different fluorescent colors can be included in the cell images. For example, the nuclei and the cell membrane can be stained with different colors.
[0558] In some embodiments, the one or more cell images can comprise segments of: cells, membranes, nuclei, and/or other morphological structures. In some embodiments, the edge(s) of each segment encompass the entire membrane of the cell within the segment. There can be only one cell in each segment. Some segments may not have any cell in them. In some embodiments, adjacent segments do not overlap with each other. In some embodiments, adjacent segments only overlap with each other by sharing one or more edges. In some embodiments, various segmentation algorithms can be used for segmenting the cells.
[0559] In some embodiments, the cell images disclosed herein are stained. The staining can occur after acquiring flow cell images using the sequencing system 110. In some embodiments, the staining can occur before acquiring sequencing images. The methods of staining the 3D sample, such as cells or tissue, can include one or more operations disclosed herein. The staining of the 3D sample can use various methods that specifically label one or more cell proteins that are located mostly in the membrane, with negligible occurrence in other regions of the cell (e.g., less than 10%, 5%, or 2% in amount or concentration).
[0560] In some embodiments, the cell images may be acquired using the sequencing system 100 herein without moving the sample(s) from their position during sequencing. It is advantageous to stain the sample after sequencing and acquire the cell images while keeping the samples immobilized on the sample stage of the sequencing system. Some transformations, e.g., rotation, translation, or shearing, may still occur, so the flow cell images acquired during sequencing may need to be registered to the cell images acquired after sequencing and staining. In some embodiments, the cell images may be acquired using optical device(s) external to the sequencing system 100 after the sequencing run has been completed and the sample has been moved away from the sequencing system 100.
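Because rotation, translation, and shearing are all affine transformations, one way to register flow cell images to post-staining cell images is to fit an affine transform to matched landmark points by least squares and then map polony locations through it. The sketch below assumes matched point pairs are already available; landmark detection and matching are outside its scope, and the function names and example transform are hypothetical.

```python
import numpy as np

def fit_affine(src, dst):
    """Least-squares 2D affine transform mapping src points onto dst points.

    src, dst: arrays (N, 2) of matched (x, y) coordinates, N >= 3, e.g.,
              landmarks found in both a flow cell image and a cell image.
    Returns a (2, 3) matrix A such that dst ~ A @ [x, y, 1].
    """
    design = np.hstack([src, np.ones((len(src), 1))])        # (N, 3)
    solution, *_ = np.linalg.lstsq(design, dst, rcond=None)  # (3, 2)
    return solution.T

def apply_affine(affine, points):
    """Map (N, 2) points, e.g., polony locations, through an affine transform."""
    return points @ affine[:, :2].T + affine[:, 2]

theta = np.deg2rad(2.0)  # small rotation plus slight shear, for illustration
true_affine = np.array([[np.cos(theta), -np.sin(theta) + 0.01, 5.0],
                        [np.sin(theta), np.cos(theta), -3.0]])
landmarks = np.array([[0, 0], [100, 0], [0, 100], [100, 100]], dtype=float)
matched = apply_affine(true_affine, landmarks)
estimated = fit_affine(landmarks, matched)
polony_in_cell_frame = apply_affine(estimated, np.array([[50.0, 25.0]]))
```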
Samples
[0561] In some embodiments, the sequencing system, including its optical system, advantageously enables sequencing and imaging of target analyte(s) or features while they remain intact inside the cell or tissue. In some embodiments, the cell or tissue and the targets therein (e.g., target analytes, structural elements, organelles, etc.) remain intact during sequencing and/or imaging. In some embodiments, the one or more samples imaged using the optical systems herein can be 2D or 3D samples. The 2D sample(s) may include traditional nucleic acid molecules extracted from various sources. The 3D samples can include various samples in which the polonies within the sample cannot all be kept in focus at a single z level. The 3D samples may include in situ samples such as cells and/or tissues. In some embodiments, the cells or tissue samples are immobilized on the flow cell device or another substrate for sequencing and/or imaging without modifying the spatial locations of targets within the cells or tissue. In some embodiments, the cells or tissue samples are immobilized on the flow cell device or another substrate for sequencing or imaging without modifying the spatial relationship of targets or target analytes within the cells or tissue. In some embodiments, the cells and/or tissue are immobilized with the morphological features, RNA, mRNA, and protein targets of the samples intact inside the cell(s) or tissue during sequencing and/or imaging. In some embodiments, the spatial locations or relationships of the target analytes or targets remain intact during sequencing and/or imaging. In some embodiments, the spatial locations or relationships of the target analytes or targets during sequencing and/or imaging are not manually reconstructed using artificially added structures or features in the sample.
For example, the nucleus, cell membrane, mitochondria, and extracellular matrix can retain their relative spatial relationship to each other in the sample(s) during imaging and/or sequencing.
[0562] In some embodiments, the one or more samples include target analyte(s) that are located inside the sample(s) or on the membrane of the sample(s). In some embodiments, the one or more samples include target analyte(s) that are on the exterior or interior surface of the cell. In some embodiments, the one or more samples include target analyte(s) that are on the exterior or interior surface of the cell membrane. In some embodiments, the one or more samples include target analyte(s) that are part of the extracellular matrix. In some embodiments, the one or more samples include target analyte(s) that are part of and/or located on one or more organelles within the cell or tissue. In some embodiments, the one or more samples include target analytes that are on or in the glycocalyx or form part of the glycocalyx.
[0563] In some embodiments, the target analyte(s) comprise at least one polypeptide, lipid, nucleic acid, or polysaccharide. In some embodiments, the target analyte(s) comprise at least one polypeptide, enzyme, or lipid located anywhere in the sample(s), including the cytoplasm and nucleus. In some embodiments, the target analyte(s) comprise at least one polypeptide, enzyme, or lipid located in or on a cellular structure, including without limitation any cellular membrane, nucleus, nucleolus, mitochondria, chloroplast, Golgi apparatus, ribosome, endoplasmic reticulum, microtubules, peroxisome, and lysosome.
[0564] The methods, devices, and systems disclosed herein allow sequencing and analysis of various samples and sources. The samples may include nucleic acids extracted from any of a variety of biological samples, e.g., blood samples, saliva samples, urine samples, cell samples, tissue samples, and the like. In some embodiments, the samples here may include a variety of different cell, tissue, or sample types known to those of skill in the art. For example, the sample(s) may be from eukaryotes (such as animals, plants, fungi, protista), archaebacteria, or eubacteria. In some embodiments, the sample(s) may include prokaryotic or eukaryotic cells, such as adherent or non-adherent eukaryotic cells. In some embodiments, the sample(s) may be from, for example, primary or immortalized rodent, porcine, feline, canine, bovine, equine, primate, or human cell lines. In some embodiments, the sample(s) may include a variety of different cell, organ, or tissue types (e.g., white blood cells, red blood cells, platelets, epithelial cells, endothelial cells, neurons, glial cells, astrocytes, fibroblasts, skeletal muscle cells, smooth muscle cells, gametes, or cells from the heart, lungs, brain, liver, kidney, spleen, pancreas, thymus, bladder, stomach, colon, or small intestine). In some embodiments, the sample(s) may include normal or healthy cells. Alternately or in combination, the sample(s) may include diseased cells, such as cancerous cells, or pathogenic cells that are infecting a host.
In some embodiments, the sample(s) may include a distinct subset of cell types, e.g., immune cells (such as T cells, cytotoxic (killer) T cells, helper T cells, alpha beta T cells, gamma delta T cells, T cell progenitors, B cells, B-cell progenitors, lymphoid stem cells, myeloid progenitor cells, lymphocytes, granulocytes, Natural Killer cells, plasma cells, memory cells, neutrophils, eosinophils, basophils, mast cells, monocytes, dendritic cells, and/or macrophages, or any combination thereof), undifferentiated human stem cells, human stem cells that have been induced to differentiate, rare cells (e.g., circulating tumor cells (CTCs), circulating epithelial cells, circulating endothelial cells, circulating endometrial cells, bone marrow cells, progenitor cells, foam cells, mesenchymal cells, or trophoblasts). Other cells are contemplated and consistent with the disclosure herein.
Image registration to cell images
[0565] In some embodiments, the methods disclosed herein may comprise an operation of registering, by the reconfigurable logic device and/or the integrated circuit, the one or more cell images (e.g., with staining) to sequencing images or results of the sample, e.g., base calls of the determined polonies. In some embodiments, this operation is performed by different combinations of the first plurality of data processing engines and the first reconfigurable routing channels, either after the operation of determining polonies from the second plurality of flow cell images or after the operation of performing a corresponding base calling for each of the determined polonies based on the second plurality of flow cell images. In some embodiments, the operation of registering the cell images to the flow cell images or base calls may be performed by the integrated circuits, and the registration results, e.g., the transformation(s), may be communicated from the integrated circuit to the reconfigurable logic device or to the one or more processors of the sequencing system. In some embodiments, the methods herein include saving the registration results, by the reconfigurable logic device, the integrated circuit, or the one or more processors, into a predetermined file format, e.g., a FastQ data file, so that they can be accessed using software similar to the software configured to access sequencing results such as base calls.
[0566] In some embodiments, the methods further include an operation of accessing both the registration results of the cell images and other sequencing results in order to present the sequencing results in correspondence with the morphological information of the sample, e.g., to a user. For example, the methods may include an operation of displaying base calling results in color, spatially registered to cellular features (e.g., the nucleus), so that the aligned results let the user conveniently identify base calls in relation to the morphological information of cells. In some embodiments, saving and accessing the registration results of the cell images and other sequencing results may be performed by the one or more processors, the reconfigurable logic device, and/or the integrated circuit. In some embodiments, the registration results of the cell images and other sequencing results may be saved into a memory device within the housing of the sequencing system. In some embodiments, the registration results of the cell images and other sequencing results may be saved into a memory device on the cloud 130 external to the sequencing system.
[0567] Various methods can be used for registering the cell images (e.g., based on fiducial markers) to one or more of: the second plurality of flow cell images, the determined polonies, and the base calls of the determined polonies. At least some of the fiducial markers appear both in the cell images and in the images to which the cell images are to be registered or aligned.
[0568] The fiducial markers can be internal or external to the sample. For example, internal fiducial markers can include at least some of the polonies, clusters, or background objects in the sample. As another example, external fiducial markers can be microspheres coated on the flow cell, so that the signal from the microspheres can function similarly to internal fiducial markers for registration. The same fiducial markers can appear both in the sequencing images (e.g., the flow cell image(s)) and in the cell images, so that transformation(s) can be derived by aligning the fiducial markers across the different images. The transformation(s) can then be used to register or align the sequencing image(s), the cell image(s), and the objects that appear in them. Exemplary embodiments of image registration methods are described in PCT Patent Application No. PCT/US2023/067931, the contents of which are hereby incorporated by reference in their entirety.
[0569] For example, a polony or other object (e.g., a background object serving as a fiducial marker) with image intensity $I$ centered at location $(x_1, y_1)$ in a sequencing image can appear at location $(x_2, y_2)$ with intensity $I'$ in a cell image, where $(x_2, y_2) = M_T \cdot (x_1, y_1)$ and $M_T$ is the transformation matrix. Similarly, the inverse transformation matrix $M_T^{-1}$ can be determined such that $(x_1, y_1) = M_T^{-1} \cdot (x_2, y_2)$. The registration of images can be in 2D and can include translation, scaling, rotation, and/or shearing of flow cell images among different channels. Multiple points in the sequencing image and their corresponding points in the cell image can be used to determine the transformation. The minimum number of points needed is determined by the number of degrees of freedom of the transformation. In some embodiments, the image registration can be 3D, with coordinates along the x, y, and z axes.
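The mapping $(x_2, y_2) = M_T \cdot (x_1, y_1)$ can be made concrete in homogeneous coordinates. The helpers below are hypothetical and not part of the disclosure. They compose a 3 × 3 matrix from a rotation, an isotropic scale, an x-shear, and a translation; map a point with it; and invert the matrix to map the point back. They also encode the counting argument behind the minimum number of points. Each 2D point correspondence supplies two equations (one for x, one for y), so a transformation with k degrees of freedom needs at least ⌈k/2⌉ correspondences in general position. For a 2D affine transformation with 6 degrees of freedom, that is 3 non-collinear points.

```python
import math

def affine_matrix(tx, ty, theta, scale=1.0, shear=0.0):
    """Hypothetical 3x3 matrix: (rotation * scale) * x-shear, then translation."""
    c, s = math.cos(theta), math.sin(theta)
    a, b = scale * c, -scale * s  # first row of the rotation-scale part
    d, e = scale * s, scale * c   # second row of the rotation-scale part
    # Right-multiply [[a, b], [d, e]] by the shear matrix [[1, shear], [0, 1]].
    return [[a, a * shear + b, tx],
            [d, d * shear + e, ty],
            [0.0, 0.0, 1.0]]

def apply_transform(M, x, y):
    """Map (x, y) with M using homogeneous coordinates (x, y, 1)."""
    xh = M[0][0] * x + M[0][1] * y + M[0][2]
    yh = M[1][0] * x + M[1][1] * y + M[1][2]
    w = M[2][0] * x + M[2][1] * y + M[2][2]
    return xh / w, yh / w

def invert_affine(M):
    """Inverse of an affine matrix whose last row is (0, 0, 1)."""
    (a, b, tx), (d, e, ty) = M[0], M[1]
    det = a * e - b * d
    ia, ib, ic, ie = e / det, -b / det, -d / det, a / det
    # Inverse of [[A, t], [0, 1]] is [[A^-1, -A^-1 t], [0, 1]].
    return [[ia, ib, -(ia * tx + ib * ty)],
            [ic, ie, -(ic * tx + ie * ty)],
            [0.0, 0.0, 1.0]]

def min_points(dof):
    """Each 2D point correspondence gives two equations (x and y)."""
    return math.ceil(dof / 2)
```

For example, mapping a point with `affine_matrix(5.0, -3.0, 0.1, 1.02, 0.01)` and then with its `invert_affine` counterpart recovers the original point up to floating-point rounding.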
[0570] In some embodiments, an image (e.g., a flow cell image or a cell image) can be divided into multiple subtiles, and a transformation can be determined for each subtile. Together, these per-subtile transformations represent the transformation of the whole image. In some embodiments, the image transformation of each subtile can be uniquely represented by a transformation matrix, which can be determined as follows:
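$$
\begin{pmatrix} a_1 & a_2 & \cdots & a_n \\ b_1 & b_2 & \cdots & b_n \\ 1 & 1 & \cdots & 1 \end{pmatrix}
=
\begin{pmatrix} M_{11} & M_{12} & M_{13} \\ M_{21} & M_{22} & M_{23} \\ M_{31} & M_{32} & M_{33} \end{pmatrix}
\begin{pmatrix} x_1 & x_2 & \cdots & x_n \\ y_1 & y_2 & \cdots & y_n \\ 1 & 1 & \cdots & 1 \end{pmatrix}
\tag{1}
$$

$$
M = \begin{pmatrix} M_{11} & M_{12} & M_{13} \\ M_{21} & M_{22} & M_{23} \\ M_{31} & M_{32} & M_{33} \end{pmatrix}
\tag{2}
$$

where $n$ is the number of subtiles; $a_1 = x_1 + dx_1$, $b_1 = y_1 + dy_1$, $a_2 = x_2 + dx_2$, $b_2 = y_2 + dy_2$, ..., $a_n = x_n + dx_n$, $b_n = y_n + dy_n$; $d_1, \ldots, d_n$ are the 2D shifts corresponding to the subtiles; $dx_n$ and $dy_n$ are the shift components of the 2D shift $d_n$ along the x and y axes, respectively; and $M$ is the 3 × 3 transformation matrix of the subtile.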
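[0571] In some embodiments, the transformation matrix can be defined as the inverse matrix of $M$, i.e., $M^{-1}$, so that equation (1) can be expressed equivalently as

$$
\begin{pmatrix} x_1 & x_2 & \cdots & x_n \\ y_1 & y_2 & \cdots & y_n \\ 1 & 1 & \cdots & 1 \end{pmatrix}
= M^{-1}
\begin{pmatrix} a_1 & a_2 & \cdots & a_n \\ b_1 & b_2 & \cdots & b_n \\ 1 & 1 & \cdots & 1 \end{pmatrix}
\tag{3}
$$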
[0572] In some embodiments, the transformation matrix $M$ in equations (1) and (3) is estimated from the 2D shifts. In some embodiments, the value of $n$ may affect the accuracy of the estimate.
[0573] In some embodiments, more than one region can be selected within a subtile for the cross-correlation calculation, so that more than one 2D shift is calculated for each subtile and used to estimate the transformation of that subtile. In these embodiments, $n$ in equation (1) can be replaced by a larger number, e.g., $2n$ when two regions are selected per subtile, and the transformation matrix $M$ can be estimated using equations (1) and (2).
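Paragraph [0573] refers to a cross-correlation calculation over a selected region to obtain each 2D shift. A minimal sketch, assuming integer-pixel shifts and a small search window, is a brute-force search for the offset that maximizes the normalized cross-correlation between a reference region and the corresponding region of the image being registered. The function name `estimate_shift`, the `max_shift` parameter, and the sign convention (`mov[y + dy][x + dx]` matches `ref[y][x]`) are assumptions. The disclosure does not specify them.

```python
import math

def estimate_shift(ref, mov, max_shift=3):
    """Integer (dx, dy) maximizing normalized cross-correlation (hypothetical helper).

    Convention: mov[y + dy][x + dx] ~ ref[y][x]. Searches |dx|, |dy| <= max_shift
    and correlates only the pixels where the two regions overlap.
    """
    h, w = len(ref), len(ref[0])
    best, best_score = (0, 0), -math.inf
    for dy in range(-max_shift, max_shift + 1):
        for dx in range(-max_shift, max_shift + 1):
            a, b = [], []
            for y in range(max(0, -dy), min(h, h - dy)):
                for x in range(max(0, -dx), min(w, w - dx)):
                    a.append(ref[y][x])
                    b.append(mov[y + dy][x + dx])
            n = len(a)
            if n < 2:
                continue
            ma, mb = sum(a) / n, sum(b) / n
            cov = sum((p - ma) * (q - mb) for p, q in zip(a, b))
            va = sum((p - ma) ** 2 for p in a)
            vb = sum((q - mb) ** 2 for q in b)
            if va == 0 or vb == 0:
                continue  # flat overlap: correlation undefined
            score = cov / math.sqrt(va * vb)
            if score > best_score:
                best_score, best = score, (dx, dy)
    return best
```

For regions of the sizes listed in [0577] (e.g., 128 × 128), an FFT-based cross-correlation or phase correlation, usually with sub-pixel refinement, would typically be preferred over this exhaustive loop. The brute-force form is shown only because it makes the objective explicit.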
[0574] In some embodiments, $(a_1, b_1), \ldots, (a_n, b_n)$ in equations (1)-(3) are the coordinates of the selected region(s) after transformation (e.g., the coordinates of the center pixel of each region), and $(x_1, y_1), \ldots, (x_n, y_n)$ are the coordinates of the selected region(s) before transformation (e.g., the coordinates of the center pixel).
[0575] In some embodiments, $n$ is no less than 3. The larger $n$ is, the more information is available for estimating the transformation matrix $M$. In some embodiments, $n$ is not greater than 9.
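Given $n \ge 3$ region centers $(x_i, y_i)$ and their 2D shifts $(dx_i, dy_i)$, equation (1) can be solved for $M$ in the least-squares sense. The sketch below assumes the affine case described in [0576] ($M_{31} = M_{32} = 0$, $M_{33} = 1$). Under that assumption, each of the first two rows of $M$ has three unknowns, and each region contributes one equation per row, which is why at least three non-collinear regions are needed. To stay dependency-free, it solves the 3 × 3 normal equations by Cramer's rule. The function names are hypothetical, and this is not presented as the disclosed estimator.

```python
def _det3(m):
    """Determinant of a 3x3 matrix given as nested lists."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

def _solve3(A, rhs):
    """Solve A z = rhs for a 3x3 system by Cramer's rule."""
    d = _det3(A)
    if abs(d) < 1e-12:  # crude degeneracy check
        raise ValueError("degenerate (e.g., collinear) region centers")
    sol = []
    for k in range(3):
        Ak = [row[:] for row in A]
        for i in range(3):
            Ak[i][k] = rhs[i]  # replace column k with the right-hand side
        sol.append(_det3(Ak) / d)
    return sol

def estimate_affine(centers, shifts):
    """Least-squares affine M with (a_i, b_i, 1)^T = M (x_i, y_i, 1)^T,
    where a_i = x_i + dx_i and b_i = y_i + dy_i (equation (1))."""
    if len(centers) < 3:
        raise ValueError("need n >= 3 regions")
    X = [(x, y, 1.0) for x, y in centers]
    a = [x + dx for (x, _), (dx, _) in zip(centers, shifts)]
    b = [y + dy for (_, y), (_, dy) in zip(centers, shifts)]
    # Normal equations (X^T X) m = X^T t, shared by both rows of M.
    XtX = [[sum(r[i] * r[j] for r in X) for j in range(3)] for i in range(3)]
    row1 = _solve3(XtX, [sum(r[i] * t for r, t in zip(X, a)) for i in range(3)])
    row2 = _solve3(XtX, [sum(r[i] * t for r, t in zip(X, b)) for i in range(3)])
    return [row1, row2, [0.0, 0.0, 1.0]]
```

With more than three regions, for example the $2n$ case of [0573], the same code returns the affine matrix that minimizes the squared residuals over all regions. In a numerical library, the same fit would normally be obtained with a dedicated least-squares solver rather than by forming the normal equations explicitly.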
[0576] In some embodiments, the transformation of one or more subtiles is linear. In some embodiments, the transformation of all subtiles is linear. In some embodiments, the transformation matrix is a matrix in which $M_{31}$ and $M_{32}$ are equal to 0 and $M_{33}$ is 1. In some embodiments, one or more of the per-subtile transformations is an affine transformation, and the transformation matrix of the entire flow cell image is an affine matrix.
[0577] In some embodiments, the transformation matrix $M$ in equations (1) and (3) is estimated using selected region(s), and the size of the selected region(s) may affect the accuracy of the estimate. In some embodiments, the selected region can be about 128 × 128 pixels. In some embodiments, the selected region can be about 32 × 32, 48 × 48, 64 × 64, 96 × 96, 160 × 160, 196 × 196, or 256 × 256 pixels, or of various other sizes. The per-subtile transformations disclosed herein can be calculated using a selected region within a subtile, and the selected region can be equal to or smaller than the subtile. In either case, the transformation estimated from the region can be used to estimate the transformation of the entire subtile, given the intrinsic characteristics of image transformation across sequencing cycles: the image transformation between cycles and/or between neighboring pixels can be relatively small, e.g., less than about 8%, less than about 5%, or less than about 1% of scaling, rotation, and/or shearing. In some embodiments, the transformations disclosed herein can include an image translation with greater than about 5% difference between cycles and/or between neighboring pixels.
[0578] After the plurality of transformations has been determined for the individual subtiles, the transformation of the entire flow cell image can be accurately and reliably estimated by transforming the individual subtiles using their respective transformations and combining the transformed subtiles into a transformed flow cell image. The techniques disclosed herein thus advantageously estimate the transformation of a flow cell image by determining a plurality of transformations of its individual subtiles. Each of these transformations can be linear, yet together they can accurately and reliably estimate the transformation of the flow cell image even when that transformation is non-linear. The techniques disclosed herein advantageously eliminate the need to calculate a single transformation for the entire images to be registered or aligned, which can be more computationally intensive, more time-consuming, and more prone to failure than estimating a plurality of transformations for the corresponding subtiles of those images.
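Paragraph [0578] describes combining per-subtile linear transformations into an estimate of a possibly non-linear whole-image transformation. One simple way to apply such a piecewise model to point coordinates, such as polony centers, is to look up the subtile that contains each point and apply that subtile's affine matrix. The grid layout (`nx` by `ny` equal subtiles), the dictionary keyed by `(ix, iy)`, and the function names are assumptions for illustration only.

```python
def subtile_index(x, y, width, height, nx, ny):
    """(ix, iy) of the subtile containing (x, y) in an nx-by-ny grid of equal subtiles.

    Coordinates outside the image are clamped to the nearest edge subtile.
    """
    ix = max(0, min(int(x * nx // width), nx - 1))
    iy = max(0, min(int(y * ny // height), ny - 1))
    return ix, iy

def map_point(x, y, mats, width, height, nx, ny):
    """Piecewise-linear mapping: apply the affine matrix of the containing subtile.

    mats: dict mapping (ix, iy) -> 3x3 affine matrix (last row 0, 0, 1).
    """
    M = mats[subtile_index(x, y, width, height, nx, ny)]
    return (M[0][0] * x + M[0][1] * y + M[0][2],
            M[1][0] * x + M[1][1] * y + M[1][2])
```

In this sketch, a point near a subtile border is mapped by whichever subtile contains it, so the overall mapping can be slightly discontinuous across borders. Each individual piece stays linear and cheap to evaluate.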
Computer systems
[0579] Various aspects of the methods described herein, such as methods 500, 600, 700, 2800 and 2900, may be implemented, for example, using one or more computer systems, such as computer system 400 shown in FIG. 4. One or more computer systems 400 may be used, for example, to implement any of the aspects discussed herein, as well as combinations and sub-combinations thereof.
[0580] Computer system 400 may include one or more hardware processors 404. The hardware processor 404 may be a central processing unit (CPU), a graphics processing unit (GPU), or a combination thereof. Processor 404 may be connected to a bus or communication infrastructure 406.
[0581] Computer system 400 may also include user input/output device(s) 403, such as monitors, keyboards, pointing devices, etc., which may communicate with communication infrastructure 406 through user input/output interface(s) 402. The user input/output devices 403 may be coupled to the user interface 124 in FIG. 1.
[0582] One or more units of processors 404 may be a graphics processing unit (GPU). In an aspect, a GPU may be a processor that is a specialized electronic circuit designed to process mathematically intensive applications. The GPU may have a parallel structure that is efficient for parallel processing of large blocks of data, such as mathematically intensive data common to computer graphics applications, images, videos, vector processing, array processing, etc., as well as cryptography (including brute-force cracking), generating cryptographic hashes or hash sequences, solving partial hash-inversion problems, and/or producing results of other proof-of-work computations for some blockchain-based applications, for example. With capabilities of general-purpose computing on graphics processing units (GPGPU), the GPU may be particularly useful in at least the image recognition and machine learning aspects described herein.
[0583] Additionally, one or more of processors 404 may include a coprocessor or other implementation of logic for accelerating cryptographic calculations or other specialized mathematical functions, including hardware-accelerated cryptographic coprocessors. Such accelerated processors may further include instruction set(s) for acceleration using coprocessors and/or other logic to facilitate such acceleration.
[0584] Computer system 400 may also include a data storage device such as a main or primary memory 408, e.g., random access memory (RAM). Main memory 408 may include one or more levels of cache. Main memory 408 may have stored therein control logic (i.e., computer software) and/or data.
[0585] Computer system 400 may also include one or more secondary data storage devices or secondary memory 410. Secondary memory 410 may include, for example, a main storage drive 412 and/or a removable storage device or drive 414. Main storage drive 412 may be a hard disk drive or solid-state drive, for example. Removable storage drive 414 may be a floppy disk drive, a magnetic tape drive, a compact disk drive, an optical storage device, tape backup device, and/or any other storage device/drive.
[0586] Removable storage drive 414 may interact with a removable storage unit 418.
[0587] Removable storage unit 418 may include a computer usable or readable storage device having stored thereon computer software and/or data. The software may include control logic. The software may include instructions executable by the hardware processor(s) 404. Removable storage unit 418 may be a floppy disk, magnetic tape, compact disk, DVD, optical storage disk, and/or any other computer data storage device. Removable storage drive 414 may read from and/or write to removable storage unit 418.
[0588] Secondary memory 410 may include other means, devices, components, instrumentalities or other approaches for allowing computer programs and/or other instructions and/or data to be accessed by computer system 400. Such means, devices, components, instrumentalities or other approaches may include, for example, a removable storage unit 422 and an interface 420. Examples of the removable storage unit 422 and the interface 420 may include a program cartridge and cartridge interface (such as that found in video game devices), a removable memory chip (such as an EPROM or PROM) and associated socket, a memory stick and USB port, a memory card and associated memory card slot, and/or any other removable storage unit and associated interface.
[0589] Computer system 400 may further include a communication or network interface 424. Communication interface 424 may enable computer system 400 to communicate and interact with any combination of external devices, external networks, external entities, etc. (individually and collectively referenced by reference number 428). For example, communication interface 424 may allow computer system 400 to communicate with external or remote devices 428 over communication path 426, which may be wired and/or wireless (or a combination thereof), and which may include any combination of LANs, WANs, the Internet, etc. Control logic and/or data may be transmitted to and from computer system 400 via communication path 426. In some aspects, communication path 426 is the connection to the cloud 130, as depicted in FIG. 1. The external devices, etc. referred to by reference number 428 may be devices, networks, entities, etc. in the cloud 130.
[0590] Computer system 400 may also be any of a personal digital assistant (PDA), desktop workstation, laptop or notebook computer, netbook, tablet, smart phone, smart watch or other wearable, appliance, part of the Internet of Things (IoT), and/or embedded system, to name a few non-limiting examples, or any combination thereof.
[0591] It should be appreciated that the framework described herein may be implemented as a method, process, apparatus, system, or article of manufacture such as a non-transitory computer-readable medium or device. For illustration purposes, the present framework may be described in the context of distributed ledgers being publicly available, or at least available to untrusted third parties. One example of a modern use case is blockchain-based systems. It should be appreciated, however, that the present framework may also be applied in other settings where sensitive or confidential information may need to pass by or through the hands of untrusted third parties, and that this technology is in no way limited to distributed ledgers or blockchain uses.
[0592] Computer system 400 may be a client or server, accessing or hosting any applications and/or data through any delivery paradigm, including but not limited to remote or distributed cloud computing solutions; local or on-premises software (e.g., “on-premise” cloud-based solutions); “as a service” models (e.g., content as a service (CaaS), digital content as a service (DCaaS), software as a service (SaaS), managed software as a service (MSaaS), platform as a service (PaaS), desktop as a service (DaaS), framework as a service (FaaS), backend as a service (BaaS), mobile backend as a service (MBaaS), infrastructure as a service (IaaS), database as a service (DBaaS), etc.); and/or a hybrid model including any combination of the foregoing examples or other services or delivery paradigms.
[0593] Any applicable data structures, file formats, and schemas may be derived from standards including but not limited to JavaScript Object Notation (JSON), Extensible Markup Language (XML), Yet Another Markup Language (YAML), Extensible Hypertext Markup Language (XHTML), Wireless Markup Language (WML), MessagePack, XML User Interface Language (XUL), or any other functionally similar representations alone or in combination. Alternatively, proprietary data structures, formats or schemas may be used, either exclusively or in combination with known or open standards.
[0594] Any pertinent data, files, and/or databases may be stored, retrieved, accessed, and/or transmitted in human-readable formats such as numeric, textual, graphic, or multimedia formats, further including various types of markup language, among other possible formats. Alternatively or in combination with the above formats, the data, files, and/or databases may be stored, retrieved, accessed, and/or transmitted in binary, encoded, compressed, and/or encrypted formats, or any other machine-readable formats.
[0595] Interfacing or interconnection among various systems and layers may employ any number of mechanisms, such as any number of protocols, programmatic frameworks, floorplans, or application programming interfaces (API), including but not limited to Document Object Model (DOM), Discovery Service (DS), NSUserDefaults, Web Services Description Language (WSDL), Message Exchange Pattern (MEP), Web Distributed Data Exchange (WDDX), Web Hypertext Application Technology Working Group (WHATWG) HTML5 Web Messaging, Representational State Transfer (REST or RESTful web services), Extensible User Interface Protocol (XUP), Simple Object Access Protocol (SOAP), XML Schema Definition (XSD), XML Remote Procedure Call (XML-RPC), or any other mechanisms, open or proprietary, that may achieve similar functionality and results.
[0596] Such interfacing or interconnection may also make use of uniform resource identifiers (URI), which may further include uniform resource locators (URL) or uniform resource names (URN). Other forms of uniform and/or unique identifiers, locators, or names may be used, either exclusively or in combination with forms such as those set forth above.
[0597] Any of the above protocols or APIs may interface with or be implemented in any programming language, procedural, functional, or object-oriented, and may be compiled or interpreted. Non-limiting examples include C, C++, C#, Objective-C, Java, Scala, Clojure, Elixir, Swift, Go, Perl, PHP, Python, Ruby, JavaScript, WebAssembly, or virtually any other language, with any other libraries or schemas, in any kind of framework, runtime environment, virtual machine, interpreter, stack, engine, or similar mechanism, including but not limited to Node.js, V8, Knockout, jQuery, Dojo, Dijit, OpenUI5, AngularJS, Express.js, Backbone.js, Ember.js, DHTMLX, Vue, React, Electron, and so on, among many other non-limiting examples.
[0598] In some aspects, a tangible, non-transitory apparatus or article of manufacture comprising a tangible, non-transitory computer useable or readable medium having control logic (software) stored thereon may also be referred to herein as a computer program product or program storage device. This includes, but is not limited to, computer system 400, main memory 408, secondary memory 410, and removable storage units 418 and 422, as well as tangible articles of manufacture embodying any combination of the foregoing. Such control logic, when executed by one or more data processing devices (such as computer system 400), may cause such data processing devices to operate as described herein.
[0599] Based on the teachings contained in this disclosure, it will be apparent to persons skilled in the relevant art(s) how to make and use aspects of this disclosure using data processing devices, computer systems and/or computer architectures other than that shown in FIG. 4. In particular, aspects may operate with software, hardware, and/or operating system implementations other than those described herein.
Methods for Conducting in situ Short Read Sequencing
[0600] In the methods described herein, the RNA is not extracted from the cellular sample and sequencing information does not need to be tracked and mapped back to an image of the cellular sample. Rather, RNA is retained inside the cellular sample to permit direct imaging of the spatial location of target RNAs within the cells. Additionally, RNA within the cellular sample is not fragmented and enrichment of target RNA is not necessary. Use of target-specific and/or random-sequence reverse transcription primers enables detection of both poly-A and non-poly-A RNAs in either uni-plex or multi-plex modes.
[0601] In some embodiments, the methods comprise repeatedly conducting a small number of sequencing cycles on the same region of the template molecules (e.g., concatemer molecules). By conducting reiterative short sequencing cycles, the RNA content of the cellular sample can be discovered. Compared to long-read sequencing workflows, the reiterative short sequencing cycles described herein use a reduced amount of sequencing reagents, which reduces cost and saves time. Methods for conducting reiterative short sequencing cycles have many uses, including but not limited to detecting specific RNAs of interest, mutant RNA sequences, splice variants, and their abundance levels.
[0602] The concatemers carry tandem repeat units of a cDNA-of-interest, the universal sequencing primer binding site, and the target barcode sequence. The concatemers are sequenced inside the cellular sample, where a small number of sequencing cycles is conducted in each round and multiple rounds of short-read sequencing are conducted. The full length of the target barcode and cDNA region is not sequenced. Instead, at least a portion of the target barcode region is reiteratively sequenced. In some embodiments, it is not necessary to sequence the cDNA region. In some embodiments, the target barcode and a portion of the cDNA region are reiteratively sequenced. It is not necessary to sequence the entire length of the cDNA region. It is not necessary to assemble the sequencing reads or to obtain a full-length sequence of the cDNAs-of-interest. The redundant sequencing information obtained from the short sequencing reads obviates the need to sequence the complementary strand of the concatemer. Thus, pairwise sequencing is not necessary.
[0603] Additionally, a short portion of the cDNA region in the concatemer is resequenced at least once (e.g., reiterative sequencing) from the same start position to generate overlapping sequencing reads that can be aligned to a reference sequence. For example, the same portion of the concatemer molecule can be sequenced at least two, three, four, five, or up to 50 times. The start sequencing site can be any location of the concatemer and is dictated by the sequencing primers which are designed to anneal to a selected position within the concatemer. The reiterative short sequencing reads increase the redundancy of sequencing information for individual bases in the cDNA region. Reiteratively sequencing one strand of the concatemer template molecule provides enough base coverage to reveal the presence of target RNAs in the cellular sample so that pairwise sequencing of the complementary strand is not necessary.
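The redundancy argument in [0603] can be made concrete with a per-position majority vote across short reads that share the same start position. This is a hypothetical consensus rule chosen for illustration, and the disclosure does not state how redundant reads are combined. In the sketch, `N` marks a no-call. A position is reported as `N` unless its most frequent base accounts for more than `min_fraction` of the non-`N` calls covering that position.

```python
from collections import Counter

def consensus(reads, min_fraction=0.5):
    """Per-position majority vote over reads sharing one start position (hypothetical rule).

    Reads may differ in length. 'N' calls are ignored when voting, and a position
    is reported as 'N' if its top base does not exceed min_fraction of the votes.
    """
    length = max(len(r) for r in reads)
    out = []
    for i in range(length):
        votes = [r[i] for r in reads if i < len(r) and r[i] != "N"]
        if not votes:
            out.append("N")
            continue
        base, count = Counter(votes).most_common(1)[0]
        out.append(base if count / len(votes) > min_fraction else "N")
    return "".join(out)
```

For instance, four reiterative reads `ACGTA`, `ACGTA`, `ACCTA`, and `ACGTT` yield the consensus `ACGTA`, because the single-read disagreements at the third and fifth positions are outvoted.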
[0604] A concatemer template molecule includes multiple sequencing primer binding sites along the same concatemer molecule, which can be used to generate multiple usable sequencing reads for increased sequencing depth. Taken together, reiteratively sequencing one strand of the concatemer templates increases sequencing base coverage and sequencing depth compared to sequencing a one-copy template molecule.
[0605] The methods described herein can be conducted in uni-plex or multi-plex modes. Two or more different target RNAs can be detected and imaged simultaneously inside a cellular sample using different reverse transcription primers, different target-specific padlock probes, and universal sequencing primers. For example, the presence of a housekeeping RNA and at least one target RNA in a cellular sample can be simultaneously detected and imaged using any of the reiterative short-read sequencing methods described herein.
[0606] The present disclosure provides methods for detecting in situ at least two different target RNA molecules in a cellular sample, comprising step (a): providing a cellular sample harboring a plurality of RNA that comprises at least a first target RNA molecule and a second target RNA molecule. In some embodiments, the cellular sample is fixed and permeabilized. In some embodiments, the cellular sample harbors 2-25 different target RNA molecules, or 25-50 different target RNA molecules, or 50-75 different target RNA molecules, or 75-100 different target RNA molecules. In some embodiments, the cellular sample harbors more than 100 different target RNA molecules, or more than 250 different target RNA molecules, or more than 500 different target RNA molecules, or more than 1000 different target RNA molecules, or more. In some embodiments, the cellular sample harbors more than 10,000 different target RNA molecules. In some embodiments, the cellular sample comprises a whole cell, a plurality of whole cells, an intact tissue, or an intact tumor. In some embodiments, the cellular sample comprises a fresh cellular sample, a freshly-frozen cellular sample, a sectioned cellular sample, an FFPE cellular sample, or a sectioned FFPE cellular sample. In some embodiments, the cellular sample is deposited onto a solid support. In some embodiments, the cellular sample is deposited onto a solid support that is passivated with a coating that promotes cell adhesion. In some embodiments, the cellular sample is deposited on a support that lacks immobilized capture oligonucleotides. In some embodiments, the cellular sample is cultured before or after depositing the cellular sample onto the solid support. In some embodiments, the cellular sample is cultured prior to conducting step (b), which is described below. In some embodiments, the cellular sample comprises an expanded cellular sample that has been cultured in a simple or complex cell culture medium.
In some embodiments, the cellular sample is not cultured or expanded prior to conducting step (b). [0607] In some embodiments, methods for detecting at least two different target RNA molecules in a cellular sample further comprise step (b): generating inside the cellular sample a plurality of cDNA molecules which include at least a first target cDNA molecule that corresponds to the first target RNA molecule, and the plurality of cDNA molecules includes a second target cDNA molecule that corresponds to the second target RNA molecule. In some embodiments, the method comprises generating at least 2-10,000 different target cDNA molecules that correspond to 2-10,000 different target RNA molecules. In some embodiments, the generating of step (b) comprises contacting the plurality of RNA inside the cellular sample with (i) a plurality of reverse transcription primers, (ii) a plurality of reverse transcriptase enzymes, and (iii) a plurality of nucleotides, under a condition suitable for conducting a reverse transcription reaction to generate a plurality of cDNA molecules (e.g., a plurality of first strand cDNA molecules) in the cellular sample (e.g., FIG. 7).
[0608] In some embodiments, the plurality of reverse transcription primers comprises a first sub-population of target-specific reverse transcription primers that hybridize selectively to the first target RNA, and a second sub-population of target-specific reverse transcription primers that hybridize selectively to the second target RNA. In some embodiments, the first and second sub-populations of target-specific reverse transcription primers have the same sequence or different sequences.
[0609] In some embodiments, the first sub-population of target-specific reverse transcription primers hybridizes to a first target RNA molecule along the entire primer length. In some embodiments, the first sub-population of target-specific reverse transcription primers comprises tailed primers having a portion that hybridizes to a first target RNA molecule and a portion that does not hybridize to a first target RNA molecule. In some embodiments, the first sub-population of target-specific reverse transcription primers comprises at least a portion having a poly-T sequence. In some embodiments, the first sub-population of target-specific reverse transcription primers comprises at least a portion having a random sequence and/or at least a portion having a target-specific sequence.
[0610] In some embodiments, the second sub-population of target-specific reverse transcription primers hybridizes to a second target RNA molecule along the entire primer length. In some embodiments, the second sub-population of target-specific reverse transcription primers comprises tailed primers having a portion that hybridizes to a second target RNA molecule and a portion that does not hybridize to a second target RNA molecule. In some embodiments, the second sub-population of target-specific reverse transcription primers comprises at least a portion having a poly-T sequence. In some embodiments, the second sub-population of target-specific reverse transcription primers comprises at least a portion having a random sequence and/or at least a portion having a target-specific sequence.
[0611] In some embodiments, a target RNA molecule that is hybridized to a cDNA molecule can be subjected to enzymatic degradation using a ribonuclease under a condition suitable for degrading RNA in an RNA/DNA duplex. In some embodiments, a target RNA molecule that is hybridized to a cDNA molecule is not subjected to enzymatic degradation.
[0612] In some embodiments, methods for detecting at least two different target RNA molecules in a cellular sample further comprise step (c): contacting the plurality of cDNA molecules in the cellular sample with a plurality of target-specific padlock probes which includes at least a first plurality of target-specific padlock probes and a second plurality of target-specific padlock probes. In some embodiments, the method comprises contacting the plurality of cDNA molecules in the cellular sample with at least 2-10,000 different target-specific padlock probes.
[0613] In an alternative embodiment, cDNA is not generated from RNA inside the cellular sample. In some embodiments, methods for detecting at least two different target RNA molecules in a cellular sample further comprise contacting RNA inside the cell with a plurality of target-specific padlock probes and generating circularized padlock probes. In some embodiments, methods for detecting at least two different target RNA molecules in a cellular sample further comprise step (c): contacting the plurality of RNA molecules in the cellular sample with a plurality of target-specific padlock probes which includes at least a first plurality of target-specific padlock probes and a second plurality of target-specific padlock probes. In some embodiments, the method comprises contacting the plurality of RNA molecules in the cellular sample with at least 2-10,000 different target-specific padlock probes. In some embodiments, a target RNA molecule can be subjected to enzymatic degradation using a ribonuclease. In some embodiments, a target RNA molecule is not subjected to enzymatic degradation.
[0614] In some embodiments, individual padlock probes in the plurality of first target-specific padlock probes comprise first and second terminal regions (e.g., first and second padlock binding arms), wherein the first terminal region selectively hybridizes to a first region of the first target cDNA molecule (or the first target RNA molecule), and the second terminal region selectively hybridizes to a second region of the first target cDNA molecule (or the first target RNA molecule). In some embodiments, the contacting of step (c) comprises: hybridizing the first and second terminal regions of the first target-specific padlock probes to proximal positions on the first target cDNA molecule (or the first target RNA molecule) to form a circularized first target-specific padlock probe having a nick or gap between the hybridized first and second terminal regions (e.g., FIG. 7, left). In some embodiments, the first target-specific padlock probe comprises a first target barcode sequence (target BC-1) that corresponds to and uniquely identifies the first target cDNA sequence (or the first target RNA sequence). In some embodiments, the first target-specific padlock probe comprises a first target barcode sequence that is located adjacent to one of the regions of the first target-specific padlock probe that selectively hybridizes to the first target cDNA molecule (or the first target RNA molecule). In some embodiments, the first target-specific padlock probe comprises at least one universal adaptor sequence, such as a universal sequencing primer binding site (or a complementary sequence thereof). In some embodiments, the first target-specific padlock probe comprises a universal primer binding site for a rolling circle amplification primer (or a complementary sequence thereof). In some embodiments, the first target-specific padlock probe comprises a universal compaction oligonucleotide binding site (or a complementary sequence thereof).
[0615] In some embodiments, individual padlock probes in the plurality of second target-specific padlock probes comprise first and second terminal regions (e.g., first and second padlock binding arms), wherein the first terminal region selectively hybridizes to a first region of the second target cDNA molecule (or the second target RNA molecule), and the second terminal region selectively hybridizes to a second region of the second target cDNA molecule (or the second target RNA molecule). In some embodiments, the contacting of step (c) comprises: hybridizing the first and second terminal regions of the second target-specific padlock probes to proximal positions on the second target cDNA molecule (or the second target RNA molecule) to form a circularized second target-specific padlock probe having a nick or gap between the hybridized first and second terminal regions (e.g., FIG. 7, right). In some embodiments, the second target-specific padlock probe comprises a second target barcode sequence (target BC-2) that corresponds to and uniquely identifies the second target cDNA sequence (or the second target RNA sequence). In some embodiments, the second target-specific padlock probe comprises a second target barcode sequence that is located adjacent to one of the regions of the second target-specific padlock probe that selectively hybridizes to the second target cDNA molecule (or the second target RNA molecule). In some embodiments, the second target-specific padlock probe comprises at least one universal adaptor sequence, such as a universal sequencing primer binding site (or a complementary sequence thereof). In some embodiments, the second target-specific padlock probe comprises a universal primer binding site for a rolling circle amplification primer (or a complementary sequence thereof). In some embodiments, the second target-specific padlock probe comprises a universal compaction oligonucleotide binding site (or a complementary sequence thereof).
[0616] In some embodiments, the first target barcode sequence (target BC-1) and the second target barcode sequence (target BC-2) have different sequences and can be used to conduct multiplex RNA detection and sequencing. In some embodiments, the first target barcode sequence (target BC-1) and the second target barcode sequence (target BC-2) have the same sequence and can be used to conduct uni-plex RNA detection and sequencing.
[0617] In some embodiments, the first and second target-specific padlock probes comprise a universal sequencing primer binding site and a target barcode sequence that are adjacent to each other so that the target barcode region of the concatemer is sequenced first. The target barcode sequence can be any length, for example 3-15 bases, or 15-25 bases, or 25-40 bases, or longer.
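Because each target barcode must uniquely identify its target, the barcode length sets an upper bound on how many targets can be distinguished in one multiplex run. The sketch below computes the shortest barcode over the four DNA bases that gives every target a distinct sequence. It is an illustrative assumption, not a design rule from this disclosure: it ignores any minimum distance between barcodes, and the function name `min_barcode_length` is hypothetical.

```python
def min_barcode_length(num_targets: int, alphabet_size: int = 4) -> int:
    """Return the smallest barcode length L with alphabet_size**L >= num_targets.

    Integer arithmetic is used instead of math.log to avoid floating-point
    rounding at exact powers (e.g. 64 targets over a 4-letter alphabet).
    Error tolerance (minimum distance between barcodes) is NOT modelled.
    """
    if num_targets < 1 or alphabet_size < 2:
        raise ValueError("need num_targets >= 1 and alphabet_size >= 2")
    length, capacity = 0, 1
    while capacity < num_targets:
        capacity *= alphabet_size  # each extra base multiplies the sequence space
        length += 1
    return length


for n in (2, 64, 65, 10_000):
    print(n, "targets ->", min_barcode_length(n), "bases")
```

For the 2-10,000 targets mentioned in this section, 7 bases is the theoretical floor (4^7 = 16,384 distinct sequences). The longer barcode lengths listed above would leave room for a practical design in which barcodes differ at several positions, so that a sequencing error in a short read is less likely to turn one target's barcode into another's.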
[0618] In some embodiments, methods for detecting at least two different target RNA molecules in a cellular sample further comprise step (d): closing the nick or gap in the at least first and second circularized target-specific padlock probes by conducting an enzymatic reaction, thereby generating at least a first covalently closed circular padlock probe and a second covalently closed circular padlock probe inside the cellular sample. In some embodiments, closing the nick in the first and second circularized padlock probes comprises conducting an enzymatic ligation reaction. In some embodiments, closing the gap in the first and second circularized padlock probes comprises conducting a polymerase-catalyzed fill-in reaction using the first or second target cDNA molecule (or the first or second target RNA molecule) as a template, and conducting an enzymatic ligation reaction. In some embodiments, the method comprises closing the nick or gap in at least 2-10,000 circularized target-specific padlock probes by conducting one or more enzymatic reactions, thereby generating at least 2-10,000 covalently closed circular padlock probes inside the cellular sample.
[0619] In some embodiments, methods for detecting at least two different target RNA molecules in a cellular sample further comprise step (e): conducting a rolling circle amplification reaction inside the cellular sample using the first and second covalently closed circular padlock probes as template molecules, thereby generating a plurality of concatemer molecules including at least a first concatemer molecule that corresponds to a first target RNA molecule and at least a second concatemer molecule that corresponds to a second target RNA molecule. In some embodiments, the first concatemer molecule comprises tandem repeat units, wherein a unit comprises a sequence that corresponds to the first target cDNA (or the first target RNA), the first target barcode sequence, and the universal sequencing primer binding site (or a complementary sequence thereof). In some embodiments, the second concatemer molecule comprises tandem repeat units, wherein a unit comprises a sequence that corresponds to the second target cDNA (or the second target RNA), the second target barcode sequence, and the universal sequencing primer binding site (or a complementary sequence thereof).
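The tandem-repeat structure is what gives a single concatemer many universal sequencing primer binding sites, and therefore many short reads of the same barcode. The sketch below models that bookkeeping under loud assumptions: the segment sequences are hypothetical, the barcode is placed immediately after the primer binding site (one arrangement described in [0617]), and the model ignores that a real rolling circle product is complementary to the circular padlock probe and that a read reports the complement of the template. `build_concatemer`, `primer_site_positions`, and `short_reads` are hypothetical names.

```python
# Hypothetical segment sequences, for illustration only.
SEQ = "GATTACA"      # universal sequencing primer binding site
BC = "ACGTAC"        # target barcode, placed right after SEQ so it is read first
INSERT = "TGCATGCA"  # sequence corresponding to the target cDNA
CO = "CCGGTT"        # universal compaction oligonucleotide binding site


def build_concatemer(unit: str, copies: int) -> str:
    """Model a rolling circle amplification product as tandem repeats of one unit."""
    return unit * copies


def primer_site_positions(concatemer: str, site: str) -> list[int]:
    """Start index of every occurrence of `site`, including overlapping ones."""
    positions = []
    start = concatemer.find(site)
    while start != -1:
        positions.append(start)
        start = concatemer.find(site, start + 1)
    return positions


def short_reads(concatemer: str, site: str, n_cycles: int) -> list[str]:
    """Bases immediately downstream of each primer site, truncated to n_cycles.

    Simplification: reads report the stored bases directly (no complementing,
    no strand orientation), and incomplete reads at the end are dropped.
    """
    reads = []
    for pos in primer_site_positions(concatemer, site):
        start = pos + len(site)
        read = concatemer[start:start + n_cycles]
        if len(read) == n_cycles:
            reads.append(read)
    return reads


concatemer = build_concatemer(SEQ + BC + INSERT + CO, copies=4)
print(primer_site_positions(concatemer, SEQ))    # [0, 27, 54, 81]
print(short_reads(concatemer, SEQ, n_cycles=6))  # ['ACGTAC', 'ACGTAC', 'ACGTAC', 'ACGTAC']
```

With four repeat units, one concatemer yields four independent 6-cycle reads of the same barcode, which is the redundancy that [0604] credits for increased sequencing depth.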
[0620] In some embodiments, the rolling circle amplification reaction of step (e) comprises contacting the covalently closed circularized padlock probes with an amplification primer (e.g., a universal rolling circle amplification primer), a strand-displacing DNA polymerase, and a plurality of nucleotides, under a condition suitable for hybridizing individual amplification primers to a covalently closed padlock probe, and under a condition suitable for conducting primer extension using the covalently closed padlock probe as a template molecule to generate a nucleic acid concatemer. In some embodiments, the method comprises conducting a rolling circle amplification reaction inside the cellular sample using the at least 2-10,000 covalently closed circular padlock probes as template molecules, thereby generating at least 2-10,000 concatemer molecules that correspond to at least 2-10,000 target RNA molecules. In some embodiments, the concatemers that are generated inside the cellular sample collapse into DNA nanoballs having a shape and size that are more compact than those of a non-collapsed concatemer.
[0621] In some embodiments, methods for detecting at least two different target RNA molecules in a cellular sample further comprise step (f): sequencing the plurality of concatemer molecules inside the cellular sample, which comprises sequencing the first concatemer molecule by conducting no more than 2-30 sequencing cycles to generate a plurality of first sequencing read products, and sequencing the second concatemer molecule by conducting no more than 2-30 sequencing cycles to generate a plurality of second sequencing read products (FIG. 8). In some embodiments, the sequencing of step (f) comprises sequencing no more than 2-30 bases of the first concatemer molecules to generate a plurality of first sequencing read products, and sequencing no more than 2-30 bases of the second concatemer molecules to generate a plurality of second sequencing read products. In some embodiments, the method comprises sequencing the at least 2-10,000 concatemer molecules inside the cellular sample, which comprises conducting no more than 2-30 sequencing cycles on the 2-10,000 concatemer molecules to generate a plurality of sequencing read products.
[0622] In some embodiments, only the first target barcode region of the first concatemer molecules is sequenced (e.g., FIG. 8, top). In some embodiments, at least a portion or the full length of the first target barcode of the first concatemer molecules is sequenced (e.g., FIG. 8, top). In some embodiments, the first target barcode and a portion of the first cDNA region (or the first RNA region) of the first concatemer molecules are sequenced. In some embodiments, at least a portion of the first cDNA region (or the first RNA region) of the first concatemer molecules is sequenced.
[0623] In some embodiments, only the second target barcode region of the second concatemer molecules is sequenced (e.g., FIG. 8, bottom). In some embodiments, at least a portion or the full length of the second target barcode of the second concatemer molecules is sequenced (e.g., FIG. 8, bottom). In some embodiments, the second target barcode and a portion of the second cDNA region (or the second RNA region) of the second concatemer molecules are sequenced. In some embodiments, at least a portion of the second cDNA region (or the second RNA region) of the second concatemer molecules is sequenced.
[0624] In some embodiments, the sequencing of step (f) comprises contacting the plurality of concatemer molecules inside the cellular sample with (i) a plurality of universal sequencing primers, (ii) a plurality of sequencing polymerases, and (iii) a plurality of nucleotide reagents, under a condition suitable for hybridizing the plurality of universal sequencing primers to their respective universal sequencing primer binding sites on the concatemers. In some embodiments, the sequencing of step (f) further comprises conducting no more than 2-30 sequencing cycles to generate at least a first plurality of sequencing read products by sequencing at least the first target barcode region (Target BC-1), and optionally conducting no more than 2-30 sequencing cycles to generate at least a second plurality of sequencing read products by sequencing at least the second target barcode region (Target BC-2). In some embodiments, the nucleotide reagents comprise multivalent molecules, nucleotides and/or nucleotide analogs.
[0625] In some embodiments, the sequencing of step (f) comprises sequencing at least a portion of the first and second nucleic acid concatemers using an optical imaging system comprising a field-of-view (FOV) greater than 1.0 mm².
[0626] In some embodiments, in the sequencing of step (f), the plurality of first and second sequencing read products are detectable by imaging, and wherein the sequencing comprises decoding the plurality of first and second sequencing read products from the images obtained during the no more than 2-30 sequencing cycles.
[0627] In some embodiments, in the sequencing of step (f), the plurality of the first and second sequencing read products are detectable by imaging, and wherein the sequencing comprises simultaneously imaging the plurality of first and second detectable sequencing read products in the cellular sample (co-localization of the first and second sequencing read products).
[0628] In some embodiments, methods for detecting at least two different target RNA molecules in a cellular sample further comprise step (g): removing the plurality of first sequencing read products from the first concatemer molecules while retaining the first concatemer molecules in the cellular sample, and removing the plurality of second sequencing read products from the second concatemer molecules while retaining the second concatemer molecules in the cellular sample.
[0629] In some embodiments, methods for detecting at least two different target RNA molecules in a cellular sample further comprise step (h): reiteratively sequencing the plurality of concatemers by repeating steps (f) and (g) at least once, wherein the sequences of the plurality of first sequencing read products confirm the presence of the first target RNA molecules in the cellular sample, and wherein the sequences of the plurality of second sequencing read products confirm the presence of the second target RNA molecules in the cellular sample.
[0630] In some embodiments, reiteratively sequencing at least one region of the concatemer comprises repeating steps (f) - (g) at least 2 times, at least 3 times, at least 4 times, at least 5 times, at least 6 times, at least 7 times, at least 8 times, at least 9 times, or at least 10 times.
[0631] In some embodiments, reiteratively sequencing at least one region of the concatemer comprises repeating steps (f) - (g) up to 10 times, up to 20 times, up to 30 times, up to 40 times, or up to 50 times. Examples of reiterative sequencing are shown schematically in FIGS. 9-12.
[0632] In some embodiments (e.g., FIG. 9), the concatemer includes tandem repeat units where each unit comprises: (i) a universal sequencing primer binding site (Seq), (ii) a universal compaction oligonucleotide binding site (CO), (iii) an insert sequence that corresponds to a given target cDNA, and (iv) a target barcode sequence that corresponds to the given target cDNA (BC). In some embodiments, universal sequencing primers (solid arrows) hybridize to the universal sequencing primer binding sites and no more than 30 sequencing cycles are conducted to generate a plurality of first sequencing read products (dashed arrows), where the first sequencing read products include only the target barcode sequence. The plurality of first sequencing read products are removed from the concatemer, and the sequencing is repeated where no more than 30 sequencing cycles are conducted to generate another plurality of first sequencing read products (dashed arrows), where the first sequencing read products include only the target barcode sequence. The plurality of first sequencing read products are removed from the concatemer, and the sequencing is once again repeated where no more than 30 sequencing cycles are conducted to generate another plurality of first sequencing read products (dashed arrows), where the first sequencing read products include only the target barcode sequence. In some embodiments, the reiterative sequencing can be conducted up to 50 times. The sequences of all of the first sequencing read products can be determined and aligned with a first reference sequence (e.g., reference barcode sequence) to confirm the presence of the first target RNA molecules inside the cellular sample.
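Each round of reiterative sequencing produces another short read of the same barcode, so the rounds can be combined before comparison with the reference barcodes. The sketch below shows one simple way to do that; it is an assumption, not the method of this disclosure. It swaps the alignment step for a per-position majority vote followed by Hamming-distance matching, which is only reasonable for fixed-length barcodes without insertions or deletions. The reads, the reference barcodes, `max_mismatches`, and all function names are hypothetical.

```python
from collections import Counter


def consensus(reads: list[str]) -> str:
    """Per-position majority vote across equal-length reads from repeated rounds.

    Ties go to the base seen first (Counter.most_common keeps first-encountered
    order for equal counts), so an odd number of rounds avoids arbitrary ties.
    """
    if not reads or any(len(r) != len(reads[0]) for r in reads):
        raise ValueError("need one or more reads of equal length")
    return "".join(Counter(column).most_common(1)[0][0] for column in zip(*reads))


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def assign_target(read: str, reference_barcodes: dict[str, str], max_mismatches: int = 1):
    """Name of the closest reference barcode, or None if too distant or tied."""
    if not reference_barcodes:
        raise ValueError("no reference barcodes given")
    scored = sorted((hamming(read, bc), name) for name, bc in reference_barcodes.items())
    best_distance, best_name = scored[0]
    if best_distance > max_mismatches:
        return None
    if len(scored) > 1 and scored[1][0] == best_distance:
        return None  # two targets equally close: ambiguous
    return best_name


rounds = ["ACGTAC", "ACGTTC", "ACGTAC"]  # hypothetical reads from three rounds
references = {"target_1": "ACGTAC", "target_2": "TTGCAG"}  # hypothetical barcodes
print(consensus(rounds))                             # ACGTAC
print(assign_target(consensus(rounds), references))  # target_1
```

The majority vote suppresses an error that appears in only one round (the `T` at position 5 of the second read), which illustrates why repeating steps (f) and (g) on the same concatemer can make barcode identification more robust.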
[0633] In some embodiments (e.g., FIG. 10), the concatemer includes tandem repeat units where each unit comprises: (i) a universal sequencing primer binding site (Seq), (ii) a universal compaction oligonucleotide binding site (CO), (iii) an insert sequence that corresponds to a given target cDNA, and (iv) a target barcode sequence that corresponds to the given target cDNA (BC). In some embodiments, universal sequencing primers (solid arrows) hybridize to the universal sequencing primer binding sites and no more than 30 sequencing cycles are conducted to generate a plurality of first sequencing read products (dashed arrows), where the first sequencing read products include the target barcode sequence and a portion of the insert sequence. The plurality of first sequencing read products are removed from the concatemer, and the sequencing is repeated where no more than 30 sequencing cycles are conducted to generate another plurality of first sequencing read products (dashed arrows), where the first sequencing read products include the target barcode sequence and a portion of the insert sequence. The plurality of first sequencing read products are removed from the concatemer, and the sequencing is once again repeated where no more than 30 sequencing cycles are conducted to generate another plurality of first sequencing read products (dashed arrows), where the first sequencing read products include the target barcode sequence and a portion of the insert sequence. In some embodiments, the reiterative sequencing can be conducted up to 50 times. The sequences of all of the first sequencing read products can be determined and aligned with a first reference sequence (e.g., reference barcode sequence and the insert sequence that corresponds to the target RNA) to confirm the presence of the first target RNA molecules inside the cellular sample.
[0634] In some embodiments (e.g., FIG. 11), the concatemer includes tandem repeat units where each unit comprises: (i) a universal sequencing primer binding site (Seq), (ii) a universal compaction oligonucleotide binding site (CO), and (iii) an insert sequence that corresponds to a given target cDNA. In some embodiments, universal sequencing primers (solid arrows) hybridize to the universal sequencing primer binding sites and no more than 30 sequencing cycles are conducted to generate a plurality of first sequencing read products (dashed arrows), where the first sequencing read products include a portion of the insert sequence. The plurality of first sequencing read products are removed from the concatemer, and the sequencing is repeated where no more than 30 sequencing cycles are conducted to generate another plurality of first sequencing read products (dashed arrows), where the first sequencing read products include a portion of the insert sequence. The plurality of first sequencing read products are removed from the concatemer, and the sequencing is once again repeated where no more than 30 sequencing cycles are conducted to generate another plurality of first sequencing read products (dashed arrows), where the first sequencing read products include a portion of the insert sequence. In some embodiments, the reiterative sequencing can be conducted up to 50 times. The sequences of all of the first sequencing read products can be determined and aligned with a first reference sequence (e.g., the insert sequence that corresponds to the target RNA) to confirm the presence of the first target RNA molecules inside the cellular sample.
[0635] In some embodiments, e.g., in FIG. 12, the concatemer includes tandem repeat units where each unit comprises: (i) a universal sequencing primer binding site (Seq) and (ii) an insert sequence that corresponds to a given target cDNA. In some embodiments, universal sequencing primers (solid arrows) hybridize to the universal sequencing primer binding sites and no more than 30 sequencing cycles are conducted to generate a plurality of first sequencing read products (dashed arrows), where the first sequencing read products include a portion of the insert sequence. The plurality of first sequencing read products are removed from the concatemer, and the sequencing is repeated where no more than 30 sequencing cycles are conducted to generate another plurality of first sequencing read products (dashed arrows), where the first sequencing read products include a portion of the insert sequence. The plurality of first sequencing read products are removed from the concatemer, and the sequencing is once again repeated where no more than 30 sequencing cycles are conducted to generate another plurality of first sequencing read products (dashed arrows), where the first sequencing read products include a portion of the insert sequence. In some embodiments, the reiterative sequencing can be conducted up to 50 times. The sequences of all of the first sequencing read products can be determined and aligned with a first reference sequence (e.g., the insert sequence that corresponds to the target RNA) to confirm the presence of the first target RNA molecules inside the cellular sample.
[0636] In some embodiments, at least one concatemer is sequenced by conducting step (f) once (non-reiterative sequencing). In some embodiments, at least one concatemer is sequenced by conducting steps (f) - (g) once. In some embodiments, at least one concatemer is reiteratively sequenced by conducting steps (f) - (g) at least twice.
[0637] In some embodiments, the plurality of universal sequencing primers can be hybridized to concatemer template molecules with a hybridization reagent comprising an SSC buffer (e.g., 2X saline-sodium citrate) with formamide (e.g., 10-20% formamide). The hybridization conditions comprise a temperature of about 20-30 °C for about 10-60 minutes.
[0638] In some embodiments, the plurality of sequencing read products can be removed from the concatemers, while the plurality of concatemers is retained inside the cellular sample, using a de-hybridization reagent comprising an SSC (saline-sodium citrate) buffer, with or without formamide, at a temperature that promotes nucleic acid denaturation, such as 30-90 °C.
[0639] In some embodiments, the plurality of nucleotide reagents of step (f) comprise a plurality of nucleotides that are detectably labeled or non-labeled. In some embodiments, individual nucleotides are linked to a detectable reporter moiety. In some embodiments, the detectable reporter moiety comprises a fluorophore. In some embodiments, the plurality of detectably labeled nucleotide analogs comprise a plurality of chain terminating nucleotides, where the chain terminating moiety is linked to the 3’ nucleotide sugar position to form a 3’ blocked nucleotide analog. In some embodiments, the chain terminating moiety can be removed to convert the 3’ blocked nucleotide analog to an extendible nucleotide having a 3’ OH group on the sugar. In some embodiments, the labeled nucleotide analogs are linked to a different fluorophore that corresponds to the nucleo-bases adenine, cytosine, guanine, thymine or uracil, where the different fluorophores emit a fluorescent signal during the sequencing of step (f). In some embodiments, a sequencing cycle comprises (1) contacting the concatemer/sequencing primer duplex with a sequencing polymerase and a detectably labeled chain terminating nucleotide under a condition suitable for polymerase-catalyzed incorporation of the detectably labeled chain terminating nucleotide into the terminal end of the sequencing primer, (2) detecting and imaging the fluorescent signal and color emitted by the incorporated chain terminating nucleotide, and (3) removing the chain terminating moiety (e.g., unblocking) and the fluorophore from the incorporated nucleotide and retaining the concatemer/sequencing primer duplex. In some embodiments, no more than 2-30 sequencing cycles are conducted on the plurality of concatemers inside the cellular sample to generate a plurality of sequencing read products. 
In some embodiments, the sequence of the first sequencing read product can be determined and aligned with a first reference sequence to confirm the presence of the first target RNA molecules inside the cellular sample. In some embodiments, the sequence of the second sequencing read product can be determined and aligned with a second reference sequence to confirm the presence of the second target RNA molecules inside the cellular sample.
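In the labeled chain-terminator cycle above, the color imaged in step (2) identifies the base incorporated in that cycle, so a read is assembled one base per cycle. The sketch below is a minimal base caller that picks the brightest channel in each cycle and reports a simple purity score. It assumes, hypothetically, one dye channel per base in the order A, C, G, T, non-negative background-subtracted intensities, and no spectral crosstalk or phasing correction. `CHANNEL_BASES`, `call_bases`, `purity`, and the intensity values are all illustrative.

```python
CHANNEL_BASES = "ACGT"  # hypothetical: one dye channel per base, in this order


def call_bases(cycle_signals: list[list[float]]) -> str:
    """Call one base per cycle as the brightest channel (no crosstalk correction)."""
    read = []
    for signal in cycle_signals:
        if len(signal) != len(CHANNEL_BASES):
            raise ValueError("expected one intensity per channel")
        brightest = max(range(len(signal)), key=signal.__getitem__)
        read.append(CHANNEL_BASES[brightest])
    return "".join(read)


def purity(signal: list[float]) -> float:
    """Brightest / (brightest + second brightest); 0.5 means two channels tie.

    Assumes non-negative intensities; returns 0.0 when no channel has signal.
    """
    top, second = sorted(signal, reverse=True)[:2]
    total = top + second
    return top / total if total > 0 else 0.0


# Hypothetical background-subtracted intensities for one spot over four cycles.
spot = [
    [900, 50, 40, 30],
    [60, 820, 70, 20],
    [30, 40, 35, 760],
    [45, 55, 880, 25],
]
print(call_bases(spot))  # ACTG
print([round(purity(s), 3) for s in spot])
```

The resulting read would then be compared with the first or second reference sequence, as the paragraph above describes; a low purity score in any cycle flags a base call that deserves less weight in that comparison.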
[0640] In some embodiments, the sequences of the first and second sequencing read products can be aligned after each round of generating the first and second sequencing read products, which are no more than 30 bases in length, or after generating a set of reiterative sequencing read products in which the first and second sequencing read products are no more than 30 bases in length. In some embodiments, the sequencing reactions are conducted on a sequencing apparatus having a detector that captures fluorescent signals from the sequencing reactions inside the cellular sample. The sequencing apparatus can be configured to relay the fluorescent signal data captured by the detector to a computer system that is programmed to display images of different fluorescent spots which are co-located in the cellular sample, where individual fluorescent spots correspond to different target RNA molecules. In some embodiments, when the sequencing is conducted using different fluorescently-labeled nucleotide reagents that correspond to different nucleo-bases (e.g., A, G, C, T/U), the images can show fluorescent spots of different colors co-located in the same cellular sample at different sequencing cycles.
[0641] In some embodiments, out-of-sync phasing and/or pre-phasing events can occur during synchronized sequencing reactions on clonally amplified template amplicons, where the sequencing reactions comprise polymerase-catalyzed sequencing reactions employing detectably labeled chain terminator nucleotides. In some embodiments, the sequencing reaction on one template molecule among the clonally amplified template molecules moves ahead of (i.e., pre-phasing) or falls behind (i.e., phasing) the sequencing of the other template molecules. During sequencing, a fluorescent signal is typically detected which corresponds to incorporation of a labeled chain terminator nucleotide. Thus, phasing and pre-phasing events can be detected and monitored using incorporation of a labeled chain terminator nucleotide.
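One way to see how phasing and pre-phasing erode a synchronized signal is a common simplified model, assumed here rather than taken from this disclosure: in every cycle each strand independently lags (adds no base) with probability `phasing`, jumps ahead (adds two bases) with probability `prephasing`, and otherwise adds exactly one base. The sketch tracks the resulting distribution of strand positions within one clonal population; the rate values and function names are hypothetical.

```python
def position_distribution(n_cycles: int, phasing: float, prephasing: float) -> dict[int, float]:
    """Fraction of strands in one clonal population that have added k bases.

    Per cycle, each strand lags (adds 0 bases) with probability `phasing`,
    jumps ahead (adds 2 bases) with probability `prephasing`, and otherwise
    adds exactly 1 base. Template length is treated as unlimited.
    """
    if phasing < 0 or prephasing < 0 or phasing + prephasing > 1:
        raise ValueError("rates must be non-negative and sum to at most 1")
    steps = ((0, phasing), (1, 1 - phasing - prephasing), (2, prephasing))
    dist = {0: 1.0}
    for _ in range(n_cycles):
        nxt: dict[int, float] = {}
        for k, frac in dist.items():
            for step, prob in steps:
                if prob > 0:
                    nxt[k + step] = nxt.get(k + step, 0.0) + frac * prob
        dist = nxt
    return dist


def in_phase_fraction(n_cycles: int, phasing: float, prephasing: float) -> float:
    """Fraction of strands at exactly position n_cycles.

    This includes strands whose lags and leads happened to cancel out.
    """
    return position_distribution(n_cycles, phasing, prephasing).get(n_cycles, 0.0)


for n in (10, 30, 150):
    print(n, round(in_phase_fraction(n, phasing=0.002, prephasing=0.001), 3))
```

Under this model the in-phase fraction shrinks roughly geometrically with cycle number, while the out-of-phase strands contribute signal from neighboring positions, so phasing and pre-phasing matter more as the number of cycles grows.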
[0642] In some embodiments, the plurality of nucleotide reagents of step (f) comprises a plurality of multivalent molecules, each comprising a core attached to a plurality of nucleotide arms, wherein each nucleotide arm is attached to a nucleotide unit. In some embodiments, individual multivalent molecules are labeled with a detectable reporter moiety. In some embodiments, the detectable reporter moiety comprises a fluorophore. In some embodiments, the core of the multivalent molecule is labeled with a fluorophore, and the fluorophore attached to a given core of the multivalent molecule corresponds to the nucleotide base (e.g., adenine, guanine, cytosine, thymine or uracil) of the nucleotide arms. In some embodiments, at least one of the nucleotide arms of the multivalent molecule comprises a linker and/or nucleotide base that is attached to a fluorophore, and the fluorophore attached to a given nucleotide base corresponds to the nucleotide base (e.g., adenine, guanine, cytosine, thymine or uracil) of the nucleotide arm.
In some embodiments, a sequencing cycle comprises (1) contacting the concatemer/sequencing primer duplex with a first sequencing polymerase to form a complexed polymerase, (2) contacting the complexed polymerase with a detectably labeled multivalent molecule under a condition suitable for binding a complementary nucleotide unit of the multivalent molecule to the complexed polymerase thereby forming a multivalent-binding complex, and the condition is suitable for inhibiting incorporation of the complementary nucleotide unit into the terminal end of the sequencing primer, (3) detecting and imaging the fluorescent signal and color emitted by the bound detectably labeled multivalent molecule, (4) removing the first sequencing polymerase and the bound detectably labeled multivalent molecule, and retaining the concatemer/sequencing primer duplex, (5) contacting the retained concatemer/sequencing primer duplex with a second sequencing polymerase and a non-labeled chain terminating nucleotide under a condition suitable for polymerase-catalyzed incorporation of the non-labeled chain terminating nucleotide into the terminal end of the sequencing primer, and (6) removing the chain terminating moiety (e.g., unblocking) and retaining the concatemer/sequencing primer duplex. In some embodiments, no more than 2-30 sequencing cycles are conducted on the plurality of concatemers inside the cellular sample to generate a plurality of sequencing read products. In some embodiments, the sequence of the first sequencing read product can be determined and aligned with a first reference sequence to confirm the presence of the first target RNA molecules inside the cellular sample. In some embodiments, the sequence of the second sequencing read product can be determined and aligned with a second reference sequence to confirm the presence of the second target RNA molecules inside the cellular sample. 
In some embodiments, the sequences of the first and second sequencing read products can be aligned after each round of generating first and second sequencing read products that are no more than 30 bases in length, or after generating a set of reiterative sequencing read products in which the first and second sequencing read products are no more than 30 bases in length. In some embodiments, the sequencing reactions are conducted on a sequencing apparatus having a detector that captures fluorescent signals from the sequencing reactions inside the cellular sample. The sequencing apparatus can be configured to relay the fluorescent signal data captured by the detector to a computer system that is programmed to display images of different fluorescent spots which are co-located in the cellular sample, where individual fluorescent spots correspond to different target RNA molecules. In some embodiments, individual cycle times of less than 30 minutes can be achieved. In some embodiments, the field of view (FOV) can exceed 1 mm² and the cycle time for scanning a large area (> 10 mm²) can be less than 5 minutes.
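The six-step cycle described above separates imaging (during binding) from primer extension (during incorporation of a non-labeled terminator). The toy model below traces that ordering on a single concatemer/primer duplex. It assumes perfect binding specificity and a hypothetical one-dye-per-nucleotide assignment; the names `run_two_stage_cycles`, `DYE_FOR_UNIT`, and the dye colors are illustrative only.

```python
# Watson-Crick pairing: which nucleotide unit binds opposite each template base.
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}
# Hypothetical dye assignment: the color of a multivalent molecule encodes its nucleotide unit.
DYE_FOR_UNIT = {"A": "red", "C": "blue", "G": "green", "T": "yellow"}
UNIT_FOR_DYE = {dye: unit for unit, dye in DYE_FOR_UNIT.items()}


def run_two_stage_cycles(template: str, n_cycles: int) -> str:
    """Toy model of the six-step cycle on one concatemer/sequencing primer duplex.

    `template` lists template bases in the order the primer will copy them.
    Returns the bases decoded from the colors imaged in step (3).
    """
    primer_extensions = 0
    read = []
    for _ in range(min(n_cycles, len(template))):
        # Steps (1)-(2): the complexed polymerase binds the multivalent molecule whose
        # nucleotide unit pairs with the next template base; incorporation is inhibited,
        # so the primer is not extended during this binding step.
        bound_unit = COMPLEMENT[template[primer_extensions]]
        # Step (3): image the color of the bound multivalent molecule and decode it.
        read.append(UNIT_FOR_DYE[DYE_FOR_UNIT[bound_unit]])
        # Step (4): remove the first polymerase and the multivalent molecule (no state to track).
        # Step (5): a second polymerase adds one non-labeled chain-terminating nucleotide.
        primer_extensions += 1
        # Step (6): unblocking restores an extendible 3' end for the next cycle.
    return "".join(read)
```

For example, `run_two_stage_cycles("TACG", 4)` returns `"ATGC"`. The point of the sketch is the ordering: the base is read from the bound, non-incorporated multivalent molecule before the primer advances by one position.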
[0643] In some embodiments, when sequencing with detectably labeled multivalent molecules, the conditions of step (2), in which multivalent-binding complexes are formed, and step (3), in which the bound detectably labeled multivalent molecules are imaged and detected, are gentle compared to sequencing workflows that employ detectably labeled chain terminating nucleotides. For example, steps (2) and (3) can be conducted at a gentle temperature of about 35-45 °C, or about 39-42 °C. Conducting steps (2) and (3) at a gentle temperature can help retain the compact size and shape of a DNA nanoball during multiple sequencing cycles (e.g., up to 30 cycles), which can reduce (i.e., improve) the FWHM (full width at half maximum) of a spot image of the DNA nanoball inside a cellular sample. In some embodiments, the DNA nanoball does not unravel during multiple sequencing cycles. In some embodiments, the spot image of the DNA nanoball does not enlarge during multiple sequencing cycles. In some embodiments, the spot image of the DNA nanoball remains a discrete spot during multiple sequencing cycles. The spot image can be represented as a Gaussian spot and its size can be measured as a FWHM. A smaller spot size, as indicated by a smaller FWHM, typically correlates with an improved image of the spot. In some embodiments, the FWHM of a nanoball spot can be about 10 µm or smaller.
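Because the spot image is modeled as a Gaussian, its FWHM is related to the Gaussian width σ by FWHM = 2√(2 ln 2)·σ ≈ 2.3548σ. The sketch below shows that conversion and a simple way to estimate FWHM directly from a sampled, background-subtracted 1-D intensity profile through a spot by interpolating the half-maximum crossings. The function names are hypothetical, and this is one common measurement approach, not necessarily the one used in this disclosure.

```python
import math

# For a Gaussian profile, FWHM = 2 * sqrt(2 * ln 2) * sigma (about 2.3548 * sigma).
FWHM_PER_SIGMA = 2.0 * math.sqrt(2.0 * math.log(2.0))


def fwhm_from_sigma(sigma: float) -> float:
    return FWHM_PER_SIGMA * sigma


def fwhm_from_profile(xs: list[float], ys: list[float]) -> float:
    """Estimate the FWHM of a single-peaked, background-subtracted 1-D profile."""
    peak = max(range(len(ys)), key=ys.__getitem__)
    half = ys[peak] / 2.0

    def half_max_crossing(indices) -> float:
        # Walk outward from the peak and linearly interpolate the position
        # where the profile first drops below half of its maximum.
        prev = peak
        for i in indices:
            if ys[i] < half:
                x0, y0, x1, y1 = xs[prev], ys[prev], xs[i], ys[i]
                return x0 + (half - y0) * (x1 - x0) / (y1 - y0)
            prev = i
        raise ValueError("profile does not fall below half maximum on this side")

    left = half_max_crossing(range(peak - 1, -1, -1))
    right = half_max_crossing(range(peak + 1, len(ys)))
    return right - left
```

For instance, a spot with σ ≈ 4 µm corresponds to a FWHM of about 9.4 µm, just under the 10 µm figure given above.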
[0644] In some embodiments, out-of-sync phasing and/or pre-phasing events can occur during synchronized polymerase-catalyzed sequencing reactions employing detectably labeled multivalent molecules. During sequencing, a fluorescent signal can be detected which corresponds to binding of a complementary nucleotide unit of a multivalent molecule to the complexed polymerase, thereby forming a multivalent-binding complex. Thus, phasing and pre-phasing events can be detected and monitored using binding of labeled multivalent molecules. In some embodiments, when conducting up to 30 sequencing cycles with detectably labeled multivalent molecules, the phasing and/or pre-phasing rate can be less than about 5%, or less than about 1%, or less than about 0.01%, or less than about 0.001%. By contrast, the phasing and/or pre-phasing rates for conducting up to 30 sequencing cycles using labeled chain terminator nucleotides can be about 5%.
Methods for Conducting in situ RNA Batch Sequencing
[0645] The present disclosure provides methods for conducting in situ multiplex and multi-omics detection and identification using coded padlock probes. The padlock probes are designed to selectively detect target RNA.
[0646] The RNA-specific padlock probes selectively hybridize to cDNA that corresponds to target RNA. The RNA-specific probes carry barcodes that uniquely identify the cDNA. In some embodiments, the RNA-specific padlock probes also carry batch-specific sequencing primer binding sites.
[0647] Both types of padlock probes are used to generate concatemers having multiple copies of the batch-specific sequencing primer binding sites and barcodes. The concatemers can collapse into DNA nanoballs having a compact shape and size that produce increased signal intensity and color differentiation during sequencing.
[0648] For in situ sequencing, the limit of optical resolution impedes the ability to perform highly multiplexed sequencing. The batch-specific sequencing primer binding sites on the padlock probes enable sequencing of a desired subset (e.g., a batch) of the concatemers using selected batch-specific sequencing primers, which reduces signal overcrowding in the images. The use of batch-specific sequencing primers produces optical images that are intense and resolvable. Conducting multiple rounds of sequencing on the same cellular sample using different batch-specific sequencing primers enables multiplex sequencing that reveals numerous target RNAs.
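Why batching helps with optical crowding can be sketched with a standard spatial-statistics approximation. Assuming, purely for illustration, that spots are scattered as a homogeneous 2-D Poisson process with density λ, the fraction of spots with no other imaged spot within a resolution distance d is exp(−λπd²). If targets are split evenly across k batches, each round images density λ/k. The spot density and resolution values below are hypothetical.

```python
import math


def resolvable_fraction(spots_per_um2: float, resolution_um: float, n_batches: int = 1) -> float:
    """Fraction of imaged spots with no other imaged spot within `resolution_um`.

    Assumes a homogeneous 2-D Poisson distribution of spots and an even split of
    targets across `n_batches` sequencing rounds (density / n_batches per round).
    """
    if n_batches < 1:
        raise ValueError("n_batches must be >= 1")
    density = spots_per_um2 / n_batches
    return math.exp(-density * math.pi * resolution_um ** 2)


if __name__ == "__main__":
    for k in (1, 2, 4, 8):
        print(k, round(resolvable_fraction(0.5, 0.5, k), 3))
```

Under these assumed values, imaging all targets at once leaves roughly 0.68 of spots isolated, while splitting them across four batches raises that to roughly 0.91 per round.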
[0649] The batch-specific sequencing methods described herein have many uses. For example, the number of spots that are imaged and associated with sequencing can be counted. The counted spots can be used as a measure of RNA levels in a cellular sample.
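Counting spots per target amounts to tallying decoded barcodes against a codebook. The sketch below assumes a hypothetical codebook mapping barcode sequences to target names and tolerates up to one mismatch (Hamming distance). Spots that match no entry, or that match more than one, are left unassigned. The codebook, target names, and mismatch tolerance are illustrative assumptions.

```python
from collections import Counter


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))


def count_spots(decoded_barcodes, codebook, max_mismatches=1):
    """Tally spots per target; return (Counter of target -> spots, unassigned count)."""
    counts = Counter()
    unassigned = 0
    for bc in decoded_barcodes:
        hits = [target for code, target in codebook.items()
                if len(code) == len(bc) and hamming(code, bc) <= max_mismatches]
        if len(hits) == 1:
            counts[hits[0]] += 1
        else:
            unassigned += 1  # no codebook match, or ambiguous between targets
    return counts, unassigned


if __name__ == "__main__":
    codebook = {"ACGT": "TARGET_1", "TTGA": "TARGET_2"}  # hypothetical codebook
    print(count_spots(["ACGT", "ACGA", "TTGA", "GGGG"], codebook))
```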
[0650] The present disclosure provides methods for detecting in situ at least two different target RNA molecules, comprising step (a): providing a cellular sample deposited on a solid support, wherein the cellular sample harbors (i) a first plurality of DNA amplicons (e.g., first concatemers) that correspond to a first target cDNA or RNA molecule, and (ii) a second plurality of DNA amplicons (e.g., second concatemers) that correspond to a second target cDNA or RNA molecule.
[0651] In some embodiments, the method further comprises step (b): sequencing the first plurality of DNA amplicons inside the cellular sample under a condition that inhibits sequencing the second plurality of DNA amplicons, wherein sequencing the first plurality of DNA amplicons inside the cellular sample comprises generating a plurality of first sequencing read products, wherein the sequences of the first sequencing read products are aligned with a first target reference sequence to confirm the presence of the first target RNA in the cellular sample. In some embodiments, the first amplicons can be reiteratively sequenced by conducting no more than 2-30 sequencing cycles, or can be reiteratively sequenced by conducting 1-250 sequencing cycles.
[0652] In some embodiments, the method further comprises step (c): sequencing the second plurality of DNA amplicons inside the cellular sample under a condition that inhibits sequencing the first plurality of DNA amplicons, wherein sequencing the second plurality of DNA amplicons inside the cellular sample comprises generating a plurality of second sequencing read products, wherein the sequences of the second sequencing read products are aligned with a second target reference sequence to confirm the presence of the second target RNA in the cellular sample. In some embodiments, the second amplicons can be reiteratively sequenced by conducting no more than 2-30 sequencing cycles, or can be reiteratively sequenced by conducting 1-250 sequencing cycles.
[0653] The present disclosure provides methods for detecting in situ at least two different target RNA molecules, comprising step (a): providing a cellular sample deposited on a solid support, wherein the cellular sample harbors a first plurality of target RNA and a second plurality of target RNA. In some embodiments, the first plurality of target RNA encode a first polypeptide. In some embodiments, the second plurality of target RNA encode a second polypeptide. In some embodiments, the cellular sample is fixed and permeabilized.
[0654] In some embodiments, the cellular sample harbors 2-25 different target RNA molecules, or harbors 25-50 different target RNA molecules, or harbors 50-75 different target RNA molecules, or harbors 75-100 different target RNA molecules. In some embodiments, the cellular sample harbors more than 100 different target RNA molecules, or more than 250 different target RNA molecules, or more than 500 different target RNA molecules, or more than 1000 different target RNA molecules, or more. In some embodiments, the cellular sample harbors more than 10,000 different target RNA molecules. In some embodiments, the cellular sample comprises a whole cell, a plurality of whole cells, an intact tissue or an intact tumor. In some embodiments, the cellular sample comprises a fresh cellular sample, a freshly-frozen cellular sample, a sectioned cellular sample, or an FFPE cellular sample. In some embodiments, the cellular sample is deposited onto a solid support. In some embodiments, the cellular sample is deposited onto a solid support which is passivated with a coating that promotes cell adhesion. In some embodiments, the cellular sample is deposited on a support that lacks immobilized capture oligonucleotides. In some embodiments, the cellular sample is cultured prior to conducting step (b) which is described below.
[0655] In some embodiments, the cellular sample harbors 2-25 different target polypeptide molecules, or harbors 25-50 different target polypeptide molecules, or harbors 50-75 different target polypeptide molecules, or harbors 75-100 different target polypeptide molecules. In some embodiments, the cellular sample harbors more than 100 different target polypeptide molecules, or more than 250 different target polypeptide molecules, or more than 500 different target polypeptide molecules, or more than 1000 different target polypeptide molecules, or more. In some embodiments, the cellular sample harbors more than 10,000 different target polypeptide molecules. The target polypeptide molecules are encoded by the target RNA molecules.
[0656] In some embodiments, the methods comprise step (b): generating inside the cellular sample a plurality of cDNA by (i) generating at least a first plurality of target cDNA from the first plurality of target RNA, and (ii) generating at least a second plurality of target cDNA from the second plurality of target RNA (e.g., FIG. 13). In some embodiments, the first target cDNAs correspond to the first target RNA molecules. In some embodiments, the second target cDNAs correspond to the second target RNA molecules. In some embodiments, the method comprises generating at least 2-10,000 different target cDNA molecules that correspond to 2-10,000 different target RNA molecules. In some embodiments, the generating of step (b) comprises contacting the plurality of RNA inside the cellular sample with (i) a plurality of reverse transcription primers, (ii) a plurality of reverse transcriptase enzymes, and (iii) a plurality of nucleotides, under a condition suitable for conducting a reverse transcription reaction to generate a plurality of cDNA molecules (e.g., a plurality of first strand cDNA molecules) in the cellular sample. In some embodiments, the plurality of reverse transcription primers comprises a first sub-population of target-specific reverse transcription primers that hybridize selectively to the first target RNA, and/or comprises a second sub-population of target-specific reverse transcription primers that hybridize selectively to the second target RNA. In some embodiments, the plurality of reverse transcription primers comprises a first sub-population of random-sequence reverse transcription primers that hybridize to the first target RNA, and/or comprises a second sub-population of random-sequence reverse transcription primers that hybridize to the second target RNA.
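The strand relationship produced by reverse transcription can be sketched in a few lines. First-strand cDNA is the reverse complement of the RNA template (with U pairing to A), and a target-specific reverse transcription primer that anneals to the RNA therefore appears verbatim within the first-strand cDNA sequence. The helper names are hypothetical, and the check is idealized (perfect complementarity, no mismatches).

```python
# Pairing from RNA template base to DNA copy: A->T, U->A, G->C, C->G.
RNA_TO_DNA_COMPLEMENT = str.maketrans("AUGC", "TACG")


def first_strand_cdna(rna: str) -> str:
    """Return the first-strand cDNA (written 5'->3') for an RNA written 5'->3'."""
    rna = rna.upper()
    if not set(rna) <= set("ACGU"):
        raise ValueError("RNA may contain only A, C, G, U")
    # Complement each base, then reverse so the cDNA is also written 5'->3'.
    return rna.translate(RNA_TO_DNA_COMPLEMENT)[::-1]


def primer_can_prime(rna: str, rt_primer: str) -> bool:
    """True if a DNA RT primer is fully complementary to some stretch of the RNA.

    A primer that anneals to the RNA becomes part of the first-strand cDNA,
    so a substring test suffices for this idealized check.
    """
    return rt_primer.upper() in first_strand_cdna(rna)
```

For example, `first_strand_cdna("AUGGCU")` returns `"AGCCAT"`, and a primer `"GCCA"` would be predicted to anneal to that RNA.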
[0657] In some embodiments, e.g., in FIG. 13, the first padlock probe comprises (i) a first target barcode sequence (target BC-1) that uniquely identifies the first target RNA, (ii) a first batch-specific sequencing primer binding site (Batch Seq-1) (or a complementary sequence thereof), (iii) a universal binding site for an amplification primer (universal RCA) (or a complementary sequence thereof), and (iv) a universal binding site for a compaction oligonucleotide (or a complementary sequence thereof). The second padlock probe comprises (i) a second target barcode sequence (target BC-2) that uniquely identifies the second target RNA, (ii) a second batch-specific sequencing primer binding site (Batch Seq-2) (or a complementary sequence thereof), (iii) a universal binding site for an amplification primer (universal RCA) (or a complementary sequence thereof), and (iv) a universal binding site for a compaction oligonucleotide (or a complementary sequence thereof).
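The probe elements listed above can be captured as a simple data structure. In the sketch below, the order of elements along the probe is an assumption (the text lists the elements but does not fix their order), and the class and function names are hypothetical. The check encodes the design logic: target barcodes are unique per probe, while the universal RCA and compaction sites are shared by all probes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PadlockProbe:
    """Hypothetical 5'->3' layout of a coded padlock probe (element order assumed)."""
    arm_5p: str              # hybridizes to one side of the target cDNA
    target_barcode: str      # target BC-n, unique per target RNA
    batch_seq_site: str      # Batch Seq-n, selects the sequencing batch
    universal_rca_site: str  # shared amplification-primer binding site
    compaction_site: str     # shared compaction-oligonucleotide binding site
    arm_3p: str              # hybridizes adjacent to arm_5p on the same cDNA

    def sequence(self) -> str:
        return (self.arm_5p + self.target_barcode + self.batch_seq_site
                + self.universal_rca_site + self.compaction_site + self.arm_3p)


def check_probe_set(probes: list[PadlockProbe]) -> bool:
    """True if target barcodes are unique and the universal sites are shared."""
    barcodes = [p.target_barcode for p in probes]
    return (len(set(barcodes)) == len(barcodes)
            and len({p.universal_rca_site for p in probes}) == 1
            and len({p.compaction_site for p in probes}) == 1)
```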
[0658] In some embodiments, the methods comprise step (c): generating inside the cellular sample a plurality of DNA concatemers which correspond to the first and second plurality of target RNA molecules, comprising: (1) contacting the first plurality of target cDNA with a first plurality of padlock probes under a condition suitable for hybridizing the first and second binding arms of the first padlock probes to proximal positions on their respective first target cDNA molecules, to form a first plurality of circularized padlock probes each having a nick or gap between the hybridized first and second binding arms, wherein the first padlock probes include (i) a first target barcode sequence (target BC-1) that uniquely identifies the first target RNA or cDNA, (ii) a first batch-specific sequencing primer binding site (Batch Seq-1) (or a complementary sequence thereof), and (iii) a universal binding site for an amplification primer (universal RCA) (or a complementary sequence thereof) (e.g., FIG. 13, left side); (2) enzymatically closing the nick or gap in the first plurality of circularized padlock probes to form a first plurality of covalently closed circular padlock probes; and (3) conducting rolling circle amplification inside the cellular sample using the first covalently closed circular padlock probes as template molecules, thereby generating a first plurality of concatemer molecules that correspond to the first plurality of target RNA or cDNA molecules. In some embodiments, the rolling circle amplification reaction can be conducted in the presence or absence of a plurality of compaction oligonucleotides. In some embodiments, the method comprises contacting the plurality of cDNA molecules in the cellular sample with at least 2-10,000 different target-specific padlock probes. 
In some embodiments, the first padlock probe further comprises a universal compaction oligonucleotide binding site (or a complementary sequence thereof). In some embodiments, closing the nick in the first circularized padlock probes comprises conducting an enzymatic ligation reaction. In some embodiments, closing the gap in the first circularized padlock probes comprises conducting a polymerase-catalyzed fill-in reaction using the first target cDNA molecule as a template, and conducting an enzymatic ligation reaction. In some embodiments, the method comprises closing the nick or gap in at least 2-10,000 circularized target-specific padlock probes by conducting an enzymatic reaction, thereby generating at least 2-10,000 covalently closed circular padlock probes inside the cellular sample. In some embodiments, each concatemer molecule in the first plurality comprises tandem repeat units, wherein a unit comprises the sequence of the first target cDNA and (i) the first target barcode sequence (target BC-1) that uniquely identifies the first target RNA, (ii) the first batch-specific sequencing primer binding site (Batch Seq-1) (or a complementary sequence thereof), and (iii) the universal binding site for an amplification primer (universal RCA) (or a complementary sequence thereof). In some embodiments, the unit further comprises the universal compaction oligonucleotide binding site (or a complementary sequence thereof).
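Rolling circle amplification copies a covalently closed circle repeatedly, so the product is a tandem array of units complementary to the circle. This is why each concatemer carries many copies of the batch-specific primer binding site in complementary form. The toy sketch below ignores where priming starts (which only rotates the repeat unit) and counts the positions where a primer with the probe's own Batch Seq sequence could hybridize. All sequences and function names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def rca_product(circle: str, n_units: int) -> str:
    """Toy RCA product (5'->3'): n tandem copies of the circle's reverse complement.

    The priming position only rotates the repeat unit, so it is ignored here.
    """
    return reverse_complement(circle) * n_units


def count_primer_sites(concatemer: str, primer: str) -> int:
    """Count positions where `primer` can hybridize, i.e. where its reverse complement occurs."""
    site = reverse_complement(primer)
    count, start = 0, concatemer.find(site)
    while start != -1:
        count += 1
        start = concatemer.find(site, start + 1)
    return count
```

For example, a circle containing a hypothetical Batch Seq element `"GATTACA"` and amplified for three units yields three sites to which a `"GATTACA"` sequencing primer can anneal.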
[0659] In some embodiments, step (c) further comprises: generating inside the cellular sample a plurality of DNA concatemers which correspond to the second plurality of target RNA molecules, comprising: (1) contacting the second plurality of target cDNA with a second plurality of padlock probes under a condition suitable for hybridizing the first and second binding arms of the second padlock probes to proximal positions on their respective second target cDNA molecules, to form a second plurality of circularized padlock probes each having a nick or gap between the hybridized first and second binding arms, wherein the second padlock probes include (i) a second target barcode sequence (target BC-2) that uniquely identifies the second target cDNA or RNA, (ii) a second batch-specific sequencing primer binding site (Batch Seq-2) (or a complementary sequence thereof), wherein the sequence of the second batch-specific sequencing primer binding site differs from the sequence of the first batch-specific sequencing primer binding site, and (iii) the universal binding site for an amplification primer (universal RCA) (or a complementary sequence thereof) (e.g., FIG. 13, right side); (2) enzymatically closing the nick or gap in the second plurality of circularized padlock probes to form a second plurality of covalently closed circular padlock probes; and (3) conducting rolling circle amplification inside the cellular sample using the second covalently closed circular padlock probes as template molecules, thereby generating a second plurality of concatemer molecules that correspond to the second plurality of target RNA molecules. In some embodiments, the rolling circle amplification reaction can be conducted in the presence or absence of a plurality of compaction oligonucleotides. 
In some embodiments, the method comprises contacting the plurality of cDNA molecules in the cellular sample with at least 2-10,000 different target-specific padlock probes. In some embodiments, the second padlock probe further comprises a universal compaction oligonucleotide binding site (or a complementary sequence thereof). In some embodiments, closing the nick in the second circularized padlock probes comprises conducting an enzymatic ligation reaction. In some embodiments, closing the gap in the second circularized padlock probes comprises conducting a polymerase-catalyzed fill-in reaction using the second target cDNA molecule as a template, and conducting an enzymatic ligation reaction. In some embodiments, the method comprises closing the nick or gap in at least 2-10,000 circularized target-specific padlock probes by conducting an enzymatic reaction, thereby generating at least 2-10,000 covalently closed circular padlock probes inside the cellular sample. In some embodiments, each concatemer molecule in the second plurality comprises tandem repeat units, wherein a unit comprises the sequence of the second target cDNA and (i) the second target barcode sequence (target BC-2) that uniquely identifies the second target cDNA or RNA, (ii) the second batch-specific sequencing primer binding site (Batch Seq-2) (or a complementary sequence thereof), and (iii) the universal binding site for an amplification primer (universal RCA) (or a complementary sequence thereof). In some embodiments, the unit further comprises the universal compaction oligonucleotide binding site (or a complementary sequence thereof).
[0660] In some embodiments, the methods further comprise step (d): sequencing the first plurality of concatemer molecules inside the cellular sample under a condition that inhibits sequencing the second plurality of concatemers (e.g., FIG. 14). In some embodiments, in step (d), sequencing the first plurality of concatemers inside the cellular sample comprises conducting no more than 2-30 sequencing cycles to generate a plurality of first sequencing read products, wherein the sequences of the first sequencing read products are aligned with a first target reference sequence to confirm the presence of the first target RNA in the cellular sample. In some embodiments, in step (d), sequencing the first plurality of concatemers inside the cellular sample comprises conducting 1-250 sequencing cycles to generate a plurality of first sequencing read products, wherein the sequences of the first sequencing read products are aligned with a first target reference sequence to confirm the presence of the first target RNA in the cellular sample.
[0661] In some embodiments, e.g., in FIG. 14, the first and second concatemers are subjected to a first sequencing workflow using first batch-specific sequencing primers, sequencing polymerases, and a plurality of nucleotide reagents. The first concatemers undergo reiterative sequencing but the second concatemers do not. The first and second concatemers are subjected to a second sequencing workflow using second batch-specific sequencing primers, sequencing polymerases, and a plurality of nucleotide reagents. The second concatemers undergo reiterative sequencing but the first concatemers do not.
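The batch-selective workflow can be modeled as filtering concatemers by the binding site that matches the batch-specific primer used in each round. The idealized sketch below assumes perfect primer specificity and reads the barcode directly for the number of cycles run; the `Concatemer` record and the site labels `"SEQ1"`/`"SEQ2"` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Concatemer:
    spot_id: int
    batch_site: str  # batch-specific sequencing primer binding site carried by this concatemer
    barcode: str     # target barcode read out by sequencing


def sequence_round(concatemers, batch_primer_site: str, n_cycles: int) -> dict[int, str]:
    """Idealized round: only concatemers whose site matches the primer yield a read."""
    return {c.spot_id: c.barcode[:n_cycles]
            for c in concatemers if c.batch_site == batch_primer_site}


if __name__ == "__main__":
    spots = [Concatemer(0, "SEQ1", "ACGTACGT"), Concatemer(1, "SEQ2", "TTGGCCAA")]
    for primer_site in ("SEQ1", "SEQ2"):  # first workflow, then second workflow
        print(primer_site, sequence_round(spots, primer_site, n_cycles=8))
```

Each round leaves the other batch untouched, so the same sample can be reinterrogated batch by batch.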
[0662] In some embodiments of step (d), only the first target barcode region (target BC-1) of the first concatemer molecules is sequenced. In some embodiments, in the first concatemer molecules, at least a portion or the full length of the first target barcode (target BC-1) is sequenced. In some embodiments, in the first concatemer molecules, the first target barcode (target BC-1) is sequenced and a portion of the first cDNA region is sequenced.
[0663] In some embodiments, sequencing the first concatemers in step (d) comprises step (1) contacting the first plurality of concatemer molecules inside the cellular sample with (i) a plurality of first batch-specific sequencing primers, (ii) a plurality of sequencing polymerases, and (iii) a plurality of nucleotide reagents, under a condition suitable for hybridizing the plurality of first batch-specific sequencing primers to their respective first batch-specific sequencing primer binding sites on the first concatemers. In some embodiments, the sequencing further comprises step (2) conducting no more than 2-30 sequencing cycles to generate a first plurality of sequencing read products using the first concatemers as template molecules.
[0664] In some embodiments, the sequencing of step (d) comprises sequencing at least a portion of the first nucleic acid concatemers using an optical imaging system comprising a field-of-view (FOV) greater than 1.0 mm².
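The FOV and scan-time figures given in this disclosure (an FOV exceeding 1 mm², and a large area of more than 10 mm² scanned in under 5 minutes) imply a per-tile time budget. The arithmetic sketch below assumes square tiles with a hypothetical fractional overlap along each axis and ignores edge effects, so the tile count is a lower bound and the per-tile time is an upper bound that must also cover stage motion.

```python
import math


def per_tile_time_budget(area_mm2: float, fov_mm2: float, cycle_time_s: float,
                         overlap: float = 0.0) -> tuple[int, float]:
    """Return (minimum number of tiles, seconds available per tile) for one scan.

    `overlap` is an assumed fractional overlap between adjacent square tiles along
    each axis, so each tile contributes fov_mm2 * (1 - overlap) ** 2 of new area.
    """
    if not 0.0 <= overlap < 1.0:
        raise ValueError("overlap must be in [0, 1)")
    new_area_per_tile = fov_mm2 * (1.0 - overlap) ** 2
    n_tiles = math.ceil(area_mm2 / new_area_per_tile)
    return n_tiles, cycle_time_s / n_tiles
```

With no overlap, `per_tile_time_budget(10.0, 1.0, 300.0)` returns `(10, 30.0)`: ten 1 mm² tiles in 5 minutes leave at most 30 s per tile.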
[0665] In some embodiments, in the sequencing of step (d), the plurality of first sequencing read products are detectable by imaging, and the sequencing comprises decoding the plurality of first sequencing read products from the images obtained during the no more than 2-30 sequencing cycles, or from the images obtained during the 1-250 sequencing cycles.
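As a point of reference for decoding read products from images, the simplest approach assigns each cycle the base whose channel is brightest at a spot. The sketch below assumes one fluorescence channel per base in the hypothetical order A, C, G, T and reports a purity value (brightest channel divided by total intensity). It is a naive per-cycle argmax baseline, not the neural-network base caller of this disclosure, and it does not correct for phasing or channel cross-talk.

```python
BASES = "ACGT"  # assumed channel order: one fluorescence channel per base


def call_bases(intensities):
    """Naive per-cycle base calls for one spot.

    `intensities` is a list of per-cycle [I_A, I_C, I_G, I_T] values.
    Returns (read, purities), where purity = brightest channel / sum of channels.
    """
    read, purities = [], []
    for cycle in intensities:
        if len(cycle) != 4:
            raise ValueError("expected four channel intensities per cycle")
        total = sum(cycle)
        best = max(range(4), key=lambda i: cycle[i])
        read.append(BASES[best])
        purities.append(cycle[best] / total if total > 0 else 0.0)
    return "".join(read), purities
```

For instance, cycles `[9, 1, 0, 0]`, `[0, 0, 2, 8]` and `[1, 5, 1, 1]` decode to `"ATC"` with purities 0.9, 0.8 and 0.625. Low-purity cycles are natural candidates for flagging as uncertain.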
[0666] In some embodiments, the methods further comprise step (e): removing the plurality of first sequencing read products from the first concatemer molecules and retaining the first concatemer molecules inside the cellular sample. In some embodiments, a 3’ blocking moiety can be added to the first sequencing read products to inhibit further sequencing reactions. For example, a nucleotide analog can be incorporated where the nucleotide analog inhibits incorporation of a subsequent nucleotide. Exemplary blocking nucleotide analogs include dideoxynucleotides and nucleotides having a 2’ or 3’ chain terminating moiety.
[0667] In some embodiments, the methods further comprise step (f): reiteratively sequencing the plurality of first concatemers by repeating steps (d) and (e) at least once. In some embodiments, reiterative sequencing of step (f) is optional.
[0668] In some embodiments, sequencing the first concatemers in step (f) comprises step (1) contacting the first plurality of concatemer molecules inside the cellular sample with (i) a plurality of first batch-specific sequencing primers, (ii) a plurality of sequencing polymerases, and (iii) a plurality of nucleotide reagents, under a condition suitable for hybridizing the plurality of first batch-specific sequencing primers to their respective first batch-specific sequencing primer binding sites on the first concatemers. In some embodiments, the sequencing further comprises step (2) conducting no more than 2-30 sequencing cycles to generate a first plurality of sequencing read products using the first concatemers as template molecules. In some embodiments, the sequencing further comprises step (3) removing the first plurality of sequencing read products from the first concatemers and retaining the plurality of first concatemers inside the cellular sample. In some embodiments, the sequencing further comprises step (4) repeating steps (1) - (3) at least once (e.g., FIG. 14). In some embodiments, step (4) comprises repeating steps (1) - (3) at least 2 times, at least 3 times, at least 4 times, at least 5 times, at least 6 times, at least 7 times, at least 8 times, at least 9 times, or at least 10 times. In some embodiments, step (4) comprises repeating steps (1) - (3) up to 10 times, up to 20 times, up to 30 times, up to 40 times, or up to 50 times.
[0669] In some embodiments, the reiterative sequencing of the first concatemers of step (f) can be conducted using a sequencing-by-binding procedure, labeled and/or non-labeled chain-terminating nucleotides, or multivalent molecules. These three sequencing methods are described below.
[0670] In some embodiments, the plurality of universal sequencing primers can be hybridized to concatemer template molecules with a hybridization reagent comprising saline-sodium citrate (SSC) buffer (e.g., 2X SSC) with formamide (e.g., 10-20% formamide). The hybridization conditions comprise a temperature of about 20-30 °C for about 10-60 minutes.
[0671] In some embodiments, the plurality of sequencing read products can be removed from the concatemers, while the plurality of concatemers is retained inside the cellular sample, using a de-hybridization reagent comprising saline-sodium citrate (SSC) buffer, with or without formamide, at a temperature that promotes nucleic acid denaturation, for example 30-90 °C.
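When these conditions are encoded in an automated fluidics protocol, it can help to keep them as structured parameters and check them against the example ranges given above. The sketch below is a hypothetical configuration record. It treats the stated "about" ranges as hard bounds, which is a simplification, and the class, constant and function names are illustrative only.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class HybridizationStep:
    """Hypothetical record of one hybridization or de-hybridization step."""
    ssc_strength_x: float                 # e.g. 2.0 for 2X SSC
    formamide_pct: float                  # 0.0 when formamide is omitted
    temperature_c: float
    duration_min: Optional[float] = None  # not specified for de-hybridization


# Example ranges from the text; "about" is treated as a hard bound here.
PRIMER_HYBRIDIZATION_RANGES = {
    "formamide_pct": (10.0, 20.0),
    "temperature_c": (20.0, 30.0),
    "duration_min": (10.0, 60.0),
}
DEHYBRIDIZATION_RANGES = {"temperature_c": (30.0, 90.0)}


def out_of_range(step: HybridizationStep, ranges: dict) -> list[str]:
    """Return the names of parameters that are missing or outside `ranges`."""
    return [name for name, (lo, hi) in ranges.items()
            if getattr(step, name) is None or not lo <= getattr(step, name) <= hi]
```

For example, a primer hybridization step of 2X SSC, 15% formamide, 25 °C for 30 minutes passes the check, whereas a de-hybridization step run at 25 °C is flagged for `temperature_c`.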
[0672] In some embodiments, the methods further comprise step (g): sequencing the second plurality of concatemer molecules inside the cellular sample under a condition that inhibits sequencing the first plurality of concatemers (e.g., FIG. 14). In some embodiments, in step (g), sequencing the second plurality of concatemers inside the cellular sample comprises conducting no more than 2-30 sequencing cycles to generate a plurality of second sequencing read products, wherein the sequences of the second sequencing read products are aligned with a second target reference sequence to confirm the presence of the second target RNA in the cellular sample. In some embodiments, in step (g), sequencing the second plurality of concatemers inside the cellular sample comprises conducting 1-250 sequencing cycles to generate a plurality of second sequencing read products, wherein the sequences of the second sequencing read products are aligned with a second target reference sequence to confirm the presence of the second target RNA in the cellular sample.
[0673] In some embodiments of step (g), only the second target barcode region (target BC-2) of the second concatemer molecules is sequenced. In some embodiments, in the second concatemer molecules, at least a portion or the full length of the second target barcode (target BC-2) is sequenced. In some embodiments, in the second concatemer molecules, the second target barcode (target BC-2) is sequenced and a portion of the second cDNA region is sequenced.
[0674] In some embodiments, sequencing the second concatemers in step (g) comprises step (1) contacting the second plurality of concatemer molecules inside the cellular sample with (i) a plurality of second batch-specific sequencing primers, (ii) a plurality of sequencing polymerases, and (iii) a plurality of nucleotide reagents, under a condition suitable for hybridizing the plurality of second batch-specific sequencing primers to their respective second batch-specific sequencing primer binding sites on the second concatemers. In some embodiments, the sequencing further comprises step (2) conducting no more than 2-30 sequencing cycles to generate a second plurality of sequencing read products using the second concatemers as template molecules.
[0675] In some embodiments, the sequencing of step (g) comprises sequencing at least a portion of the second nucleic acid concatemers using an optical imaging system comprising a field-of-view (FOV) greater than 1.0 mm².
[0676] In some embodiments, in the sequencing of step (g), the plurality of second sequencing read products are detectable by imaging, and the sequencing comprises decoding the plurality of second sequencing read products from the images obtained during the no more than 2-30 sequencing cycles, or from the images obtained during the 1-250 sequencing cycles.
[0677] In some embodiments, the methods further comprise step (h): removing the plurality of second sequencing read products from the second concatemer molecules and retaining the second concatemer molecules inside the cellular sample. In some embodiments, a 3’ blocking moiety can be added to the second sequencing read products to inhibit further sequencing reactions. For example, a nucleotide analog can be incorporated where the nucleotide analog inhibits incorporation of a subsequent nucleotide. Exemplary blocking nucleotide analogs include dideoxynucleotides and nucleotides having a 2’ or 3’ chain terminating moiety.
[0678] In some embodiments, the methods further comprise step (i): reiteratively sequencing the plurality of second concatemers by repeating steps (g) and (h) at least once. In some embodiments, reiterative sequencing of step (i) is optional.
[0679] In some embodiments, sequencing the second concatemers in step (i) comprises step (1) contacting the second plurality of concatemer molecules inside the cellular sample with (i) a plurality of second batch-specific sequencing primers, (ii) a plurality of sequencing polymerases, and (iii) a plurality of nucleotide reagents, under a condition suitable for hybridizing the plurality of second batch-specific sequencing primers to their respective second batch-specific sequencing primer binding sites on the second concatemers. In some embodiments, the sequencing further comprises step (2) conducting no more than 2-30 sequencing cycles to generate a second plurality of sequencing read products using the second concatemers as template molecules. In some embodiments, the sequencing further comprises step (3) removing the second plurality of sequencing read products from the second concatemers and retaining the plurality of second concatemers inside the cellular sample. In some embodiments, the sequencing further comprises step (4) repeating steps (1) - (3) at least once (e.g., FIG. 14). In some embodiments, step (4) comprises repeating steps (1) - (3) at least 2 times, at least 3 times, at least 4 times, at least 5 times, at least 6 times, at least 7 times, at least 8 times, at least 9 times, or at least 10 times. In some embodiments, step (4) comprises repeating steps (1) - (3) up to 10 times, up to 20 times, up to 30 times, up to 40 times, or up to 50 times.
[0680] In some embodiments, the reiterative sequencing of the second concatemers of step (i) can be conducted using a sequencing-by-binding procedure, labeled and/or non-labeled chain-terminating nucleotides, or multivalent molecules. These three sequencing methods are described below.
[0681] In some embodiments, the plurality of nucleotide reagents of steps (d) and (g) comprise a plurality of nucleotides that are detectably labeled or non-labeled. In some embodiments, individual nucleotides are linked to a detectable reporter moiety. In some embodiments, the detectable reporter moiety comprises a fluorophore. In some embodiments, the plurality of detectably labeled nucleotide analogs comprise a plurality of chain-terminating nucleotides, where the chain-terminating moiety is linked to the 3’ position of the nucleotide sugar to form a 3’ blocked nucleotide analog. In some embodiments, the chain-terminating moiety can be removed to convert the 3’ blocked nucleotide analog to an extendible nucleotide having a 3’ OH group on the sugar. In some embodiments, each type of labeled nucleotide analog is linked to a different fluorophore that corresponds to its nucleobase (adenine, cytosine, guanine, thymine or uracil), where the different fluorophores emit distinguishable fluorescent signals. In some embodiments, a sequencing cycle comprises (1) contacting the concatemer/sequencing primer duplex with a sequencing polymerase and a detectably labeled chain-terminating nucleotide under a condition suitable for polymerase-catalyzed incorporation of the detectably labeled chain-terminating nucleotide into the terminal end of the sequencing primer, (2) detecting and imaging the fluorescent signal and color emitted by the incorporated chain-terminating nucleotide, and (3) removing the chain-terminating moiety (e.g., unblocking) and the fluorophore from the incorporated nucleotide and retaining the concatemer/sequencing primer duplex. In some embodiments, no more than 2-30 sequencing cycles are conducted on the plurality of concatemers inside the cellular sample to generate a plurality of sequencing read products. 
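The three-step cycle above (incorporate, image, unblock) can be illustrated with a toy simulation. This is a minimal Python sketch, not the instrument's base-calling method. The dye table `COLOR_TO_BASE`, the function name `sbs_read`, and the assumption of one distinct color per base are all hypothetical. Real signal processing (dye cross-talk, phasing, noise) is ignored.

```python
# Hypothetical dye assignment: real instruments use their own dye sets and
# channel-to-base mappings, which this sketch does not attempt to reproduce.
COLOR_TO_BASE = {"red": "A", "green": "C", "blue": "G", "yellow": "T"}
BASE_TO_COLOR = {base: color for color, base in COLOR_TO_BASE.items()}
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def sbs_read(template_3to5: str, n_cycles: int) -> str:
    """Toy model of cycles of labeled chain-terminator sequencing.

    `template_3to5` lists template bases in the order the polymerase copies
    them (3'->5' on the template, starting just past the primer's 3' end).
    The returned read is the primer extension, written 5'->3'.
    """
    if not 2 <= n_cycles <= 30:
        raise ValueError("this sketch assumes 2-30 cycles per round")
    read = []
    for i in range(min(n_cycles, len(template_3to5))):
        # (1) incorporate one 3'-blocked, dye-labeled nucleotide; the block
        #     stops the polymerase after a single addition.
        incorporated = COMPLEMENT[template_3to5[i]]
        # (2) image the emitted color and translate it back into a base call.
        color = BASE_TO_COLOR[incorporated]
        read.append(COLOR_TO_BASE[color])
        # (3) unblocking and dye cleavage restore a 3'-OH; in this toy model
        #     that simply lets the loop advance to the next template position.
    return "".join(read)


print(sbs_read("TACGGT", 6))  # ATGCCA
```

Because each incorporated nucleotide carries a 3’ block, exactly one base is added per cycle. That one-base step is why the loop advances by a single position between images.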
In some embodiments, the sequence of the first sequencing read product can be determined and aligned with a first reference sequence to confirm the presence of the first target RNA molecules inside the cellular sample. In some embodiments, the sequence of the second sequencing read product can be determined and aligned with a second reference sequence to confirm the presence of the second target RNA molecules inside the cellular sample.
[0682] In some embodiments, the sequences of the first and second sequencing read products can be aligned after each round of generating the first and second sequencing read products, which are no more than 30 bases in length, or after generating a set of reiterative sequencing read products in which the first and second sequencing read products are no more than 30 bases in length. In some embodiments, the sequencing reactions are conducted on a sequencing apparatus having a detector that captures fluorescent signals from the sequencing reactions inside the cellular sample. The sequencing apparatus can be configured to relay the fluorescent signal data captured by the detector to a computer system that is programmed to display images of different fluorescent spots which are co-located in the cellular sample, where individual fluorescent spots correspond to different target RNA molecules. In some embodiments, when the sequencing is conducted using different fluorescently-labeled nucleotide reagents that correspond to different nucleobases (e.g., A, G, C, T/U), the images can show different-colored fluorescent spots co-located in the same cellular sample at different sequencing cycles.
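Confirming a target by aligning a short read (no more than 30 bases) to reference sequences can be sketched as an ungapped, mismatch-tolerant search. This is a minimal sketch, not the alignment procedure used in the source. It assumes a hypothetical mismatch threshold (`max_mismatches`), hypothetical function names, and a dictionary of reference sequences keyed by target name. Production pipelines would typically use a dedicated short-read aligner.

```python
from typing import Dict, Optional, Tuple

MAX_READ_LEN = 30  # the text describes reads of no more than 30 bases


def best_hit(read: str, reference: str) -> Tuple[int, int]:
    """Return (offset, mismatches) for the best ungapped placement of `read`.

    Returns (-1, len(read) + 1) if the read is longer than the reference.
    """
    best = (-1, len(read) + 1)
    for offset in range(len(reference) - len(read) + 1):
        window = reference[offset:offset + len(read)]
        mismatches = sum(a != b for a, b in zip(read, window))
        if mismatches < best[1]:
            best = (offset, mismatches)
    return best


def assign_target(read: str, references: Dict[str, str],
                  max_mismatches: int = 1) -> Optional[str]:
    """Name the single reference the read matches best, or None if none/ambiguous."""
    if len(read) > MAX_READ_LEN:
        raise ValueError("read exceeds 30 bases")
    hits = []
    for name, ref in references.items():
        _, mismatches = best_hit(read, ref)
        if mismatches <= max_mismatches:
            hits.append((mismatches, name))
    hits.sort()
    if not hits or (len(hits) > 1 and hits[0][0] == hits[1][0]):
        return None  # no acceptable hit, or a tie between targets
    return hits[0][1]


refs = {"target1": "ACGTACGTTTGCA", "target2": "GGGCCCAAATTT"}  # hypothetical
print(assign_target("CGTATG", refs))  # target1
```

The sketch returns `None` on ties. A read that matches two targets equally well cannot confirm either one.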
[0683] In some embodiments, out-of-sync phasing and/or pre-phasing events can occur during synchronized sequencing reactions on clonally amplified template amplicons, where the sequencing reactions comprise polymerase-catalyzed sequencing reactions employing detectably labeled chain terminator nucleotides. In some embodiments, the sequencing reaction on one template molecule among the clonally-amplified template molecules moves ahead of (e.g., pre-phasing) or falls behind (e.g., phasing) the sequencing of the other template molecules. During sequencing, a fluorescent signal is typically detected which corresponds to incorporation of a labeled chain terminator nucleotide. Thus, phasing and pre-phasing events can be detected and monitored using incorporation of a labeled chain terminator nucleotide.
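Phasing and pre-phasing can be reasoned about with a simple per-cycle probability model. This sketch is an illustration under stated assumptions, not the patent's method. In each cycle, each template copy independently:

- lags with probability `phasing`,
- runs ahead by one extra base with probability `prephasing`,
- otherwise incorporates exactly one base.

The sketch tracks how the fraction of copies that stay in phase, and so contribute the expected color, decays over cycles. The function names and rates are hypothetical.

```python
from typing import List


def position_distribution(n_cycles: int, phasing: float,
                          prephasing: float) -> List[float]:
    """Fraction of template copies carrying 0, 1, 2, ... incorporated bases.

    Simplified model (an assumption): per cycle a copy fails to extend with
    probability `phasing`, extends by two bases with probability
    `prephasing`, and otherwise extends by exactly one base.
    """
    if phasing < 0 or prephasing < 0 or phasing + prephasing > 1:
        raise ValueError("rates must be non-negative and sum to at most 1")
    dist = [1.0]  # before the first cycle every copy is at position 0
    for _ in range(n_cycles):
        nxt = [0.0] * (len(dist) + 2)
        for pos, frac in enumerate(dist):
            nxt[pos] += frac * phasing                          # falls behind
            nxt[pos + 1] += frac * (1.0 - phasing - prephasing)  # in step
            nxt[pos + 2] += frac * prephasing                   # moves ahead
        dist = nxt
    return dist


def in_phase_fraction(n_cycles: int, phasing: float, prephasing: float) -> float:
    """Fraction of copies that have incorporated exactly one base per cycle."""
    return position_distribution(n_cycles, phasing, prephasing)[n_cycles]


for cycle in (1, 10, 30):
    print(cycle, round(in_phase_fraction(cycle, 0.002, 0.001), 4))
```

Under this model the in-phase fraction falls roughly geometrically with cycle number. That decay is one reason short reads of a few tens of cycles tolerate small phasing rates.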
[0684] In some embodiments, the plurality of nucleotide reagents of steps (d) and (g) comprise a plurality of multivalent molecules each comprising a core attached to a plurality of nucleotide-arms, wherein the nucleotide-arms are attached to a nucleotide unit. In some embodiments, individual multivalent molecules are labeled with a detectable reporter moiety. In some embodiments, the detectable reporter moiety comprises a fluorophore. In some embodiments, the core of the multivalent molecule is labeled with a fluorophore, where the fluorophore attached to a given core of the multivalent molecule corresponds to the nucleotide base (e.g., adenine, guanine, cytosine, thymine or uracil) of the nucleotide arm. In some embodiments, at least one of the nucleotide arms of the multivalent molecule comprises a linker and/or nucleotide base that is attached to a fluorophore, where the fluorophore attached to a given nucleotide base corresponds to the nucleotide base (e.g., adenine, guanine, cytosine, thymine or uracil) of the nucleotide arm. 
In some embodiments, a sequencing cycle comprises (1) contacting the concatemer/sequencing primer duplex with a first sequencing polymerase to form a complexed polymerase, (2) contacting the complexed polymerase with a detectably labeled multivalent molecule under a condition suitable for binding a complementary nucleotide unit of the multivalent molecule to the complexed polymerase thereby forming a multivalent-binding complex, and the condition is suitable for inhibiting incorporation of the complementary nucleotide unit into the terminal end of the sequencing primer, (3) detecting and imaging the fluorescent signal and color emitted by the bound detectably labeled multivalent molecule, (4) removing the first sequencing polymerase and the bound detectably labeled multivalent molecule, and retaining the concatemer/sequencing primer duplex, (5) contacting the retained concatemer/sequencing primer duplex with a second sequencing polymerase and a non-labeled chain terminating nucleotide under a condition suitable for polymerase-catalyzed incorporation of the nonlabeled chain terminating nucleotide into the terminal end of the sequencing primer, and (6) removing the chain terminating moiety (e.g., unblocking) and retaining the concatemer/sequencing primer duplex. In some embodiments, no more than 2-30 sequencing cycles are conducted on the plurality of concatemers inside the cellular sample to generate a plurality of sequencing read products. In some embodiments, the sequence of the first sequencing read product can be determined and aligned with a first reference sequence to confirm the presence of the first target RNA molecules inside the cellular sample. In some embodiments, the sequence of the second sequencing read product can be determined and aligned with a second reference sequence to confirm the presence of the second target RNA molecules inside the cellular sample. 
In some embodiments, the sequences of the first and second sequencing read products can be aligned after each round of generating the first and second sequencing read products, which are no more than 30 bases in length, or after generating a set of reiterative sequencing read products in which the first and second sequencing read products are no more than 30 bases in length. In some embodiments, the sequencing reactions are conducted on a sequencing apparatus having a detector that captures fluorescent signals from the sequencing reactions inside the cellular sample. The sequencing apparatus can be configured to relay the fluorescent signal data captured by the detector to a computer system that is programmed to display images of different fluorescent spots which are co-located in the cellular sample, where individual fluorescent spots correspond to different target RNA molecules. In some embodiments, individual cycle times can be less than 30 minutes. In some embodiments, the field of view (FOV) can exceed 1 mm², and the cycle time for scanning a large area (> 10 mm²) can be less than 5 minutes.
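The FOV and scan-time figures above imply a per-tile time budget. As a rough worked example, covering 10 mm² with 1 mm² fields of view takes 10 tiles, so a 5-minute scan leaves about 30 seconds per tile. The sketch below assumes simple tiling. The stitching-margin parameter `overlap_fraction` and both function names are hypothetical.

```python
import math


def tiles_needed(area_mm2: float, fov_mm2: float, overlap_fraction: float = 0.0) -> int:
    """Number of fields of view needed to cover `area_mm2`.

    `overlap_fraction` is a hypothetical parameter: the share of each FOV's
    area that is re-imaged because it overlaps a neighboring tile.
    """
    if not 0.0 <= overlap_fraction < 1.0:
        raise ValueError("overlap_fraction must be in [0, 1)")
    new_area_per_tile = fov_mm2 * (1.0 - overlap_fraction)
    return math.ceil(area_mm2 / new_area_per_tile)


def seconds_per_tile(area_mm2: float, fov_mm2: float, scan_minutes: float,
                     overlap_fraction: float = 0.0) -> float:
    """Average time available per FOV (stage move, focus, exposure)."""
    return scan_minutes * 60.0 / tiles_needed(area_mm2, fov_mm2, overlap_fraction)


print(tiles_needed(10, 1), seconds_per_tile(10, 1, 5))            # 10 30.0
print(tiles_needed(10, 1, 0.1), seconds_per_tile(10, 1, 5, 0.1))  # 12 25.0
```

Since the scan time is stated as an upper bound and the FOV as a lower bound, these numbers describe the tightest case. A larger FOV reduces the tile count and relaxes the per-tile budget.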
[0685] In any of the methods described herein, the plurality of RNA or cDNA inside the cellular sample can be amplified to generate amplicons of the RNA or cDNA where the amplicons comprise concatemers. In some embodiments, the plurality of RNA or cDNA molecules inside the cellular sample can be amplified by conducting a padlock probe circularization and rolling circle amplification workflow. In some embodiments, the methods comprise contacting the plurality of RNA or cDNA molecules inside the cellular sample with a plurality of padlock probes, including a first plurality of target-specific padlock probes that hybridize with first target RNA or cDNA molecules, and a second plurality of target-specific padlock probes that hybridize with second target RNA or cDNA molecules.
[0686] In some embodiments, the padlock probes comprise single-stranded oligonucleotides. In some embodiments, the padlock probes comprise DNA, RNA, or DNA and RNA. In some embodiments, individual padlock probes comprise an internal region between the first and second terminal regions, where the internal region comprises at least one universal adaptor sequence including a sample barcode sequence, an amplification primer binding site, a sequencing primer binding site, a compaction oligonucleotide binding site and/or a surface capture primer binding site (FIG. 6). In some embodiments, the padlock probes comprise at least one target barcode sequence that corresponds to a given target RNA or target cDNA to which the padlock probes bind. In some embodiments, the padlock probes comprise at least one unique identification sequence (e.g., unique molecular index (UMI)). In some embodiments, the padlock probes comprise at least one restriction enzyme recognition sequence.
[0687] In some embodiments, a padlock probe comprises a single-stranded nucleic acid molecule having two terminal regions (e.g., first and second binding arms) and an internal region. In some embodiments, the first terminal region of an individual padlock probe has a first target-specific sequence that selectively hybridizes to a first region of a target RNA or target cDNA molecule, and the second terminal region of the individual padlock probe has a second target-specific sequence that selectively hybridizes to a second region of the same target RNA or target cDNA molecule. In some embodiments, the internal region of a padlock probe comprises a target barcode sequence (e.g., Target BC-1 or Target BC-2, left and right schematics respectively) which corresponds to a given target RNA or target cDNA. In some embodiments, the target barcode sequence uniquely identifies the target RNA or target cDNA. In some embodiments, the internal region of a padlock probe comprises a universal primer binding site for a sequencing primer (or a complementary sequence thereof). In some embodiments, the internal region of a padlock probe comprises a universal primer binding site for a rolling circle amplification primer (or a complementary sequence thereof). In some embodiments, the internal region of a padlock probe comprises a universal binding site for a compaction oligonucleotide (or a complementary sequence thereof). In some embodiments, the internal region of a padlock probe includes a target barcode sequence and at least one universal primer binding site (e.g., for binding a sequencing primer, for binding a rolling circle amplification primer and/or for binding a compaction oligonucleotide) in any arrangement and orientation (FIG. 6, top and bottom).
[0688] In some embodiments, individual padlock probes comprise first and second terminal regions (e.g., first and second binding arms) that hybridize to portions of target RNA or target cDNA molecules to form a plurality of RNA-padlock probe complexes or a plurality of cDNA-padlock probe complexes, wherein individual complexes have the first and second terminal probe regions hybridized to proximal regions of an RNA or cDNA molecule to form a nick or gap between the first and second terminal probe ends. In some embodiments, the first terminal region of an individual padlock probe has a first target-specific sequence that selectively hybridizes to a first region of a target RNA or cDNA molecule, and the second terminal region of the individual padlock probe has a second target-specific sequence that selectively hybridizes to a second region of the same target RNA or cDNA molecule, where a nick or gap is formed between the hybridized first and second terminal regions, thereby circularizing the padlock probe (e.g., FIG. 7).
[0689] As shown in FIG. 7, the first padlock probe comprises (i) a first target barcode sequence (target BC-1) that uniquely identifies the first target RNA or the first target cDNA, (ii) a first sequencing primer binding site (or a complementary sequence thereof), (iii) a universal binding site for an amplification primer (universal RCA) (or a complementary sequence thereof), and (iv) a universal binding site for a compaction oligonucleotide (or a complementary sequence thereof). The second padlock probe comprises (i) a second target barcode sequence (target BC-2) that uniquely identifies the second target RNA or the second target cDNA, (ii) a second sequencing primer binding site (or a complementary sequence thereof), (iii) a universal binding site for an amplification primer (universal RCA) (or a complementary sequence thereof), and (iv) a universal binding site for a compaction oligonucleotide (or a complementary sequence thereof).
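The arm layout described above can be sketched in code. This minimal Python sketch uses an illustrative convention that the source does not state: the probe's 5’ arm pairs with the upstream (5’) target region, and its 3’ arm pairs with the adjacent downstream region. Each arm is then the reverse complement of its target region. This places the probe's 3’ and 5’ ends side by side at a nick. The names `design_padlock` and `revcomp` and the placeholder internal region are hypothetical.

```python
_COMPLEMENT = str.maketrans("ACGTU", "TGCAA")


def revcomp(seq: str) -> str:
    """Reverse complement as DNA; a U in an RNA target pairs with A."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def design_padlock(target: str, start: int, arm_len: int, internal: str,
                   gap: int = 0) -> str:
    """Return a linear padlock probe (5'->3') for `target` (5'->3').

    Hypothetical convention: region X = target[start:start+arm_len] pairs with
    the probe's 5' arm, and region Y (the next arm_len bases after a `gap`-base
    gap) pairs with the probe's 3' arm. With gap == 0 the hybridized probe ends
    abut at a nick; otherwise they flank a gap to be filled in.
    """
    x = target[start:start + arm_len]
    y_start = start + arm_len + gap
    y = target[y_start:y_start + arm_len]
    if len(x) != arm_len or len(y) != arm_len:
        raise ValueError("probe arms run past the end of the target")
    # `internal` stands in for barcode, primer-binding and compaction sites.
    return revcomp(x) + internal + revcomp(y)


print(design_padlock("AAAACCCC", 0, 4, "NNNN"))  # TTTTNNNNGGGG
```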
[0690] In some embodiments, the padlock probes comprise canonical nucleotides and/or nucleotide analogs. In some embodiments, the padlock probes are modified to confer resistance to nuclease degradation (e.g., ribonuclease degradation). For example, the padlock probes comprise at least one phosphorothioate diester bond at their 5’ ends which can render the padlock probes resistant to nuclease degradation. In some embodiments, the padlock probes comprise 2-5 or more consecutive phosphorothioate diester bonds at their 5’ ends. In some embodiments, the padlock probes comprise at least one ribonucleotide and/or at least one 2’-O-methyl, 2’-O-methoxyethyl (MOE), or 2’-fluoro nucleotide. In some embodiments, the padlock probes comprise phosphorylated 3’ ends. In some embodiments, the padlock probes comprise at least one locked nucleic acid (LNA) base. In some embodiments, the padlock probes comprise a phosphorylated 5’ end (e.g., using a polynucleotide kinase).
[0691] In some embodiments, individual padlock probes in a set of padlock probes (e.g., a plurality of padlock probes) comprise first and second terminal regions that hybridize to the same target regions of the target RNA or cDNA molecules to form a plurality of RNA-padlock probe complexes or a plurality of cDNA-padlock probe complexes having the same RNA or cDNA sequence.
[0692] In some embodiments, a set of padlock probes (e.g., a plurality of padlock probes) comprises at least two sub-sets of padlock probes. In some embodiments, individual padlock probes in a first sub-set of padlock probes comprise first and second terminal regions that hybridize to the same target regions (e.g., a first target region) of the target RNA or cDNA molecules to form a first plurality of RNA-padlock probe complexes or a first plurality of cDNA-padlock probe complexes having the same RNA or cDNA sequence. In some embodiments, individual padlock probes in a second sub-set of padlock probes comprise first and second terminal regions that hybridize to the same target regions (e.g., a second target region) of the target RNA or cDNA molecules to form a second plurality of RNA-padlock probe complexes or a second plurality of cDNA-padlock probe complexes having the same RNA or cDNA sequence. In some embodiments, the first and second sub-sets of padlock probes hybridize to different target regions of the same target RNA or cDNA molecules. In some embodiments, the first and second sub-sets of padlock probes hybridize to different target regions of different target RNA or cDNA molecules. In some embodiments, the set of padlock probes comprises 2-10 sub-sets, 10-25 sub-sets, 25-50 sub-sets, or up to 100 sub-sets of padlock probes. In some embodiments, the set of padlock probes comprises at least 100, at least 500, at least 1000, at least 10,000, or more sub-sets of padlock probes.
[0693] In some embodiments, the nicks can be enzymatically ligated to generate covalently closed circular padlock probes. In some embodiments, the ligase enzyme can discriminate between matched and mis-matched hybridized ends to ensure target-specific hybridization. In some embodiments, the ligation reaction comprises use of a ligase enzyme, including a T3, T4, T7 or Taq DNA ligase enzyme.
[0694] In some embodiments, the size of the gap between the hybridized first and second terminal regions is 1-25 bases. The 3’-OH end of the hybridized padlock probe can serve as an initiation site for a polymerase-catalyzed fill-in reaction (e.g., gap fill-in reaction) using the target cDNA molecule (or the target RNA molecule) as a template. After the fill-in reaction, the remaining nick can be enzymatically ligated to generate covalently closed circular padlock probes.
[0695] In some embodiments, the gap-filling reaction comprises contacting the circularized padlock probe with a DNA polymerase and a plurality of nucleotides. In some embodiments, the DNA polymerase comprises E. coli DNA polymerase I, Klenow fragment of E. coli DNA polymerase I, T7 DNA polymerase, or T4 DNA polymerase. In some embodiments, the ligase enzyme can discriminate between matched and mismatched hybridized ends to ensure target-specific hybridization. In some embodiments, the ligation reaction comprises use of a ligase enzyme, including a T3, T4, T7 or Taq DNA ligase enzyme.
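The gap fill-in and ligation steps described above can be traced with strings. This minimal sketch uses a hypothetical arm convention: the probe's 5’ arm sits on the upstream target region and its 3’ arm on the downstream region. The polymerase extends the probe's 3’-OH across the gap by copying the target, and ligation closes the circle. The circle therefore carries the complement of the gap bases. Function and parameter names are illustrative.

```python
_COMPLEMENT = str.maketrans("ACGTU", "TGCAA")


def revcomp(seq: str) -> str:
    """Reverse complement as DNA; a U in an RNA target pairs with A."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def gap_fill_and_ligate(target: str, start: int, arm_len: int, gap: int,
                        internal: str) -> str:
    """Return the closed circle as a string, opened at the probe's original 5' end.

    Hypothetical layout, target written 5'->3':
        X = target[start : start+arm_len]   paired with the probe's 5' arm
        G = the next `gap` bases            single-stranded before fill-in
        Y = the following arm_len bases     paired with the probe's 3' arm
    """
    if not 1 <= gap <= 25:
        raise ValueError("the text describes gaps of 1-25 bases")
    x_end = start + arm_len
    y_start = x_end + gap
    x, g, y = target[start:x_end], target[x_end:y_start], target[y_start:y_start + arm_len]
    if len(x) != arm_len or len(y) != arm_len:
        raise ValueError("probe arms run past the end of the target")
    probe = revcomp(x) + internal + revcomp(y)  # linear probe, 5'->3'
    fill = revcomp(g)  # bases the polymerase adds to the probe's 3'-OH
    return probe + fill  # ligation then joins the new 3' end to the 5' end


target = "AAAAGTCCCC"
circle = gap_fill_and_ligate(target, 0, 4, 2, "NN")
print(circle)  # TTTTNNGGGGAC
# The circle now contains the complement of the whole X-gap-Y span.
print(revcomp(target) in circle + circle)  # True
```

Checking `circle + circle` handles the circularity: the complement of the captured span wraps across the ligation junction.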
[0696] In any of the methods described herein, the plurality of covalently closed circular padlock probes can be subjected to a rolling circle amplification reaction to generate a plurality of concatemer molecules, each having two or more tandem copies of a unit, wherein the unit comprises a target sequence that corresponds to a target RNA molecule and any additional sequence(s) carried by the padlock probes, including universal adaptor sequence(s), unique molecular index sequence(s) and/or restriction enzyme recognition sequence(s).
[0697] In some embodiments, the rolling circle amplification reaction comprises contacting the covalently closed circularized padlock probes with an amplification primer (e.g., a universal rolling circle amplification primer), a strand-displacing DNA polymerase, and a plurality of nucleotides, under a condition suitable for hybridizing individual amplification primers to a covalently closed padlock probe, and under a condition suitable for conducting primer extension using the covalently closed padlock probe as a template molecule to generate a nucleic acid concatemer. In some embodiments, the plurality of nucleotides in the rolling circle amplification reaction comprise any mixture of two or more of dATP, dGTP, dCTP, dTTP and/or dUTP. In some embodiments, any of the rolling circle amplification reactions described herein can be conducted in the presence or in the absence of a plurality of compaction oligonucleotides.
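The sequence relationship between a closed circle and its RCA product can be sketched as follows. This toy model ignores kinetics and product-length limits. It assumes, hypothetically, that the amplification primer is the reverse complement of a site on the circle. Under that assumption the concatemer is a run of tandem copies of the circle's reverse complement, beginning with the primer. The function name `rca_concatemer` is illustrative.

```python
_COMPLEMENT = str.maketrans("ACGTU", "TGCAA")


def revcomp(seq: str) -> str:
    """Reverse complement as DNA; a U pairs with A."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def rca_concatemer(circle: str, site_start: int, site_len: int, turns: int) -> str:
    """Toy RCA product (5'->3') after `turns` complete laps around `circle`.

    `circle` is the closed padlock probe written 5'->3' from an arbitrary
    origin; the primer pairs with circle[site_start:site_start+site_len].
    """
    end = site_start + site_len
    if end > len(circle):
        raise ValueError("sketch assumes the primer site does not span the string origin")
    # Re-read the circle so it ends with the primer-binding site. Copying runs
    # from there around the circle; strand displacement lets the polymerase
    # keep going after it returns to the primer, producing tandem repeats.
    rotated = circle[end:] + circle[:end]
    return revcomp(rotated * turns)


product = rca_concatemer("AAACCCGGGT", 0, 3, 2)
print(product)  # TTTACCCGGGTTTACCCGGG
```

Every repeat unit carries the same barcode, primer-binding and compaction sites. This is why a single concatemer offers many binding sites for sequencing primers and compaction oligonucleotides.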
[0698] In some embodiments, when the nucleotides in the rolling circle amplification reaction include dUTP, the resulting concatemer can be cross-linked to a cross-linking reactive group by treating the cellular sample with a succinimide ester (NHS), maleimide (Sulfo-SMCC), imidoester (DMP), carbodiimide (DCC, EDC) or phenyl azide. In some embodiments, polymerization of the cross-linking reactive group can be initiated with light or UV light. In some embodiments, the resulting concatemer can be cross-linked to a matrix by treating the cellular sample with cross-linked agarose, cross-linked dextran, cross-linked polyethylene glycol (PEG), polyacrylamide, cellulose alginate or polyamide. In some embodiments, the PEG comprises a sulfo-NHS ester moiety at one or both ends, for example a PEGylated bis(sulfosuccinimidyl) suberate (e.g., BS(PEG)9 from Thermo Fisher Scientific, catalog No. 21582).
[0699] In some embodiments, the rolling circle amplification reaction can be conducted at a constant temperature (e.g., isothermally), wherein the constant temperature ranges from room temperature to about 30 °C, or is about 30 - 40 °C, about 40 - 50 °C, or about 50 - 65 °C.
[0700] In some embodiments, the DNA polymerase having strand-displacing activity can be selected from the group consisting of phi29 DNA polymerase, large fragment of Bst DNA polymerase, large fragment of Bsu DNA polymerase, Bca (exo-) DNA polymerase, Klenow fragment of E. coli DNA polymerase, T5 polymerase, M-MuLV reverse transcriptase, HIV viral reverse transcriptase, and Deep Vent DNA polymerase. In some embodiments, the phi29 DNA polymerase can be wild-type phi29 DNA polymerase (e.g., MagniPhi from Expedeon), variant EquiPhi29 DNA polymerase (e.g., from Thermo Fisher Scientific), or chimeric QualiPhi DNA polymerase (e.g., from 4basebio).
[0701] In some embodiments, the rolling circle amplification primers can be modified to increase resistance to nuclease degradation. In some embodiments, the rolling circle amplification primers comprise at least one phosphorothioate diester bond at their 5’ ends which can render the amplification primers resistant to exonuclease degradation. In some embodiments, the rolling circle amplification primers comprise 2-5 or more consecutive phosphorothioate diester bonds at their 5’ ends. In some embodiments, the rolling circle amplification primers comprise at least one ribonucleotide and/or at least one 2’-O-methyl or 2’-O-methoxyethyl (MOE) nucleotide.
[0702] In some embodiments, the rolling circle amplification reaction can be conducted in the presence of a plurality of compaction oligonucleotides which, when hybridized to a concatemer molecule, compact the size and/or shape of the concatemer to form a compact nanoball. In some embodiments, the compaction oligonucleotides comprise single-stranded oligonucleotides having a first region at one end that hybridizes to a portion of a concatemer molecule and a second region at the other end that hybridizes to another portion of the same concatemer molecule, where hybridization of the compaction oligonucleotide to a given concatemer compacts the size and/or shape of the concatemer.
[0703] The compaction oligonucleotides include a 5’ region, an optional internal region (intervening region), and a 3’ region. The 5’ and 3’ regions of the compaction oligonucleotide can hybridize to any portions of the concatemer. The 5’ and 3’ regions of the compaction oligonucleotide can hybridize to different portions of the concatemer to pull together distal portions of the concatemer, causing compaction of the concatemer to form a DNA nanoball. For example, the 5’ region of the compaction oligonucleotide is designed to hybridize to a first portion of the concatemer molecule (e.g., a universal compaction oligonucleotide binding site), and the 3’ region of the compaction oligonucleotide is designed to hybridize to a second portion of the concatemer molecule (e.g., a universal compaction oligonucleotide binding site). Inclusion of compaction oligonucleotides during RCA can promote formation of DNA nanoballs that are more compact in size and shape than concatemers generated in the absence of the compaction oligonucleotides. The compact and stable characteristics of the DNA nanoballs improve in situ sequencing accuracy by increasing signal intensity, and the nanoballs retain their shape and size during multiple sequencing cycles.
[0704] In some embodiments, the compaction oligonucleotides comprise single stranded oligonucleotides comprising DNA, RNA, or a combination of DNA and RNA. The compaction oligonucleotides can be any length, including 20-150 nucleotides, or 30-100 nucleotides, or 40-80 nucleotides in length.
[0705] In some embodiments, the compaction oligonucleotides comprise a 5’ region and a 3’ region, and optionally an intervening region between the 5’ and 3’ regions. The intervening region can be any length, for example about 2-20 nucleotides in length. In some embodiments, the intervening region comprises a homopolymer having consecutive identical bases (e.g., AAA, GGG, CCC, TTT or UUU). In other embodiments, the intervening region comprises a non-homopolymer sequence.
[0706] The 5’ region of the compaction oligonucleotides can be wholly complementary or partially complementary along its length to a first portion of a concatemer molecule. The 3’ region of the compaction oligonucleotides can be wholly complementary or partially complementary along its length to a second portion of a concatemer molecule. The 5’ region of the compaction oligonucleotides can hybridize to a first universal sequence portion of a concatemer molecule. The 3’ region of the compaction oligonucleotides can hybridize to a second universal sequence portion of a concatemer molecule.
[0707] In some embodiments, the 5’ region of the compaction oligonucleotide can have the same sequence as the 3’ region. The 5’ region of the compaction oligonucleotide can have a sequence that is different from the 3’ region. In some embodiments, the 3’ region of the compaction oligonucleotide can have a sequence that is a reverse sequence of the 5’ region. In some embodiments, the 5’ region of the compaction oligonucleotide can have a sequence that is a reverse sequence of the 3’ region.
[0708] In some embodiments, the 3’ region of any of the compaction oligonucleotides can include an additional three bases at the terminal 3’ end which comprise 2’-O-methyl RNA bases (e.g., designated mUmUmU), or the terminal 3’ end lacks additional 2’-O-methyl RNA bases.
[0709] In some embodiments, the compaction oligonucleotides comprise one or more modified bases or linkages at their 5’ or 3’ ends to confer certain functionalities. In some embodiments, the compaction oligonucleotides comprise at least one phosphorothioate linkage at their 5’ and/or 3’ ends to confer exonuclease resistance. In some embodiments, at least one nucleotide at or near the 3’ end comprises a 2’ fluoro base which confers exonuclease resistance. In some embodiments, the 3’ end of the compaction oligonucleotides comprises at least one 2’-O-methyl RNA base which blocks polymerase-catalyzed extension. For example, the 3’ end of the compaction oligonucleotide comprises three 2’-O-methyl RNA bases (e.g., designated mUmUmU). In some embodiments, the compaction oligonucleotides comprise a 3’ inverted dT at their 3’ ends which blocks polymerase-catalyzed extension. In some embodiments, the compaction oligonucleotides comprise 3’ phosphorylation which blocks polymerase-catalyzed extension. In some embodiments, the internal region of the compaction oligonucleotides comprises at least one locked nucleic acid (LNA) which increases the thermal stability of duplexes formed by hybridizing a compaction oligonucleotide to a concatemer molecule. In some embodiments, the compaction oligonucleotides comprise a phosphorylated 5’ end (e.g., using a polynucleotide kinase).
[0710] In some embodiments, the compaction oligonucleotide comprises the sequence 5’-CATGTAATGCACGTACTTTCAGGGTAAACATGTAATGCACGTACTTTCAGGGT-3’ (SEQ ID NO: 1).
[0711] In some embodiments, the compaction oligonucleotides include an additional three bases at the terminal 3’ end which comprise 2’-O-methyl RNA bases (e.g., designated mUmUmU), or the terminal 3’ end lacks additional 2’-O-methyl RNA bases.
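SEQ ID NO: 1 can be read as two identical 25-nt regions flanking an AAA spacer. That reading fits the 5’ region / intervening homopolymer / 3’ region layout described above. It also fits the embodiment in which the 5’ and 3’ regions share the same sequence. The split is an interpretation of the sequence, not something the source states. The short sketch below checks it and derives the concatemer sequence each arm would pair with, assumed to be the arm's reverse complement. The function name `split_regions` and the 25-nt arm length are assumptions.

```python
# SEQ ID NO: 1 from the text, written 5'->3'.
SEQ_ID_NO_1 = "CATGTAATGCACGTACTTTCAGGGTAAACATGTAATGCACGTACTTTCAGGGT"

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def split_regions(oligo: str, arm_len: int):
    """Split into (5' region, intervening region, 3' region) given an arm length."""
    if 2 * arm_len > len(oligo):
        raise ValueError("arm length too long for this oligo")
    return oligo[:arm_len], oligo[arm_len:len(oligo) - arm_len], oligo[len(oligo) - arm_len:]


five, middle, three = split_regions(SEQ_ID_NO_1, 25)  # 25-nt arms are an assumption
print(len(SEQ_ID_NO_1), five == three, middle)  # 53 True AAA

# Site on the concatemer that each arm would hybridize to (5'->3'), inferred
# as the reverse complement of the arm.
binding_site = five.translate(_COMPLEMENT)[::-1]
print(binding_site)
```

If both arms are identical, each one binds a copy of the same universal site, located in a different repeat unit of the concatemer. A single oligonucleotide can therefore bridge two distant units, which is consistent with the pulling-together mechanism described above.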
[0712] In some embodiments, the compaction oligonucleotides can include at least one region having consecutive guanines. For example, the compaction oligonucleotides can include at least one region having 2, 3, 4, 5, 6 or more consecutive guanines. In some embodiments, the compaction oligonucleotides comprise four consecutive guanines which can form a guanine tetrad structure (see FIG. 25). The guanine tetrad structure can be stabilized via Hoogsteen hydrogen bonding. The guanine tetrad structure can be stabilized by a central cation including potassium, sodium, lithium, rubidium or cesium.
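Scanning an oligonucleotide for runs of consecutive guanines, such as the four-G runs mentioned above, is a simple pattern search. This is a minimal sketch; the function name and default run length are illustrative. A G-run hit only flags a candidate. It does not predict whether a tetrad or G-quadruplex actually forms.

```python
import re
from typing import List, Tuple


def guanine_runs(seq: str, min_len: int = 4) -> List[Tuple[int, int]]:
    """Return (start, length) for each maximal run of >= min_len consecutive G's."""
    if min_len < 1:
        raise ValueError("min_len must be at least 1")
    # Greedy quantifier: each match extends to the end of its G run.
    pattern = re.compile("G{%d,}" % min_len)
    return [(m.start(), m.end() - m.start()) for m in pattern.finditer(seq.upper())]


print(guanine_runs("AGGGGTTGGGGGA"))  # [(1, 4), (7, 5)]
```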
[0713] At least one compaction oligonucleotide can form a guanine tetrad (FIG. 25) and hybridize to the universal binding sequences in a concatemer which can cause the concatemer to fold to form an intramolecular G-quadruplex structure (FIG. 26). The concatemers can self-collapse to form compact nanoballs. Formation of the guanine tetrads and G-quadruplexes in the nanoballs may increase the stability of the nanoballs to retain their compact size and shape which can withstand changes in pH, temperature and/or repeated flows of reagents during sequencing inside the cellular sample.
[0714] In some embodiments, the plurality of compaction oligonucleotides in the rolling circle amplification reaction have the same sequence. Alternatively, the plurality of compaction oligonucleotides in the rolling circle amplification reaction comprise a mixture of two or more different populations of compaction oligonucleotides having different sequences.
[0715] In some embodiments, the immobilized concatemer template molecule can self-collapse into a compact nucleic acid nanoball. The nanoballs can be imaged, and a FWHM measurement can be obtained to characterize the size and shape of the nanoballs.
[0716] In some embodiments, inclusion of compaction oligonucleotides in the rolling circle amplification reaction can promote collapsing of a concatemer into a DNA nanoball. Conducting RCA with compaction oligonucleotides helps retain the compact size and shape of a DNA nanoball during multiple sequencing cycles, which can improve (i.e., reduce) the FWHM (full width at half maximum) of a spot image of the DNA nanoball inside a cellular sample. In some embodiments, the DNA nanoball does not unravel during multiple sequencing cycles. In some embodiments, the spot image of the DNA nanoball does not enlarge during multiple sequencing cycles. In some embodiments, the spot image of the DNA nanoball remains a discrete spot during multiple sequencing cycles. The spot image can be represented as a Gaussian spot, and its size can be measured as a FWHM. A smaller spot size, as indicated by a smaller FWHM, typically correlates with an improved image of the spot. In some embodiments, the FWHM of a nanoball spot can be about 10 μm or smaller.
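For a Gaussian spot, FWHM and the standard deviation σ are related by FWHM = 2 sqrt(2 ln 2) σ ≈ 2.355 σ. The sketch below estimates FWHM from a one-dimensional, background-subtracted intensity profile through a spot, using the intensity-weighted second moment. It assumes a known flat background, positions already in micrometres, and a profile that covers the whole spot. It is an illustrative estimator, not the instrument's spot-fitting method, and the synthetic numbers are hypothetical.

```python
import math
from typing import Sequence

# For a Gaussian profile, FWHM = 2 * sqrt(2 * ln 2) * sigma (about 2.3548 * sigma).
FWHM_PER_SIGMA = 2.0 * math.sqrt(2.0 * math.log(2.0))


def fwhm_from_profile(positions: Sequence[float], intensities: Sequence[float],
                      background: float = 0.0) -> float:
    """Estimate FWHM from a 1D line profile via the intensity-weighted variance.

    Assumes a single, roughly Gaussian spot and a known flat background.
    """
    weights = [max(i - background, 0.0) for i in intensities]
    total = sum(weights)
    if total <= 0.0:
        raise ValueError("no signal above background")
    mean = sum(x * w for x, w in zip(positions, weights)) / total
    var = sum(w * (x - mean) ** 2 for x, w in zip(positions, weights)) / total
    return FWHM_PER_SIGMA * math.sqrt(var)


# Synthetic profile (hypothetical numbers): sigma = 1.5 um, background 5 counts.
xs = [i * 0.1 for i in range(-100, 101)]
ys = [100.0 * math.exp(-x * x / (2 * 1.5 ** 2)) + 5.0 for x in xs]
print(round(fwhm_from_profile(xs, ys, background=5.0), 3))
```

The same calculation can be run along each image axis. Tracking the estimate cycle by cycle gives a direct check that a nanoball spot does not enlarge over repeated sequencing cycles.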
[0717] The single-stranded concatemers collapse into compact DNA nanoballs, where each nanoball carries numerous tandem copies of a polynucleotide unit along its length, and where the polynucleotide unit includes a sequence-of-interest (e.g., that corresponds to target RNA or target cDNA) and at least a universal sequencing primer binding site. Each polynucleotide unit can bind a sequencing primer, a sequencing polymerase and a detectably-labeled nucleotide reagent (e.g., detectably labeled multivalent molecules) to form a detectable sequencing complex (e.g., a detectable ternary complex). Each nanoball carries numerous detectable sequencing complexes. Thus, the compact nature of the nanoballs increases the local concentration of the detectably-labeled nucleotide reagents used during the sequencing workflow. This raises the signal intensity emitted from a nanoball, giving a discrete detectable signal that can be imaged as a fluorescent spot inside the cellular sample. Each spot corresponds to a concatemer, and each concatemer corresponds to a target RNA molecule in the cellular sample. Multiple spots can be detected and imaged simultaneously in the cellular sample. DNA nanoballs having a compact shape and size produce increased signal intensity and color differentiation during sequencing.
[0718] In any of the methods described herein, the cellular sample comprises a whole cell, a plurality of whole cells, an intact tissue or an intact tumor. In some embodiments, the cellular sample comprises a fresh cellular sample, a freshly-frozen cellular sample, a sectioned cellular sample, or an FFPE cellular sample. In some embodiments, the cellular sample comprises one or more living cells or non-living cells.
[0719] In some embodiments, the cellular sample can be obtained from a virus, fungus, prokaryote or eukaryote. In some embodiments, the cellular sample can be obtained from an animal, insect or plant. In some embodiments, the cellular sample comprises one or more virally-infected cells.
[0720] In some embodiments, the cellular sample can be obtained from any organism including human, simian, ape, canine, feline, bovine, equine, murine, porcine, caprine, lupine, ranine, piscine, plant, insect or bacteria.
[0721] In some embodiments, the cellular sample can be obtained from any organ including head, neck, brain, breast, ovary, cervix, colon, rectum, endometrium, gallbladder, intestines, bladder, prostate, testicles, liver, lung, kidney, esophagus, pancreas, thyroid, pituitary, thymus, skin, heart, larynx, or other organs.
[0722] In any of the methods described herein, the cellular sample harbors a plurality of RNA, which includes target RNA and non-target RNA. In some embodiments, cells typically produce RNA by gene expression, which includes transcription of DNA (e.g., genomic DNA) into RNA molecules. The transcribed RNA may or may not undergo splicing. The transcribed RNA can be translated into a polypeptide (e.g., coding RNA), or it may not undergo translation but instead be processed into tRNA or rRNA (e.g., non-coding RNA).
[0723] In some embodiments, the plurality of RNA harbored by the cellular sample includes target and non-target RNA. In some embodiments, the plurality of RNA harbored by the cellular sample comprises wild type RNA, mutant RNA or splice variant RNA. In some embodiments, the plurality of RNA harbored by the cellular sample comprises pre-spliced RNA, partially spliced RNA, or fully spliced RNA. In some embodiments, the plurality of RNA harbored by the cellular sample comprises coding RNA, non-coding RNA, mRNA, tRNA, rRNA, microRNA (miRNA), mature microRNA, or immature microRNA. In some embodiments, the plurality of RNA harbored by the cellular sample comprises housekeeping RNA, cell-specific RNA, tissue-specific RNA or disease-specific RNA. In some embodiments, the plurality of RNA harbored by the cellular sample comprises RNA expressed by one or more cells in response to a stimulus such as heat, light, a chemical or a drug. In some embodiments, the plurality of RNA harbored by the cellular sample comprises RNA found in healthy cells or diseased cells. In some embodiments, the plurality of RNA harbored by the cellular sample comprises RNA transcribed from transgenic DNA sequences that are introduced into the cellular sample using recombinant DNA procedures. For example, the RNA can be transcribed from a transgenic DNA sequence that is controlled by an inducible or constitutive promoter sequence. In some embodiments, the plurality of RNA harbored by the cellular sample comprises RNA that is transcribed from DNA sequences that are not transgenic.
[0724] In any of the methods described herein, the cellular sample can be cultured on the support. In some embodiments, the methods comprise culturing the cellular sample on the support under a condition suitable for expanding the cellular sample for 2-10 generations or more. The cultured cellular sample can generate a colony of cells. In some embodiments, the methods comprise culturing the cellular sample to confluence or nonconfluence. In some embodiments, the methods comprise culturing the cellular sample on the support in a simple or complex cell culture media. For example, the cell culture media comprises D-MEM high glucose (e.g., from Thermo Fisher Scientific, catalog No. 11965118), fetal bovine serum (e.g., 10% FBS; for example from Thermo Fisher Scientific, catalog No. A3160402), MEM non-essential amino acids (e.g., 0.1 mM MEM, for example from Thermo Fisher Scientific, catalog No. 11140050), L-glutamine (e.g., 6 mM L-glutamine, for example from Thermo Fisher Scientific, catalog No. A2916801), MEM sodium pyruvate (e.g., 1 mM sodium pyruvate, for example from Thermo Fisher Scientific, catalog No. 11360070), and an antibiotic (e.g., 1% penicillin-streptomycin-glutamine, for example from Thermo Fisher, catalog No. 10378016). In some embodiments, the methods comprise culturing the cellular sample at a humidity and temperature that are suitable for culturing the cell(s) on the support. Exemplary suitable conditions comprise approximately 37 °C with a humidified atmosphere of approximately 5-10% carbon dioxide in air. The cellular sample can be cultured with suitable aeration with oxygen and/or nitrogen.
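The example medium above is specified as final concentrations. The volume of each supplement stock then follows from C1·V1 = C2·V2, and the basal D-MEM makes up the remainder. The sketch below assumes typical stock concentrations (100% FBS, 10 mM non-essential amino acids, 200 mM L-glutamine, 100 mM sodium pyruvate, and the antibiotic mix treated as 100% v/v); these stock values are assumptions, not taken from the text, and the function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Supplement:
    name: str
    stock: float   # stock concentration, same unit as `final`
    final: float   # target concentration in the complete medium
    unit: str      # label only, e.g. "mM" or "% v/v"


def stock_volume_ml(s: Supplement, total_ml: float) -> float:
    """C1*V1 = C2*V2 rearranged: V1 = C2*V2 / C1 (mL of stock to add)."""
    if s.final > s.stock:
        raise ValueError(f"{s.name}: final concentration exceeds stock")
    return s.final * total_ml / s.stock


def medium_recipe(supplements: List[Supplement], total_ml: float) -> Dict[str, float]:
    """mL of each supplement plus the basal medium that makes up the remainder."""
    volumes = {s.name: stock_volume_ml(s, total_ml) for s in supplements}
    basal = total_ml - sum(volumes.values())
    if basal < 0.0:
        raise ValueError("supplements exceed the total volume")
    volumes["D-MEM high glucose"] = basal
    return volumes


# Final concentrations follow the example above; STOCK values are ASSUMED
# typical concentrations and should be checked against the product label.
SUPPLEMENTS = [
    Supplement("FBS", stock=100.0, final=10.0, unit="% v/v"),
    Supplement("MEM NEAA", stock=10.0, final=0.1, unit="mM"),
    Supplement("L-glutamine", stock=200.0, final=6.0, unit="mM"),
    Supplement("sodium pyruvate", stock=100.0, final=1.0, unit="mM"),
    Supplement("penicillin-streptomycin-glutamine", stock=100.0, final=1.0, unit="% v/v"),
]
```

Under these assumptions, a 500 mL batch takes 50 mL FBS, 15 mL L-glutamine, 5 mL each of the amino acid, pyruvate and antibiotic stocks, and 420 mL basal medium.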
[0725] In any of the methods described herein, the term “simple cell media” or related terms refers to a cell media that typically lacks ingredients to support cell growth and/or proliferation in culture. Simple cell media can be used, for example, to wash, suspend, or dilute the cellular sample. Simple cell media can be mixed with certain ingredients to prepare a cell media that can support cell growth and/or proliferation in culture. A simple cell media comprises any one or any combination of two or more of a buffer, a phosphate compound, a sodium compound, a potassium compound, a calcium compound, a magnesium compound and/or glucose. In some embodiments, the simple cell media comprises PBS (phosphate buffered saline), DPBS (Dulbecco’s phosphate-buffered saline), HBSS (Hanks’ balanced salt solution), DMEM (Dulbecco’s Modified Eagle’s Medium), EMEM (Eagle’s Minimum Essential Medium), and/or EBSS (Earle’s balanced salt solution). In some embodiments, the cellular sample can be placed in a simple cell media prior to or during the step of conducting any of the nucleic acid methods described herein.
[0726] In any of the methods described herein, the term “complex cell media” or related terms refers to a cell media that can be used to support cell growth and/or proliferation in culture without supplementation or additives. Complex cell media can include any combination of two or more of a buffering system (e.g., HEPES), inorganic salt(s), amino acid(s), protein(s), polypeptide(s), carbohydrate(s), fatty acid(s), lipid(s), purine(s) and their derivatives (e.g., hypoxanthine), pyrimidine(s) and their derivatives, and/or trace element(s). Complex cell media includes fluids obtained from a fluid or tissue extract. Complex cell media includes artificial cell media. In some embodiments, complex cell media can be a serum-containing media; for example, complex cell media includes fluids such as fetal bovine serum, blood plasma, blood serum, lymph fluid, human placental cord serum and amniotic fluid. In some embodiments, complex cell media can be a serum-free media, which are typically (but not necessarily) defined cell culture media. In some embodiments, complex cell media can be a chemically-defined media, which typically (but not necessarily) includes recombinant polypeptides and ultra-pure inorganic and/or organic compounds. In some embodiments, complex cell media can be a protein-free media, which includes, for example, MEM (minimum essential medium) and RPMI-1640 (Roswell Park Memorial Institute medium). In some embodiments, the complex cell media comprises IMDM (Iscove’s Modified Dulbecco’s Medium). In some embodiments, the complex cell media comprises DMEM (Dulbecco’s Modified Eagle’s Medium). In some embodiments, the cellular sample can be placed in a complex cell media prior to or during the step of conducting any of the nucleic acid methods described herein.
[0727] In any of the methods described herein, the cellular sample comprises a fixed cellular sample. In some embodiments, the cellular sample can be treated with a fixation reagent (e.g., a fixing reagent) that preserves the cell and its contents to inhibit degradation and can inhibit cell lysis. For example, the fixation reagent can preserve RNA harbored by the cellular sample. In some embodiments, the fixation reagent inhibits loss of nucleic acids from the cellular sample.
[0728] In some embodiments, the fixation reagent can cross-link the RNA to prevent the RNA from escaping the cellular sample. In some embodiments, a cross-linking fixation reagent comprises any combination of an aldehyde, formaldehyde, paraformaldehyde, formalin, glutaraldehyde, imidoesters, N-hydroxysuccinimide esters (NHS) and/or glyoxal (a bifunctional aldehyde).
[0729] In some embodiments, the fixation reagent comprises at least one alcohol, including methanol or ethanol. In some embodiments, the fixation reagent comprises at least one ketone, including acetone. In some embodiments, the fixation reagent comprises acetic acid, glacial acetic acid and/or picric acid. In some embodiments, the fixation reagent comprises mercuric chloride. In some embodiments, the fixation reagent comprises a zinc salt comprising zinc sulphate or zinc chloride. In some embodiments, the fixation reagent can denature polypeptides.
[0730] In some embodiments, the fixation reagent comprises 4% w/v of paraformaldehyde to water/PBS. In some embodiments, the fixation reagent comprises 10% of 35% formaldehyde at a neutral pH. In some embodiments, the fixation reagent comprises 2% v/v of glutaraldehyde to water/PBS. In some embodiments, the fixation reagent comprises 25% of 37% formaldehyde solution, 70% picric acid and 5% acetic acid.
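The percentage recipes above combine a volume fraction with a stock concentration. If "10% of 35% formaldehyde" is read as 10 parts of a 35% stock per 100 parts of final volume (our interpretation), the final formaldehyde concentration is 0.10 × 35% = 3.5%. Likewise, 25 parts of a 37% stock in the three-component mix gives 9.25% formaldehyde. The hypothetical helpers below make that arithmetic explicit and also convert a % w/v target (such as 4% paraformaldehyde) into grams.

```python
from typing import Dict


def final_pct(part_pct: float, stock_pct: float) -> float:
    """C1*V1 = C2*V2 in percentages: a stock at `stock_pct` that makes up
    `part_pct` percent of the final volume ends at part_pct * stock_pct / 100."""
    if not 0.0 <= part_pct <= 100.0:
        raise ValueError("part_pct must be between 0 and 100")
    return part_pct * stock_pct / 100.0


def grams_for_wv(percent_wv: float, volume_ml: float) -> float:
    """Mass (g) of solute for a % w/v solution (1% w/v = 1 g per 100 mL)."""
    return percent_wv * volume_ml / 100.0


def check_parts(parts: Dict[str, float]) -> None:
    """Raise if the volume fractions of a multi-component fixative do not sum to 100%."""
    total = sum(parts.values())
    if abs(total - 100.0) > 1e-9:
        raise ValueError(f"parts sum to {total}%, not 100%")


# Three-component fixative from the text: 25% (37% formaldehyde), 70% picric acid, 5% acetic acid.
check_parts({"37% formaldehyde": 25.0, "picric acid": 70.0, "acetic acid": 5.0})
formaldehyde_in_mix = final_pct(25.0, 37.0)      # 9.25% formaldehyde
formaldehyde_in_formalin = final_pct(10.0, 35.0)  # 3.5% formaldehyde
pfa_grams_50ml = grams_for_wv(4.0, 50.0)          # 2.0 g PFA for 50 mL of 4% w/v
```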
[0731] In some embodiments, the cellular sample can be fixed on the support with 4% paraformaldehyde for about 30-60 minutes and washed with PBS.
[0732] In some embodiments, the cellular sample can be stained, de-stained or unstained.
[0733] In any of the methods described herein, the cellular sample comprises a permeabilized cellular sample. In some embodiments, the methods comprise treating the cellular sample with a permeabilization reagent that alters the cell membrane to permit penetration of experimental reagents into the cells. For example, the permeabilization reagent removes membrane lipids from the cell membrane. In some embodiments, the cellular sample can be treated with a permeabilization reagent which comprises any combination of an organic solvent, detergent, chemical compound, cross-linking agent and/or enzyme. In some embodiments, the organic solvents comprise acetone, ethanol, and methanol. In some embodiments, the detergents comprise saponin, Triton X-100, Tween-20, sodium dodecyl sulfate (SDS), an N-lauroylsarcosine sodium salt solution, or a nonionic polyoxyethylene surfactant (e.g., NP40). In some embodiments, the crosslinking agent comprises paraformaldehyde. In some embodiments, the enzyme comprises trypsin, pepsin or a protease (e.g., proteinase K). In some embodiments, the cells can be permeabilized using an alkaline condition, or an acidic condition with a protease enzyme. In some embodiments, the permeabilization reagent comprises water and/or PBS.
[0734] For example, the fixed cells can be permeabilized with 70% ethanol for about 30-60 minutes, and the permeabilizing reagent can be exchanged with PBS-T (e.g., PBS with 0.05% Tween-20). In some embodiments, the cells can be post-fixed with 3% paraformaldehyde and 0.1% glutaraldehyde for about 30-60 minutes, and washed with PBS-T multiple times.
[0735] In any of the methods described herein, the cellular sample is infused with a swellable polyelectrolyte hydrogel (U.S. patent No. 10,309,879 and Chen 2015 Science 347:543, the contents of these documents are incorporated by reference in their entireties). In some embodiments, a fixed and permeabilized cellular sample can be infused with sodium acrylate, acrylamide and the cross-linker N,N’-methylenebisacrylamide. In some embodiments, ammonium persulfate (APS) initiator and tetramethylethylenediamine (TEMED) accelerator are infused to achieve polymerization. In some embodiments, the cellular sample can be infused with proteinase K for proteolysis and incubated in a digestion buffer. In some embodiments, the gel inside the cellular sample can be swollen by addition of water.
[0736] In any of the methods described herein, the plurality of RNAs inside the cellular sample can be converted to cDNA. In some embodiments, the methods comprise contacting the plurality of RNA inside the fixed and permeabilized cellular sample with (i) a plurality of reverse transcription primers, (ii) a plurality of reverse transcriptase enzymes, and (iii) a plurality of nucleotides, under a condition suitable for conducting a reverse transcription reaction to generate a plurality of cDNA molecules (e.g., a plurality of first strand cDNA molecules) in the cellular sample. In some embodiments, synthesis of second strand cDNA molecules is omitted. In some embodiments, the RNA inside the cellular sample is not converted into cDNA, and instead the RNA is hybridized to target-specific padlock probes.
[0737] In some embodiments, the reverse transcriptase enzyme exhibits RNA-dependent DNA polymerase activity. In some embodiments, the reverse transcriptase enzyme comprises a reverse transcriptase enzyme from AMV (avian myeloblastosis virus), M-MuLV (Moloney murine leukemia virus), or HIV (human immunodeficiency virus). In some embodiments, the reverse transcriptase enzyme comprises a recombinant enzyme that exhibits reduced RNase H activity, for example REVERTAID (e.g., from Thermo Fisher Scientific, catalog No. EP0441). In some embodiments, the reverse transcriptase can be a commercially-available enzyme, including MULTISCRIBE (e.g., from Thermo Fisher Scientific, catalog # 4311235), THERMOSCRIPT (e.g., from Thermo Fisher Scientific, catalog # 12236-014), or ARRAYSCRIPT (e.g., from Ambion, catalog No. AM2048). In some embodiments, the reverse transcriptase enzyme comprises SUPERSCRIPT II (e.g., catalog No. 18064014), SUPERSCRIPT III (e.g., catalog No. 18080044), or SUPERSCRIPT IV enzymes (e.g., catalog No. 18090010) (all SUPERSCRIPT enzymes from Invitrogen). In some embodiments, the reverse transcription reaction can include an RNase inhibitor.
[0738] In some embodiments, the reverse transcription primers comprise a single-stranded oligonucleotide comprising DNA, RNA, or chimeric DNA/RNA. In some embodiments, the reverse transcription primers comprise any combination of adenine (A), thymine (T), guanine (G), cytosine (C), uracil (U) and/or inosine (I). In some embodiments, the reverse transcription primers can be any length, for example 5-25 bases, or 25-50 bases, or 50-75 bases, or 75-100 bases in length or longer. The reverse transcription primers each comprise a 5’ end and a 3’ end. In some embodiments, the 3’ end of the reverse transcription primers can include a 3’-OH moiety which serves as a nucleotide polymerization initiation site in a polymerase-catalyzed primer extension reaction. In some embodiments, the 3’ end of the reverse transcription primers has a chain terminating moiety which blocks a polymerase-catalyzed primer extension reaction. The chain terminating moiety can be removed to convert the 3’ sugar position to an extendible 3’-OH.
[0739] In some embodiments, the reverse transcription primers are modified to confer resistance to nuclease degradation (e.g., ribonuclease degradation). For example, the reverse transcription primers comprise at least one phosphorothioate diester bond at their 5’ ends, which can render the reverse transcription primers resistant to nuclease degradation. In some embodiments, the reverse transcription primers comprise 2-5 or more consecutive phosphorothioate diester bonds at their 5’ ends. In some embodiments, the plurality of reverse transcription primers comprise at least one ribonucleotide and/or at least one 2’-O-methyl, 2’-O-methoxyethyl (MOE), or 2’-fluoro base nucleotide. In some embodiments, the reverse transcription primers comprise phosphorylated 3’ ends. In some embodiments, the reverse transcription primers comprise locked nucleic acid (LNA) bases. In some embodiments, the reverse transcription primers comprise a phosphorylated 5’ end (e.g., phosphorylated using a polynucleotide kinase).
[0740] In some embodiments, the entire length of a reverse transcription primer can hybridize to a portion of an RNA molecule. In some embodiments, individual reverse transcription primers comprise a 3’ region having a sequence that hybridizes to a portion of an RNA molecule and a 5’ region that carries a tail that does not hybridize to an RNA molecule. In some embodiments, the 5’ tail comprises a universal adaptor sequence including any one or any combination of two or more of a sample barcode sequence, an amplification primer binding site, a sequencing primer binding site, a compaction oligonucleotide binding site and/or a surface capture primer binding site. In some embodiments, the 5’ tail comprises a unique identification sequence (e.g., a unique molecular index (UMI)). In some embodiments, the 5’ tail comprises a restriction enzyme recognition sequence. In some embodiments, individual reverse transcription primers comprise at least a portion of the 3’ region having a homopolymer sequence, for example poly-A, poly-T, poly-C, poly-G or poly-U. In some embodiments, the reverse transcription primers can hybridize to any portion of an RNA molecule, including the 5’ or the 3’ end of the RNA molecule, or an internal portion of the RNA molecule.
[0741] In some embodiments, the plurality of reverse transcription primers comprises a first sub-population of target-specific reverse transcription primers that hybridize selectively to the first target RNA (e.g., targeted transcriptomics). In some embodiments, the plurality of reverse transcription primers further comprise a second sub-population of target-specific reverse transcription primers that hybridize selectively to the second target RNA. In some embodiments, the target-specific reverse transcription primers comprise a pre-determined sequence at the 3’ region which hybridizes to a target RNA molecule. In some embodiments, the pre-determined sequence portion of the reverse transcription primers can be 4-20 bases, or 20-40 bases, or 40-50 bases in length.
[0742] In some embodiments, the first sub-population of target-specific reverse transcription primers can selectively hybridize to an RNA transcribed in the cellular sample by a housekeeping gene. In some embodiments, selection of the housekeeping gene may be dependent upon the type of cellular sample to be used for the in situ methods described herein. Exemplary housekeeping genes include glyceraldehyde-3-phosphate dehydrogenase (GAPDH), beta-actins (ACTB), tubulins, PPIA (peptidyl-prolyl cis-trans isomerase), NME4 (NME/NM23 nucleoside diphosphate kinase 4), SMARCAL1 (SWI/SNF related matrix associated actin dependent regulator of chromatin, subfamily A like 1), and POMK (protein-O-mannose kinase). The skilled artisan can design the first sub-population of target-specific reverse transcription primers to hybridize to RNA transcripts from any of the numerous housekeeping genes.
[0743] In some embodiments, the second sub-population of target-specific reverse transcription primers can selectively hybridize to an RNA transcribed from a gene that is expressed in the cellular sample being examined (e.g., a cell-specific or tissue-specific RNA).
[0744] In some embodiments, the plurality of reverse transcription primers comprises a first sub-population of random-sequence reverse transcription primers that hybridize to the first target RNA (e.g., whole transcriptomics). In some embodiments, the plurality of reverse transcription primers further comprises a second sub-population of random-sequence reverse transcription primers that hybridize to the second target RNA. In some embodiments, the reverse transcription primers comprise a random and/or degenerate sequence at the 3’ region which hybridizes to an RNA molecule. In some embodiments, the random-sequence or the degenerate-sequence portion of the reverse transcription primers can be 4-20 bases, or 20-40 bases, or 40-50 bases in length.
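A random or degenerate 3’ region is commonly written with IUPAC ambiguity codes; that notation is an assumption here, since the text does not specify one. The number of distinct primer sequences such a region encodes is the product of each position's degeneracy, so a fully random region of length n has 4^n variants: 4^4 = 256 at the short end of the 4-20 base range and 4^20 ≈ 1.1 × 10^12 at the long end. The sketch below, with a hypothetical function name, computes that count.

```python
from math import prod

# Number of nucleotides each IUPAC code can stand for.
IUPAC_DEGENERACY = {
    "A": 1, "C": 1, "G": 1, "T": 1, "U": 1,
    "R": 2, "Y": 2, "S": 2, "W": 2, "K": 2, "M": 2,
    "B": 3, "D": 3, "H": 3, "V": 3,
    "N": 4,
}


def sequence_variants(primer_region: str) -> int:
    """Count the distinct sequences encoded by a random/degenerate primer region."""
    return prod(IUPAC_DEGENERACY[base] for base in primer_region.upper())
```

A random hexamer ("NNNNNN") therefore represents 4,096 distinct primers, while a region mixing fixed and partially degenerate positions (e.g., "RY") represents only 4.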
Sequencing Polymerases
[0745] In any of the methods described herein, sequencing polymerases can be used for conducting sequencing reactions. In some embodiments, the sequencing polymerase(s) is/are capable of binding and incorporating a complementary nucleotide opposite a nucleotide in a concatemer template molecule. In some embodiments, the sequencing polymerase(s) is/are capable of binding a complementary nucleotide unit of a multivalent molecule opposite a nucleotide in a concatemer template molecule. In some embodiments, the plurality of sequencing polymerases comprise recombinant mutant polymerases.
[0746] Examples of suitable polymerases for use in sequencing with nucleotides and/or multivalent molecules include but are not limited to: Klenow DNA polymerase; Thermus aquaticus DNA polymerase I (Taq polymerase); KlenTaq polymerase; Candidatus altiarchaeales archaeon; Candidatus Hadarchaeum Yellowstonense; Hadesarchaea archaeon; Euryarchaeota archaeon; Thermoplasmata archaeon; Thermococcus polymerases such as Thermococcus litoralis; bacteriophage T7 DNA polymerase; human alpha, delta and epsilon DNA polymerases; bacteriophage polymerases such as T4, RB69 and phi29 bacteriophage DNA polymerases; Pyrococcus furiosus DNA polymerase (Pfu polymerase); Bacillus subtilis DNA polymerase III; E. coli DNA polymerase III alpha and epsilon; 9 degree N polymerase; reverse transcriptases such as HIV type M or O reverse transcriptases; avian myeloblastosis virus reverse transcriptase; Moloney Murine Leukemia Virus (MMLV) reverse transcriptase; or telomerase. Further non-limiting examples of DNA polymerases include those from various Archaea genera, such as Aeropyrum, Archaeoglobus, Desulfurococcus, Pyrobaculum, Pyrococcus, Pyrolobus, Pyrodictium, Staphylothermus, Stetteria, Sulfolobus, Thermococcus, and Vulcanisaeta and the like or variants thereof, including such polymerases as are known in the art such as 9 degrees N, VENT, DEEP VENT, THERMINATOR, Pfu, KOD, Pfx, Tgo and RB69 polymerases.
Sequencing-by-Binding
[0747] In any of the methods described herein, the sequencing comprises conducting sequencing-by-binding (SBB) reactions inside the cellular sample, where the cDNA amplicons are the concatemer molecules. In some embodiments, the sequencing-by-binding (SBB) procedure employs non-labeled chain-terminating nucleotides. In some embodiments, a cycle of sequencing-by-binding (SBB) comprises the steps of: (a) sequentially contacting a primed concatemer (e.g., a concatemer annealed to a plurality of sequencing primers) with at least two separate mixtures under ternary complex stabilizing conditions, wherein the at least two separate mixtures each include a polymerase and a nucleotide, whereby the sequential contacting results in the primed concatemer being contacted, under the ternary complex stabilizing conditions, with nucleotide cognates for first, second and third base types in the template; (b) examining the at least two separate mixtures to determine whether a ternary complex formed; (c) identifying the next correct nucleotide for the primed concatemer, wherein the next correct nucleotide is identified as a cognate of the first, second or third base type if a ternary complex is detected in step (b), and wherein the next correct nucleotide is imputed to be a nucleotide cognate of a fourth base type based on the absence of a ternary complex in step (b); (d) adding a next correct nucleotide to the primer of the primed concatemer after step (b), thereby producing an extended primer; and (e) repeating steps (a) through (d) at least once on the primed concatemer that comprises the extended primer. Exemplary sequencing-by-binding methods are described in U.S. patent Nos. 10,246,744 and 10,731,141 (the contents of both patents are hereby incorporated by reference in their entireties).
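The SBB cycle above can be sketched as a small simulation. This is a minimal illustration under stated assumptions, not the method of the cited patents: each of three mixtures is assumed to carry a single nucleotide type (A, C or G), a hypothetical noiseless `detect` callback stands in for imaging the ternary complex, T is imputed when no complex is seen, and the primer is extended by the called nucleotide. All function names are hypothetical.

```python
from typing import Callable, Sequence

# Watson-Crick pairing: template base -> nucleotide that pairs with it.
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def sbb_call(detect: Callable[[str], bool],
             tested: Sequence[str] = ("A", "C", "G"),
             imputed: str = "T") -> str:
    """Steps (a)-(c): examine each mixture in turn; if no ternary complex
    forms with any tested nucleotide, impute the fourth base type."""
    for nucleotide in tested:
        if detect(nucleotide):      # ternary complex observed for this mixture
            return nucleotide
    return imputed                  # absence of signal -> fourth base type


def simulate_sbb(template: str, cycles: int) -> str:
    """Run up to `cycles` SBB cycles; `template` lists template bases in the
    order the primer encounters them. Returns the bases added to the primer."""
    extended = []
    for base in template[:cycles]:
        truth = COMPLEMENT[base]    # the next correct nucleotide
        # Hypothetical noiseless detector standing in for imaging.
        call = sbb_call(lambda nt, truth=truth: nt == truth)
        extended.append(call)       # step (d): extend the primer by one base
    return "".join(extended)        # step (e): the loop repeats the cycle


print(simulate_sbb("TGCAA", cycles=5))  # ACGTT
```

In the example, the two T calls are never observed directly; they are imputed from the absence of signal in all three mixtures, which is what lets the cycle use cognates for only three base types.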
Nucleotides and Chain-Terminating Nucleotides
[0748] In any of the methods described herein, any of the sequencing methods described herein can employ at least one nucleotide. The nucleotides comprise a base, sugar and at least one phosphate group. In some embodiments, at least one nucleotide in the plurality comprises an aromatic base, a five carbon sugar (e.g., ribose or deoxyribose), and one or more phosphate groups (e.g., 1-10 phosphate groups). The plurality of nucleotides can comprise at least one type of nucleotide selected from a group consisting of dATP, dGTP, dCTP, dTTP and dUTP. The plurality of nucleotides can comprise a mixture of any combination of two or more types of nucleotides selected from a group consisting of dATP, dGTP, dCTP, dTTP and/or dUTP. In some embodiments, at least one nucleotide in the plurality is not a nucleotide analog. In some embodiments, at least one nucleotide in the plurality comprises a nucleotide analog.
[0749] In some embodiments, in any of the methods for sequencing described herein, at least one nucleotide in the plurality of nucleotides comprises a chain of one, two or three phosphorus atoms, where the chain is typically attached to the 5’ carbon of the sugar moiety via an ester or phosphoramide linkage. In some embodiments, at least one nucleotide in the plurality is an analog having a phosphorus chain in which the phosphorus atoms are linked together with intervening O, S, NH, methylene or ethylene. In some embodiments, the phosphorus atoms in the chain include substituted side groups including O, S or BH3. In some embodiments, the chain includes phosphate groups substituted with analogs including phosphoramidate, phosphorothioate, phosphorodithioate, and O-methylphosphoroamidite groups.
[0750] In some embodiments, in any of the methods for sequencing described herein, at least one nucleotide in the plurality of nucleotides comprises a terminator nucleotide analog having a chain terminating moiety (e.g., blocking moiety) at the sugar 2’ position, at the sugar 3’ position, or at the sugar 2’ and 3’ positions. In some embodiments, the chain terminating moiety can inhibit polymerase-catalyzed incorporation of a subsequent nucleotide unit or free nucleotide in a nascent strand during a primer extension reaction. In some embodiments, the chain terminating moiety is attached to the 3’ sugar hydroxyl position where the sugar comprises a ribose or deoxyribose sugar moiety. In some embodiments, the chain terminating moiety is removable/cleavable from the 3’ sugar hydroxyl position to generate a nucleotide having a 3’-OH sugar group which is extendible with a subsequent nucleotide in a polymerase-catalyzed nucleotide incorporation reaction. In some embodiments, the chain terminating moiety comprises an alkyl group, alkenyl group, alkynyl group, allyl group, aryl group, benzyl group, azide group, amine group, amide group, keto group, isocyanate group, phosphate group, thio group, disulfide group, carbonate group, urea group, or silyl group. In some embodiments, the chain terminating moiety is cleavable/removable from the nucleotide, for example by reacting the chain terminating moiety with a chemical agent, pH change, light or heat. In some embodiments, the chain terminating moieties alkyl, alkenyl, alkynyl and allyl are cleavable with tetrakis(triphenylphosphine)palladium(0) (Pd(PPh3)4) with piperidine, or with 2,3-dichloro-5,6-dicyano-1,4-benzoquinone (DDQ). In some embodiments, the chain terminating moieties aryl and benzyl are cleavable with H2 and Pd/C. In some embodiments, the chain terminating moieties amine, amide, keto, isocyanate, phosphate, thio and disulfide are cleavable with phosphine or with a thiol compound including beta-mercaptoethanol or dithiothreitol (DTT). In some embodiments, the chain terminating moiety carbonate is cleavable with potassium carbonate (K2CO3) in MeOH, with triethylamine in pyridine, or with Zn in acetic acid (AcOH). In some embodiments, the chain terminating moieties urea and silyl are cleavable with tetrabutylammonium fluoride, pyridine-HF, ammonium fluoride, or triethylamine trihydrofluoride.
[0751] In some embodiments, in any of the methods for sequencing described herein, at least one nucleotide in the plurality of nucleotides comprises a terminator nucleotide analog having a chain terminating moiety (e.g., blocking moiety) at the sugar 2’ position, at the sugar 3’ position, or at the sugar 2’ and 3’ positions. In some embodiments, the chain terminating moiety comprises an azide, azido or azidomethyl group. In some embodiments, the chain terminating moiety comprises a 3’-O-azido or 3’-O-azidomethyl group. In some embodiments, the chain terminating moieties azide, azido and azidomethyl are cleavable/removable with a phosphine compound. In some embodiments, the phosphine compound comprises a derivatized tri-alkyl phosphine moiety or a derivatized tri-aryl phosphine moiety. In some embodiments, the phosphine compound comprises tris(2-carboxyethyl)phosphine (TCEP), bis-sulfo triphenyl phosphine (BS-TPP) or tris(hydroxypropyl)phosphine (THPP). In some embodiments, the cleaving agent comprises 4-dimethylaminopyridine (4-DMAP).
[0752] In some embodiments, in any of the methods for sequencing described herein, the nucleotide comprises a chain terminating moiety selected from a group consisting of 3’-deoxy nucleotides, 2’,3’-dideoxynucleotides, 3’-methyl, 3’-azido, 3’-azidomethyl, 3’-O-azidoalkyl, 3’-O-ethynyl, 3’-O-aminoalkyl, 3’-O-fluoroalkyl, 3’-fluoromethyl, 3’-difluoromethyl, 3’-trifluoromethyl, 3’-sulfonyl, 3’-malonyl, 3’-amino, 3’-O-amino, 3’-sulfhydryl, 3’-aminomethyl, 3’-ethyl, 3’-butyl, 3’-tert-butyl, 3’-fluorenylmethyloxycarbonyl, 3’-tert-butyloxycarbonyl, 3’-O-alkyl hydroxylamino group, 3’-phosphorothioate, and 3’-O-benzyl, or derivatives thereof.
[0753] In some embodiments, in any of the methods for sequencing described herein, the plurality of nucleotides comprises a plurality of nucleotides labeled with a detectable reporter moiety. The detectable reporter moiety comprises a fluorophore. In some embodiments, the fluorophore is attached to the nucleotide base. In some embodiments, the fluorophore is attached to the nucleotide base with a linker which is cleavable/removable from the base. In some embodiments, at least one of the nucleotides in the plurality is not labeled with a detectable reporter moiety. In some embodiments, a particular detectable reporter moiety (e.g., fluorophore) that is attached to the nucleotide can correspond to the nucleotide base (e.g., dATP, dGTP, dCTP, dTTP or dUTP) to permit detection and identification of the nucleotide base.
[0754] In some embodiments, in any of the methods for sequencing nucleic acid molecules described herein, the cleavable linker on the nucleotide base comprises a cleavable moiety comprising an alkyl group, alkenyl group, alkynyl group, allyl group, aryl group, benzyl group, azide group, amine group, amide group, keto group, isocyanate group, phosphate group, thio group, disulfide group, carbonate group, urea group, or silyl group. In some embodiments, the cleavable linker on the base is cleavable/removable from the base by reacting the cleavable moiety with a chemical agent, pH change, light or heat. In some embodiments, the cleavable moieties alkyl, alkenyl, alkynyl and allyl are cleavable with tetrakis(triphenylphosphine)palladium(0) (Pd(PPh3)4) with piperidine, or with 2,3-dichloro-5,6-dicyano-1,4-benzoquinone (DDQ). In some embodiments, the cleavable moieties aryl and benzyl are cleavable with H2 and Pd/C. In some embodiments, the cleavable moieties amine, amide, keto, isocyanate, phosphate, thio and disulfide are cleavable with phosphine or with a thiol compound including beta-mercaptoethanol or dithiothreitol (DTT). In some embodiments, the cleavable moiety carbonate is cleavable with potassium carbonate (K2CO3) in MeOH, with triethylamine in pyridine, or with Zn in acetic acid (AcOH). In some embodiments, the cleavable moieties urea and silyl are cleavable with tetrabutylammonium fluoride, pyridine-HF, ammonium fluoride, or triethylamine trihydrofluoride.
[0755] In some embodiments, in any of the methods for sequencing described herein, the cleavable linker on the nucleotide base comprises a cleavable moiety including an azide, azido or azidomethyl group. In some embodiments, the cleavable moieties azide, azido and azidomethyl group are cleavable/removable with a phosphine compound. In some embodiments, the phosphine compound comprises a derivatized tri-alkyl phosphine moiety or a derivatized tri-aryl phosphine moiety. In some embodiments, the phosphine compound comprises tris(2-carboxyethyl)phosphine (TCEP), bis-sulfo triphenyl phosphine (BS-TPP) or tris(hydroxypropyl)phosphine (THPP). In some embodiments, the cleaving agent comprises 4-dimethylaminopyridine (4-DMAP).
[0756] In some embodiments, in any of the methods for sequencing described herein, the chain terminating moiety (e.g., at the sugar 2’ and/or sugar 3’ position) and the cleavable linker on the nucleotide base have the same or different cleavable moieties. In some embodiments, the chain terminating moiety (e.g., at the sugar 2’ and/or sugar 3’ position) and the detectable reporter moiety linked to the base are chemically cleavable/removable with the same chemical agent. In some embodiments, the chain terminating moiety (e.g., at the sugar 2’ and/or sugar 3’ position) and the detectable reporter moiety linked to the base are chemically cleavable/removable with different chemical agents.
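As an illustrative sketch only, the moiety-to-agent pairings recited in paragraphs [0754] and [0755] (which paragraphs [0761] and [0762] repeat for chain terminating moieties) can be held in a lookup table. The question raised in paragraph [0756], whether the chain terminating moiety and the base-linked cleavable linker can be removed with the same chemical agent, then becomes a set intersection. The names `CLEAVING_AGENTS`, `_register` and `shared_agents` are hypothetical, the table is not exhaustive, and the specific phosphines (TCEP, BS-TPP, THPP) are collapsed into a single generic "phosphine" entry for simplicity.

```python
# Hypothetical lookup table transcribed from paragraphs [0754]-[0755]; not exhaustive.
CLEAVING_AGENTS: dict[str, set[str]] = {}


def _register(moieties: list[str], agents: list[str]) -> None:
    """Record that every moiety in `moieties` is removable with every agent in `agents`."""
    for moiety in moieties:
        CLEAVING_AGENTS.setdefault(moiety, set()).update(agents)


_register(["alkyl", "alkenyl", "alkynyl", "allyl"], ["Pd(PPh3)4 with piperidine", "DDQ"])
_register(["aryl", "benzyl"], ["H2 with Pd/C"])
_register(
    ["amine", "amide", "keto", "isocyanate", "phosphate", "thio", "disulfide"],
    ["phosphine", "beta-mercaptoethanol", "DTT"],
)
_register(["carbonate"], ["K2CO3 in MeOH", "triethylamine in pyridine", "Zn in AcOH"])
_register(
    ["urea", "silyl"],
    ["tetrabutylammonium fluoride", "pyridine-HF", "ammonium fluoride",
     "triethylamine trihydrofluoride"],
)
# TCEP, BS-TPP and THPP are collapsed into the generic "phosphine" entry here.
_register(["azide", "azido", "azidomethyl"], ["phosphine"])


def shared_agents(blocker: str, linker: str) -> set[str]:
    """Agents listed for both the chain terminating moiety and the linker moiety."""
    return CLEAVING_AGENTS.get(blocker, set()) & CLEAVING_AGENTS.get(linker, set())


# Removal of both groups with one agent is possible only if the intersection is non-empty.
print(shared_agents("azidomethyl", "azide"))   # {'phosphine'}
print(shared_agents("allyl", "carbonate"))     # set()
```

An empty result simply means the table lists no common agent, in which case the blocking group and the label would be removed in separate steps, as in the "different chemical agents" embodiment of paragraph [0756].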
Multivalent Molecules
[0757] In any of the methods described herein, the sequencing employs at least one multivalent molecule which comprises a plurality of nucleotide arms attached to a core and having any configuration, including a starburst, helter-skelter, or bottle brush configuration (e.g., FIG. 16). The multivalent molecule comprises: (1) a core; and (2) a plurality of nucleotide arms which comprise (i) a core attachment moiety, (ii) a spacer comprising a PEG moiety, (iii) a linker, and (iv) a nucleotide unit, wherein the core is attached to the plurality of nucleotide arms, wherein the spacer is attached to the linker, and wherein the linker is attached to the nucleotide unit. In some embodiments, the nucleotide unit comprises a base, sugar and at least one phosphate group, and the linker is attached to the nucleotide unit through the base. In some embodiments, the linker comprises an aliphatic chain or an oligo ethylene glycol chain, either of which has 2-6 subunits. In some embodiments, the linker also includes an aromatic moiety. An exemplary nucleotide arm is shown in FIG. 20. Exemplary multivalent molecules are shown in FIGS. 16-19. An exemplary spacer is shown in FIG. 21 (top) and exemplary linkers are shown in FIG. 21 (bottom) and FIG. 22. Exemplary nucleotides attached to a linker are shown in FIGS. 23A-23D. An exemplary biotinylated nucleotide arm is shown in FIG. 24.
[0758] In some embodiments, a multivalent molecule comprises a core attached to multiple nucleotide arms, and wherein the multiple nucleotide arms have the same type of nucleotide unit which is selected from a group consisting of dATP, dGTP, dCTP, dTTP and dUTP.
[0759] In some embodiments, a multivalent molecule comprises a core attached to multiple nucleotide arms, where each arm includes a nucleotide unit. The nucleotide unit comprises an aromatic base, a five-carbon sugar (e.g., ribose or deoxyribose), and one or more phosphate groups (e.g., 1-10 phosphate groups). The plurality of multivalent molecules can comprise one type of multivalent molecule having one type of nucleotide unit selected from a group consisting of dATP, dGTP, dCTP, dTTP and dUTP. The plurality of multivalent molecules can comprise a mixture of any combination of two or more types of multivalent molecules, where individual multivalent molecules in the mixture comprise nucleotide units selected from a group consisting of dATP, dGTP, dCTP, dTTP and/or dUTP.
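The composition described in paragraphs [0757]-[0759] can be summarized as a small data model. This is a hypothetical sketch for illustration: the class, field and function names are assumptions, chemical identities are stored as plain strings, `is_homogeneous` mirrors the embodiment of paragraph [0758] in which every arm carries the same type of nucleotide unit, and `reagent_nucleotide_types` mirrors the mixtures of paragraph [0759].

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class NucleotideArm:
    """One arm: core attachment moiety -> PEG spacer -> linker -> nucleotide unit."""
    core_attachment: str  # e.g., "biotin"
    spacer: str           # e.g., "PEG spacer"
    linker: str           # e.g., "11-atom linker"
    nucleotide: str       # e.g., "dATP"


@dataclass
class MultivalentMolecule:
    """A core (e.g., streptavidin) carrying a plurality of nucleotide arms."""
    core: str
    arms: list[NucleotideArm] = field(default_factory=list)

    def valency(self) -> int:
        """Number of nucleotide arms attached to the core."""
        return len(self.arms)

    def nucleotide_types(self) -> set[str]:
        """Distinct nucleotide units carried by the arms."""
        return {arm.nucleotide for arm in self.arms}

    def is_homogeneous(self) -> bool:
        """True when every arm carries the same type of nucleotide unit."""
        return len(self.nucleotide_types()) == 1


def reagent_nucleotide_types(molecules: list[MultivalentMolecule]) -> set[str]:
    """Nucleotide types present across a mixture of multivalent molecules."""
    return set().union(*(m.nucleotide_types() for m in molecules))


a_arm = NucleotideArm("biotin", "PEG spacer", "11-atom linker", "dATP")
g_arm = NucleotideArm("biotin", "PEG spacer", "11-atom linker", "dGTP")
a_molecule = MultivalentMolecule("streptavidin", [a_arm] * 4)
g_molecule = MultivalentMolecule("streptavidin", [g_arm] * 4)
print(a_molecule.valency(), a_molecule.is_homogeneous())  # 4 True
print(sorted(reagent_nucleotide_types([a_molecule, g_molecule])))  # ['dATP', 'dGTP']
```

Modeling the arm as an immutable value keeps the four-part structure (attachment moiety, spacer, linker, nucleotide unit) explicit while letting the same arm definition be reused across many copies on one core.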
[0760] In some embodiments, the nucleotide unit comprises a chain of one, two or three phosphorus atoms where the chain is typically attached to the 5’ carbon of the sugar moiety via an ester or phosphoramide linkage. In some embodiments, at least one nucleotide unit is a nucleotide analog having a phosphorus chain in which the phosphorus atoms are linked together with intervening O, S, NH, methylene or ethylene. In some embodiments, the phosphorus atoms in the chain include substituted side groups including O, S or BH3. In some embodiments, the chain includes phosphate groups substituted with analogs including phosphoramidate, phosphorothioate, phosphorodithioate, and O-methylphosphoramidite groups.
[0761] In some embodiments, the multivalent molecule comprises a core attached to multiple nucleotide arms, and wherein individual nucleotide arms comprise a nucleotide unit which is a nucleotide analog having a chain terminating moiety (e.g., blocking moiety) at the sugar 2’ position, at the sugar 3’ position, or at the sugar 2’ and 3’ position. In some embodiments, the nucleotide unit comprises a chain terminating moiety (e.g., blocking moiety) at the sugar 2’ position, at the sugar 3’ position, or at the sugar 2’ and 3’ position. In some embodiments, the chain terminating moiety can inhibit polymerase-catalyzed incorporation of a subsequent nucleotide unit or free nucleotide in a nascent strand during a primer extension reaction. In some embodiments, the chain terminating moiety is attached to the 3’ sugar hydroxyl position where the sugar comprises a ribose or deoxyribose sugar moiety. In some embodiments, the chain terminating moiety is removable/cleavable from the 3’ sugar hydroxyl position to generate a nucleotide having a 3’-OH sugar group which is extendible with a subsequent nucleotide in a polymerase-catalyzed nucleotide incorporation reaction. In some embodiments, the chain terminating moiety comprises an alkyl group, alkenyl group, alkynyl group, allyl group, aryl group, benzyl group, azide group, amine group, amide group, keto group, isocyanate group, phosphate group, thio group, disulfide group, carbonate group, urea group, or silyl group. In some embodiments, the chain terminating moiety is cleavable/removable from the nucleotide unit, for example by reacting the chain terminating moiety with a chemical agent, pH change, light or heat. In some embodiments, the chain terminating moieties alkyl, alkenyl, alkynyl and allyl are cleavable with tetrakis(triphenylphosphine)palladium(0) (Pd(PPh3)4) with piperidine, or with 2,3-dichloro-5,6-dicyano-1,4-benzoquinone (DDQ).
In some embodiments, the chain terminating moieties aryl and benzyl are cleavable with H2 with Pd/C. In some embodiments, the chain terminating moieties amine, amide, keto, isocyanate, phosphate, thio and disulfide are cleavable with phosphine or with a thiol group including beta-mercaptoethanol or dithiothreitol (DTT). In some embodiments, the chain terminating moiety carbonate is cleavable with potassium carbonate (K2CO3) in MeOH, with triethylamine in pyridine, or with Zn in acetic acid (AcOH). In some embodiments, the chain terminating moieties urea and silyl are cleavable with tetrabutylammonium fluoride, pyridine-HF, ammonium fluoride, or triethylamine trihydrofluoride.
[0762] In some embodiments, the nucleotide unit comprises a chain terminating moiety (e.g., blocking moiety) at the sugar 2’ position, at the sugar 3’ position, or at the sugar 2’ and 3’ position. In some embodiments, the chain terminating moiety comprises an azide, azido or azidomethyl group. In some embodiments, the chain terminating moiety comprises a 3’-O-azido or 3’-O-azidomethyl group. In some embodiments, the chain terminating moieties azide, azido and azidomethyl group are cleavable/removable with a phosphine compound. In some embodiments, the phosphine compound comprises a derivatized tri-alkyl phosphine moiety or a derivatized tri-aryl phosphine moiety. In some embodiments, the phosphine compound comprises tris(2-carboxyethyl)phosphine (TCEP), bis-sulfo triphenyl phosphine (BS-TPP) or tris(hydroxypropyl)phosphine (THPP). In some embodiments, the cleaving agent comprises 4-dimethylaminopyridine (4-DMAP).
[0763] In some embodiments, the nucleotide unit comprises a chain terminating moiety which is selected from a group consisting of 3’-deoxy nucleotides, 2’,3’-dideoxynucleotides, 3’-methyl, 3’-azido, 3’-azidomethyl, 3’-O-azidoalkyl, 3’-O-ethynyl, 3’-O-aminoalkyl, 3’-O-fluoroalkyl, 3’-fluoromethyl, 3’-difluoromethyl, 3’-trifluoromethyl, 3’-sulfonyl, 3’-malonyl, 3’-amino, 3’-O-amino, 3’-sulfhydryl, 3’-aminomethyl, 3’-ethyl, 3’-butyl, 3’-tert-butyl, 3’-fluorenylmethyloxycarbonyl, 3’-tert-butyloxycarbonyl, 3’-O-alkyl hydroxylamino group, 3’-phosphorothioate, and 3’-O-benzyl, or derivatives thereof.
[0764] In some embodiments, the multivalent molecule comprises a core attached to multiple nucleotide arms, wherein the nucleotide arms comprise a spacer, linker and nucleotide unit, and wherein the core, linker and/or nucleotide unit is labeled with a detectable reporter moiety. In some embodiments, the detectable reporter moiety comprises a fluorophore. In some embodiments, a particular detectable reporter moiety (e.g., fluorophore) that is attached to the multivalent molecule can correspond to the base (e.g., dATP, dGTP, dCTP, dTTP or dUTP) of the nucleotide unit to permit detection and identification of the nucleotide base.
[0765] In some embodiments, at least one nucleotide arm of a multivalent molecule has a nucleotide unit that is attached to a detectable reporter moiety. In some embodiments, the detectable reporter moiety is attached to the nucleotide base. In some embodiments, the detectable reporter moiety comprises a fluorophore. In some embodiments, a particular detectable reporter moiety (e.g., fluorophore) that is attached to the multivalent molecule can correspond to the base (e.g., dATP, dGTP, dCTP, dTTP or dUTP) of the nucleotide unit to permit detection and identification of the nucleotide base.
[0766] In some embodiments, the core of a multivalent molecule comprises an avidin-like or streptavidin-like moiety and the core attachment moiety comprises biotin. In some embodiments, the core comprises a streptavidin-type or avidin-type moiety which includes an avidin protein, as well as any derivatives, analogs and other non-native forms of avidin that can bind to at least one biotin moiety. Other forms of avidin moieties include native and recombinant avidin and streptavidin as well as derivatized molecules, e.g., non-glycosylated avidin and truncated streptavidins. For example, avidin moiety includes de-glycosylated forms of avidin, bacterial streptavidin produced by Streptomyces (e.g., Streptomyces avidinii), as well as derivatized forms, for example, N-acyl avidins, e.g., N-acetyl, N-phthalyl and N-succinyl avidin, and the commercially available products EXTRAVIDIN, CAPTAVIDIN, NEUTRAVIDIN and NEUTRALITE AVIDIN.
[0767] In some embodiments, any of the methods for sequencing nucleic acid molecules described herein can include forming a binding complex, where the binding complex comprises (i) a polymerase, a nucleic acid concatemer molecule duplexed with a primer, and a nucleotide, or the binding complex comprises (ii) a polymerase, a nucleic acid concatemer molecule duplexed with a primer, and a nucleotide unit of a multivalent molecule. In some embodiments, the binding complex has a persistence time of greater than about 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9 or 1 second.
The binding complex has a persistence time of greater than about 0.1-0.25 seconds, or about 0.25-0.5 seconds, or about 0.5-0.75 seconds, or about 0.75-1 second, or about 1-2 seconds, or about 2-3 seconds, or about 3-4 seconds, or about 4-5 seconds, and/or the method is or may be carried out at a temperature at or above 15 °C, at or above 20 °C, at or above 25 °C, at or above 35 °C, at or above 37 °C, at or above 42 °C, at or above 55 °C, at or above 60 °C, at or above 72 °C, or at or above 80 °C, or within a range defined by any of the foregoing. The binding complex (e.g., ternary complex) remains stable until subjected to a condition that causes dissociation of interactions between any of the polymerase, template molecule, primer and/or the nucleotide unit or the nucleotide. For example, a dissociating condition comprises contacting the binding complex with any one or any combination of a detergent, EDTA and/or water. In some embodiments, the present disclosure provides said method wherein the binding complex is deposited on, attached to, or hybridized to a surface showing a contrast-to-noise ratio in the detecting step of greater than 20. In some embodiments, the present disclosure provides said method wherein the contacting is performed under a condition that stabilizes the binding complex when the nucleotide or nucleotide unit is complementary to a next base of the template nucleic acid, and destabilizes the binding complex when the nucleotide or nucleotide unit is not complementary to the next base of the template nucleic acid.
[0768] In some embodiments, in any of the sequencing methods that employ multivalent molecules, the binding of the plurality of first complexed polymerases with the plurality of multivalent molecules forms at least one avidity complex, the method comprising the steps: (a) binding a first nucleic acid primer, a first sequencing polymerase, and a first multivalent molecule to a first portion of a concatemer template molecule, thereby forming a first binding complex, wherein a first nucleotide unit of the first multivalent molecule binds to the first sequencing polymerase; and (b) binding a second nucleic acid primer, a second sequencing polymerase, and the first multivalent molecule to a second portion of the same concatemer template molecule, thereby forming a second binding complex, wherein a second nucleotide unit of the first multivalent molecule binds to the second sequencing polymerase, and wherein the first and second binding complexes, which include the same multivalent molecule, form an avidity complex. In some embodiments, the first sequencing polymerase comprises any wild type or mutant polymerase described herein. In some embodiments, the second sequencing polymerase comprises any wild type or mutant polymerase described herein. The concatemer template molecule comprises tandem repeat sequences of a sequence of interest and at least one universal sequencing primer binding site. The first and second nucleic acid primers can bind to a sequencing primer binding site along the concatemer template molecule. Exemplary multivalent molecules are shown in FIGS. 16-19.
[0769] In some embodiments, in any of the sequencing methods that employ multivalent molecules, the method includes binding the plurality of first complexed polymerases with the plurality of multivalent molecules to form at least one avidity complex, the method comprising the steps: (a) contacting the plurality of sequencing polymerases and the plurality of nucleic acid primers with different portions of a nucleic acid concatemer template molecule to form at least first and second complexed polymerases on the same concatemer template molecule; (b) contacting a plurality of multivalent molecules to the at least first and second complexed polymerases on the same concatemer template molecule, under conditions suitable to bind a single multivalent molecule from the plurality to the first and second complexed polymerases, wherein at least a first nucleotide unit of the single multivalent molecule is bound to the first complexed polymerase which includes a first primer hybridized to a first portion of the concatemer template molecule, thereby forming a first binding complex (e.g., first ternary complex), and wherein at least a second nucleotide unit of the single multivalent molecule is bound to the second complexed polymerase which includes a second primer hybridized to a second portion of the concatemer template molecule, thereby forming a second binding complex (e.g., second ternary complex), wherein the contacting is conducted under a condition suitable to inhibit polymerase-catalyzed incorporation of the bound first and second nucleotide units in the first and second binding complexes, and wherein the first and second binding complexes, which are bound to the same multivalent molecule, form an avidity complex; (c) detecting the first and second binding complexes on the same concatemer template molecule; and (d) identifying the first nucleotide unit in the first binding complex, thereby determining the sequence of the first portion of the concatemer template molecule, and identifying the second nucleotide unit in the second binding complex, thereby determining the sequence of the second portion of the concatemer template molecule. In some embodiments, the plurality of sequencing polymerases comprise any wild type or mutant sequencing polymerase described herein. The concatemer template molecule comprises tandem repeat sequences of a sequence of interest and at least one universal sequencing primer binding site. The plurality of nucleic acid primers can bind to a sequencing primer binding site along the concatemer template molecule. Exemplary multivalent molecules are shown in FIGS. 16-19.
[0770] FIG. 16 is a schematic of various exemplary configurations of multivalent molecules. Left (Class I): schematics of multivalent molecules having a “starburst” or “helter-skelter” configuration. Center (Class II): a schematic of a multivalent molecule having a dendrimer configuration. Right (Class III): a schematic of multiple multivalent molecules formed by reacting streptavidin with 4-arm or 8-arm PEG-NHS with biotin and dNTPs. Nucleotide units are designated ‘N’, biotin is designated ‘B’, and streptavidin is designated ‘SA’.
[0771] FIG. 17 is a schematic of an exemplary multivalent molecule comprising a generic core attached to a plurality of nucleotide-arms.
[0772] FIG. 18 is a schematic of an exemplary multivalent molecule comprising a dendrimer core attached to a plurality of nucleotide-arms.
[0773] FIG. 19 shows a schematic of an exemplary multivalent molecule comprising a core attached to a plurality of nucleotide-arms, where the nucleotide arms comprise biotin, spacer, linker and a nucleotide unit.
[0774] FIG. 20 is a schematic of an exemplary nucleotide-arm comprising a core attachment moiety, spacer, linker and nucleotide unit.
[0775] FIG. 21 shows the chemical structure of an exemplary spacer (TOP), and the chemical structures of various exemplary linkers, including an 11-atom Linker, 16-atom Linker, 23-atom Linker and an N3 Linker (BOTTOM).
[0776] FIG. 22 shows the chemical structures of various exemplary linkers, including Linkers 1-9.
[0777] FIG. 23A shows the chemical structures of various exemplary linkers joined/attached to nucleotide units.
[0778] FIG. 23B shows the chemical structures of various exemplary linkers joined/attached to nucleotide units.
[0779] FIG. 23C shows the chemical structures of various exemplary linkers joined/attached to nucleotide units.
[0780] FIG. 23D shows the chemical structures of various exemplary linkers joined/attached to nucleotide units.
[0781] FIG. 24 shows the chemical structure of an exemplary biotinylated nucleotide-arm. In this example, the nucleotide unit is connected to the linker via a propargyl amine attachment at the 5 position of a pyrimidine base or the 7 position of a purine base.
[0782] FIG. 25 is a schematic of a guanine tetrad (e.g., G-tetrad).
[0783] FIG. 26 is a schematic of an exemplary intramolecular G-quadruplex structure.
Flow cells
[0784] In any of the methods described herein, the cellular sample can be deposited onto a solid support (e.g., a flow cell). In some embodiments, the cellular sample is deposited onto a flow cell having walls (e.g., top or first wall, and bottom or second wall) and a gap in-between, where the gap can be filled with a fluid, and where the flow cell is positioned in a fluorescence optical imaging system. With a traditional imaging system, the thickness of the cellular sample may require focusing separately on the first and second surfaces of the flow cell. For improved imaging of the sequencing reaction of the concatemers in the cellular sample, the flow cell can be positioned in a high-performance fluorescence imaging system comprising two or more tube lenses designed to provide optimal imaging performance for the first and second surfaces of the flow cell at two or more fluorescence wavelengths. In some embodiments, the high-performance imaging system further comprises a focusing mechanism configured to refocus the optical system between acquiring images of the first and second surfaces of the flow cell. In some embodiments, the high-performance imaging system is configured to image two or more fields-of-view on at least one of the first flow cell surface or the second flow cell surface.
Optical systems
[0785] The imager 116 in FIG. 1 can include one or more optical systems. Further disclosed herein are optical system design guidelines and high-performance fluorescence imaging methods and systems that provide improved optical resolution and image quality for fluorescence imaging-based genomics applications. The disclosed optical imaging system designs provide for larger fields-of-view, increased spatial resolution, improved modulation transfer, contrast-to-noise ratio, and image quality, higher spatial sampling frequency, faster transitions between image capture when repositioning the sample plane to capture a series of images (e.g., of different fields-of-view), and improved imaging system duty cycle, and thus enable higher throughput image acquisition and analysis.
[0786] In some instances, improvements in imaging performance, e.g., for dual-side (flow cell) imaging applications, may be achieved by using an electro-optical phase plate in combination with an objective lens to compensate for the optical aberrations induced by the layer of fluid separating the upper (near) and lower (far) interior surfaces of a flow cell. In some instances, this design approach may also compensate for vibrations introduced by, e.g., a motion-actuated compensator that is moved in or out of the optical path depending on which surface of the flow cell is being imaged.
[0787] In some instances, improvements in imaging performance, e.g., for dual-side (flow cell) imaging applications comprising the use of thick flow cell walls (e.g., wall (or coverslip) thickness > 700 µm) and fluid channels (e.g., fluid channel height or thickness of 50-200 µm), may be achieved even when using commercially available, off-the-shelf objectives by using a tube lens design that corrects for the optical aberrations induced by the thick flow cell walls and/or intervening fluid layer in combination with the objective.
[0788] In some instances, improvements in imaging performance, e.g., for multichannel (e.g., two-color or four-color) imaging applications, may be achieved by using multiple tube lenses, one for each imaging channel, where each tube lens design has been optimized for the specific wavelength range used in that imaging channel.
[0789] Exemplary embodiments disclosed herein may comprise fluorescence imaging systems, said systems comprising: a) at least one light source configured to provide excitation light within one or more specified wavelength ranges; b) an objective lens configured to collect fluorescence arising from within a specified field-of-view of a sample plane upon exposure of the sample plane to the excitation light, wherein a numerical aperture of the objective lens is at least 0.1, at least 0.2, at least 0.3, at least 0.4, at least 0.5, at least 0.6, at least 0.7, at least 0.8, or at least 0.9, or a numerical aperture value falling within a range defined by any two of the foregoing; wherein a working distance of the objective lens is at least 400 µm, at least 500 µm, at least 600 µm, at least 700 µm, at least 800 µm, at least 900 µm, at least 1000 µm, or a working distance falling within a range defined by any two of the foregoing; and wherein the field-of-view has an area of at least 0.1 mm², at least 0.2 mm², at least 0.5 mm², at least 0.7 mm², at least 1 mm², at least 2 mm², at least 3 mm², at least 5 mm², or at least 10 mm², or a field-of-view falling within a range defined by any two of the foregoing; and c) at least one image sensor, wherein the fluorescence collected by the objective lens is imaged onto the image sensor, and wherein a pixel dimension for the image sensor is chosen such that a spatial sampling frequency for the fluorescence imaging system is at least twice an optical resolution of the fluorescence imaging system.
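The pixel-size condition in item c) can be made concrete with a short calculation. The sketch below assumes the Rayleigh criterion (0.61 λ/NA) as the measure of optical resolution and interprets "spatial sampling frequency at least twice the optical resolution" as at least two pixels, referred to the sample plane, per resolution element. The formula choice, the function names, and the example wavelength, numerical aperture, magnification and sensor format are all assumptions for illustration, not values taken from this disclosure.

```python
def rayleigh_resolution_um(wavelength_nm: float, na: float) -> float:
    """Lateral resolution by the Rayleigh criterion (0.61 * lambda / NA), in micrometers."""
    return 0.61 * wavelength_nm / na / 1000.0


def max_sensor_pixel_um(wavelength_nm: float, na: float, magnification: float,
                        oversampling: float = 2.0) -> float:
    """Largest sensor pixel pitch giving `oversampling` pixels per resolution element."""
    pixel_at_sample_um = rayleigh_resolution_um(wavelength_nm, na) / oversampling
    return pixel_at_sample_um * magnification


def fov_area_mm2(n_x: int, n_y: int, pixel_um: float, magnification: float) -> float:
    """Field-of-view area at the sample plane for an n_x-by-n_y pixel sensor."""
    width_mm = n_x * pixel_um / magnification / 1000.0
    height_mm = n_y * pixel_um / magnification / 1000.0
    return width_mm * height_mm


# Hypothetical example: 520 nm emission, NA 0.5, 10x total magnification.
print(round(rayleigh_resolution_um(520, 0.5), 4))   # 0.6344
print(round(max_sensor_pixel_um(520, 0.5, 10), 3))  # 3.172
print(round(fov_area_mm2(4096, 3000, 3.0, 10), 3))  # 1.106
```

Under these assumed numbers, a sensor with 3.0 µm pixels is below the 3.172 µm bound, so it would satisfy the two-times sampling condition while giving a field-of-view of roughly 1.1 mm², inside the ranges recited above. A different resolution criterion (e.g., Abbe) would shift the bound proportionally.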
[0790] In some embodiments, the numerical aperture may be at least 0.75. In some embodiments, the numerical aperture is at least 1.0. In some embodiments, the working distance is at least 850 µm. In some embodiments, the working distance is at least 1,000 µm. In some embodiments, the field-of-view may have an area of at least 2.5 mm². In some embodiments, the field-of-view may have an area of at least 3 mm². In some embodiments, the spatial sampling frequency may be at least 2.5 times the optical resolution of the fluorescence imaging system. In some embodiments, the spatial sampling frequency may be at least 3 times the optical resolution of the fluorescence imaging system. In some embodiments, the system may further comprise an X-Y-Z translation stage such that the system is configured to acquire a series of two or more fluorescence images in an automated fashion, wherein each image of the series is or can be acquired for a different field-of-view. In some embodiments, a position of the sample plane may be simultaneously adjusted in an X direction, a Y direction, and a Z direction to match the position of an objective lens focal plane in between acquiring images for different fields-of-view. In some embodiments, the time required for the simultaneous adjustments in the X direction, Y direction, and Z direction may be less than 0.3 seconds, less than 0.4 seconds, less than 0.5 seconds, less than 0.7 seconds, or less than 1 second, or a time falling within a range defined by any two of the foregoing. In some embodiments, the system further comprises an autofocus mechanism configured to adjust the focal plane position prior to acquiring an image of a different field-of-view if an error signal indicates that a difference in the position of the focal plane and the sample plane in the Z direction is greater than a specified error threshold. In some embodiments, the specified error threshold is 100 nm or greater.
In some embodiments, the specified error threshold is 50 nm or less. In some embodiments, the system comprises three or more image sensors, and the system is configured to image fluorescence in each of three or more wavelength ranges onto a different image sensor. In some embodiments, a difference in the position of a focal plane for each of the three or more image sensors and the sample plane is less than 100 nm. In some embodiments, a difference in the position of a focal plane for each of the three or more image sensors and the sample plane is less than 50 nm. In some embodiments, the total time required to reposition the sample plane, adjust focus if necessary, and acquire an image is less than 0.4 seconds per field-of-view. In some embodiments, the total time required to reposition the sample plane, adjust focus if necessary, and acquire an image is less than 0.3 seconds per field-of-view.
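The per-field-of-view sequence in paragraph [0790] (simultaneous X-Y-Z repositioning, autofocus only when the Z error signal exceeds a specified threshold, then image capture) can be sketched as a simple timing model for checking the 0.4 seconds-per-field-of-view budget. The function names and every timing constant below are hypothetical placeholders chosen only for illustration; they are not measured values from this disclosure.

```python
from dataclasses import dataclass


@dataclass
class FovTiming:
    move_s: float      # simultaneous X-Y-Z repositioning
    focus_s: float     # autofocus time (0.0 when refocusing is skipped)
    exposure_s: float  # image acquisition

    @property
    def total_s(self) -> float:
        return self.move_s + self.focus_s + self.exposure_s


def plan_fov(z_error_nm: float, threshold_nm: float = 100.0, move_s: float = 0.25,
             autofocus_s: float = 0.08, exposure_s: float = 0.05) -> FovTiming:
    """Refocus only if the Z error signal exceeds the specified error threshold."""
    focus_s = autofocus_s if abs(z_error_nm) > threshold_nm else 0.0
    return FovTiming(move_s, focus_s, exposure_s)


def meets_budget(z_errors_nm: list[float], budget_s: float = 0.4, **timing) -> bool:
    """True if every field-of-view is repositioned, focused and imaged within budget."""
    return all(plan_fov(err, **timing).total_s < budget_s for err in z_errors_nm)


# Hypothetical Z error signals (nm) for three successive fields-of-view.
errors = [30.0, 150.0, -220.0]
print(meets_budget(errors))                # True
print(meets_budget(errors, budget_s=0.3))  # False
```

Skipping autofocus when the error is within threshold is what keeps the common case short; with these placeholder numbers, even the refocused fields fit under 0.4 seconds but not under 0.3 seconds.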
[0791] Also disclosed herein are fluorescence imaging systems for dual-side imaging of a flow cell comprising: a) an objective lens configured to collect fluorescence arising from within a specified field-of-view of a sample plane within the flow cell; b) at least one tube lens positioned between the objective lens and at least one image sensor, wherein the at least one tube lens is configured to correct an imaging performance metric for a combination of the objective lens, the at least one tube lens, and the at least one image sensor when imaging an interior surface of the flow cell, and wherein the flow cell has a wall thickness of at least 700 µm and a gap between an upper interior surface and a lower interior surface of at least 50 µm; wherein the imaging performance metric is substantially the same for imaging the upper interior surface or the lower interior surface of the flow cell without moving an optical compensator into or out of an optical path between the flow cell and the at least one image sensor, without moving one or more optical elements of the tube lens along the optical path, and without moving one or more optical elements of the tube lens into or out of the optical path.
[0792] In some embodiments, the objective lens may be a commercially available microscope objective. In some embodiments, the commercially available microscope objective may have a numerical aperture of at least 0.3. In some embodiments, the objective lens may have a working distance of at least 700 µm. In some embodiments, the objective lens may be corrected to compensate for a cover slip thickness (or flow cell wall thickness) of 0.17 mm or of greater or lesser thickness than 0.17 mm. In some embodiments, the optical system may be corrected to compensate for cover slip thickness, flow cell thickness, or distance between desired focal planes. In some embodiments, said correction may be made by inserting a corrective optic, such as a lens or optical assembly, into the light path of the optical system. In some embodiments, said correction may be made without inserting a corrective optic, such as a lens or optical assembly, into the light path of the optical system. In some embodiments, the fluorescence imaging system may further comprise an electro-optical phase plate positioned adjacent to the objective lens and between the objective lens and the tube lens, wherein the electro-optical phase plate may provide correction for optical aberrations caused by a fluid filling the gap between the upper interior surface and the lower interior surface of the flow cell. In some embodiments, the at least one tube lens may be a compound lens comprising three or more optical components. In some embodiments, the at least one tube lens is a compound lens comprising four optical components, which may comprise one or more of a first asymmetric convex-convex lens, a second convex-plano lens, a third asymmetric concave-concave lens, and a fourth asymmetric convex-concave lens, which may be present in the order listed above or in any alternate order.
In some embodiments, the at least one tube lens is configured to correct an imaging performance metric for a combination of the objective lens, the at least one tube lens, and the at least one image sensor when imaging an interior surface of a flow cell having a wall thickness of at least 1 mm. In some embodiments, the at least one tube lens is configured to correct an imaging performance metric for a combination of the objective lens, the at least one tube lens, and the at least one image sensor when imaging an interior surface of a flow cell having a gap of at least 100 µm. In some embodiments, the at least one tube lens is configured to correct an imaging performance metric for a combination of the objective lens, the at least one tube lens, and the at least one image sensor when imaging an interior surface of a flow cell having a gap of at least 200 µm. In some embodiments, the system comprises a single objective lens, two tube lenses, and two image sensors, and each of the two tube lenses is designed to provide optimal imaging performance at a different fluorescence wavelength. In some embodiments, the system comprises a single objective lens, three tube lenses, and three image sensors, and each of the three tube lenses is designed to provide optimal imaging performance at a different fluorescence wavelength. In some embodiments, the system comprises a single objective lens, four tube lenses, and four image sensors, and each of the four tube lenses is designed to provide optimal imaging performance at a different fluorescence wavelength. In some embodiments, the design of the objective lens or the at least one tube lens is configured to optimize the modulation transfer function in the mid to high spatial frequency range.
In some embodiments, the imaging performance metric comprises a measurement of modulation transfer function (MTF) at one or more specified spatial frequencies, defocus, spherical aberration, chromatic aberration, coma, astigmatism, field curvature, image distortion, contrast-to-noise ratio (CNR), or any combination thereof. In some embodiments, the difference in the imaging performance metric between imaging the upper interior surface and imaging the lower interior surface of the flow cell is less than 10%. In some embodiments, the difference in the imaging performance metric between imaging the upper interior surface and imaging the lower interior surface of the flow cell is less than 5%. In some embodiments, the use of the at least one tube lens provides dual-side imaging performance, as measured by the imaging performance metric, that is equivalent to or better than that of a conventional system comprising an objective lens, a motion-actuated compensator, and an image sensor. In some embodiments, the use of the at least one tube lens provides at least a 10% improvement in the imaging performance metric for dual-side imaging compared to that of a conventional system comprising an objective lens, a motion-actuated compensator, and an image sensor.
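The disclosure does not state how the upper/lower difference or the improvement over a compensator-based system is computed. The sketch below assumes the difference is normalized to the mean of the two surface measurements and the improvement to the baseline value. Both conventions, the function names, and the MTF values are illustrative assumptions, not data from this disclosure.

```python
def relative_difference_pct(upper: float, lower: float) -> float:
    """Percent difference between a metric measured on the upper and lower
    interior surfaces, normalized to the mean of the two (assumed convention)."""
    mean = (upper + lower) / 2.0
    if mean == 0:
        raise ValueError("metric values must not average to zero")
    return abs(upper - lower) / abs(mean) * 100.0


def improvement_pct(candidate: float, baseline: float, higher_is_better: bool = True) -> float:
    """Percent improvement of a candidate system over a baseline system.

    Use higher_is_better=True for metrics such as MTF or CNR, and False for
    aberration magnitudes such as RMS spherical aberration or defocus."""
    if baseline == 0:
        raise ValueError("baseline metric must be non-zero")
    delta = candidate - baseline if higher_is_better else baseline - candidate
    return delta / abs(baseline) * 100.0


# Hypothetical MTF values at one spatial frequency (not measured data).
mtf_upper, mtf_lower = 0.62, 0.60
diff = relative_difference_pct(mtf_upper, mtf_lower)
print(f"upper/lower difference: {diff:.2f}% (below 5%: {diff < 5})")

gain = improvement_pct(candidate=0.61, baseline=0.55)
print(f"improvement over compensator-based system: {gain:.1f}% (at least 10%: {gain >= 10})")
```

Normalizing to the mean keeps the upper/lower comparison symmetric, so swapping which surface is called "upper" does not change the result.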
[0793] Disclosed herein are illumination systems for use in imaging-based solid-phase genotyping and sequencing applications, the illumination system comprising: a) a light source; and b) a liquid light-guide configured to collect light emitted by the light source and deliver it to a specified field-of-illumination on a support surface comprising tethered biological macromolecules.
[0794] In some embodiments, the illumination system further comprises a condenser lens. In some embodiments, the specified field-of-illumination has an area of at least 2 mm². In some embodiments, the light delivered to the specified field-of-illumination is of uniform intensity across a specified field-of-view for an imaging system used to acquire images of the support surface. In some embodiments, the specified field-of-view has an area of at least 2 mm². In some embodiments, the light delivered to the specified field-of-illumination is of uniform intensity across the specified field-of-view when the coefficient of variation (CV) of light intensity is less than 10%. In some embodiments, the light delivered to the specified field-of-illumination is of uniform intensity across the specified field-of-view when the coefficient of variation (CV) of light intensity is less than 5%. In some embodiments, the light delivered to the specified field-of-illumination has a speckle contrast value of less than 0.1. In some embodiments, the light delivered to the specified field-of-illumination has a speckle contrast value of less than 0.05.
[0795] Imaging modules and systems: It will be understood by those of skill in the art that the disclosed optical systems, imaging systems, or modules may, in some instances, be stand-alone optical systems designed for imaging a sample or substrate surface. In some instances, they may comprise one or more processors or computers. In some instances, they may comprise one or more software packages that provide instrument control functionality and/or image processing functionality. 
In some instances, in addition to optical components such as light sources (e.g., solid-state lasers, dye lasers, diode lasers, arc lamps, tungsten-halogen lamps, etc.), lenses, prisms, mirrors, dichroic reflectors, optical filters, optical bandpass filters, apertures, and image sensors (e.g., complementary metal oxide semiconductor (CMOS) image sensors and cameras, charge-coupled device (CCD) image sensors and cameras, etc.), they may also include mechanical and/or optomechanical components, such as an X-Y translation stage, an X-Y-Z translation stage, a piezoelectric focusing mechanism, and the like. In some instances, they may function as modules, components, sub-assemblies, or sub-systems of larger systems designed for genomics applications (e.g., genetic testing and/or nucleic acid sequencing applications). For example, in some instances, they may function as modules, components, sub-assemblies, or sub-systems of larger systems that further comprise light-tight and/or other environmental control housings, temperature control modules, fluidics control modules, fluid dispensing robotics, pick-and-place robotics, one or more processors or computers, one or more local and/or cloud-based software packages (e.g., instrument/system control software packages, image processing software packages, data analysis software packages), data storage modules, data communication modules (e.g., Bluetooth, WiFi, intranet, or internet communication hardware and associated software), display modules, or any combination thereof.
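Both illumination-uniformity criteria in paragraph [0794], the intensity CV (less than 10% or 5%) and the speckle contrast (less than 0.1 or 0.05), are ratios of standard deviation to mean intensity. The sketch below assumes a flat-field image of the illuminated field of view is available as a 2-D NumPy array. Computing speckle contrast within small local tiles (the `window` size is a hypothetical choice) is one common convention, not one the disclosure prescribes, and the function names are illustrative.

```python
import numpy as np


def coefficient_of_variation(intensity: np.ndarray) -> float:
    """CV = standard deviation / mean of pixel intensities over the field of view."""
    mean = float(intensity.mean())
    if mean <= 0:
        raise ValueError("mean intensity must be positive")
    return float(intensity.std()) / mean


def local_speckle_contrast(intensity: np.ndarray, window: int = 8) -> float:
    """Mean of (std / mean) computed in non-overlapping window x window tiles.

    Computing contrast locally isolates fine-grained speckle from slow
    illumination roll-off, which the field-wide CV already captures."""
    h, w = intensity.shape
    h_trim, w_trim = h - h % window, w - w % window
    if h_trim == 0 or w_trim == 0:
        raise ValueError("image is smaller than one window")
    tiles = intensity[:h_trim, :w_trim].reshape(h_trim // window, window, w_trim // window, window)
    tiles = tiles.transpose(0, 2, 1, 3).reshape(-1, window * window)
    return float((tiles.std(axis=1) / tiles.mean(axis=1)).mean())


def is_uniform(intensity: np.ndarray, cv_max: float = 0.10,
               speckle_max: float = 0.1, window: int = 8) -> bool:
    """Apply both thresholds; defaults mirror the looser limits in [0794]."""
    return (coefficient_of_variation(intensity) < cv_max
            and local_speckle_contrast(intensity, window) < speckle_max)


# Synthetic flat field with a 5% linear roll-off across the image (illustrative only).
yy, xx = np.mgrid[0:256, 0:256]
flat = 1000.0 * (1.0 - 0.05 * xx / 255)
print(coefficient_of_variation(flat), local_speckle_contrast(flat), is_uniform(flat))
```

For a smooth roll-off like this one, the field-wide CV is dominated by the gradient while the local speckle contrast stays near zero, which is why the two metrics are reported separately.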
Methods for Sequencing using Nucleotide Analogs
[0796] The present disclosure provides methods for sequencing any of the immobilized template molecules described herein, the methods comprising step (a): contacting a sequencing polymerase to (i) a nucleic acid template molecule and (ii) a nucleic acid sequencing primer, wherein the contacting is conducted under a condition suitable to bind the sequencing polymerase to the nucleic acid template molecule which is hybridized to the nucleic acid primer, wherein the nucleic acid template molecule hybridized to the nucleic acid primer forms the nucleic acid duplex. In some embodiments, the sequencing polymerase comprises a recombinant mutant sequencing polymerase that can bind and incorporate nucleotide analogs.
[0797] In some embodiments, in the methods for sequencing template molecules, the sequencing primer comprises a 3’ extendible end or a 3’ non-extendible end. In some embodiments, the plurality of nucleic acid template molecules comprise amplified template molecules (e.g., clonally amplified template molecules). In some embodiments, the plurality of nucleic acid template molecules comprise one copy of a target sequence of interest. In some embodiments, the plurality of nucleic acid molecules comprise two or more tandem copies of a target sequence of interest (e.g., concatemers). In some embodiments, the plurality of nucleic acid template molecules comprise the same target sequence of interest or different target sequences of interest. In some embodiments, the plurality of nucleic acid primers are in solution or are immobilized to a support. In some embodiments, when the plurality of nucleic acid template molecules and/or the plurality of nucleic acid primers are immobilized to a support, the binding with the first sequencing polymerase generates a plurality of immobilized first complexed polymerases. In some embodiments, the plurality of nucleic acid template molecules and/or nucleic acid primers are immobilized to 10² to 10¹⁵ different sites on a support. In some embodiments, the binding of the plurality of template molecules and nucleic acid primers with the plurality of first sequencing polymerases generates a plurality of first complexed polymerases immobilized to 10² to 10¹⁵ different sites on the support. In some embodiments, the plurality of immobilized first complexed polymerases on the support are immobilized to pre-determined or to random sites on the support. 
In some embodiments, the plurality of immobilized first complexed polymerases are in fluid communication with each other to permit flowing a solution of reagents (e.g., enzymes including sequencing polymerases, multivalent molecules, nucleotides, and/or divalent cations) onto the support so that the plurality of immobilized complexed polymerases on the support are reacted with the solution of reagents in a massively parallel manner.
[0798] In some embodiments, the methods for sequencing further comprise step (b): contacting the sequencing polymerase with a plurality of nucleotides under a condition suitable for binding at least one nucleotide to the sequencing polymerase which is bound to the nucleic acid duplex and suitable for polymerase-catalyzed nucleotide incorporation which extends the sequencing primer by one nucleotide. In some embodiments, the sequencing polymerase is contacted with the plurality of nucleotides in the presence of at least one catalytic cation comprising magnesium and/or manganese. In some embodiments, the plurality of nucleotides comprises at least one nucleotide analog having a chain terminating moiety at the sugar 2’ or 3’ position. In some embodiments, the chain terminating moiety is removable from the sugar 2’ or 3’ position to convert the chain terminating moiety to an OH or H group. In some embodiments, the plurality of nucleotides comprises at least one nucleotide that lacks a chain terminating moiety. In some embodiments, at least one nucleotide is labeled with a detectable reporter moiety (e.g., fluorophore) that emits a detectable signal. The detectable reporter moiety comprises a fluorophore. In some embodiments, the fluorophore is attached to the nucleo-base. In some embodiments, the fluorophore is attached to the nucleo-base with a linker which is cleavable/removable from the base. In some embodiments, at least one of the nucleotides in the plurality is not labeled with a detectable reporter moiety. In some embodiments, a particular detectable reporter moiety (e.g., fluorophore) that is attached to the nucleotide can correspond to the nucleotide base (e.g., dATP, dGTP, dCTP, dTTP or dUTP) to permit detection and identification of the nucleo-base. When the incorporated chain terminating nucleotide is detectably labeled, step (b) further comprises detecting the emitted signal from the incorporated chain terminating nucleotide. 
In some embodiments, step (b) further comprises identifying the nucleo-base of the incorporated chain terminating nucleotide.
[0799] In some embodiments, the methods for sequencing further comprise step (c): removing the chain terminating moiety from the incorporated chain terminating nucleotide to generate an extendible 3’-OH group. In some embodiments, step (c) further comprises removing the detectable label from the incorporated chain terminating nucleotide. In some embodiments, the sequencing polymerase remains bound to the template molecule which is hybridized to the sequencing primer which is extended by one nucleo-base.
[0800] In some embodiments, the methods for sequencing further comprise step (d): repeating steps (b) and (c) at least once.
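Steps (b) through (d) form a cyclic reversible-terminator loop. The toy model below is a sketch under idealized assumptions: every incorporation and cleavage goes to completion, there is no phasing, and each base carries a distinct reporter. It shows why the chain-terminating moiety limits each cycle to a single base and why step (c) must run before the next cycle. The class, function, and dye names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional

COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}
# Hypothetical one-reporter-per-base assignment; the disclosure only states that a
# reporter can correspond to the nucleotide base.
DYE_FOR_BASE = {"A": "dye_1", "C": "dye_2", "G": "dye_3", "T": "dye_4"}
BASE_FOR_DYE = {dye: base for base, dye in DYE_FOR_BASE.items()}


@dataclass
class Duplex:
    """Template/primer duplex; `template` lists template bases in the order the primer reads them."""
    template: str
    extension: List[str] = field(default_factory=list)
    blocked: bool = False          # True while a 3' chain-terminating moiety is present
    label: Optional[str] = None    # reporter still attached to the last incorporated base

    def incorporate(self) -> None:
        """Step (b): add one chain-terminating, labeled nucleotide complementary to the template."""
        if self.blocked:
            raise RuntimeError("3' end is blocked; the primer cannot be extended")
        base = COMPLEMENT[self.template[len(self.extension)]]
        self.extension.append(base)
        self.blocked = True
        self.label = DYE_FOR_BASE[base]

    def detect(self) -> str:
        """Step (b), continued: read the reporter and identify the incorporated base."""
        if self.label is None:
            raise RuntimeError("no label to detect")
        return BASE_FOR_DYE[self.label]

    def cleave(self) -> None:
        """Step (c): remove the terminator (restoring a 3'-OH) and the label."""
        self.blocked = False
        self.label = None


def run_cycles(duplex: Duplex, n_cycles: int) -> str:
    """Step (d): repeat steps (b) and (c), returning one base call per cycle."""
    remaining = len(duplex.template) - len(duplex.extension)
    calls = []
    for _ in range(min(n_cycles, remaining)):
        duplex.incorporate()
        calls.append(duplex.detect())
        duplex.cleave()
    return "".join(calls)


print(run_cycles(Duplex("TACG"), 4))  # prints "ATGC"
```

Calling `incorporate()` twice without `cleave()` raises an error, mirroring the one-base-per-cycle behavior that the 3’ blocking group enforces.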
Methods for Sequencing using Phosphate-Chain Labeled Nucleotides
[0801] The present disclosure provides methods for sequencing using immobilized sequencing polymerases which bind non-immobilized template molecules, wherein the sequencing reactions are conducted with phosphate-chain labeled nucleotides. In some embodiments, the sequencing methods comprise step (a): providing a support having a plurality of sequencing polymerases immobilized thereon. In some embodiments, the sequencing polymerase comprises a processive DNA polymerase. In some embodiments, the sequencing polymerase comprises a wild type or mutant DNA polymerase, including for example a Phi29 DNA polymerase. In some embodiments, the support comprises a plurality of separate compartments and a sequencing polymerase is immobilized to the bottom of a compartment. In some embodiments, the separate compartments comprise a silica bottom through which light can penetrate. In some embodiments, the separate compartments comprise a silica bottom configured with a nanophotonic confinement structure comprising a hole in a metal cladding film (e.g., aluminum cladding film). In some embodiments, the hole in the metal cladding has a small aperture, for example, approximately 70 nm. In some embodiments, the height of the nanophotonic confinement structure is approximately 100 nm. In some embodiments, the nanophotonic confinement structure comprises a zero mode waveguide (ZMW). In some embodiments, the nanophotonic confinement structure contains a liquid.
[0802] In some embodiments, the sequencing method further comprises step (b): contacting the plurality of immobilized sequencing polymerases with a plurality of single stranded circular nucleic acid template molecules and a plurality of oligonucleotide sequencing primers, under a condition suitable for individual immobilized sequencing polymerases to bind a single stranded circular template molecule, and suitable for individual sequencing primers to hybridize to individual single stranded circular template molecules, thereby generating a plurality of polymerase/template/primer complexes. In some embodiments, the individual sequencing primers hybridize to a universal sequencing primer binding site on the single stranded circular template molecule.
[0803] In some embodiments, the sequencing method further comprises step (c): contacting the plurality of polymerase/template/primer complexes with a plurality of phosphate chain labeled nucleotides, each comprising an aromatic base, a five-carbon sugar (e.g., ribose or deoxyribose), and a phosphate chain comprising 3-20 phosphate groups, where the terminal phosphate group is linked to a detectable reporter moiety (e.g., a fluorophore). The first, second, and third phosphate groups can be referred to as the alpha, beta, and gamma phosphate groups. In some embodiments, a particular detectable reporter moiety which is attached to the terminal phosphate group corresponds to the nucleotide base (e.g., dATP, dGTP, dCTP, dTTP or dUTP) to permit detection and identification of the nucleo-base. In some embodiments, the plurality of polymerase/template/primer complexes are contacted with the plurality of phosphate chain labeled nucleotides under a condition suitable for polymerase-catalyzed nucleotide incorporation. In some embodiments, the sequencing polymerases are capable of binding a complementary phosphate chain labeled nucleotide and incorporating the complementary nucleotide opposite a nucleotide in a template molecule. In some embodiments, the polymerase-catalyzed nucleotide incorporation reaction cleaves between the alpha and beta phosphate groups, thereby releasing a multi-phosphate chain linked to a fluorophore.
[0804] In some embodiments, the sequencing method further comprises step (d): detecting the fluorescent signal emitted by the phosphate chain labeled nucleotide that is bound by the sequencing polymerase, and incorporated into the terminal end of the sequencing primer. In some embodiments, step (d) further comprises identifying the phosphate chain labeled nucleotide that is bound by the sequencing polymerase, and incorporated into the terminal end of the sequencing primer.
[0805] In some embodiments, the sequencing method further comprises step (e): repeating steps (c) and (d) at least once. In some embodiments, sequencing methods that employ phosphate chain labeled nucleotides can be conducted according to the methods described in U.S. Patent Nos. 7,170,050; 7,302,146; and/or 7,405,281.
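In this scheme the reporter sits on the terminal phosphate and is released with the multi-phosphate chain upon incorporation. The signal from a single confinement structure therefore appears as transient pulses while the labeled nucleotide is held by the polymerase, rather than as a label that persists on the extended strand. The following is a minimal pulse-calling sketch. It assumes a four-channel intensity trace per frame, one channel per base, and a fixed intensity threshold; the channel-to-base mapping, threshold, and `min_frames` value are all hypothetical parameters.

```python
from typing import List, Optional, Sequence

CHANNEL_TO_BASE = "ACGT"  # hypothetical: channel i reports base CHANNEL_TO_BASE[i]


def call_pulses(trace: Sequence[Sequence[float]], threshold: float, min_frames: int = 2) -> str:
    """Call one base per run of >= min_frames consecutive bright frames in the same channel."""
    calls: List[str] = []
    run_base: Optional[str] = None
    run_len = 0
    for frame in trace:
        peak = max(range(len(frame)), key=lambda i: frame[i])
        base = CHANNEL_TO_BASE[peak] if frame[peak] >= threshold else None
        if base is not None and base == run_base:
            run_len += 1
            continue
        # The current run ended: keep it only if it lasted long enough to be a binding event.
        if run_base is not None and run_len >= min_frames:
            calls.append(run_base)
        run_base, run_len = base, (1 if base is not None else 0)
    if run_base is not None and run_len >= min_frames:
        calls.append(run_base)
    return "".join(calls)


dark = [1.0, 1.0, 1.0, 1.0]
trace = [dark, [9, 1, 1, 1], [10, 1, 1, 1], [9, 1, 1, 1], dark,
         [1, 1, 8, 1], [1, 1, 9, 1], dark, [1, 7, 1, 1], dark]
print(call_pulses(trace, threshold=5.0))  # prints "AG"; the one-frame C blip is rejected
```

The minimum-duration filter rejects brief, diffusion-like flashes. This simple model would merge two identical bases incorporated back to back with no dark frame between them, so practical callers typically use richer pulse models.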
Supports and Coatings
[0806] In any of the methods described herein, the solid support comprises a flow cell having a coating that promotes cell adhesion. In some embodiments, the flow cell comprises a support which can be a planar or non-planar support. The support can be solid or semi-solid. In some embodiments, the support can be porous, semi-porous or non-porous. The support can be made of any material such as glass, plastic or a polymer material. In some embodiments, the surface of the support can be coated with one or more compounds to produce a passivated layer on the support (FIG. 15). In some embodiments, the passivated layer forms a porous or semi-porous layer. In some embodiments, the support is coated with a lysine compound, poly-lysine compound, arginine compound or an amino-terminated compound. The support can be coated with an unbranched compound, a branched compound, or a mixture of unbranched and branched compounds. In some embodiments, the support is coated with surface primers for capturing nucleic acids from the cellular sample. Alternatively, the support lacks surface primers.
[0808] FIG. 15 is a schematic of an exemplary low binding support comprising a glass substrate and alternating layers of hydrophilic coatings which are covalently or non-covalently adhered to the glass, and which further comprises chemically-reactive functional groups that serve as attachment sites for oligonucleotide primers (e.g., capture oligonucleotides). In an alternative embodiment, the support can be made of any material such as glass, plastic or a polymer material.
[0809] The support can comprise one or more substrates. The support can include a glass or plastic substrate. The support can include a transparent top substrate that is closest to the objective lens of the optical system. The support can include one or more microfluidic channels and the concatemer molecules and the cellular sample are immobilized to a surface of the microfluidic channels. In some embodiments, the support is comprised in a flow cell device.
[0810] The low non-specific binding coating comprises one layer or multiple layers (FIG. 15). In some embodiments, the plurality of surface primers are immobilized to the low non-specific binding coating. In some embodiments, at least one surface primer is embedded within the low non-specific binding coating. The low non-specific binding coating enables improved nucleic acid hybridization and amplification performance. In general, the supports comprise a substrate (or support structure), one or more layers of a covalently or non-covalently attached low-binding, chemical modification layers, e.g., silane layers, polymer films, and one or more covalently or non-covalently attached surface primers that can be used for tethering single-stranded nucleic acid library molecules to the support. In some embodiments, the formulation of the coating, e.g., the chemical composition of one or more layers, the coupling chemistry used to cross-link the one or more layers to the support and/or to each other, and the total number of layers, may be varied such that non-specific binding of proteins, nucleic acid molecules, and other hybridization and amplification reaction components to the coating is minimized or reduced relative to a comparable monolayer. The formulation of the coating described herein may be varied such that non-specific hybridization on the coating is minimized or reduced relative to a comparable monolayer. The formulation of the coating may be varied such that non-specific amplification on the coating is minimized or reduced relative to a comparable monolayer. The formulation of the coating may be varied such that specific amplification rates and/or yields on the coating are maximized.
Amplification levels suitable for detection are achieved in no more than 2, 3, 4, 5, 6, 7, 8, 9, 10, 15, 20, 25, 30, or more than 30 amplification cycles in some cases disclosed herein.
[0811] The support structure that comprises the one or more chemically-modified layers, e.g., layers of a low non-specific binding polymer, may be independent or integrated into another structure or assembly. For example, in some embodiments, the support structure may comprise one or more surfaces within an integrated or assembled microfluidic flow cell. The support structure may comprise one or more surfaces within a microplate format, e.g., the bottom surface of the wells in a microplate. In some embodiments, the support structure comprises the interior surface (such as the lumen surface) of a capillary. In some embodiments, the support structure comprises the interior surface (such as the lumen surface) of a capillary etched into a planar chip.
[0812] The attachment chemistry used to graft a first chemically-modified layer to the surface of the support will generally depend on both the material from which the surface is fabricated and the chemical nature of the layer. In some embodiments, the first layer may be covalently attached to the surface. In some embodiments, the first layer may be non-covalently attached, e.g., adsorbed to the support through non-covalent interactions such as electrostatic interactions, hydrogen bonding, or van der Waals interactions between the support and the molecular components of the first layer. In either case, the support may be treated prior to attachment or deposition of the first layer. Any of a variety of surface preparation techniques known to those of skill in the art may be used to clean or treat the surface. For example, glass or silicon surfaces may be acid-washed using a Piranha solution (a mixture of sulfuric acid (H₂SO₄) and hydrogen peroxide (H₂O₂)), base-treated in KOH and/or NaOH, and/or cleaned using an oxygen plasma treatment method.
[0813] Silane chemistries constitute non-limiting approaches for covalently modifying the silanol groups on glass or silicon surfaces to attach more reactive functional groups (e.g., amines or carboxyl groups), which may then be used in coupling linker molecules (e.g., linear hydrocarbon molecules of various lengths, such as C6, C12, or C18 hydrocarbons, or linear polyethylene glycol (PEG) molecules) or layer molecules (e.g., branched PEG molecules or other polymers) to the surface. Examples of suitable silanes that may be used in creating any of the disclosed low binding coatings include, but are not limited to, (3-aminopropyl)trimethoxysilane (APTMS), (3-aminopropyl)triethoxysilane (APTES), any of a variety of PEG-silanes (e.g., having molecular weights of 1K, 2K, 5K, 10K, 20K, etc.), amino-PEG silane (i.e., comprising a free amino functional group), maleimide-PEG silane, biotin-PEG silane, and the like.
[0814] Any of a variety of molecules known to those of skill in the art including, but not limited to, amino acids, peptides, nucleotides, oligonucleotides, other monomers or polymers, or combinations thereof may be used in creating the one or more chemically-modified layers on the support, where the choice of components used may be varied to alter one or more properties of the layers, e.g., the surface density of functional groups and/or tethered oligonucleotide primers, the hydrophilicity/hydrophobicity of the layers, or the three-dimensional nature (i.e., “thickness”) of the layer. Examples of polymers that may be used to create one or more layers of low non-specific binding material in any of the disclosed coatings include, but are not limited to, polyethylene glycol (PEG) of various molecular weights and branching structures, streptavidin, polyacrylamide, polyester, dextran, poly-lysine, and poly-lysine copolymers, or any combination thereof. Examples of conjugation chemistries that may be used to graft one or more layers of material (e.g., polymer layers) to the surface and/or to cross-link the layers to each other include, but are not limited to, biotin-streptavidin interactions (or variations thereof), His-tag/Ni-NTA conjugation chemistries, methoxy ether conjugation chemistries, carboxylate conjugation chemistries, amine conjugation chemistries, NHS esters, maleimides, thiol, epoxy, azide, hydrazide, alkyne, isocyanate, and silane.
[0815] The low non-specific binding surface coating may be applied uniformly across the support. Alternatively, the surface coating may be patterned, such that the chemical modification layers are confined to one or more discrete regions of the support. For example, the coating may be patterned using photolithographic techniques to create an ordered array or random pattern of chemically-modified regions on the support. Alternately or in combination, the coating may be patterned using, e.g., contact printing and/or ink-jet printing techniques. In some embodiments, an ordered array or random pattern of chemically-modified regions may comprise at least 1, 5, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 200, 300, 400, 500, 600, 700, 800, 900, 1000, 2000, 3000, 4000, 5000, 6000, 7000, 8000, 9000, or 10,000 or more discrete regions.
[0816] In some embodiments, the low nonspecific binding coatings comprise hydrophilic polymers that are non-specifically adsorbed or covalently grafted to the support.
Typically, passivation is performed utilizing poly(ethylene glycol) (PEG, also known as polyethylene oxide (PEO) or polyoxyethylene) or other hydrophilic polymers with different molecular weights and end groups that are linked to a support using, for example, silane chemistry. The end groups distal from the surface can include, but are not limited to, biotin, methoxy ether, carboxylate, amine, NHS ester, maleimide, and bis-silane. In some embodiments, two or more layers of a hydrophilic polymer, e.g., a linear polymer, branched polymer, or multi-branched polymer, may be deposited on the surface. In some embodiments, two or more layers may be covalently coupled to each other or internally cross-linked to improve the stability of the resulting coating. In some embodiments, surface primers with different nucleotide sequences and/or base modifications (or other biomolecules, e.g., enzymes or antibodies) may be tethered to the resulting layer at various surface densities. In some embodiments, for example, both surface functional group density and surface primer concentration may be varied to attain a desired surface primer density range. Additionally, surface primer density can be controlled by diluting the surface primers with other molecules that carry the same functional group. For example, amine-labeled surface primers can be diluted with amine-labeled polyethylene glycol in a reaction with an NHS-ester coated surface to reduce the final primer density. Surface primers with different lengths of linker between the hybridization region and the surface attachment functional group can also be applied to control surface density. Examples of suitable linkers include poly-T and poly-A strands at the 5’ end of the primer (e.g., 0 to 20 bases), PEG linkers (e.g., 3 to 20 monomer units), and carbon chains (e.g., C6, C12, C18, etc.). 
To measure the primer density, fluorescently-labeled primers may be tethered to the surface and a fluorescence reading then compared with that for a dye solution of known concentration.
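Both the dilution approach and the fluorescence measurement in paragraph [0816] reduce to simple arithmetic. The sketch makes two assumptions. First, amine-labeled primers and amine-labeled PEG react with the NHS-ester surface equally and to completion, so primer density scales with the primer mole fraction. Second, the calibration dye solution fills an imaged layer of known depth, uses the same dye and brightness as the tethered primers, and is measured in the linear signal regime. Function names and all numbers are illustrative, not values from this disclosure.

```python
AVOGADRO = 6.02214076e23   # molecules per mole (exact SI value)
UM3_PER_LITER = 1e15       # 1 L = 10^15 cubic micrometers


def diluted_primer_density(reactive_sites_per_um2: float, primer_mole_fraction: float) -> float:
    """Expected primer density when amine-primers are diluted with amine-PEG.

    Assumes both species react with the NHS-ester surface equally and to completion."""
    if not 0.0 <= primer_mole_fraction <= 1.0:
        raise ValueError("mole fraction must be between 0 and 1")
    return reactive_sites_per_um2 * primer_mole_fraction


def solution_areal_density(conc_molar: float, depth_um: float) -> float:
    """Dye molecules per um^2 contributed by a solution layer of the given depth."""
    return conc_molar * AVOGADRO / UM3_PER_LITER * depth_um


def primer_density_from_fluorescence(i_surface: float, i_solution: float, i_background: float,
                                     conc_molar: float, depth_um: float) -> float:
    """Scale the calibration density by the background-subtracted intensity ratio."""
    ref = i_solution - i_background
    if ref <= 0:
        raise ValueError("calibration signal must exceed background")
    return (i_surface - i_background) / ref * solution_areal_density(conc_molar, depth_um)


# Illustrative numbers only: 1 uM dye imaged through a 10 um deep layer.
print(solution_areal_density(1e-6, 10.0))  # about 6022 molecules/um^2
print(primer_density_from_fluorescence(3100, 1100, 100, 1e-6, 10.0))
print(diluted_primer_density(10000.0, 0.25))
```

Expressing the calibration solution as an equivalent areal density lets the surface and solution signals be compared directly as a ratio, so detector gain cancels as long as both are imaged under identical settings.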
[0817] In some embodiments, the low nonspecific binding coatings comprise a functionalized polymer coating layer covalently bound to at least a portion of the support via a chemical group on the support, a primer grafted to the functionalized polymer coating, and a water-soluble protective coating on the primer and the functionalized polymer coating. In some embodiments, the functionalized polymer coating comprises poly(N-(5-azidoacetamidylpentyl)acrylamide-co-acrylamide) (PAZAM).
[0818] In order to scale primer surface density and add additional dimensionality to hydrophilic or amphoteric coatings, supports comprising multi-layer coatings of PEG and other hydrophilic polymers have been developed. By using hydrophilic and amphoteric surface layering approaches that include, but are not limited to, the polymer/co-polymer materials described below, it is possible to increase primer loading density on the support significantly. Traditional PEG coating approaches use monolayer primer deposition, which has generally been reported for single-molecule applications but does not yield high copy numbers for nucleic acid amplification applications. As described herein, “layering” can be accomplished using traditional crosslinking approaches with any compatible polymer or monomer subunits such that a surface comprising two or more highly crosslinked layers can be built sequentially. Examples of suitable polymers include, but are not limited to, streptavidin, polyacrylamide, polyester, dextran, polylysine, and copolymers of poly-lysine and PEG. In some embodiments, the different layers may be attached to each other through any of a variety of conjugation reactions including, but not limited to, biotin-streptavidin binding, azide-alkyne click reaction, amine-NHS ester reaction, thiol-maleimide reaction, and ionic interactions between a positively charged polymer and a negatively charged polymer. In some embodiments, high primer density materials may be constructed in solution and subsequently layered onto the surface in multiple steps.
[0819] Examples of materials from which the support structure may be fabricated include, but are not limited to, glass, fused-silica, silicon, a polymer (e.g., polystyrene (PS), macroporous polystyrene (MPPS), polymethylmethacrylate (PMMA), polycarbonate (PC), polypropylene (PP), polyethylene (PE), high density polyethylene (HDPE), cyclic olefin polymers (COP), cyclic olefin copolymers (COC), polyethylene terephthalate (PET)), or any combination thereof. Various compositions of both glass and plastic support structures are contemplated.
[0820] The support structure may be rendered in any of a variety of geometries and dimensions known to those of skill in the art, and may comprise any of a variety of materials known to those of skill in the art. For example, the support structure may be locally planar (e.g., comprising a microscope slide or the surface of a microscope slide). Globally, the support structure may be cylindrical (e.g., comprising a capillary or the interior surface of a capillary), spherical (e.g., comprising the outer surface of a non-porous bead), or irregular (e.g., comprising the outer surface of an irregularly-shaped, non-porous bead or particle). In some embodiments, the surface of the support structure used for nucleic acid hybridization and amplification may be a solid, non-porous surface. In some embodiments, the surface of the support structure used for nucleic acid hybridization and amplification may be porous, such that the coatings described herein penetrate the porous surface, and nucleic acid hybridization and amplification reactions performed thereon may occur within the pores.
[0821] The support structure that comprises the one or more chemically-modified layers, e.g., layers of a low non-specific binding polymer, may be independent or integrated into another structure or assembly. For example, the support structure may comprise one or more surfaces within an integrated or assembled microfluidic flow cell. The support structure may comprise one or more surfaces within a microplate format, e.g., the bottom surface of the wells in a microplate. In some embodiments, the support structure comprises the interior surface (such as the lumen surface) of a capillary. In some embodiments, the support structure comprises the interior surface (such as the lumen surface) of a capillary etched into a planar chip.
[0822] As noted, the low non-specific binding supports of the present disclosure exhibit reduced non-specific binding of proteins, nucleic acids, and other components of the hybridization and/or amplification formulation used for solid-phase nucleic acid amplification. The degree of non-specific binding exhibited by a given support surface may be assessed either qualitatively or quantitatively. For example, exposure of the surface to fluorescent dyes (e.g., cyanines such as Cy3 or Cy5, fluoresceins, coumarins, rhodamines, or other dyes disclosed herein), fluorescently-labeled nucleotides, fluorescently-labeled oligonucleotides, and/or fluorescently-labeled proteins (e.g., polymerases) under a standardized set of conditions, followed by a specified rinse protocol and fluorescence imaging, may be used as a qualitative tool for comparing non-specific binding on supports comprising different surface formulations. In some embodiments, the same exposure, rinse, and imaging procedure may be used as a quantitative tool for comparing non-specific binding on supports comprising different surface formulations, provided that care has been taken to ensure that the fluorescence imaging is performed under conditions where the fluorescence signal is linearly related (or related in a predictable manner) to the number of fluorophores on the support surface (e.g., under conditions where signal saturation and/or self-quenching of the fluorophore is not an issue) and that suitable calibration standards are used. 
In some embodiments, other techniques known to those of skill in the art, for example, radioisotope labeling and counting methods may be used for quantitative assessment of the degree to which non-specific binding is exhibited by the different support surface formulations of the present disclosure.
[0823] Some surfaces disclosed herein exhibit a ratio of specific to nonspecific binding of a fluorophore such as Cy3 of at least 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 25, 30, 35, 40, 50, 75, 100, or greater than 100, or any intermediate value spanned by the range herein. Some surfaces disclosed herein exhibit a ratio of specific to nonspecific fluorescence of a fluorophore such as Cy3 of at least 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 25, 30, 35, 40, 50, 75, 100, or greater than 100, or any intermediate value spanned by the range herein.
[0824] The degree of non-specific binding exhibited by the disclosed low-binding supports may be assessed using a standardized protocol for contacting the surface with a labeled protein (e.g., bovine serum albumin (BSA), streptavidin, a DNA polymerase, a reverse transcriptase, a helicase, a single-stranded binding protein (SSB), etc., or any combination thereof), a labeled nucleotide, a labeled oligonucleotide, etc., under a standardized set of incubation and rinse conditions, followed by detection of the amount of label remaining on the surface and comparison of the resulting signal to an appropriate calibration standard. In some embodiments, the label may comprise a fluorescent label. In some embodiments, the label may comprise a radioisotope. In some embodiments, the label may comprise any other detectable label known to one of skill in the art. In some embodiments, the degree of non-specific binding exhibited by a given support surface formulation may thus be assessed in terms of the number of non-specifically bound protein molecules (or nucleic acid molecules or other molecules) per unit area. In some embodiments, the low-binding supports of the present disclosure may exhibit non-specific protein binding (or non-specific binding of other specified molecules, e.g., cyanines such as Cy3 or Cy5, fluoresceins, coumarins, rhodamines, or other dyes disclosed herein) of less than 0.001 molecule per µm², less than 0.01 molecule per µm², less than 0.1 molecule per µm², less than 0.25 molecule per µm², less than 0.5 molecule per µm², less than 1 molecule per µm², less than 10 molecules per µm², less than 100 molecules per µm², or less than 1,000 molecules per µm². Those of skill in the art will realize that a given support surface of the present disclosure may exhibit nonspecific binding falling anywhere within this range, for example, less than 86 molecules per µm².
For example, some modified surfaces disclosed herein exhibit nonspecific protein binding of less than 0.5 molecule/µm² following contact with a 1 µM solution of Cy3-labeled streptavidin (GE Amersham) in phosphate buffered saline (PBS) buffer for 15 minutes, followed by 3 rinses with deionized water. Some modified surfaces disclosed herein exhibit nonspecific binding of Cy3 dye molecules of less than 0.25 molecules per µm². In independent nonspecific binding assays, 1 µM labeled Cy3 SA (ThermoFisher), 1 µM Cy5 SA dye (ThermoFisher), 10 µM Aminoallyl-dUTP-ATTO-647N (Jena Biosciences), 10 µM Aminoallyl-dUTP-ATTO-Rho11 (Jena Biosciences), 10 µM 7-Propargylamino-7-deaza-dGTP-Cy5 (Jena Biosciences), and 10 µM 7-Propargylamino-7-deaza-dGTP-Cy3 (Jena Biosciences) were incubated on the low binding coated supports at 37 °C for 15 minutes in a 384-well plate format. Each well was rinsed 2-3× with 50 µl deionized RNase/DNase-free water and 2-3× with 25 mM ACES buffer, pH 7.4. The 384-well plates were imaged on a GE Typhoon instrument using the Cy3, AF555, or Cy5 filter set (according to the dye tested) as specified by the manufacturer, at a PMT gain setting of 800 and a resolution of 50-100 µm. For higher-resolution imaging, images were collected on an Olympus IX83 microscope (e.g., an inverted fluorescence microscope) (Olympus Corp., Center Valley, Pa.) with a total internal reflection fluorescence (TIRF) objective (100x, 1.5 NA, Olympus), a CCD camera (e.g., an Olympus EM-CCD monochrome camera, Olympus XM-10 monochrome camera, or an Olympus DP80 color and monochrome camera), an illumination source (e.g., an Olympus 100W Hg lamp, an Olympus 75W Xe lamp, or an Olympus U-HGLGPS fluorescence light source), and excitation wavelengths of 532 nm or 635 nm.
Dichroic mirrors were purchased from Semrock (IDEX Health & Science, LLC, Rochester, N.Y.), e.g., 405, 488, 532, or 633 nm dichroic reflectors/beamsplitters, and emission filters were chosen as 532 LP or 645 LP (long-pass) filters concordant with the appropriate excitation wavelength. Some modified surfaces disclosed herein exhibit nonspecific binding of dye molecules of less than 0.25 molecules per µm². In some embodiments, the coated support was immersed in a buffer (e.g., 25 mM ACES, pH 7.4) while the image was acquired.
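The conversion from measured label signal to a per-area density described in paragraph [0824] can be sketched as follows. This is a minimal illustration, not the disclosed protocol: it assumes the background-corrected integrated intensity of a region of interest (ROI) is linear in fluorophore count (no saturation or self-quenching, as paragraph [0822] requires) and that a calibration standard supplies the mean intensity of a single fluorophore. The function name and all numbers are hypothetical.

```python
def nonspecific_density(roi_intensity: float,
                        single_fluorophore_intensity: float,
                        roi_area_um2: float) -> float:
    """Estimate non-specifically bound molecules per square micrometer.

    Hypothetical helper. Assumes roi_intensity is already background-corrected
    and scales linearly with the number of bound fluorophores.
    """
    if single_fluorophore_intensity <= 0 or roi_area_um2 <= 0:
        raise ValueError("calibration intensity and ROI area must be positive")
    molecules = roi_intensity / single_fluorophore_intensity  # fluorophore count in ROI
    return molecules / roi_area_um2                            # count per um^2


# Hypothetical readout: 3.0e6 counts over a 10,000 um^2 ROI,
# with a calibration standard giving 2,000 counts per fluorophore.
density = nonspecific_density(3.0e6, 2.0e3, 1.0e4)
print(f"{density:.2f} molecules per um^2")      # 0.15 molecules per um^2
print("below 0.25 per um^2:", density < 0.25)   # True
```

If the linearity assumption fails (for example, near detector saturation), the same arithmetic would underestimate the true density, which is why calibration under matched imaging conditions matters.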
[0825] In some embodiments, the surfaces disclosed herein exhibit a ratio of specific to nonspecific binding of a fluorophore such as Cy3 of at least 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 25, 30, 35, 40, 50, 75, 100, or greater than 100, or any intermediate value spanned by the range herein. In some embodiments, the surfaces disclosed herein exhibit a ratio of specific to nonspecific fluorescence signals for a fluorophore such as Cy3 of at least 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 25, 30, 35, 40, 50, 75, 100, or greater than 100, or any intermediate value spanned by the range herein.
[0826] The low-background surfaces consistent with the disclosure herein may exhibit specific dye attachment (e.g., Cy3 attachment) to non-specific dye adsorption (e.g., Cy3 dye adsorption) ratios of at least 4:1, 5:1, 6:1, 7:1, 8:1, 9:1, 10:1, 15:1, 20:1, 30:1, 40:1, 50:1, or more than 50 specific dye molecules attached per molecule nonspecifically adsorbed. Similarly, when subjected to an excitation energy, low-background surfaces consistent with the disclosure herein to which fluorophores, e.g., Cy3, have been attached may exhibit ratios of specific fluorescence signal (e.g., arising from Cy3-labeled oligonucleotides attached to the surface) to non-specific adsorbed dye fluorescence signals of at least 4:1, 5:1, 6:1, 7:1, 8:1, 9:1, 10:1, 15:1, 20:1, 30:1, 40:1, 50:1, or more than 50:1.
[0827] In some embodiments, the degree of hydrophilicity (or “wettability” with aqueous solutions) of the disclosed support surfaces may be assessed, for example, by measuring the water contact angle: a small droplet of water is placed on the surface and its angle of contact with the surface is measured using, e.g., an optical tensiometer. In some embodiments, a static contact angle may be determined. In some embodiments, an advancing or receding contact angle may be determined. In some embodiments, the water contact angle for the hydrophilic, low-binding support surfaces disclosed herein may range from about 0 degrees to about 30 degrees. In some embodiments, the water contact angle for the hydrophilic, low-binding support surfaces disclosed herein may be no more than 50 degrees, 40 degrees, 30 degrees, 25 degrees, 20 degrees, 18 degrees, 16 degrees, 14 degrees, 12 degrees, 10 degrees, 8 degrees, 6 degrees, 4 degrees, 2 degrees, or 1 degree. In many cases the contact angle is no more than 40 degrees. Those of skill in the art will realize that a given hydrophilic, low-binding support surface of the present disclosure may exhibit a water contact angle having a value of anywhere within this range.
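Optical tensiometers usually fit the full drop profile. As a rough, hedged check on the angles quoted in paragraph [0827], a small sessile drop can be approximated as a spherical cap, for which tan(θ/2) = h/r, where h is the drop height and r is the radius of the wetted base. The sketch assumes the drop is small enough that gravity does not flatten it; the function name and drop dimensions are hypothetical.

```python
import math


def contact_angle_deg(drop_height: float, base_radius: float) -> float:
    """Spherical-cap estimate of a static water contact angle, in degrees.

    Hypothetical helper. drop_height and base_radius must share a unit
    (e.g., mm); only their ratio matters. Valid for small drops only.
    """
    if drop_height <= 0 or base_radius <= 0:
        raise ValueError("height and base radius must be positive")
    # tan(theta / 2) = h / r for a spherical cap
    return math.degrees(2.0 * math.atan(drop_height / base_radius))


# Hypothetical drop: 0.15 mm tall on a base of radius 1.0 mm.
angle = contact_angle_deg(0.15, 1.0)
print(f"{angle:.1f} degrees")                    # 17.1 degrees
print("no more than 40 degrees:", angle <= 40.0)  # True
```

A hemispherical drop (h = r) gives exactly 90 degrees, a convenient sanity check on the formula.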
[0828] In some embodiments, the hydrophilic surfaces disclosed herein facilitate reduced wash times for bioassays, often due to reduced nonspecific binding of biomolecules to the low-binding surfaces. In some embodiments, adequate wash steps may be performed in less than 60, 50, 40, 30, 20, 15, 10, or less than 10 seconds. For example, adequate wash steps may be performed in less than 30 seconds.
[0829] Some low-binding surfaces of the present disclosure exhibit significant improvement in stability or durability to prolonged exposure to solvents and elevated temperatures, or to repeated cycles of solvent exposure or changes in temperature. For example, the stability of the disclosed surfaces may be tested by fluorescently labeling a functional group on the surface, or a tethered biomolecule (e.g., an oligonucleotide primer) on the surface, and monitoring fluorescence signal before, during, and after prolonged exposure to solvents and elevated temperatures, or to repeated cycles of solvent exposure or changes in temperature. In some embodiments, the degree of change in the fluorescence used to assess the quality of the surface may be less than 1%, 2%, 3%, 4%, 5%, 10%, 15%, 20%, or 25% over a time period of 1 minute, 2 minutes, 3 minutes, 4 minutes, 5 minutes, 10 minutes, 20 minutes, 30 minutes, 40 minutes, 50 minutes, 60 minutes, 2 hours, 3 hours, 4 hours, 5 hours, 6 hours, 7 hours, 8 hours, 9 hours, 10 hours, 15 hours, 20 hours, 25 hours, 30 hours, 35 hours, 40 hours, 45 hours, 50 hours, or 100 hours of exposure to solvents and/or elevated temperatures (or any combination of these percentages as measured over these time periods). In some embodiments, the degree of change in the fluorescence used to assess the quality of the surface may be less than 1%, 2%, 3%, 4%, 5%, 10%, 15%, 20%, or 25% over 5 cycles, 10 cycles, 20 cycles, 30 cycles, 40 cycles, 50 cycles, 60 cycles, 70 cycles, 80 cycles, 90 cycles, 100 cycles, 200 cycles, 300 cycles, 400 cycles, 500 cycles, 600 cycles, 700 cycles, 800 cycles, 900 cycles, or 1,000 cycles of repeated exposure to solvent changes and/or changes in temperature (or any combination of these percentages as measured over this range of cycles).
[0830] In some embodiments, the surfaces disclosed herein may exhibit a high ratio of specific signal to nonspecific signal or other background.
For example, when used for nucleic acid amplification, some surfaces may exhibit an amplification signal that is at least 4, 5, 6, 7, 8, 9, 10, 15, 20, 30, 40, 50, 75, 100, or greater than 100 fold greater than a signal of an adjacent unpopulated region of the surface. Similarly, some surfaces exhibit an amplification signal that is at least 4, 5, 6, 7, 8, 9, 10, 15, 20, 30, 40, 50, 75, 100, or greater than 100 fold greater than a signal of an adjacent amplified nucleic acid population region of the surface.
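The stability criterion in paragraph [0829] compares fluorescence before and after solvent or temperature stress against a maximum allowed percentage change. A minimal sketch of that comparison follows, assuming each reading is a background-corrected mean intensity and the first reading is the pre-exposure baseline; the function names and readings are hypothetical.

```python
from typing import Sequence


def percent_change(baseline: float, reading: float) -> float:
    """Absolute change of a reading relative to the baseline, in percent."""
    return abs(reading - baseline) / baseline * 100.0


def passes_stability(readings: Sequence[float], max_change_pct: float) -> bool:
    """True if every post-exposure reading stays within max_change_pct of readings[0]."""
    baseline = readings[0]
    return all(percent_change(baseline, r) <= max_change_pct for r in readings[1:])


# Hypothetical intensities after 0, 10, 20, and 30 temperature cycles.
readings = [1000.0, 990.0, 985.0, 972.0]
print(passes_stability(readings, 5.0))  # True  (largest change is 2.8%)
print(passes_stability(readings, 2.0))  # False
```

Using the absolute change treats signal gains and losses alike, which is one reasonable reading of "degree of change"; a signed comparison would be needed if only signal loss mattered.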
[0831] In some embodiments, fluorescence images of the disclosed low background surfaces when used in nucleic acid hybridization or amplification applications to create polonies of hybridized or clonally-amplified nucleic acid molecules (e.g., that have been directly or indirectly labeled with a fluorophore) exhibit contrast-to-noise ratios (CNRs) of at least 10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 110, 120, 130, 140, 150, 160, 170, 180, 190, 200, 210, 220, 230, 240, 250, or greater than 250.
[0832] One or more types of primer may be attached or tethered to the support surface. In some embodiments, the one or more types of adapters or primers may comprise spacer sequences, adapter sequences for hybridization to adapter-ligated target library nucleic acid sequences, forward amplification primers, reverse amplification primers, sequencing primers, and/or molecular barcoding sequences, or any combination thereof. In some embodiments, 1 primer or adapter sequence may be tethered to at least one layer of the surface. In some embodiments, at least 2, 3, 4, 5, 6, 7, 8, 9, 10, or more than 10 different primer or adapter sequences may be tethered to at least one layer of the surface.
[0833] In some embodiments, the tethered adapter and/or primer sequences may range in length from about 10 nucleotides to about 100 nucleotides. In some embodiments, the tethered adapter and/or primer sequences may be at least 10, at least 20, at least 30, at least 40, at least 50, at least 60, at least 70, at least 80, at least 90, or at least 100 nucleotides in length. In some embodiments, the tethered adapter and/or primer sequences may be at most 100, at most 90, at most 80, at most 70, at most 60, at most 50, at most 40, at most 30, at most 20, or at most 10 nucleotides in length. Any of the lower and upper values described in this paragraph may be combined to form a range included within the present disclosure, for example, in some embodiments the length of the tethered adapter and/or primer sequences may range from about 20 nucleotides to about 80 nucleotides. Those of skill in the art will recognize that the length of the tethered adapter and/or primer sequences may have any value within this range, e.g., about 24 nucleotides.
[0834] In some embodiments, the resultant surface density of primers (e.g., capture primers) on the low binding support surfaces of the present disclosure may range from about 100 primer molecules per µm² to about 100,000 primer molecules per µm². In some embodiments, the resultant surface density of primers on the low binding support surfaces of the present disclosure may range from about 1,000 primer molecules per µm² to about 1,000,000 primer molecules per µm². In some embodiments, the surface density of primers may be at least 1,000, at least 10,000, at least 100,000, or at least 1,000,000 molecules per µm². In some embodiments, the surface density of primers may be at most 1,000,000, at most 100,000, at most 10,000, or at most 1,000 molecules per µm². Any of the lower and upper values described in this paragraph may be combined to form a range included within the present disclosure, for example, in some embodiments the surface density of primers may range from about 10,000 molecules per µm² to about 100,000 molecules per µm². Those of skill in the art will recognize that the surface density of primer molecules may have any value within this range, e.g., about 455,000 molecules per µm². In some embodiments, the surface density of target library nucleic acid sequences initially hybridized to adapter or primer sequences on the support surface may be less than or equal to that indicated for the surface density of tethered primers. In some embodiments, the surface density of clonally-amplified target library nucleic acid sequences hybridized to adapter or primer sequences on the support surface may span the same range as that indicated for the surface density of tethered primers.
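To put the densities in paragraph [0834] on a molecular length scale, a density D (molecules per µm²) can be converted to a typical intermolecular spacing. The sketch below uses two standard idealizations rather than measured geometry: a square grid, with spacing 1/√D, and a random (2D Poisson) arrangement, whose mean nearest-neighbour distance is 1/(2√D). The function names are hypothetical.

```python
import math


def grid_spacing_nm(density_per_um2: float) -> float:
    """Center-to-center spacing if molecules sat on a square grid: 1/sqrt(D)."""
    return 1000.0 / math.sqrt(density_per_um2)  # 1 um = 1000 nm


def poisson_nn_distance_nm(density_per_um2: float) -> float:
    """Mean nearest-neighbour distance for randomly placed molecules: 1/(2*sqrt(D))."""
    return 1000.0 / (2.0 * math.sqrt(density_per_um2))


for d in (1_000, 10_000, 100_000):
    print(f"{d:>7} per um^2: grid {grid_spacing_nm(d):5.1f} nm, "
          f"random {poisson_nn_distance_nm(d):5.1f} nm")
```

For example, 10,000 primers per µm² corresponds to a 10 nm grid spacing, or a 5 nm mean nearest-neighbour distance if the primers are randomly placed.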
[0835] Local densities as listed above do not preclude variation in density across a surface, such that a surface may comprise a region having an oligo density of, for example, 500,000/µm², while also comprising at least a second region having a substantially different local density.
[0836] In some embodiments, the performance of nucleic acid hybridization and/or amplification reactions using the disclosed reaction formulations and low-binding supports may be assessed using fluorescence imaging techniques, where the contrast-to-noise ratio (CNR) of the images provides a key metric in assessing amplification specificity and non-specific binding on the support. CNR is commonly defined as CNR = (Signal - Background)/Noise. The background term is commonly taken to be the signal measured for the interstitial regions surrounding a particular feature (diffraction limited spot, DLS) in a specified region of interest (ROI). While signal-to-noise ratio (SNR) is often considered to be a benchmark of overall signal quality, it can be shown that improved CNR can provide a significant advantage over SNR as a benchmark for signal quality in applications that require rapid image capture (e.g., sequencing applications for which cycle times must be minimized), as shown in the example below. At high CNR, the imaging time required to reach accurate discrimination (and thus accurate base-calling in the case of sequencing applications) can be drastically reduced even with moderate improvements in CNR. Improving the CNR of the imaging data for a given imaging integration time therefore provides a way to detect features, such as clonally-amplified nucleic acid colonies on the support surface, more accurately.
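The CNR definition in paragraph [0836] can be computed from pixel intensities as sketched below. The sketch rests on stated assumptions: Signal is the mean intensity inside a feature, Background is the mean of the surrounding interstitial pixels in the ROI, and Noise is taken as the standard deviation of those interstitial pixels (other noise estimates are also used in practice). The second function illustrates why modest CNR gains can shorten imaging markedly, assuming shot-noise-limited imaging in which CNR grows as the square root of integration time; that scaling is an assumption, not a statement from the disclosure. Function names and pixel values are hypothetical.

```python
from statistics import mean, pstdev
from typing import Sequence


def cnr(feature_pixels: Sequence[float], interstitial_pixels: Sequence[float]) -> float:
    """CNR = (Signal - Background) / Noise for one feature in an ROI."""
    background = mean(interstitial_pixels)
    noise = pstdev(interstitial_pixels)  # spread of the interstitial background
    if noise == 0:
        raise ValueError("interstitial pixels have zero variance")
    return (mean(feature_pixels) - background) / noise


def time_for_target_cnr(target_cnr: float, measured_cnr: float, exposure_s: float) -> float:
    """Exposure needed to reach target_cnr if CNR scales as sqrt(exposure)."""
    return exposure_s * (target_cnr / measured_cnr) ** 2


spot = [220, 230, 210, 240]         # hypothetical polony pixels
interstitial = [98, 102, 98, 102]   # hypothetical surrounding pixels
print(cnr(spot, interstitial))      # 62.5

# Doubling CNR at a fixed exposure cuts the time to a given target CNR by 4x.
print(time_for_target_cnr(20.0, 10.0, 0.1))  # 0.4
print(time_for_target_cnr(20.0, 20.0, 0.1))  # 0.1
```

Under the square-root assumption, the required exposure falls with the square of the CNR improvement, which is one way to read the statement that moderate CNR gains can drastically reduce imaging time.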
[0837] In most ensemble-based sequencing approaches, the background term is typically measured as the signal associated with “interstitial” regions. In addition to the “interstitial” background, B(interstitial), an “intrastitial” background, B(intrastitial), exists within the region occupied by an amplified DNA colony. The combination of these two background signals dictates the achievable CNR, and subsequently directly impacts the optical instrument requirements, architecture costs, reagent costs, run-times, cost/genome, and ultimately the accuracy and data quality for cyclic array-based sequencing applications. The B(interstitial) background signal arises from a variety of sources; a few examples include autofluorescence from consumable flow cells, non-specific adsorption of detection molecules that yields spurious fluorescence signals that may obscure the signal from the ROI, and the presence of non-specific DNA amplification products (e.g., those arising from primer dimers). In typical next generation sequencing (NGS) applications, this background signal in the current field-of-view (FOV) is averaged over time and subtracted. The signal arising from individual DNA colonies (i.e., Signal - B(interstitial) in the FOV) yields a discernable feature that can be classified. In some embodiments, the intrastitial background, B(intrastitial), can contribute a confounding fluorescence signal that is not specific to the target of interest but is present in the same ROI, making it far more difficult to average and subtract.
[0838] Nucleic acid amplification on the low-binding coated supports described herein may decrease the B(interstitial) background signal by reducing non-specific binding, may lead to improvements in specific nucleic acid amplification, and may lead to a decrease in non-specific amplification that can impact the background signal arising from both the interstitial and intrastitial regions. In some embodiments, the disclosed low-binding coated supports, optionally used in combination with the disclosed hybridization and/or amplification reaction formulations, may improve CNR by a factor of 2, 5, 10, 100, 250, 500, or 1000 over that achieved using conventional supports and hybridization, amplification, and/or sequencing protocols. Although described here in the context of using fluorescence imaging as the read-out or detection mode, the same principles apply to the use of the disclosed low-binding coated supports and nucleic acid hybridization and amplification formulations for other detection modes as well, including both optical and non-optical detection modes.
[0839] The headings provided herein are not limitations of the various aspects of the disclosure, which aspects can be understood by reference to the specification as a whole.
[0840] Unless defined otherwise, technical and scientific terms used herein have meanings that are commonly understood by those of ordinary skill in the art. Generally, terminologies pertaining to techniques of molecular biology, nucleic acid chemistry, protein chemistry, genetics, microbiology, transgenic cell production, and hybridization described herein are those well-known and commonly used in the art. Techniques and procedures described herein are generally performed according to conventional methods well known in the art and as described in various general and more specific references that are cited and discussed throughout the instant specification. For example, see Sambrook et al., Molecular Cloning: A Laboratory Manual (Third ed., Cold Spring Harbor Laboratory Press, Cold Spring Harbor, N.Y. 2000). See also Ausubel et al., Current Protocols in Molecular Biology, Greene Publishing Associates (1992). The nomenclatures utilized in connection with, and the laboratory procedures and techniques described herein are those well-known and commonly used in the art.
[0841] Unless otherwise required by context herein, singular terms shall include pluralities and plural terms shall include the singular. Singular forms “a”, “an” and “the”, and singular use of any word, include plural referents unless expressly and unequivocally limited to one referent.
[0842] It is understood that the use of the alternative term (e.g., “or”) is taken to mean either one or both, or any combination thereof, of the alternatives.
[0843] The term “and/or” used herein is to be taken to mean specific disclosure of each of the specified features or components with or without the other. For example, the term “and/or” as used in a phrase such as “A and/or B” herein is intended to include: “A and B”; “A or B”; “A” (A alone); and “B” (B alone). In a similar manner, the term “and/or” as used in a phrase such as “A, B, and/or C” is intended to encompass each of the following aspects: “A, B, and C”; “A, B, or C”; “A or C”; “A or B”; “B or C”; “A and B”; “B and C”; “A and C”; “A” (A alone); “B” (B alone); and “C” (C alone).
[0844] As used herein and in the appended claims, the terms “comprising”, “including”, “having” and “containing”, and their grammatical variants, are intended to be non-limiting, so that one item or multiple items in a list do not exclude other items that can be substituted or added to the listed items. It is understood that wherever aspects are described herein with the language “comprising,” otherwise analogous aspects described in terms of “consisting of” and/or “consisting essentially of” are also provided.
[0845] As used herein, the terms “about,” “approximately,” and “substantially” refer to a value or composition that is within an acceptable error range for the particular value or composition as determined by one of ordinary skill in the art, which will depend in part on how the value or composition is measured or determined, i.e., the limitations of the measurement system. For example, “about,” “approximately,” or “substantially” can mean within one or more than one standard deviation per the practice in the art. Alternatively, “about” or “approximately” can mean a range of up to 10% (i.e., ±10%) or more depending on the limitations of the measurement system. For example, about 5 mg can include any number between 4.5 mg and 5.5 mg. Furthermore, particularly with respect to biological systems or processes, the terms can mean up to an order of magnitude or up to 5-fold of a value. When particular values or compositions are provided in the instant disclosure, unless otherwise stated, the meaning of “about,” “approximately,” “substantially” should be assumed to be within an acceptable error range for that particular value or composition. Also, where ranges and/or subranges of values are provided, the ranges and/or subranges can include the endpoints of the ranges and/or subranges.
[0846] The term “polony” used herein refers to a nucleic acid library molecule that can be clonally amplified in-solution or on-support to generate an amplicon that can serve as a template molecule for sequencing. In some embodiments, a linear library molecule can be circularized to generate a circularized library molecule, and the circularized library molecule can be clonally amplified in-solution or on-support to generate a concatemer. In some embodiments, the concatemer can serve as a nucleic acid template molecule which can be sequenced. The concatemer is sometimes referred to as a polony. In some embodiments, a polony includes nucleotide strands. Although embodiments disclosed herein focus on polonies in 2D or 3D samples immobilized on the flow cell and their flow cell images, such embodiments may also be applicable to clusters (e.g., generated by sequencing by synthesis (SBS)) in 2D or 3D samples immobilized on the flow cell and their corresponding flow cell images. In some embodiments, the polonies or clusters may appear as bright spots of various sizes and shapes in flow cell images.
[0847] The terms “peptide”, “polypeptide” and “protein” and other related terms used herein are used interchangeably and refer to a polymer of amino acids and are not limited to any particular length. Polypeptides may comprise natural and non-natural amino acids. Polypeptides include recombinant or chemically-synthesized forms. Polypeptides also include precursor molecules that have not yet been subjected to post-translational modification such as proteolytic cleavage, cleavage due to ribosomal skipping, hydroxylation, methylation, lipidation, acetylation, SUMOylation, ubiquitination, glycosylation, phosphorylation and/or disulfide bond formation. These terms encompass native and artificial proteins, protein fragments and polypeptide analogs (such as muteins, variants, chimeric proteins and fusion proteins) of a protein sequence as well as post-translationally, or otherwise covalently or non-covalently, modified proteins.
[0848] The term “polymerase” and its variants, as used herein, comprises any enzyme that can catalyze polymerization of nucleotides (including analogs thereof) into a nucleic acid strand. Typically, but not necessarily, such nucleotide polymerization occurs in a template-dependent fashion. Typically, a polymerase comprises one or more active sites at which nucleotide binding and/or catalysis of nucleotide polymerization can occur. In some embodiments, a polymerase includes other enzymatic activities, such as for example, 3' to 5' exonuclease activity or 5' to 3' exonuclease activity. In some embodiments, a polymerase has strand displacing activity. A polymerase can include without limitation naturally occurring polymerases and any subunits and truncations thereof, mutant polymerases, variant polymerases, recombinant, fusion or otherwise engineered polymerases, chemically modified polymerases, synthetic molecules or assemblies, and any analogs, derivatives or fragments thereof that retain the ability to catalyze nucleotide polymerization (e.g., catalytically active fragment). In some embodiments, a polymerase can be isolated from a cell, or generated using recombinant DNA technology or chemical synthesis methods. In some embodiments, a polymerase can be expressed in prokaryote, eukaryote, viral, or phage organisms. In some embodiments, a polymerase can be a post-translationally modified protein or fragment thereof. A polymerase can be derived from a prokaryote, eukaryote, virus, or phage. The term polymerase encompasses DNA-directed DNA polymerases and RNA-directed DNA polymerases.
[0849] As used herein, the term “fidelity” refers to the accuracy of DNA polymerization by template-dependent DNA polymerase. The fidelity of a DNA polymerase is typically measured by the error rate (the frequency of incorporating an inaccurate nucleotide, i.e., a nucleotide that is not complementary to the template nucleotide). The accuracy or fidelity of DNA polymerization is maintained by both the polymerase activity and the 3'-5' exonuclease activity of a DNA polymerase.
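Paragraph [0849] defines fidelity in terms of the error rate, the frequency of incorporating a nucleotide that is not complementary to the template nucleotide. A minimal sketch of that calculation follows, assuming the synthesized strand has already been aligned base-by-base to the template with no insertions or deletions, so that position i of the product pairs with position i of the template. The function name and sequences are hypothetical.

```python
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def error_rate(template: str, product: str) -> float:
    """Fraction of incorporated nucleotides not complementary to the opposite template base."""
    if len(template) != len(product) or not template:
        raise ValueError("sequences must be non-empty and aligned to equal length")
    errors = sum(COMPLEMENT[t] != p for t, p in zip(template.upper(), product.upper()))
    return errors / len(template)


template = "ACGTACGTAC"
product = "TGCATGCTTG"  # position 7 carries T where A is expected
print(error_rate(template, product))  # 0.1
```

Real fidelity assays count errors over far more incorporations than this toy example, so that low error rates can be resolved.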
[0850] As used herein, the term “binding complex” refers to a complex formed by binding together a nucleic acid duplex, a polymerase, and a free nucleotide or a nucleotide unit of a multivalent molecule, where the nucleic acid duplex comprises a nucleic acid template molecule hybridized to a nucleic acid primer. In the binding complex, the free nucleotide or nucleotide unit may or may not be bound to the 3’ end of the nucleic acid primer at a position that is opposite a complementary nucleotide in the nucleic acid template molecule. A “ternary complex” is an example of a binding complex which is formed by binding together a nucleic acid duplex, a polymerase, and a free nucleotide or nucleotide unit of a multivalent molecule, where the free nucleotide or nucleotide unit is bound to the 3’ end of the nucleic acid primer (as part of the nucleic acid duplex) at a position that is opposite a complementary nucleotide in the nucleic acid template molecule.
[0851] The term “persistence time” and related terms refer to the length of time that a binding complex remains stable without dissociation of any of the components, where the components of the binding complex include a nucleic acid template and nucleic acid primer, a polymerase, a nucleotide unit of a multivalent molecule or a free (e.g., unconjugated) nucleotide. The nucleotide unit or the free nucleotide can be complementary or non-complementary to a nucleotide residue in the template molecule. The nucleotide unit or the free nucleotide can bind to the 3’ end of the nucleic acid primer at a position that is opposite a complementary nucleotide residue in the nucleic acid template molecule. The persistence time is indicative of the stability of the binding complex and strength of the binding interactions. Persistence time can be measured by observing the onset and/or duration of a binding complex, such as by observing a signal from a labeled component of the binding complex. For example, a labeled nucleotide or a labeled reagent comprising one or more nucleotides may be present in a binding complex, thus allowing the signal from the label to be detected during the persistence time of the binding complex. One exemplary label is a fluorescent label. The binding complex (e.g., ternary complex) remains stable until subjected to a condition that causes dissociation of interactions between any of the polymerase, template molecule, primer and/or the nucleotide unit or the nucleotide. For example, a dissociating condition comprises contacting the binding complex with any one or any combination of a detergent, EDTA and/or water.
[0852] The terms “nucleic acid”, “polynucleotide” and “oligonucleotide” and other related terms used herein are used interchangeably and refer to polymers of nucleotides and are not limited to any particular length. Nucleic acids include recombinant and chemically-synthesized forms. Nucleic acids include DNA molecules (e.g., cDNA or genomic DNA), RNA molecules (e.g., mRNA), analogs of the DNA or RNA generated using nucleotide analogs (e.g., peptide nucleic acids and non-naturally occurring nucleotide analogs), and chimeric forms containing DNA and RNA. Nucleic acids can be single-stranded or double-stranded. Nucleic acids comprise polymers of nucleotides, where the nucleotides include natural or non-natural bases and/or sugars. Nucleic acids can comprise naturally-occurring internucleoside linkages, for example phosphodiester linkages. Nucleic acids can comprise non-natural internucleoside linkages, including phosphorothioate, phosphorothiolate, or peptide nucleic acid (PNA) linkages. In some embodiments, nucleic acids comprise one type of polynucleotide or a mixture of two or more different types of polynucleotides.
[0853] The term “primer” and related terms used herein refers to an oligonucleotide, either natural or synthetic, that is capable of hybridizing with a DNA and/or RNA polynucleotide template to form a duplex molecule. Primers may have any length, but typically range from 4-50 nucleotides. A typical primer comprises a 5’ end and 3’ end. The 3’ end of the primer can include a 3’ OH moiety which serves as a nucleotide polymerization initiation site in a polymerase-mediated primer extension reaction. Alternatively, the 3’ end of the primer can lack a 3’ OH moiety, or can include a terminal 3’ blocking group that inhibits nucleotide polymerization in a polymerase-mediated reaction. Any one nucleotide, or more than one nucleotide, along the length of the primer can be labeled with a detectable reporter moiety. A primer can be in solution (e.g., a soluble primer) or can be immobilized to a support (e.g., a capture primer).
[0854] The term “template nucleic acid”, “template polynucleotide”, “target nucleic acid” “target polynucleotide”, “template strand” and other variations refer to a nucleic acid strand that serves as the basis nucleic acid molecule for generating a complementary nucleic acid strand. The template nucleic acid can be single-stranded or double-stranded, or the template nucleic acid can have single-stranded or double-stranded portions. The sequence of the template nucleic acid can be partially or wholly complementary to the sequence of the complementary strand. The template nucleic acid can be obtained from a naturally-occurring source, recombinant form, or chemically synthesized to include any type of nucleic acid analog. The template nucleic acid can be linear, circular, or other forms. The template nucleic acids can include an insert region having an insert sequence which is also known as a sequence of interest. The template nucleic acids can also include at least one adaptor sequence.
The template nucleic acid can be a concatemer having two or tandem copies of a sequence of interest and at least one adaptor sequence. The insert region can be isolated in any form, including chromosomal, genomic, organellar (e.g., mitochondrial, chloroplast or ribosomal), recombinant molecules, cloned, amplified, cDNA, RNA such as precursor mRNA or mRNA, oligonucleotides, whole genomic DNA, obtained from fresh frozen paraffin embedded tissue, needle biopsies, cell free circulating DNA, or any type of nucleic acid library. The insert region can be isolated from any source including from organisms such as prokaryotes, eukaryotes (e.g., humans, plants and animals), fungus, viruses cells, tissues, normal or diseased cells or tissues, body fluids including blood, urine, serum, lymph, tumor, saliva, anal and vaginal secretions, amniotic samples, perspiration, semen, environmental samples, culture samples, or synthesized nucleic acid molecules prepared using recombinant molecular biology or chemical synthesis methods. The insert region can be isolated from any organ, including head, neck, brain, breast, ovary, cervix, colon, rectum, endometrium, gallbladder, intestines, bladder, prostate, testicles, liver, lung, kidney, esophagus, pancreas, thyroid, pituitary, thymus, skin, heart, larynx, or other organs. The template nucleic acid can be subjected to nucleic acid analysis, including sequencing and composition analysis.
[0855] When used in reference to nucleic acid molecules, the terms “hybridize”, “hybridizing”, “hybridization” and other related terms refer to hydrogen bonding between two different nucleic acids to form a duplex nucleic acid. Hybridization also includes hydrogen bonding between two different regions of a single nucleic acid molecule to form a self-hybridizing molecule having a duplex region. Hybridization can comprise Watson-Crick or Hoogsteen binding to form a duplex double-stranded nucleic acid, or a double-stranded region within a nucleic acid molecule. The double-stranded nucleic acid, or the two different regions of a single nucleic acid, may be wholly complementary or partially complementary. Complementary nucleic acid strands need not hybridize with each other across their entire length. The complementary base pairing can be the standard A-T or C-G base pairing, or can be other forms of base-pairing interactions. Duplex nucleic acids can include mismatched base-paired nucleotides.
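As an illustration of whole versus partial complementarity, the following minimal Python sketch scores two equal-length DNA strands, each written 5’→3’, by aligning them antiparallel and counting standard A-T and C-G pairs. It is a simplified model assumed for illustration only: it ignores Hoogsteen and other non-standard pairing, gaps, and hybridization thermodynamics, and the function names are hypothetical.

```python
# Standard Watson-Crick partners for DNA (non-standard pairing is ignored here).
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence written 5'->3'."""
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))


def pairing_profile(top: str, bottom: str) -> tuple[int, int]:
    """Align two equal-length strands (both written 5'->3') antiparallel.

    Position i of `top` faces position len-1-i of `bottom`. Returns the number
    of Watson-Crick pairs and the number of mismatched positions.
    """
    top, bottom = top.upper(), bottom.upper()
    if len(top) != len(bottom):
        raise ValueError("this simplified model compares equal-length strands only")
    pairs = sum(COMPLEMENT[a] == b for a, b in zip(top, reversed(bottom)))
    return pairs, len(top) - pairs
```

For example, `pairing_profile("AACG", "CGTT")` returns `(4, 0)` (wholly complementary), while `pairing_profile("AACG", "CGTA")` returns `(3, 1)` (partially complementary, with one mismatched position).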
[0856] The term “nucleotides” and related terms refer to a molecule comprising an aromatic base, a five-carbon sugar (e.g., ribose or deoxyribose), and at least one phosphate group. Canonical and non-canonical nucleotides are consistent with use of the term. The phosphate in some embodiments comprises a monophosphate, diphosphate, or triphosphate, or corresponding phosphate analog. In some embodiments, the nucleotide comprises 1, 2, 3, 4, 5, 6, 7, 8, 9 or 10 phosphate groups. The term “nucleoside” refers to a molecule comprising an aromatic base and a sugar.
[0857] Nucleotides (and nucleosides) typically comprise a heterocyclic base, including substituted or unsubstituted nitrogen-containing parent heteroaromatic rings that are commonly found in nucleic acids, including naturally-occurring, substituted, modified, or engineered variants, or analogs of the same. The base of a nucleotide (or nucleoside) is capable of forming Watson-Crick and/or Hoogsteen hydrogen bonds with an appropriate complementary base. Exemplary bases include, but are not limited to, purines and pyrimidines such as: 2-aminopurine, 2,6-diaminopurine, adenine (A), ethenoadenine, N6-Δ2-isopentenyladenine (6iA), N6-Δ2-isopentenyl-2-methylthioadenine (2ms6iA), N6-methyladenine, guanine (G), isoguanine, N2-dimethylguanine (dmG), 7-methylguanine (7mG), 2-thiopyrimidine, 6-thioguanine (6sG), hypoxanthine and O6-methylguanine; 7-deazapurines such as 7-deazaadenine (7-deaza-A) and 7-deazaguanine (7-deaza-G); pyrimidines such as cytosine (C), 5-propynylcytosine, isocytosine, thymine (T), 4-thiothymine (4sT), 5,6-dihydrothymine, O4-methylthymine, uracil (U), 4-thiouracil (4sU) and 5,6-dihydrouracil (dihydrouracil; D); indoles such as nitroindole and 4-methylindole; pyrroles such as nitropyrrole; nebularine; inosines; hydroxymethylcytosines; 5-methylcytosines; base (Y); as well as methylated, glycosylated, and acylated base moieties; and the like. Additional exemplary bases can be found in Fasman, 1989, in “Practical Handbook of Biochemistry and Molecular Biology”, pp. 385-394, CRC Press, Boca Raton, Fla.
[0858] Nucleotides (and nucleosides) typically comprise a sugar moiety, such as carbocyclic moieties (Ferraro and Gotor 2000 Chem. Rev. 100: 4319-48), acyclic moieties (Martinez, et al., 1999 Nucleic Acids Research 27: 1271-1274; Martinez, et al., 1997 Bioorganic & Medicinal Chemistry Letters vol. 7: 3013-3016), and other sugar moieties (Joeng, et al., 1993 J. Med. Chem. 36: 2627-2638; Kim, et al., 1993 J. Med. Chem. 36: 30-7; Eschenmosser 1999 Science 284:2118-2124; and U.S. Pat. No. 5,558,991). The sugar moiety can comprise: ribosyl; 2'-deoxyribosyl; 3'-deoxyribosyl; 2',3'-dideoxyribosyl; 2',3'-didehydrodideoxyribosyl; 2'-alkoxyribosyl; 2'-azidoribosyl; 2'-aminoribosyl; 2'-fluororibosyl; 2'-mercaptoribosyl; 2'-alkylthioribosyl; 3'-alkoxyribosyl; 3'-azidoribosyl; 3'-aminoribosyl; 3'-fluororibosyl; 3'-mercaptoribosyl; 3'-alkylthioribosyl; carbocyclic; acyclic or other modified sugars.
[0859] In some embodiments, nucleotides comprise a chain of one, two or three phosphorus atoms, where the chain is typically attached to the 5’ carbon of the sugar moiety via an ester or phosphoramide linkage. In some embodiments, the nucleotide is an analog having a phosphorus chain in which the phosphorus atoms are linked together with intervening O, S, NH, methylene or ethylene. In some embodiments, the phosphorus atoms in the chain include substituted side groups including O, S or BH3. In some embodiments, the chain includes phosphate groups substituted with analogs including phosphoramidate, phosphorothioate, phosphorodithioate, and O-methylphosphoramidite groups.
[0860] When used in reference to nucleic acids, the terms “extend”, “extending”, “extension” and other variants refer to incorporation of one or more nucleotides into a nucleic acid molecule. Nucleotide incorporation comprises polymerization of one or more nucleotides onto the terminal 3’ OH end of a nucleic acid strand, resulting in extension of the nucleic acid strand. Nucleotide incorporation can be conducted with natural nucleotides and/or nucleotide analogs. Typically, but not necessarily, nucleotide incorporation occurs in a template-dependent fashion. Any suitable method of extending a nucleic acid molecule may be used, including primer extension catalyzed by a DNA polymerase or RNA polymerase.
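Template-dependent extension can be sketched as string manipulation: starting from a primer annealed to the 3’ end of a template, each added nucleotide is the complement of the next template base read in the 3’→5’ direction. The Python below is a hypothetical, idealized model (perfect annealing at the template’s 3’ terminus, no incorporation errors, no blocking groups) and is not a description of any particular polymerase.

```python
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def extend_primer(template: str, primer: str, n: int) -> str:
    """Extend `primer` at its 3' end by up to `n` nucleotides.

    Both strands are written 5'->3'. The primer is assumed to anneal to the
    3'-terminal bases of the template; extension stops at the template's 5' end.
    """
    template, primer = template.upper(), primer.upper()
    # Read the template 3'->5' so that index i faces primer position i.
    template_3to5 = template[::-1]
    k = len(primer)
    if k > len(template) or any(
        COMPLEMENT[t] != p for t, p in zip(template_3to5[:k], primer)
    ):
        raise ValueError("primer does not anneal to the template's 3' end")
    added = "".join(COMPLEMENT[t] for t in template_3to5[k:k + n])
    return primer + added
```

For example, `extend_primer("TTGACCGA", "TCG", 2)` returns `"TCGGT"`; with `n` large enough the result is `"TCGGTCAA"`, the full reverse complement of the template.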
[0861] The terms “reporter moiety”, “reporter moieties” and related terms refer to a compound that generates, or causes to generate, a detectable signal. A reporter moiety is sometimes called a “label”. Any suitable reporter moiety may be used, including luminescent, photoluminescent, electroluminescent, bioluminescent, chemiluminescent, fluorescent, phosphorescent, chromophore, radioisotope, electrochemical, mass spectrometry, Raman, hapten, affinity tag, atom, or enzyme reporter moieties. A reporter moiety generates a detectable signal resulting from a chemical or physical change (e.g., heat, light, electrical, pH, salt concentration, enzymatic activity, or proximity events). A proximity event includes two reporter moieties approaching each other, associating with each other, or binding each other. One skilled in the art can readily select reporter moieties so that each absorbs excitation radiation and/or emits fluorescence at a wavelength distinguishable from the other reporter moieties, which permits monitoring the presence of different reporter moieties in the same reaction or in different reactions. Two or more different reporter moieties can be selected having spectrally distinct emission profiles, or having minimally overlapping spectral emission profiles. Reporter moieties can be linked (e.g., operably linked) to nucleotides, nucleosides, nucleic acids, enzymes (e.g., polymerases or reverse transcriptases), or supports (e.g., surfaces).
[0862] A reporter moiety (or label) can comprise a fluorescent label or a fluorophore. Exemplary fluorescent moieties which may serve as fluorescent labels or fluorophores include, but are not limited to, fluorescein and fluorescein derivatives such as carboxyfluorescein, tetrachlorofluorescein, hexachlorofluorescein, carboxynaphthofluorescein, fluorescein isothiocyanate, NHS-fluorescein, iodoacetamidofluorescein, fluorescein maleimide, SAMSA-fluorescein, fluorescein thiosemicarbazide, carbohydrazinomethylthioacetyl-amino fluorescein, rhodamine and rhodamine derivatives such as TRITC, TMR, lissamine rhodamine, Texas Red, rhodamine B, rhodamine 6G, rhodamine 10, NHS-rhodamine, TMR-iodoacetamide, lissamine rhodamine B sulfonyl chloride, lissamine rhodamine B sulfonyl hydrazine, Texas Red sulfonyl chloride, Texas Red hydrazide, coumarin and coumarin derivatives such as AMCA, AMCA-NHS, AMCA-sulfo-NHS, AMCA-HPDP, DCIA, AMCE-hydrazide, BODIPY and derivatives such as BODIPY FL C3-SE, BODIPY 530/550 C3, BODIPY 530/550 C3-SE, BODIPY 530/550 C3 hydrazide, BODIPY 493/503 C3 hydrazide, BODIPY FL C3 hydrazide, BODIPY FL IA, BODIPY 530/551 IA, Br-BODIPY 493/503, Cascade Blue and derivatives such as Cascade Blue acetyl azide, Cascade Blue cadaverine, Cascade Blue ethylenediamine, Cascade Blue hydrazide, Lucifer Yellow and derivatives such as Lucifer Yellow iodoacetamide, Lucifer Yellow CH, cyanine and derivatives such as indolium based cyanine dyes, benzo-indolium based cyanine dyes, pyridinium based cyanine dyes, thiazolium based cyanine dyes, quinolinium based cyanine dyes, imidazolium based cyanine dyes, Cy3, Cy5, lanthanide chelates and derivatives such as BCPDA, TBP, TMT, BHHCT, BCOT, Europium chelates, Terbium chelates, Alexa Fluor dyes, DyLight dyes, Atto dyes, LightCycler Red dyes, CAL Fluor dyes, JOE and derivatives thereof, Oregon Green dyes, WellRED dyes, IRD dyes, phycoerythrin and phycobilin dyes, Malachite green, stilbene, DEG dyes, NR dyes, near-infrared dyes and others
known in the art such as those described in Haugland, Molecular Probes Handbook, (Eugene, Oreg.) 6th Edition; Lakowicz, Principles of Fluorescence Spectroscopy, 2nd Ed., Plenum Press New York (1999), or Hermanson, Bioconjugate Techniques, 2nd Edition, or derivatives thereof, or any combination thereof. Cyanine dyes may exist in either sulfonated or non-sulfonated forms, and consist of two indolenine, benzo-indolium, pyridinium, thiazolium, and/or quinolinium groups separated by a polymethine bridge between two nitrogen atoms. Commercially available cyanine fluorophores include, for example, Cy3 (which may comprise 1-[6-(2,5-dioxopyrrolidin-1-yloxy)-6-oxohexyl]-2-(3-{1-[6-(2,5-dioxopyrrolidin-1-yloxy)-6-oxohexyl]-3,3-dimethyl-1,3-dihydro-2H-indol-2-ylidene}prop-1-en-1-yl)-3,3-dimethyl-3H-indolium or 1-[6-(2,5-dioxopyrrolidin-1-yloxy)-6-oxohexyl]-2-(3-{1-[6-(2,5-dioxopyrrolidin-1-yloxy)-6-oxohexyl]-3,3-dimethyl-5-sulfo-1,3-dihydro-2H-indol-2-ylidene}prop-1-en-1-yl)-3,3-dimethyl-3H-indolium-5-sulfonate), Cy5 (which may comprise 1-(6-((2,5-dioxopyrrolidin-1-yl)oxy)-6-oxohexyl)-2-((1E,3E)-5-((E)-1-(6-((2,5-dioxopyrrolidin-1-yl)oxy)-6-oxohexyl)-3,3-dimethyl-5-indolin-2-ylidene)penta-1,3-dien-1-yl)-3,3-dimethyl-3H-indol-1-ium or 1-(6-((2,5-dioxopyrrolidin-1-yl)oxy)-6-oxohexyl)-2-((1E,3E)-5-((E)-1-(6-((2,5-dioxopyrrolidin-1-yl)oxy)-6-oxohexyl)-3,3-dimethyl-5-sulfoindolin-2-ylidene)penta-1,3-dien-1-yl)-3,3-dimethyl-3H-indol-1-ium-5-sulfonate), and Cy7 (which may comprise 1-(5-carboxypentyl)-2-[(1E,3E,5E,7Z)-7-(1-ethyl-1,3-dihydro-2H-indol-2-ylidene)hepta-1,3,5-trien-1-yl]-3H-indolium or 1-(5-carboxypentyl)-2-[(1E,3E,5E,7Z)-7-(1-ethyl-5-sulfo-1,3-dihydro-2H-indol-2-ylidene)hepta-1,3,5-trien-1-yl]-3H-indolium-5-sulfonate), where “Cy” stands for “cyanine”, and the first digit identifies the number of carbon atoms between two indolenine groups.
Cy2, which is an oxazole derivative rather than an indolenine, and the benzo-derivatized Cy3.5, Cy5.5 and Cy7.5 are exceptions to this rule.
[0863] In some embodiments, the reporter moiety can be a FRET pair, such that multiple classifications can be performed under a single excitation and imaging step. As used herein, FRET may comprise excitation-exchange (Förster) transfers or electron-exchange (Dexter) transfers.
[0864] The terms “linked”, “joined”, “attached”, and variants thereof comprise any type of fusion, bond, adherence or association between any combination of compounds or molecules that is of sufficient stability to withstand use in the particular procedure. The procedure can include, but is not limited to: nucleotide transient-binding; nucleotide incorporation; de-blocking; washing; removing; flowing; detecting; imaging and/or identifying. Such linkage can comprise, for example, covalent, ionic, hydrogen, dipole-dipole, hydrophilic, hydrophobic, or affinity bonding, bonds or associations involving van der Waals forces, mechanical bonding, and the like. In some embodiments, such linkage occurs intramolecularly, for example linking together the ends of a single-stranded or double-stranded linear nucleic acid molecule to form a circular molecule. In some embodiments, such linkage can occur between a combination of different molecules, or between a molecule and a non-molecule, including but not limited to: linkage between a nucleic acid molecule and a solid surface; linkage between a protein and a detectable reporter moiety; linkage between a nucleotide and a detectable reporter moiety; and the like. Some examples of linkages can be found, for example, in Hermanson, G., “Bioconjugate Techniques”, Second Edition (2008); and Aslam, M., Dent, A., “Bioconjugation: Protein Coupling Techniques for the Biomedical Sciences”, London: Macmillan (1998).
[0865] The terms “operably linked”, “operably joined” and related terms as used herein refer to juxtaposition of components. The juxtaposed components can be linked together covalently. For example, two nucleic acid components can be enzymatically ligated together, where the linkage that joins the two components comprises a phosphodiester linkage. A first and a second nucleic acid component can be linked together, where the first nucleic acid component can confer a function on the second nucleic acid component. For example, linkage between a primer binding sequence and a sequence of interest forms a nucleic acid library molecule having a portion that can bind to a primer. In another example, a transgene (e.g., a nucleic acid encoding a polypeptide or a nucleic acid sequence of interest) can be ligated to a vector, where the linkage permits expression or functioning of the transgene sequence contained in the vector. In some embodiments, a transgene is operably linked to a host cell regulatory sequence (e.g., a promoter sequence) that affects expression of the transgene. In some embodiments, the vector comprises at least one host cell regulatory sequence, including a promoter sequence, enhancer, transcription and/or translation initiation sequence, transcription and/or translation termination sequence, polypeptide secretion signal sequences, and the like. In some embodiments, the host cell regulatory sequence controls the level, timing and/or location of expression of the transgene.
[0866] The term “adaptor” and related terms refers to oligonucleotides that can be operably linked (appended) to a target polynucleotide, where the adaptor confers a function to the co-joined adaptor-target molecule. Adaptors comprise DNA, RNA, chimeric DNA/RNA, or analogs thereof. Adaptors can include at least one ribonucleoside residue. Adaptors can be single-stranded, double-stranded, or have single-stranded and/or double-stranded portions. Adaptors can be configured to be linear, stem-looped, hairpin, or Y-shaped forms. Adaptors can be any length, including 4-100 nucleotides or longer. Adaptors can have blunt ends, overhang ends, or a combination of both. Overhang ends include 5’ overhang and 3’ overhang ends. The 5’ end of a single-stranded adaptor, or one strand of a double-stranded adaptor, can have a 5’ phosphate group or lack a 5’ phosphate group. Adaptors can include a 5’ tail that does not hybridize to a target polynucleotide (e.g., tailed adaptor), or adaptors can be non-tailed. An adaptor can include a sequence that is complementary to at least a portion of a primer, such as an amplification primer, a sequencing primer, or a capture primer (e.g., soluble or immobilized capture primers). Adaptors can include a random sequence or degenerate sequence. Adaptors can include at least one inosine residue. Adaptors can include at least one phosphorothioate, phosphorothiolate and/or phosphoramidate linkage. Adaptors can include a barcode sequence which can be used to distinguish polynucleotides (e.g., insert sequences) from different sample sources in a multiplex assay. Adaptors can include a unique identification sequence (e.g., unique molecular index, UMI; or a unique molecular tag) that can be used to uniquely identify a nucleic acid molecule to which the adaptor is appended. 
In some embodiments, a unique identification sequence can be used to improve error correction and accuracy, reduce the rate of false-positive variant calls and/or increase sensitivity of variant detection. Adaptors can include at least one restriction enzyme recognition sequence, including recognition sequences for any one, or any combination of two or more, of the restriction enzymes selected from the group consisting of type I, type II, type III, type IV, type IIS and type IIB.
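To illustrate how a barcode distinguishes sample sources in a multiplex assay and how a unique molecular identifier (UMI) collapses duplicate reads, the Python sketch below assumes a hypothetical read layout of barcode, then UMI, then insert, with fixed lengths chosen by the caller. It tolerates up to one barcode mismatch by default, discards unmatched or ambiguous reads, and treats reads with the same UMI and insert sequence as copies of one original molecule. Practical pipelines typically also use alignment position and error-tolerant UMI grouping, which are omitted here; all names are illustrative.

```python
from collections import defaultdict


def hamming(a: str, b: str) -> int:
    """Number of differing positions (length differences count as mismatches)."""
    return sum(x != y for x, y in zip(a, b)) + abs(len(a) - len(b))


def demultiplex_and_collapse(reads, barcodes, bc_len, umi_len, max_mismatch=1):
    """Assign reads to samples by barcode and collapse UMI duplicates.

    reads: iterable of read strings laid out as barcode + UMI + insert
    barcodes: dict mapping sample name -> expected barcode sequence
    Returns a dict mapping sample name -> set of unique (UMI, insert) molecules.
    """
    molecules = defaultdict(set)
    for read in reads:
        barcode = read[:bc_len]
        umi = read[bc_len:bc_len + umi_len]
        insert = read[bc_len + umi_len:]
        hits = [name for name, ref in barcodes.items()
                if hamming(barcode, ref) <= max_mismatch]
        if len(hits) == 1:  # skip reads that match no barcode or several
            molecules[hits[0]].add((umi, insert))
    return dict(molecules)
```

Because tolerating t mismatches is only unambiguous when every pair of barcodes differs at 2t + 1 or more positions, barcode sets used with `max_mismatch=1` would need a minimum pairwise Hamming distance of 3.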
[0867] The terms “universal sequence”, “universal adaptor sequences” and related terms refer to a sequence in a nucleic acid molecule that is common among two or more polynucleotide molecules. For example, adaptors having the same universal sequence can be joined to a plurality of polynucleotides so that the population of co-joined molecules carries the same universal adaptor sequence. Examples of universal adaptor sequences include an amplification primer sequence, a sequencing primer sequence or a capture primer sequence (e.g., soluble or support-immobilized capture primers).
[0868] The present disclosure provides a plurality (e.g., two or more) of nucleic acid templates immobilized to a support. In some embodiments, the immobilized plurality of nucleic acid templates have the same sequence or have different sequences. In some embodiments, individual nucleic acid template molecules in the plurality of nucleic acid templates are immobilized to different sites on the support. In some embodiments, two or more individual nucleic acid template molecules in the plurality of nucleic acid templates are immobilized to the same site on the support. In some embodiments, the support comprises a plurality of sites arranged in an array. The term “array” refers to a support comprising a plurality of sites located at pre-determined locations on the support to form an array of sites. The sites can be discrete and separated by interstitial regions. In some embodiments, the pre-determined sites on the support can be arranged in one dimension in a row or a column, or arranged in two dimensions in rows and columns. In some embodiments, the plurality of pre-determined sites is arranged on the support in an organized fashion. In some embodiments, the plurality of pre-determined sites is arranged in any organized pattern, including rectilinear patterns, hexagonal patterns, grid patterns, patterns having reflective symmetry, patterns having rotational symmetry, or the like. The pitch between different pairs of sites can be the same or can vary. In some embodiments, the support can have nucleic acid template molecules immobilized at a plurality of sites at a surface density of about 10² to 10¹⁵ sites per mm², or more, to form a nucleic acid template array.
In some embodiments, the support comprises at least 10² sites, at least 10³ sites, at least 10⁴ sites, at least 10⁵ sites, at least 10⁶ sites, at least 10⁷ sites, at least 10⁸ sites, at least 10⁹ sites, at least 10¹⁰ sites, at least 10¹¹ sites, at least 10¹² sites, at least 10¹³ sites, at least 10¹⁴ sites, at least 10¹⁵ sites, or more, where the sites are located at pre-determined locations on the support. In some embodiments, a plurality of pre-determined sites on the support (e.g., 10² to 10¹⁵ sites or more) are immobilized with nucleic acid templates to form a nucleic acid template array. In some embodiments, the nucleic acid templates are immobilized at a plurality of pre-determined sites by hybridization to immobilized surface capture primers, or the nucleic acid templates are covalently attached to the surface capture primers. In some embodiments, the nucleic acid templates are immobilized at a plurality of pre-determined sites, for example at 10² to 10¹⁵ sites or more. In some embodiments, the nucleic acid templates that are immobilized at a plurality of sites on the support comprise linear or circular nucleic acid template molecules or a mixture of both linear and circular molecules. In some embodiments, the immobilized nucleic acid templates are clonally-amplified to generate immobilized nucleic acid polonies at the plurality of pre-determined sites. In some embodiments, individual immobilized nucleic acid template molecules comprise one copy of a target sequence of interest, or comprise concatemers having two or more tandem copies of a target sequence of interest.
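The relationship between pitch, pattern and surface density can be worked out directly: a rectilinear (square) grid with pitch p has one site per p² of area, and a hexagonal grid with nearest-neighbor pitch p has one site per (√3/2)p². The Python sketch below converts a pitch in micrometers to sites per mm² and lays out hypothetical hexagonal site coordinates; the function names and units are assumptions for illustration, not parameters of any particular support.

```python
import math


def site_density_per_mm2(pitch_um: float, pattern: str = "hexagonal") -> float:
    """Sites per mm^2 for a regular array with nearest-neighbor pitch `pitch_um`."""
    pitch_mm = pitch_um / 1000.0
    if pattern == "square":
        area_per_site = pitch_mm ** 2
    elif pattern == "hexagonal":
        area_per_site = math.sqrt(3) / 2 * pitch_mm ** 2
    else:
        raise ValueError(f"unsupported pattern: {pattern}")
    return 1.0 / area_per_site


def hexagonal_sites(rows: int, cols: int, pitch_um: float) -> list[tuple[float, float]]:
    """(x, y) coordinates in micrometers; odd rows are offset by half a pitch."""
    row_spacing = pitch_um * math.sqrt(3) / 2
    return [((c + 0.5 * (r % 2)) * pitch_um, r * row_spacing)
            for r in range(rows) for c in range(cols)]
```

For example, a 1 µm pitch gives about 1.0 × 10⁶ sites per mm² on a square grid and about 1.15 × 10⁶ sites per mm² on a hexagonal grid, so at equal pitch the hexagonal layout packs roughly 15% more sites into the same area.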
[0869] In some embodiments, a support comprising a plurality of sites located at random locations on the support is referred to herein as a support having randomly located sites thereon. The locations of the randomly located sites on the support are not pre-determined. The plurality of randomly-located sites is arranged on the support in a disordered and/or unpredictable fashion. In some embodiments, the support comprises at least 10² sites, at least 10³ sites, at least 10⁴ sites, at least 10⁵ sites, at least 10⁶ sites, at least 10⁷ sites, at least 10⁸ sites, at least 10⁹ sites, at least 10¹⁰ sites, at least 10¹¹ sites, at least 10¹² sites, at least 10¹³ sites, at least 10¹⁴ sites, at least 10¹⁵ sites, or more, where the sites are randomly located on the support. In some embodiments, a plurality of randomly located sites on the support (e.g., 10² to 10¹⁵ sites or more) are immobilized with nucleic acid templates to form a support immobilized with nucleic acid templates. In some embodiments, the nucleic acid templates are immobilized at a plurality of randomly located sites by hybridization to immobilized surface capture primers, or the nucleic acid templates are covalently attached to the surface capture primer. In some embodiments, the nucleic acid templates are immobilized at a plurality of randomly located sites, for example at 10² to 10¹⁵ sites or more. In some embodiments, the nucleic acid templates that are immobilized at a plurality of sites on the support comprise linear or circular nucleic acid template molecules or a mixture of both linear and circular molecules. In some embodiments, the immobilized nucleic acid templates are clonally-amplified to generate immobilized nucleic acid polonies at the plurality of randomly located sites.
In some embodiments, individual immobilized nucleic acid template molecules comprise one copy of a target sequence of interest, or comprise concatemers having two or more tandem copies of a target sequence of interest.
[0870] In some embodiments, with respect to nucleic acid template molecules immobilized to pre-determined or random sites on the support, the plurality of immobilized nucleic acid template molecules on the support are in fluid communication with each other to permit flowing a solution of reagents (e.g., enzymes including polymerases, multivalent molecules, nucleotides, divalent cations and/or buffers and the like) onto the support so that the plurality of immobilized nucleic acid template molecules on the support can be reacted with the reagents in a massively parallel manner. In some embodiments, the fluid communication of the plurality of immobilized nucleic acid template molecules can be used to conduct nucleotide binding assays and/or conduct nucleotide polymerization reactions (e.g., primer extension or sequencing) on the plurality of immobilized nucleic acid template molecules, and to conduct detection and imaging for massively parallel sequencing. In some embodiments, the term “immobilized” and related terms refer to nucleic acid molecules or enzymes (e.g., polymerases) that are attached to the support at pre-determined or random locations, where the nucleic acid molecules or enzymes are attached directly to a support through a covalent bond or a non-covalent interaction, or the nucleic acid molecules or enzymes are attached to a coating on the support.
[0871] When used in reference to a low binding surface coating, one or more layers of a multi-layered surface coating may comprise a branched polymer or a linear polymer. Examples of suitable branched polymers include, but are not limited to, branched PEG, branched poly(vinyl alcohol) (branched PVA), branched poly(vinyl pyridine), branched poly(vinyl pyrrolidone) (branched PVP), branched poly(acrylic acid) (branched PAA), branched polyacrylamide, branched poly(N-isopropylacrylamide) (branched PNIPAM), branched poly(methyl methacrylate) (branched PMMA), branched poly(2-hydroxyethyl methacrylate) (branched PHEMA), branched poly(oligo(ethylene glycol) methyl ether methacrylate) (branched POEGMA), branched polyglutamic acid (branched PGA), branched poly-lysine, branched poly-glucoside, and dextran.
[0872] In some embodiments, the branched polymers used to create one or more layers of any of the multi-layered surfaces disclosed herein may comprise at least 4 branches, at least 5 branches, at least 6 branches, at least 7 branches, at least 8 branches, at least 9 branches, at least 10 branches, at least 12 branches, at least 14 branches, at least 16 branches, at least 18 branches, at least 20 branches, at least 22 branches, at least 24 branches, at least 26 branches, at least 28 branches, at least 30 branches, at least 32 branches, at least 34 branches, at least 36 branches, at least 38 branches, or at least 40 branches.
[0873] Linear, branched, or multi-branched polymers used to create one or more layers of any of the multi-layered surfaces disclosed herein may have a molecular weight of at least 500, at least 1,000, at least 2,000, at least 3,000, at least 4,000, at least 5,000, at least 10,000, at least 15,000, at least 20,000, at least 25,000, at least 30,000, at least 35,000, at least 40,000, at least 45,000, or at least 50,000 daltons.
[0874] In some embodiments, e.g., wherein at least one layer of a multi-layered surface comprises a branched polymer, the number of covalent bonds between a branched polymer molecule of the layer being deposited and molecules of the previous layer may range from about one to about 32 covalent linkages per molecule. In some embodiments, the number of covalent bonds between a branched polymer molecule of the new layer and molecules of the previous layer may be at least 1, at least 2, at least 3, at least 4, at least 5, at least 6, at least 7, at least 8, at least 9, at least 10, at least 12, at least 14, at least 16, at least 18, at least 20, at least 22, at least 24, at least 26, at least 28, at least 30, or at least 32 covalent linkages per molecule.
[0875] Any reactive functional groups that remain following the coupling of a material layer to the surface may optionally be blocked by coupling a small, inert molecule using a high yield coupling chemistry. For example, in the case that amine coupling chemistry is used to attach a new material layer to the previous one, any residual amine groups may subsequently be acetylated or deactivated by coupling with a small amino acid such as glycine.
[0876] The number of layers of low non-specific binding material, e.g., a hydrophilic polymer material, deposited on the surface may range from 1 to about 10. In some embodiments, the number of layers is at least 1, at least 2, at least 3, at least 4, at least 5, at least 6, at least 7, at least 8, at least 9, or at least 10. In some embodiments, the number of layers may be at most 10, at most 9, at most 8, at most 7, at most 6, at most 5, at most 4, at most 3, at most 2, or at most 1. Any of the lower and upper values described in this paragraph may be combined to form a range included within the present disclosure; for example, in some embodiments the number of layers may range from about 2 to about 4. In some embodiments, all of the layers may comprise the same material. In some embodiments, each layer may comprise a different material. In some embodiments, the plurality of layers may comprise a plurality of materials. In some embodiments, at least one layer may comprise a branched polymer. In some embodiments, all of the layers may comprise a branched polymer.
[0877] One or more layers of low non-specific binding material may in some cases be deposited on and/or conjugated to the substrate surface using a polar protic solvent, a polar aprotic solvent, a nonpolar solvent, or any combination thereof. In some embodiments, the solvent used for layer deposition and/or coupling may comprise an alcohol (e.g., methanol, ethanol, propanol, etc.), another organic solvent (e.g., acetonitrile, dimethyl sulfoxide (DMSO), dimethyl formamide (DMF), etc.), water, an aqueous buffer solution (e.g., phosphate buffer, phosphate buffered saline, 3-(N-morpholino)propanesulfonic acid (MOPS), etc.), or any combination thereof. In some embodiments, an organic component of the solvent mixture used may comprise at least 1%, 5%, 10%, 15%, 20%, 25%, 30%, 35%, 40%, 45%, 50%, 55%, 60%, 70%, 75%, 80%, 85%, 90%, 95%, 98%, or 99% of the total, with the balance made up of water or an aqueous buffer solution. In some embodiments, an aqueous component of the solvent mixture used may comprise at least 1%, 5%, 10%, 15%, 20%, 25%, 30%, 35%, 40%, 45%, 50%, 55%, 60%, 70%, 75%, 80%, 85%, 90%, 95%, 98%, or 99% of the total, with the balance made up of an organic solvent. The pH of the solvent mixture used may be less than 6, about 6, 6.5, 7, 7.5, 8, 8.5, 9, or greater than pH 9.
[0878] The term “branched polymer” and related terms refer to a polymer having a plurality of functional groups that help conjugate a biologically active molecule such as a nucleotide, where the functional group can be either on a side chain of the polymer or directly attached to a central core or central backbone of the polymer. The branched polymer can have a linear backbone with one or more functional groups coming off the backbone for conjugation. The branched polymer can also be a polymer having one or more side chains, wherein the side chain has a site suitable for conjugation. Examples of the functional group include, but are not limited to, hydroxyl, ester, amine, carbonate, acetal, aldehyde, aldehyde hydrate, alkenyl, acrylate, methacrylate, acrylamide, active sulfone, hydrazide, thiol, alkanoic acid, acid halide, isocyanate, isothiocyanate, maleimide, vinylsulfone, dithiopyridine, vinylpyridine, iodoacetamide, epoxide, glyoxal, dione, mesylate, tosylate, and tresylate.
[0879] As used herein, the term “clonally amplified” and its variants refer to a nucleic acid template molecule that has been subjected to one or more amplification reactions either in-solution or on-support. In the case of in-solution amplified template molecules, the resulting amplicons are distributed onto the support. Prior to amplification, the template molecule comprises a sequence of interest and at least one universal adaptor sequence. In some embodiments, clonal amplification comprises the use of a polymerase chain reaction (PCR), multiple displacement amplification (MDA), transcription-mediated amplification (TMA), nucleic acid sequence-based amplification (NASBA), strand displacement amplification (SDA), real-time SDA, bridge amplification, isothermal bridge amplification, rolling circle amplification (RCA), circle-to-circle amplification, helicase-dependent amplification, recombinase-dependent amplification, single-stranded binding (SSB) protein-dependent amplification, or any combination thereof.
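Rolling circle amplification (RCA), one of the methods listed above, shows how a single circular template yields a concatemer of tandem copies. In the simplified Python model below, a polymerase walks repeatedly around a circular template (written 5’→3’), and the product strand is the complement of the template read in the 3’→5’ direction, so each full turn appends one reverse-complement copy of the circle. Strand displacement kinetics, errors and primer placement are ignored; the function name and the `start` convention are hypothetical.

```python
from typing import Optional

COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def rolling_circle_product(circle: str, length: int, start: Optional[int] = None) -> str:
    """Return `length` nt synthesized on a circular template.

    `circle` is written 5'->3' and closes on itself. `start` is the index of the
    template base opposite the first incorporated nucleotide; by default
    synthesis begins at the last base, so each full turn yields one
    reverse-complement copy of the circle.
    """
    circle = circle.upper()
    n = len(circle)
    if start is None:
        start = n - 1
    # The new strand grows 5'->3' while reading the template 3'->5',
    # i.e., stepping backwards around the circle and wrapping at index 0.
    return "".join(COMPLEMENT[circle[(start - i) % n]] for i in range(length))
```

For example, `rolling_circle_product("AACGT", 10)` returns `"ACGTTACGTT"`: two tandem copies of `"ACGTT"`, the reverse complement of the circular template.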
[0880] As used herein, the term “sequencing” and its variants comprise obtaining sequence information from a nucleic acid strand, typically by determining the identity of at least some nucleotides (including their nucleobase components) within the nucleic acid template molecule. While in some embodiments “sequencing” a given region of a nucleic acid molecule includes identifying/reading each and every nucleotide within the region that is sequenced, in some embodiments “sequencing” comprises methods whereby the identity of only some of the nucleotides in the region is determined, while the identity of other nucleotides remains undetermined or incorrectly determined. Any suitable method of sequencing may be used. In an exemplary embodiment, sequencing can include label-free or ion-based sequencing methods. In some embodiments, sequencing can include labeled or dye-containing nucleotide or fluorescence-based nucleotide sequencing methods. In some embodiments, sequencing can include polony-based sequencing or bridge sequencing methods. In some embodiments, sequencing includes massively parallel sequencing platforms that employ sequence-by-synthesis, sequence-by-hybridization, or sequence-by-binding procedures. Examples of massively parallel sequence-by-synthesis procedures include polony sequencing, pyrosequencing (e.g., from 454 Life Sciences; U.S. Patent Nos. 7,211,390, 7,244,559 and 7,264,929), chain-terminator sequencing (e.g., from Illumina; U.S. Patent No. 7,566,537; Bentley 2006 Current Opinion Genetics and Development 16:545-552; and Bentley, et al., 2008 Nature 456:53-59), ion-sensitive sequencing (e.g., from Ion Torrent), probe-anchor ligation sequencing (e.g., Complete Genomics), DNA nanoball sequencing, and nanopore DNA sequencing.
Examples of single molecule sequencing include Heliscope single molecule sequencing and single molecule real time (SMRT) sequencing from Pacific Biosciences (Levene, et al., 2003 Science 299(5607):682-686; Eid, et al., 2009 Science 323(5910):133-138; U.S. Patent Nos. 7,170,050; 7,302,146; and 7,405,281). An example of sequence-by-hybridization includes SOLiD sequencing (e.g., from Life Technologies; WO 2006/084132). An example of sequence-by-binding includes Omniome sequencing (e.g., U.S. Patent No. 10,246,744).
Numbered embodiments
[0881] A sequencing system comprising: a first reconfigurable logic device comprising a first plurality of data processing engines configured to perform data processing in parallel; first reconfigurable routing channels connecting at least some of the first plurality of data processing engines; a neural network deployed at least partly on the first reconfigurable logic device; a first processor to selectively activate or deactivate different combinations of the first plurality of data processing engines and the first reconfigurable routing channels; wherein the different combinations of the first plurality of data processing engines and the first reconfigurable routing channels are configured to perform one or more operations comprising:
(a) obtaining sensor data from one or more sensors of the sequencing system;
(b) processing the sensor data to generate a first plurality of flow cell images;
(c) predicting a second plurality of flow cell images using the neural network based on the sensor data or the first plurality of flow cell images;
(d) determining polonies from the second plurality of flow cell images; and
(e) performing a corresponding base calling for each of the determined polonies based on the second plurality of flow cell images.
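Steps (c) through (e) of embodiment [0881] can be illustrated with a minimal, pure-Python sketch. It makes several loud assumptions that are not from the source: plain nearest-neighbour up-sampling stands in for the trained neural network, polonies are taken to be local maxima of the summed four-channel signal above a threshold, and each base is called as the brightest channel at the polony location. The names `upsample_nearest`, `find_polonies`, and `run_cycle` are hypothetical.

```python
from typing import Dict, List, Tuple

Image = List[List[float]]


def upsample_nearest(img: Image, factor: int) -> Image:
    """Hypothetical stand-in for the neural network of step (c):
    nearest-neighbour up-sampling by an integer factor."""
    out: Image = []
    for row in img:
        wide = [v for v in row for _ in range(factor)]
        out.extend(list(wide) for _ in range(factor))
    return out


def find_polonies(img: Image, threshold: float) -> List[Tuple[int, int]]:
    """Step (d), sketched as local-maximum detection above a threshold.
    Ties are broken in raster order so a flat-topped peak is reported once."""
    h, w = len(img), len(img[0])
    peaks = []
    for r in range(h):
        for c in range(w):
            v = img[r][c]
            if v < threshold:
                continue
            is_peak = True
            for rr in range(max(0, r - 1), min(h, r + 2)):
                for cc in range(max(0, c - 1), min(w, c + 2)):
                    if (rr, cc) == (r, c):
                        continue
                    n = img[rr][cc]
                    earlier = (rr, cc) < (r, c)  # tuple order == raster order
                    if (earlier and n >= v) or (not earlier and n > v):
                        is_peak = False
            if is_peak:
                peaks.append((r, c))
    return peaks


def run_cycle(raw: Dict[str, Image], factor: int, threshold: float):
    """Steps (c)-(e) for one cycle: predict, detect polonies, call bases."""
    hi = {base: upsample_nearest(img, factor) for base, img in raw.items()}
    first = next(iter(hi.values()))
    h, w = len(first), len(first[0])
    # Detect polonies on the summed signal of all color channels.
    total = [[sum(ch[r][c] for ch in hi.values()) for c in range(w)] for r in range(h)]
    calls = []
    for r, c in find_polonies(total, threshold):
        # Step (e): call the base whose channel is brightest at the polony.
        base = max(hi, key=lambda b: hi[b][r][c])
        calls.append(((r, c), base))
    return calls
```

For example, with 3 x 3 channel images where only the A channel is lit at (0, 0) and only the G channel at (2, 2), a factor of 2 yields calls at (0, 0) and (4, 4) in the up-sampled grid. A real system would replace `upsample_nearest` with the trained network and use a more robust detector.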
[0882] A sequencing system comprising: a first reconfigurable logic device comprising a first plurality of data processing engines configured to perform data processing in parallel; first reconfigurable routing channels connecting at least some of the first plurality of data processing engines; a neural network deployed at least partly on the first reconfigurable logic device; a first processor that selectively activates or deactivates different combinations of the first plurality of data processing engines and the first reconfigurable routing channels to perform one or more operations comprising:
(a) obtaining sensor data directly from one or more sensors of the sequencing system;
(b) processing the sensor data to generate a first plurality of flow cell images;
(c) performing a first convolution in one or more dimensions on the first plurality of flow cell images, thereby generating a first convolution result;
(d) repetitively performing, one or more times, down-sampling operations comprising:
(1) performing a second convolution in one or more dimensions on the first convolution result, thereby generating a second convolution result; and
(2) performing a down-sampling of the second convolution result by a down-sampling factor, thereby generating a first down-sampled result, wherein in each repetition the second convolution comprises a corresponding number of filters, thereby generating a third convolution result after (d);
(e) performing the second convolution in one or more dimensions on the third convolution result, thereby generating a fourth convolution result;
(f) repetitively performing up-sampling operations comprising:
(3) performing an up-sampling of the fourth convolution result by an up-sampling factor, thereby generating a first up-sampled result; and
(4) performing the second convolution in one or more dimensions on the first up-sampled result, thereby generating a fifth convolution result, wherein in each repetition the second convolution comprises a corresponding number of filters, thereby generating a sixth convolution result after (f);
(g) performing the first convolution in one or more dimensions on the sixth convolution result, thereby generating a seventh convolution result;
(h) predicting a second plurality of flow cell images based on the seventh convolution result, wherein each of the second plurality of flow cell images corresponds to a respective flow cell image of the first plurality of flow cell images, the first plurality of flow cell images having a first resolution and the second plurality of flow cell images having a second resolution that is at least 2, 4, 6, 8, 10, 12, 16, or 32 times greater than the first resolution in one or more spatial dimensions;
(i) determining polonies in the second plurality of flow cell images;
(j) performing a corresponding base calling for each of the determined polonies based on the second plurality of flow cell images; and
(k) optionally forwarding the second plurality of flow cell images, the corresponding base callings, or both to the first reconfigurable logic device, the first processor, or one or more hardware processors of the sequencing system.
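The tensor shapes flowing through steps (c) to (h) can be traced with a short bookkeeping sketch. It follows the filter schedule of embodiments [0966] and [0967] (n, 2n, 4n, 8n going down; 8n, 4n, 2n, 2n coming up) but otherwise rests on assumptions not stated in the source: "same" padding for every convolution, down-sampling and up-sampling of the lateral (y, x) axes only, a bottleneck convolution that keeps its channel count, a single output channel, and a final resize by a hypothetical `sr_factor` to reach the second resolution. The function `unet_shapes` is illustrative only.

```python
from typing import List, Tuple

Shape = Tuple[int, int, int, int]  # (channels, z, y, x)


def unet_shapes(in_shape: Shape, n: int, down_factor: int = 2,
                sr_factor: int = 2, out_channels: int = 1) -> List[Tuple[str, Shape]]:
    """Trace (stage, shape) through a U-Net-like encoder/decoder (assumed layout)."""
    _, z, y, x = in_shape
    down_filters = [n, 2 * n, 4 * n, 8 * n]    # per [0966]
    up_filters = [8 * n, 4 * n, 2 * n, 2 * n]  # per [0967], in execution order
    depth = len(down_filters)
    if y % down_factor ** depth or x % down_factor ** depth:
        raise ValueError("y and x must be divisible by down_factor**depth")
    trace: List[Tuple[str, Shape]] = []
    shape: Shape = (n, z, y, x)
    trace.append(("first conv", shape))
    for k in down_filters:
        shape = (k, z, shape[2], shape[3])
        trace.append(("second conv", shape))
        shape = (k, z, shape[2] // down_factor, shape[3] // down_factor)
        trace.append(("down-sample", shape))
    trace.append(("bottleneck conv", shape))  # channel count kept (assumption)
    for k in up_filters:
        shape = (shape[0], z, shape[2] * down_factor, shape[3] * down_factor)
        trace.append(("up-sample", shape))
        shape = (k, z, shape[2], shape[3])
        trace.append(("second conv", shape))
    shape = (out_channels, z, shape[2], shape[3])
    trace.append(("first conv (final)", shape))
    shape = (out_channels, z, shape[2] * sr_factor, shape[3] * sr_factor)
    trace.append(("super-resolution output", shape))
    return trace
```

With an input of shape (1, 8, 64, 64) and n = 4, the bottleneck is (32, 8, 4, 4) and the predicted stack is (1, 8, 128, 128), i.e. twice the first resolution laterally. Tracing shapes this way is a cheap check that the lateral size is divisible by the total down-sampling factor before deploying a network to fixed hardware.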
[0883] A sequencing system comprising: a first reconfigurable logic device comprising a first plurality of data processing engines configured to perform data processing in parallel; an integrated circuit; a neural network deployed at least partly on the integrated circuit; a first processor to selectively activate or deactivate different combinations of the first plurality of data processing engines to perform operations comprising:
(a) obtaining sensor data from one or more image sensors of the sequencing system;
(b) processing the sensor data to generate a first plurality of flow cell images; and
(c) communicating the sensor data, the first plurality of flow cell images, or both to the integrated circuit; a second processor or the first processor to control the integrated circuit to perform operations comprising:
(1) receiving the sensor data, the first plurality of flow cell images, or both from the first reconfigurable logic device;
(2) predicting a second plurality of flow cell images using the neural network based on the sensor data, the first plurality of flow cell images, or both;
(3) determining polonies from the second plurality of flow cell images; and
(4) performing a corresponding base calling for each of the determined polonies based on the second plurality of flow cell images.
[0884] A sequencing system comprising: a first reconfigurable logic device comprising a first plurality of data processing engines configured to perform data processing in parallel with each other; an integrated circuit; a neural network deployed at least partly on the integrated circuit; a first processor to selectively activate or deactivate different combinations of the first plurality of data processing engines; the different combinations of the first plurality of data processing engines configured to perform operations comprising:
(a) obtaining sensor data from one or more image sensors of the sequencing system to generate a first plurality of flow cell images; and
(b) communicating the sensor data, the first plurality of flow cell images, or both to the integrated circuit, wherein the integrated circuit performs operations comprising:
(1) receiving the sensor data, the first plurality of flow cell images, or both from the first reconfigurable logic device;
(2) predicting a second plurality of flow cell images using the neural network based on the sensor data, the first plurality of flow cell images, or both; and
(3) communicating the second plurality of flow cell images to the first reconfigurable logic device or one or more hardware processors of the sequencing system.
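The division of labor in embodiment [0884], where the reconfigurable logic device produces images and hands them to the integrated circuit for prediction, resembles a two-stage producer/consumer pipeline. The sketch below models it in software with Python's standard `queue` and `threading` modules; the function `run_two_stage`, the buffer size, and the sentinel are hypothetical, and real hardware would move data over DMA and on-chip routing rather than a Python queue.

```python
import queue
import threading
from typing import Any, Callable, Iterable, List


def run_two_stage(frames: Iterable[Any],
                  preprocess: Callable[[Any], Any],
                  predict: Callable[[Any], Any],
                  buffer_size: int = 4) -> List[Any]:
    """Producer (logic-device stand-in) feeds a consumer (IC stand-in)."""
    to_ic: "queue.Queue[Any]" = queue.Queue(maxsize=buffer_size)  # bounded buffer
    results: List[Any] = []
    stop = object()  # sentinel marking the end of the run

    def ic_worker() -> None:
        # Stand-in for the integrated circuit: receive, predict, return.
        while True:
            item = to_ic.get()
            if item is stop:
                break
            results.append(predict(item))

    worker = threading.Thread(target=ic_worker)
    worker.start()
    # Stand-in for the reconfigurable logic device: process and forward.
    for frame in frames:
        to_ic.put(preprocess(frame))  # blocks when the buffer is full
    to_ic.put(stop)
    worker.join()
    return results
```

The bounded queue provides back-pressure: if prediction is slower than image generation, the producer waits instead of exhausting memory.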
[0885] A sequencing system comprising: a first reconfigurable logic device comprising a first plurality of data processing engines arranged in a first pipeline and configured to perform data processing in parallel with each other; an integrated circuit; a neural network deployed at least partly on the integrated circuit; a first processor of the first reconfigurable logic device to selectively activate or deactivate different combinations of the first plurality of data processing engines to perform operations comprising:
(a) obtaining sensor data from one or more sensors of the sequencing system;
(b) processing the sensor data to generate a first plurality of flow cell images; and
(c) communicating the sensor data, the first plurality of flow cell images, or both to the integrated circuit; wherein the integrated circuit performs operations comprising:
(d) receiving the sensor data, the first plurality of flow cell images, or both from the first reconfigurable logic device;
(e) performing a first convolution in one or more dimensions on the first plurality of flow cell images, thereby generating a first convolution result;
(f) repetitively performing, one or more times, down-sampling operations comprising:
(1) performing a second convolution in one or more dimensions on the first convolution result, thereby generating a second convolution result; and
(2) performing a down-sampling of the second convolution result by a down-sampling factor, thereby generating a first down-sampled result, wherein in each repetition the second convolution comprises a corresponding number of filters, thereby generating a third convolution result;
(g) performing the second convolution in one or more dimensions on the third convolution result, thereby generating a fourth convolution result;
(h) repetitively performing up-sampling operations comprising:
(3) performing an up-sampling of the fourth convolution result by an up-sampling factor, thereby generating a first up-sampled result; and
(4) performing the second convolution in one or more dimensions on the first up-sampled result, thereby generating a fifth convolution result, wherein in each repetition the second convolution comprises a corresponding number of filters, thereby generating a sixth convolution result;
(i) performing the first convolution in one or more dimensions on the sixth convolution result, thereby generating a seventh convolution result; and
(j) predicting a second plurality of flow cell images based on the seventh convolution result, wherein each of the second plurality of flow cell images corresponds to a respective flow cell image of the first plurality of flow cell images, the first plurality of flow cell images having a first resolution and the second plurality of flow cell images having a second resolution that is at least 2, 4, 6, 8, 10, 12, or 16 times greater than the first resolution in one or more spatial dimensions.
[0886] A sequencing system comprising: one or more hardware processors; and one or more data storage devices storing instructions executable by the one or more hardware processors to cause the one or more hardware processors to perform operations, the operations comprising:
(a) generating a training set comprising a plurality of training flow cell images, or receiving the training set from one or more data storage devices of the sequencing system, the plurality of training flow cell images having a first spatial resolution;
(b) up-sampling the plurality of training flow cell images to generate a reference set comprising high-resolution training flow cell images having a second resolution;
(c) generating a training output by inputting the training set to a neural network;
(d) repeatedly performing, until the output error satisfies a stopping criterion, training operations comprising: determining an output error by comparing the training output and the reference set; and adjusting current values of parameters of the neural network based on the output error; and
(e) generating a trained neural network with adjusted parameters.
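The control flow of training steps (a) through (e) can be shown with a deliberately tiny sketch. Everything model-specific here is an assumption: the reference set is produced by nearest-neighbour up-sampling of 1-D "images", the "neural network" is replaced by a hypothetical two-parameter model (output = w * upsample(x) + b), the output error is the mean squared error, and parameters are adjusted by plain gradient descent until the error falls below a tolerance. Because the reference is itself an up-sampled input, the ideal parameters are w = 1 and b = 0; the point is the loop structure, not the model.

```python
from typing import List, Tuple


def upsample_1d(signal: List[float], factor: int) -> List[float]:
    """Nearest-neighbour up-sampling of a 1-D signal (assumed method)."""
    return [v for v in signal for _ in range(factor)]


def train(training_set: List[List[float]], factor: int = 2, lr: float = 0.5,
          tol: float = 1e-8, max_iters: int = 10_000) -> Tuple[float, float, float, int]:
    # (b) reference set: the training images up-sampled to the second resolution.
    reference = [upsample_1d(s, factor) for s in training_set]
    # Hypothetical two-parameter "network": y = w * upsample(x) + b.
    xs = [v for s in training_set for v in upsample_1d(s, factor)]
    rs = [v for ref in reference for v in ref]
    m = len(xs)
    w, b = 0.5, 0.3  # arbitrary initial parameter values
    mse = float("inf")
    for it in range(max_iters):
        # (c) training output and (d) output error against the reference set.
        errs = [w * u + b - r for u, r in zip(xs, rs)]
        mse = sum(e * e for e in errs) / m
        if mse < tol:  # stopping criterion satisfied
            return w, b, mse, it
        # Adjust parameters by one gradient-descent step on the MSE.
        w -= lr * 2 * sum(e * u for e, u in zip(errs, xs)) / m
        b -= lr * 2 * sum(errs) / m
    return w, b, mse, max_iters
```

Calling `train([[0.2, 0.8], [0.5, 1.0]])` converges to w close to 1 and b close to 0; step (e) corresponds to keeping the returned parameters.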
[0887] The sequencing system of any one of the embodiments, wherein the first reconfigurable routing channels comprise one or more electronic nodes that are programmable.
[0888] The sequencing system of any one of the embodiments, wherein the first reconfigurable routing channels comprise one or more interconnects.
[0889] The sequencing system of any one of the embodiments, wherein the first reconfigurable routing channels comprise one or more memory controllers.
[0890] The sequencing system of any one of the embodiments, wherein the first reconfigurable routing channels comprise one or more network-on-chips (NoCs).
[0891] The sequencing system of any one of the embodiments, further comprising one or more direct memory access (DMA) connections that are in data communication with the first plurality of data processing engines and the first reconfigurable routing channels.
[0892] The sequencing system of any one of the embodiments, further comprising one or more direct memory access (DMA) units that are in data communication with the first reconfigurable routing channels and the integrated circuit.
[0893] The sequencing system of any one of the embodiments, wherein the one or more DMA units are configured to actively request data from, or actively send data directly to: the first reconfigurable logic device; the first reconfigurable routing channels; the integrated circuit; or a combination thereof.
[0894] The sequencing system of any one of the embodiments, wherein the first reconfigurable logic device is configured to communicate data with one or more memory devices external thereto.
[0895] The sequencing system of any one of the embodiments, wherein the first reconfigurable logic device is configured to communicate data with one or more memory devices external thereto via the first reconfigurable routing channels.
[0896] The sequencing system of any one of the embodiments, further comprising one or more memory devices electrically connected for data communication with one or more of the first reconfigurable logic device; the integrated circuit; the first reconfigurable routing channels; the one or more memory controllers; the first processor; a second processor; and one or more processors of the sequencing system.
[0897] The sequencing system of any one of the embodiments, wherein the first reconfigurable logic device comprises a first integrated circuit forming an FPGA device.
[0898] The sequencing system of any one of the embodiments, wherein the integrated circuit comprises an application specific integrated circuit (ASIC) chip.
[0899] The sequencing system of any one of the embodiments, wherein the integrated circuit comprises a neural processing unit (NPU) or an artificial intelligence (AI) chip.
[0900] The sequencing system of any one of the embodiments, wherein one or more of the first reconfigurable logic device; the first reconfigurable routing channels; the first processor; and the one or more DMA connections are located on a first printed circuit board, and the integrated circuit is located on a second printed circuit board.
[0901] The sequencing system of any one of the embodiments, wherein the integrated circuit comprises a second plurality of data processing engines, each data processing engine comprising multiple digital logic circuits.
[0902] The sequencing system of any one of the embodiments, wherein each of the first or second plurality of data processing engines comprises multiple digital logic circuits.
[0903] The sequencing system of any one of the embodiments, wherein the first reconfigurable routing channels are configured to allow data communication between the first reconfigurable logic device and one or more memory devices.
[0904] The sequencing system of any one of the embodiments, further comprising one or more direct memory access (DMA) connections.
[0905] The sequencing system of any one of the embodiments, wherein the DMA connections are configured to allow data communication based on a predetermined protocol.
[0906] The sequencing system of any one of the embodiments, wherein the one or more DMA connections and the first reconfigurable routing channels are configured to allow data communication between the first reconfigurable logic device and one or more memory devices.
[0907] The sequencing system of any one of the embodiments, wherein the one or more DMA connections and the first reconfigurable routing channels are configured to allow data communication between the first reconfigurable logic device and the integrated circuit.
[0908] The sequencing system of any one of the embodiments, wherein the first reconfigurable routing channels comprise one or more of: a network-on-chip (NoC) and a memory controller.
[0909] The sequencing system of any one of the embodiments, wherein the first reconfigurable routing channels are configured to allow data communication between the first reconfigurable logic device and an integrated circuit.
[0910] The sequencing system of any one of the embodiments, wherein the first reconfigurable routing channels and the one or more DMA connections are configured to allow data communication between the first reconfigurable logic device and an integrated circuit.
[0911] The sequencing system of any one of the embodiments, wherein the integrated circuit further comprises: a second plurality of data processing engines; and second routing channels, each connecting at least some of the second plurality of data processing engines.
[0912] The sequencing system of any one of the embodiments, wherein the first processor is configured to selectively activate or deactivate different combinations of the first plurality of data processing engines and the first reconfigurable routing channels to perform the operations.
[0913] The sequencing system of any one of the embodiments, wherein the first processor or a second processor is configured to selectively activate or deactivate different combinations of the second plurality of data processing engines and the second reconfigurable routing channels to perform the operations.
[0914] The sequencing system of any one of the embodiments, wherein processing the sensor data to generate the first plurality of flow cell images comprises one or more of: registering the first plurality of flow cell images to a reference coordinate system; adjusting image intensities of the first plurality of flow cell images; performing color correction on the first plurality of flow cell images; correcting phasing and prephasing in the first plurality of flow cell images; and subtracting background intensities from the first plurality of flow cell images.
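Two of the preprocessing operations listed in [0914], background subtraction and color correction, can be sketched as follows. The estimators are assumptions rather than the method of this document: background is taken as a low percentile of all pixel values, and color correction assumes a linear crosstalk model in which each pixel's observed two-channel signal equals a known 2 x 2 mixing matrix M times the true signal, so the correction applies the inverse of M. The names `subtract_background` and `correct_crosstalk` are hypothetical, and real instruments typically use four or more channels.

```python
from typing import List, Sequence, Tuple

Image = List[List[float]]


def subtract_background(img: Image, percentile: float = 0.1) -> Image:
    """Subtract a low-percentile background estimate, clipping negatives to zero."""
    flat = sorted(v for row in img for v in row)
    bg = flat[int(percentile * (len(flat) - 1))]
    return [[max(v - bg, 0.0) for v in row] for row in img]


def correct_crosstalk(ch1: Image, ch2: Image,
                      m: Sequence[Sequence[float]]) -> Tuple[Image, Image]:
    """Undo two-channel crosstalk, assuming observed = M @ true at every pixel."""
    (a, b), (c, d) = m
    det = a * d - b * c
    if det == 0:
        raise ValueError("crosstalk matrix is singular")
    # Entries of the inverse of M.
    i00, i01, i10, i11 = d / det, -b / det, -c / det, a / det
    out1 = [[i00 * o1 + i01 * o2 for o1, o2 in zip(r1, r2)] for r1, r2 in zip(ch1, ch2)]
    out2 = [[i10 * o1 + i11 * o2 for o1, o2 in zip(r1, r2)] for r1, r2 in zip(ch1, ch2)]
    return out1, out2
```

If channel 1 leaks 10% of its signal into channel 2 and channel 2 leaks 20% into channel 1, then M = ((1.0, 0.2), (0.1, 1.0)), and applying `correct_crosstalk` to the observed images recovers the unmixed signals.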
[0915] The sequencing system of any one of the embodiments, wherein the sequencing system further comprises a housing that encloses the first reconfigurable logic device, the first reconfigurable routing channels, the one or more DMA connections, the integrated circuit, and the first processor therein.
[0916] The sequencing system of any one of the embodiments, wherein the sequencing system further comprises a housing that encloses at least the first reconfigurable logic device therein, and wherein the integrated circuit is external to the housing.
[0917] The sequencing system of any one of the embodiments, wherein the sequencing system further comprises a power source that is configured to supply different power levels to the first reconfigurable logic device and the integrated circuit.
[0918] The sequencing system of any one of the embodiments, wherein a first power level to the first reconfigurable logic device is higher than a second power level to the integrated circuit.
[0919] The sequencing system of any one of the embodiments, wherein a maximum power output of the power source of the sequencing system is 2x, 3x, 5x, 8x, 10x, or 20x lower than the maximum power output of the power source of sequencers without the first reconfigurable logic device, the integrated circuit, or both.
[0920] The sequencing system of any one of the embodiments, wherein time consumption in performing a sequencing run and corresponding primary analysis thereof using the sequencing system is 2x, 3x, 5x, 8x, 10x, or 20x lower than the time consumption in performing the same sequencing run using a sequencer without the first reconfigurable logic device, the integrated circuit, or both.
[0921] The sequencing system of any one of the embodiments, wherein time consumption in performing a sequencing run and primary analysis of the sequencing run using the sequencing system is 2x, 3x, 5x, 8x, 10x, or 20x lower than the time consumption in performing the same sequencing run and primary analysis using a sequencer without the first reconfigurable logic device, the integrated circuit, or both.
[0922] The sequencing system of any one of the embodiments, wherein a maximum power output of the power source to the sequencing system in performing a sequencing run and corresponding sequencing analysis thereof is less than 900 Watts, 800 Watts, 700 Watts, 650 Watts, 600 Watts, 550 Watts, or 500 Watts.
[0923] The sequencing system of any one of the embodiments, further comprising a power source configured to supply a first power level to the first reconfigurable logic device, wherein the first power level is less than 500 Watts, 400 Watts, 350 Watts, or 300 Watts.
[0924] The sequencing system of any one of the embodiments, further comprising a power source configured to supply a second power level to the integrated circuit, wherein the second power level is less than 450 Watts, 400 Watts, 350 Watts, or 300 Watts.
[0925] The sequencing system of any one of the embodiments, wherein the sequencing system lacks any graphics processing units (GPUs) or tensor processing units (TPUs).
[0926] The sequencing system of any one of the embodiments, wherein the first processor or the second processor comprises a CPU.
[0927] The sequencing system of any one of the embodiments, wherein the one or more hardware processors of the sequencing system comprise a CPU.
[0928] The sequencing system of any one of the embodiments, further comprising a heat dissipator configured to maintain a system temperature in a range from 0 to 120 degrees Celsius, or less than 120 degrees Celsius.
[0929] The sequencing system of any one of the embodiments, wherein each of the one or more operations is performed in one or more sequencing cycles.
[0930] The sequencing system of any one of the embodiments, wherein each of the one or more operations is performed for each of one or more z levels in each sequencing cycle of the one or more sequencing cycles.
[0931] The sequencing system of any one of the embodiments, wherein each of the one or more operations is performed for a single z level in a single sequencing cycle in less than 1000 ms, 800 ms, 500 ms, 400 ms, 300 ms, or 200 ms.
[0932] The sequencing system of any one of the embodiments, wherein the one or more operations are performed in parallel while the sequencing run is in progress.
[0933] The sequencing system of any one of the embodiments, wherein the one or more operations are performed in parallel within the time window in which sequencing, imaging, or both of a subsequent sequencing cycle are completed.
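The benefit of running analysis in parallel with the next cycle, as in [0932] and [0933], follows from standard two-stage pipeline arithmetic. In the sketch below, `cycle_s` stands for the per-cycle chemistry and imaging time and `analysis_s` for the per-cycle analysis time; both names and any numbers plugged in are hypothetical. When analysis of cycle k overlaps acquisition of cycle k + 1, the run costs one fill step, (cycles - 1) steps of the slower stage, and one drain step.

```python
def run_time_s(cycles: int, cycle_s: float, analysis_s: float, overlapped: bool) -> float:
    """Total run time for serial versus pipelined (overlapped) analysis."""
    if cycles < 1:
        raise ValueError("need at least one cycle")
    if not overlapped:
        # Each cycle waits for its own analysis before the next cycle starts.
        return cycles * (cycle_s + analysis_s)
    # Fill (first acquisition), steady state limited by the slower stage, drain.
    return cycle_s + (cycles - 1) * max(cycle_s, analysis_s) + analysis_s


def keeps_pace(cycle_s: float, analysis_s: float) -> bool:
    """True when analysis finishes within the next cycle's acquisition window."""
    return analysis_s <= cycle_s
```

For three cycles with a hypothetical 10 s acquisition and 4 s analysis, the serial schedule takes 42 s and the overlapped schedule takes 34 s; once `keeps_pace` holds, analysis adds only a single trailing `analysis_s` to the run regardless of cycle count.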
[0934] The sequencing system of any one of the embodiments, wherein the first plurality of flow cell images are obtained from multiple z levels covering at least part of an in situ sample.
[0935] The sequencing system of any one of the embodiments, wherein the first plurality of flow cell images are obtained from one or more color channels at each z level of the multiple z levels covering at least part of the in situ sample.
[0936] The sequencing system of any one of the embodiments, wherein the sequencing system further comprises one or more image sensors configured to receive optical signals generated from sequencing reactions of a sample immobilized on a support.
[0937] The sequencing system of any one of the embodiments, wherein the sequencing system further comprises: one or more hardware processors; and one or more data storage devices storing instructions executable by the one or more hardware processors to cause the one or more hardware processors to perform operations comprising:
1) recording sensor data generated in the sequencing system in one or more flow cycles;
2) optionally processing the recorded sensor data;
3) sending the recorded sensor data or the optionally processed data to the first reconfigurable logic device or the integrated circuit;
4) receiving an outcome from the first reconfigurable logic device or the integrated circuit; and
5) generating sequencing analysis results based on the received outcome.
[0938] The sequencing system of any one of the embodiments, wherein the sequencing system further comprises: one or more hardware processors; and one or more data storage devices storing instructions executable by the one or more hardware processors to cause the one or more hardware processors to perform operations comprising:
1) receiving an outcome from the first reconfigurable logic device or the integrated circuit; and
2) generating sequencing analysis results based on the received outcome.
[0939] The sequencing system of any one of the embodiments, wherein the sequencing analysis results comprise a data file in a predetermined data format.
[0940] The sequencing system of any one of the embodiments, wherein the sequencing analysis results comprise base calls of nucleotide bases in a sample immobilized on a support.
[0941] The sequencing system of any one of the embodiments, wherein the sequencing analysis results comprise quality measurements of base calls of nucleotide bases in a sample immobilized on a support.
[0942] The sequencing system of any one of the embodiments, wherein the sequencing analysis results comprise quality scores corresponding to base calls of nucleotide bases in a sample immobilized on a support.
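The document does not specify how the quality scores of [0942] are scaled. A widely used convention, assumed here, is the Phred scale, Q = -10 log10(p_error), written into FASTQ files as the character with code Q + 33. The cap of 60 and the helper names `phred` and `quality_char` are hypothetical choices for this sketch.

```python
import math


def phred(p_error: float, cap: int = 60) -> int:
    """Convert a base-call error probability to an integer Phred score."""
    if not 0.0 < p_error <= 1.0:
        raise ValueError("p_error must be in (0, 1]")
    return min(cap, round(-10 * math.log10(p_error)))


def quality_char(q: int) -> str:
    """Encode a Phred score as a FASTQ quality character (offset 33)."""
    return chr(q + 33)
```

Under this convention an error probability of 0.001 maps to Q30 (character `?`), and 0.01 maps to Q20.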
[0943] The sequencing system of any one of the embodiments, wherein the sequencing system further comprises: a sample immobilized on a support; and an optical system comprising an illumination system, an objective lens, and the one or more image sensors.
[0944] The sequencing system of any one of the embodiments, wherein the optical system is configured to emit light to the sample and to collect optical signals emitted from the sample, thereby generating the first plurality of flow cell images.
[0945] The sequencing system of any one of the embodiments, wherein the first plurality of flow cell images are generated in one or more flow cycles of a sequencing run.
[0946] The sequencing system of any one of the embodiments, wherein the sample comprises an in situ sample.
[0947] The sequencing system of any one of the embodiments, wherein the sample comprises a three-dimensional sample.
[0948] The sequencing system of any one of the embodiments, wherein the support is comprised in a flow cell device.
[0949] The sequencing system of any one of the embodiments, wherein the neural network comprises a convolutional neural network.
[0950] The sequencing system of any one of the embodiments, wherein the neural network comprises a 3D neural network.
[0951] The sequencing system of any one of the embodiments, wherein the neural network comprises a 2D neural network.
[0952] The sequencing system of any one of the embodiments, wherein the neural network comprises a U-Net.
[0953] The sequencing system of any one of the embodiments, wherein the neural network has been trained using the first reconfigurable logic device or the integrated circuit.
[0954] The sequencing system of any one of the embodiments, wherein the first processor is to selectively activate or deactivate different combinations of the first plurality of data processing engines and the first reconfigurable routing channels to perform operations using the neural network, the operations further comprising one or more of: generating quality measurements of the base callings; and generating a data output file based on the base callings.
[0955] The sequencing system of any one of the embodiments, wherein the first processor or a second processor of the sequencing system is to selectively activate or deactivate different combinations of the second plurality of data processing engines and the second reconfigurable routing channels to perform operations further comprising one or more of: generating quality measurements of the base callings; and generating a data output file based on the base callings.
[0956] The sequencing system of any one of the embodiments, wherein the output data comprises base calls of nucleotide bases in a sample immobilized on a support.
[0957] The sequencing system of any one of the embodiments, wherein the output data comprises identification of base calling locations in two dimensions.
[0958] The sequencing system of any one of the embodiments, wherein the output data comprises identification of base calling locations in three dimensions.
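Identifying base calling locations in three dimensions, as in [0958], implies locating polonies within a z-stack rather than a single plane. One simple approach, assumed here and not confirmed by the source, is to take voxels that exceed a threshold and are strictly brighter than all 26 neighbours. The function `local_maxima_3d` is hypothetical and operates on a volume indexed as `vol[z][y][x]`.

```python
from itertools import product
from typing import List, Tuple

Volume = List[List[List[float]]]


def local_maxima_3d(vol: Volume, threshold: float) -> List[Tuple[int, int, int]]:
    """Return (z, y, x) voxels above threshold that beat all 26 neighbours."""
    nz, ny, nx = len(vol), len(vol[0]), len(vol[0][0])
    peaks = []
    for z, y, x in product(range(nz), range(ny), range(nx)):
        v = vol[z][y][x]
        if v < threshold:
            continue
        # Neighbour values within the volume bounds (edges have fewer than 26).
        neighbours = (
            vol[z + dz][y + dy][x + dx]
            for dz, dy, dx in product((-1, 0, 1), repeat=3)
            if (dz, dy, dx) != (0, 0, 0)
            and 0 <= z + dz < nz and 0 <= y + dy < ny and 0 <= x + dx < nx
        )
        if all(v > n for n in neighbours):
            peaks.append((z, y, x))
    return peaks
```

Because the test is strict, a flat-topped peak spanning several voxels is not reported; a production detector would add tie-breaking or sub-voxel fitting.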
[0959] The sequencing system of any one of the embodiments, wherein the first convolution comprises a 3D convolution with a convolution kernel.
[0960] The sequencing system of any one of the embodiments, wherein the convolution kernel has at least four dimensions.
[0961] The sequencing system of any one of the embodiments, wherein the convolution kernel is m x m x m x n, wherein m is an integer in a range from 3 to 30, and wherein n is an integer.
[0962] The sequencing system of any one of the embodiments, wherein the first convolution comprises a 2D convolution with a convolution kernel.
[0963] The sequencing system of any one of the embodiments, wherein the convolution kernel has at least three dimensions.
[0964] The sequencing system of any one of the embodiments, wherein the convolution kernel is m x m x n, wherein m is an integer in a range from 3 to 30, and wherein n is an integer.
[0965] The sequencing system of any one of the embodiments, wherein n is an integer from 1 to 16384.
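The kernel dimensions in [0961], [0964], and [0965] determine how many weights each convolution layer holds, which matters when deploying the network on fixed-size logic. The sketch below assumes that m x m x m (or m x m) is the spatial extent and n is the number of filters, that each filter spans all input channels, and that each filter has one bias term; these interpretations and the name `conv_params` are assumptions. It enforces the ranges stated above (3 <= m <= 30, 1 <= n <= 16384).

```python
from math import prod
from typing import Tuple


def conv_params(kernel: Tuple[int, ...], in_channels: int, filters: int,
                bias: bool = True) -> int:
    """Weight (+ bias) count for one conv layer with the given spatial kernel."""
    if any(not 3 <= m <= 30 for m in kernel):
        raise ValueError("each spatial kernel size m must be in 3..30")
    if not 1 <= filters <= 16384:
        raise ValueError("n (filters) must be in 1..16384")
    weights = prod(kernel) * in_channels * filters
    return weights + (filters if bias else 0)
```

For example, a 3 x 3 x 3 kernel with 16 filters on a single-channel input holds 27 * 16 + 16 = 448 parameters, whereas a 3 x 3 kernel mapping 16 channels to 32 filters holds 9 * 16 * 32 + 32 = 4,640.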
[0966] The sequencing system of any one of the embodiments, wherein the second convolution in (a) comprises a corresponding number of n, 2*n, 4*n, and 8*n filters in the first, second, third, and fourth repetitions, respectively.
[0967] The sequencing system of any one of the embodiments, wherein the second convolution in (c) comprises a corresponding number of 2*n, 2*n, 4*n, and 8*n filters in the last, second-to-last, third-to-last, and fourth-to-last repetitions, respectively.
[0968] The sequencing system of any one of the embodiments, wherein the first and second resolutions are in 3D.
[0969] The sequencing system of any one of the embodiments, wherein n is in a range from 4 to 1024.
[0970] The sequencing system of any one of the embodiments, wherein the first plurality of flow cell images are from a single color channel.
[0971] The sequencing system of any one of the embodiments, wherein (e) performing a corresponding base calling for each of the determined polonies based on the second plurality of flow cell images comprises: performing a corresponding base calling for each of the determined polonies based on the second plurality of flow cell images and based on a fourth plurality of flow cell images, wherein the fourth plurality of images are predicted using a second neural network based on a third plurality of flow cell images.
[0972] The sequencing system of any one of the embodiments, wherein the third plurality of flow cell images are acquired from one or more color channels that are different from the single color channel, and wherein the third plurality of flow cell images comprises the first resolution.
[0973] The sequencing system of any one of the embodiments, wherein the fourth plurality of flow cell images comprises the second resolution.
[0974] The sequencing system of any one of the embodiments, wherein the first plurality of flow cell images are from one or more color channels.
[0975] The sequencing system of any one of the embodiments, wherein the first plurality of flow cell images are of unbalanced nucleotide diversity.
[0976] The sequencing system of any one of the embodiments, wherein the first plurality of flow cell images comprises: an unbalanced diversity of nucleotide bases of A, G, C and T/U among concatemer molecules immobilized on the support in one or more flow cycles.
[0977] The sequencing system of any one of the embodiments, wherein the first plurality of flow cell images comprises: a balanced diversity of nucleotide bases of A, G, C and T/U among concatemer molecules immobilized on the support in one or more cycles.
[0978] The sequencing system of any one of the embodiments, wherein two or more different concatemer molecules among the concatemer molecules have different insert sequences.
[0979] The sequencing system of any one of the embodiments, wherein different insert sequences correspond to different target RNA molecules or target cDNA molecules.
[0980] The sequencing system of any one of the embodiments, wherein each location of the determined polonies corresponds to a location of the concatemer molecules.
[0981] The sequencing system of any one of the embodiments, wherein the first plurality of flow cell images comprises optical signals emitted from nucleotide reagents bound to a balanced diversity of nucleotide bases of A, G, C and T/U among the concatemer molecules immobilized on the support.
[0982] The sequencing system of any one of the embodiments, wherein the first plurality of flow cell images comprises optical signals emitted from nucleotide reagents bound to an unbalanced diversity of nucleotide bases of A, G, C and T/U among the concatemer molecules immobilized on the support in the one or more subsequent cycles.
[0983] The sequencing system of any one of the embodiments, wherein the unbalanced diversity of nucleotide bases of A, G, C and T/U among the concatemer molecules comprises: a percentage of (1) a number of one or more types of nucleotide bases to (2) a total number of bases is less than 20%, 15%, 10%, or 5% in the one or more cycles.
[0984] The sequencing system of any one of the embodiments, wherein the balanced diversity of nucleotide bases of A, G, C and T/U among the plurality of concatemer molecules comprises: a percentage of (1) a number of each type of nucleotide bases to (2) a total number of bases in the one or more cycles is more than 10%, 15%, or 20%.
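The percentage tests in [0983] and [0984] can be applied to the base calls of a single cycle. This is a minimal sketch, assuming that U is counted together with T and that the threshold is one of the listed percentages chosen by the user; the function names are hypothetical.

```python
from collections import Counter

BASES = ("A", "C", "G", "T")


def base_fractions(calls):
    """Fraction of each base among the calls of one cycle (U counted as T)."""
    normalized = ["T" if b == "U" else b for b in calls]
    total = len(normalized)
    if total == 0:
        raise ValueError("no base calls for this cycle")
    counts = Counter(normalized)
    return {b: counts[b] / total for b in BASES}


def is_unbalanced(calls, threshold=0.20):
    """True if one or more base types fall below the threshold fraction."""
    return any(f < threshold for f in base_fractions(calls).values())


def is_balanced(calls, threshold=0.20):
    """True if every base type exceeds the threshold fraction."""
    return all(f > threshold for f in base_fractions(calls).values())


cycle = list("AAAAAAACGT")                     # A = 70%, C = G = T = 10%
print(base_fractions(cycle))                   # {'A': 0.7, 'C': 0.1, 'G': 0.1, 'T': 0.1}
print(is_unbalanced(cycle, threshold=0.15))    # True
print(is_balanced(list("ACGTACGTAC"), 0.15))   # True (30%, 30%, 20%, 20%)
```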
[0985] The sequencing system of any one of the embodiments, wherein the cellular sample comprises overloaded concatemer molecules with a spatial density in a range of 10² to 10¹⁵ per mm².
[0986] The sequencing system of any one of the embodiments, wherein the cellular sample comprises overloaded concatemer molecules with a spatial density in a range of 10³ to 10¹⁰ per mm².
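A spatial density can be related to the image resolutions of [0987] and [0988] by converting it to a typical spacing between molecules. The sketch below is illustrative only: the square-grid spacing and the Poisson nearest-neighbor distance (0.5 / sqrt(density)) are idealized models assumed here, not statements from the embodiments.

```python
import math


def grid_spacing_um(density_per_mm2: float) -> float:
    """Center-to-center spacing (um) if molecules sat on a square grid."""
    per_um2 = density_per_mm2 / 1e6  # 1 mm^2 = 10^6 um^2
    return 1.0 / math.sqrt(per_um2)


def expected_nn_distance_um(density_per_mm2: float) -> float:
    """Mean nearest-neighbor distance (um) for a 2D Poisson point process."""
    per_um2 = density_per_mm2 / 1e6
    return 0.5 / math.sqrt(per_um2)


for d in (1e4, 1e6, 1e8):
    print(f"{d:.0e} per mm^2: grid {grid_spacing_um(d):.3g} um, "
          f"nearest neighbor {expected_nn_distance_um(d):.3g} um")
```

Under these models, densities of about 10⁶ per mm² already place molecules roughly 1 µm apart, within the 0.1 µm to 5 µm first-resolution range, which illustrates why a higher second resolution may help separate overloaded molecules.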
[0987] The sequencing system of any one of the embodiments, wherein the first resolution is in a range of 0.1 um to 5 um.
[0988] The sequencing system of any one of the embodiments, wherein the second resolution is in a range of 0.01 um to 2 um.
[0989] The sequencing system of any one of the embodiments, wherein the downsampling factor is 2, 4, or 8.
[0990] The sequencing system of any one of the embodiments, wherein the up-sampling factor is 2, 4, or 8.
[0991] The sequencing system of any one of the embodiments, wherein one or more of operations of (a) to (k) are performed while a sequencing run is being performed.
[0992] The sequencing system of any one of the embodiments, wherein the first plurality of flow cell images are acquired in sequencing cycles ranging from 1 to 500.
[0993] The sequencing system of any one of the embodiments, wherein the one or more cycles comprises a current cycle N.
[0994] The sequencing system of any one of the embodiments, wherein N is in a range from 1 to 500.
[0995] The sequencing system of any one of the embodiments, wherein the one or more cycles comprises a single cycle ranging from 1 to 500.
[0996] The sequencing system of any one of the embodiments, wherein the one or more cycles comprises multiple cycles ranging from 1 to 500.
[0997] The sequencing system of any one of the embodiments, wherein one or more of operations (a) to (j) are performed while the sequencing reactions in cycles subsequent to the current cycle N are yet to be performed or are currently being performed.
[0998] The sequencing system of any one of the embodiments, wherein the training data set of flow cell images comprises z-stacks of flow cell images taken at different z-locations.
[0999] The sequencing system of any one of the embodiments, wherein the z-axis is orthogonal to image planes of the flow cell images.
[1000] The sequencing system of any one of the embodiments, wherein performing the first convolution in one or more dimensions on the first plurality of flow cell images comprises: performing a first convolution in 3D on the first plurality of flow cell images, thereby generating a first convolution result.
[1001] The sequencing system of any one of the embodiments, wherein performing the first convolution in one or more dimensions on the first plurality of flow cell images comprises: performing a first convolution in 2D on the first plurality of flow cell images, thereby generating a first convolution result.
[1002] The sequencing system of any one of the embodiments, wherein performing a second convolution in one or more dimensions on the first convolution result, thereby generating a second convolution result comprises: performing the second convolution in 3D on the first convolution result, thereby generating a second convolution result.
[1003] The sequencing system of any one of the embodiments, wherein performing a second convolution in one or more dimensions on the first convolution result, thereby generating a second convolution result comprises: performing the second convolution in 2D on the first convolution result, thereby generating a second convolution result.
[1004] The sequencing system of any one of the embodiments, wherein repetitively performing up sampling operations comprises:
(3) performing an up sampling of the fourth convolution result by an up sampling factor thereby generating a first up-sampled result;
(4) concatenating the first up-sampled result in a current up-sampling repetition with the first down-sampled result in a previous down-sampling repetition, wherein the first up-sampled result has the same size as the first down-sampled result in the previous down-sampling repetition; and
(5) performing the second convolution in one or more dimensions on the first up-sampled result, thereby generating a fifth convolution result.
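Steps (3) and (4) can be illustrated on a single 2D channel. This is a minimal sketch: nearest-neighbor up-sampling is an assumption (a learned transposed convolution is another common choice), and the helper names are hypothetical. The size check mirrors the requirement that the up-sampled result match the down-sampled result it is concatenated with.

```python
def upsample_nearest_2d(feature_map, factor):
    """Nearest-neighbor up-sampling of a 2D map (list of rows) by an integer factor."""
    out = []
    for row in feature_map:
        wide = [v for v in row for _ in range(factor)]
        out.extend([list(wide) for _ in range(factor)])
    return out


def shape(feature_map):
    return (len(feature_map), len(feature_map[0]))


def concat_channels(up_channels, skip_channels):
    """Concatenate channel lists after checking that all spatial sizes match."""
    up_shapes = {shape(c) for c in up_channels}
    skip_shapes = {shape(c) for c in skip_channels}
    if up_shapes != skip_shapes or len(up_shapes) != 1:
        raise ValueError(f"size mismatch: {up_shapes} vs {skip_shapes}")
    return up_channels + skip_channels


# Hypothetical 2x2 deep channel and 4x4 channel saved from the encoder path.
deep = [[1, 2], [3, 4]]
skip = [[0] * 4 for _ in range(4)]
up = upsample_nearest_2d(deep, 2)
merged = concat_channels([up], [skip])
print(up)           # [[1, 1, 2, 2], [1, 1, 2, 2], [3, 3, 4, 4], [3, 3, 4, 4]]
print(len(merged))  # 2
```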
[1005] The sequencing system of any one of the embodiments, wherein the second resolution is at least 4, 6, or 8 times greater than the first resolution in all three dimensions.
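The down-sampling factor of [0989] and the resolution ratio of [1005] fix the tensor sizes at each level. The sketch below assumes a factor of 2, four repetitions, and a hypothetical input z-stack of 16 x 128 x 128 voxels with 1.0 x 0.4 x 0.4 µm voxels; none of these values come from the embodiments.

```python
def level_shapes(shape, factor=2, repetitions=4):
    """Spatial shape at each down-sampling level of the encoder."""
    shapes = [tuple(shape)]
    for _ in range(repetitions):
        prev = shapes[-1]
        if any(s % factor for s in prev):
            raise ValueError(f"{prev} is not divisible by factor {factor}")
        shapes.append(tuple(s // factor for s in prev))
    return shapes


def upsampled_voxel_size(voxel_um, ratio):
    """Voxel size after increasing resolution by `ratio` along every axis."""
    return tuple(v / ratio for v in voxel_um)


print(level_shapes((16, 128, 128)))
# [(16, 128, 128), (8, 64, 64), (4, 32, 32), (2, 16, 16), (1, 8, 8)]
print(upsampled_voxel_size((1.0, 0.4, 0.4), 4))  # (0.25, 0.1, 0.1)
print(4 ** 3)  # 64 output voxels per input voxel for a 4x ratio in 3D
```

Each input dimension should be divisible by factor**repetitions so that every encoder level can be concatenated with a decoder level of exactly the same size.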
[1006] The sequencing system of any one of the embodiments, wherein the different combinations of the first plurality of data processing engines are configured to perform operations further comprising:
(c) receiving the second plurality of flow cell images from the integrated circuit;
(d) determining polonies from the second plurality of flow cell images;
(e) performing a corresponding base calling for each of the determined polonies based on the second plurality of flow cell images; and
(f) forwarding the second plurality of flow cell images, the determined polonies, the corresponding base callings, or a combination thereof to the first processor or one or more hardware processors of the sequencing system.
[1007] The sequencing system of any one of the embodiments, wherein the one or more operations performed by the first reconfigurable logic device further comprise: forwarding the second plurality of flow cell images, the determined polonies, the corresponding base callings, or a combination thereof to the first processor or one or more hardware processors of the sequencing system.
[1008] The sequencing system of any one of the embodiments, wherein the one or more operations performed by the integrated circuit further comprise: forwarding the second plurality of flow cell images, the corresponding base callings, or both to the first reconfigurable logic device, the first processor or one or more hardware processors of the sequencing system.
[1009] The sequencing system of any one of the embodiments, wherein the operations performed by the integrated circuit further comprise: determining polonies from the second plurality of flow cell images; performing a corresponding base call for each of the determined polonies based on the second plurality of flow cell images; and forwarding the second plurality of flow cell images, the corresponding base callings, or both to the first reconfigurable device, the first processor, or one or more hardware processors of the sequencing system.
[1010] The sequencing system of any one of the embodiments, wherein the operations performed by the first reconfigurable logic device or the integrated circuit further comprise: registering the second plurality of flow cell images to a common coordinate system.
[1011] The sequencing system of any one of the embodiments, wherein the first plurality of flow cell images are acquired from a single color channel of the sequencing system.
[1012] The sequencing system of any one of the embodiments, wherein (d) or (i) determining polonies from the second plurality of flow cell images comprises: generating a 3D polony map comprising spatial location of polonies based on the determined polonies.
[1013] The sequencing system of any one of the embodiments, wherein generating a 3D polony map comprising spatial location of polonies based on the determined polonies further comprises: deleting duplicate polonies from the determined polonies, wherein the duplicate polonies are out-of-focus.
[1014] The sequencing system of any one of the embodiments, wherein determining polonies from the second plurality of flow cell images comprises: superimposing the second plurality of flow cell images with corresponding cell staining images; and generating the polony map by only including polonies that are within cell boundaries in the corresponding cell staining images.
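The duplicate removal of [1013] and the cell-boundary filter of [1014] can be sketched together. This assumes that each detection carries a hypothetical focus score, that detections of the same polony at neighboring z levels lie within a fixed lateral radius, and that the cell staining image has already been turned into a binary mask. The names Detection, deduplicate, and inside_cells are illustrative only.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Detection:
    x: float      # lateral position (um)
    y: float
    z: int        # z level index
    focus: float  # hypothetical sharpness score; higher means more in focus


def deduplicate(detections, radius_um=0.5):
    """Keep one detection per polony: the sharpest one within radius_um in x/y."""
    kept = []
    for det in sorted(detections, key=lambda d: d.focus, reverse=True):
        if all(math.hypot(det.x - k.x, det.y - k.y) > radius_um for k in kept):
            kept.append(det)
    return kept


def inside_cells(detections, cell_mask, pixel_um=1.0):
    """Keep detections whose (x, y) falls on a nonzero pixel of the cell mask."""
    out = []
    for d in detections:
        row, col = int(d.y // pixel_um), int(d.x // pixel_um)
        if 0 <= row < len(cell_mask) and 0 <= col < len(cell_mask[0]) and cell_mask[row][col]:
            out.append(d)
    return out


dets = [
    Detection(1.2, 1.1, z=3, focus=0.9),
    Detection(1.3, 1.0, z=4, focus=0.4),  # same polony, out of focus
    Detection(3.5, 0.5, z=3, focus=0.8),  # outside any cell
]
mask = [[0, 1, 1, 0],
        [0, 1, 1, 0]]  # hypothetical mask: 1 = inside a cell
polonies = inside_cells(deduplicate(dets), mask)
print([(d.x, d.y, d.z) for d in polonies])  # [(1.2, 1.1, 3)]
```

Keeping the sharpest copy and discarding the others assigns each polony to the z level where it is most in focus, which is what the 3D polony map needs.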
[1015] The sequencing system of any one of the embodiments, wherein the support comprises a glass or plastic substrate.
[1016] A sequencing method comprising:
(a) obtaining, by a first reconfigurable logic device of a sequencing system, sensor data from one or more sensors of the sequencing system;
(b) processing, by the first reconfigurable logic device, the sensor data to generate a first plurality of flow cell images;
(c) predicting, by the first reconfigurable logic device, a second plurality of flow cell images using a neural network at least partly deployed on the first reconfigurable device and based on the sensor data or the first plurality of flow cell images;
(d) determining, by the first reconfigurable logic device, polonies from the second plurality of flow cell images;
(e) performing, by the first reconfigurable logic device, a corresponding base calling for each of the determined polonies based on the second plurality of flow cell images; and
(f) optionally forwarding, by the first reconfigurable logic device, the second plurality of flow cell images, the corresponding base calling, or both to one or more processors of the sequencing system.
[1017] A sequencing method comprising:
(a) obtaining, by the first reconfigurable logic device, sensor data from one or more image sensors of the sequencing system;
(b) processing, by the first reconfigurable logic device, the sensor data to generate a first plurality of flow cell images;
(c) communicating, by the first reconfigurable logic device to an integrated circuit, the sensor data, the first plurality of flow cell images, or both;
(d) receiving, by the integrated circuit and from the first reconfigurable logic device, the sensor data, the first plurality of flow cell images, or both;
(e) predicting, by the integrated circuit, a second plurality of flow cell images using the neural network based on the sensor data, the first plurality of flow cell images, or both;
(f) determining, by the integrated circuit, polonies from the second plurality of flow cell images; and
(g) performing, by the integrated circuit, a corresponding base calling for each of the determined polonies based on the second plurality of flow cell images.
[1018] A sequencing method comprising:
(a) obtaining, by the first reconfigurable logic device of a sequencing system, sensor data from one or more image sensors of the sequencing system to generate the first plurality of flow cell images;
(b) communicating, by the first reconfigurable logic device, the sensor data, the first plurality of flow cell images, or both to the integrated circuit;
(c) receiving, by the integrated circuit of the sequencing system, the sensor data, the first plurality of flow cell images, or both from the first reconfigurable logic device;
(d) predicting, by the integrated circuit, a second plurality of flow cell images using a neural network deployed at least partly on the integrated circuit and based on the sensor data, the first plurality of flow cell images, or both; and
(e) communicating, by the integrated circuit, the second plurality of flow cell images to the first reconfigurable logic device or one or more hardware processors of the sequencing system.
[1019] A sequencing method comprising:
(a) acquiring, by an imager of a sequencing system, a training set comprising a plurality of training flow cell images;
(b) up-sampling the corresponding plurality of training flow cell images to generate high resolution training flow cell images having a second resolution;
(c) generating, by the sequencing system, reference intensities corresponding to the intensities in the high resolution training flow cell images based on base calls of the high resolution training flow cell images;
(d) providing the reference intensities;
(e) repeatedly performing, until the output error satisfies a stopping criterion, training operations comprising: determining an output error by comparing the training output and the reference intensities; and adjusting current values of parameters of the neural network based on the output error; and
(f) generating a trained neural network with adjusted parameters.
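The training loop of steps (e) and (f) follows the usual pattern of computing an output error, checking a stopping criterion, and adjusting parameters. The sketch below is self-contained and therefore replaces the neural network with a two-parameter linear model trained by gradient descent. Mean squared error as the output error, a fixed tolerance as the stopping criterion, and the toy intensities are all assumptions.

```python
def train(inputs, references, lr=0.1, tol=1e-6, max_iter=10_000):
    """Fit output = w * x + b until the mean squared error drops below tol."""
    w, b = 0.0, 0.0
    n = len(inputs)
    mse = float("inf")
    for step in range(max_iter):
        outputs = [w * x + b for x in inputs]                  # training output
        errors = [o - r for o, r in zip(outputs, references)]
        mse = sum(e * e for e in errors) / n                   # output error
        if mse < tol:                                          # stopping criterion
            return w, b, mse, step
        grad_w = 2.0 * sum(e * x for e, x in zip(errors, inputs)) / n
        grad_b = 2.0 * sum(errors) / n
        w -= lr * grad_w                                       # adjust parameters
        b -= lr * grad_b
    return w, b, mse, max_iter


# Hypothetical normalized intensities with reference = 2 * input + 0.5.
xs = [0.0, 0.25, 0.5, 0.75, 1.0]
ys = [2 * x + 0.5 for x in xs]
w, b, mse, steps = train(xs, ys)
print(round(w, 2), round(b, 2), mse < 1e-6)  # 2.0 0.5 True
```

A real network would compute gradients by backpropagation through all layers, but the control flow (forward pass, error, stopping check, update) is the same.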
[1020] The method of any one of the embodiments, wherein the training flow cell images are from one or more color channels.
[1021] The method of any one of the embodiments, wherein the training flow cell images are of one or more samples immobilized on a flow cell device from one or more cycles.
[1022] The method of any one of the embodiments, wherein the one or more samples are in situ samples.
[1023] The method of any one of the embodiments, wherein at least part of the one or more samples comprises predetermined bases in the one or more cycles.
[1024] The method of any one of the embodiments, further comprising: determining, by the sequencing system, a location list of polonies in the plurality of flow cell images; and extracting, by the sequencing system, intensities in the plurality of flow cell images based on the location list.
[1025] The method of any one of the embodiments, further comprising: determining, by the sequencing system, a location list of polonies in the high resolution training flow cell images; and extracting, by the sequencing system, intensities in the high resolution training flow cell images based on the location list.
[1026] The method of any one of the embodiments, wherein inputting the reference intensities to the neural network comprises: inputting the reference intensities and the location list to the neural network.
[1027] The method of any one of the embodiments, wherein generating the reference intensities in the high resolution training flow cell images based on the base calls of the high resolution training flow cell images comprises: performing color correction on each extracted intensity in the high resolution training flow cell image thereby generating the corresponding reference intensity.
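One common form of color correction is to undo spectral crosstalk between channels by solving observed = M · true, where M describes how much of each dye's signal leaks into each channel. Whether the embodiments use this form is not stated; the crosstalk values below are hypothetical and the solver is a plain Gaussian elimination written out so the sketch has no dependencies.

```python
def solve(matrix, vector):
    """Solve matrix @ x = vector by Gaussian elimination with partial pivoting."""
    n = len(vector)
    a = [list(map(float, row)) + [float(v)] for row, v in zip(matrix, vector)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-12:
            raise ValueError("crosstalk matrix is singular")
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(col + 1, n):
            f = a[r][col] / a[col][col]
            for c in range(col, n + 1):
                a[r][c] -= f * a[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (a[r][n] - sum(a[r][c] * x[c] for c in range(r + 1, n))) / a[r][r]
    return x


# Hypothetical 2-channel crosstalk: rows = channels, columns = dyes.
# 20% of dye 1 leaks into channel 2; 10% of dye 2 leaks into channel 1.
crosstalk = [[1.0, 0.1],
             [0.2, 1.0]]
observed = [1.0, 0.2]  # a pure dye-1 signal of 1.0 after leakage
corrected = solve(crosstalk, observed)
print([round(v, 6) for v in corrected])  # [1.0, 0.0]
```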
[1028] The method of any one of the embodiments, wherein determining an output error by comparing the training output and the reference intensities comprises: determining an output error by comparing the training output comprising predicted intensities and the reference intensities, wherein the predicted intensities are at locations in the location list.
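Restricting the output error to the location list, as in [1024]-[1028], means background pixels do not contribute to training. This is a minimal sketch, assuming (row, col) integer locations and mean squared error; the helper names are hypothetical.

```python
def extract_at(image, locations):
    """Read intensities at (row, col) positions from a 2D image (list of rows)."""
    return [image[r][c] for r, c in locations]


def masked_mse(predicted_image, reference_intensities, locations):
    """Mean squared error evaluated only at the polony locations in the list."""
    predicted = extract_at(predicted_image, locations)
    if len(predicted) != len(reference_intensities):
        raise ValueError("one reference intensity is needed per location")
    diffs = [(p - r) ** 2 for p, r in zip(predicted, reference_intensities)]
    return sum(diffs) / len(diffs)


pred = [[0.0, 0.9, 0.0],
        [0.0, 0.0, 0.5],
        [7.0, 0.0, 0.0]]  # the large value at (2, 0) is not a listed polony
locs = [(0, 1), (1, 2)]
refs = [1.0, 0.5]
print(masked_mse(pred, refs, locs))  # approximately 0.005
```

The spurious value at (2, 0) leaves the error unchanged, which is the intended effect of evaluating only at listed locations.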
[1029] The method of any one of the embodiments, wherein the first reconfigurable routing channels comprise one or more electronic nodes that are programmable.
[1030] The method of any one of the embodiments, wherein the first reconfigurable routing channels comprise one or more interconnects.
[1031] The method of any one of the embodiments, wherein the first reconfigurable routing channels comprise one or more memory controllers.
[1032] The method of any one of the embodiments, wherein the first reconfigurable routing channels comprise one or more networks-on-chip (NoCs).
[1033] The method of any one of the embodiments, further comprising one or more direct memory access (DMA) connections that are in data communication with the plurality of data processing engines and the first reconfigurable routing channels.
[1034] The method of any one of the embodiments, further comprising one or more direct memory access (DMA) units that are in data communication with the first reconfigurable routing channels and the integrated circuit.
[1035] The method of any one of the embodiments, wherein the one or more DMA units are configured to actively request data from or actively send data to: the first reconfigurable logic device; the first reconfigurable routing channels; the one or more memory controllers; the integrated circuit; or a combination thereof.
[1036] The method of any one of the embodiments, wherein the first reconfigurable logic device is configured to communicate data with one or more memory devices external thereto.
[1037] The method of any one of the embodiments, wherein the first reconfigurable logic device is configured to communicate data with one or more memory devices external thereto via the first reconfigurable routing channels.
[1038] The method of any one of the embodiments, further comprising one or more memory devices electrically connected with one or more of: the first reconfigurable logic device; the integrated circuit; the first reconfigurable routing channels; the one or more memory controllers; the first processor; and one or more processors of the sequencing system.
[1039] The method of any one of the embodiments, wherein the first reconfigurable logic device comprises a first integrated circuit forming an FPGA device.
[1040] The method of any one of the embodiments, wherein the integrated circuit comprises an application specific integrated circuit (ASIC) chip.
[1041] The method of any one of the embodiments, wherein the integrated circuit comprises a neural processing unit (NPU) or an artificial intelligence (AI) chip.
[1042] The method of any one of the embodiments, wherein the integrated circuit comprises a second plurality of data processing engines, each data processing engine comprising multiple digital logic circuits.
[1043] The method of any one of the embodiments, wherein each of the first or second plurality of data processing engines comprises multiple digital logic circuits.
[1044] The method of any one of the embodiments, wherein the first reconfigurable routing channels are configured to allow data communication between the first reconfigurable logic device and one or more memory devices.
[1045] The method of any one of the embodiments, further comprising one or more direct memory access (DMA) connections.
[1046] The method of any one of the embodiments, wherein the DMA connections are configured to allow data communication based on a predetermined protocol.
[1047] The method of any one of the embodiments, wherein the one or more DMA connections and the first reconfigurable routing channels are configured to allow data communication between the first reconfigurable logic device and one or more memory devices.
[1048] The method of any one of the embodiments, wherein the one or more DMA connections and the first reconfigurable routing channels are configured to allow data communication between the first reconfigurable logic device and the integrated circuit.
[1049] The method of any one of the embodiments, wherein the first reconfigurable routing channels comprises one or more of a network-on-chip (NoC), and a memory controller.
[1050] The method of any one of the embodiments, wherein the first reconfigurable routing channels are configured to allow data communication between the first reconfigurable logic device and an integrated circuit.
[1051] The method of any one of the embodiments, wherein the first reconfigurable routing channels and the one or more DMA connections are configured to allow data communication between the first reconfigurable logic device and an integrated circuit.
[1052] The method of any one of the embodiments, wherein the integrated circuit further comprises: a second plurality of data processing engines; and second routing channels, each connecting at least some of the second plurality of data processing engines.
[1053] The method of any one of the embodiments, wherein the first processor is configured to selectively activate or deactivate different combinations of the first plurality of data processing engines and the first reconfigurable routing channels to perform the operations.
[1054] The method of any one of the embodiments, wherein the first processor or a second processor is configured to selectively activate or deactivate different combinations of the second plurality of data processing engines and the second reconfigurable routing channels to perform the operations.
[1055] The method of any one of the embodiments, wherein processing the sensor data to generate the first plurality of flow cell images comprises: registering the first plurality of flow cell images to a reference coordinate system; adjusting image intensities of the first plurality of flow cell images; color-correcting the first plurality of flow cell images; correcting phasing and prephasing of the first plurality of flow cell images; and subtracting background intensities from the first plurality of flow cell images.
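Two of the listed steps, background subtraction and registration, can be sketched on small tiles. This assumes that the image median is an adequate background estimate and that misregistration is a pure integer translation found by brute-force search over a small window; sub-pixel and rotational registration are out of scope. The function names are hypothetical.

```python
def subtract_background(image):
    """Subtract the image median (a crude background estimate), clipping at zero."""
    values = sorted(v for row in image for v in row)
    background = values[len(values) // 2]
    return [[max(v - background, 0.0) for v in row] for row in image]


def best_shift(reference, moving, max_shift=2):
    """Integer (dy, dx) such that moving[r + dy][c + dx] best matches reference[r][c].

    Brute-force search that maximizes the sum of products over the overlap.
    """
    h, w = len(reference), len(reference[0])
    best, best_score = (0, 0), float("-inf")
    for dy in range(-max_shift, max_shift + 1):
        for dx in range(-max_shift, max_shift + 1):
            score = 0.0
            for r in range(h):
                for c in range(w):
                    rr, cc = r + dy, c + dx
                    if 0 <= rr < h and 0 <= cc < w:
                        score += reference[r][c] * moving[rr][cc]
            if score > best_score:
                best, best_score = (dy, dx), score
    return best


# Hypothetical 5x5 tiles: flat background of 1.0 plus one bright polony.
ref = [[1.0] * 5 for _ in range(5)]
ref[2][2] = 9.0
mov = [[1.0] * 5 for _ in range(5)]
mov[3][1] = 9.0  # same polony, one row down and one column left
print(best_shift(subtract_background(ref), subtract_background(mov)))  # (1, -1)
```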
[1056] The method of any one of the embodiments, wherein the sequencing system further comprises: a housing that encloses the first reconfigurable device, the first routing channels, the one or more DMA connections, the integrated circuit, and the first processor therein.
[1057] The method of any one of the embodiments, wherein the sequencing system further comprises: a housing that encloses the first reconfigurable logic device therein and the integrated circuit is external to the housing.
[1058] The method of any one of the embodiments, wherein the sequencing system further comprises: a power source that is configured to supply different power levels to the first reconfigurable logic device and the integrated circuit.
[1059] The method of any one of the embodiments, wherein the power supplied by the power source to the first reconfigurable logic device is higher than the power supplied to the integrated circuit.
[1060] The method of any one of the embodiments, wherein a maximum power output of the power source of the sequencing system is 2x, 3x, 5x, 8x, 10x, or 20x lower than the maximum power output of the power source of sequencers without the first reconfigurable logic device, the integrated circuit, or both.
[1061] The method of any one of the embodiments, wherein time consumption in performing a sequencing run and corresponding primary analysis thereof using the sequencing system is 2x, 3x, 5x, 8x, 10x, or 20x lower than the time consumption in performing the same sequencing run and corresponding primary analysis thereof using a sequencer without the first reconfigurable logic device, the integrated circuit, or both.
[1062] The method of any one of the embodiments, wherein a maximum power output of the power source to the sequencing system in performing a sequencing run and corresponding primary analysis thereof is less than 900 Watts, 800 Watts, 700 Watts, 650 Watts, 600 Watts, 550 Watts, or 500 Watts.
[1063] The sequencing system of any one of the embodiments, further comprising a power source configured to supply a first power level to the first reconfigurable logic device, the first power level is less than 500 Watts, 400 Watts, 350 Watts, or 300 Watts.
[1064] The method of any one of the embodiments, further comprising a power source configured to supply a second power level to the integrated circuit, the second power level is less than 450 Watts, 400 Watts, 350 Watts, or 300 Watts.
[1065] The method of any one of the embodiments, wherein the sequencing system lacks any graphics processing units (GPUs) or tensor processing units (TPUs).
[1066] The method of any one of the embodiments, wherein the first processor or the second processor comprises a CPU.
[1067] The method of any one of the embodiments, wherein the one or more hardware processors of the sequencing system comprise a CPU.
[1068] The method of any one of the embodiments, further comprising a heat dissipator configured to maintain a system temperature in a range from 0 degrees to 120 degrees Celsius or less than 120 degrees Celsius.
[1069] The method of any one of the embodiments, wherein each of the one or more operations is performed in one or more sequencing cycles.
[1070] The method of any one of the embodiments, wherein each of the one or more operations is performed for each of one or more z levels in each cycle of the one or more cycles.
[1071] The method of any one of the embodiments, wherein each of the one or more operations is performed for a single z level in a single sequencing cycle in less than 1000 ms, 800 ms, 500 ms, 400 ms, 300 ms, or 200 ms.
[1072] The method of any one of the embodiments, wherein each of the one or more operations is performed in parallel while the sequencing run is in progress.
[1073] The method of any one of the embodiments, wherein each of the one or more operations is performed in parallel within the time window in which sequencing, imaging, or both of a subsequent sequencing cycle are completed.
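The per-z-level time limits of [1071] and the overlap with subsequent cycles in [1072]-[1073] amount to a two-stage pipeline: cycle N is processed while cycle N+1 is imaged. For constant stage times a and b, such a pipeline finishes in a + b + (cycles - 1) * max(a, b), compared with cycles * (a + b) when run serially. The cycle count, imaging time, and number of z levels below are hypothetical.

```python
def processing_time_s(z_levels, ms_per_z):
    """Per-cycle processing time when each z level takes ms_per_z milliseconds."""
    return z_levels * ms_per_z / 1000.0


def run_time_s(cycles, imaging_s, processing_s, pipelined=True):
    """Total run time; pipelined means cycle N is processed while N+1 is imaged."""
    if not pipelined:
        return cycles * (imaging_s + processing_s)
    return imaging_s + processing_s + (cycles - 1) * max(imaging_s, processing_s)


proc = processing_time_s(20, 200)                  # 20 z levels at 200 ms -> 4.0 s
print(run_time_s(300, 60.0, proc))                 # 18004.0
print(run_time_s(300, 60.0, proc, pipelined=False))  # 19200.0
```

As long as the per-cycle processing time stays below the imaging time, analysis adds only one cycle's worth of processing to the end of the run.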
[1074] The method of any one of the embodiments, wherein the first plurality of flow cell images are obtained from multiple z levels covering at least part of an in situ sample.
[1075] The method of any one of the embodiments, wherein the first plurality of flow cell images are obtained from one or more color channels at each z level of the multiple z levels covering at least part of the in situ sample.
[1076] The method of any one of the embodiments, wherein the sequencing system further comprises: one or more image sensors configured to receive optical signals generated from sequencing reactions of a sample immobilized on a support.
[1077] The method of any one of the embodiments, further comprising: recording, by one or more processors of the sequencing system, sensor data generated in the sequencing system in one or more flow cycles; optionally processing, by the one or more processors of the sequencing system, the recorded sensor data; sending, by the one or more processors of the sequencing system, the recorded sensor data or the optionally processed data to the first reconfigurable logic device or the integrated circuit; receiving, by the one or more processors of the sequencing system, outcome from the first reconfigurable logic device or integrated circuit; and generating, by the one or more processors of the sequencing system, sequencing analysis results based on the received outcome.
[1078] The method of any one of the embodiments, further comprising: receiving, by one or more processors of the sequencing system, outcome from the first reconfigurable logic device or integrated circuit; and generating, by one or more processors of the sequencing system, sequencing analysis results based on the received outcome.
[1079] The method of any one of the embodiments, further comprising: receiving, by one or more processors of the sequencing system, outcome from the first reconfigurable logic device or integrated circuit; and generating, by one or more processors of the sequencing system, sequencing analysis results based on the received outcome.
[1080] The method of any one of the embodiments, wherein receiving, by one or more processors of the sequencing system, outcome from the first reconfigurable logic device or integrated circuit comprises: receiving, by one or more processors of the sequencing system, outcome from the first reconfigurable logic device or integrated circuit directly.
[1081] The method of any one of the embodiments, wherein receiving, by one or more processors of the sequencing system, outcome from the first reconfigurable logic device or integrated circuit comprises: accessing a memory device to receive the outcome saved at the memory device by the first reconfigurable logic device or integrated circuit.
[1082] The method of any one of the embodiments, wherein the sequencing analysis results comprise a data file in a predetermined data format.
[1083] The method of any one of the embodiments, wherein the sequencing analysis results comprise base calls of nucleotide bases in a sample immobilized on a support.
[1084] The method of any one of the embodiments, wherein the sequencing analysis results comprise quality measurements of base calls of nucleotide bases in a sample immobilized on a support.
[1085] The method of any one of the embodiments, wherein the sequencing analysis results comprise quality scores corresponding to base calls of nucleotide bases in a sample immobilized on a support.
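Base-call quality scores are conventionally reported on the Phred scale, Q = -10 log10(P(error)), and stored in FASTQ files as ASCII characters offset by 33. The embodiments do not state which scale or file format is used, so both are assumptions in this sketch.

```python
import math


def phred_quality(p_error):
    """Phred-scaled quality: Q = -10 * log10(P(error))."""
    if not 0.0 < p_error <= 1.0:
        raise ValueError("error probability must be in (0, 1]")
    return -10.0 * math.log10(p_error)


def error_probability(q):
    """Inverse of phred_quality."""
    return 10.0 ** (-q / 10.0)


def fastq_char(q):
    """Phred+33 ASCII encoding used by modern FASTQ files (assumed format)."""
    return chr(int(round(q)) + 33)


print(round(phred_quality(0.001)))       # 30
print(round(error_probability(20), 6))   # 0.01
print(fastq_char(30))                    # ?
```

Q30 therefore corresponds to an estimated 1-in-1000 chance that the base call is wrong.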
[1086] The method of any one of the embodiments, wherein the sequencing system further comprises: a sample immobilized on a support; and an optical system comprising: an illumination system; an objective lens; and the one or more image sensors.
[1087] The method of any one of the embodiments, wherein the optical system is configured to emit light to the sample and to collect optical signals emitted from the sample, thereby generating the first plurality of flow cell images.
[1088] The method of any one of the embodiments, wherein the first plurality of flow cell images are generated in one or more flow cycles of a sequencing run.
[1089] The method of any one of the embodiments, wherein the sample comprises an in situ sample.
[1090] The method of any one of the embodiments, wherein the sample comprises a three-dimensional sample.
[1091] The method of any one of the embodiments, wherein the support is comprised in a flow cell device.
[1092] The method of any one of the embodiments, wherein the neural network comprises a convolutional neural network.
[1093] The method of any one of the embodiments, wherein the neural network comprises a U-Net.
[1094] The method of any one of the embodiments, wherein the neural network has been trained using the reconfigurable logic device or the integrated circuit.
[1095] The method of any one of the embodiments, wherein the neural network is 3D.
[1096] The method of any one of the embodiments, wherein the neural network has been trained using training data comprising z-stacks of flow cell images of 3D samples.
[1097] The method of any one of the embodiments, wherein the neural network is 2D.
[1098] The method of any one of the embodiments, wherein the neural network has been trained using training data comprising flow cell images at multiple z-levels of 3D samples.
[1099] The method of any one of the embodiments, further comprising one or more of: generating, by the reconfigurable logic device, quality measurements of the base callings; and generating, by the reconfigurable logic device, a data output file based on the base callings.
[1100] The method of any one of the embodiments, further comprising one or more of: generating, by the integrated circuit, quality measurements of the base callings; and generating, by the integrated circuit, a data output file based on the base callings.
[1101] The method of any one of the embodiments, wherein the output data comprises base calls of nucleotide bases in a sample immobilized on a support.
[1102] The method of any one of the embodiments, wherein the output data comprises identification of base calling locations in two dimensions.
[1103] The method of any one of the embodiments, wherein the output data comprises identification of base calling locations in three dimensions.
[1104] The method of any one of the embodiments, wherein the first convolution comprises a 3D convolution with a convolution kernel.
[1105] The method of any one of the embodiments, wherein the convolutional kernel has at least four dimensions.
[1106] The method of any one of the embodiments, wherein the convolutional kernel is m x m x m x n, wherein m is an integer in a range from 3 to 30, wherein n is an integer.
[1107] The sequencing system of any one of the embodiments, wherein the first convolution comprises a 2D convolution with a convolution kernel.
[1108] The sequencing system of any one of the embodiments, wherein the convolutional kernel has at least three dimensions.
[1109] The sequencing system of any one of the embodiments, wherein the convolutional kernel is m x m x n, wherein m is an integer in a range from 3 to 30, wherein n is an integer.
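The kernel shapes in [1106] and [1109] fix the size of each convolution layer. The sketch below counts trainable parameters, assuming (as an interpretation, not stated in the source) that the last kernel dimension n is the number of filters and that each filter also spans all input channels. The function name and the `in_channels` and `bias` parameters are hypothetical.

```python
def conv_param_count(m, n, in_channels=1, spatial_dims=3, bias=True):
    """Trainable parameters of one convolution layer with n filters.

    With in_channels == 1 this matches an m x m x m x n kernel (3D) or an
    m x m x n kernel (2D); each filter adds one bias term when bias is True.
    """
    if not 3 <= m <= 30:
        raise ValueError("m must be an integer from 3 to 30")
    if spatial_dims not in (2, 3):
        raise ValueError("spatial_dims must be 2 or 3")
    weights = (m ** spatial_dims) * in_channels * n
    return weights + (n if bias else 0)
```

A 3 x 3 x 3 x 16 kernel on a single-channel z-stack therefore has 27 · 16 = 432 weights plus 16 biases, while the 2D 3 x 3 x 16 variant has only 160 parameters, which is one practical reason to consider the 2D network of [1097] when the hardware budget is tight.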
[1110] The method of any one of the embodiments, wherein n is an integer from 1 to 16384.
[1111] The method of any one of the embodiments, wherein the second convolution in (a) comprises a corresponding number of n, 2*n, 4*n, and 8*n filters in a first, second, third, and fourth repetition, respectively.
[1112] The method of any one of the embodiments, wherein the second convolution in (c) comprises a corresponding number of 2*n, 2*n, 4*n, and 8*n filters in the last repetition, the last minus one repetition, the last minus two repetition, and the last minus three repetition, respectively.
[1113] The method of any one of the embodiments, wherein the first and second resolutions are in 3D.
[1114] The method of any one of the embodiments, wherein n is in a range from 4 to 1024.
[1115] The method of any one of the embodiments, wherein the first plurality of flow cell images are from a single color channel.
[1116] The method of any one of the embodiments, wherein (e) performing a corresponding base calling for each of the determined polonies based on the second plurality of flow cell images comprises: performing a corresponding base calling for each of the determined polonies based on the second plurality of flow cell images and based on a fourth plurality of flow cell images, wherein the fourth plurality of flow cell images are predicted using a second neural network based on a third plurality of flow cell images.
[1117] The method of any one of the embodiments, wherein the third plurality of flow cell images are acquired from one or more color channels that are different from the single color channel, and wherein the third plurality of flow cell images comprises the first resolution.
[1118] The method of any one of the embodiments, wherein the fourth plurality of flow cell images comprises the second resolution.
[1119] The method of any one of the embodiments, wherein the first plurality of flow cell images are from one or more color channels.
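The filter counts in [1111] and [1112] can be tabulated directly. This sketch assumes exactly four down-sampling and four up-sampling repetitions (the number the embodiments enumerate) and reproduces the up-sampling counts exactly as listed, counting backwards from the last repetition, rather than mirroring the down-sampling path. The function name is hypothetical.

```python
def unet_filter_schedule(n):
    """Filters per repetition as listed in [1111] (down) and [1112] (up).

    Returns (down, up), both in execution order.
    """
    if not 1 <= n <= 16384:
        raise ValueError("n outside the 1 to 16384 range of [1110]")
    down = [n, 2 * n, 4 * n, 8 * n]  # first, second, third, fourth repetition
    # [1112] counts backwards from the last up-sampling repetition:
    # last -> 2n, last-1 -> 2n, last-2 -> 4n, last-3 -> 8n.
    up_from_last = [2 * n, 2 * n, 4 * n, 8 * n]
    up = up_from_last[::-1]  # execution order: 8n, 4n, 2n, 2n
    return down, up
```

With n = 32, for instance, the down-sampling path uses 32, 64, 128 and 256 filters and the up-sampling path uses 256, 128, 64 and 64.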
[1120] The method of any one of the embodiments, wherein the first plurality of flow cell images are of unbalanced nucleotide diversity.
[1121] The method of any one of the embodiments, wherein the first plurality of flow cell images comprises: an unbalanced diversity of nucleotide bases of A, G, C and T/U among concatemer molecules immobilized on the support in one or more flow cycles.
[1122] The method of any one of the embodiments, wherein the first plurality of flow cell images comprises: a balanced diversity of nucleotide bases of A, G, C and T/U among concatemer molecules immobilized on the support in one or more cycles.
[1123] The method of any one of the embodiments, wherein two or more different concatemer molecules among the concatemer molecules have different insert sequences.
[1124] The method of any one of the embodiments, wherein different insert sequences correspond to different target RNA molecules or target cDNA molecules.
[1125] The method of any one of the embodiments, wherein each location of the determined polonies corresponds to a location of the concatemer molecules.
[1126] The method of any one of the embodiments, wherein the first plurality of flow cell images comprises optical signals emitted from nucleotide reagents bound to a balanced diversity of nucleotide bases of A, G, C and T/U among the concatemer molecules immobilized on the support.
[1127] The method of any one of the embodiments, wherein the first plurality of flow cell images comprises optical signals emitted from nucleotide reagents bound to an unbalanced diversity of nucleotide bases of A, G, C and T/U among the concatemer molecules immobilized on the support in the one or more subsequent cycles.
[1128] The method of any one of the embodiments, wherein the unbalanced diversity of nucleotide bases of A, G, C and T/U among the concatemer molecules comprises: a percentage of (1) a number of one or more types of nucleotide bases to (2) a total number of bases is less than 20%, 15%, 10%, or 5% in the one or more cycles.
[1129] The method of any one of the embodiments, wherein the balanced diversity of nucleotide bases of A, G, C and T/U among the plurality of concatemer molecules comprises: a percentage of (1) a number of each type of nucleotide bases to (2) a total number of bases in the one or more cycles is more than 10%, 15%, or 20%.
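The percentage tests of [1128] and [1129] can be applied to the base calls observed across polonies in one cycle. This is a minimal sketch, assuming U is counted as T and that the caller picks one threshold from each list of alternatives; note that with overlapping thresholds (for example 20% and 10%) a cycle can satisfy both definitions at once. The function names are hypothetical.

```python
from collections import Counter


def base_fractions(bases):
    """Fraction of A, C, G and T among base calls from one cycle (U counted as T)."""
    counts = Counter(b.upper().replace("U", "T") for b in bases)
    total = sum(counts[b] for b in "ACGT")
    if total == 0:
        raise ValueError("no A/C/G/T/U calls")
    return {b: counts[b] / total for b in "ACGT"}


def is_unbalanced(fractions, threshold=0.20):
    """[1128]: one or more base types fall below the threshold."""
    return any(f < threshold for f in fractions.values())


def is_balanced(fractions, threshold=0.10):
    """[1129]: every base type exceeds the threshold."""
    return all(f > threshold for f in fractions.values())
```

For instance, a cycle whose calls are "AAAAAAAACG" has no T at all and is unbalanced under every threshold in [1128], while "ACGTACGTAC" (30/30/20/20%) is balanced at the 10% and 15% thresholds of [1129].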
[1130] The method of any one of the embodiments, wherein the cellular sample comprises overloaded concatemer molecules with a spatial density in a range of 10² to 10¹⁵ per mm².
[1131] The method of any one of the embodiments, wherein the cellular sample comprises overloaded concatemer molecules with a spatial density in a range of 10³ to 10¹⁰ per mm².
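To check a field of view against the density ranges of [1130] and [1131], the detected count is divided by the imaged area. The sketch assumes a rectangular field of view with square pixels of known edge length in µm; the function names are hypothetical.

```python
def density_per_mm2(count, width_px, height_px, pixel_size_um):
    """Areal density of molecules counted in one rectangular field of view."""
    width_mm = width_px * pixel_size_um / 1000.0
    height_mm = height_px * pixel_size_um / 1000.0
    return count / (width_mm * height_mm)


def in_range(density, low=1e2, high=1e15):
    """True when the density lies within [low, high] per mm^2 ([1130] defaults)."""
    return low <= density <= high
```

For example, one million molecules in a 2000 x 2000 pixel field at 0.5 µm per pixel (1 mm²) correspond to 10⁶ per mm², inside both claimed ranges.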
[1132] The method of any one of the embodiments, wherein the first resolution is in a range of 0.1 µm to 5 µm.
[1133] The method of any one of the embodiments, wherein the second resolution is in a range of 0.01 µm to 2 µm.
[1134] The method of any one of the embodiments, wherein the down-sampling factor is 2, 4, or 8.
[1135] The method of any one of the embodiments, wherein the up-sampling factor is 2, 4, or 8.
[1136] The method of any one of the embodiments, wherein one or more of operations (a) to (k) are performed while a sequencing run is being performed.
[1137] The method of any one of the embodiments, wherein the first plurality of flow cell images are acquired in sequencing cycles ranging from 1 to 500.
[1138] The method of any one of the embodiments, wherein the one or more cycles comprises a current cycle N.
[1139] The method of any one of the embodiments, wherein N is in a range from 1 to 500.
[1140] The method of any one of the embodiments, wherein the one or more cycles comprises a single cycle ranging from 1 to 500.
[1141] The method of any one of the embodiments, wherein the one or more cycles comprises multiple cycles ranging from 1 to 500.
[1142] The method of any one of the embodiments, wherein one or more of operations (a) to (j) are performed while the sequencing reactions in cycles subsequent to the current cycle N are yet to be performed or are currently being performed.
[1143] The method of any one of the embodiments, wherein the training data set of flow cell images comprises z-stacks of flow cell images taken at different z-locations.
[1144] The method of any one of the embodiments, wherein the z-axis is orthogonal to image planes of the flow cell images.
[1145] The method of any one of the embodiments, wherein performing the first convolution in one or more dimensions on the first plurality of flow cell images comprises: performing a first convolution in 3D on the first plurality of flow cell images, thereby generating a first convolution result.
[1146] The method of any one of the embodiments, wherein performing a second convolution in one or more dimensions on the first convolution result, thereby generating a second convolution result comprises: performing the second convolution in 3D on the first convolution result, thereby generating a second convolution result.
[1147] The method of any one of the embodiments, wherein performing the first convolution in one or more dimensions on the first plurality of flow cell images comprises: performing a first convolution in 2D on the first plurality of flow cell images, thereby generating a first convolution result.
[1148] The method of any one of the embodiments, wherein performing a second convolution in one or more dimensions on the first convolution result, thereby generating a second convolution result comprises: performing the second convolution in 2D on the first convolution result, thereby generating a second convolution result.
[1149] The method of any one of the embodiments, wherein repetitively performing up sampling operations comprises:
(3) performing an up sampling of the fourth convolution result by an up sampling factor thereby generating a first up-sampled result;
(4) concatenating the first up-sampled result in a current up-sampling repetition with the first down-sampled result in a previous down-sampling repetition, wherein the first up-sampled result has the same size as the first down-sampled result in the previous down-sampling repetition; and
(5) performing the second convolution in one or more dimensions on the first up-sampled result, thereby generating a fifth convolution result.
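Steps (3) and (4) of [1149] up-sample a feature map and concatenate it with a same-sized map from the down-sampling path. The disclosure does not name the up-sampling method, so the sketch below assumes nearest-neighbour interpolation and represents 2D feature maps as nested lists indexed (channel, row, column); a tensor library would normally be used instead. The function names are hypothetical.

```python
def upsample_nearest(fmap, factor):
    """Nearest-neighbour up-sampling of a (channels, rows, cols) nested list."""
    return [
        [[v for v in row for _ in range(factor)] for row in channel for _ in range(factor)]
        for channel in fmap
    ]


def spatial_shape(fmap):
    """(rows, cols) of a (channels, rows, cols) nested list."""
    return len(fmap[0]), len(fmap[0][0])


def concat_skip(upsampled, skip):
    """Step (4): stack the channels of the up-sampled map and the matching down-path map."""
    if spatial_shape(upsampled) != spatial_shape(skip):
        raise ValueError("up-sampled result and skip connection differ in size")
    return upsampled + skip  # concatenation along the channel axis
```

The size check mirrors the requirement in step (4) that the up-sampled result have the same size as the down-sampled result it is joined with; the concatenated map then feeds the second convolution of step (5).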
[1150] The method of any one of the embodiments, wherein the second resolution is at least 4, 6, or 8 times greater than the first resolution in all three dimensions.
[1151] The method of any one of the embodiments, further comprising:
(g) receiving, by the first reconfigurable logic device of a sequencing system, the second plurality of flow cell images from the integrated circuit;
(h) determining, by the first reconfigurable logic device, polonies from the second plurality of flow cell images;
(i) performing, by the first reconfigurable logic device, a corresponding base calling for each of the determined polonies based on the second plurality of flow cell images; and
(j) forwarding, by the first reconfigurable logic device, the second plurality of flow cell images, the determined polonies, the corresponding base callings, or a combination thereof to the first processor or one or more hardware processors of the sequencing system.
[1152] The method of any one of the embodiments, further comprising: forwarding, by the first reconfigurable logic device, the second plurality of flow cell images, the determined polonies, the corresponding base callings, or a combination thereof to the first processor or one or more hardware processors of the sequencing system.
[1153] The method of any one of the embodiments, further comprising: forwarding, by the integrated circuit, the second plurality of flow cell images, the corresponding base callings, or both to the first reconfigurable logic device, the first processor or one or more hardware processors of the sequencing system.
[1154] The method of any one of the embodiments, further comprising:
(k) determining, by the integrated circuit, polonies from the second plurality of flow cell images;
(l) performing, by the integrated circuit, a corresponding base call for each of the determined polonies based on the second plurality of flow cell images; and
(m) forwarding, by the integrated circuit, the second plurality of flow cell images, the corresponding base callings, or both to the first reconfigurable logic device, the first processor, or one or more hardware processors of the sequencing system.
[1155] The method of any one of the embodiments, further comprising: registering, by the reconfigurable logic device, the second plurality of flow cell images to a common coordinate system.
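Embodiment [1155] does not say how the images are registered. A minimal stand-in, assuming a pure integer translation between two same-sized images, is an exhaustive search over shifts that minimizes the mean squared difference on the overlapping pixels; production pipelines more often use phase correlation or sub-pixel methods. The function name and the `max_shift` parameter are hypothetical.

```python
def estimate_shift(reference, moving, max_shift=3):
    """Integer (dy, dx) such that reference[y][x] ~= moving[y - dy][x - dx].

    Exhaustively scores every shift in [-max_shift, max_shift]^2 by the mean
    squared difference over the overlapping pixels and returns the best one.
    """
    h, w = len(reference), len(reference[0])
    best_shift, best_err = (0, 0), float("inf")
    for dy in range(-max_shift, max_shift + 1):
        for dx in range(-max_shift, max_shift + 1):
            err, overlap = 0.0, 0
            # Only pixels whose shifted source lies inside `moving` are compared.
            for y in range(max(0, dy), min(h, h + dy)):
                for x in range(max(0, dx), min(w, w + dx)):
                    diff = reference[y][x] - moving[y - dy][x - dx]
                    err += diff * diff
                    overlap += 1
            if overlap and err / overlap < best_err:
                best_shift, best_err = (dy, dx), err / overlap
    return best_shift
```

Choosing one image (for example, from the first cycle) as the reference and translating every other image by its estimated shift places all second-resolution images in that image's coordinate system.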
[1156] The method of any one of the embodiments, wherein the first plurality of flow cell images are acquired from a single color channel of the sequencing system.
[1157] The method of any one of the embodiments, wherein (d) or (i) determining polonies from the second plurality of flow cell images comprises: generating a 3D polony map comprising spatial locations of polonies based on the determined polonies.
[1158] The method of any one of the embodiments, wherein generating a 3D polony map comprising spatial locations of polonies based on the determined polonies further comprises: deleting duplicate polonies from the determined polonies, wherein the duplicate polonies are out-of-focus.
[1159] The method of any one of the embodiments, wherein determining polonies from the second plurality of flow cell images comprises: superimposing the second plurality of flow cell images with corresponding cell staining images; and generating the polony map by only including polonies that are within cell boundaries in the corresponding cell staining images.
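The two filters of [1158] and [1159] can be sketched as follows, under stated assumptions: each detection is an (x, y, z, intensity) tuple; an out-of-focus duplicate is a dimmer detection within `xy_radius` pixels and `z_window` z-levels of a brighter one (brightness standing in for focus); and the cell staining image has already been segmented into a mask whose non-zero pixels lie inside cells. All names and defaults are hypothetical.

```python
import math


def remove_out_of_focus_duplicates(polonies, xy_radius=1.0, z_window=2):
    """[1158]: keep the brightest detection among those sharing an (x, y) position.

    A detection is dropped when a brighter kept detection lies within xy_radius
    in the image plane and within z_window z-levels.
    """
    kept = []
    for p in sorted(polonies, key=lambda q: q[3], reverse=True):
        duplicate = any(
            math.hypot(p[0] - k[0], p[1] - k[1]) <= xy_radius and abs(p[2] - k[2]) <= z_window
            for k in kept
        )
        if not duplicate:
            kept.append(p)
    return kept


def within_cells(polonies, cell_mask):
    """[1159]: keep polonies whose (x, y) falls on a non-zero cell-mask pixel."""
    rows, cols = len(cell_mask), len(cell_mask[0])
    inside = []
    for p in polonies:
        col, row = int(round(p[0])), int(round(p[1]))
        if 0 <= row < rows and 0 <= col < cols and cell_mask[row][col]:
            inside.append(p)
    return inside
```

The z window keeps genuinely distinct polonies stacked far apart in depth, which matters for the three-dimensional samples of [1090].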
[1160] The method of any one of the embodiments, wherein the support comprises a glass or plastic substrate.
[1161] The method of any one of the embodiments, further comprising: providing the cellular sample harboring a plurality of RNA molecules which comprises the first target RNA molecule and the second target RNA molecule.
[1162] The method of any one of the embodiments, further comprising: generating inside the cellular sample a plurality of cDNA molecules which include a first target cDNA molecule that corresponds to the first target RNA molecule and a second target cDNA molecule that corresponds to the second target RNA molecule.
[1163] The method of any one of the embodiments, further comprising: contacting the plurality of cDNA molecules in the cellular sample with a plurality of target-specific padlock probes which includes at least a first plurality of first target-specific padlock probes and a second plurality of second target-specific padlock probes.
[1164] The method of any one of the embodiments, further comprising: contacting the plurality of RNA molecules in the cellular sample with a plurality of target-specific padlock probes which includes at least a first plurality of target-specific padlock probes and a second plurality of target-specific padlock probes.
[1165] The method of any one of the embodiments, wherein individual padlock probes in the first plurality of first target-specific padlock probes comprise: first and second terminal regions, wherein the first terminal region selectively hybridizes to a first region of the first target cDNA molecule or the first target RNA molecule, and the second terminal region selectively hybridizes to a second region of the first target cDNA molecule or the first target RNA molecule.
[1166] The method of any one of the embodiments, wherein contacting the plurality of RNA molecules in the cellular sample with the plurality of target-specific padlock probes comprises: hybridizing the first and second terminal regions of the first target-specific padlock probes to proximal positions on the first target cDNA molecule or the first target RNA molecule to form a circularized first target-specific padlock probe having a nick or gap between the hybridized first and second terminal regions.
[1167] The method of any one of the embodiments, wherein the first target-specific padlock probe comprises a first target barcode sequence that corresponds to and uniquely identifies the first target cDNA sequence or the first target RNA sequence.
[1168] The method of any one of the embodiments, wherein the first target-specific padlock probe comprises a first target barcode sequence that is located adjacent to one of the regions of the first target-specific padlock probe that selectively hybridizes to the first target cDNA molecule or the first target RNA sequence.
[1169] The method of any one of the embodiments, wherein the first target-specific padlock probe comprises at least one universal adaptor sequence.
[1170] The method of any one of the embodiments, wherein the first target-specific padlock probe comprises a universal primer binding site for a rolling circle amplification primer or a complementary sequence thereof.
[1171] The method of any one of the embodiments, wherein the first target-specific padlock probe comprises a universal compaction oligonucleotide binding site or a complementary sequence thereof.
[1172] The method of any one of the embodiments, further comprising: closing the nick or gap in the at least first and second circularized target-specific padlock probes by conducting an enzymatic reaction, thereby generating at least a first covalently closed circular padlock probe and a second covalently closed circular padlock probe inside the cellular sample.
[1173] The method of any one of the embodiments, further comprising: conducting a rolling circle amplification reaction inside the cellular sample using the first and second covalently closed circular padlock probes as template molecules, thereby generating a plurality of concatemer molecules including at least the first concatemer molecule that corresponds to the first target RNA molecule, and the second concatemer molecule that corresponds to the second target RNA molecule.
[1174] The method of any one of the embodiments, wherein the first concatemer comprises: tandem repeat units of: a first target barcode sequence that uniquely identifies the first target RNA or the first target cDNA sequence, a first insert sequence that corresponds to the first target RNA or the first target cDNA, and a first sequencing primer binding site or a complementary sequence thereof.
[1175] The method of any one of the embodiments, wherein the first concatemer further comprises: a universal binding site for an amplification primer or a complementary sequence thereof, and a universal binding site for a compaction oligonucleotide or a complementary sequence thereof.
[1176] The method of any one of the embodiments, wherein the second concatemer comprises: tandem repeat units of: a second target barcode sequence that uniquely identifies the second target RNA or the second target cDNA sequence, a second insert sequence that corresponds to the second target RNA or the second target cDNA, and a second sequencing primer binding site or a complementary sequence thereof.
[1177] The method of any one of the embodiments, wherein the second concatemer further comprises: a universal binding site for an amplification primer or a complementary sequence thereof, and a universal binding site for a compaction oligonucleotide or a complementary sequence thereof.
[1178] The method of any one of the embodiments, wherein conducting the one or more cycles of sequencing reactions comprises: contacting the plurality of concatemer molecules inside the cellular sample with (i) a plurality of universal sequencing primers, (ii) a plurality of sequencing polymerases, and (iii) a plurality of nucleotide reagents, under a condition suitable for hybridizing the plurality of universal sequencing primers to their respective universal sequencing primer binding sites on the concatemers.
[1179] The method of any one of the embodiments, wherein the plurality of nucleotide reagents comprise: multivalent molecules, nucleotides, nucleotide analogs, or their combinations.
[1180] The method of any one of the embodiments, wherein individual nucleotides or nucleotide analogs are detectably labeled or non-labeled.
[1181] The method of any one of the embodiments, wherein the detectably labeled individual nucleotides or nucleotide analogs comprise a different detectable color label that corresponds with each different type of nucleotide base of A, G, C, and T/U.
[1182] The method of any one of the embodiments, wherein an individual multivalent molecule comprises a core attached with multiple nucleotide arms and each arm of the individual multivalent molecule comprises the same type of nucleotide base.
[1183] The method of any one of the embodiments, wherein generating the first plurality of flow cell images comprises: in each cycle, imaging, by an optical system, optical color signals emitted from the nucleotide reagents that are bound to the plurality of concatemer molecules.
[1184] The method of any one of the embodiments, wherein the first plurality of flow cell images comprises optical color signals emitted from the nucleotide reagents that are bound to the plurality of concatemer molecules.
[1185] The method of any one of the embodiments, wherein conducting the one or more cycles of sequencing reactions comprises:
[1186] sequencing only the first target barcode sequence region of the first concatemer, thereby generating the first sequencing read product.
[1187] The method of any one of the embodiments, wherein conducting the one or more cycles of sequencing reactions comprises: sequencing the first target barcode sequence region and at least a portion of the first insert sequence of the first concatemer, thereby generating the first sequencing read product.
[1188] The method of any one of the embodiments, wherein conducting the one or more cycles of sequencing reactions comprises: sequencing only the second target barcode sequence region of the second concatemer, thereby generating the second sequencing read product.
[1189] The method of any one of the embodiments, wherein conducting the one or more cycles of sequencing reactions comprises: sequencing the second target barcode sequence region and at least a portion of the second insert sequence of the second concatemer, thereby generating the second sequencing read product.
[1190] The method of any one of the embodiments, further comprising: removing a first sequencing read product from the first concatemer molecule and retaining the first concatemer molecule in the cellular sample, and removing a second sequencing read product from the second concatemer molecule and retaining the second concatemer molecule in the cellular sample.
[1191] The sequencing system of any one of the embodiments, further comprising: reiteratively sequencing the plurality of concatemers by repeating the following operations at least once: generating the first plurality of flow cell images of a cellular sample immobilized on a support by conducting one or more cycles of sequencing reactions, thereby generating the first sequencing read product and the second sequencing read product, the cellular sample comprising a plurality of concatemer molecules therewithin, wherein a first concatemer molecule of the plurality of concatemer molecules corresponds to a first target RNA molecule of the cellular sample, and a second concatemer molecule of the plurality of concatemer molecules corresponds to a second target RNA molecule of the cellular sample, wherein the first plurality of flow cell images; and removing a first sequencing read product from the first concatemer molecule and retaining the first concatemer molecule in the cellular sample, and removing a second sequencing read product from the second concatemer molecule and retaining the second concatemer molecule in the cellular sample.
[1192] The method of any one of the embodiments, wherein the first sequencing read product comprises some or all of: a first target barcode sequence in one or more tandem units of the first concatemer molecule; a first insert sequence in one or more tandem units of the first concatemer molecule; or their combinations.
[1193] The method of any one of the embodiments, further comprising: confirming presence of the first target RNA molecule, the second target RNA molecule, or both molecules in the cellular sample based on the performed base calling of the second plurality of flow cell images at the base calling locations in the base calling template.
[1194] The method of any one of the embodiments, further comprising: generating, by the sequencing system, the second plurality of flow cell images of the cellular sample immobilized on the support by conducting subsequent cycles of sequencing reactions after the one or more cycles.
[1195] The method of any one of the embodiments, wherein generating the first plurality of flow cell images of the cellular sample immobilized on the support comprises: sequencing at least the first concatemer inside the cellular sample under a condition that inhibits sequencing the second concatemer.
[1196] The method of any one of the embodiments, wherein sequencing at least the first concatemer inside the cellular sample comprises: generating a plurality of first sequencing read products, and wherein the sequences of the first sequencing read products are aligned with a first target reference sequence to confirm presence of the first target RNA in the cellular sample.
[1197] The method of any one of the embodiments, wherein generating the first plurality of flow cell images of the cellular sample immobilized on the support comprises: sequencing at least the second concatemer inside the cellular sample under a condition that inhibits sequencing the first concatemer.
[1198] The method of any one of the embodiments, wherein sequencing at least the second concatemer inside the cellular sample comprises: generating a plurality of second sequencing read products, and wherein sequences of the second sequencing read products are aligned with a second target reference sequence to confirm presence of the second target RNA in the cellular sample.
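Embodiments [1193] to [1198] confirm a target's presence from its sequencing read products. The sketch below substitutes a simplified check for full alignment: each read's leading bases are compared with a hypothetical codebook of target barcodes by Hamming distance, ambiguous matches are discarded, and a target counts as present once enough reads are assigned to it. The codebook contents, thresholds, and function names are all assumptions for illustration.

```python
from collections import Counter


def hamming(a, b):
    """Number of mismatched positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def assign_target(read, codebook, max_mismatches=1):
    """Return the target whose barcode best matches the start of `read`, or None."""
    scored = []
    for target, barcode in codebook.items():
        if len(read) >= len(barcode):
            scored.append((hamming(read[:len(barcode)], barcode), target))
    if not scored:
        return None
    scored.sort()
    best_distance, best_target = scored[0]
    if best_distance > max_mismatches:
        return None
    if len(scored) > 1 and scored[1][0] == best_distance:
        return None  # ambiguous: two barcodes are equally close
    return best_target


def confirm_presence(reads, codebook, min_reads=2, max_mismatches=1):
    """Targets supported by at least `min_reads` unambiguously assigned reads."""
    assigned = (assign_target(r, codebook, max_mismatches) for r in reads)
    counts = Counter(t for t in assigned if t is not None)
    return {t for t, c in counts.items() if c >= min_reads}
```

Requiring several supporting reads rather than one is a common way to keep a single misread barcode from reporting a target that is absent.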
[1199] A computer-implemented method comprising:
(i) generating, by a sequencing system, a first plurality of flow cell images of a sample immobilized on a support by conducting one or more cycles of sequencing reactions, wherein the first plurality of flow cell images are acquired with a first resolution;
(ii) providing, by a processor or a first reconfigurable logic device, the first plurality of flow cell images as an input to a neural network, wherein the neural network is pre-trained using a training data set of training flow cell images;
(iii) predicting, by the first reconfigurable logic device or an integrated circuit, a second plurality of flow cell images using the neural network, wherein each of the second plurality of flow cell images corresponds to a corresponding image of the first plurality of flow cell images with a second resolution, and wherein the second resolution is at least 2 to 32 times greater than the first resolution in one or more spatial dimensions;
(iv) determining, by the processor, the first reconfigurable logic device, or the integrated circuit, polonies from the second plurality of flow cell images; and
(v) performing, by the processor, the first reconfigurable logic device, or the integrated circuit, a corresponding base calling for each of the determined polonies based on the second plurality of flow cell images.
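Steps (iii) to (v) of [1199] can be traced end to end on toy data. In this sketch the trained network of step (iii) is replaced by plain nearest-neighbour upscaling (explicitly not a neural network), polonies in step (iv) are connected components of the summed four-channel image above a threshold, and step (v) calls the channel with the largest summed intensity over each polony. The four-channel A/C/G/T layout, the threshold, and all function names are assumptions.

```python
from collections import deque

BASES = "ACGT"


def upscale_nearest(image, factor):
    """Stand-in for the trained network of step (iii): nearest-neighbour upscaling."""
    return [[v for v in row for _ in range(factor)] for row in image for _ in range(factor)]


def find_polonies(channels, threshold):
    """Step (iv): 4-connected components of the summed image above threshold."""
    h, w = len(channels[0]), len(channels[0][0])
    total = [[sum(ch[y][x] for ch in channels) for x in range(w)] for y in range(h)]
    seen = [[False] * w for _ in range(h)]
    polonies = []
    for y in range(h):
        for x in range(w):
            if seen[y][x] or total[y][x] <= threshold:
                continue
            pixels, queue = [], deque([(y, x)])
            seen[y][x] = True
            while queue:
                cy, cx = queue.popleft()
                pixels.append((cy, cx))
                for ny, nx in ((cy + 1, cx), (cy - 1, cx), (cy, cx + 1), (cy, cx - 1)):
                    if 0 <= ny < h and 0 <= nx < w and not seen[ny][nx] and total[ny][nx] > threshold:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            polonies.append(pixels)
    return polonies


def call_bases(channels, polonies):
    """Step (v): sum each channel over a polony's pixels and pick the brightest."""
    calls = []
    for pixels in polonies:
        sums = [sum(ch[y][x] for y, x in pixels) for ch in channels]
        calls.append(BASES[sums.index(max(sums))])
    return calls


def run_pipeline(low_res_channels, factor=2, threshold=5):
    """Return (polony centroids in high-resolution pixels, base calls)."""
    high_res = [upscale_nearest(ch, factor) for ch in low_res_channels]
    polonies = find_polonies(high_res, threshold)
    centroids = [(sum(y for y, _ in p) / len(p), sum(x for _, x in p) / len(p)) for p in polonies]
    return centroids, call_bases(high_res, polonies)
```

Swapping `upscale_nearest` for a trained model is the only change needed to match the structure of [1199]; the detection and calling steps operate on whatever second-resolution images they receive.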
[1200] A computer-implemented method comprising: (A) generating, by a sequencing system, a first plurality of flow cell images of a sample immobilized on a support by conducting one or more cycles of sequencing reactions, the sample comprising concatemer molecules therewithin, wherein the first plurality of flow cell images are acquired at a z-stack of different z-locations or a single z location with a first resolution;
(B) predicting, by the first reconfigurable logic device or an integrated circuit, a second plurality of flow cell images from the first plurality of flow cell images using a neural network, wherein the neural network is pre-trained using a training data set of training flow cell images, comprising: performing, by the first reconfigurable logic device or integrated circuit, a first convolution in one or more dimensions on the first plurality of flow cell images, thereby generating a first convolution result; repetitively performing, for one or more times, down-sampling operations comprising:
(a) performing, by the first reconfigurable logic device or integrated circuit, a second convolution in one or more dimensions on the first convolution result, thereby generating a second convolution result; and
(b) performing, by the first reconfigurable logic device or integrated circuit, a down sampling of the second convolution result by a down sampling factor thereby generating a first down-sampled result, wherein in each repetition, the second convolution comprises a corresponding number of filters, thereby generating a third convolution result; performing, by the first reconfigurable logic device or integrated circuit, the second convolution in one or more dimensions on the third convolution result, thereby generating a fourth convolution result; repetitively performing, for one or more times, up sampling operations comprising:
(c) performing, by the first reconfigurable logic device or integrated circuit, an up sampling of the fourth convolution result by an up sampling factor thereby generating a first up-sampled result; and
(d) performing, by the first reconfigurable logic device or integrated circuit, the second convolution in one or more dimensions on the first up-sampled result, thereby generating a fifth convolution result, wherein in each repetition, the second convolution comprises a corresponding number of filters, thereby generating a sixth convolution result; performing, by the first reconfigurable logic device or integrated circuit, the first convolution in one or more dimensions on the sixth convolution result, thereby generating a seventh convolution result; predicting, by the first reconfigurable logic device or integrated circuit, a second plurality of flow cell images based on the seventh convolution result, wherein each of the second plurality of flow cell images corresponds to the corresponding flow cell image of the first plurality of flow cell images with a second resolution that is 2, 4, 6, 8, 10, 12, or 16 times greater than the first resolution in one or more spatial dimensions; determining, by the processor, the first reconfigurable logic device, or the integrated circuit, polonies from the second plurality of flow cell images; and performing, by the processor, the first reconfigurable logic device, or the integrated circuit, a corresponding base call for each of the determined polonies based on the second plurality of flow cell images.
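The repeated down-sampling (b) and up-sampling (c) of [1200] only line up when every intermediate size divides evenly by the sampling factor, since each up-sampled map must match the size of the down-path map it is concatenated with in [1149]. The sketch below traces spatial shapes through the two paths under that assumption; it tracks sizes only, not filters, and the function name is hypothetical.

```python
def unet_spatial_trace(shape, repetitions=4, factor=2):
    """Spatial shape at each level of the down- and up-sampling paths.

    Returns (down, up): `down` starts with the input shape and adds one entry per
    down-sampling repetition; `up` lists the shape after each up-sampling
    repetition, each equal to the same-level down-path shape.
    """
    if factor not in (2, 4, 8):
        raise ValueError("down/up-sampling factor must be 2, 4, or 8")
    current = tuple(shape)
    down = [current]
    for _ in range(repetitions):
        if any(s % factor for s in current):
            raise ValueError(f"shape {current} is not divisible by {factor}")
        current = tuple(s // factor for s in current)
        down.append(current)
    up = []
    for skip in reversed(down[:-1]):
        current = tuple(s * factor for s in current)
        if current != skip:  # cannot happen once the divisibility checks pass
            raise RuntimeError("up-sampled shape does not match skip connection")
        up.append(current)
    return down, up
```

For example, a 16 x 64 x 64 z-stack with two repetitions at factor 2 passes through 8 x 32 x 32 and 4 x 16 x 16 and returns to its input size, whereas a 10-slice stack fails at the second repetition and would need padding or cropping first.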
[1201] The computer-implemented method of any one of the embodiments, wherein the sample comprises concatemer molecules therewithin, and wherein the first plurality of flow cell images are acquired at a z-stack of different z-locations.
[1202] The computer-implemented method of any one of the embodiments, wherein the first plurality of flow cell images are acquired at a single z-location.
[1203] The computer-implemented method of any one of the embodiments, wherein the processor comprises one or more of: a CPU, a GPU, a TPU, an NPU, an FPGA, and an AI chip.
[1204] The computer-implemented method of any one of the embodiments, wherein the processor comprises one or more of: a CPU, an NPU, and an FPGA.
[1205] The computer-implemented method of any one of the embodiments, wherein the first reconfigurable logic device comprises one or more FPGA units.
[1206] The computer-implemented method of any one of the embodiments, wherein the integrated circuit comprises one or more NPUs.
[1207] The computer-implemented method of any one of the embodiments, wherein the integrated circuit is in data communication with the first reconfigurable logic device.
[1208] The computer-implemented method of any one of the embodiments, wherein the convolutional network comprises a U-Net.
[1209] The computer-implemented method of any one of the embodiments, wherein the first convolution comprises a 3D convolution with a convolution kernel.
[1210] The computer-implemented method of any one of the embodiments, wherein the convolution kernel has at least four dimensions.
[1211] The computer-implemented method of any one of the embodiments, wherein the convolution kernel is m x m x m x n, wherein m is an integer in a range from 3 to 30, and wherein n is an integer.
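The 3D convolution with a convolution kernel named in [1209] can be illustrated with a naive single-filter implementation. This is a sketch under stated assumptions: one input channel, one m x m x m filter with odd m, zero "same" padding, and the cross-correlation convention used by most deep-learning frameworks. An n-filter layer would apply n such kernels, which is one reading (an assumption) of the m x m x m x n kernel. The name `conv3d_same` is hypothetical.

```python
def conv3d_same(volume, kernel):
    """Naive single-channel 3D convolution (cross-correlation) with zero padding.

    volume: nested lists indexed [z][y][x]
    kernel: nested lists of shape m x m x m with odd m
    Returns a volume with the same shape as the input.
    """
    m = len(kernel)
    if m % 2 == 0 or any(
        len(plane) != m or any(len(row) != m for row in plane) for plane in kernel
    ):
        raise ValueError("kernel must be m x m x m with odd m")
    r = m // 2  # padding on each side keeps the output the same size
    nz, ny, nx = len(volume), len(volume[0]), len(volume[0][0])
    out = [[[0.0] * nx for _ in range(ny)] for _ in range(nz)]
    for z in range(nz):
        for y in range(ny):
            for x in range(nx):
                acc = 0.0
                for dz in range(m):
                    zz = z + dz - r
                    if not 0 <= zz < nz:
                        continue  # zero padding outside the volume
                    for dy in range(m):
                        yy = y + dy - r
                        if not 0 <= yy < ny:
                            continue
                        for dx in range(m):
                            xx = x + dx - r
                            if 0 <= xx < nx:
                                acc += volume[zz][yy][xx] * kernel[dz][dy][dx]
                out[z][y][x] = acc
    return out
```

With a 3 x 3 x 3 volume of ones and a 3 x 3 x 3 kernel of ones, the centre voxel sums all 27 neighbours while a corner voxel sees only 8, which shows how the zero padding treats the edges of a z-stack.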
[1212] The sequencing system of any one of the embodiments, wherein the first convolution comprises a 2D convolution with a convolution kernel.
[1213] The sequencing system of any one of the embodiments, wherein the convolution kernel has at least three dimensions.
[1214] The sequencing system of any one of the embodiments, wherein the convolution kernel is m x m x n, wherein m is an integer in a range from 3 to 30, and wherein n is an integer.
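The cost difference between the 2D (m x m x n) and 3D (m x m x m x n) kernels can be made concrete by counting trainable parameters. The sketch assumes that n is the number of filters, that each filter spans all input channels, and that each filter carries one bias term; these are interpretations, not statements from the embodiments. The name `conv_params` is hypothetical.

```python
def conv_params(m, n_filters, spatial_dims, in_channels=1, bias=True):
    """Trainable parameters in one convolution layer.

    Each of the n_filters kernels spans m pixels along every spatial axis
    (2 axes for m x m, 3 axes for m x m x m) and all in_channels input
    channels; one optional bias is added per filter.
    """
    if spatial_dims not in (2, 3):
        raise ValueError("spatial_dims must be 2 (m x m) or 3 (m x m x m)")
    weights = (m ** spatial_dims) * in_channels * n_filters
    return weights + (n_filters if bias else 0)
```

For m = 3 and n = 16 with a single input channel, the 2D layer has 160 parameters and the 3D layer 448; the weight count grows by a factor of m when the z-axis is added.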
[1215] The sequencing system of any one of the embodiments, wherein the neural network is a 3D convolutional neural network.
[1216] The sequencing system of any one of the embodiments, wherein the neural network is a 2D convolutional neural network.
[1217] The computer-implemented method of any one of the embodiments, wherein n is an integer from 1 to 16384.
[1218] The computer-implemented method of any one of the embodiments, wherein the second convolution in (a) comprises a corresponding number of n, 2*n, 4*n, and 8*n filters in a first, second, third, and fourth repetition, respectively.
[1219] The computer-implemented method of any one of the embodiments, wherein the second convolution in (c) comprises a corresponding number of 2*n, 2*n, 4*n, and 8*n filters in the last, second-to-last, third-to-last, and fourth-to-last repetitions, respectively.
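The filter counts of [1218] and [1219] can be laid out per repetition as below. The sketch assumes exactly four down-sampling and four up-sampling repetitions and simply restates the listed counts, reversing [1219] (which counts back from the last repetition) into execution order. The name `filter_schedule` is hypothetical.

```python
def filter_schedule(n):
    """Filters per repetition as listed in [1218] (encoder) and [1219] (decoder)."""
    if not 1 <= n <= 16384:
        raise ValueError("n is described as an integer from 1 to 16384")
    # [1218]: first, second, third and fourth down-sampling repetition
    encoder = [n, 2 * n, 4 * n, 8 * n]
    # [1219] lists the decoder backwards: last = 2n, last-1 = 2n,
    # last-2 = 4n, last-3 = 8n
    decoder_last_to_first = [2 * n, 2 * n, 4 * n, 8 * n]
    decoder = list(reversed(decoder_last_to_first))  # execution order
    return encoder, decoder
```

With n = 16, the encoder uses 16, 32, 64 and 128 filters and the decoder, in execution order, 128, 64, 32 and 32.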
[1220] The computer-implemented method of any one of the embodiments, wherein the first and second resolutions are in 3D.
[1221] The computer-implemented method of any one of the embodiments, wherein n is in a range from 4 to 1024.
[1222] The computer-implemented method of any one of the embodiments, wherein the first plurality of flow cell images are from a single color channel.
[1223] The computer-implemented method of any one of the embodiments, wherein (v) performing, by the processor, a corresponding base calling for each of the determined polonies based on the second plurality of flow cell images comprises: performing, by the processor, the corresponding base calling for each of the determined polonies based on the second plurality of flow cell images and further based on a fourth plurality of flow cell images, wherein the fourth plurality of flow cell images are predicted using a second neural network based on a third plurality of flow cell images.
[1224] The computer-implemented method of any one of the embodiments, wherein the third plurality of flow cell images are acquired from one or more color channels that are different from the single color channel, and wherein the third plurality of flow cell images have the first resolution.
[1225] The computer-implemented method of any one of the embodiments, wherein the fourth plurality of flow cell images comprises the second resolution.
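Base calling that combines predicted images from several color channels, as in [1223] to [1225], can be sketched with a brightest-channel rule. Everything here is an assumption for illustration: four channels with one base per channel, the hypothetical channel names in `CHANNEL_TO_BASE`, an argmax decision, and a "purity" score (the brightest channel's share of the summed intensity). Real chemistries may use fewer channels or a different decoding scheme.

```python
# Hypothetical mapping: one color channel per base (T/U reported as "T").
CHANNEL_TO_BASE = {"ch1": "A", "ch2": "C", "ch3": "G", "ch4": "T"}


def call_bases(channel_images, polonies):
    """Call one base per polony from co-registered, same-resolution images.

    channel_images: dict of channel name -> 2D list of intensities [y][x]
    polonies: list of (y, x) pixel coordinates of determined polonies
    Returns a list of (base, purity) tuples, where purity is the brightest
    channel's share of the summed intensity at that polony.
    """
    calls = []
    for y, x in polonies:
        intensities = {ch: channel_images[ch][y][x] for ch in CHANNEL_TO_BASE}
        brightest = max(intensities, key=intensities.get)
        total = sum(intensities.values())
        purity = intensities[brightest] / total if total > 0 else 0.0
        calls.append((CHANNEL_TO_BASE[brightest], purity))
    return calls
```

A low purity flags polonies whose signal is shared between channels, for example overlapping polonies, which are the cases the higher second resolution is meant to separate.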
[1226] The computer-implemented method of any one of the embodiments, wherein the first plurality of flow cell images are from one or more color channels.
[1227] The computer-implemented method of any one of the embodiments, wherein the first plurality of flow cell images are of unbalanced nucleotide diversity.
[1228] The computer-implemented method of any one of the embodiments, wherein the first plurality of flow cell images comprises: an unbalanced diversity of nucleotide bases of A, G, C and T/U among concatemer molecules immobilized on the support in one or more cycles.
[1229] The computer-implemented method of any one of the embodiments, wherein the first plurality of flow cell images comprises: a balanced diversity of nucleotide bases of A, G, C and T/U among concatemer molecules immobilized on the support in one or more cycles.
[1230] The computer-implemented method of any one of the embodiments, wherein two or more different concatemer molecules among the concatemer molecules have different insert sequences.
[1231] The computer-implemented method of any one of the embodiments, wherein different insert sequences correspond to different target RNA molecules or target cDNA molecules.
[1232] The computer-implemented method of any one of the embodiments, wherein each location of the determined polonies corresponds to a location of the concatemer molecules.
[1233] The computer-implemented method of any one of the embodiments, wherein the first plurality of flow cell images comprises optical signals emitted from nucleotide reagents bound to a balanced diversity of nucleotide bases of A, G, C and T/U among the concatemer molecules immobilized on the support.
[1234] The computer-implemented method of any one of the embodiments, wherein the first plurality of flow cell images comprises optical signals emitted from nucleotide reagents bound to an unbalanced diversity of nucleotide bases of A, G, C and T/U among the concatemer molecules immobilized on the support in the one or more subsequent cycles.
[1235] The computer-implemented method of any one of the embodiments, wherein the unbalanced diversity of nucleotide bases of A, G, C and T/U among the concatemer molecules comprises: a percentage of (1) a number of one or more types of nucleotide bases to (2) a total number of bases is less than 20%, 15%, 10%, or 5% in the one or more cycles.
[1236] The computer-implemented method of any one of the embodiments, wherein the balanced diversity of nucleotide bases of A, G, C and T/U among the plurality of concatemer molecules comprises: a percentage of (1) a number of each type of nucleotide bases to (2) a total number of bases in the one or more cycles is more than 10%, 15%, or 20%.
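The percentage tests in [1235] and [1236] can be computed from the bases read across the concatemer molecules in a single cycle. The sketch assumes the bases of one cycle are given as a string, treats U as T, and reads [1235] as "at least one base type falls below the threshold"; the thresholds are picked from the listed values. The function names are hypothetical.

```python
from collections import Counter


def base_fractions(cycle_bases):
    """Fraction of A, C, G and T/U among the bases read in one cycle."""
    normalized = cycle_bases.upper().replace("U", "T")
    counts = Counter(normalized)
    total = sum(counts[b] for b in "ACGT")
    if total == 0:
        raise ValueError("no A/C/G/T/U bases in this cycle")
    return {b: counts[b] / total for b in "ACGT"}


def is_balanced(cycle_bases, min_fraction=0.10):
    """Every base type exceeds min_fraction of the total (cf. [1236])."""
    return all(f > min_fraction for f in base_fractions(cycle_bases).values())


def is_unbalanced(cycle_bases, max_fraction=0.20):
    """At least one base type is below max_fraction of the total (cf. [1235])."""
    return any(f < max_fraction for f in base_fractions(cycle_bases).values())
```

A cycle reading "ACGT" five times is balanced (25% each), while "AAAAAAAACG" is unbalanced: A makes up 80% of the bases and T is absent.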
[1237] The computer-implemented method of any one of the embodiments, wherein the sample comprises overloaded concatemer molecules with a spatial density in a range of 10² to 10¹⁵ per mm².
[1238] The computer-implemented method of any one of the embodiments, wherein the sample comprises overloaded concatemer molecules with a spatial density in a range of 10³ to 10¹⁰ per mm².
[1239] The computer-implemented method of any one of the embodiments, wherein the first resolution is in a range of 0.1 µm to 5 µm.
[1240] The computer-implemented method of any one of the embodiments, wherein the second resolution is in a range of 0.01 µm to 2 µm.
[1241] The computer-implemented method of any one of the embodiments, wherein the down-sampling factor is 2, 4, or 8.
[1242] The computer-implemented method of any one of the embodiments, wherein the up-sampling factor is 2, 4, or 8.
[1243] The computer-implemented method of any one of the embodiments, wherein one or more of operations (ii) to (v) are performed while a sequencing run is being performed.
[1244] The computer-implemented method of any one of the embodiments, wherein one or more of operations (B) and (a) to (e) are performed while a sequencing run is being performed.
[1245] The computer-implemented method of any one of the embodiments, wherein the first plurality of flow cell images are acquired in sequencing cycles ranging from 1 to
[1246] The computer-implemented method of any one of the embodiments, wherein the one or more cycles comprises a current cycle N.
[1247] The computer-implemented method of any one of the embodiments, wherein N is in a range from 1 to 500.
[1248] The computer-implemented method of any one of the embodiments, wherein the one or more cycles comprises a single cycle ranging from 1 to 500.
[1249] The computer-implemented method of any one of the embodiments, wherein the one or more cycles comprises multiple cycles ranging from 1 to 500.
[1250] The computer-implemented method of any one of the embodiments, wherein one or more of operations (ii) to (v) are performed while the sequencing reactions in cycles subsequent to the current cycle N are yet to be performed or are currently being performed.
[1251] The computer-implemented method of any one of the embodiments, wherein the training data set of flow cell images comprises z-stacks of flow cell images taken at different z-locations.
[1252] The computer-implemented method of any one of the embodiments, wherein the z-axis is orthogonal to image planes of the flow cell images.
[1253] The computer-implemented method of any one of the embodiments, wherein performing, by the processor, the first convolution in one or more dimensions on the first plurality of flow cell images comprises: performing, by the processor, a first convolution in 3D on the first plurality of flow cell images, thereby generating a first convolution result.
[1254] The computer-implemented method of any one of the embodiments, wherein (a) comprises: performing, by the processor, the second convolution in 3D on the first convolution result, thereby generating a second convolution result.
[1255] The computer-implemented method of any one of the embodiments, wherein performing, by the processor, the first convolution in one or more dimensions on the first plurality of flow cell images comprises: performing, by the processor, a first convolution in 2D on the first plurality of flow cell images, thereby generating a first convolution result.
[1256] The computer-implemented method of any one of the embodiments, wherein (a) comprises: performing, by the processor, the second convolution in 2D on the first convolution result, thereby generating a second convolution result.
[1257] The computer-implemented method of any one of the embodiments, wherein repetitively performing, for one or more times, operations comprising (c) and (d) comprises: repetitively performing, for one or more times, operations comprising (c), (d), and (e), wherein (e) is after operation (c) and before operation (d), and wherein (e) comprises: concatenating, by the processor, the first up-sampled result in a current up-sampling repetition with the first down-sampled result in a previous down-sampling repetition, wherein the first up-sampled result has the same size as the first down-sampled result in the previous down-sampling repetition.
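The up-sampling followed by concatenation described in [1257] can be sketched on small nested-list feature maps. The sketch assumes nearest-neighbour up-sampling (transposed convolutions are a common alternative) and places the up-sampled channels before the skip channels, following the wording "concatenating the first up-sampled result ... with the first down-sampled result". The function names are hypothetical.

```python
def upsample_nearest(channels, factor):
    """Nearest-neighbour up-sampling of each channel (a 2D list [y][x])."""
    return [
        [[v for v in row for _ in range(factor)] for row in ch for _ in range(factor)]
        for ch in channels
    ]


def concat_skip(up_sampled, skip):
    """Concatenate two feature maps along the channel axis.

    Both inputs are lists of channels; their spatial sizes must match, as
    required when joining an up-sampled result with the down-sampled result
    of the mirror-image down-sampling repetition.
    """
    def spatial(t):
        return (len(t[0]), len(t[0][0]))

    if spatial(up_sampled) != spatial(skip):
        raise ValueError(f"size mismatch: {spatial(up_sampled)} vs {spatial(skip)}")
    return list(up_sampled) + list(skip)
```

Up-sampling a one-channel 2 x 2 map by 2 yields a 4 x 4 map, which can then be joined with a two-channel 4 x 4 skip map to give three channels for the following convolution (d); a skip map of a different size raises an error.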
[1258] The computer-implemented method of any one of the embodiments, wherein the second resolution is 4, 6, or 8 times greater than the first resolution in all three dimensions.
[1259] The computer-implemented method of any one of the embodiments, further comprising: registering the second plurality of flow cell images to a common coordinate system.
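The registration step is not specified further here, so the following is only a stand-in: a brute-force search for the integer translation that maximises the mean product of overlapping pixels, followed by mapping detected coordinates into the reference frame. Production pipelines would more likely use FFT-based phase correlation with sub-pixel refinement and may also need to correct rotation or scale. The function names and the `max_shift` parameter are hypothetical.

```python
def estimate_shift(reference, moving, max_shift=3):
    """Integer translation (dy, dx) that best aligns `moving` to `reference`.

    Every shift within +/- max_shift pixels is scored by the mean product of
    overlapping pixels, comparing moving[y + dy][x + dx] with reference[y][x].
    """
    h, w = len(reference), len(reference[0])
    best_shift, best_score = (0, 0), float("-inf")
    for dy in range(-max_shift, max_shift + 1):
        for dx in range(-max_shift, max_shift + 1):
            total, overlap = 0.0, 0
            for y in range(max(0, -dy), min(h, h - dy)):
                for x in range(max(0, -dx), min(w, w - dx)):
                    total += reference[y][x] * moving[y + dy][x + dx]
                    overlap += 1
            if overlap and total / overlap > best_score:
                best_shift, best_score = (dy, dx), total / overlap
    return best_shift


def to_reference_frame(points, shift):
    """Map (y, x) points detected in the moving image into reference coordinates."""
    dy, dx = shift
    return [(y - dy, x - dx) for y, x in points]
```

If a bright polony sits at (2, 2) in the reference image and at (3, 4) in a later image, the estimated shift is (1, 2) and the later detection maps back to (2, 2).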
[1260] The computer-implemented method of any one of the embodiments, wherein the first plurality of flow cell images are acquired from a single color channel of the sequencing system.
[1261] The computer-implemented method of any one of the embodiments, wherein (vi) determining, by the processor, polonies from the second plurality of flow cell images comprises: generating a 3D polony map comprising spatial location of polonies based on the determined polonies in (iv).
[1262] The computer-implemented method of any one of the embodiments, wherein generating a 3D polony map comprising spatial location of polonies based on the determined polonies in (iv) further comprises: deleting duplicate polonies from the determined polonies, wherein the duplicate polonies are out-of-focus.
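Removing out-of-focus duplicates from a z-stack, as in [1262], can be sketched as a greedy pass that keeps the sharpest detection of each polony. The sketch assumes that peak intensity is a usable proxy for focus and that duplicates lie within a lateral `radius` and an axial `max_dz` of each other; both parameters and the function name are hypothetical. The axial limit lets two genuine polonies stacked at different depths survive.

```python
import math


def remove_out_of_focus_duplicates(detections, radius, max_dz=1):
    """Keep the brightest detection of each polony across a z-stack.

    detections: list of (x, y, z_index, peak_intensity) tuples.
    A dimmer detection within `radius` pixels laterally and `max_dz` slices
    axially of an already kept detection is treated as an out-of-focus
    duplicate and dropped.
    """
    kept = []
    for det in sorted(detections, key=lambda d: d[3], reverse=True):
        x, y, z, _ = det
        duplicate = any(
            math.hypot(x - kx, y - ky) <= radius and abs(z - kz) <= max_dz
            for kx, ky, kz, _ in kept
        )
        if not duplicate:
            kept.append(det)
    return kept
```

Sorting by intensity first means the in-focus slice of each polony is always examined before its blurred copies in neighbouring slices.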
[1263] The computer-implemented method of any one of the embodiments, wherein determining, by the processor, polonies from the second plurality of flow cell images comprises: superimposing the second plurality of flow cell images with corresponding cell staining images; and generating the polony map by only including polonies that are within cell boundaries in the corresponding cell staining images.
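Restricting the polony map to cell interiors, as in [1263], can be sketched once the cell staining image has been segmented into a label image and registered to the polony coordinates. Both steps are assumptions here, since the segmentation method is not specified; the function name is hypothetical.

```python
def polonies_in_cells(polonies, cell_labels):
    """Keep polonies that fall inside a segmented cell.

    polonies: (y, x) integer pixel coordinates in the same registered frame
              as cell_labels.
    cell_labels: 2D list where 0 marks background and k > 0 marks cell k.
    Returns (y, x, cell_id) for each retained polony.
    """
    h, w = len(cell_labels), len(cell_labels[0])
    kept = []
    for y, x in polonies:
        if 0 <= y < h and 0 <= x < w and cell_labels[y][x] > 0:
            kept.append((y, x, cell_labels[y][x]))
    return kept
```

Returning the cell identifier alongside each polony also makes it straightforward to count target molecules per cell later on.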
[1264] The computer-implemented method of any one of the embodiments, wherein the support comprises a glass or plastic substrate.
[1265] The computer-implemented method of any one of the embodiments, wherein the support is comprised in a flow cell device.
[1266] The computer-implemented method of any one of the embodiments, further comprising: providing the sample harboring a plurality of RNA which comprises the first target RNA molecule and the second target RNA molecule.
[1267] The computer-implemented method of any one of the embodiments, further comprising: generating inside the sample a plurality of cDNA molecules which include a first target cDNA molecule that corresponds to the first target RNA molecule and a second target cDNA molecule that corresponds to the second target RNA molecule.
[1268] The computer-implemented method of any one of the embodiments, further comprising: contacting the plurality of cDNA molecules in the sample with a plurality of target-specific padlock probes which includes at least a first plurality of first target-specific padlock probes and a second plurality of second target-specific padlock probes.
[1269] The computer-implemented method of any one of the embodiments, further comprising: contacting the plurality of RNA molecules in the sample with a plurality of target-specific padlock probes which includes at least a first plurality of target-specific padlock probes and a second plurality of target-specific padlock probes.
[1270] The computer-implemented method of any one of the embodiments, wherein individual padlock probes in the first plurality of first target-specific padlock probes comprise: first and second terminal regions, wherein the first terminal region selectively hybridizes to a first region of the first target cDNA molecule or the first target RNA molecule, and the second terminal region selectively hybridizes to a second region of the first target cDNA molecule or the first target RNA molecule.
[1271] The computer-implemented method of any one of the embodiments, wherein contacting the plurality of RNA molecules in the sample with the plurality of target-specific padlock probes comprises: hybridizing the first and second terminal regions of the first target-specific padlock probes to proximal positions on the first target cDNA molecule or the first target RNA molecule to form a circularized first target-specific padlock probe having a nick or gap between the hybridized first and second terminal regions.
[1272] The computer-implemented method of any one of the embodiments, wherein the first target-specific padlock probe comprises a first target barcode sequence that corresponds to and uniquely identifies the first target cDNA sequence or the first target RNA sequence.
[1273] The computer-implemented method of any one of the embodiments, wherein the first target-specific padlock probe comprises a first target barcode sequence that is located adjacent to one of the regions of the first target-specific padlock probe that selectively hybridizes to the first target cDNA molecule or the first target RNA sequence.
[1274] The computer-implemented method of any one of the embodiments, wherein the first target-specific padlock probe comprises at least one universal adaptor sequence.
[1275] The computer-implemented method of any one of the embodiments, wherein the first target-specific padlock probe comprises a universal primer binding site for a rolling circle amplification primer or a complementary sequence thereof.
[1276] The computer-implemented method of any one of the embodiments, wherein the first target-specific padlock probe comprises a universal compaction oligonucleotide binding site or a complementary sequence thereof.
[1277] The computer-implemented method of any one of the embodiments, further comprising: closing the nick or gap in the at least first and second circularized target-specific padlock probes by conducting an enzymatic reaction, thereby generating at least a first covalently closed circular padlock probe and a second covalently closed circular padlock probe inside the sample.
[1278] The computer-implemented method of any one of the embodiments, further comprising: conducting a rolling circle amplification reaction inside the sample using the first and second covalently closed circular padlock probes as template molecules, thereby generating a plurality of concatemer molecules including at least the first concatemer molecule that corresponds to the first target RNA molecule, and the second concatemer molecule that corresponds to the second target RNA molecule.
[1279] The computer-implemented method of any one of the embodiments, wherein the first concatemer comprises: tandem repeat units of: a first target barcode sequence that uniquely identifies the first target RNA or the first target cDNA sequence, a first insert sequence that corresponds to the first target RNA or the first target cDNA, and a first sequencing primer binding site or a complementary sequence thereof.
[1280] The computer-implemented method of any one of the embodiments, wherein the first concatemer further comprises: a universal binding site for an amplification primer or a complementary sequence thereof, and a universal binding site for a compaction oligonucleotide or a complementary sequence thereof.
[1281] The computer-implemented method of any one of the embodiments, wherein the second concatemer comprises: tandem repeat units of: a second target barcode sequence that uniquely identifies the second target RNA or the second target cDNA sequence, a second insert sequence that corresponds to the second target RNA or the second target cDNA, and a second sequencing primer binding site or a complementary sequence thereof.
[1282] The computer-implemented method of any one of the embodiments, wherein the second concatemer further comprises: a universal binding site for an amplification primer or a complementary sequence thereof, and a universal binding site for a compaction oligonucleotide or a complementary sequence thereof.
[1283] The computer-implemented method of any one of the embodiments, wherein conducting the one or more cycles of sequencing reactions comprises: contacting the plurality of concatemer molecules inside the sample with (i) a plurality of universal sequencing primers, (ii) a plurality of sequencing polymerases, and (iii) a plurality of nucleotide reagents, under a condition suitable for hybridizing the plurality of universal sequencing primers to their respective universal sequencing primer binding sites on the concatemers.
[1284] The computer-implemented method of any one of the embodiments, wherein the plurality of nucleotide reagents comprise: multivalent molecules, nucleotides, nucleotide analogs, or their combinations.
[1285] The computer-implemented method of any one of the embodiments, wherein individual nucleotides or nucleotide analogs are detectably labeled or non-labeled.
[1286] The computer-implemented method of any one of the embodiments, wherein the detectably labeled individual nucleotides or nucleotide analogs comprise a different detectable color label that corresponds with each different type of nucleotide base of A, G, C, and T/U.
[1287] The computer-implemented method of any one of the embodiments, wherein an individual multivalent molecule comprises a core attached with multiple nucleotide arms and each arm of the individual multivalent molecule comprises the same type of nucleotide base.
[1288] The computer-implemented method of any one of the embodiments, wherein generating the first plurality of flow cell images comprises: in each cycle, imaging, by an optical system, optical color signals emitted from the nucleotide reagents that are bound to the plurality of concatemer molecules.
[1289] The computer-implemented method of any one of the embodiments, wherein the first plurality of flow cell images comprises optical color signals emitted from the nucleotide reagents that are bound to the plurality of concatemer molecules.
[1290] The computer-implemented method of any one of the embodiments, wherein conducting the one or more cycles of sequencing reactions comprises: sequencing only the first target barcode sequence region of the first concatemer, thereby generating the first sequencing read product.
[1291] The computer-implemented method of any one of the embodiments, wherein conducting the one or more cycles of sequencing reactions comprises: sequencing the first target barcode sequence region and at least a portion of the first insert sequence of the first concatemer, thereby generating the first sequencing read product.
[1292] The computer-implemented method of any one of the embodiments, wherein conducting the one or more cycles of sequencing reactions comprises: sequencing only the second target barcode sequence region of the second concatemer, thereby generating the second sequencing read product.
[1293] The computer-implemented method of any one of the embodiments, wherein conducting the one or more cycles of sequencing reactions comprises: sequencing the second target barcode sequence region and at least a portion of the second insert sequence of the second concatemer, thereby generating the second sequencing read product.
[1294] The computer-implemented method of any one of the embodiments, further comprising: removing a first sequencing read product from the first concatemer molecule and retaining the first concatemer molecule in the sample, and removing a second sequencing read product from the second concatemer molecule and retaining the second concatemer molecule in the sample.
[1295] The computer-implemented method of any one of the embodiments, further comprising: reiteratively sequencing the plurality of concatemers by repeating the following operations at least once: generating the first plurality of flow cell images of a sample immobilized on a support by conducting one or more cycles of sequencing reactions, thereby generating the first sequencing read product and the second sequencing read product, the sample comprising a plurality of concatemer molecules therewithin, wherein a first concatemer molecule of the plurality of concatemer molecules corresponds to a first target RNA molecule of the sample, and a second concatemer molecule of the plurality of concatemer molecules corresponds to a second target RNA molecule of the sample, wherein the first plurality of flow cell images; and removing a first sequencing read product from the first concatemer molecule and retaining the first concatemer molecule in the sample, and removing a second sequencing read product from the second concatemer molecule and retaining the second concatemer molecule in the sample.
[1296] The computer-implemented method of any one of the embodiments, wherein the first sequencing read product comprises some or all of: a first target barcode sequence in one or more tandem units of the first concatemer molecule; a first insert sequence in one or more tandem units of the first concatemer molecule; or their combinations.
[1297] The computer-implemented method of any one of the embodiments, further comprising: confirming presence of the first target RNA molecule, the second target RNA molecule, or both molecules in the sample based on the performed base calling of the second plurality of flow cell images at the base calling locations in the base calling template.
[1298] The computer-implemented method of any one of the embodiments, further comprising: generating, by the sequencing system, the second plurality of flow cell images of the sample immobilized on the support by conducting subsequent cycles of sequencing reactions after the one or more cycles.
[1299] The computer-implemented method of any one of the embodiments, wherein generating the first plurality of flow cell images of the sample immobilized on the support comprises: sequencing at least the first concatemer inside the sample under a condition that inhibits sequencing the second concatemer.
[1300] The computer-implemented method of any one of the embodiments, wherein sequencing at least the first concatemer inside the sample comprises: generating a plurality of first sequencing read products, and wherein the sequences of the first sequencing read products are aligned with a first target reference sequence to confirm presence of the first target RNA in the sample.
[1301] The computer-implemented method of any one of the embodiments, wherein generating the first plurality of flow cell images of the sample immobilized on the support comprises: sequencing at least the second concatemer inside the sample under a condition that inhibits sequencing the first concatemer.
[1302] The computer-implemented method of any one of the embodiments, wherein sequencing at least the second concatemer inside the cellular sample comprises: generating a plurality of second sequencing read products, and wherein sequences of the second sequencing read products are aligned with a second target reference sequence to confirm presence of the second target RNA in the sample.
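Confirming the presence of a target from its barcode reads can be illustrated with a simple decoding step. Note that the embodiments describe aligning reads to a target reference sequence; this sketch substitutes a simpler Hamming-distance lookup of equal-length reads against a hypothetical barcode codebook, which is not alignment. The mismatch tolerance and the function names are assumptions.

```python
def hamming(a, b):
    """Number of mismatched positions between two equal-length sequences."""
    return sum(c1 != c2 for c1, c2 in zip(a, b))


def count_targets(reads, codebook, max_mismatches=1):
    """Assign barcode reads to targets and count reads per target.

    codebook: dict mapping barcode sequence -> target name (hypothetical).
    A read is assigned only if its closest barcode is within max_mismatches
    and strictly closer than the runner-up, so ambiguous reads are dropped.
    """
    counts = {target: 0 for target in codebook.values()}
    for read in reads:
        hits = sorted(
            (hamming(read, barcode), target)
            for barcode, target in codebook.items()
            if len(barcode) == len(read)
        )
        if not hits:
            continue
        best_dist, best_target = hits[0]
        unique = len(hits) == 1 or hits[1][0] > best_dist
        if best_dist <= max_mismatches and unique:
            counts[best_target] += 1
    return counts
```

A target could then be reported as present when its count reaches a chosen minimum number of reads; that threshold is likewise an assumption.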
[1303] A computer-implemented method for training a neural network, the method comprising: generating, by a processor, a first reconfigurable logic device, or an integrated circuit, a training set comprising a corresponding plurality of training flow cell images for each sample with a first resolution; generating, by the processor, the first reconfigurable logic device, or the integrated circuit, a reference set comprising reference flow cell images at a second resolution higher than the first resolution, each reference flow cell image in the reference set corresponding to an individual image of the corresponding pluralities of training flow cell images; providing, by the processor, the first reconfigurable logic device, or the integrated circuit, the training set as inputs to the neural network to generate a training output; repeatedly performing, by the processor, the first reconfigurable logic device, or the integrated circuit, until the output error satisfies a stopping criterion, one or more operations comprising: determining an output error by comparing the training output and the reference set; and adjusting current values of parameters of the neural network based on the output error; and generating a trained neural network with the adjusted parameters.
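The generate-compare-adjust loop of [1303] can be illustrated with a deliberately tiny model. In this sketch a two-parameter stand-in (`w * upsample2(x) + b`, applied to a nearest-neighbour 2x up-sampled input) replaces the neural network, the output error is the mean squared error, parameters are adjusted by plain gradient descent, and the stopping criterion is a loss threshold with an iteration cap. All of these choices, and the function names, are assumptions for illustration.

```python
def upsample2(img):
    """Nearest-neighbour 2x up-sampling of a 2D list."""
    return [[v for v in row for _ in range(2)] for row in img for _ in range(2)]


def train(training_set, reference_set, lr=0.1, tol=1e-8, max_iters=10000):
    """Fit the stand-in model pred = w * upsample2(x) + b.

    Repeats: generate the training output, compute the mean-squared output
    error against the reference set, and adjust w and b by gradient descent,
    until the error falls below tol or max_iters is reached.
    Returns (w, b, loss at the last check).
    """
    pairs = []
    for low, ref in zip(training_set, reference_set):
        for up_row, ref_row in zip(upsample2(low), ref):
            pairs.extend(zip(up_row, ref_row))
    n = len(pairs)
    w, b = 0.0, 0.0
    loss = float("inf")
    for _ in range(max_iters):
        errors = [(w * u + b - r, u) for u, r in pairs]
        loss = sum(e * e for e, _ in errors) / n
        if loss < tol:  # stopping criterion on the output error
            break
        grad_w = 2.0 * sum(e * u for e, u in errors) / n
        grad_b = 2.0 * sum(e for e, _ in errors) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b, loss
```

When the reference images are generated as `2 * upsample2(x) + 1`, the loop recovers w close to 2 and b close to 1; a real super-resolution network differs in scale, not in the structure of this loop.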
[1304] A computer-implemented method for training a neural network, the method comprising: generating, by a processor, a first reconfigurable logic device, or an integrated circuit, a corresponding plurality of training flow cell images of each individual sample by conducting one or more cycles of sequencing reactions, wherein each individual sample is immobilized on a support and comprises concatemer molecules therewithin, wherein the corresponding plurality of flow cell images are acquired at a z-stack of different z-locations or at a single z-location with a first resolution; generating, by the processor, the first reconfigurable logic device, or the integrated circuit, a reference set by acquiring reference flow cell images of each individual sample by conducting one or more cycles of sequencing reactions, wherein each individual sample is immobilized on a support and comprises concatemer molecules therewithin, wherein the reference flow cell images are acquired at a z-stack of different z-locations with a second resolution; generating, by the processor, the first reconfigurable logic device, or the integrated circuit, a training set comprising the corresponding pluralities of training flow cell images; providing, by the processor, the first reconfigurable logic device, or the integrated circuit, the training set as inputs to the neural network; and iteratively generating a training output and determining a value of a predetermined loss function, by the processor, the first reconfigurable logic device, or the integrated circuit, by comparing the training output to the reference set until a predetermined stopping criterion is met, wherein the predetermined stopping criterion corresponds to the value of the predetermined loss function.
[1305] The computer-implemented method of any one of the embodiments, wherein the neural network comprises one or more U-Net units.
[1306] The computer-implemented method of any one of the embodiments, wherein generating, by a processor, a corresponding plurality of training flow cell images for each sample with a first resolution comprises: generating the corresponding plurality of training flow cell images for each sample with a first resolution for a single color channel by simulation.
[1307] The computer-implemented method of any one of the embodiments, wherein comparing the training output to the reference set comprises: calculating mean squared error in image intensity of one or more pixels in each pair of an image from the reference set and a corresponding image from the training output.
[1308] The computer-implemented method of any one of the embodiments, wherein each pair of the image from the reference set and the corresponding image from the training output comprises a same matrix size, a same field of view, a same resolution, or a combination thereof.
[1309] The computer-implemented method of any one of the embodiments, wherein the one or more pixels exclude pixels that are outside of cell boundaries.
[1310] The computer-implemented method of any one of the embodiments, wherein the cell boundaries are determined based on cell staining images acquired from the same individual samples.
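The masked comparison of [1307] to [1310] can be sketched as a mean squared error computed only over pixels inside cell boundaries. The sketch assumes the boundaries have already been turned into a binary mask, registered to both images, and that the image pair shares the same matrix size as [1308] requires; the function name is hypothetical.

```python
def masked_mse(reference, prediction, cell_mask):
    """Mean squared intensity error over pixels inside cell boundaries.

    All three arguments are 2D lists of the same shape; cell_mask is truthy
    for pixels inside a cell and falsy for pixels to exclude.
    """
    if not (len(reference) == len(prediction) == len(cell_mask)):
        raise ValueError("images must have the same number of rows")
    squared_error, count = 0.0, 0
    for ref_row, pred_row, mask_row in zip(reference, prediction, cell_mask):
        if not (len(ref_row) == len(pred_row) == len(mask_row)):
            raise ValueError("images must have the same number of columns")
        for r, p, inside in zip(ref_row, pred_row, mask_row):
            if inside:
                squared_error += (r - p) ** 2
                count += 1
    if count == 0:
        raise ValueError("mask excludes every pixel")
    return squared_error / count
```

Excluding background pixels keeps large errors outside cells (for example, the 36 contributed by the last pixel of the test pair below) from dominating the loss.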
[1311] A computer-implemented system for processing flow cell images comprising: one or more hardware processors; one or more data storage devices storing instructions executable by the one or more hardware processors to cause the one or more hardware processors to perform operations, the operations comprising:
(i) generating, by a sequencing system, a first plurality of flow cell images of a sample immobilized on a support by conducting one or more cycles of sequencing reactions, wherein the first plurality of flow cell images are acquired with a first resolution;
(ii) providing, by a processor, a first reconfigurable logic device, or an integrated circuit, the first plurality of flow cell images as an input to a neural network, wherein the neural network is pre-trained using a training data set of training flow cell images;
(iii) predicting, by the processor, the first reconfigurable logic device, or the integrated circuit, a second plurality of flow cell images using the neural network, wherein each of the second plurality of flow cell images corresponds to a corresponding image of the first plurality of flow cell images with a second resolution, and wherein the second resolution is at least 2 to 32 times greater than the first resolution in one or more spatial dimensions;
(iv) determining, by the processor, the first reconfigurable logic device, or the integrated circuit, polonies from the second plurality of flow cell images; and
(v) performing, by the processor, the first reconfigurable logic device, or the integrated circuit, a corresponding base calling for each of the determined polonies based on the second plurality of flow cell images.
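How polonies are determined in operation (iv) is not spelled out here; a common simple stand-in is to take pixels of the predicted high-resolution image that exceed an intensity threshold and are strict maxima of their 3 x 3 neighbourhood. The threshold value and function name are hypothetical, and plateaus of equal neighbouring pixels are deliberately skipped.

```python
def detect_polonies(image, threshold):
    """Return (y, x) of pixels above threshold that are strict 3x3 local maxima."""
    h, w = len(image), len(image[0])
    peaks = []
    for y in range(h):
        for x in range(w):
            v = image[y][x]
            if v < threshold:
                continue
            neighbours = [
                image[yy][xx]
                for yy in range(max(0, y - 1), min(h, y + 2))
                for xx in range(max(0, x - 1), min(w, x + 2))
                if (yy, xx) != (y, x)
            ]
            if all(v > nb for nb in neighbours):
                peaks.append((y, x))
    return peaks
```

The resulting coordinates are what a base-calling step would then sample in each cycle's images; at the higher second resolution, neighbouring polonies are more likely to appear as separate maxima.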
[1312] A computer-implemented system for processing flow cell images comprising: one or more hardware processors; one or more data storage devices storing instructions executable by the one or more hardware processors to cause the one or more hardware processors to perform operations, the operations comprising:
(A) generating, by a sequencing system, a first plurality of flow cell images of a sample immobilized on a support by conducting one or more cycles of sequencing reactions, the sample comprising concatemer molecules therewithin, wherein the first plurality of flow cell images are acquired at a z-stack of different z-locations or at a single z location with a first resolution;
(B) predicting, by the processor, the first reconfigurable logic device, or the integrated circuit, a second plurality of flow cell images from the first plurality of flow cell images using a neural network, wherein the neural network is pre-trained using a training data set of training flow cell images, comprising: performing, by the first reconfigurable logic device or the integrated circuit, a first convolution in one or more dimensions on the first plurality of flow cell images, thereby generating a first convolution result; repetitively performing, for one or more times, down-sampling operations comprising:
(a) performing, by the processor, the first reconfigurable logic device, or the integrated circuit, a second convolution in one or more dimensions on the first convolution result, thereby generating a second convolution result; and
(b) performing, by the processor, the first reconfigurable logic device, or the integrated circuit, a down sampling of the second convolution result by a down sampling factor thereby generating a first down-sampled result, wherein in each repetition, the second convolution comprises a corresponding number of filters, thereby generating a third convolution result; performing, by the processor, the first reconfigurable logic device, or the integrated circuit, the second convolution in one or more dimensions on the third convolution result, thereby generating a fourth convolution result; repetitively performing, for one or more times, up sampling operations comprising: (c) performing, by the processor, the first reconfigurable logic device, or the integrated circuit, an up sampling of the fourth convolution result by an up sampling factor thereby generating a first up-sampled result; and
(d) performing, by the processor, the first reconfigurable logic device, or the integrated circuit, the second convolution in one or more dimensions on the first up-sampled result, thereby generating a fifth convolution result, wherein in each repetition, the second convolution comprises a corresponding number of filters, thereby generating a sixth convolution result; performing, by the processor, the first reconfigurable logic device, or the integrated circuit, the first convolution in one or more dimensions on the sixth convolution result, thereby generating a seventh convolution result; predicting, by the processor, the first reconfigurable logic device, or the integrated circuit, a second plurality of flow cell images based on the seventh convolution result, wherein each of the second plurality of flow cell images corresponds to the corresponding flow cell image of the first plurality of flow cell images with a second resolution that is 2, 4, 6, 8, 10, 12, or 16 times greater than the first resolution in one or more spatial dimensions; determining, by the processor, the first reconfigurable logic device, or the integrated circuit, polonies from the second plurality of flow cell images; and performing, by the processor, the first reconfigurable logic device, or the integrated circuit, a corresponding base call for each of the determined polonies based on the second plurality of flow cell images.
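As an illustration only, the convolution, down-sampling, and up-sampling sequence recited above can be sketched as a small encoder-decoder in NumPy. The filter counts, 3x3 kernels, ReLU activations, 2x2 max pooling, nearest-neighbour up-sampling, the absence of skip connections, and the extra up-sampling step used to reach the second resolution are all assumptions made for this sketch. The weights are random, so the example demonstrates how array shapes flow through the operations, not a trained model.

```python
import numpy as np


def conv2d_same(x, w):
    """Zero-padded 'same' 2D convolution (cross-correlation, as in most DL libraries).

    x: (C_in, H, W) feature map; w: (C_out, C_in, k, k) filters with odd k.
    """
    c_out, _, k, _ = w.shape
    p = k // 2
    xp = np.pad(x, ((0, 0), (p, p), (p, p)))
    h, wd = x.shape[1:]
    out = np.zeros((c_out, h, wd))
    for i in range(k):
        for j in range(k):
            out += np.einsum("oc,chw->ohw", w[:, :, i, j], xp[:, i:i + h, j:j + wd])
    return out


def relu(x):
    return np.maximum(x, 0.0)


def down_sample(x, f=2):
    """Max-pool by factor f (H and W must be divisible by f)."""
    c, h, w = x.shape
    return x.reshape(c, h // f, f, w // f, f).max(axis=(2, 4))


def up_sample(x, f=2):
    """Nearest-neighbour up-sampling by factor f."""
    return x.repeat(f, axis=1).repeat(f, axis=2)


def super_resolve(img, depth=2, base_filters=8, scale=2, seed=0):
    """Untrained encoder-decoder: (H, W) image -> (scale*H, scale*W) image.

    ASSUMPTIONS: random 3x3 filters, ReLU, no skip connections, and an extra
    nearest-neighbour up-sampling by `scale` before the last convolution.
    H and W must be divisible by 2**depth.
    """
    rng = np.random.default_rng(seed)

    def filters(c_out, c_in):
        return rng.normal(0.0, 0.1, size=(c_out, c_in, 3, 3))

    x = relu(conv2d_same(img[None], filters(base_filters, 1)))  # first convolution
    c = base_filters
    for _ in range(depth):  # down-sampling operations: convolve, then down-sample
        x = down_sample(relu(conv2d_same(x, filters(2 * c, c))))
        c *= 2
    x = relu(conv2d_same(x, filters(c, c)))  # convolution at the coarsest scale
    for _ in range(depth):  # up-sampling operations: up-sample, then convolve
        x = relu(conv2d_same(up_sample(x), filters(c // 2, c)))
        c //= 2
    x = up_sample(x, scale)  # ASSUMPTION: extra step to reach the second resolution
    return conv2d_same(x, filters(1, c))[0]  # final convolution -> one image
```

For example, `super_resolve(np.random.default_rng(1).random((16, 16)))` returns a 32 x 32 array, that is, a two-fold increase in each spatial dimension.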
[1313] A computer-implemented system for processing flow cell images, comprising: one or more hardware processors; one or more data storage devices storing instructions executable by the one or more hardware processors to cause the one or more hardware processors to perform operations, the operations comprising: generating, by a processor, a first reconfigurable logic device, or an integrated circuit, a corresponding plurality of training flow cell images for each sample with a first resolution by simulation; up-sampling, by the processor, the first reconfigurable logic device, or the integrated circuit, the corresponding plurality of training flow cell images for each sample to a second resolution to generate a reference set comprising high resolution training flow cell images; generating, by the processor, the first reconfigurable logic device, or the integrated circuit, a training set comprising the corresponding pluralities of training flow cell images; and providing, by the processor, the first reconfigurable logic device, or the integrated circuit, the training set as inputs to the neural network to generate a training output; repeatedly performing, by the processor, the first reconfigurable logic device, or the integrated circuit, until the output error satisfies a stopping criterion, one or more operations comprising: determining an output error by comparing the training output and the reference set; and adjusting current values of parameters of the neural network based on the output error.
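A minimal sketch of the training loop above, under strong simplifying assumptions: the "simulation" is random low-resolution images, the reference up-sampling is a nearest-neighbour up-sampling followed by a 3x3 box filter (a hypothetical choice), and the "neural network" is reduced to a single learnable 3x3 filter applied to the nearest-neighbour up-sampled input. The output error is a mean squared error, and training stops once it falls below a tolerance or an iteration budget is exhausted.

```python
import numpy as np


def shifted_features(img):
    """Stack the 9 one-pixel shifts of img (edge-padded) as columns: (H*W, 9)."""
    h, w = img.shape
    p = np.pad(img, 1, mode="edge")
    return np.stack([p[i:i + h, j:j + w].ravel() for i in range(3) for j in range(3)], axis=1)


def up2(img):
    """Nearest-neighbour up-sampling by 2 in both spatial dimensions."""
    return img.repeat(2, axis=0).repeat(2, axis=1)


def train_until_converged(n_samples=8, size=8, lr=0.05, tol=1e-4, max_iters=5000, seed=0):
    rng = np.random.default_rng(seed)
    # HYPOTHETICAL "simulation": random low-resolution training images.
    low_res = [rng.random((size, size)) for _ in range(n_samples)]
    X = np.concatenate([shifted_features(up2(im)) for im in low_res])
    # Reference set at the second resolution. ASSUMPTION: 3x3 box-smoothed
    # nearest-neighbour up-sampling stands in for the real reference up-sampling.
    y = X @ np.full(9, 1.0 / 9.0)
    w = np.zeros(9)  # current parameter values of the tiny linear "network"
    loss = float(np.mean(y ** 2))
    for _ in range(max_iters):
        err = X @ w - y  # training output minus reference
        loss = float(np.mean(err ** 2))  # output error (MSE)
        if loss < tol:  # stopping criterion
            break
        w -= lr * 2.0 * (X.T @ err) / len(y)  # gradient step on the MSE
    return w, loss
```

A real implementation would replace the single filter with the full network and back-propagate through it; the structure of the loop (predict, compare with the reference set, adjust, test the stopping criterion) is the part this sketch is meant to show.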
[1314] A computer-implemented system for processing flow cell images, comprising: one or more hardware processors; one or more data storage devices storing instructions executable by the one or more hardware processors to cause the one or more hardware processors to perform operations, the operations comprising: generating, by the processor, the first reconfigurable logic device, or the integrated circuit, a corresponding plurality of training flow cell images of each individual sample by conducting one or more cycles of sequencing reactions, wherein each individual sample is immobilized on a support and comprises concatemer molecules therewithin, wherein the corresponding plurality of flow cell images are acquired at a z-stack of different z-locations or at a single z location with a first resolution; generating, by the processor, the first reconfigurable logic device, or the integrated circuit, a reference set by acquiring reference flow cell images of each individual sample by conducting one or more cycles of sequencing reactions, wherein each individual sample is immobilized on a support and comprises concatemer molecules therewithin, wherein the corresponding plurality of flow cell images are acquired at a z-stack of different z-locations or at the single z location with a second resolution; generating, by the processor, the first reconfigurable logic device, or the integrated circuit, a training set comprising the corresponding pluralities of training flow cell images; providing, by the processor, the first reconfigurable logic device, or the integrated circuit, the training set as inputs to the neural network; and iteratively generating, by the reconfigurable logic device or integrated circuit, a training output and determining a value of a predetermined loss function by comparing the training output to the reference set until a predetermined stopping criterion is met, wherein the predetermined stopping criterion corresponds to the value of the predetermined loss function.
[1315] One or more non-transitory computer readable storage media encoded with instructions stored thereon that, when executed by one or more hardware processors, perform operations for processing flow cell images, the operations comprising:
(i) generating, by a sequencing system, a first plurality of flow cell images of a sample immobilized on a support by conducting one or more cycles of sequencing reactions, the sample comprising concatemer molecules therewithin, wherein the first plurality of flow cell images are acquired at a z-stack of different z-locations or at a single z location with a first resolution;
(ii) providing, by a processor, a first reconfigurable logic device, or an integrated circuit, the first plurality of flow cell images as an input to a neural network, wherein the neural network is pre-trained using a training data set of training flow cell images;
(iii) predicting, by the processor, the first reconfigurable logic device, or the integrated circuit, a second plurality of flow cell images using the neural network, wherein each of the second plurality of flow cell images corresponds to a corresponding image of the first plurality of flow cell images with a second resolution, and wherein the second resolution is at least 2 to 32 times greater than the first resolution in one or more spatial dimensions;
(iv) determining, by the processor, the first reconfigurable logic device, or the integrated circuit, polonies from the second plurality of flow cell images; and
(v) performing, by the processor, the first reconfigurable logic device, or the integrated circuit, a corresponding base calling for each of the determined polonies based on the second plurality of flow cell images.
[1316] One or more non-transitory computer readable storage media encoded with instructions stored thereon that, when executed by one or more hardware processors, perform operations for processing flow cell images, the operations comprising:
(A) generating, by a sequencing system, a first plurality of flow cell images of a sample immobilized on a support by conducting one or more cycles of sequencing reactions, the sample comprising concatemer molecules therewithin, wherein the first plurality of flow cell images are acquired at a z-stack of different z-locations or at a single z location with a first resolution;
(B) predicting, by the processor, the first reconfigurable logic device, or the integrated circuit, a second plurality of flow cell images from the first plurality of flow cell images using a neural network, wherein the neural network is pre-trained using a training data set of training flow cell images, comprising: performing, by the processor, the first reconfigurable logic device, or the integrated circuit, a first convolution in one or more dimensions on the first plurality of flow cell images, thereby generating a first convolution result; repetitively performing, for one or more times, down-sampling operations comprising:
(e) performing, by the processor, the first reconfigurable logic device, or the integrated circuit, a second convolution in one or more dimensions on the first convolution result, thereby generating a second convolution result; and
(f) performing, by the processor, the first reconfigurable logic device, or the integrated circuit, a down sampling of the second convolution result by a down sampling factor thereby generating a first down-sampled result, wherein in each repetition, the second convolution comprises a corresponding number of filters, thereby generating a third convolution result; performing, by the processor, the first reconfigurable logic device, or the integrated circuit, the second convolution in one or more dimensions on the third convolution result, thereby generating a fourth convolution result; repetitively performing, for one or more times, up sampling operations comprising: (g) performing, by the processor, the first reconfigurable logic device, or the integrated circuit, an up sampling of the fourth convolution result by an up sampling factor thereby generating a first up-sampled result; and
(h) performing, by the processor, the first reconfigurable logic device, or the integrated circuit, the second convolution in one or more dimensions on the first up-sampled result, thereby generating a fifth convolution result, wherein in each repetition, the second convolution comprises a corresponding number of filters, thereby generating a sixth convolution result; performing, by the processor, the first reconfigurable logic device, or the integrated circuit, the first convolution in one or more dimensions on the sixth convolution result, thereby generating a seventh convolution result; predicting, by the processor, the first reconfigurable logic device, or the integrated circuit, a second plurality of flow cell images based on the seventh convolution result, wherein each of the second plurality of flow cell images corresponds to the corresponding flow cell image of the first plurality of flow cell images with a second resolution that is 2, 4, 6, 8, 10, 12, or 16 times greater than the first resolution in one or more spatial dimensions; determining, by the processor, the first reconfigurable logic device, or the integrated circuit, polonies from the second plurality of flow cell images; and performing, by the processor, the first reconfigurable logic device, or the integrated circuit, a corresponding base call for each of the determined polonies based on the second plurality of flow cell images.
[1317] One or more non-transitory computer readable storage media encoded with instructions stored thereon that, when executed by one or more hardware processors, perform operations for processing flow cell images, the operations comprising: generating, by a processor, a first reconfigurable logic device, or an integrated circuit, a corresponding plurality of training flow cell images for each sample with a first resolution by simulation; up-sampling, by the processor, the first reconfigurable logic device, or the integrated circuit, the corresponding plurality of training flow cell images for each sample to a second resolution to generate a reference set comprising high resolution training flow cell images; generating, by the processor, the first reconfigurable logic device, or the integrated circuit, a training set comprising the corresponding pluralities of training flow cell images; and providing, by the processor, the first reconfigurable logic device, or the integrated circuit, the training set as inputs to the neural network to generate a training output; repeatedly performing, by the processor, the first reconfigurable logic device, or the integrated circuit, until the output error satisfies a stopping criterion, one or more operations comprising: determining, by the processor, the first reconfigurable logic device, or the integrated circuit, an output error by comparing the training output and the reference set; and adjusting, by the processor, the first reconfigurable logic device, or the integrated circuit, current values of parameters of the neural network based on the output error.
[1318] One or more non-transitory computer readable storage media encoded with instructions stored thereon that, when executed by one or more hardware processors, perform operations for processing flow cell images, the operations comprising: generating, by a processor, a first reconfigurable logic device, or an integrated circuit, a corresponding plurality of training flow cell images of each individual sample by conducting one or more cycles of sequencing reactions, wherein each individual sample is immobilized on a support and comprises concatemer molecules therewithin, wherein the corresponding plurality of flow cell images are acquired at a z-stack of different z-locations or at a single z location with a first resolution; generating, by the processor, the first reconfigurable logic device, or the integrated circuit, a reference set by acquiring reference flow cell images of each individual sample by conducting one or more cycles of sequencing reactions, wherein each individual sample is immobilized on a support and comprises concatemer molecules therewithin, wherein the corresponding plurality of flow cell images are acquired at a z-stack of different z-locations with a second resolution; generating, by the processor, the first reconfigurable logic device, or the integrated circuit, a training set comprising the corresponding pluralities of training flow cell images; providing, by the processor, the first reconfigurable logic device, or the integrated circuit, the training set as inputs to the neural network; and iteratively generating a training output and determining a value of a predetermined loss function by the reconfigurable logic device or integrated circuit by comparing the training output to the reference set until a predetermined stopping criterion is met, wherein the predetermined stopping criterion corresponds to the value of the predetermined loss function.
[1319] One or more non-transitory computer storage media encoded with instructions executable by one or more hardware processors to perform operations, the operations comprising any one of the preceding claims.
[1320] A computer-implemented system for processing flow cell images, comprising: one or more hardware processors; one or more data storage devices storing instructions executable by the one or more hardware processors to cause the one or more hardware processors to perform operations, the operations comprising any one of the preceding claims.
[1321] A computer-implemented method for predicting base calls comprising:
(i) generating, by a sequencing system, a first plurality of flow cell images of a sample immobilized on a support by conducting one or more cycles of sequencing reactions in one or more color channels, the first plurality of flow cell images comprising a first resolution;
(ia) generating, by a processor, a first reconfigurable logic device, or an integrated circuit, a second plurality of flow cell images comprising a second resolution;
(ii) providing, by the processor or the first reconfigurable logic device, or the integrated circuit, the second plurality of flow cell images as an input to a neural network; or providing, by the processor or the first reconfigurable logic device, or the integrated circuit, the second plurality of flow cell images to a polony map generation algorithm or a base calling algorithm;
(iii) predicting, by the first reconfigurable device or the integrated circuit, one or more base calls corresponding to one or more polonies of the second plurality of flow cell images using the neural network, or predicting, by the first reconfigurable device or the integrated circuit, one or more classifications corresponding to one or more pixels of the second plurality of flow cell images using the neural network.
[1322] The method of any one of the embodiments, wherein the second resolution is at least 2 to 32 times greater than the first resolution in one or more spatial dimensions.
[1323] The method of any one of the embodiments, further comprising:
(iia) determining, by the processor, the first reconfigurable device, or the integrated circuit, a polony map based on the second plurality of flow cell images; and
(iiia) determining, by the processor, the first reconfigurable logic device, or the integrated circuit, a corresponding location of the one or more predicted base calls or the one or more predicted classifications based on the polony map.
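One simple way to obtain a polony map from an up-sampled image, offered here as an assumption rather than as the method of this disclosure, is to keep pixels that are local intensity maxima above a threshold. Each returned coordinate can then serve as the location of the corresponding predicted base call or classification. The threshold and the 8-neighbour definition of a local maximum are illustrative choices.

```python
import numpy as np


def polony_map(img, threshold):
    """Return (row, col) coordinates of local intensity maxima above `threshold`.

    ASSUMPTION: a pixel is a polony candidate if it is >= all 8 neighbours
    and strictly greater than the threshold.
    """
    img = np.asarray(img, dtype=float)
    h, w = img.shape
    p = np.pad(img, 1, mode="constant", constant_values=-np.inf)
    neighbours = np.stack([p[i:i + h, j:j + w]
                           for i in range(3) for j in range(3) if (i, j) != (1, 1)])
    is_peak = (img >= neighbours.max(axis=0)) & (img > threshold)
    return np.argwhere(is_peak)  # (N, 2) array, row-major order
```

Pixels on a flat-topped plateau are all returned by this sketch; collapsing such duplicates into one polony centre is a separate step.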
[1324] The method of any one of the embodiments, wherein the sample comprises concatemer molecules therewithin, and wherein the first plurality of flow cell images are acquired at a z-stack of different z-locations or at a single z location with the first resolution.
[1325] The method of any one of the embodiments, wherein a location of the one or more polonies is determined based on the determined polony map.
[1326] The method of any one of the embodiments, wherein the one or more pixels include at least one pixel that is not comprised in any polony of the polony map.
[1327] The method of any one of the embodiments, wherein the one or more pixels include at least one pixel that is not comprised in any polony in the polony map and at least one pixel that is comprised in at least one polony in the polony map.
[1328] The method of any one of the embodiments, further comprising:
(iv) in response to determining that a first pixel of the one or more pixels has a predicted classification that is different from a background classification, determining a first morphological feature, a first RNA or mRNA, or a first protein based on the one or more predicted classifications; and
(v) in response to determining that a second pixel of the one or more pixels has a predicted classification that is different from the background classification, determining a second morphological feature, a second RNA or mRNA, or a second protein based on the one or more predicted classifications.
[1329] The method of any one of the embodiments, further comprising: (iv) in response to determining that a first pixel of the one or more pixels has a predicted classification that is different from a background classification, determining at least a first target from: a first morphological feature; a first RNA or mRNA; and a first protein based on the one or more predicted classifications; and
(v) in response to determining that a second pixel of the one or more pixels has a predicted classification that is different from the background classification, determining at least a second target different from the first target from: the first morphological feature; the first RNA or mRNA; and the first protein based on the one or more predicted classifications.
[1330] The method of any one of the embodiments, further comprising: spatially aligning the location of the first and the second targets based on the one or more predicted classifications; and determining a biological character of the sample immobilized on the support based on the spatial alignment.
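This disclosure does not specify how the locations of the first and second targets are spatially aligned. A common technique, shown here only as a sketch, is phase correlation between two images (for example, classification maps for the first and second targets). The sketch assumes the two images differ by a pure integer translation with circular boundaries.

```python
import numpy as np


def estimate_shift(ref, moving):
    """Integer (dy, dx) such that np.roll(moving, (dy, dx), axis=(0, 1)) matches ref.

    ASSUMPTION: `moving` is a circularly shifted copy of `ref` (pure translation).
    """
    cross = np.fft.fft2(ref) * np.conj(np.fft.fft2(moving))
    cross /= np.abs(cross) + 1e-12  # normalise: keep phase only
    corr = np.fft.ifft2(cross).real  # sharp peak at the relative shift
    dy, dx = np.unravel_index(np.argmax(corr), corr.shape)
    h, w = ref.shape
    # Map wrap-around peak indices to signed shifts.
    if dy > h // 2:
        dy -= h
    if dx > w // 2:
        dx -= w
    return int(dy), int(dx)
```

Once the shift is known, the second target's coordinates can be translated into the first target's frame before any biological character is derived from their co-location.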
[1331] The method of any one of the embodiments, further comprising:
(iv) determining a location of one or more of: a first morphological feature, a first RNA or mRNA, and a first protein based on the corresponding location of the one or more predicted base calls or predicted classifications.
[1332] The method of any one of the embodiments, further comprising:
(v) determining a location of one or more of: a second morphological feature, a second RNA or mRNA, and a second protein based on the corresponding location of one or more second predicted base calls or predicted classifications.
[1333] The method of any one of the embodiments, further comprising:
(vi) spatially aligning the location of one or more of: a second morphological feature, a second RNA or mRNA, and a second protein with the location of one or more of: the first morphological feature, the first RNA or mRNA, and the first protein; and
(vii) determining a biological character of the sample immobilized on the support based on the spatial alignment.
[1334] The method of any one of the embodiments, wherein the sample is a cellular sample comprising in situ cells or tissue.
[1335] The method of any one of the embodiments, wherein the neural network is a convolutional neural network.
[1336] The method of any one of the embodiments, wherein the sample is a 3D sample, and the neural network is a 3D neural network.
[1337] The method of any one of the embodiments, wherein the sample is 2D sample and wherein the neural network is a 2D neural network.
[1338] The method of any one of the embodiments, wherein the neural network is pretrained using a training data set of training flow cell images.
[1339] The method of any one of the embodiments, wherein the neural network is pretrained using the training data set of training flow cell images prior to the operation (iia).
[1340] The method of any one of the embodiments, wherein the neural network is not trained after operation (iia) and prior to operation (iii).
[1341] The method of any one of the embodiments, wherein the sample comprises concatemer molecules therewithin, wherein the first plurality of flow cell images are acquired at a z-stack of different z-locations with a first resolution along a z direction.
[1342] The method of any one of the embodiments, wherein the first plurality of flow cell images are acquired from a single channel.
[1343] The method of any one of the embodiments, wherein the one or more cycles comprises a current cycle N, and the first plurality of flow cell images are acquired from at least one cycle prior to the current cycle N.
[1344] The method of any one of the embodiments, wherein the one or more cycles comprises a plurality of cycles in a sequencing run.
[1345] The method of any one of the embodiments, wherein (iia) determining, by the first reconfigurable device or the integrated circuit, the polony map based on the second plurality of flow cell images comprises: predicting, using the neural network and by the first reconfigurable device or the integrated circuit, the polony map based on the second plurality of flow cell images.
[1346] The method of any one of the embodiments, wherein (iia) comprises: predicting, by the first reconfigurable device or the integrated circuit, a base call corresponding to each polony of the second plurality of flow cell images using the neural network at a third resolution; and determining the polony map based on the predicted base calls and a corresponding quality index of each predicted base call at the third resolution.
[1347] The method of any one of the embodiments, wherein the third resolution is at least 2 to 32 times greater than the first or second resolution in one or more spatial dimensions.
[1348] The method of any one of the embodiments, wherein the third resolution is greater than the first and second resolutions in one or more spatial dimensions.
[1349] The method of any one of the embodiments, wherein the polony map comprises a spatial coordinate for each of at least some of the polonies in the second plurality of flow cell images.
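The polony map of embodiment [1346] can be illustrated, under assumptions, by calling a base at every pixel of a per-base probability volume at the third resolution and keeping only confident pixels, whose coordinates then form the polony map. The A/C/G/T plane order and the Phred-style quality index Q = -10 log10(1 - p_max) are assumptions for this sketch; the disclosure refers only to "a corresponding quality index".

```python
import numpy as np

# ASSUMPTION: one probability plane per base, in this order.
PLANE_BASES = np.array(list("ACGT"))


def polony_map_from_calls(probs, q_min=20.0):
    """probs: (4, H, W) per-pixel base probabilities at the third resolution.

    Returns coordinates (N, 2), base calls (N,), and Phred-style qualities (N,)
    for pixels whose call quality is at least q_min.
    """
    p_max = probs.max(axis=0)
    calls = PLANE_BASES[probs.argmax(axis=0)]
    # ASSUMED quality index: Q = -10 * log10(P(error)), with P(error) = 1 - p_max.
    quality = -10.0 * np.log10(np.clip(1.0 - p_max, 1e-10, 1.0))
    keep = quality >= q_min
    return np.argwhere(keep), calls[keep], quality[keep]
```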
[1350] The method of any one of the embodiments, wherein (ii) providing, by the processor or the first reconfigurable logic device, the second plurality of flow cell images as the input to the neural network comprises:
(ii) providing, by the processor or the first reconfigurable logic device, the second plurality of flow cell images as the input to the neural network without providing a polony map or locations of polonies in the second plurality of flow cell images as the input to the neural network.
[1351] The method of any one of the embodiments, wherein (iii) predicting, by the first reconfigurable device or an integrated circuit, base calls corresponding to one or more polonies of the second plurality of flow cell images using the neural network, comprises: extracting a plurality of patches from the second plurality of flow cell images based on the polony map; providing an input to the neural network, the input comprising the plurality of patches, wherein each patch comprises one or more patch images from the one or more color channels, and wherein each patch comprises at least a portion of the second plurality of flow cell images; and predicting a plurality of base calls using the neural network and based on the input, wherein each base call corresponds to a corresponding patch.
[1352] The method of any one of the embodiments, wherein each corresponding patch comprises a polony located at or in close vicinity to a center of the corresponding patch.
[1353] The method of any one of the embodiments, wherein each patch comprises 2 to 128 pixels along a spatial dimension.
[1354] The method of any one of the embodiments, wherein the plurality of patches comprises 100 to 10^6 patches.
[1355] The method of any one of the embodiments, wherein at least two patches of the plurality of patches comprise at least partially overlapped patch images that comprise identical pixels.
[1356] The method of any one of the embodiments, wherein each patch of the plurality of patches comprises at least partially overlapped pixels with another patch of the plurality of patches.
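Patch extraction around polony-map coordinates, as in embodiments [1351] to [1356], can be sketched as follows. The patch size and zero padding at image edges are assumptions. With an even patch size, the polony sits one pixel off the geometric centre, which is consistent with "at or in close vicinity to a center"; nearby polonies naturally yield overlapping patches that share identical pixels.

```python
import numpy as np


def extract_patches(images, centers, size=8):
    """Cut size x size patches centred on each polony.

    images:  (C, H, W) flow cell images, one plane per color channel (and/or cycle).
    centers: (N, 2) integer (row, col) polony coordinates from the polony map.
    Returns an (N, C, size, size) array. ASSUMPTION: edge patches are zero-padded.
    """
    half = size // 2
    padded = np.pad(images, ((0, 0), (half, half), (half, half)))
    # Original pixel (r, c) sits at padded (r + half, c + half), i.e. patch index half.
    patches = [padded[:, r:r + size, c:c + size] for r, c in centers]
    return np.stack(patches)
```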
[1357] The method of any one of the embodiments, wherein the one or more color channels comprises 2, 3, or 4 color channels.
[1358] The method of any one of the embodiments, the method further comprising: up-sampling the first plurality of flow cell images to generate the second plurality of flow cell images.
[1359] The method of any one of the embodiments, wherein each patch comprises one or more patch images from the one or more color channels and multiple cycles.
[1360] The method of any one of the embodiments, wherein the multiple cycles are continuous cycles of a sequencing run.
[1361] The method of any one of the embodiments, wherein predicting the plurality of base calls using the neural network and based on the input, wherein each base call corresponds to a corresponding patch comprises: generating a probability map for each channel of the one or more color channels corresponding to the corresponding patch; determining the base call of the corresponding patch based on the probability maps.
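A sketch of embodiment [1361], assuming four color channels mapped one-to-one to A, C, G, and T, and a softmax across channels to turn the network's per-channel outputs for one patch into probability maps. Reading the call from the patch centre pixel is also an assumption, chosen because the polony sits at or near the centre of its patch.

```python
import numpy as np

CHANNEL_BASES = "ACGT"  # ASSUMPTION: channel k reports base CHANNEL_BASES[k]


def call_from_probability_maps(logits):
    """logits: (4, P, P) network output for one patch.

    Softmax across channels gives one probability map per channel; the base call
    is the channel with the highest probability at the patch centre.
    """
    z = logits - logits.max(axis=0, keepdims=True)  # numerically stable softmax
    probs = np.exp(z) / np.exp(z).sum(axis=0, keepdims=True)
    c = logits.shape[1] // 2
    k = int(np.argmax(probs[:, c, c]))
    return CHANNEL_BASES[k], float(probs[k, c, c])
```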
[1362] The method of any one of the embodiments, wherein predicting the plurality of base calls using the neural network and based on the input, wherein each base call corresponds to a corresponding patch comprises: generating a first single intensity for a first channel of the one or more color channels corresponding to the corresponding patch; and determining the base call of the corresponding patch based on the first single intensity.
[1363] The method of any one of the embodiments, further comprising: predicting a second single intensity for a second channel of the one or more color channels corresponding to the corresponding patch using a second neural network; and determining the base call of the corresponding patch based on at least the first single intensity and the second single intensity.
[1364] The method of any one of the embodiments, further comprising: predicting a second single intensity for a second channel of the one or more color channels corresponding to the corresponding patch using a second neural network; and predicting a third single intensity for a third channel of the one or more color channels corresponding to the corresponding patch using a third neural network; and determining the base call of the corresponding patch based on at least the first, second, and third single intensities.
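Embodiments [1362] to [1364] combine single intensities predicted by separate per-channel networks. The sketch below treats each network as an arbitrary callable that maps one channel's patch image to a scalar, and calls the base whose expected intensity profile is nearest. The three-channel centroid table, including a "dark" base, is entirely hypothetical and stands in for whatever dye-to-channel scheme an instrument actually uses.

```python
import numpy as np

# HYPOTHETICAL expected (color-corrected) intensity profile of each base across
# three channels; a real dye/channel mapping would replace this table.
CENTROIDS = {
    "A": np.array([1.0, 0.0, 0.0]),
    "C": np.array([0.0, 1.0, 0.0]),
    "G": np.array([0.0, 0.0, 1.0]),
    "T": np.array([0.0, 0.0, 0.0]),  # "dark" base in this hypothetical scheme
}


def call_from_single_intensities(patch, channel_models):
    """Run one model per channel, then call the base with the nearest centroid.

    patch: (C, P, P) patch images; channel_models: C callables, image -> float.
    """
    intensities = np.array([model(patch[k]) for k, model in enumerate(channel_models)])
    return min(CENTROIDS, key=lambda b: float(np.sum((intensities - CENTROIDS[b]) ** 2)))
```

For a quick check, each "network" can be replaced by a stand-in that reads the centre pixel, for example `lambda img: float(img[img.shape[0] // 2, img.shape[1] // 2])`.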
[1365] The method of any one of the embodiments, wherein (iii) predicting, by the first reconfigurable device or the integrated circuit, one or more base calls corresponding to one or more polonies of the second plurality of flow cell images using the neural network comprises: determining two or more pixels of the second plurality of flow cell images as duplications of a single polony; and selecting one pixel of the two or more pixels as a center of the single polony.
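Collapsing two or more pixels that duplicate a single polony, as in embodiment [1365], can be sketched as greedy non-maximum suppression: candidates are visited from brightest to dimmest, and a candidate is discarded if it lies within a chosen radius of an already-kept, brighter one. The radius and the use of intensity as the ranking score are assumptions.

```python
import numpy as np


def merge_duplicates(coords, intensities, radius=1.5):
    """Greedy non-maximum suppression over candidate polony pixels.

    coords: (N, 2) pixel coordinates; intensities: (N,) ranking scores.
    Candidates within `radius` of a brighter kept candidate are treated as
    duplicates of the same polony; the brightest pixel is kept as its centre.
    """
    order = np.argsort(-intensities, kind="stable")
    kept = []
    for i in order:
        if all(np.hypot(*(coords[i] - coords[j])) > radius for j in kept):
            kept.append(i)
    return coords[sorted(kept)]  # selected centres, in original input order
```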
[1366] A sequencing method comprising:
(a) acquiring, by an imager of a sequencing system, a training set comprising a plurality of training flow cell images of one or more samples immobilized on a support;
(b) up-sampling the plurality of training flow cell images to generate high resolution training flow cell images having a second resolution;
(c) generating, by the sequencing system, reference intensities corresponding to the intensities in the high resolution training flow cell images based on base calls of the high resolution training flow cell images;
(d) providing the reference intensities and the high resolution training flow cell images to a neural network to generate a training output;
(e) repeatedly performing, until the output error satisfies a stopping criterion, training operations comprising: determining an output error by comparing the training output and the reference intensities; and adjusting current values of parameters of the neural network based on the output error; and
(f) generating a trained neural network with adjusted parameters.
[1367] The method of any one of the embodiments, wherein the plurality of training flow cell images are acquired from a single channel, a single z level, or both of the one or more samples.
[1368] The method of any one of the embodiments, wherein the one or more color channels comprises 2, 3, or 4 color channels.
[1369] The method of any one of the embodiments, wherein (d) providing the reference intensities comprises: providing the reference intensities in a plurality of patches, wherein each patch comprises one or more patch images from the one or more color channels.
[1370] The method of any one of the embodiments, wherein the one or more patch images comprises multiple images from adjacent cycles.
[1371] The method of any one of the embodiments, wherein the training flow cell images are from one or more color channels and multiple cycles.
[1372] The method of any one of the embodiments, wherein the training flow cell images are of one or more samples immobilized on a flow cell device.
[1373] The method of any one of the embodiments, wherein the one or more samples are in situ samples.
[1374] The method of any one of the embodiments, wherein at least part of the one or more samples comprises predetermined bases in the one or more cycles.
[1375] The method of any one of the embodiments, further comprising: determining, by the sequencing system, a location list of polonies in the plurality of training flow cell images; and extracting, by the sequencing system, intensities in the plurality of training flow cell images based on the location list.
[1376] The method of any one of the embodiments, further comprising: determining, by the sequencing system, a location list of polonies in the high resolution training flow cell images; and extracting, by the sequencing system, intensities in the high resolution training flow cell images based on the location list.
[1377] The method of any one of the embodiments, wherein providing the reference intensities comprises: providing the reference intensities, the high resolution training flow cell images, and the location list to the neural network.
[1378] The method of any one of the embodiments, wherein generating the reference intensities in the high resolution training flow cell images based on the base calls of the high resolution training flow cell images comprises: performing color correction on each extracted intensity in the high resolution training flow cell images, thereby generating the corresponding reference intensity.
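One common way to perform color correction is to undo linear crosstalk between channels. The sketch below assumes, as a hypothetical model not stated in the source, that the observed per-channel intensities of a polony equal a known crosstalk matrix multiplied by the true intensities, and recovers the true intensities by solving that linear system with Gaussian elimination.

```python
from typing import List, Sequence


def solve_linear(matrix: Sequence[Sequence[float]], rhs: Sequence[float]) -> List[float]:
    """Solve matrix @ x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    # Augmented copy so the caller's inputs are not modified.
    aug = [[float(v) for v in matrix[i]] + [float(rhs[i])] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("crosstalk matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        # Eliminate this column from all rows below the pivot.
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    # Back substitution.
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - s) / aug[r][r]
    return x


def color_correct(
    observed: Sequence[float], crosstalk: Sequence[Sequence[float]]
) -> List[float]:
    """Undo channel crosstalk, assuming observed = crosstalk @ true_intensities."""
    return solve_linear(crosstalk, observed)
```

For example, with a hypothetical 2-channel crosstalk matrix [[1.0, 0.2], [0.1, 1.0]], observed intensities [2.6, 3.2] are corrected back to approximately [2.0, 3.0].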
[1379] The method of any one of the embodiments, wherein determining an output error by comparing the training output and the reference intensities comprises: determining an output error by comparing the training output comprising predicted intensities and the reference intensities, wherein the predicted intensities are at locations in the location list.
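Comparing predicted and reference intensities only at locations in the location list amounts to a masked loss. A minimal sketch, assuming mean squared error as the (hypothetical) error measure and 2D intensity maps indexed by integer (row, col) locations:

```python
from typing import Sequence, Tuple


def masked_mse(
    predicted: Sequence[Sequence[float]],
    reference: Sequence[Sequence[float]],
    locations: Sequence[Tuple[int, int]],
) -> float:
    """Mean squared error evaluated only at the (row, col) entries of the location list."""
    if not locations:
        raise ValueError("location list is empty")
    total = 0.0
    for row, col in locations:
        diff = predicted[row][col] - reference[row][col]
        total += diff * diff
    return total / len(locations)
```

Pixels outside the location list do not contribute, so a large prediction error away from any polony leaves the output error unchanged.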
[1380] The method of any one of the embodiments, wherein the trained neural network is a convolutional neural network.
[1381] The method of any one of the embodiments, wherein the sample is an in situ sample, and the trained neural network is a 3D neural network.
[1382] The method of any one of the embodiments, wherein the sample is a 2D sample, and the trained neural network is a 2D neural network.
[1383] The method of any one of the embodiments, wherein the input to the neural network lacks any location of polonies in the plurality of training flow cell images or any location of polonies in the high resolution training flow cell images.
[1384] A sequencing method comprising:
(a) acquiring, by an imager of a sequencing system, a training set comprising a plurality of training flow cell images of one or more samples immobilized on a support;
(b) up-sampling the corresponding plurality of training flow cell images to generate high resolution training flow cell images having a second resolution;
(c) generating, by the sequencing system, references corresponding to the intensities in the high resolution training flow cell images;
(d) providing the references and the high resolution training flow cell images;
(e) repeatedly performing, until the output error satisfies a stopping criterion, training operations comprising: determining an output error by comparing the training output to the references; and adjusting current values of parameters of the neural network based on the output error; and
(f) generating a trained neural network with adjusted parameters.
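Steps (e) and (f) describe an iterative training loop. The following is a minimal sketch assuming a two-parameter linear model as a hypothetical stand-in for the neural network, mean squared error as the output error, and "error below a tolerance, or an iteration cap" as the stopping criterion; none of these choices comes from the source.

```python
from typing import Sequence, Tuple


def train_until_converged(
    inputs: Sequence[float],
    references: Sequence[float],
    learning_rate: float = 0.1,
    tolerance: float = 1e-6,
    max_iterations: int = 10_000,
) -> Tuple[float, float, float]:
    """Fit y = w * x + b by gradient descent, stopping once the MSE <= tolerance.

    The linear model is a hypothetical stand-in for the neural network.
    Returns (w, b, final_error).
    """
    w, b = 0.0, 0.0
    n = len(inputs)
    error = float("inf")
    for _ in range(max_iterations):
        # Training output for the current parameter values.
        outputs = [w * x + b for x in inputs]
        residuals = [o - r for o, r in zip(outputs, references)]
        # Output error: mean squared difference from the references.
        error = sum(e * e for e in residuals) / n
        if error <= tolerance:  # stopping criterion satisfied
            break
        # Gradients of the MSE with respect to w and b.
        grad_w = 2.0 * sum(e * x for e, x in zip(residuals, inputs)) / n
        grad_b = 2.0 * sum(residuals) / n
        # Adjust the current parameter values based on the output error.
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    # The returned parameters play the role of the trained network in step (f).
    return w, b, error
```

For example, fitting inputs [0, 1, 2, 3] to references [1, 3, 5, 7] converges to w ≈ 2 and b ≈ 1.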
[1385] The method of any one of the embodiments, wherein the plurality of training flow cell images are acquired from a single color channel.
[1386] The method of any one of the embodiments, wherein the plurality of training flow cell images are acquired from a single z level of the one or more samples.
[1387] The method of any one of the embodiments, wherein the plurality of training flow cell images are acquired from one or more cycles.
[1388] The method of any one of the embodiments, wherein the one or more color channels comprise 2, 3, or 4 color channels.
[1389] The method of any one of the embodiments, wherein (d) providing the references and the plurality of training flow cell images comprises: providing the references in a plurality of patches, wherein each patch comprises one or more patch images from one or more color channels.
[1390] The method of any one of the embodiments, wherein the one or more patch images comprise multiple images from adjacent cycles in a sequencing run.
[1391] The method of any one of the embodiments, wherein the plurality of training flow cell images are from multiple color channels and multiple cycles.
[1392] The method of any one of the embodiments, wherein the one or more samples are in situ samples.
[1393] The method of any one of the embodiments, wherein at least part of the one or more samples comprises predetermined bases in the one or more cycles.
[1394] The method of any one of the embodiments, further comprising: determining, by the sequencing system, a location list of polonies in the plurality of training flow cell images; and extracting, by the sequencing system, intensities in the plurality of training flow cell images based on the location list.
[1395] The method of any one of the embodiments, further comprising: determining, by the sequencing system, a location list of polonies in the high resolution training flow cell images; and extracting, by the sequencing system, intensities in the high resolution training flow cell images based on the location list.
[1396] The method of any one of the embodiments, wherein providing the references and the high resolution training flow cell images comprises: providing the references, the high resolution training flow cell images, and the location list to the neural network.
[1397] The method of any one of the embodiments, wherein generating the references in the high resolution training flow cell images comprises: performing color correction on each extracted intensity in the high resolution training flow cell images, thereby generating the corresponding reference.
[1398] The method of any one of the embodiments, wherein determining an output error by comparing the training output and the references comprises: determining an output error by comparing the training output comprising predicted values and the references, wherein at least some of the predicted values are at locations in the location list.
[1399] The method of any one of the embodiments, wherein the trained neural network is a convolutional neural network.
[1400] The method of any one of the embodiments, wherein the sample is an in situ sample, and the trained neural network is a 3D neural network.
[1401] The method of any one of the embodiments, wherein the sample is a 2D sample, and the trained neural network is a 2D neural network.
[1402] The method of any one of the embodiments, wherein the input to the neural network lacks any location of polonies in the plurality of training flow cell images or any location of polonies in the high resolution training flow cell images.
[1403] The method of any one of the embodiments, wherein the plurality of training flow cell images are acquired from a single channel, wherein the references comprise reference polony maps, and wherein each reference polony map corresponds to at least a portion of an image of the plurality of high resolution training flow cell images.
[1404] The method of any one of the embodiments, wherein the plurality of training flow cell images are acquired from multiple color channels, wherein the references comprise reference polony maps, and wherein each reference polony map corresponds to at least a portion of an image of the plurality of high resolution training flow cell images.
[1405] The method of any one of the embodiments, wherein the plurality of training flow cell images are acquired from multiple color channels, wherein the references comprise reference base calls, and wherein each reference base call corresponds to a polony in the plurality of high resolution training flow cell images.
[1406] The method of any one of the embodiments, wherein the plurality of training flow cell images are acquired from multiple color channels, wherein the references comprise reference classifications, and wherein each reference classification corresponds to a pixel in the plurality of high resolution training flow cell images from the multiple color channels.
[1407] The method of any one of the embodiments, wherein the plurality of training flow cell images are acquired from multiple color channels, wherein the references comprise reference classifications, wherein a first reference classification corresponds to a pixel of a polony, and a second reference classification corresponds to a pixel outside of any polony in the plurality of high resolution training flow cell images from the multiple color channels.
[1408] The method of any one of the embodiments, wherein the neural network comprises a first image processing part and a second base calling part, and wherein adjusting the current values of parameters of the neural network based on the output error comprises: adjusting the current values of first parameters only in the image processing part of the neural network without adjusting values of second parameters in the base calling part of the neural network.
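Adjusting only the image processing part while leaving the base calling part untouched is a form of parameter freezing. The sketch below assumes, hypothetically, that parameters are stored in a flat dictionary whose names carry an `image_processing.` or `base_calling.` prefix; real networks hold tensors rather than single floats.

```python
from typing import Dict

TRAINABLE_PREFIX = "image_processing."  # hypothetical name prefix for the image processing part


def update_image_processing_only(
    params: Dict[str, float],
    grads: Dict[str, float],
    learning_rate: float = 0.01,
) -> Dict[str, float]:
    """One gradient-descent step that changes only image processing parameters."""
    updated: Dict[str, float] = {}
    for name, value in params.items():
        if name.startswith(TRAINABLE_PREFIX):
            updated[name] = value - learning_rate * grads[name]
        else:
            # Base calling parameters keep their current values (frozen).
            updated[name] = value
    return updated
```

In a framework such as PyTorch, a comparable effect is usually obtained by setting `requires_grad = False` on the frozen parameters so the optimizer never changes them.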
[1409] A computer-implemented method comprising:
(i) generating, by a sequencing system, a first plurality of flow cell images of a sample immobilized on a support by conducting one or more cycles of sequencing reactions, wherein the first plurality of flow cell images are acquired with a first resolution;
(ii) providing, by a processor or a first reconfigurable logic device, the first plurality of flow cell images as an input to a neural network, wherein the neural network is pre-trained using a training data set of training flow cell images and reference base calls of the training dataset;
(iii) predicting, by the first reconfigurable logic device or an integrated circuit, a second plurality of flow cell images using the neural network, wherein each of the second plurality of flow cell images has a second resolution and corresponds to a respective image of the first plurality of flow cell images, and wherein the second resolution is at least 2 to 32 times greater than the first resolution in one or more spatial dimensions;
(iv) determining, by the processor, the first reconfigurable logic device, or the integrated circuit, polonies from the second plurality of flow cell images; and
(v) performing, by the processor, the first reconfigurable logic device, or the integrated circuit, a corresponding base calling for each of the determined polonies based on the second plurality of flow cell images.
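A minimal non-neural sketch of step (v) follows, assuming four color-corrected channel intensities per polony and a hypothetical fixed channel-to-base order A, C, G, T. The argmax rule and the confidence score (the winning channel's share of the clipped total) are illustrative assumptions, not the base caller of the embodiments.

```python
from typing import List, Sequence, Tuple

BASES = ("A", "C", "G", "T")  # hypothetical channel-to-base order


def call_bases(
    intensities_per_polony: Sequence[Sequence[float]],
) -> List[Tuple[str, float]]:
    """Return (base, confidence) for each polony from its per-channel intensities."""
    calls: List[Tuple[str, float]] = []
    for values in intensities_per_polony:
        if len(values) != len(BASES):
            raise ValueError("expected one intensity per base channel")
        # Negative intensities (e.g. after background subtraction) count as zero.
        clipped = [max(0.0, float(v)) for v in values]
        best = max(range(len(BASES)), key=lambda i: clipped[i])
        total = sum(clipped)
        confidence = clipped[best] / total if total > 0 else 0.0
        calls.append((BASES[best], confidence))
    return calls
```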
[1410] The computer-implemented method of any one of the embodiments, wherein the sample is a 2D or 3D sample.
[1411] The computer-implemented method of any one of the embodiments, wherein the sample is a cellular sample comprising a cell, tissue, or a combination thereof.
[1412] The computer-implemented method of any one of the embodiments, wherein the neural network is a convolutional neural network.
[1413] The computer-implemented method of any one of the embodiments, wherein the neural network is pre-trained using one or more loss functions based on comparing training base calls of the training flow cell images to the reference base calls of the training flow cell images.
[1414] The computer-implemented method of any one of the embodiments, wherein the training flow cell images are acquired only from a same color channel.
[1415] The computer-implemented method of any one of the embodiments, wherein each of the training flow cell images comprises flow cell images of a same field of view from a plurality of sequencing cycles stacked along a time dimension.
[1416] The computer-implemented method of any one of the embodiments, wherein each of the training flow cell images comprises flow cell images of a same field of view from one or more sequencing cycles.
[1417] The computer-implemented method of any one of the embodiments, wherein each of the training flow cell images comprises flow cell images of the sample at one or more z-levels.
[1418] The computer-implemented method of any one of the embodiments, wherein (iii) predicting the second plurality of flow cell images using the neural network comprises predicting high resolution post-processing images of the first plurality of flow cell images, and wherein the processing comprises one or more of: noise reduction; background reduction; intensity offset correction; intensity normalization; color correction; phasing and/or dephasing; image registration; and deconvolution.
[1419] The computer-implemented method of any one of the embodiments, wherein (v) performing the corresponding base calling for each of the determined polonies based on the second plurality of flow cell images does not use a neural network.
[1420] The computer-implemented method of any one of the embodiments, wherein (iv) determining the polonies from the second plurality of flow cell images does not use a neural network.
[1421] The computer-implemented method of any one of the embodiments, wherein (iv) determining the polonies from the second plurality of flow cell images comprises determining a location for a center of each of the polonies from the second plurality of flow cell images.
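Locating polony centers without a neural network can be done with classical peak finding. The sketch below assumes, hypothetically, that a center is any pixel above an intensity threshold that is strictly brighter than all of its 3 x 3 neighbours; ties (plateaus) are not reported, and sub-pixel refinement is omitted.

```python
from typing import List, Sequence, Tuple


def find_polony_centers(
    image: Sequence[Sequence[float]], threshold: float
) -> List[Tuple[int, int]]:
    """Return (row, col) of pixels above threshold that are strict 3x3 local maxima."""
    n_rows, n_cols = len(image), len(image[0])
    centers: List[Tuple[int, int]] = []
    for r in range(n_rows):
        for c in range(n_cols):
            v = image[r][c]
            if v <= threshold:
                continue
            is_max = True
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    if dr == 0 and dc == 0:
                        continue
                    rr, cc = r + dr, c + dc
                    # A neighbour at least as bright disqualifies this pixel.
                    if 0 <= rr < n_rows and 0 <= cc < n_cols and image[rr][cc] >= v:
                        is_max = False
            if is_max:
                centers.append((r, c))
    return centers
```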
[1422] The computer-implemented method of any one of the embodiments, wherein (iv) determining the polonies from the second plurality of flow cell images does not use a neural network.
[1423] The computer-implemented method of any one of the embodiments, wherein the neural network comprises a first image processing part and a second base calling part.
[1424] The computer-implemented method of any one of the embodiments, wherein predicting, by the first reconfigurable logic device or the integrated circuit, the second plurality of flow cell images using the neural network comprises: generating output images from the first image processing part of the neural network as the second plurality of flow cell images without going through the second base calling part of the neural network.
[1425] The computer-implemented method of any one of the embodiments, wherein the first image processing part comprises at least part of one or more of: an input layer, a hidden layer, an embedding layer, an output layer, and an encoder of the neural network, and wherein the second base calling part lacks any one of: a convolutional layer, a pooling layer, an embedding layer, an output layer, an encoder, and a decoder of the neural network.
[1426] The computer-implemented method of any one of the embodiments, wherein the first image processing part comprises at least part of one or more of: an input layer, a hidden layer, an embedding layer, and an encoder of the neural network, and wherein the second base calling part comprises at least part of one or more of: an input layer, an output layer, a convolutional layer, a pooling layer, an embedding layer, an encoder, and a decoder of the neural network.
[1427] The computer-implemented method of any one of the embodiments, wherein the processor comprises one or more of: a CPU, a GPU, a TPU, an NPU, an FPGA, and an AI chip.
[1428] The computer-implemented method of any one of the embodiments, wherein the processor comprises one or more of: a CPU, an NPU, and an FPGA.
[1429] The computer-implemented method of any one of the embodiments, wherein the first reconfigurable logic device comprises one or more FPGA units.
[1430] The computer-implemented method of any one of the embodiments, wherein the integrated circuit comprises one or more NPUs.
[1431] The computer-implemented method of any one of the embodiments, wherein the integrated circuit is in data communication with the first reconfigurable logic device.
[1432] The computer-implemented method of any one of the embodiments, wherein the convolutional network comprises a U-Net.
[1433] The computer-implemented method of any one of the embodiments, wherein the first convolution comprises a 3D convolution with a convolution kernel.
[1434] The computer-implemented method of any one of the embodiments, wherein the convolution kernel has at least four dimensions.
[1435] The computer-implemented method of any one of the embodiments, wherein the convolutional kernel is m x m x m x n, wherein m is an integer in a range from 3 to 30, wherein n is an integer.
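One reading of an m x m x m x n kernel is a bank of n separate m x m x m kernels, each producing one output feature volume; this reading is an assumption, since the embodiment does not state what n indexes. The sketch below implements a naive "valid" 3D convolution (computed as cross-correlation, as most deep learning frameworks do) on a single-channel volume, without padding or stride.

```python
from typing import List, Sequence

Volume = List[List[List[float]]]  # volume[z][y][x]


def conv3d_valid(volume: Sequence, kernel: Sequence) -> Volume:
    """Valid 3D cross-correlation of a D x H x W volume with a cubic m x m x m kernel."""
    m = len(kernel)
    d, h, w = len(volume), len(volume[0]), len(volume[0][0])
    if m > min(d, h, w):
        raise ValueError("kernel is larger than the volume")
    out: Volume = []
    for z in range(d - m + 1):
        plane = []
        for y in range(h - m + 1):
            row = []
            for x in range(w - m + 1):
                acc = 0.0
                # Weighted sum over the m x m x m neighbourhood.
                for kz in range(m):
                    for ky in range(m):
                        for kx in range(m):
                            acc += volume[z + kz][y + ky][x + kx] * kernel[kz][ky][kx]
                row.append(acc)
            plane.append(row)
        out.append(plane)
    return out


def conv3d_filter_bank(volume: Sequence, kernels: Sequence) -> List[Volume]:
    """Apply n cubic kernels (an m x m x m x n bank) to get n output volumes."""
    return [conv3d_valid(volume, k) for k in kernels]
```

A 4 x 4 x 4 input with m = 3 yields 2 x 2 x 2 outputs, one per kernel in the bank.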
[1436] The sequencing system of any one of the embodiments, wherein the first convolution comprises a 2D convolution with a convolution kernel.
[1437] The sequencing system of any one of the embodiments, wherein the convolution kernel has at least three dimensions.
[1438] The sequencing system of any one of the embodiments, wherein the convolutional kernel is m x m x n, wherein m is an integer in a range from 3 to 30, wherein n is an integer.
[1439] The sequencing system of any one of the embodiments, wherein the neural network is a 3D convolutional neural network.
[1440] The sequencing system of any one of the embodiments, wherein the neural network is a 2D convolutional neural network.
[1441] The computer-implemented method of any one of the embodiments, wherein n is an integer from 1 to 16384.
[1442] The computer-implemented method of any one of the embodiments, wherein the first and second resolutions are in 3D.
[1443] The computer-implemented method of any one of the embodiments, wherein the first plurality of flow cell images are from a single color channel.
[1444] The computer-implemented method of any one of the embodiments, wherein the first plurality of flow cell images are from one or more color channels.
[1445] The computer-implemented method of any one of the embodiments, wherein the first plurality of flow cell images are of unbalanced nucleotide diversity.
[1446] The computer-implemented method of any one of the embodiments, wherein the first plurality of flow cell images comprises: an unbalanced diversity of nucleotide bases of A, G, C and T/U among concatemer molecules immobilized on the support in one or more cycles.
[1447] The computer-implemented method of any one of the embodiments, wherein the first plurality of flow cell images comprises: a balanced diversity of nucleotide bases of A, G, C and T/U among concatemer molecules immobilized on the support in one or more cycles.
[1448] The computer-implemented method of any one of the embodiments, wherein two or more different concatemer molecules among the concatemer molecules have different insert sequences.
[1449] The computer-implemented method of any one of the embodiments, wherein different insert sequences correspond to different target RNA molecules or target cDNA molecules.
[1450] The computer-implemented method of any one of the embodiments, wherein each location of the determined polonies corresponds to a location of the concatemer molecules.
[1451] The computer-implemented method of any one of the embodiments, wherein the first plurality of flow cell images comprises optical signals emitted from nucleotide reagents bound to a balanced diversity of nucleotide bases of A, G, C and T/U among the concatemer molecules immobilized on the support.
[1452] The computer-implemented method of any one of the embodiments, wherein the first plurality of flow cell images comprises optical signals emitted from nucleotide reagents bound to an unbalanced diversity of nucleotide bases of A, G, C and T/U among the concatemer molecules immobilized on the support in the one or more subsequent cycles.
[1453] The computer-implemented method of any one of the embodiments, wherein the unbalanced diversity of nucleotide bases of A, G, C and T/U among the concatemer molecules comprises: a percentage of (1) a number of one or more types of nucleotide bases to (2) a total number of bases is less than 20%, 15%, 10%, or 5% in the one or more cycles.
[1454] The computer-implemented method of any one of the embodiments, wherein the balanced diversity of nucleotide bases of A, G, C and T/U among the plurality of concatemer molecules comprises: a percentage of (1) a number of each type of nucleotide bases to (2) a total number of bases in the one or more cycles is more than 10%, 15%, or 20%.
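The balanced and unbalanced diversity criteria of [1453] and [1454] are per-cycle percentage tests on base composition. A minimal sketch, assuming base calls for one cycle are given as a string or list of single letters (U counted as T), with the 5% and 10% defaults picked from the listed thresholds purely for illustration:

```python
from collections import Counter
from typing import Dict, Iterable


def base_fractions(calls: Iterable[str]) -> Dict[str, float]:
    """Percentage of each base type (A, C, G, T/U) among one cycle's base calls."""
    normalized = [b.upper().replace("U", "T") for b in calls]
    if not normalized:
        raise ValueError("no base calls for this cycle")
    counts = Counter(normalized)
    total = len(normalized)
    return {b: 100.0 * counts.get(b, 0) / total for b in "ACGT"}


def is_unbalanced(calls: Iterable[str], threshold_pct: float = 5.0) -> bool:
    """True if one or more base types make up less than threshold_pct of the calls."""
    return any(p < threshold_pct for p in base_fractions(calls).values())


def is_balanced(calls: Iterable[str], threshold_pct: float = 10.0) -> bool:
    """True if every base type makes up more than threshold_pct of the calls."""
    return all(p > threshold_pct for p in base_fractions(calls).values())
```

For example, the cycle "AAAAAAAACG" has 0% T and is unbalanced, while "ACGT" repeated five times has 25% of each base and is balanced.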
[1455] The computer-implemented method of any one of the embodiments, wherein the sample comprises overloaded concatemer molecules with a spatial density in a range of 10² to 10¹⁵ per mm².
[1456] The computer-implemented method of any one of the embodiments, wherein the sample comprises overloaded concatemer molecules with a spatial density in a range of 10³ to 10¹⁰ per mm².
[1457] The computer-implemented method of any one of the embodiments, wherein the first resolution is in a range of 0.1 µm to 5 µm.
[1458] The computer-implemented method of any one of the embodiments, wherein the second resolution is in a range of 0.01 µm to 2 µm.
[1459] The computer-implemented method of any one of the embodiments, wherein the down-sampling factor is 2, 4, or 8.
[1460] The computer-implemented method of any one of the embodiments, wherein the up-sampling factor is 2, 4, or 8.
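Down-sampling and up-sampling by an integer factor of 2, 4, or 8 can be illustrated with the simplest classical resampling rules. This sketch assumes block-mean down-sampling and nearest-neighbour up-sampling; in the embodiments the high resolution images are predicted by a neural network, so these functions would at most serve as a baseline or for preparing paired training images, which is an assumption rather than something the source states.

```python
from typing import List, Sequence

Image = List[List[float]]


def upsample_nearest(image: Sequence[Sequence[float]], factor: int) -> Image:
    """Repeat each pixel factor x factor times (nearest-neighbour up-sampling)."""
    if factor < 1:
        raise ValueError("factor must be a positive integer")
    rows, cols = len(image), len(image[0])
    return [
        [float(image[r // factor][c // factor]) for c in range(cols * factor)]
        for r in range(rows * factor)
    ]


def downsample_mean(image: Sequence[Sequence[float]], factor: int) -> Image:
    """Average non-overlapping factor x factor blocks (block-mean down-sampling)."""
    rows, cols = len(image), len(image[0])
    if factor < 1 or rows % factor or cols % factor:
        raise ValueError("image size must be divisible by a positive factor")
    out: Image = []
    for r in range(0, rows, factor):
        out.append([
            sum(image[rr][cc] for rr in range(r, r + factor) for cc in range(c, c + factor))
            / (factor * factor)
            for c in range(0, cols, factor)
        ])
    return out
```

Up-sampling a 2 x 2 image by 2 and then down-sampling by 2 returns the original values, since each block holds copies of one pixel.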
[1461] The computer-implemented method of any one of the embodiments, wherein one or more of the operations are performed while a sequencing run is being performed.
[1462] The computer-implemented method of any one of the embodiments, wherein the one or more cycles comprises a current cycle N.
[1463] The computer-implemented method of any one of the embodiments, wherein N is in a range from 1 to 500.
[1464] The computer-implemented method of any one of the embodiments, wherein the one or more cycles comprises a single cycle ranging from 1 to 500.
[1465] The computer-implemented method of any one of the embodiments, wherein the one or more cycles comprises multiple cycles ranging from 1 to 500.
[1466] The computer-implemented method of any one of the embodiments, wherein one or more of the operations are performed while the sequencing reactions in cycles subsequent to the current cycle N are yet to be performed or are currently being performed.
[1467] The computer-implemented method of any one of the embodiments, wherein the training data set of flow cell images comprises z-stacks of flow cell images taken at different z-locations.
[1468] The computer-implemented method of any one of the embodiments, wherein the z-axis is orthogonal to image planes of the flow cell images.
[1469] The computer-implemented method of any one of the embodiments, wherein the second resolution is 4, 6, or 8 times greater than the first resolution in all three dimensions.
[1470] The computer-implemented method of any one of the embodiments, further comprising: registering the second plurality of flow cell images to a common coordinate system.
[1471] The computer-implemented method of any one of the embodiments, wherein the first plurality of flow cell images are acquired from a single color channel of the sequencing system.
[1472] The computer-implemented method of any one of the embodiments, wherein generating a polony map comprising spatial location of polonies based on the determined polonies further comprises: deleting duplicate polonies from the determined polonies, wherein the duplicate polonies are out-of-focus.
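Deleting out-of-focus duplicates can be sketched as a greedy suppression across z-levels. The sketch assumes, hypothetically, that each detection is an (x, y, z_level, intensity) tuple, that the brightest detection of a polony is the in-focus one, and that a dimmer detection on a different z-level within a small lateral radius is its out-of-focus duplicate.

```python
import math
from typing import List, Sequence, Tuple

Polony = Tuple[float, float, int, float]  # (x, y, z_level, intensity)


def remove_out_of_focus_duplicates(
    polonies: Sequence[Polony], xy_radius: float = 1.5
) -> List[Polony]:
    """Drop detections that repeat a brighter polony on a different z-level.

    Detections are visited from brightest to dimmest; a detection is discarded
    if an already-kept polony on another z-level lies within xy_radius of it.
    """
    kept: List[Polony] = []
    for p in sorted(polonies, key=lambda q: q[3], reverse=True):
        x, y, z, _ = p
        is_duplicate = any(
            k[2] != z and math.hypot(x - k[0], y - k[1]) <= xy_radius for k in kept
        )
        if not is_duplicate:
            kept.append(p)
    return kept
```

The result is ordered from brightest to dimmest; detections on the same z-level are never treated as duplicates of each other under this rule.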
[1473] The computer-implemented method of any one of the embodiments, wherein determining, by the processor, polonies from the second plurality of flow cell images comprises: superimposing the second plurality of flow cell images with corresponding cell staining images; and generating the polony map by only including polonies that are within cell boundaries in the corresponding cell staining images.
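Keeping only polonies that fall within cell boundaries can be sketched with a labelled cell mask. The sketch assumes, hypothetically, that the cell staining image has already been segmented into an integer label image (0 for background, a positive cell ID inside each cell) registered to the same pixel grid as the polony coordinates.

```python
from typing import List, Sequence, Tuple


def assign_polonies_to_cells(
    polonies: Sequence[Tuple[int, int]],
    cell_labels: Sequence[Sequence[int]],
) -> List[Tuple[int, int, int]]:
    """Keep polonies inside a labelled cell and return (row, col, cell_id) for each."""
    n_rows, n_cols = len(cell_labels), len(cell_labels[0])
    kept: List[Tuple[int, int, int]] = []
    for row, col in polonies:
        # Polonies outside the stained field of view are discarded.
        if 0 <= row < n_rows and 0 <= col < n_cols:
            cell_id = cell_labels[row][col]
            if cell_id != 0:  # 0 marks background outside any cell
                kept.append((row, col, cell_id))
    return kept
```

Returning the cell ID as well as the position makes the same pass usable for per-cell counting of target molecules.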
[1474] The computer-implemented method of any one of the embodiments, wherein the support comprises a glass or plastic substrate.
[1475] The computer-implemented method of any one of the embodiments, wherein the support is comprised in a flow cell device.
[1476] The computer-implemented method of any one of the embodiments, further comprising: providing the sample harboring a plurality of RNA which comprises the first target RNA molecule and the second target RNA molecule.
[1477] The computer-implemented method of any one of the embodiments, further comprising: generating inside the sample a plurality of cDNA molecules which include a first target cDNA molecule that corresponds to the first target RNA molecule and a second target cDNA molecule that corresponds to the second target RNA molecule.
[1478] The computer-implemented method of any one of the embodiments, further comprising: contacting the plurality of cDNA molecules in the sample with a plurality of target-specific padlock probes which includes at least a first plurality of first target-specific padlock probes and a second plurality of second target-specific padlock probes.
[1479] The computer-implemented method of any one of the embodiments, further comprising: contacting the plurality of RNA molecules in the sample with a plurality of target-specific padlock probes which includes at least a first plurality of target-specific padlock probes and a second plurality of target-specific padlock probes.
[1480] The computer-implemented method of any one of the embodiments, wherein individual padlock probes in the first plurality of first target-specific padlock probes comprise: first and second terminal regions, wherein the first terminal region selectively hybridizes to a first region of the first target cDNA molecule or the first target RNA molecule, and the second terminal region selectively hybridizes to a second region of the first target cDNA molecule or the first target RNA molecule.
[1481] The computer-implemented method of any one of the embodiments, wherein contacting the plurality of RNA molecules in the sample with the plurality of target-specific padlock probes comprises: hybridizing the first and second terminal regions of the first target-specific padlock probes to proximal positions on the first target cDNA molecule or the first target RNA molecule to form a circularized first target-specific padlock probe having a nick or gap between the hybridized first and second terminal regions.
[1482] The computer-implemented method of any one of the embodiments, wherein the first target-specific padlock probe comprises a first target barcode sequence that corresponds to and uniquely identifies the first target cDNA sequence or the first target RNA sequence.
[1483] The computer-implemented method of any one of the embodiments, wherein the first target-specific padlock probe comprises a first target barcode sequence that is located adjacent to one of the regions of the first target-specific padlock probe that selectively hybridizes to the first target cDNA molecule or the first target RNA sequence.
[1484] The computer-implemented method of any one of the embodiments, wherein the first target-specific padlock probe comprises at least one universal adaptor sequence.
[1485] The computer-implemented method of any one of the embodiments, wherein the first target-specific padlock probe comprises a universal primer binding site for a rolling circle amplification primer or a complementary sequence thereof.
[1486] The computer-implemented method of any one of the embodiments, wherein the first target-specific padlock probe comprises a universal compaction oligonucleotide binding site or a complementary sequence thereof.
[1487] The computer-implemented method of any one of the embodiments, further comprising: closing the nick or gap in the at least first and second circularized target-specific padlock probes by conducting an enzymatic reaction, thereby generating at least a first covalently closed circular padlock probe and a second covalently closed circular padlock probe inside the sample.
[1488] The computer-implemented method of any one of the embodiments, further comprising: conducting a rolling circle amplification reaction inside the sample using the first and second covalently closed circular padlock probes as template molecules, thereby generating a plurality of concatemer molecules including at least the first concatemer molecule that corresponds to the first target RNA molecule, and the second concatemer molecule that corresponds to the second target RNA molecule.
[1489] The computer-implemented method of any one of the embodiments, wherein the first concatemer comprises: tandem repeat units of: a first target barcode sequence that uniquely identifies the first target RNA or the first target cDNA sequence, a first insert sequence that corresponds to the first target RNA or the first target cDNA, and a first sequencing primer binding site or a complementary sequence thereof.
[1490] The computer-implemented method of any one of the embodiments, wherein the first concatemer further comprises: a universal binding site for an amplification primer or a complementary sequence thereof, and a universal binding site for a compaction oligonucleotide or a complementary sequence thereof.
[1491] The computer-implemented method of any one of the embodiments, wherein the second concatemer comprises: tandem repeat units of: a second target barcode sequence that uniquely identifies the second target RNA or the second target cDNA sequence, a second insert sequence that corresponds to the second target RNA or the second target cDNA, and a second sequencing primer binding site or a complementary sequence thereof.
[1492] The computer-implemented method of any one of the embodiments, wherein the second concatemer further comprises: a universal binding site for an amplification primer or a complementary sequence thereof, and a universal binding site for a compaction oligonucleotide or a complementary sequence thereof.
[1493] The computer-implemented method of any one of the embodiments, wherein conducting the one or more cycles of sequencing reactions comprises: contacting the plurality of concatemer molecules inside the sample with (i) a plurality of universal sequencing primers, (ii) a plurality of sequencing polymerases, and (iii) a plurality of nucleotide reagents, under a condition suitable for hybridizing the plurality of universal sequencing primers to their respective universal sequencing primer binding sites on the concatemers.
[1494] The computer-implemented method of any one of the embodiments, wherein the plurality of nucleotide reagents comprise: multivalent molecules, nucleotides, nucleotide analogs, or their combinations.
[1495] The computer-implemented method of any one of the embodiments, wherein individual nucleotides or nucleotide analogs are detectably labeled or non-labeled.
[1496] The computer-implemented method of any one of the embodiments, wherein the detectably labeled individual nucleotides or nucleotide analogs comprise a different detectable color label that corresponds with each different type of nucleotide base of A, G, C, and T/U.
[1497] The computer-implemented method of any one of the embodiments, wherein an individual multivalent molecule comprises a core attached with multiple nucleotide arms and each arm of the individual multivalent molecule comprises the same type of nucleotide base.
[1498] The computer-implemented method of any one of the embodiments, wherein generating the first plurality of flow cell images comprises: in each cycle, imaging, by an optical system, optical color signals emitted from the nucleotide reagents that are bound to the plurality of concatemer molecules.
[1499] The computer-implemented method of any one of the embodiments, wherein the first plurality of flow cell images comprises optical color signals emitted from the nucleotide reagents that are bound to the plurality of concatemer molecules.
[1500] The computer-implemented method of any one of the embodiments, wherein conducting the one or more cycles of sequencing reactions comprises: sequencing only the first target barcode sequence region of the first concatemer, thereby generating the first sequencing read product.
[1501] The computer-implemented method of any one of the embodiments, wherein conducting the one or more cycles of sequencing reactions comprises: sequencing the first target barcode sequence region and at least a portion of the first insert sequence of the first concatemer, thereby generating the first sequencing read product.
[1502] The computer-implemented method of any one of the embodiments, wherein conducting the one or more cycles of sequencing reactions comprises: sequencing only the second target barcode sequence region of the second concatemer, thereby generating the second sequencing read product.
[1503] The computer-implemented method of any one of the embodiments, wherein conducting the one or more cycles of sequencing reactions comprises: sequencing the second target barcode sequence region and at least a portion of the second insert sequence of the second concatemer, thereby generating the second sequencing read product.
[1504] The computer-implemented method of any one of the embodiments, further comprising: removing a first sequencing read product from the first concatemer molecule and retaining the first concatemer molecule in the sample, and removing a second sequencing read product from the second concatemer molecule and retaining the second concatemer molecule in the sample.
[1505] The computer-implemented method of any one of the embodiments, further comprising: reiteratively sequencing the plurality of concatemers by repeating the following operations at least once: generating the first plurality of flow cell images of a sample immobilized on a support by conducting one or more cycles of sequencing reactions, thereby generating the first sequencing read product and the second sequencing read product, the sample comprising a plurality of concatemer molecules therewithin, wherein a first concatemer molecule of the plurality of concatemer molecules corresponds to a first target RNA molecule of the sample, and a second concatemer molecule of the plurality of concatemer molecules corresponds to a second target RNA molecule of the sample, wherein the first plurality of flow cell images; and removing a first sequencing read product from the first concatemer molecule and retaining the first concatemer molecule in the sample, and removing a second sequencing read product from the second concatemer molecule and retaining the second concatemer molecule in the sample.
[1506] The computer-implemented method of any one of the embodiments, wherein the first sequencing read product comprises some or all of: a first target barcode sequence in one or more tandem units of the first concatemer molecule; a first insert sequence in one or more tandem units of the first concatemer molecule; or their combinations.
[1507] The computer-implemented method of any one of the embodiments, further comprising: confirming presence of the first target RNA molecule, the second target RNA molecule, or both molecules in the sample based on the performed base calling of the second plurality of flow cell images at the base calling locations in the base calling template.
[1508] The computer-implemented method of any one of the embodiments, further comprising: generating, by the sequencing system, the second plurality of flow cell images of the sample immobilized on the support by conducting subsequent cycles of sequencing reactions after the one or more cycles.
[1509] The computer-implemented method of any one of the embodiments, wherein generating the first plurality of flow cell images of the sample immobilized on the support comprises: sequencing at least the first concatemer inside the sample under a condition that inhibits sequencing the second concatemer.
[1510] The computer-implemented method of any one of the embodiments, wherein sequencing at least the first concatemer inside the sample comprises: generating a plurality of first sequencing read products, and wherein the sequences of the first sequencing read products are aligned with a first target reference sequence to confirm presence of the first target RNA in the sample.
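Paragraphs [1509] to [1511] confirm a target RNA by aligning sequencing read products to a target reference sequence. The sketch below shows one simple way this could be done. It assumes an ungapped comparison of each read against the start of each reference, a hypothetical mismatch budget (`max_mismatches`), and a hypothetical minimum read count (`min_reads`). Reads that match two targets equally well are discarded as ambiguous. A production pipeline would more likely use a gapped aligner, and all names here are illustrative.

```python
def hamming(a: str, b: str) -> int:
    """Count mismatched positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def count_supporting_reads(reads, targets, max_mismatches=1):
    """Assign each read to the single best-matching target reference.

    reads: iterable of called read sequences (for example, barcode-region reads).
    targets: dict mapping target name -> reference sequence.
    Returns a dict mapping target name -> number of reads assigned to it.
    """
    counts = {name: 0 for name in targets}
    for read in reads:
        hits = []
        for name, ref in targets.items():
            if len(ref) < len(read):
                continue  # reference too short to cover the whole read
            d = hamming(read, ref[: len(read)])  # ungapped, prefix-anchored comparison
            if d <= max_mismatches:
                hits.append((d, name))
        if not hits:
            continue  # read matches no target within the mismatch budget
        hits.sort()
        if len(hits) > 1 and hits[0][0] == hits[1][0]:
            continue  # tie between targets: ambiguous, do not count
        counts[hits[0][1]] += 1
    return counts


def confirm_presence(counts, min_reads=3):
    """Call a target present when at least `min_reads` reads support it."""
    return {name: n >= min_reads for name, n in counts.items()}


if __name__ == "__main__":
    targets = {"target_1": "ACGTACGTAC", "target_2": "TTGCAAGCTT"}
    reads = ["ACGTACGT", "ACGTACGA", "ACGAACGT", "TTGCAAGC", "GGGGGGGG"]
    counts = count_supporting_reads(reads, targets)
    print(counts)                    # {'target_1': 3, 'target_2': 1}
    print(confirm_presence(counts))  # {'target_1': True, 'target_2': False}
```

Requiring several independent reads per target, rather than a single read, is one way to guard against isolated base-calling errors producing a false positive.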
[1511] The computer-implemented method of any one of the embodiments, wherein generating the first plurality of flow cell images of the sample immobilized on the support comprises: sequencing at least the second concatemer inside the sample under a condition that inhibits sequencing the first concatemer.
[1512] The computer-implemented method of any one of the embodiments, wherein sequencing at least the second concatemer inside the cellular sample comprises: generating a plurality of second sequencing read products, and wherein sequences of the second sequencing read products are aligned with a second target reference sequence to confirm presence of the second target RNA in the sample.
[1513] While preferred embodiments of the present invention have been shown and described herein, it will be obvious to those skilled in the art that such embodiments are provided by way of example only. The breadth and scope of this disclosure should not be limited by any of the above-described exemplary embodiments, but should be defined only in accordance with the following claims and their equivalents. Numerous variations, changes, and substitutions will now occur to those skilled in the art without departing from the invention. It should be understood that various alternatives to the embodiments of the invention described herein may be employed in practicing the invention. It is intended that the following claims define the scope of the invention and that methods and structures within the scope of these claims and their equivalents be covered thereby.
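The claims that follow recite several geometric steps: increasing image resolution by a factor of 2 to 32 (including by up-sampling, claim 30), determining polonies and collapsing duplicate pixels of one polony to a single center (claims 38, 54 and 95), and extracting patches centered on each polony for base calling (claims 23 and 24). The following is a minimal sketch of those steps, not the source's method. It assumes plain nearest-neighbor up-sampling in place of the pre-trained neural network. It also assumes a greedy, brightest-first peak picker with a hypothetical `min_distance` radius as the duplicate-suppression rule. Function names and parameters are illustrative.

```python
import numpy as np


def upsample(img, factor):
    """Nearest-neighbor up-sampling of the last two (spatial) axes by an integer factor."""
    return img.repeat(factor, axis=-2).repeat(factor, axis=-1)


def find_polony_centers(img, threshold, min_distance):
    """Greedy peak picking with duplicate suppression on a 2D image.

    Pixels brighter than `threshold` are visited from brightest to dimmest.
    A pixel is kept as a polony center only if no already-kept center lies
    closer than `min_distance` pixels, so the several bright pixels of one
    polony collapse to a single center.
    """
    ys, xs = np.nonzero(img > threshold)
    vals = img[ys, xs].astype(float)  # float avoids wrap-around when negating unsigned ints
    order = np.argsort(-vals, kind="stable")
    centers = []
    for i in order:
        y, x = int(ys[i]), int(xs[i])
        if all((y - cy) ** 2 + (x - cx) ** 2 >= min_distance ** 2 for cy, cx in centers):
            centers.append((y, x))
    return centers


def extract_patches(stack, centers, size):
    """Cut a size x size patch around each center from every channel.

    stack: array of shape (channels, H, W). `size` should be odd so the
    polony sits exactly at the patch center; borders are zero-padded.
    Returns an array of shape (n_polonies, channels, size, size).
    """
    half = size // 2
    padded = np.pad(stack, ((0, 0), (half, half), (half, half)))  # constant zero padding
    patches = [padded[:, y:y + size, x:x + size] for y, x in centers]
    if not patches:
        return np.empty((0, stack.shape[0], size, size))
    return np.stack(patches)


if __name__ == "__main__":
    low = np.zeros((2, 8, 8))  # two color channels, low-resolution field of view
    low[0, 2, 2] = 10.0
    low[1, 5, 6] = 8.0
    hi = upsample(low, 4)      # shape (2, 32, 32)
    centers = find_polony_centers(hi.max(axis=0), threshold=1.0, min_distance=6)
    patches = extract_patches(hi, centers, size=9)
    print(centers)         # [(8, 8), (20, 24)]
    print(patches.shape)   # (2, 2, 9, 9)
```

With nearest-neighbor up-sampling, each polony becomes a flat block of equal pixels, so the chosen center is simply the first of its tied pixels. A learned super-resolution model, as the claims describe, or an added centroid step would place the center more precisely.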

Claims

WHAT IS CLAIMED IS:
1. A computer-implemented method comprising:
(i) generating, by a sequencing system, a first plurality of flow cell images of a sample immobilized on a support by conducting one or more cycles of sequencing reactions in one or more color channels, wherein the first plurality of flow cell images are acquired with a first resolution;
(ii) providing, by a processor or a first reconfigurable logic device, the first plurality of flow cell images as an input to a neural network, wherein the neural network is pre-trained using a training data set of training flow cell images;
(iii) predicting, by the first reconfigurable device or an integrated circuit, a second plurality of flow cell images using the neural network, wherein each of the second plurality of flow cell images is of a second resolution and corresponds to a corresponding image of the first plurality of flow cell images, and wherein the second resolution is at least 2 to 32 times greater than the first resolution in one or more spatial dimensions;
(iv) determining, by the processor, the first reconfigurable logic device, or the integrated circuit, polonies from the second plurality of flow cell images; and
(v) performing, by the processor, the first reconfigurable logic device, or the integrated circuit, a corresponding base calling for each of the determined polonies based on the second plurality of flow cell images.
2. A computer-implemented method for predicting base calls comprising:
(i) generating, by a sequencing system, a first plurality of flow cell images of a sample immobilized on a support by conducting one or more cycles of sequencing reactions in one or more color channels, the first plurality of flow cell images comprising a first resolution;
(ia) generating, by a processor, a first reconfigurable logic device, or an integrated circuit, a second plurality of flow cell images comprising a second resolution, wherein the second resolution is at least 2 to 32 times greater than the first resolution in one or more spatial dimensions;
(ii) providing, by the processor, the first reconfigurable logic device, or the integrated circuit, the second plurality of flow cell images as an input to a neural network; or providing, by the processor, the first reconfigurable logic device, or the integrated circuit, the second plurality of flow cell images to a polony map generation algorithm or a base calling algorithm;
(iii) predicting, by the first reconfigurable device or the integrated circuit, one or more base calls corresponding to one or more polonies of the second plurality of flow cell images using the neural network; or predicting, by the first reconfigurable device or the integrated circuit, one or more classifications corresponding to one or more pixels of the second plurality of flow cell images using the neural network.
3. The computer-implemented method of claim 2 further comprising:
(iia) determining, by the processor, the first reconfigurable device, or the integrated circuit, a polony map based on the second plurality of flow cell images; and
(iiia) determining, by the processor, the first reconfigurable logic device, or the integrated circuit, a corresponding location of the one or more predicted base calls or the one or more predicted classifications based on the polony map.
4. The computer-implemented method of claim 2, wherein the one or more pixels include at least one pixel that is not comprised in any polony of the polony map.
5. The computer-implemented method of claim 2, wherein the one or more pixels include at least one pixel that is not comprised in any polony in the polony map and at least one pixel that is comprised in at least one polony in the polony map.
6. The computer-implemented method of claim 2 further comprising:
(iv) in response to determining that a first pixel of the one or more pixels has a predicted classification that is different from a background classification, determining a first morphological feature, a first RNA or mRNA, or a first protein based on the one or more predicted classifications; and
(v) in response to determining that a second pixel of the one or more pixels has a predicted classification that is different from the background classification, determining a second morphological feature, a second RNA or mRNA, or a second protein based on the one or more predicted classifications.
7. The computer-implemented method of claim 3 further comprising: (iv) determining a location of one or more of: a first morphological feature, a first RNA or mRNA, and a first protein based on the corresponding location of the one or more predicted base calls or predicted classifications.
8. The computer-implemented method of claim 7 further comprising:
(v) determining a location of one or more of: a second morphological feature, a second RNA or mRNA, and a second protein based on the corresponding location of one or more second predicted base calls or predicted classifications.
9. The computer-implemented method of claim 8 further comprising:
(vi) spatially aligning the location of one or more of: a second morphological feature, a second RNA or mRNA, and a second protein with the location of one or more of: the first morphological feature, the first RNA or mRNA, and the first protein; and
(vii) determining a biological character of the sample immobilized on the support based on the spatial alignment.
10. The computer-implemented method of any one of claims 2-9, wherein the neural network is pre-trained using a training data set of training flow cell images.
11. The computer-implemented method of claim 10, wherein the neural network is pre-trained using the training data set of training flow cell images prior to the operation (iia).
12. The computer-implemented method of any one of the preceding claims, wherein the neural network is not trained or retrained after operation (iia) and prior to operation (iii).
13. The computer-implemented method of any one of the preceding claims, wherein the sample comprises concatemer molecules therewithin, wherein the first plurality of flow cell images are acquired at a z-stack of different z-locations with the first resolution along a z direction.
14. The computer-implemented method of any one of the preceding claims, wherein the first plurality of flow cell images are acquired from a single color channel.
15. The computer-implemented method of any one of the preceding claims, wherein the one or more cycles comprises a current cycle N, and the first plurality of flow cell images are acquired from at least one cycle prior to the current cycle N.
16. The computer-implemented method of any one of the preceding claims, wherein the one or more cycles comprises a plurality of cycles in a sequencing run.
17. The computer-implemented method of any one of the preceding claims, wherein (iia) determining, by the first reconfigurable device or the integrated circuit, the polony map based on the second plurality of flow cell images comprises: predicting, using the neural network and by the first reconfigurable device or the integrated circuit, the polony map based on the second plurality of flow cell images.
18. The computer-implemented method of any one of the preceding claims, wherein (iia) comprises: predicting, by the first reconfigurable device or the integrated circuit, a base call corresponding to each polony of the second plurality of flow cell images using the neural network at the second resolution or a third resolution; and determining the polony map based on the predicted base calls and a corresponding quality index of each predicted base call at the second or third resolution.
19. The computer-implemented method of any one of the preceding claims, wherein the third resolution is at least 2 to 32 times greater than the first or second resolution in one or more spatial dimensions.
20. The computer-implemented method of any one of the preceding claims, wherein the third resolution is greater than the first and second resolution in one or more spatial dimensions.
21. The computer-implemented method of any one of the preceding claims, wherein the polony map comprises a spatial coordinate for each of at least some of the polonies in the second plurality of flow cell images.
22. The computer-implemented method of any one of claims 2-21, wherein (ii) providing, by the processor or the first reconfigurable logic device, the second plurality of flow cell images as the input to the neural network comprises:
(ii) providing, by the processor or the first reconfigurable logic device, the second plurality of flow cell images as the input to the neural network without providing a polony map or locations of polonies in the second plurality of flow cell images as the input to the neural network.
23. The computer-implemented method of any one of claims 2-22, wherein (iii) predicting, by the first reconfigurable device or an integrated circuit, base calls corresponding to one or more polonies of the second plurality of flow cell images using the neural network, comprises: extracting a plurality of patches from the second plurality of flow cell images based on the polony map; providing an input to the neural network, the input comprising the plurality of patches, wherein each patch comprises one or more patch images from the one or more color channels, and wherein each patch comprises at least a portion of the second plurality of flow cell images; and predicting a plurality of base calls using the neural network and based on the input, wherein each base call corresponds to a corresponding patch.
24. The computer-implemented method of claim 23, wherein each corresponding patch comprises a polony located at or in close vicinity to a center of the corresponding patch.
25. The computer-implemented method of any one of claims 23-24, wherein each corresponding patch comprises 2 to 128 pixels along a spatial dimension.
26. The computer-implemented method of any one of claims 23-25, wherein the plurality of patches comprises 100 to 10⁶ patches.
27. The computer-implemented method of any one of claims 23-26, wherein at least two patches of the plurality of patches comprise at least partially overlapped patch images that comprise identical pixels.
28. The computer-implemented method of any one of claims 23-27, wherein each patch of the plurality of patches comprises at least partially overlapped pixels with another patch of the plurality of patches.
29. The computer-implemented method of any one of the preceding claims, wherein the one or more color channels comprise 2, 3, or 4 color channels.
30. The computer-implemented method of any one of claims 2-29, the method further comprising: up-sampling the first plurality of flow cell images to generate the second plurality of flow cell images.
31. The computer-implemented method of any one of claims 23-30, wherein each patch comprises one or more patch images from the one or more color channels and one or more cycles.
32. The computer-implemented method of any one of the preceding claims, wherein the one or more cycles comprise continuous cycles of a sequencing run.
33. The computer-implemented method of any one of claims 23-32, wherein predicting the plurality of base calls using the neural network and based on the input, wherein each base call corresponds to a corresponding patch comprises: predicting a probability map for each channel of the one or more color channels corresponding to the corresponding patch, wherein each probability map comprises probability values of a base calling for each pixel of the corresponding patch; and determining the base call of the corresponding patch based on the probability maps of the one or more channels.
34. The computer-implemented method of any one of claims 23-33, wherein predicting the plurality of base calls using the neural network and based on the input, wherein each base call corresponds to a corresponding patch comprises: generating a first single intensity for a first channel of the one or more color channels corresponding to the corresponding patch; and determining the base call of the corresponding patch based on the single intensity.
35. The computer-implemented method of any one of claims 23-34, wherein predicting the plurality of base calls using the neural network and based on the input, wherein each base call corresponds to a corresponding patch comprises: generating a first single intensity for a first channel of the one or more color channels corresponding to the corresponding patch; and determining the base call of the corresponding patch based on the single intensity.
36. The computer-implemented method of claim 35 further comprising: predicting a second single intensity for a second channel of the one or more color channels corresponding to the corresponding patch using a second neural network; and determining the base call of the corresponding patch based on at least the first single intensity and the second single intensity.
37. The computer-implemented method of claim 36 further comprising: predicting a second single intensity for a second channel of the one or more color channels corresponding to the corresponding patch using a second neural network; and predicting a third single intensity for a third channel of the one or more color channels corresponding to the corresponding patch using a third neural network; and determining the base call of the corresponding patch based on at least the first, second, and third single intensities.
38. The method of claim 2, wherein (iii) predicting, by the first reconfigurable device or the integrated circuit, one or more base calls corresponding to one or more polonies of the second plurality of flow cell images using the neural network comprises: determining two or more pixels of the second plurality of flow cell images as duplications of a single polony; and selecting one pixel of the two or more pixels as a center of the single polony.
39. The computer-implemented method of claim 1, wherein the sample comprises concatemer molecules therewithin, and wherein the first plurality of flow cell images are acquired at a z-stack of different z-locations.
40. A computer-implemented method comprising:
(i) generating, by a sequencing system, a first plurality of flow cell images of a sample immobilized on a support by conducting one or more cycles of sequencing reactions, wherein the first plurality of flow cell images are of a first resolution;
(ii) providing, by a processor or a first reconfigurable logic device, the first plurality of flow cell images as an input to a neural network, wherein the neural network is pre-trained using a training data set of training flow cell images and reference base calls of the training dataset;
(iii) predicting, by the first reconfigurable device or an integrated circuit, a second plurality of flow cell images using the neural network, wherein each of the second plurality of flow cell images is of a second resolution and corresponds to a corresponding image of the first plurality of flow cell images, and wherein the second resolution is at least 2 to 32 times greater than the first resolution in one or more spatial dimensions;
(iv) determining, by the processor, the first reconfigurable logic device, or the integrated circuit, polonies from the second plurality of flow cell images; and
(v) performing, by the processor, the first reconfigurable logic device, or the integrated circuit, a corresponding base calling for each of the determined polonies based on the second plurality of flow cell images.
41. The method of any one of the preceding claims, wherein the sample is a cellular sample comprising in situ cells, tissue, or both.
42. The method of any one of the preceding claims, wherein the neural network is a convolutional neural network.
43. The method of any one of the preceding claims, wherein the sample is a 3D sample, and the neural network is a 3D neural network.
44. The method of any one of the preceding claims, wherein the sample is a 2D sample and wherein the neural network is a 2D neural network.
45. The computer-implemented method of any one of the preceding claims, wherein the neural network is a convolutional neural network.
46. The computer-implemented method of any one of the preceding claims, wherein the neural network is pre-trained using one or more loss functions based on comparing training base calls of the training flow cell images to the reference base calls of the training flow cell images.
47. The computer-implemented method of any one of the preceding claims, wherein the training flow cell images are only from a single color channel.
48. The computer-implemented method of any one of the preceding claims, wherein each of the training flow cell images comprises flow cell images of a same field of view from a plurality of sequencing cycles stacked along a time dimension.
49. The computer-implemented method of any one of the preceding claims, wherein each of the training flow cell images comprises flow cell images of a same field of view from one or more cycles.
50. The computer-implemented method of any one of the preceding claims, wherein each of the training flow cell images comprises flow cell images of the sample at one or more z-levels.
51. The computer-implemented method of any one of the preceding claims, wherein (iii) predicting the second plurality of flow cell images using the neural network comprises predicting high resolution post-processing images of the first plurality of flow cell images, and wherein the processing comprises one or more of: noise reduction; background reduction; intensity offset correction; intensity normalization; color correction; phasing and/or dephasing; image registration; and deconvolution.
52. The computer-implemented method of any one of the preceding claims, wherein (v) performing the corresponding base calling for each of the determined polonies based on the second plurality of flow cell images lacks usage of a neural network.
53. The computer-implemented method of any one of the preceding claims, wherein (iv) determining the polonies from the second plurality of flow cell images lacks usage of a neural network.
54. The computer-implemented method of any one of the preceding claims, wherein (iv) determining the polonies from the second plurality of flow cell images comprises determining a location for a center of each of the polonies from the second plurality of flow cell images.
55. The computer-implemented method of any one of the preceding claims, wherein (iv) determining the polonies from the second plurality of flow cell images lacks usage of a neural network.
56. The computer-implemented method of any one of the preceding claims, wherein the neural network comprises a first image processing part and a second base calling part.
57. The computer-implemented method of any one of the preceding claims, wherein predicting, by the first reconfigurable device or the integrated circuit, the second plurality of flow cell images using the neural network comprises: generating output images from the first image processing part of the neural network as the second plurality of flow cell images without going through the second base calling part of the neural network.
58. The computer-implemented method of any one of the preceding claims, wherein the first image processing part comprises at least part of one or more of: an input layer, a convolutional layer, a pooling layer, an embedding layer, an output layer, and an encoder of the neural network, and wherein the second base calling part lacks any one of: a convolutional layer, a pooling layer, an embedding layer, an output layer, an encoder, and a decoder of the neural network.
59. The computer-implemented method of any one of the preceding claims, wherein the first image processing part comprises at least part of one or more of: an input layer, a convolutional layer, a pooling layer, an embedding layer, and an encoder of the neural network, and wherein the second base calling part comprises at least part of one or more of: an input layer, an output layer, a convolutional layer, a pooling layer, an embedding layer, an encoder, and a decoder of the neural network.
60. The computer-implemented method of any one of the preceding claims, wherein the processor comprises one or more of: a CPU, a GPU, a TPU, an NPU, an FPGA, and an AI chip.
61. The computer-implemented method of any one of the preceding claims, wherein the processor comprises one or more of: a CPU, an NPU, and an FPGA.
62. The computer-implemented method of any one of the preceding claims, wherein the processor lacks any NPU, FPGA, or AI chips.
63. The computer-implemented method of any one of the preceding claims, wherein the first reconfigurable logic device comprises one or more FPGA units.
64. The computer-implemented method of any one of the preceding claims, wherein the integrated circuit comprises one or more NPUs, AI chips, or both.
65. The computer-implemented method of any one of the preceding claims, wherein one or more of the processor, the first reconfigurable logic device, and the integrated circuit is comprised in the sequencing system within a single housing of the sequencing system.
66. The computer-implemented method of any one of the preceding claims, wherein the integrated circuit is in data communication with the first reconfigurable logic device.
67. The computer-implemented method of any one of the preceding claims, wherein the convolutional network comprises a U-Net.
68. The computer-implemented method of any one of the preceding claims, wherein the first convolution comprises a 3D convolution with a convolution kernel.
69. The sequencing system of any one of the preceding claims, wherein the first convolution comprises a 2D convolution with a convolution kernel.
70. The sequencing system of any one of the preceding claims, wherein the convolution kernel has at least three dimensions.
71. The sequencing system of any one of the preceding claims, wherein the neural network is a 3D convolutional neural network.
72. The sequencing system of any one of the preceding claims, wherein the neural network is a 2D convolutional neural network.
73. The computer-implemented method of any one of the preceding claims, wherein the first and second resolutions are in 3D.
74. The computer-implemented method of any one of the preceding claims, wherein the first plurality of flow cell images are from a single color channel.
75. The computer-implemented method of any one of the preceding claims, wherein (v) performing, by the processor, a corresponding base calling for each of the determined polonies based on the second plurality of flow cell images comprises: performing, by the processor, a corresponding base calling for each of the determined polonies based on a fourth plurality of flow cell images, wherein the fourth plurality of images are predicted using a second neural network based on a third plurality of flow cell images.
76. The computer-implemented method of any one of the preceding claims, wherein the third plurality of flow cell images are acquired from one or more color channels that are different from the single color channel, and wherein the third plurality of flow cell images comprises the first resolution.
77. The computer-implemented method of any one of the preceding claims, wherein the fourth plurality of flow cell images comprises the second resolution.
78. The computer-implemented method of any one of the preceding claims, wherein the first plurality of flow cell images are from one or more color channels.
79. The computer-implemented method of any one of the preceding claims, wherein the first plurality of flow cell images comprises: an unbalanced diversity of nucleotide bases of A, G, C and T/U among concatemer molecules immobilized on the support in one or more cycles.
80. The computer-implemented method of any one of the preceding claims, wherein two or more different concatemer molecules among the concatemer molecules have different insert sequences that correspond to different target RNA molecules or target cDNA molecules.
81. The computer-implemented method of any one of the preceding claims, wherein each location of the determined polonies corresponds to a location of the concatemer molecules.
82. The computer-implemented method of any one of the preceding claims, wherein the unbalanced diversity of nucleotide bases of A, G, C and T/U among the concatemer molecules comprises: a percentage of (1) a number of one or more types of nucleotide bases to (2) a total number of bases is less than 20%, 15%, 10%, or 5% in the one or more cycles.
83. The computer-implemented method of any one of the preceding claims, wherein the sample comprises overloaded concatemer molecules with a spatial density in a range of 10³ to 10¹⁰ per mm².
84. The computer-implemented method of any one of the preceding claims, wherein the first resolution is in a range of 0.1 µm to 5 µm.
85. The computer-implemented method of any one of the preceding claims, wherein the down-sampling factor is 2, 4, or 8.
86. The computer-implemented method of any one of the preceding claims, wherein one or more of operations (ii) to (v) are performed while a sequencing run is being performed.
87. The computer-implemented method of any one of the preceding claims, wherein the one or more cycles comprises a current cycle N.
88. The computer-implemented method of any one of the preceding claims, wherein N is in a range from 1 to 500.
89. The computer-implemented method of any one of the preceding claims, wherein one or more of operations (ii) to (v) are performed while the sequencing reactions in cycles subsequent to the current cycle N are yet to be performed or are currently being performed.
90. The computer-implemented method of any one of the preceding claims, wherein the training data set of flow cell images comprises z-stacks of flow cell images taken at different z-locations.
91. The computer-implemented method of any one of the preceding claims, wherein the second resolution is at least 4, 6, or 8 times greater than the first resolution in all three dimensions.
92. The computer-implemented method of any one of the preceding claims further comprising: registering the second plurality of flow cell images to a common coordinate system.
93. The computer-implemented method of any one of the preceding claims, wherein the first plurality of flow cell images are acquired from a single color channel of the sequencing system.
94. The computer-implemented method of any one of the preceding claims, wherein (iv) determining, by the processor, polonies from the second plurality of flow cell images comprises: generating a polony map comprising spatial locations of polonies based on the determined polonies in (iv).
95. The computer-implemented method of any one of the preceding claims, wherein generating the polony map comprising spatial locations of polonies based on the determined polonies in (iv) further comprises: deleting duplicate polonies from the determined polonies, wherein the duplicate polonies are out-of-focus.
96. The computer-implemented method of any one of the preceding claims, wherein determining, by the processor, polonies from the second plurality of flow cell images comprises: superimposing the second plurality of flow cell images with corresponding cell staining images; and generating the polony map by only including polonies that are within cell boundaries in the corresponding cell staining images.
97. The computer-implemented method of any one of the preceding claims, wherein the support comprises a glass or plastic substrate.
98. The computer-implemented method of any one of the preceding claims, wherein the support is comprised in a flow cell device.
99. The computer-implemented method of any one of the preceding claims further comprising: providing the sample harboring a plurality of RNA which comprises the first target RNA molecule and the second target RNA molecule.
100. The computer-implemented method of any one of the preceding claims further comprising: generating inside the sample a plurality of cDNA molecules which include a first target cDNA molecule that corresponds to the first target RNA molecule and a second target cDNA molecule that corresponds to the second target RNA molecule.
101. The computer-implemented method of any one of the preceding claims further comprising: contacting the plurality of cDNA molecules in the sample with a plurality of target-specific padlock probes which includes at least a first plurality of first target-specific padlock probes and a second plurality of second target-specific padlock probes.
102. The computer-implemented method of any one of the preceding claims further comprising: contacting the plurality of RNA molecules in the sample with a plurality of target-specific padlock probes which includes at least a first plurality of target-specific padlock probes and a second plurality of target-specific padlock probes.
103. The computer-implemented method of any one of the preceding claims, wherein individual padlock probes in the first plurality of first target-specific padlock probes comprise: first and second terminal regions, wherein the first terminal region selectively hybridizes to a first region of the first target cDNA molecule or the first target RNA molecule, and the second terminal region selectively hybridizes to a second region of the first target cDNA molecule or the first target RNA molecule.
104. The computer-implemented method of any one of the preceding claims, wherein the first target-specific padlock probe comprises a first target barcode sequence that corresponds to and uniquely identifies the first target cDNA sequence or the first target RNA sequence.
105. The computer-implemented method of any one of the preceding claims, wherein the first target-specific padlock probe comprises a first target barcode sequence that is located adjacent to one of the regions of the first target-specific padlock probe that selectively hybridizes to the first target cDNA molecule or the first target RNA sequence.
106. The computer-implemented method of any one of the preceding claims, wherein the first target-specific padlock probe comprises at least one universal adaptor sequence.
107. The computer-implemented method of any one of the preceding claims, wherein the first target-specific padlock probe comprises a universal primer binding site for a rolling circle amplification primer or a complementary sequence thereof.
108. The computer-implemented method of any one of the preceding claims, wherein the first target-specific padlock probe comprises a universal compaction oligonucleotide binding site or a complementary sequence thereof.
109. The computer-implemented method of any one of the preceding claims, wherein the plurality of nucleotide reagents comprises: multivalent molecules, nucleotides, nucleotide analogs, or their combinations.
110. The computer-implemented method of any one of the preceding claims, wherein individual nucleotides or nucleotide analogs are detectably labeled or non-labeled.
111. The computer-implemented method of any one of the preceding claims, wherein the detectably labeled individual nucleotides or nucleotide analogs comprise a different detectable color label that corresponds to each different type of nucleotide base of A, G, C, and T/U.
112. The computer-implemented method of any one of the preceding claims, wherein an individual multivalent molecule comprises a core attached to multiple nucleotide arms, and each arm of the individual multivalent molecule comprises the same type of nucleotide base.
113. The computer-implemented method of any one of the preceding claims, wherein generating the first plurality of flow cell images comprises: in each cycle, imaging, by an optical system, optical color signals emitted from the nucleotide reagents that are bound to the plurality of concatemer molecules.
114. The computer-implemented method of any one of the preceding claims, wherein the first plurality of flow cell images comprises optical color signals emitted from the nucleotide reagents that are bound to the plurality of concatemer molecules.
115. The computer-implemented method of any one of the preceding claims further comprising: removing a first sequencing read product from the first concatemer molecule and retaining the first concatemer molecule in the sample, and removing a second sequencing read product from the second concatemer molecule and retaining the second concatemer molecule in the sample.
116. The computer-implemented method of any one of the preceding claims further comprising: reiteratively sequencing the plurality of concatemers by repeating the following operations at least once: generating the first plurality of flow cell images of a sample immobilized on a support by conducting one or more cycles of sequencing reactions, thereby generating the first sequencing read product and the second sequencing read product, the sample comprising a plurality of concatemer molecules therewithin, wherein a first concatemer molecule of the plurality of concatemer molecules corresponds to a first target RNA molecule of the sample, and a second concatemer molecule of the plurality of concatemer molecules corresponds to a second target RNA molecule of the sample, wherein the first plurality of flow cell images; and removing a first sequencing read product from the first concatemer molecule and retaining the first concatemer molecule in the sample, and removing a second sequencing read product from the second concatemer molecule and retaining the second concatemer molecule in the sample.
117. The computer-implemented method of any one of the preceding claims, wherein the first sequencing read product comprises some or all of: a first target barcode sequence in one or more tandem units of the first concatemer molecule; a first insert sequence in one or more tandem units of the first concatemer molecule; or their combinations.
118. The computer-implemented method of any one of the preceding claims further comprising: confirming presence of the first target RNA molecule, the second target RNA molecule, or both molecules in the sample based on the performed base calling of the second plurality of flow cell images at the base calling locations in the base calling template.
119. The computer-implemented method of any one of the preceding claims further comprising: generating, by the sequencing system, the second plurality of flow cell images of the sample immobilized on the support by conducting subsequent cycles of sequencing reactions after the one or more cycles.
120. The computer-implemented method of any one of the preceding claims, wherein generating the first plurality of flow cell images of the sample immobilized on the support comprises: sequencing at least the first concatemer inside the sample under a condition that inhibits sequencing the second concatemer.
121. The computer-implemented method of any one of the preceding claims, wherein sequencing at least the first concatemer inside the sample comprises: generating a plurality of first sequencing read products, and wherein the sequences of the first sequencing read products are aligned with a first target reference sequence to confirm presence of the first target RNA in the sample.
122. The computer-implemented method of any one of the preceding claims, wherein generating the first plurality of flow cell images of the sample immobilized on the support comprises: sequencing at least the second concatemer inside the sample under a condition that inhibits sequencing the first concatemer.
123. The computer-implemented method of any one of the preceding claims, wherein sequencing at least the second concatemer inside the cellular sample comprises: generating a plurality of second sequencing read products, and wherein sequences of the second sequencing read products are aligned with a second target reference sequence to confirm presence of the second target RNA in the sample.
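Claim 89 has operations (ii) to (v) running while later sequencing cycles are still pending or in progress. One way to picture that overlap is a producer/consumer pipeline. The following is a minimal sketch only, assuming hypothetical `acquire` and `analyze` callbacks that stand in for imaging and for operations (ii) to (v); neither callback is specified by the claims.

```python
import queue
import threading


def run_pipeline(num_cycles, acquire, analyze):
    """Overlap analysis of cycle N with acquisition of cycles N+1, N+2, ...

    `acquire(cycle)` and `analyze(cycle, images)` are hypothetical callbacks
    standing in for the imaging step and for operations (ii) to (v).
    """
    images_q = queue.Queue()
    results = {}

    def producer():
        # Imaging side: push each cycle's images as soon as they exist.
        for cycle in range(num_cycles):
            images_q.put((cycle, acquire(cycle)))
        images_q.put(None)  # sentinel: no more cycles will arrive

    def consumer():
        # Analysis side: process cycle N while the producer moves on to N+1.
        while True:
            item = images_q.get()
            if item is None:
                break
            cycle, images = item
            results[cycle] = analyze(cycle, images)

    threads = [threading.Thread(target=producer), threading.Thread(target=consumer)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

Because the queue decouples the two threads, analysis of an earlier cycle can proceed while a later cycle is being imaged; a real system would likely also bound the queue size to cap memory use.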
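Claim 91 requires the second resolution to be at least 4, 6, or 8 times the first resolution in each of the three dimensions, so the number of voxels grows with the cube of that factor (64, 216, or 512 times). The sketch below uses plain nearest-neighbour repetition only to make that grid arithmetic concrete. It is an illustrative assumption and is not presented as the method by which the claimed higher-resolution images are produced.

```python
import numpy as np


def upsample_nearest_3d(volume, factor):
    """Nearest-neighbour upsampling of a (z, y, x) volume by an integer factor per axis.

    Each input voxel becomes a factor x factor x factor block of identical values.
    """
    out = volume
    for axis in range(3):
        out = np.repeat(out, factor, axis=axis)
    return out
```

With `factor=4`, a 2 x 2 x 2 volume becomes 8 x 8 x 8, that is, 4**3 = 64 times as many voxels.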
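Claim 92 registers the second plurality of flow cell images to a common coordinate system but does not fix a registration method. A minimal sketch, assuming pure integer translations between equally sized 2D images, uses phase correlation; sub-pixel shifts, rotation, and alignment along z are outside the scope of this example.

```python
import numpy as np


def estimate_shift(reference, moving):
    """Estimate the integer (dy, dx) translation of `moving` relative to `reference`
    by phase correlation (translation-only assumption)."""
    cross = np.fft.fft2(moving) * np.conj(np.fft.fft2(reference))
    cross /= np.abs(cross) + 1e-12  # keep only the phase difference
    corr = np.abs(np.fft.ifft2(cross))  # sharp peak at the shift
    dy, dx = np.unravel_index(np.argmax(corr), corr.shape)
    h, w = reference.shape
    # Peak indices past the midpoint correspond to negative shifts (FFT wrap-around).
    if dy > h // 2:
        dy -= h
    if dx > w // 2:
        dx -= w
    return int(dy), int(dx)


def register(reference, moving):
    """Shift `moving` back onto the reference image's coordinate grid."""
    dy, dx = estimate_shift(reference, moving)
    return np.roll(moving, (-dy, -dx), axis=(0, 1))
```

Each image in a cycle series could be registered to one fixed reference image in this way, so all of them share that reference's coordinate system.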
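Claim 95 deletes out-of-focus duplicate polonies when generating the polony map. In a z-stack the same polony can appear on neighbouring planes, so one plausible rule, assumed here for illustration, treats detections on different planes within a small lateral radius as a single polony and keeps the best-focused copy. The `focus` score and the `radius` default are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class Detection:
    x: float
    y: float
    z: int        # index of the z-plane the polony was detected in
    focus: float  # hypothetical sharpness score; higher means better focused


def remove_out_of_focus_duplicates(detections, radius=1.5):
    """Greedy de-duplication: visit detections from sharpest to blurriest and drop
    any that lies within `radius` (in x, y) of an already kept detection on a
    different z-plane. Nearby detections on the same plane are kept as distinct."""
    kept = []
    for det in sorted(detections, key=lambda d: d.focus, reverse=True):
        is_dup = any(
            k.z != det.z and math.hypot(k.x - det.x, k.y - det.y) <= radius
            for k in kept
        )
        if not is_dup:
            kept.append(det)
    return kept
```

The surviving detections, as (x, y, z) positions, then form a duplicate-free polony map under these assumptions.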
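Claim 96 keeps only polonies that fall within cell boundaries after superimposing the flow cell images on cell staining images. The following is a minimal sketch, assuming the staining image has already been registered and segmented into an integer label mask (0 for background, a positive ID per cell) and that polony positions have been rounded to pixel indices.

```python
def filter_by_cell_mask(polonies, cell_mask):
    """Keep polonies whose (row, col) lies inside a segmented cell.

    `cell_mask[r][c]` is 0 for background and a positive cell ID inside a cell
    (hypothetical segmentation output). Returns (row, col, cell_id) tuples.
    """
    n_rows, n_cols = len(cell_mask), len(cell_mask[0])
    kept = []
    for r, c in polonies:
        inside_image = 0 <= r < n_rows and 0 <= c < n_cols
        if inside_image and cell_mask[r][c] > 0:
            kept.append((r, c, cell_mask[r][c]))
    return kept
```

Retaining the cell ID alongside each polony is a design choice that would also allow per-cell counting downstream.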
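Claims 103 to 107 describe a padlock probe whose two terminal regions hybridize to the target, with a barcode adjacent to one of those regions and a universal primer binding site for rolling circle amplification. The sketch below assumes a DNA (cDNA) target, arms of equal length on either side of a chosen junction, and the hypothetical backbone order 5' arm, barcode, primer site, 3' arm. It checks sequence complementarity only, not melting temperature or specificity.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Reverse complement of a DNA sequence (A/C/G/T only)."""
    return seq.translate(COMPLEMENT)[::-1]


def design_padlock(target, junction, arm_len, barcode, primer_site):
    """Return a 5'->3' padlock probe for `target` (hypothetical layout).

    The probe's 5' arm pairs with the region just upstream of `junction` and its
    3' arm with the region just downstream, so once both arms hybridize the
    probe's 3' end sits next to its 5' end, ready for ligation into a circle.
    """
    if junction < arm_len or junction + arm_len > len(target):
        raise ValueError("arms extend past the target sequence")
    upstream = target[junction - arm_len:junction]
    downstream = target[junction:junction + arm_len]
    return revcomp(upstream) + barcode + primer_site + revcomp(downstream)
```

Reverse-complementing both arms back and joining them should recover the contiguous target stretch around the junction, which is a quick self-check on arm orientation.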
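Claim 111 assigns a distinct detectable color label to each of A, G, C, and T/U. With one color channel per base, the simplest per-cycle call picks the brightest channel. This is a baseline sketch under an assumed channel order, not the base calling performed in the application, and the confidence proxy is likewise hypothetical.

```python
BASES = ("A", "C", "G", "T")  # assumed channel order, one dye per base


def call_bases(intensities):
    """Call one base per polony as the brightest of four color channels.

    Returns (base, proxy) pairs, where the proxy is the brightest channel's share
    of the total intensity, a simple hypothetical confidence measure.
    """
    calls = []
    for channels in intensities:
        if len(channels) != len(BASES):
            raise ValueError("expected one intensity per base")
        best = max(range(len(BASES)), key=lambda i: channels[i])
        total = sum(channels)
        calls.append((BASES[best], channels[best] / total if total > 0 else 0.0))
    return calls
```

Cross-talk between dyes and phasing effects are ignored here; they would shift which channel appears brightest in practice.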
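Claims 121 and 123 align sequencing read products with a target reference sequence to confirm that the target RNA is present in the sample. As a toy illustration, assuming short reads, ungapped placement with a mismatch tolerance, and a hypothetical minimum read count, the check might look like the following; production pipelines would typically rely on a dedicated aligner instead.

```python
def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def best_ungapped_hit(read, reference, max_mismatches):
    """Offset of the best ungapped placement of `read` on `reference`, or None
    if every placement has more than `max_mismatches` mismatches."""
    best = None
    for offset in range(len(reference) - len(read) + 1):
        mm = hamming(read, reference[offset:offset + len(read)])
        if mm <= max_mismatches and (best is None or mm < best[1]):
            best = (offset, mm)
    return None if best is None else best[0]


def target_present(reads, reference, max_mismatches=1, min_reads=3):
    """Call the target present if at least `min_reads` reads align to it."""
    hits = sum(
        best_ungapped_hit(r, reference, max_mismatches) is not None for r in reads
    )
    return hits >= min_reads
```

Tightening `max_mismatches` or raising `min_reads` trades sensitivity for stringency in the presence call.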
PCT/US2025/014022 2024-02-02 2025-01-31 Three-dimensional base calling in next generation sequencing analysis Pending WO2025166157A1 (en)

Applications Claiming Priority (12)

Application Number Priority Date Filing Date Title
US202463549327P 2024-02-02 2024-02-02
US202463549333P 2024-02-02 2024-02-02
US63/549,333 2024-02-02
US63/549,327 2024-02-02
US202463570038P 2024-03-26 2024-03-26
US63/570,038 2024-03-26
US202463661332P 2024-06-18 2024-06-18
US63/661,332 2024-06-18
US202463724712P 2024-11-25 2024-11-25
US63/724,712 2024-11-25
US202463736743P 2024-12-20 2024-12-20
US63/736,743 2024-12-20

Publications (1)

Publication Number Publication Date
WO2025166157A1 (en) 2025-08-07

Family

ID=96591469

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2025/014022 Pending WO2025166157A1 (en) 2024-02-02 2025-01-31 Three-dimensional base calling in next generation sequencing analysis

Country Status (1)

Country Link
WO (1) WO2025166157A1 (en)

Legal Events

Date Code Title Description
121 EP: the EPO has been informed by WIPO that EP was designated in this application

Ref document number: 25749461

Country of ref document: EP

Kind code of ref document: A1