WO2025240924A1 - Blind equalization systems for base calling applications - Google Patents
Blind equalization systems for base calling applications
- Publication number
- WO2025240924A1 (PCT/US2025/029859)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- values
- estimated
- equalizer
- oligonucleotides
- cluster
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B30/00—ICT specially adapted for sequence analysis involving nucleotides or amino acids
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B40/00—ICT specially adapted for biostatistics; ICT specially adapted for bioinformatics-related machine learning or data mining, e.g. knowledge discovery or pattern finding
- G16B40/10—Signal processing, e.g. from mass spectrometry [MS] or from PCR
Definitions
- existing sequencing systems determine individual nucleotide bases of nucleic-acid sequences by using conventional Sanger sequencing or by using sequencing-by-synthesis (SBS).
- existing sequencing systems can monitor millions to billions of nucleic-acid polymers being synthesized in parallel to detect more accurate nucleobase calls.
- a camera in SBS platforms can capture images of irradiated fluorescent tags from nucleobases incorporated into such synthesized nucleic-acid sequences often grouped into clusters of oligonucleotides (e.g., clusters within nanowells of a flow cell).
- a sequencing device uses specialized software to determine nucleobases that were detected in a given image based on the light signal captured in the image data.
- existing sequencing systems can determine the sequence of nucleobases present in clusters and determine nucleotide reads for the samples.
- some existing sequencing systems utilize an equalizer to process and analyze received images. For example, some existing systems convert energy depicted in images of light signals of oligonucleotide clusters into intensity values by applying an equalizer to the images. Some existing systems generate intensity values for oligonucleotide clusters by applying coefficients or weights associated with the equalizer to pixels in the images depicting energy intensities from the respective oligonucleotide cluster. By iteratively training the equalizer from sequencing cycle to sequencing cycle on predicted base call data and batch updating of coefficients, existing systems commonly determine which coefficients to apply to image pixels.
- the base call training data is incorrect or contains mistakes (e.g., includes low resolution images and/or clusters of oligonucleotides corrupted by high polyclonality)
- some existing sequencing systems e.g., equalizers
- training development platforms e.g., sequencing instruments under development or beta testing
- ill-maintained systems because it is difficult to benchmark the performance of such development platforms.
- the accurate coefficients for the equalizer are unknown.
- the disclosed systems can quickly and accurately determine equalizer coefficients for an equalizer based on estimated point-spread-function values and estimated noise values derived from expected response signals from oligonucleotide clusters.
- the disclosed systems can receive signal values for expected response signals from one or more clusters of oligonucleotides incorporating labeled nucleobases. Based on the signal values, the disclosed systems can determine estimated point-spread-function values for one or more such clusters of oligonucleotides and estimated noise values within a channel.
- the disclosed systems can determine equalizer coefficients that compensate for the estimated point-spread-function values and the estimated noise values.
- the disclosed systems can further determine a base call for one or more such clusters of oligonucleotides by utilizing the equalizer coefficients.
- the disclosed systems can utilize such equalizer coefficients for a variety of basecalling applications described further below.
- the disclosed systems can more accurately determine response signals and their corresponding nucleobase calls for a given sequencing cycle by (i) initializing an equalizer with accurate equalizer coefficients and without batch updating coefficients based on predicted base calls and (ii) determining nucleobase calls for clusters of oligonucleotides based on corrected signal values adjusted in part by the equalizer coefficients.
- FIG. 1 illustrates an environment in which a blind-equalizer sequencing system can operate in accordance with one or more embodiments of the present disclosure.
- FIG. 2 illustrates an overview diagram of the blind-equalizer sequencing system determining a base call for one or more clusters by determining and utilizing equalizer coefficients in accordance with one or more embodiments of the present disclosure.
- FIGS. 3A-3B illustrate the blind-equalizer sequencing system generating an image of a region of a nucleotide-sample slide and the blind-equalizer sequencing system depicting response signals based on fluorescent responses in different channels in accordance with one or more embodiments of the present disclosure.
- FIG. 4 illustrates a blind equalizer measuring signal values for a pixel depicting the cluster of oligonucleotides in accordance with one or more embodiments.
- FIG. 5 illustrates the blind-equalizer sequencing system determining equalizer coefficients for one or more clusters of oligonucleotides in accordance with one or more embodiments of the present disclosure.
- FIG. 6 illustrates the blind-equalizer sequencing system converting an image depicting signals from oligonucleotide clusters within a region of a nucleotide-sample slide from a frequency domain to a spatial domain and further determining estimated point-spread-function values based on an image representation in the spatial domain in accordance with one or more embodiments of the present disclosure.
- FIG. 7 illustrates the blind-equalizer sequencing system applying an image matrix comprising equalizer coefficients to images depicting signals in one or more channels from oligonucleotide clusters in accordance with one or more embodiments of the present disclosure.
- FIG. 8 illustrates the blind-equalizer sequencing system applying a distribution function (e.g., Dirac delta function) to one or more signal values of a target cluster of oligonucleotides and signal values of neighboring clusters of oligonucleotides within a region of a nucleotide-sample slide in accordance with one or more embodiments of the present disclosure.
- FIG. 9 illustrates the improved performance of the blind-equalizer sequencing system in terms of base-call-quality scores for nucleobase calls relative to a baseline sequencing system in accordance with one or more embodiments of the present disclosure.
- FIGS. 10A-10B illustrate a series of acts for determining a base call for one or more clusters of oligonucleotides using equalizer coefficients in accordance with one or more embodiments of the present disclosure.
- FIG. 11 illustrates a block diagram of an example computing device in accordance with one or more embodiments of the present disclosure.
- the disclosure describes one or more embodiments of a blind-equalizer sequencing system that quickly determines equalizer coefficients for an equalizer based on estimated point-spread-function values and estimated noise values derived from expected response signals of oligonucleotide clusters, without relying on predicted base calls to initially adjust the equalizer coefficients. By determining and utilizing the equalizer coefficients, the blind-equalizer sequencing system can more precisely initialize the equalizer and more accurately determine nucleobase calls. In some implementations, for instance, the blind-equalizer sequencing system receives signal values for an expected response signal in a sequencing cycle from one or more clusters of oligonucleotides.
- the blind-equalizer sequencing system determines estimated point-spread-function values corresponding to the expected response signal of one or more such clusters of oligonucleotides. In the same such sequencing cycle, the blind-equalizer sequencing system can determine estimated noise values for a given channel. By combining the estimated point-spread-function values and the estimated noise values, the blind-equalizer sequencing system can determine the equalizer coefficients. Such equalizer coefficients compensate for the estimated point-spread-function values and the estimated noise values with respect to the expected response signal from one or more such clusters of oligonucleotides.
- the blind-equalizer sequencing system compensates for the estimated point-spread-function values and the estimated noise values by (i) rectifying the effects of capturing the expected response signal with an imaging device and (ii) accounting for noise in the blind-equalizer sequencing system.
- the blind-equalizer sequencing system can utilize the equalizer coefficients to determine a base call accurately and quickly for one or more clusters of oligonucleotides.
- the blind-equalizer sequencing system can identify or receive, for a sequencing cycle, signal values (e.g., pixel intensity, wavelength, and/or brightness values) for an expected response signal emitted by at least a cluster of oligonucleotides within a region of a nucleotide-sample slide (e.g., flow cell).
- an expected or ideal response signal represents a response signal emitted from a cluster of oligonucleotides before an imaging device disperses energy of the expected response signal through an act of capturing an image of the nucleotide-sample-slide region to which the cluster is immobilized.
- the blind-equalizer sequencing system can receive the signal values by determining, for a given channel, signal values that would accurately represent pixel intensity or wavelengths within one or more images of the expected response signal from an oligonucleotide cluster in a given channel — before image capture by an imaging device disperses energy of the expected response signal.
- the blind-equalizer sequencing system can determine estimated point-spread-function values within a channel. As discussed in more detail below, capturing an image of the expected response signal for the cluster of oligonucleotides with an imaging device distorts the expected response signal by dispersing the energy or light of the expected response signal.
- the blind-equalizer sequencing system can represent the dispersed light with a point-spread function (PSF).
- the blind-equalizer sequencing system captures an image of the expected response signal
- the blind-equalizer sequencing system also typically captures the noise in the channel.
- the blind-equalizer sequencing system depicts the signal values of the expected response signal in a captured image.
- the image can include the point-spread function of the expected response signal and noise within the channel.
- the blind-equalizer sequencing system can determine the estimated point-spread-function values by converting the signal values from a spatial domain to a frequency domain and processing the signal values in the frequency domain while enforcing certain constraints on the signal values in the frequency domain.
- the blind-equalizer sequencing system can utilize physical characteristics of the blind-equalizer sequencing system to simplify how the blind-equalizer sequencing system determines estimated point-spread-function values.
- the imaging device can represent a minimum response channel where the response signals are a set of real coefficients limited to a specified area (e.g., a number of pixels).
- the blind-equalizer sequencing system can convert the signal values from a spatial domain into a frequency domain and measure the average energy (e.g., power spectral density) of the signal response in the frequency domain. Subsequently, in some cases, the blind-equalizer sequencing system can utilize the power spectral density along with Hermitian symmetry to generate real, two-dimensional, finite, symmetric estimated point-spread-function values.
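The power-spectral-density step described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the disclosed implementation: it assumes the underlying sources are spatially uncorrelated (so their spectrum is flat), that the noise floor is flat with a known variance, and that the point-spread function has a real, nonnegative spectrum (so a zero-phase inverse transform recovers it). The function name `estimate_psf_from_psd` and its parameters are hypothetical. For clusters on a regular pitch, the harmonics of the grid would need additional handling that is not shown here.

```python
import numpy as np

def estimate_psf_from_psd(images, noise_var=0.0, support=5):
    """Estimate a real, symmetric, finite PSF from the average power spectral density.

    Hypothetical helper: assumes spatially uncorrelated sources, a flat noise floor,
    and a PSF whose spectrum is real and nonnegative.
    """
    images = np.asarray(images, dtype=float)
    # Remove each image's mean so the DC term does not dominate the spectrum.
    centered = images - images.mean(axis=(1, 2), keepdims=True)
    n_pixels = centered.shape[1] * centered.shape[2]
    # Average periodogram over all images: an estimate of the power spectral density.
    psd = np.mean(np.abs(np.fft.fft2(centered)) ** 2, axis=0) / n_pixels
    # The DC bin was zeroed by mean removal; refill it from its nearest neighbors.
    psd[0, 0] = 0.25 * (psd[0, 1] + psd[0, -1] + psd[1, 0] + psd[-1, 0])
    # Subtract the flat noise floor, clip at zero, and keep only the magnitude (zero phase).
    magnitude = np.sqrt(np.clip(psd - noise_var, 0.0, None))
    # A real, even spectrum (Hermitian symmetry) inverts to a real, symmetric kernel.
    kernel = np.fft.fftshift(np.real(np.fft.ifft2(magnitude)))
    c0, c1 = kernel.shape[0] // 2, kernel.shape[1] // 2
    half = support // 2
    # Keep a finite support around the center and normalize to unit sum.
    psf = kernel[c0 - half:c0 + half + 1, c1 - half:c1 + half + 1]
    return psf / psf.sum()
```

Because the periodogram discards phase, this approach can only recover a point-spread function that is symmetric about its center, which matches the real, symmetric constraint described in the text.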
- the blind-equalizer sequencing system can determine estimated noise values within the channel. As discussed in more detail below, sequencing devices have varying levels of noise and different sources of noise in the sequencing devices. Moreover, the blind-equalizer sequencing system generates equalizer coefficients that account for noise in the sequencing device because — without accounting for such noise — an equalizer will generate equalizer coefficients based on inaccurate signal-to-noise ratios. In some cases, the blind-equalizer sequencing system can determine the estimated noise values by measuring noise in the channel where a response signal is not present. In certain implementations, the blind-equalizer sequencing system can apply independent identically distributed (IID) Gaussian noise for each cluster of oligonucleotides within the region of the nucleotide-sample slide.
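Measuring noise where no response signal is present can be sketched as below. This is an illustrative sketch only: the boolean mask marking signal-free pixels is assumed to be available (for example, from the known layout of the nucleotide-sample slide), the noise is assumed to be IID Gaussian, and the robust median-based estimator is a choice made for this example rather than something stated in the source. The name `estimate_noise_variance` is hypothetical.

```python
import numpy as np

def estimate_noise_variance(image, signal_free_mask):
    """Estimate the noise variance from pixels where no response signal is present.

    Hypothetical helper: `signal_free_mask` is a boolean array of the same shape as
    `image`, True where no cluster contributes light.
    """
    background = np.asarray(image, dtype=float)[signal_free_mask]
    if background.size < 2:
        raise ValueError("need at least two signal-free pixels")
    # Median and median absolute deviation are robust to a few stray bright pixels.
    median = np.median(background)
    mad = np.median(np.abs(background - median))
    sigma = 1.4826 * mad  # scale factor that converts MAD to sigma for Gaussian noise
    return sigma ** 2
```

A plain sample variance (`background.var()`) would also work when the background is clean; the median-based version is simply less sensitive to outliers.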
- the blind-equalizer sequencing system can determine equalizer coefficients by combining the estimated point-spread-function values and the estimated noise values.
- the blind-equalizer sequencing system can determine equalizer coefficients that compensate for the estimated point-spread-function values and the estimated noise values with respect to the estimated response signal. For instance, during a sequencing cycle in which an expected response signal interacts with a camera or other imaging device capturing an image in a channel, the imaging device disperses energy from the expected response signal.
- the blind-equalizer sequencing system utilizes an equalizer to generate equalizer coefficients that mitigate or reverse the dispersion effects that response signals from oligonucleotide clusters experience in a given channel. More specifically, the equalizer mitigates or reverses the effects of an imaging device dispersing the energy of the expected response signal by inverting and concentrating the energy from the expected response signal.
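One standard way to combine a point-spread-function estimate with a noise estimate into inverting coefficients is a Wiener (minimum-mean-square-error) filter. The source does not state that this specific filter is used, so the sketch below should be read as an example of the general idea under that assumption. The function name `wiener_equalizer`, the `signal_var` parameter, the FFT grid size, and the number of taps are all hypothetical.

```python
import numpy as np

def wiener_equalizer(psf, noise_var, signal_var, shape=(32, 32), taps=9):
    """Return a taps x taps spatial kernel approximating the MMSE inverse of `psf`.

    Hypothetical helper: a Wiener filter is assumed here as the way to combine the
    PSF estimate with the noise estimate.
    """
    k = psf.shape[0]
    padded = np.zeros(shape)
    padded[:k, :k] = psf
    # Move the PSF center to index (0, 0) so the filter introduces no spatial shift.
    padded = np.roll(padded, (-(k // 2), -(k // 2)), axis=(0, 1))
    H = np.fft.fft2(padded)
    # Invert the blur where the signal dominates; back off where noise dominates.
    W = np.conj(H) / (np.abs(H) ** 2 + noise_var / signal_var)
    w = np.fft.fftshift(np.real(np.fft.ifft2(W)))
    c0, c1 = shape[0] // 2, shape[1] // 2
    half = taps // 2
    # Truncate to a finite window of coefficients around the center.
    return w[c0 - half:c0 + half + 1, c1 - half:c1 + half + 1]
```

With a small `noise_var / signal_var` ratio the kernel approaches a pure inverse of the blur and concentrates the dispersed energy strongly; a larger ratio yields a gentler kernel that amplifies less noise.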
- the blind-equalizer sequencing system can determine a nucleobase call for one or more clusters. For example, in a same sequencing cycle or subsequent sequencing cycle, the blind-equalizer sequencing system can apply the equalizer coefficients to one or more signal values of an expected response signal from a cluster of oligonucleotides and determine accurate intensity values for the cluster of oligonucleotides. Based on the more accurate intensity values for the estimated response signal, the blind-equalizer sequencing system determines a nucleobase call for the cluster of oligonucleotides.
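Applying coefficients to pixels and then calling a base can be sketched as below for a two-channel case. This is a simplified illustration: the ON/OFF-to-base table `BASE_BY_STATE` is hypothetical (the actual dye-to-base mapping depends on the chemistry), the fixed threshold stands in for whatever classifier a real base caller uses, and clusters are assumed to sit far enough from the image border for a full patch to exist.

```python
import numpy as np

# Hypothetical two-channel ON/OFF code; the real dye-to-base mapping is chemistry-specific.
BASE_BY_STATE = {(1, 1): "A", (1, 0): "C", (0, 1): "T", (0, 0): "G"}

def equalized_intensity(image, coeffs, row, col):
    """Weighted sum of the pixel patch centered on a cluster location."""
    half = coeffs.shape[0] // 2
    patch = image[row - half:row + half + 1, col - half:col + half + 1]
    # Correlation form; for a symmetric kernel this equals convolution.
    return float(np.sum(patch * coeffs))

def call_base(images, coeffs_per_channel, row, col, threshold=0.5):
    """Threshold the equalized intensity in each channel and look up the base."""
    state = tuple(
        int(equalized_intensity(img, w, row, col) > threshold)
        for img, w in zip(images, coeffs_per_channel)
    )
    return BASE_BY_STATE[state]
```

Each channel gets its own coefficient matrix, which mirrors the per-channel equalizers described elsewhere in the text.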
- the blind-equalizer sequencing system provides several technical advantages over existing sequencing systems that primarily or exclusively train an equalizer across sequencing cycles.
- the blind-equalizer sequencing system can improve the accuracy of nucleobase calling, increase the efficiency of nucleobase calling, and improve the flexibility of sequencing systems.
- the blind-equalizer sequencing system can improve the accuracy of a specialized or special-purpose computer — that is, a sequencing device — determining base calls for nucleobases incorporated into oligonucleotide clusters by estimating equalizer coefficients in a feedforward manner.
- the blind-equalizer sequencing system can utilize the properties of a physical sequencing device and a particular channel to determine accurate equalizer coefficients. For example, as discussed below with respect to FIGS. 7-8, the blind-equalizer sequencing system can account for a pitch and/or pattern of the nucleotide-sample slide and pixels of the image to generate the equalizer coefficients.
- the blind-equalizer sequencing system can determine equalizer coefficients that are blind to (or not directly dependent on) the characteristics (e.g., signal intensity, noise, etc.) of the base calls and/or the predicted base calls for a given sequencing cycle. Consequently, the blind-equalizer sequencing system can use equalizer coefficients that are not compromised by inaccurate base calls to measure more accurate and/or purer response signals for clusters of oligonucleotides. Based on the more accurate or purer cluster signal, the blind-equalizer sequencing system can likewise determine more accurate nucleobase calls for nucleobases incorporated by one or more clusters of oligonucleotides during a biochemical reaction.
- the blind-equalizer sequencing system can improve the computational and reagent-use efficiency of base calling. For example, the blind-equalizer sequencing system can accurately determine equalizer coefficients without going through several iterations of updating the equalizer coefficients, as existing sequencing systems do, based on predicted base calls in a series of sequencing cycles. As mentioned above, some existing sequencing systems rely on inaccurate base calls when determining coefficients for the equalizer. To compensate for relying on inaccurate base calls, such existing systems further process and update the coefficients in additional sequencing cycles or sequencing runs.
- the blind-equalizer sequencing system can quickly determine accurate equalizer coefficients with low error rates — without going through several iterations to initialize relatively more accurate equalizer coefficients.
- the blind-equalizer sequencing system can determine equalizer coefficients based on attributes of the sequencing system.
- the blind-equalizer sequencing system improves the flexibility of sequencing systems. For instance, the blind-equalizer sequencing system provides an instrument-agnostic method for determining equalizer coefficients that is blind to (or not dependent on) specific base calls or actual response signals of oligonucleotide clusters in a specific sequencing device. By determining signal values for expected response signals of oligonucleotide clusters, and estimating point-spread-function values and noise values from such expected response signals, the blind-equalizer sequencing system can determine equalizer coefficients that are agnostic to a given sequencing device on which sequencing and base calling occurs.
- the blind-equalizer sequencing system can determine accurate equalizer coefficients that initialize point-spread-function and noise compensation of an equalizer on various sequencing devices.
- nucleotide-sample slide refers to a plate or substrate, such as a flow cell, comprising oligonucleotides for sequencing nucleotide sequences from genomic samples or other sample nucleic-acid polymers.
- a nucleotide-sample slide can refer to a substrate containing fluidic channels through which reagents and buffers can travel as part of sequencing.
- a flow cell may comprise small fluidic channels and oligonucleotide samples that can be bound to adapter sequences on the substrate.
- a nucleotide-sample slide can be an open substrate with one or more regions for oligonucleotide samples to be analyzed and the oligonucleotide samples may be positioned using charged pads or other means.
- the nucleotide-sample slide can be a membrane having a nanopore through which one or more oligonucleotide samples may pass.
- a flow cell or other nucleotide-sample slide can (i) include a device having a lid extending over a reaction structure to form a flow channel therebetween that is in communication with a plurality of reaction sites of the reaction structure and (ii) include a detection device that is configured to detect designated reactions that occur at or proximate to the reaction sites.
- a flow cell or other nucleotide-sample slide may include a solid-state light detection or “imaging” device, such as a Charge-Coupled Device (CCD) or Complementary Metal-Oxide Semiconductor (CMOS) (light) detection device.
- a flow cell may be configured to fluidically and electrically couple to a cartridge (having an integrated pump), which may be configured to fluidically and/or electrically couple to a bioassay system.
- a cartridge and/or bioassay system may deliver a reaction solution to reaction sites of a flow cell according to a predetermined protocol (e.g., sequencing-by-synthesis), and perform a plurality of imaging events.
- a cartridge and/or bioassay system may direct one or more reaction solutions through the flow channel of the flow cell, and thereby along the reaction sites. At least one of the reaction solutions may include four types of nucleobases having the same or different fluorescent labels.
- the nucleobases may bind to the reaction sites of the flow cell, such as to corresponding oligonucleotides at the reaction sites.
- the cartridge and/or bioassay system may then illuminate the reaction sites using an excitation light source (e.g., solid-state light sources, such as light-emitting diodes (LEDs)).
- the excitation light may provide emission signals (e.g., light of a wavelength or wavelengths that differ from the excitation light and, potentially, each other) that may be detected by the light sensors of the flow cell.
- region of a nucleotide-sample slide refers to a section or part of a nucleotide-sample slide, such as a section of a surface of the nucleotide-sample slide.
- a region of a nucleotide-sample slide can refer to a discrete section of a nucleotide-sample slide that differs from other sections of the nucleotide-sample slide.
- a region of a nucleotide-sample slide can include a well (e.g., a nanowell) or wells of a patterned flow cell or a discrete subsection of a non-patterned flow cell (e.g., a subsection corresponding to a cluster).
- a region of a nucleotide-sample slide includes a tile or a sub-tile having clusters of the same or similar oligonucleotide growing in parallel.
- labeled nucleobase refers to a nucleobase having a fluorescent or light-based indicator or fluorescent dye indicator of the classification of the nucleobase.
- a labeled nucleobase can refer to a nucleobase that incorporates a fluorescent or light-based indicator or fluorescent dye indicator to identify the type of base (e.g., adenine, cytosine, thymine, or guanine).
- a labeled nucleobase includes a nucleobase having a fluorescent tag that emits a signal that either by itself or together with another fluorescent tag identifies the base type.
- a nucleobase may be identified by a mixture of dyes (or a mixture of fluorescent tags) that together indicate the nucleobase type (e.g., “ON”/ “ON” expected response signals and/or estimated response signals).
- the type of base can be determined in certain embodiments of the crosstalk-aware-base-calling system.
- cluster of oligonucleotides refers to a localized group or collection of DNA or RNA molecules on a nucleotide-sample slide, such as a flow cell, or other solid surface.
- a cluster includes tens, hundreds, thousands, or more copies of a cloned or the same DNA or RNA segment.
- a cluster includes a grouping of oligonucleotides immobilized in a section of a flow cell or other nucleotide-sample slide.
- clusters are evenly spaced or organized in a systematic structure within a patterned flow cell.
- clusters are randomly organized within a non-patterned flow cell.
- a cluster of oligonucleotides can be imaged utilizing one or more light signals. For instance, an oligonucleotide-cluster image may be captured by a camera during a sequencing cycle of light emitted by irradiated fluorescent tags incorporated into oligonucleotides from one or more clusters on a flow cell.
- response signal refers to a signal emitted, reflected, or otherwise communicated from a labeled nucleobase or a group of labeled nucleobases (e.g., labeled nucleobases added to a cluster of oligonucleotides).
- a response signal can refer to a signal indicating the type of base.
- a response signal can include a light signal emitted or reflected from a fluorescent tag of a nucleobase or fluorescent tags of multiple nucleobases incorporated into oligonucleotides.
- a nucleobase incorporated into a cluster may (in response to a laser) likewise emit a signal that can be identified as a mixture of dyes (or a mixture of fluorescent tags) that together indicate the nucleobase type (e.g., a cluster with “ON”/ “ON” illumination indicators).
- the blind-equalizer sequencing system triggers the response signal through an external stimulus, such as a laser or other light source. In some cases, the blind-equalizer sequencing system triggers the response signal through some internal stimuli.
- the blind-equalizer sequencing system observes the response signal using a filter applied when capturing an image of the nucleotide-sample slide (e.g., section of the nucleotide-sample slide).
- a response signal includes an aggregate of the signals provided by each labeled nucleobase added to individual oligonucleotides in a cluster of oligonucleotides.
- expected response signal refers to an expected signal emitted, reflected, or otherwise communicated from a labeled nucleobase or a group of labeled nucleobases (e.g., labeled nucleobases added to a cluster of oligonucleotides).
- an expected response signal can represent input data (e.g., sparse point sources, impulse responses) captured by a camera or other imaging device in one or more channels — before the camera or imaging device disperses the energy emitted by the labeled nucleobase(s).
- an expected response signal is represented as a value or multiple values within a matrix, such as a matrix X.
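The relationship between the expected response signal (matrix X), the point-spread function, and channel noise can be illustrated with a small forward model. This is a sketch under simplifying assumptions: the imaging device is modeled as a circular convolution with a single fixed point-spread function, the noise is additive IID Gaussian, and the function name `simulate_capture`, the 3x3 toy kernel, and the pitch of 4 pixels are all made up for the example.

```python
import numpy as np

def simulate_capture(expected, psf, noise_sigma, rng):
    """Blur a sparse expected-signal matrix X with a PSF and add IID Gaussian noise.

    Hypothetical forward model: captured = X convolved with psf, plus noise.
    """
    k = psf.shape[0]
    kernel = np.zeros_like(expected, dtype=float)
    kernel[:k, :k] = psf
    # Move the PSF center to index (0, 0) so the blur does not shift the image.
    kernel = np.roll(kernel, (-(k // 2), -(k // 2)), axis=(0, 1))
    # Circular convolution computed in the frequency domain.
    blurred = np.real(np.fft.ifft2(np.fft.fft2(expected) * np.fft.fft2(kernel)))
    return blurred + rng.normal(0.0, noise_sigma, size=expected.shape)

rng = np.random.default_rng(0)
X = np.zeros((16, 16))
X[2::4, 2::4] = rng.integers(0, 2, size=(4, 4))  # ON/OFF clusters on a pitch of 4 pixels
psf = np.outer([1, 2, 1], [1, 2, 1]) / 16.0       # toy 3x3 point-spread function
Y = simulate_capture(X, psf, noise_sigma=0.05, rng=rng)
```

Here `X` plays the role of the sparse point sources before image capture, and `Y` is the dispersed, noisy image that an equalizer would receive.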
- the term “estimated response signal” refers to a signal or indicator communicated from a labeled nucleobase or a group of labeled nucleobases (e.g., labeled nucleobases added to a cluster of oligonucleotides) transmitted, transferred, or otherwise relayed through an equalizer.
- the blind-equalizer sequencing system generates the estimated response signal by applying the equalizer coefficients (e.g., weights) to pixel intensities.
- the estimated response signal can be an output of the equalizer for any position and/or location of one or more clusters of oligonucleotides.
- the estimated response signal mimics the expected response signal from one or more clusters of oligonucleotides.
- the estimated response signal can take on a binary format such as “0 or 1” or “ON or OFF.”
- the estimated response signal is a two-dimensional sparse matrix.
- an estimated response signal is represented as a value or multiple values within a matrix, such as a matrix Z.
- the term “channel” refers to a range or filter of light, intensity, or color used to transmit, detect, and/or measure a response signal from a cluster of oligonucleotides.
- the channel can include a particular range of light, intensity, or color of a laser used to elicit a fluorescent signal from fluorescent tags on nucleobases incorporated into oligonucleotides within a cluster.
- the blind-equalizer sequencing system utilizes a two-channel implementation by, for instance, using two different ranges of light, intensities, or colors to elicit signals from clusters per sequencing cycle and capturing two corresponding images of a region of a nucleotide-sample slide per sequencing cycle.
- the first and second images can capture the intensity values of the emitted response signal from the clusters that correspond to first and second light ranges.
- an equalizer corresponds to or is specific to a given channel.
- a first channel corresponds to a first equalizer and a second channel corresponds to a second equalizer.
- the blind-equalizer sequencing system can utilize a single channel implementation, three-channel implementation, or four-channel implementation.
- the channel can take on a matrix form describing the estimated response signal in multiple channels.
- the term “signal value” refers to a value indicating the intensity and/or energy emitted, reflected, or otherwise communicated from an expected response signal.
- a signal value measures energy emitted by labeled nucleobases incorporated by a cluster of oligonucleotides — as indicated by both a point-spread function and noise within a channel and/or sequencing device.
- the signal value can refer to a value and/or measurement associated with a color intensity (e.g., wavelength) or a light intensity (e.g., brightness) of one or more pixels from an image of one or more expected response signals in the channel.
- the blind-equalizer sequencing system captures several images of one or more clusters of oligonucleotides with labeled nucleobases using different filters (or intensity channels).
- a signal value can correspond to the intensity of the expected response signal as observed through a particular filter.
- the term “transmission medium” refers to a system or substance that acts as a pathway for transmitting or communicating information.
- the transmission medium is a camera or other imaging device that transmits an expected response signal for a cluster of oligonucleotides from a nucleotide-sample slide to an equalizer by capturing an image of the expected response signal.
- the term “imaging device” refers to a device or sensor that detects, captures, and/or conveys information in the form of a visual image.
- a camera or other imaging device can capture an image of the expected response signal of one or more clusters emitted during SBS and transmit the image of the expected response signal to the equalizer.
- the transmission medium can distort and/or disperse the energy or light of response signals in a captured image.
- the term “point-spread function” refers to a function that describes a response of an imaging device or other optical system to a point source.
- the point-spread function can measure the response — that is, a dispersion of energy from a response signal — caused by an imaging device capturing an image in a channel.
- the point-spread function shows how light emitted from an input response signal blurs and/or spreads when the blind-equalizer sequencing system captures an image of the input response with an imaging device within a channel.
- estimated point-spread-function values refer to estimated values representing the response of an imaging device (e.g., a form of transmission medium) to an expected response signal.
- the blind-equalizer sequencing system can use autocorrelation and physical aspects of sequencing device to determine the estimated point-spread- function values for the expected response signal in a given channel.
- the blindequalizer sequencing system can convert the signal values of the expected response signal from a spatial domain to a frequency domain, and in some embodiments, based on the regular spacing of the expected response signals from one or more clusters of oligonucleotides, the blind-equalizer sequencing system can measure the signal values (e.g., signal energy) of one or more clusters of oligonucleotides in the frequency domain. Moreover, the blind-equalizer sequencing system can force the expected response signal to be real and symmetric by utilizing Hermitian symmetry in the frequency domain. As mentioned above, in one or more embodiments, the estimated point-spread- function values correspond to a specific channel.
- first estimated point-spread-function values correspond to a first channel and second estimated point-spread-function values correspond to a second channel.
- the blind-equalizer sequencing system utilizes the first estimated point-spread-function values to determine equalizer coefficients for the first channel and the second estimated point-spread-function values to determine equalizer coefficients for the second channel.
- the term “estimated noise values” refers to an estimated amount of interference or distortion affecting a quality or quantification of a response signal (e.g., expected response signal).
- the estimated noise values can be estimated background noise associated with a sequencing system.
- the estimated noise values can be independent identically distributed (IID) Gaussian noise for each cluster of oligonucleotides within a region of a nucleotide-sample slide.
- an equalizer coefficient refers to weights or values applied to an image of clusters of oligonucleotides that adjust for (or reduce) noise from adjacent clusters of oligonucleotides and/or the distortion and/or dispersion effects of a transmission medium.
- an equalizer coefficient can include a weighted value that, when applied to an image of oligonucleotide clusters, adjusts for inter-symbol interference (e.g., crosstalk) of one cluster of oligonucleotides on a target cluster of oligonucleotides.
- the equalizer coefficients apply weighted values to the image to measure one or more signal values of the target cluster of oligonucleotides while minimizing the signal values of the neighboring clusters of oligonucleotides.
- the equalizer coefficients can represent an equalizer response that, when applied to one or more signal values, generates an estimated response signal (e.g., output) that mimics the expected response signal (e.g., input) by mitigating or reversing the effects of an imaging device on the expected response signal while capturing an image of the expected response signal.
- the blind-equalizer sequencing system utilizes an equalizer to apply equalizer coefficients to an image with pixels that represent one or more signal values from one or more clusters of oligonucleotides.
- the equalizer coefficients when applied to pixels of the image minimize the least mean square between the expected response signal and the estimated response signal for one or more clusters of oligonucleotides.
- equalizer coefficients can be pixel coefficients.
- pixel coefficients refers to weighted coefficients that mix and/or combine one or more signal values of pixels that depict expected response signals from one or more clusters of oligonucleotides.
- the blind-equalizer sequencing system can multiply signal values of one or more pixels with the pixel coefficients and calculate a weighted sum of the signal values of the pixels.
- the blind-equalizer sequencing system can use the weighted sum of the signal values to make a base call.
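The weighted sum described above can be sketched as follows. The patch values and coefficients are made-up illustrations rather than values from the source, and `weighted_cluster_intensity` is a hypothetical helper name.

```python
import numpy as np

def weighted_cluster_intensity(patch, pixel_coefficients):
    """Multiply pixel signal values by pixel coefficients and sum the products."""
    patch = np.asarray(patch, dtype=float)
    coefficients = np.asarray(pixel_coefficients, dtype=float)
    if patch.shape != coefficients.shape:
        raise ValueError("patch and coefficients must have the same shape")
    return float(np.sum(patch * coefficients))

# Hypothetical 3x3 patch centered on a target cluster.
patch = [[1, 2, 1],
         [2, 10, 2],
         [1, 2, 1]]
# Hypothetical coefficients: boost the center, subtract a little from neighbors.
coefficients = [[0.0, -0.1, 0.0],
                [-0.1, 1.4, -0.1],
                [0.0, -0.1, 0.0]]
intensity = weighted_cluster_intensity(patch, coefficients)
```

The resulting scalar is the kind of per-cluster intensity value a base caller could then compare across channels.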
- image matrix refers to a matrix that includes one or more values, such as equalizer coefficients, that adjust (or reduce) intensity data from an image for noise or distortion.
- an image matrix can include equalizer coefficients that increase or maximize a signal-to-noise ratio of an expected response signal affected by noise and/or crosstalk.
- the blind-equalizer sequencing system can modify the intensity data (e.g., signal values) by applying the image matrix to the pixels depicting signal values of the expected response signals.
- the image matrix constitutes an image mask that applies to pixel values in an image.
- an equalizer refers to a model or system that can use a function to convert dispersed energy of a response signal into values representing one or more estimated response signals from one or more clusters of oligonucleotides and/or reduce noise that is part of such a response signal.
- an equalizer includes a model that converts received dispersed-over-pixels intensity energy (e.g., signal values) into an intensity value representing light emitted from a cluster and intensity values for adjacent clusters by linearly weighting pixel intensities and/or energy.
- the equalizer receives an input image and gathers signal values (e.g., light energy) across pixels in the image and converts the energy to an intensity value for one or more clusters during a sequencing cycle in a channel.
- the equalizer utilizes equalizer coefficients (e.g., an image matrix comprising equalizer coefficients) to increase or maximize the signal-to-noise ratio of the intensity data by weighting signal values depicted in the image to determine a weighted sum of the signal values of the pixels.
- the equalizer can combine the signal values from one or more clusters of oligonucleotides to increase or maximize one or more signal values of a target cluster of oligonucleotides and minimize the signal values (e.g., crosstalk) from adjacent clusters of oligonucleotides while accounting for the amplified noise in a sequencing device.
- the equalizer is a linear equalizer that utilizes a linear filter that can be designed or optimized to filter out noise.
- the linear filter can be applied to each cluster individually or across an entire image.
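As a rough sketch of applying one linear filter across an entire image, the function below slides a kernel of equalizer coefficients over every pixel. The kernel values, the zero padding at the borders, and the name `filter_image` are assumptions for illustration only.

```python
import numpy as np

def filter_image(image, kernel):
    """Apply an odd-sized linear filter at every pixel (zero-padded borders)."""
    image = np.asarray(image, dtype=float)
    kernel = np.asarray(kernel, dtype=float)
    kh, kw = kernel.shape
    padded = np.pad(image, ((kh // 2, kh // 2), (kw // 2, kw // 2)), mode="constant")
    output = np.zeros_like(image)
    # Accumulate one shifted, weighted copy of the image per kernel entry.
    for dy in range(kh):
        for dx in range(kw):
            output += kernel[dy, dx] * padded[dy:dy + image.shape[0], dx:dx + image.shape[1]]
    return output

rng = np.random.default_rng(1)
image = rng.random((6, 6))
kernel = np.array([[0.0, -0.1, 0.0],
                   [-0.1, 1.4, -0.1],
                   [0.0, -0.1, 0.0]])
filtered = filter_image(image, kernel)
```

Reading `filtered` at a cluster center gives the same value as weighting that cluster's patch individually, so the per-cluster and whole-image variants differ in cost rather than in result.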
- the equalizer can utilize different equalizer coefficients for different channels.
- the term “estimated cluster locations” refers to an approximated position of clusters of oligonucleotides and/or nanowell locations holding clusters of oligonucleotides on a nucleotide-sample slide.
- the blind-equalizer sequencing system determines the estimated cluster locations based on the configuration of the nucleotide-sample slide. For example, in some implementations, the nucleotide-sample slide is patterned and distributes clusters of oligonucleotides across the nucleotide-sample slide according to a patterned arrangement. Alternatively, in one or more cases, the nucleotide-sample slide can have an unpatterned arrangement and randomly distribute clusters of oligonucleotides over the nucleotide-sample slide.
- nucleobase call refers to a determination or prediction of a particular nucleobase (or nucleobase pair) for an oligonucleotide (e.g., nucleotide read) during a sequencing cycle or for a genomic coordinate of a sample genome.
- a nucleobase call can indicate a determination or prediction of the type of nucleobase that has been incorporated within an oligonucleotide on a nucleotide-sample slide (e.g., read-based nucleobase calls).
- a nucleobase call includes a determination or a prediction of a nucleobase based on intensity values resulting from fluorescent-tagged nucleotides added to an oligonucleotide of a nucleotide-sample slide (e.g., in a cluster of a flow cell).
- a single nucleobase call can be an adenine (A) call, a cytosine (C) call, a guanine (G) call, a thymine (T) call, or an uracil (U) call.
- sequencing cycle refers to an iteration of adding or incorporating a nucleobase to an oligonucleotide or an iteration of adding or incorporating nucleobases to oligonucleotides in parallel.
- a cycle can include an iteration of taking and analyzing one or more images with data indicating individual nucleobases added or incorporated into an oligonucleotide or to oligonucleotides in parallel. Accordingly, cycles can be repeated as part of sequencing a nucleic-acid polymer.
- each sequencing cycle involves either single reads in which DNA or RNA strands are read in only a single direction or paired-end reads in which DNA or RNA strands are read from both ends.
- each sequencing cycle involves a camera taking an image of the nucleotide-sample slide or multiple sections of the nucleotide-sample slide to generate image data for determining a particular nucleobase added or incorporated into particular oligonucleotides.
- a sequencing system can remove certain fluorescent labels from incorporated nucleobases and perform another sequencing cycle until the nucleic-acid polymer has been completely sequenced.
- a sequencing cycle includes a cycle within a Sequencing By Synthesis (SBS) run.
- FIG. 1 illustrates a schematic diagram of a computing system 100 in which a blind-equalizer sequencing system 106 operates in accordance with one or more embodiments.
- the computing system 100 includes one or more server device(s) 102 connected to a user client device 108 and a sequencing device 114 via a network 112. While FIG. 1 shows an embodiment of the blind-equalizer sequencing system 106, alternative embodiments and configurations are possible.
- the server device(s) 102, the user client device 108, and the sequencing device 114 are connected via the network 112. Each of the components of the computing system 100 can communicate via the network 112.
- the network 112 comprises any suitable network over which computing devices can communicate. Example networks are discussed in additional detail below in relation to FIG. 11.
- the computing system 100 includes the sequencing device 114.
- the sequencing device 114 comprises a device for sequencing a whole genome or other nucleic-acid polymer. In some embodiments, the sequencing device 114 analyzes samples to generate data utilizing computer implemented methods and systems described herein either directly or indirectly on the sequencing device 114.
- the sequencing device 114 utilizes Sequencing By Synthesis (SBS) to sequence whole genomes or other nucleic-acid polymers. As shown, in some embodiments, the sequencing device 114 bypasses the network 112 and communicates directly with the user client device 108.
- the computing system 100 includes the server device(s) 102.
- the server device(s) 102 may generate, receive, analyze, store, and transmit electronic data, such as data for sequencing nucleic-acid polymers.
- the server device(s) 102 may receive data from the sequencing device 114.
- the server device(s) 102 may gather and/or receive sequencing data including nucleobase call data, quality data, and other data relevant to sequencing nucleic-acid polymers.
- the server device(s) 102 may also communicate with the user client device 108.
- the server device(s) 102 can send read data, nucleic-acid polymer sequences, error data, and other information to the user client device 108.
- the server device(s) 102 comprise distributed servers, where the server device(s) 102 include a number of server devices distributed across the network 112 and located in different physical locations.
- the server device(s) 102 can comprise a content server, an application server, a communication server, a web-hosting server, or another type of server.
- the server device(s) 102 can include a sequencing system 104.
- the sequencing system 104 analyzes sequencing data received from the sequencing device 114 to determine nucleotide sequences for whole genomic samples or other nucleic-acid polymers.
- the sequencing system 104 can receive raw data (e.g., base-call data for nucleotide reads) from the sequencing device 114 and determine a nucleic acid sequence for a genomic sample.
- the sequencing system 104 can receive data for nucleotide reads from the sequencing device 114, and the sequencing system 104 generates variant calls (or other nucleobase calls) for a genomic sample from the nucleotide reads.
- the sequencing system 104 determines the sequences of nucleobases in DNA and/or RNA.
- the sequencing device 114 includes the blind-equalizer sequencing system 106.
- the blind-equalizer sequencing system 106 determines equalizer coefficients that compensate for the estimated point-spread-function values and estimated noise values and that minimize a mean squared error or other measured difference between the expected response signal and the estimated response signal for at least a cluster of oligonucleotides. More specifically, in some embodiments, the blind-equalizer sequencing system 106 receives signal values (e.g., intensity values) for at least a cluster of oligonucleotides in a given sequencing cycle.
- the blind-equalizer sequencing system 106 determines (i) estimated point-spread-function values based on the signal values corresponding to one or more clusters of oligonucleotides and (ii) estimated noise values for a given channel.
- the blind-equalizer sequencing system 106 further determines equalizer coefficients by combining the estimated point-spread-function values and the estimated noise values within the channel with respect to one or more clusters of oligonucleotides.
- the blind-equalizer sequencing system 106 further determines a base call for one or more clusters of oligonucleotides by utilizing the equalizer coefficients.
- the computing system 100 illustrated in FIG. 1 further includes the user client device 108.
- the user client device 108 can generate, store, receive, and send digital data.
- the user client device 108 can receive sequencing data from the sequencing device 114.
- the user client device 108 may communicate with the server device(s) 102 to receive nucleobase calls, nucleotide sequences, and variant call files.
- the user client device 108 can present sequencing data to a user associated with the user client device 108.
- the user client device 108 illustrated in FIG. 1 may comprise various types of client devices.
- the user client device 108 includes non-mobile devices, such as desktop computers or servers, or other types of client devices.
- the user client device 108 includes mobile devices, such as laptops, tablets, mobile telephones, smartphones, etc. Additional details with regard to the user client device 108 are discussed below with respect to FIG. 11.
- the user client device 108 includes a sequencing application 110.
- the sequencing application 110 may be a web application or a native application on the user client device 108 (e.g., a mobile application, desktop application, etc.).
- the sequencing application 110 can comprise instructions that (when executed) cause the user client device 108 to receive or request data from the blind-equalizer sequencing system 106 and present sequencing data.
- the sequencing application 110 can comprise instructions that (when executed) cause the user client device 108 to provide a graphical visualization of a read pileup or read alignment for nucleotide reads for a genomic sample.
- the blind-equalizer sequencing system 106 may be located on the user client device 108 as part of the sequencing application 110. As illustrated, in some embodiments, the blind-equalizer sequencing system 106 is implemented by (e.g., located entirely or in part on) the user client device 108. In yet other embodiments, the blind-equalizer sequencing system 106 is implemented by one or more other components of the computing system 100. In particular, the blind-equalizer sequencing system 106 can be implemented in a variety of different ways across the server device(s) 102, the user client device 108, and the sequencing device 114. In one example, the blind-equalizer sequencing system 106 is located in part on the sequencing device 114 and also the server device(s) 102.
- the blind-equalizer sequencing system 106 can determine equalizer coefficients that compensate for the estimated point-spread-function values and the estimated noise values on the sequencing device 114 and make a base call for at least the cluster of oligonucleotides utilizing the equalizer coefficients as part of the server device(s) 102.
- although FIG. 1 illustrates the components of computing system 100 communicating via the network 112, in some embodiments, the components of computing system 100 communicate directly with each other, bypassing the network.
- the user client device 108 can communicate directly with the sequencing device 114.
- the user client device 108 can communicate directly with the blind-equalizer sequencing system 106, bypassing the network 112.
- the blind-equalizer sequencing system 106 can access one or more databases housed on the server device(s) 102 or elsewhere in the computing system 100.
- FIG. 2 depicts an overview of the blind-equalizer sequencing system 106 generating equalizer coefficients and determining a base call for one or more clusters of oligonucleotides utilizing the equalizer coefficients.
- the blind-equalizer sequencing system 106 performs a series of acts that includes an act 202 of receiving signal values for an expected response signal from one or more clusters of oligonucleotides, an act 204 of determining estimated point-spread-function values, an act 206 of determining estimated noise values, an act 208 of determining equalizer coefficients, and an act 210 of determining a base call utilizing the equalizer coefficients.
- FIG. 2 illustrates the act 202 of receiving signal values for an expected response signal from one or more clusters of oligonucleotides.
- the blind-equalizer sequencing system 106 may receive signal values for an expected response signal by capturing expected response signals for one or more clusters of oligonucleotides that the blind-equalizer sequencing system 106 excites with a laser (e.g., light).
- the blind-equalizer sequencing system 106 can direct a light source with a specified wavelength at a nucleotide-sample slide (or portion of the nucleotide-sample slide) and capture with a camera or other imaging device an image of the clusters within the nucleotide-sample slide emitting an expected response signal.
- the blind-equalizer sequencing system 106 captures multiple images of clusters emitting expected response signals. For instance, the blind-equalizer sequencing system 106 can capture multiple images using various filters or imaging devices.
- the blind-equalizer sequencing system 106 utilizes a two-channel implementation by capturing two images of a section of the nucleotide-sample slide per sequencing cycle.
- the blind-equalizer sequencing system 106 captures a first image using a first filter and captures a second image using a second filter.
- the first and second images can capture the intensity of the emitted signal from one or more clusters that correspond to the filter.
- the blind-equalizer sequencing system 106 can implement sequencing runs using other channel-based approaches.
- the blind-equalizer sequencing system 106 utilizes a four-channel implementation and captures four different images of the section of the flow cell. Similar to the two-channel implementation, the blind-equalizer sequencing system 106 can capture each image for the four-channel implementation using a different filter. Each image can capture an intensity of the emitted signal (e.g., estimated response signal) based on the filter used for that image. Thus, in some cases, each of the four images depicts the emitted signal with a different intensity. Additionally, the blind-equalizer sequencing system 106 can utilize a three-channel implementation and capture three images of the section of the nucleotide-sample slide, with each image capturing the intensity of the emitted signal through a specific filter.
- the blind-equalizer sequencing system can perform an act 204 of determining estimated point-spread-function values.
- the estimated point-spread-function values estimate how the camera or imaging device distorts and/or disperses the expected response signal while capturing an image of the cluster of oligonucleotides emitting light in a given channel.
- the blind-equalizer sequencing system can determine the estimated point-spread-function values by leveraging characteristics of the imaging device, sequencing device, and/or the expected response signal.
- the blind-equalizer sequencing system 106 can assume that the clusters of oligonucleotides depicted in a captured image are centered on pixels and regularly spaced according to the layout of the nucleotide-sample slide by combining the captured image with the arrangement of the nucleotide-sample slide as discussed in more detail below.
- since the blind-equalizer sequencing system 106 depicts the signal values in a captured image, the blind-equalizer sequencing system 106 receives the signal values in a spatial domain (e.g., a two-dimensional matrix depicting the intensity of pixels in an image).
- the blind-equalizer sequencing system 106 can analyze the signal values in the captured images by converting the image from the spatial domain to a frequency domain, which can be an alternate representation of the signal values of the expected response signal.
- the blind-equalizer sequencing system can determine a power spectral density for the signal values within the frequency domain and determine the estimated point-spread-function values by converting the power spectral density from the frequency domain to the spatial domain with an inverse fast Fourier transformation.
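A minimal sketch of the two transforms named above: a power spectral density computed with a fast Fourier transform, followed by an inverse transform back to the spatial domain. By the Wiener-Khinchin relation, the inverse transform of a power spectral density is the image's circular autocorrelation. How the source turns that result into final point-spread-function values is not spelled out here, so the sketch stops at the autocorrelation; the mean subtraction, the normalization, and the function names are assumptions.

```python
import numpy as np

def power_spectral_density(image):
    """Periodogram estimate of the PSD of a mean-subtracted image."""
    image = np.asarray(image, dtype=float)
    spectrum = np.fft.fft2(image - image.mean())
    return (np.abs(spectrum) ** 2) / image.size

def autocorrelation_from_psd(psd):
    """Inverse FFT of the PSD, shifted so that zero lag sits at the array center."""
    return np.fft.fftshift(np.real(np.fft.ifft2(psd)))

# Hypothetical captured image.
rng = np.random.default_rng(2)
image = rng.random((16, 16))
psd = power_spectral_density(image)
acf = autocorrelation_from_psd(psd)
```

With this normalization the zero-lag value at the center of `acf` equals the image variance, which is a convenient sanity check on the scaling.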
- the blind-equalizer sequencing system can perform an act 206 of determining estimated noise values.
- the blind-equalizer sequencing system 106 can measure the power of the expected response signal at the corners of the power spectral density grid. Based on the measured power, the blind-equalizer sequencing system 106 can determine the estimated noise values.
- the estimated noise values include independent identically distributed (IID) Gaussian noise, and the blind-equalizer sequencing system can use the IID Gaussian noise as the estimated noise values.
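The corner measurement can be sketched as below. The sketch assumes the power spectral density grid has been shifted so that zero frequency sits at the center, which places the highest spatial frequencies at the corners, where a blurred signal typically carries little energy and white noise dominates. The corner size and the name `estimate_noise_power` are hypothetical.

```python
import numpy as np

def estimate_noise_power(psd_centered, corner=2):
    """Average the PSD over the four corner blocks of a DC-centered grid."""
    c = corner
    blocks = [psd_centered[:c, :c], psd_centered[:c, -c:],
              psd_centered[-c:, :c], psd_centered[-c:, -c:]]
    return float(np.mean(np.concatenate([block.ravel() for block in blocks])))

# Hypothetical example: white Gaussian noise with standard deviation 0.5.
rng = np.random.default_rng(3)
noise_image = rng.normal(0.0, 0.5, size=(64, 64))
psd_centered = np.fft.fftshift(np.abs(np.fft.fft2(noise_image)) ** 2 / noise_image.size)
noise_power = estimate_noise_power(psd_centered, corner=8)
```

With the periodogram scaled by the number of pixels, the expected level of IID noise is its variance, so `noise_power` serves as a variance estimate under that assumption.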
- the blind-equalizer sequencing system can perform an act 208 of determining equalizer coefficients. For example, the blind-equalizer sequencing system 106 can determine the equalizer coefficients based on combining the estimated point-spread-function values and the estimated noise values of the given channel. As described further below, in some embodiments, the blind-equalizer sequencing system 106 determines the equalizer coefficients based on an assumption that the expected response signal equals the estimated response signal.
- the blind-equalizer sequencing system 106 can determine equalizer coefficients that minimize the mean squared error between the expected response signal (e.g., system input) combined with the estimated point-spread-function values and estimated noise values and the estimated response signal (e.g., system output).
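One textbook way to obtain coefficients that minimize mean squared error for a known point-spread function and white noise is a frequency-domain Wiener filter. The sketch below uses that standard formula as a stand-in; it is not confirmed to be the procedure of the source, which refers to a distribution function and a multiplicative inverse operation. The unit signal power, the sample point-spread function, and the name `mmse_equalizer` are assumptions.

```python
import numpy as np

def mmse_equalizer(psf, noise_power, signal_power=1.0):
    """Wiener-style equalizer coefficients for a centered, odd-sized PSF."""
    psf = np.asarray(psf, dtype=float)
    # Move the PSF center to index (0, 0) before transforming.
    response = np.fft.fft2(np.fft.ifftshift(psf))
    # Regularized inverse: approaches 1 / response as the noise power goes to zero.
    equalizer = np.conj(response) / (np.abs(response) ** 2 + noise_power / signal_power)
    # Back to the spatial domain, with the center tap in the middle of the array.
    return np.fft.fftshift(np.real(np.fft.ifft2(equalizer)))

# Hypothetical 9x9 PSF: most energy in the center pixel, some in four neighbors.
psf = np.zeros((9, 9))
psf[4, 4] = 0.6
psf[3, 4] = psf[5, 4] = psf[4, 3] = psf[4, 5] = 0.1
coefficients = mmse_equalizer(psf, noise_power=0.01)
```

The noise term in the denominator keeps the filter from amplifying frequencies where the point-spread function is weak, which is the trade-off between undoing blur and limiting noise.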
- after determining the equalizer coefficients, the blind-equalizer sequencing system 106 performs an act 210 of determining a base call utilizing the equalizer coefficients. For example, the blind-equalizer sequencing system 106 can apply the equalizer coefficients to the signal values depicted by pixels in an image of the region of the nucleotide-sample slide and generate an estimated response signal (e.g., output) that conveys more accurate intensity values by reducing the amount of crosstalk from adjacent clusters of oligonucleotides.
- the blind-equalizer sequencing system 106 can make a more accurate nucleobase call for one or more clusters of oligonucleotides.
- the blind-equalizer sequencing system 106 can determine base calls and corresponding response signals for a cluster of oligonucleotides.
- FIGS. 3A-3B show the blind-equalizer sequencing system 106 capturing an image of expected response signals and determining a nucleobase call based on the estimated response signals for a cluster of oligonucleotides in different channels for a given sequencing cycle.
- an estimated response signal and an expected response signal indicate whether and/or to what degree a cluster provides a fluorescent response in a given channel during sequencing.
- FIG. 3A describes how the blind-equalizer sequencing system 106 captures an image of the expected response signal and how the camera or imaging system affects the expected response signal in accordance with one or more embodiments.
- the blind-equalizer sequencing system 106 can include a nucleotide-sample slide 306 with one or more clusters of oligonucleotides 308. As discussed above, in some embodiments, the blind-equalizer sequencing system 106 can generate an expected response signal 310 by exciting a fluorescent tag with a laser. As further shown in FIG. 3A, the blind-equalizer sequencing system 106 can transmit or communicate the expected response signal 310 from the nucleotide-sample slide 306 to an equalizer by capturing an image 312 of the expected response signal 310 with a camera 302 or other imaging device.
- when the blind-equalizer sequencing system 106 transmits the expected response signal 310 with the camera 302, the camera 302 distorts the expected response signal 310 by dispersing the energy or light of the expected response signal 310.
- the equalizer aims to generate an accurate representation (e.g., estimated response signal) of the expected response signal 310 by undoing the distorting effects of the camera 302 while accounting for noise in the sequencing device.
- the blind-equalizer sequencing system 106 and equalizer undo the distortion effects of the camera 302 by identifying characteristics of the camera 302 and generating a mathematical model (e.g., matrix model) that represents the characteristics of the camera 302.
- the blind-equalizer sequencing system 106 circumvents these issues by simplifying the characteristics of the camera 302 and the expected response signal 310 (e.g., a transmitted, expected response signal).
- the blind-equalizer sequencing system 106 can represent the expected response signal 310 for the cluster of oligonucleotides 304 as binary decisions (e.g., the cluster is either on or off).
- the blind-equalizer sequencing system 106 can generate a two-dimensional matrix representing the expected response signals of clusters of oligonucleotides within a region of the nucleotide-sample slide.
- the blind-equalizer sequencing system 106 can utilize the two-dimensional matrix to reduce the interference between adjacent clusters and determine the estimated point-spread-function values.
- each expected response signal 310 for each cluster of the one or more clusters of oligonucleotides 308 in image 312 can have a corresponding PSF.
- the PSF of neighboring clusters overlaps.
- the blind-equalizer sequencing system 106 can simplify the characteristics of the camera 302 by representing the expected response signal 310 as finite and real. In other words, the blind-equalizer sequencing system 106 can generate an image 312 where the area of the expected response signal 310 is limited to a certain number of pixels.
- the blind-equalizer sequencing system 106 can identify a characteristic of the camera 302. More specifically, the blind-equalizer sequencing system 106 can determine that the camera 302 is a minimum phase channel. For example, based on a minimum phase channel, the blind-equalizer sequencing system 106 can determine that the channel has causal and stable characteristics that make the channel’s inverse system unique (e.g., by applying a multiplicative inverse operation) that can be used to estimate channel-specific equalizer coefficients for an image depicting a region of a nucleotide-sample slide.
- the blind-equalizer sequencing system 106 can determine such equalizer coefficients based on estimated point-spread-function values, estimated noise values, and estimated cluster locations. For instance, the blind-equalizer sequencing system 106 can utilize the unique causality and stability characteristics of the channel — and combine sampled values of the estimated point-spread-function values and estimated noise values by applying a distribution function and/or a multiplicative inverse operation — to determine equalizer coefficients. Further detail regarding the blind-equalizer sequencing system 106 applying such a distribution function (e.g., Delta distribution function) and/or a multiplicative inverse operation (e.g., inv) in the context of determining equalizer coefficients is described below with regard to FIG. 8.
- FIG. 3B shows the on/off status of sets of estimated response signals and expected response signals in two different intensity channels for a cluster of oligonucleotides corresponding to a particular type of nucleotide base in accordance with one or more embodiments.
- FIG. 3B depicts light intensity in a particular frequency (e.g., frequency band) emitting or not emitting from the cluster of oligonucleotides 322 in cropped images shown in rows alongside nucleobase calls of adenine (A) 328, cytosine (C) 320, thymine (T) 332, and guanine (G) 334.
- the blind-equalizer sequencing system 106 determines that the expected response signals and the estimated response signals indicate that the cluster of oligonucleotides 322 is “on” (e.g., illuminated or emits light intensity in a particular frequency) in both a first channel captured by a first-channel image 324 and a second channel captured by a second-channel image 326.
- the blind-equalizer sequencing system 106 determines an expected response signal and/or the estimated response signal of the cluster of oligonucleotides 322 is “on” in the first channel captured by the first-channel image 324 and “off” (e.g., not illuminated or not emitting light intensity in a particular frequency) in the second channel captured by the second-channel image 326.
- the blind-equalizer sequencing system 106 determines the expected response signal and/or the estimated response signal indicating that the cluster of oligonucleotides 322 is “off” in the first channel captured by the first-channel image 324 and “on” in the second channel captured by the second-channel image 326.
- the blind-equalizer sequencing system 106 determines the expected response signal and/or the estimated response signal indicating that the cluster of oligonucleotides 322 is “off” in both the first channel captured by the first-channel image 324 and the second channel captured by the second-channel image 326.
- the illumination status (e.g., on/active/detectable or off/inactive/undetectable status) of the expected response signal and/or the estimated response signal can take a couplet form or continuous form.
- the illumination status of the expected response signal and/or the estimated response signal can be represented as an illumination indicator. For instance, if an illumination indicator is “on” (and emits light intensity in a particular frequency) in the intensity channel during sequencing, the “on” status can be represented by a 1. Conversely, if the illumination indicator is “off” (and does not emit detectable light intensity in a particular frequency) in the intensity channel during sequencing, the “off” status can be represented by a 0.
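The two-channel on/off encoding can be sketched as a lookup table. The pairing of statuses with bases below follows the order in which the description of FIG. 3B lists them (A, C, T, G) and should be read as an assumption, as should the intensity threshold and the name `call_base`.

```python
# Assumed mapping from (first-channel status, second-channel status) to a base.
BASE_BY_STATUS = {
    (1, 1): "A",  # on in both channels
    (1, 0): "C",  # on in the first channel only
    (0, 1): "T",  # on in the second channel only
    (0, 0): "G",  # off in both channels
}

def call_base(first_intensity, second_intensity, threshold=0.5):
    """Threshold two equalized intensities into 1/0 indicators and look up a base."""
    status = (int(first_intensity >= threshold), int(second_intensity >= threshold))
    return BASE_BY_STATUS[status]

calls = [call_base(0.9, 0.8), call_base(0.9, 0.1),
         call_base(0.2, 0.7), call_base(0.1, 0.0)]
```

A hard threshold is the simplest reading of the 1/0 indicator; a continuous form of the illumination status would instead keep the intensities and classify them jointly.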
- the blind-equalizer sequencing system 106 can utilize signal values to determine equalizer coefficients.
- FIG. 4 illustrates a model for measuring signal values from an expected response signal in a given channel in accordance with one or more embodiments.
- the blind-equalizer sequencing system 106 can excite a fluorescent tag that emits an expected response signal by directing a laser (e.g., light) at clusters of oligonucleotides within a region of the nucleotide-sample slide. As shown in FIG. 4, the blind-equalizer sequencing system 106 can perform the act of measuring signal values.
- the expected response signal 404 can be an input into a camera or other imaging device indicating the on or off status of the cluster of oligonucleotides in a channel.
- the expected response signal 404 can be sparse point sources (e.g., impulse responses) that represent the locations and/or positions of clusters of oligonucleotides (e.g., wells) on the nucleotide-sample slide.
- the expected response signals 404 can be binary signals, such as “on” or “off” (or alternatively 1 or 0) in a particular channel.
- the expected response signals 404 of one or more clusters are regularly spaced based on the layout of the nucleotide-sample slide.
- the blind-equalizer sequencing system 106 can input the expected response signal 404 into a camera 406 by capturing an image of the expected response signal 404.
- the point-spread function 408 can depict the response of the camera 406 on the expected response signal 404 in an image.
- the response of the camera 406 disperses the expected response signal 404 so that it is no longer a sparse point or impulse but a point-spread function 408.
- the expected response signals 404 can be binary values representing the illumination of the expected response signal 404.
- the blind-equalizer sequencing system 106 can generate a matrix of the expected response signals 404 of one or more clusters by utilizing a distribution function. For example, the blind-equalizer sequencing system 106 can generate a grid (e.g., matrix) of the locations of the cluster of oligonucleotides and set the signal values of the expected response signals of clusters of oligonucleotides to one or zero.
- the blind-equalizer sequencing system can add noise values 410 to the point-spread function 408. For instance, as described above, when the equalizer directly inverts the point-spread function 408 without considering the noise values 410 in the channel, the blind-equalizer sequencing system 106 can amplify the noise in the channel and generate inaccurate intensity values for a cluster of oligonucleotides leading to inaccurate base calls. Thus, in one or more embodiments, the blind-equalizer sequencing system can account for noise in the system by adding noise values 410 to the convolution of the point-spread function 408 and expected response signal 404.
- the blind-equalizer sequencing system 106 can combine the point-spread function 408 and the noise values 410 to generate the signal values 412.
- $y(x_i, y_i)$ represents the signal values (e.g., pixel intensities) for a given pixel
- $h(x_{psf}, y_{psf})$ represents the response (e.g., point-spread function) of the channel, $x(x_i - x_{psf}, y_i - y_{psf})$
- the blind-equalizer sequencing system can utilize a system model for measuring signal values in matrix form 416.
- the blind-equalizer sequencing system 106 can capture one or more images of the expected response signal 404 in the channel where pixels of the image depict signal values 412 of the expected response signal 404 combined with noise values 410. As described in more detail below, in one or more cases, the blind-equalizer sequencing system can utilize the image to determine equalizer coefficients.
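- for illustration only, the measurement model described above (a sparse, regularly spaced grid of binary expected response signals, convolved with a point-spread function and summed with noise values) could be sketched as follows. The Gaussian point-spread function, the grid pitch, the on-probability, and the noise level are hypothetical choices made for this sketch and are not taken from the disclosure.

```python
import numpy as np

def gaussian_psf(size=7, sigma=1.0):
    """Hypothetical stand-in for the camera's point-spread function."""
    ax = np.arange(size) - size // 2
    g = np.exp(-ax**2 / (2.0 * sigma**2))
    psf = np.outer(g, g)
    return psf / psf.sum()  # weights sum to one, so total intensity is preserved

def convolve2d_same(image, kernel):
    """Direct 2-D convolution with zero padding; output has the image's shape."""
    kh, kw = kernel.shape
    padded = np.pad(image, ((kh // 2, kh // 2), (kw // 2, kw // 2)), mode="constant")
    flipped = kernel[::-1, ::-1]
    out = np.zeros(image.shape, dtype=float)
    for r in range(image.shape[0]):
        for c in range(image.shape[1]):
            out[r, c] = np.sum(padded[r:r + kh, c:c + kw] * flipped)
    return out

def simulate_signal_values(shape=(32, 32), pitch=4, p_on=0.5, noise_sigma=0.01, seed=0):
    """Return (expected response grid, simulated signal values)."""
    rng = np.random.default_rng(seed)
    expected = np.zeros(shape)
    rows = np.arange(pitch // 2, shape[0], pitch)   # regularly spaced cluster rows
    cols = np.arange(pitch // 2, shape[1], pitch)   # regularly spaced cluster columns
    states = rng.random((rows.size, cols.size)) < p_on
    expected[np.ix_(rows, cols)] = states           # 1 = "on", 0 = "off"
    blurred = convolve2d_same(expected, gaussian_psf())
    noise = rng.normal(0.0, noise_sigma, size=shape)  # IID Gaussian noise values
    return expected, blurred + noise
```

- in this sketch, each pixel of the returned image holds a signal value, i.e., the dispersed expected response signal plus a noise value.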
- the blind-equalizer sequencing system can receive signal values from a captured image comprising signal values for at least a cluster of oligonucleotides and further invert estimated point-spread-function values and estimated noise values — corresponding to the signal values — together to determine equalizer coefficients.
- FIG. 5 illustrates the blind-equalizer sequencing system determining equalizer coefficients.
- the blind-equalizer sequencing system 106 can input an expected response signal 504 for a cluster of oligonucleotides into an imaging device 506 and generate signal values 512 by combining estimated point-spread-function values 508 and estimated noise values 510.
- the blind-equalizer sequencing system can determine the estimated point-spread-function values 508 by leveraging physical characteristics of the system.
- the blind-equalizer sequencing system 106 can determine estimated noise values 510.
- the blind-equalizer sequencing system can determine more accurate base calls by accounting for noise in the sequencing device (e.g., the sequencing device 114).
- noise from the sequencing device can originate from the method of illuminating the clusters of oligonucleotides, the sensor in the optical system, DC offset, spatial crosstalk, etc.
- the blind-equalizer sequencing system 106 can account for one or more sources of noise by determining estimated noise values 510.
- the blind-equalizer sequencing system 106 determines the estimated noise values 510 by applying an independent identically distributed (IID) Gaussian noise.
- independent identically distributed Gaussian noise refers to random signal disturbances (e.g., noise) that are statistically unrelated and identically distributed along a Gaussian (e.g., bell-shaped) distribution.
- the blind-equalizer sequencing system 106 can determine the estimated noise values 510 by converting the signal values 512 from a spatial domain to a frequency domain and measuring a band within the frequency domain that does not have a signal.
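- as a minimal sketch of measuring noise in a signal-free frequency band, the following assumes white (IID) noise and assumes that frequencies above a hypothetical `cutoff` contain no signal. Both the cutoff value and the function name are illustrative and not taken from the disclosure.

```python
import numpy as np

def estimate_noise_sigma(image, cutoff=0.4):
    """Estimate the noise standard deviation from high-frequency bins.

    `cutoff` (cycles per pixel) is a hypothetical threshold above which
    the band is assumed to contain noise only.
    """
    n_rows, n_cols = image.shape
    spectrum = np.fft.fft2(image - image.mean())
    fy = np.fft.fftfreq(n_rows)[:, None]
    fx = np.fft.fftfreq(n_cols)[None, :]
    noise_band = (np.abs(fy) >= cutoff) & (np.abs(fx) >= cutoff)
    # For white noise, the expected value of |F[k]|^2 is (pixel count) * sigma^2.
    mean_power = np.mean(np.abs(spectrum[noise_band]) ** 2)
    return float(np.sqrt(mean_power / image.size))
```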
- the blind-equalizer sequencing system 106 can receive or determine signal values 512 for the expected response signal 504.
- the signal values 512 correspond to the estimated point-spread-function values 508 combined with the expected response signal 504 summed with the estimated noise values 510.
- the blind-equalizer sequencing system 106 can combine the estimated point-spread-function values 508 and the expected response signal 504 by performing a two-dimensional convolution of the estimated point-spread-function values 508 with the expected response signal 504.
- the blind-equalizer sequencing system 106 can simplify the analysis of the signal values and expected response signal 504 by assuming that the pixels in the captured image align with the center of the estimated point-spread-function values 508. In some embodiments, where the pixel does not align with the center of the estimated point-spread-function values 508, the blind-equalizer sequencing system 106 can utilize one or more interpolation methods to determine signal values between pixel centers and/or align the center of the estimated point-spread-function values 508 with a pixel.
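- the disclosure does not name a specific interpolation method. As one hedged example, bilinear interpolation can estimate a signal value between pixel centers; the function name `bilinear_sample` is hypothetical.

```python
import numpy as np

def bilinear_sample(image, row, col):
    """Interpolate `image` at a fractional (row, col) position inside the image."""
    r0, c0 = int(np.floor(row)), int(np.floor(col))
    r1 = min(r0 + 1, image.shape[0] - 1)   # clamp at the last row
    c1 = min(c0 + 1, image.shape[1] - 1)   # clamp at the last column
    fr, fc = row - r0, col - c0            # fractional offsets in [0, 1)
    top = (1 - fc) * image[r0, c0] + fc * image[r0, c1]
    bottom = (1 - fc) * image[r1, c0] + fc * image[r1, c1]
    return float((1 - fr) * top + fr * bottom)
```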
- the blind-equalizer sequencing system 106 can receive the signal values for the estimated response signal 518 from one or more clusters of oligonucleotides within a region of the nucleotide-sample slide.
- existing systems can apply an equalizer to the signal values and generate an estimated response signal 518 or output from the equalizer.
- existing system models for an equalizer can be modeled as:
- the blind-equalizer sequencing system 106 can utilize the existing system model to determine more accurate equalizer coefficients.
- the blind-equalizer sequencing system 106 can limit the signal values to certain number of pixels around the cluster of oligonucleotides.
- Z represents the estimated response signal 518
- W represents equalizer coefficients 516
- Y represents signal values 512 for at least the cluster of oligonucleotides.
- the blind-equalizer sequencing system 106 can determine the estimated response signal 518 by multiplying the signal values (e.g., received pixel intensities) with the equalizer coefficients 516 that represent the equalizer response.
- the blind-equalizer sequencing system 106 can utilize blind-equalizer sequencing system model 520.
- some existing systems utilize a decision-directed approach that determines and/or estimates an output response signal by directly processing the signal values 512.
- the blind-equalizer sequencing system 106 can utilize the system model for measuring signal values in matrix form as discussed in FIG. 4 and the limited system model in matrix form as discussed above to determine accurate equalizer coefficients 516 in a feedforward approach.
- the blind-equalizer sequencing system 106 can replace the signal values (Y) in the limited system model with the estimated point-spread-function values (H), estimated noise values (7), and the expected response signal (X).
- the blind-equalizer sequencing system 106 can determine equalizer coefficients on a cluster-by-cluster basis. For example, in one or more implementations, the blind-equalizer sequencing system 106 can determine, for a target cluster of oligonucleotides, a target estimated point-spread-function value based on target signal values corresponding to the target cluster of oligonucleotides. For example, as described above, the blind-equalizer sequencing system 106 can receive an image of a target cluster of oligonucleotides and determine a point-spread function 408 (as depicted in FIG. 4) based on the target signal values.
- the blind-equalizer sequencing system 106 can determine target equalizer coefficients by combining the target estimated point-spread-function values and the estimated noise values within the channel. In one or more cases, the target equalizer coefficients can compensate for the target estimated point-spread-function values and the estimated noise values with respect to the target expected response signal. In certain implementations, the blind-equalizer sequencing system 106 can determine a base call for the target cluster by utilizing the target equalizer coefficients.
- the blind-equalizer sequencing system 106 determines the equalizer coefficients 516 by assuming that the estimated response signal 518 equals the expected response signal 504. In other words, in some embodiments, the blind-equalizer sequencing system 106 assumes that the imaging device 506 did not disperse the expected response signal 504 and the blind-equalizer sequencing system 106 outputs an estimated response signal 518 that mirrors the input (e.g., the expected response signal 504).
- the blind-equalizer sequencing system 106 can determine equalizer coefficients 516 that compensate for the estimated point-spread-function values 508 and the estimated noise values 510 because the equalizer coefficients 516 are the only unknown variable in the blind-equalizer sequencing system model 520.
- the blind-equalizer sequencing system 106 can utilize the blind-equalizer sequencing system model 520 to determine accurate equalizer coefficients 516.
- the blind-equalizer sequencing system 106 can minimize the mean squared error between the estimated response signal 518 and the expected response signal 504 in the blind-equalizer sequencing system model 520. For example, in an embodiment where the estimated response signal 518 and the expected response signal 504 for at least a cluster of nucleotides both equal 1 (e.g., “on”), the blind-equalizer sequencing system 106 can determine equalizer coefficients 516 that minimize the error or distance between the estimated response signal 518 and the expected response signal 504 in the blind-equalizer sequencing system model 520. In some embodiments, the blind-equalizer sequencing system 106 can determine the equalizer coefficients 516 utilizing a least-squares approach.
- the blind-equalizer sequencing system 106 can adjust the equalizer coefficients 516 to minimize the mean squared error between one or more expected response signals corresponding to a set of neighboring clusters of oligonucleotides within the region of the nucleotide-sample slide and one or more estimated response signals (e.g., including the estimated response signal 518) across the set of neighboring clusters of oligonucleotides.
- the blind-equalizer sequencing system 106 can optimize the equalizer response so that the cluster at the center of the estimated point-spread-function values has an output of one and the set of neighboring clusters (e.g., adjacent wells) have an output of zero.
- the blind-equalizer sequencing system 106 can determine a base call 519 utilizing the equalizer coefficients 516. As described above, once the blind-equalizer sequencing system 106 determines the equalizer coefficients in some embodiments, the blind-equalizer sequencing system 106 can utilize the equalizer coefficients to make a base call for one or more clusters of oligonucleotides. For example, the blind-equalizer sequencing system 106 can utilize an equalizer that applies the equalizer coefficients to signal values and generates an estimated response signal. In some embodiments, the blind-equalizer sequencing system 106 may utilize a linear equalizer to determine an intensity value for one or more clusters by processing signal values depicted in received images.
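- as a hedged sketch of a minimum-mean-squared-error solution, the following computes coefficients for a target cluster surrounded by eight neighbors on a patch of pixels. It assumes, for illustration only, a Gaussian point-spread function, zero-mean unit-variance uncorrelated cluster signals, and IID Gaussian noise; the names `psf_column` and `mmse_equalizer` and all numeric values are hypothetical.

```python
import numpy as np

def psf_column(center, patch=7, sigma=0.8):
    """Gaussian PSF (hypothetical) centered on `center`, flattened over the patch."""
    rows, cols = np.mgrid[0:patch, 0:patch]
    d2 = (rows - center[0]) ** 2 + (cols - center[1]) ** 2
    return np.exp(-d2 / (2.0 * sigma**2)).ravel()

def mmse_equalizer(H, target_index, noise_sigma):
    """Coefficients w minimizing E|w.y - x_target|^2 for y = H x + n.

    Assumes zero-mean, unit-variance, uncorrelated cluster signals and
    IID Gaussian noise with standard deviation `noise_sigma`.
    """
    n_pixels = H.shape[0]
    covariance = H @ H.T + noise_sigma**2 * np.eye(n_pixels)
    return np.linalg.solve(covariance, H[:, target_index])

# Target cluster at the patch center with eight neighbors two pixels away.
centers = [(r, c) for r in (1, 3, 5) for c in (1, 3, 5)]
H = np.stack([psf_column(c) for c in centers], axis=1)  # 49 pixels x 9 clusters
target = centers.index((3, 3))
w = mmse_equalizer(H, target, noise_sigma=0.01)
response = w @ H  # equalizer output when each cluster alone is "on"
```

- with a low noise level, `response` is close to one for the target cluster and close to zero for each neighboring cluster, which mirrors the optimization goal stated above. A larger `noise_sigma` trades some of that crosstalk suppression for less noise amplification.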
- a linear equalizer is a linear filter that can be designed or optimized to filter out noise.
- the equalizer can convert signal values representing light or energy dispersed over one or more pixels into the estimated response signal representing accurate intensity values for at least a cluster of oligonucleotides by linearly weighting pixel intensities with the equalizer coefficients and summing the weighted pixel intensities.
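- the linear weighting and summing described above can be sketched as follows, assuming a square, odd-sized coefficient array and a cluster far enough from the image border for the patch to fit. The function name `equalize_cluster` is hypothetical.

```python
import numpy as np

def equalize_cluster(image, center, coefficients):
    """Weighted sum of the pixels around `center` using `coefficients`."""
    half = coefficients.shape[0] // 2
    r, c = center
    patch = image[r - half:r + half + 1, c - half:c + half + 1]
    return float(np.sum(patch * coefficients))  # weight each pixel, then sum
```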
- the linear filter can be applied to each cluster individually or across an entire image.
- the blind-equalizer sequencing system 106 can use the more accurate intensity values to determine a base call.
- the blind-equalizer sequencing system 106 determines a base call by analyzing the intensity values associated with the cluster of oligonucleotides.
- the emitted signals of the cluster can indicate the type of nucleotide base.
- the blind-equalizer sequencing system 106 can analyze the intensity values for signals from the given cluster in both channels or in each of multiple channels (e.g., concurrently) to determine the nucleobase call.
- the blind-equalizer sequencing system 106 can calculate, utilizing an expectation maximization and Gaussian probability distributions, the probability that the signal falls within the intensity-value boundaries of a certain base (A, C, G, or T). The blind-equalizer sequencing system 106 can then call the nucleobase incorporated into the cluster by selecting the intensity-value boundaries of the nucleobase with the highest probability. For example, based on the intensity values emitted by the signal of the cluster, the blind-equalizer sequencing system 106 can determine that the nucleobase whose intensity-value boundaries have the highest probability for the cluster is adenine (A).
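- the selection step can be sketched as below. This sketch covers only the final probability comparison, not the expectation-maximization fit. The two-channel centroids, the shared standard deviation, and the equal priors are hypothetical placeholders rather than values from the disclosure.

```python
import numpy as np

# Hypothetical two-channel centroids; a real system would fit these
# (e.g., by expectation maximization) rather than hard-code them.
CENTROIDS = {"A": (1.0, 1.0), "C": (0.0, 1.0), "T": (1.0, 0.0), "G": (0.0, 0.0)}

def call_base(intensity, centroids=CENTROIDS, sigma=0.2):
    """Return (base, posterior) under equal priors and isotropic Gaussians."""
    bases = list(centroids)
    means = np.array([centroids[b] for b in bases], dtype=float)
    sq_dist = np.sum((means - np.asarray(intensity, dtype=float)) ** 2, axis=1)
    log_like = -sq_dist / (2.0 * sigma**2)
    probs = np.exp(log_like - log_like.max())  # subtract max for numerical stability
    probs /= probs.sum()
    best = int(np.argmax(probs))
    return bases[best], float(probs[best])
```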
- the blind-equalizer sequencing system 106 can determine the estimated point-spread-function values, such as the estimated point-spread-function values 508 depicted in FIG. 5.
- FIG. 6 illustrates the blind-equalizer sequencing system 106 determining the estimated point-spread-function values by converting one or more signal values from one or more clusters of oligonucleotides from a spatial domain to a frequency domain.
- FIG. 6 shows the blind-equalizer sequencing system 106 receiving signal values by receiving an initial image 602 depicting signal values from clusters of oligonucleotides within a larger region (or super region) of a nucleotide-sample slide and cropping the initial image 602 to generate an image 604 of a region of the nucleotide-sample slide, where the image 604 depicts an estimated response signal from a target cluster of oligonucleotides located within the region of a flow cell or other nucleotide-sample slide and in the context of a spatial domain.
- spatial domain refers to a two-dimensional matrix (e.g., grid) depicting or representing an image comprising one or more pixels, where each pixel (e.g., element) corresponds to the intensity and/or location of a pixel on the image.
- the image and corresponding pixels can depict one or more signal values for an expected response signal from a target cluster of oligonucleotides and/or expected response signals from neighboring clusters of oligonucleotides.
- the blind-equalizer sequencing system 106 can receive a channel-specific image depicting a signal value for an expected response signal from a target cluster of oligonucleotides within a pixel or sub-pixel of the channel-specific image.
- the channel-specific image depicts (i) a single signal value for an expected response signal from a target cluster of oligonucleotides within a pixel or sub-pixel of the channel-specific image and (ii) additional signal values for additional expected response signals from additional target clusters of oligonucleotides within other pixels or other sub-pixels of the channel-specific image.
- the blind-equalizer sequencing system 106 receives the image 604 by either cropping the image 604 of the region of the nucleotide-sample slide from the initial image 602 or accessing or receiving the image 604 of the region of the nucleotide-sample slide without such cropping.
- the blind-equalizer sequencing system 106 selects a region (e.g., a sub-tile) of a nucleotide-sample slide from a larger or super region (e.g., tile) of the nucleotide-sample slide and crops the image 604 of the region of the nucleotide-sample slide from a center of the initial image 602.
- the blind-equalizer sequencing system 106 can select and crop an image of a region of the nucleotide-sample slide from any location of the initial image 602. In the alternative to cropping the initial image 602, the blind-equalizer sequencing system 106 can access or receive the image 604 of the region of the nucleotide-sample slide without such cropping. For instance, a camera or other imaging device of a sequencing device may initially capture the image 604 of the region of the nucleotide-sample slide and save the initially captured version of the image 604 for further processing.
- the blind-equalizer sequencing system 106 can determine a size of the image 604 of the region of the nucleotide-sample slide.
- the size and/or dimension of the image 604 of the region of the nucleotide-sample slide can comprise 256 x 256 pixels.
- the blind-equalizer sequencing system 106 can select and/or crop a different size of the image 604 from the initial image 602.
- the blind-equalizer sequencing system 106 can select an image depicting a particular side or region of a nucleotide-sample slide based on the particular sequencing device used to process a genomic sample and determining corresponding nucleotide reads.
- the blind-equalizer sequencing system 106 can generate a convoluted matrix 608 depicting modified signal values of expected response signals from the target cluster of oligonucleotides and neighboring clusters of oligonucleotides by combining the image 604 of the region of the nucleotide-sample slide with a Hanning window 606 (e.g., two-dimensional Hanning window).
- the convoluted matrix 608 includes a matrix that modifies the values of pixels of an image.
- the blind-equalizer sequencing system 106 can extract and/or highlight certain features and/or patterns of an image and include such features and/or patterns in the convoluted matrix 608.
- the blind-equalizer sequencing system 106 can generate the convoluted matrix 608 by performing elementwise multiplication between the pixels within the image 604 of the region of the nucleotide-sample slide and the Hanning window 606. Moreover, in certain embodiments, the convoluted matrix 608 can comprise a convoluted image depicting the modified signal values of the pixels in the initial image 602. In some embodiments, prior to converting the signal values from the spatial domain to the frequency domain, the blind-equalizer sequencing system 106 can further remove the DC offset (e.g., low frequency noise) from the signal values.
- the blind-equalizer sequencing system 106 can convert the signal values in the convoluted matrix 608 from the spatial domain to a frequency domain.
- the term “frequency domain” refers to a domain that represents one or more signal values in terms of frequency components (e.g., sine and cosine components).
- the frequency domain comprises points that represent a particular frequency (e.g., pixel intensity) present in an image or matrix in the spatial domain.
- the frequency domain can express the rate of change of pixel intensities in an image.
- the blind-equalizer sequencing system 106 can perform additional image analysis in the frequency domain, as further described below.
- the blind-equalizer sequencing system 106 can transform the convoluted matrix 608 into a frequency domain matrix 610 depicting a power spectral density 612 of the signal values for the expected response signals.
- the term “frequency domain matrix” refers to a matrix that includes one or more values representing power spectral density of a signal (e.g., an expected response signal) from a target cluster of oligonucleotides in a frequency domain.
- the frequency domain matrix 610 constitutes or can be referred to as a frequency domain image.
- the blind-equalizer sequencing system 106 generates the frequency domain matrix 610 by applying a Fast Fourier Transformation (FFT) to the convoluted matrix 608.
- the blind-equalizer sequencing system 106 can apply a non-equispaced Fast Fourier (NFFT) transformation.
- the NFFT can preserve the original data while enabling analysis of the signal values in the initial image 602.
- the blind-equalizer sequencing system 106 can generate the convoluted matrix 608 in the form of a complex or real matrix by generating a conjugate (e.g., complement) matrix of the convoluted matrix 608 and combining (e.g., multiplying) the conjugate matrix with the convoluted matrix 608. Regardless of the format, in some cases, the blind-equalizer sequencing system 106 can normalize the convoluted matrix 608.
- the blind-equalizer sequencing system 106 can determine the power spectral density 612 of the signal values in the frequency domain and/or within the frequency domain matrix 610.
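- the windowing, transformation, and power-spectral-density steps can be sketched as follows. Subtracting the image mean as the DC-offset removal and dividing by the pixel count as the normalization are assumptions of this sketch; the function name is hypothetical.

```python
import numpy as np

def power_spectral_density(image):
    """Return (windowed image, power spectral density in corner coordinates)."""
    n_rows, n_cols = image.shape
    centered = image - image.mean()                    # remove the DC offset
    window = np.outer(np.hanning(n_rows), np.hanning(n_cols))
    windowed = centered * window                       # element-wise multiplication
    spectrum = np.fft.fft2(windowed)                   # spatial -> frequency domain
    psd = (spectrum * np.conj(spectrum)).real / image.size
    return windowed, psd
```

- with this normalization, the sum of the power spectral density equals the energy of the windowed image (Parseval's relation), which offers a quick sanity check.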
- the term “power spectral density” refers to a distribution of power of signal values over frequency.
- power spectral density can comprise or constitute an average measurement of energy (e.g., response) within a range of spectral bands (wavelengths or frequencies).
- the power spectral density can be represented as a measure of the power spectral density or an average energy over a region of the nucleotide-sample slide.
- the blind-equalizer sequencing system 106 can measure the power spectral density over a central tile or other tile of a nucleotide-sample slide.
- values for the power spectral density can indicate an accumulation and/or average of the energy from at least one cluster of oligonucleotides within a region of the nucleotide-sample slide. For example, in some embodiments, thousands to millions of clusters of oligonucleotides will be “on” or “off” (e.g., have on or off expected response signals).
- the blind-equalizer sequencing system 106 can convert a power spectral density from a frequency domain to a spatial domain to generate estimated point-spread-function values from a point-spread function based on the on/off status of the cluster of oligonucleotides within the given region of the nucleotide-sample slide.
- the power spectral density 612 can collect at the corners of the frequency domain matrix 610. For example, in a corner coordinate system, the power spectral density 612 gathers at the corners of the frequency domain matrix 610.
- the blind-equalizer sequencing system 106 can generate an up-sampled power spectral density 614 by up-sampling the frequency domain matrix 610. In some cases, the blind-equalizer sequencing system 106 up-samples the frequency domain matrix 610 according to an up-sampling factor.
- an up-sampling factor includes a value that scales or represents a degree to which a matrix and/or image is expanded.
- the blind-equalizer sequencing system 106 scales the frequency domain matrix 610 to the dimensions of the up-sampled power spectral density 614 of 1024 x 1024 pixels.
- the up-sampling factor reduces the interpolation between different frequencies and/or signal values in the frequency domain matrix 610.
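- the disclosure does not state how the frequency domain matrix is up-sampled. One possible approach, shown here purely as an assumption, interpolates the spectrum by zero-padding its inverse transform; scaling 256 x 256 to 1024 x 1024 corresponds to a factor of 4. The sketch assumes even dimensions and a corner-origin spectrum.

```python
import numpy as np

def upsample_spectrum(psd, factor=4):
    """Interpolate a corner-origin spectrum by zero-padding its inverse transform."""
    n_rows, n_cols = psd.shape
    # Inverse transform, then move the zero-lag term to the matrix center.
    autocorr = np.fft.fftshift(np.fft.ifft2(psd))
    pad_r = (n_rows * factor - n_rows) // 2
    pad_c = (n_cols * factor - n_cols) // 2
    padded = np.pad(autocorr, ((pad_r, pad_r), (pad_c, pad_c)), mode="constant")
    # Move the zero-lag term back to the corner and transform again.
    return np.fft.fft2(np.fft.ifftshift(padded)).real
```

- every `factor`-th sample of the result reproduces the original spectrum, and the samples in between are interpolated values.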
- signals can be real or complex based on the components of the signal.
- a signal is complex when it includes two different signals, such as where a first signal comprises real components and a second signal comprises imaginary components.
- a signal is real when it only contains real numbers without any complex or imaginary components.
- signals can include an amplitude and phase.
- the amplitude indicates the height or magnitude of the light emitted by a signal and the phase indicates the position or timing of the signal relative to a reference point.
- the transmission medium can determine if a signal is real or complex.
- the transmission medium can be real or complex.
- the transmission medium is complex if it distorts the amplitude and phase while transmitting a signal.
- the blind-equalizer sequencing system 106 uses an imaging device as the transmission medium for transmitting the expected response signal.
- based on the blind-equalizer sequencing system 106 using an imaging device as the transmission medium for the expected response signal, the blind-equalizer sequencing system 106 only measures the magnitude or amplitude of the expected response signal without considering the phase of the expected response signal in the frequency domain.
- the blind-equalizer sequencing system 106 can measure one or more signal values (e.g., light intensity) of the expected response signal in a captured image.
- the blind-equalizer sequencing system 106 can determine that the transmission medium (i.e., the imaging device) is real. By determining that the imaging device is real, the blind-equalizer sequencing system 106 can determine the estimated point-spread-function values by forcing them to be real.
- the blind-equalizer sequencing system 106 can determine estimated point-spread-function values 624 by converting the up-sampled power spectral density 614 of the expected response signal from the frequency domain to the spatial domain.
- the blind-equalizer sequencing system 106 can enforce certain constraints while converting the power spectral density 612 and/or up-sampled power spectral density 614 from the frequency domain to the spatial domain. For example, as discussed above, the blind-equalizer sequencing system 106 can determine that the transmission medium (i.e., the imaging device) is real and finite.
- the blind-equalizer sequencing system 106 can ensure that the estimated point-spread-function values are real by enforcing Hermitian symmetry on the power spectral density 612 and/or up-sampled power spectral density 614 in the frequency domain.
- the blind-equalizer sequencing system 106 can determine the power spectral density 612 and/or the up-sampled power spectral density 614 in the frequency domain. Because signals can be complex or include different components, in one or more embodiments, the power spectral density 612 and/or up-sampled power spectral density 614 has an amplitude component that represents the energy of the signal values across different spectral bands. Accordingly, in one or more implementations, the blind-equalizer sequencing system 106 can determine the estimated point-spread-function values 624 in part by determining the amplitude of the estimated point-spread-function values 624.
- the blind-equalizer sequencing system 106 can determine the amplitude of the estimated point-spread-function values 624 by taking the square root of the amplitude components of the power spectral density 612 and/or the up-sampled power spectral density 614 and converting the amplitude of the estimated point-spread-function values 624 from the frequency domain to the spatial domain.
- the blind-equalizer sequencing system 106 can convert the up-sampled power spectral density 614 from the frequency domain to the spatial domain by taking the square root of the up-sampled power spectral density 614 and applying an inverse Fast Fourier transform (IFFT) to the up-sampled power spectral density 614 of the estimated response signal.
- the term “inverse Fast Fourier transform” refers to a mathematical operation that reverses the transformation performed by a Fast Fourier transformation.
- the IFFT can take the frequency and/or amplitude components in the frequency domain and reconstruct the image in the spatial domain.
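- a minimal sketch of the square-root and inverse-transform step follows. Normalizing by the peak magnitude is an assumption of this sketch, and the function name `psf_from_psd` is hypothetical.

```python
import numpy as np

def psf_from_psd(psd):
    """Estimate a real, corner-origin PSF from a power spectral density."""
    # Square root of the power gives the magnitude; the phase is discarded.
    amplitude = np.sqrt(np.clip(psd, 0.0, None))
    # Keeping only the real part of the inverse FFT is equivalent to
    # enforcing Hermitian symmetry on the spectrum, so the result is real.
    psf = np.fft.ifft2(amplitude).real
    return psf / np.abs(psf).max()  # peak-normalize (sketch assumption)
```

- because the phase is discarded, this recovers a point-spread function exactly only when its true spectrum is real and non-negative, as is the case for a symmetric Gaussian centered at the origin.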
- the blind-equalizer sequencing system 106 can generate a spatial domain matrix 616 depicting an up-sampled PSF 618.
- the term “spatial domain matrix” refers to a matrix that includes one or more values representing a point spread function for a signal from a target cluster of oligonucleotides in a spatial domain.
- the spatial domain matrix 616 can include values representing changes to one or more signal values of the target cluster of oligonucleotides occurring from a change between the frequency domain to the spatial domain.
- the spatial domain matrix 616 can be a spatial domain image depicting the PSF and/or up-sampled PSF 618 of the expected response signal.
- an up-sampled PSF comprises a function that describes an up-sampled response of an imaging device or other optical system to a point source.
- the up-sampled PSF 618 can comprise a function that determines values for upsampling the power spectral density 612 in the spatial domain. As shown in FIG. 6, in some embodiments, based on a corner-coordinate system, the up-sampled PSF 618 can be captured in corners of the spatial domain matrix 616.
- the blind-equalizer sequencing system 106 can further enforce symmetry on estimated point-spread-function values 624 in the spatial domain and normalize the amplitude of the estimated point-spread-function values. For instance, in one or more implementations, the blind-equalizer sequencing system 106 can generate an intermediate PSF 620. As depicted in FIG. 6, an intermediate PSF includes a transformed matrix or image of the up-sampled PSF 618 at a center coordinate of the matrix and/or image depicting values from the intermediate PSF 620. As indicated by FIG.
- the blind-equalizer sequencing system 106 can generate the intermediate PSF by selecting (e.g., cropping) and combining data of the up-sampled PSF 618 in the corner regions of the spatial domain matrix 616. In one or more embodiments, the blind-equalizer sequencing system 106 can further normalize the cropped and combined data from the up-sampled PSF 618 and set the intermediate PSF 620 as a center coordinate within the matrix and/or image depicting the intermediate PSF 620.
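- combining the corner regions into a centered, normalized PSF can be sketched with a quadrant swap followed by a central crop. The crop half-width and the peak normalization are hypothetical choices for this sketch.

```python
import numpy as np

def center_and_crop(spatial_matrix, half_width):
    """Move corner-origin PSF data to the center, crop, and peak-normalize."""
    centered = np.fft.fftshift(spatial_matrix)  # swaps quadrants: corners -> center
    r0, c0 = centered.shape[0] // 2, centered.shape[1] // 2
    crop = centered[r0 - half_width:r0 + half_width + 1,
                    c0 - half_width:c0 + half_width + 1]
    return crop / np.abs(crop).max()
```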
- the blind-equalizer sequencing system 106 can generate the estimated point-spread-function values 624 by applying a Hamming window 622 to the intermediate PSF 620.
- applying the Hamming window 622 involves a two-dimensional convolution with a padding Hamming interpolator.
- the blind-equalizer sequencing system 106 can up-sample the intermediate PSF 620 by the up-sampling factor. As shown in FIG.
- by convolving the Hamming window 622 with the intermediate PSF 620 and up-scaling the intermediate PSF 620, the blind-equalizer sequencing system 106 generates the estimated point-spread-function values 624.
- the blind-equalizer sequencing system 106 can determine estimated point-spread-function values that are real, two-dimensional, finite, padded, and/or symmetric.
- the blind-equalizer sequencing system 106 can determine such estimated PSF values relevant to a given channel for a camera or other imaging device of a sequencing device. In addition or in the alternative to determining estimated PSF values and equalizer coefficients for an image in an initial or a single channel, the blind-equalizer sequencing system 106 can determine such estimated PSF values — and determine equalizer coefficients based on such estimated PSF values — for images of a given region of a nucleotide-sample slide in different channels (e.g., a first channel and a second channel). As shown in FIG.
- the blind-equalizer sequencing system 106 can utilize such estimated point-spread-function values to improve a signal-to-noise ratio of a target cluster of oligonucleotides.
- FIG. 7 illustrates the blind-equalizer sequencing system 106 generating an image matrix comprising equalizer coefficients for one or more channels and applying the image matrix to subregions of a flow cell or other nucleotide-sample slide within an image.
- the blind-equalizer sequencing system 106 can receive an image 702 from a first channel. As discussed above, the blind-equalizer sequencing system 106 can access or select the image 702 depicting a region 703 of the nucleotide-sample slide. In some cases, the image 702 from the first channel is selected and cropped from an initial image. Moreover, as shown in FIG. 7, the blind-equalizer sequencing system 106 can determine estimated point-spread-function values 708 for the first channel, consistent with the process depicted in FIG. 6 and described above.
- the blind-equalizer sequencing system 106 can utilize the physical characteristics of a sequencing device to determine equalizer coefficients.
- FIG. 7 depicts and the following paragraphs describe how the blind-equalizer sequencing system 106 utilizes the nucleotide-sample slide and aspects of the sequencing device to determine equalizer coefficients.
- FIG. 7 further illustrates that the blind-equalizer sequencing system 106 can receive estimated cluster locations for a target cluster and neighboring clusters of oligonucleotides.
- the blind-equalizer sequencing system 106 can receive the estimated cluster locations by receiving a patterned arrangement 706 of the estimated cluster locations for the target cluster and the neighboring clusters of oligonucleotides arranged according to a pattern within the region 703 of the nucleotide-sample slide.
- the term “patterned arrangement” refers to a patterned configuration of estimated cluster locations or estimated well locations (e.g., comprising clusters of oligonucleotides or unseeded lawn).
- the patterned arrangement can include a patterned distribution of the estimated cluster locations within a region of the nucleotide-sample slide.
- the patterned arrangement can take a form of, but is not limited to, a grid taking a shape of a square, rectangle, triangle, rhombus, hexagon, or diamond.
- the patterned arrangement can include a pitch between estimated cluster locations.
- the pitch can indicate an estimated distance between pixels depicting clusters of oligonucleotides.
- the blind-equalizer sequencing system 106 can receive the estimated cluster locations by receiving an unpatterned arrangement of estimated cluster locations for a target cluster and neighboring clusters of oligonucleotides.
- unpatterned arrangement refers to a randomly or unevenly distributed configuration of estimated cluster locations or estimated well locations (e.g., comprising clusters of oligonucleotides or unseeded lawn).
- the blind-equalizer sequencing system 106 can determine a grid of estimated nanowell locations for nanowells comprising a target cluster of oligonucleotides and neighboring clusters of oligonucleotides. To illustrate, based on a square patterned arrangement of estimated cluster locations, the blind-equalizer sequencing system 106 can generate a square grid of the estimated nanowell locations for nanowells including the target cluster of oligonucleotides and the neighboring clusters of oligonucleotides.
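As an illustration of a square patterned arrangement, the sketch below generates a grid of estimated nanowell locations from a pitch. The function name `square_grid`, its parameters, and the example pitch value are hypothetical; the disclosure does not specify how the grid is stored.

```python
import numpy as np


def square_grid(n_rows, n_cols, pitch_px, origin=(0.0, 0.0)):
    """Return an (n_rows * n_cols, 2) array of (y, x) well centers in pixels.

    `pitch_px` is the assumed center-to-center distance between wells.
    """
    ys = origin[0] + pitch_px * np.arange(n_rows)
    xs = origin[1] + pitch_px * np.arange(n_cols)
    yy, xx = np.meshgrid(ys, xs, indexing="ij")
    return np.stack([yy.ravel(), xx.ravel()], axis=1)


# Usage: a 3 x 3 arrangement with a hypothetical pitch of 2.5 pixels.
# The middle row of the result is the target cluster; the rest are neighbors.
locations = square_grid(3, 3, pitch_px=2.5)
```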
- the blind-equalizer sequencing system 106 can utilize the estimated cluster locations for the target cluster of oligonucleotides and the neighboring clusters of oligonucleotides and the estimated point-spread-function values 708 to generate an image matrix 712 with equalizer coefficients. As suggested by FIG. 7, for instance, the blind-equalizer sequencing system 106 can combine the estimated point-spread-function values 708 and the estimated cluster locations to determine the image matrix 712 comprising the equalizer coefficients. As discussed below in FIG.
- the blind-equalizer sequencing system 106 can generate the image matrix 712 comprising equalizer coefficients by identifying the pitch of the patterned arrangement 706 for the region 703 of the nucleotide-sample slide and utilizing the pitch of the patterned arrangement 706 to generate a convolution matrix for a given channel. In one or more embodiments, the blind-equalizer sequencing system 106 utilizes the convolution matrix for the channel to determine the equalizer coefficients.
- the blind-equalizer sequencing system 106 can generate, for the first channel, a set of subregion image matrices 716 to subsequently apply to a set of subregions 720 of the image 702 of the region 703 of the nucleotide-sample slide.
- a subregion image matrix can constitute a subregion of a larger image matrix (e.g., the image matrix 712) and include equalizer coefficients for editing and/or processing a subregion of an image.
- a subregion image matrix applies such coefficients and increases or maximizes signal-to-noise ratio of an expected response signal affected by noise and/or crosstalk.
- a subregion image matrix includes equalizer coefficients
- the subregion image matrix can include subregion equalizer coefficients.
- Such subregion equalizer coefficients can accordingly include weighted values that can be applied to a subregion of an image of clusters of oligonucleotides that adjust for (or reduce) inter-symbol interference (e.g., crosstalk) between clusters of oligonucleotides.
- the blind-equalizer sequencing system 106 identifies corresponding subregions of an image.
- the blind-equalizer sequencing system 106 identifies a set of subregions 720 from the image 702 of the region 703 of the nucleotide-sample slide.
- a subregion of an image that is part of a set of subregions includes or corresponds to a subregion of a nucleotide-sample slide from a larger region. Accordingly, in some cases, the subregion differs in size, dimension, and location from the region 703 of the nucleotide-sample slide but can nevertheless be within the region 703.
- the blind-equalizer sequencing system 106 can identify, for a channel, the set of subregions 720 within the image 702 of the region 703 of the nucleotide-sample slide. In some embodiments, the blind-equalizer sequencing system 106 can select the number, size, and/or dimensions of the subregions in the set of subregions 720. For example, as shown in FIG. 7, the blind-equalizer sequencing system 106 identifies nine such subregions (e.g., 3 x 3) for the set of subregions 720.
- the blind-equalizer sequencing system 106 can identify, from an image of a nucleotide-sample-slide region, nine subregions of different layouts (e.g., 1 x 9), five subregions of a different layout (e.g., 1 x 5), seven subregions (e.g., 1 x 7), or fifteen subregions (e.g., 3 x 5) for the set of subregions 720.
- having generated the image matrix 712 and/or identified the set of subregions 720, in some cases, the blind-equalizer sequencing system 106 generates or initializes the set of subregion image matrices 716 with subregion equalizer coefficients. For example, the blind-equalizer sequencing system 106 can generate the set of subregion image matrices 716 comprising equalizer coefficients that initially match the equalizer coefficients of the image matrix 712. Further, in some cases, the blind-equalizer sequencing system 106 can improve the accuracy of such subregion equalizer coefficients of a set of subregion image matrices over sequencing cycles of a sequencing run (e.g., by utilizing a decision-directed approach to adjust equalizer coefficients).
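A minimal sketch of the subregion bookkeeping described above: the image is divided into a 3 x 3 set of subregions, and each subregion receives its own copy of the image matrix as a starting point. The helper names `split_subregions` and `init_subregion_matrices`, the dictionary layout, and the example image size are assumptions made for illustration.

```python
import numpy as np


def split_subregions(image, n_rows=3, n_cols=3):
    """Map (row, col) subregion indices to slice pairs covering the image."""
    row_blocks = np.array_split(np.arange(image.shape[0]), n_rows)
    col_blocks = np.array_split(np.arange(image.shape[1]), n_cols)
    return {
        (i, j): (slice(int(r[0]), int(r[-1]) + 1), slice(int(c[0]), int(c[-1]) + 1))
        for i, r in enumerate(row_blocks)
        for j, c in enumerate(col_blocks)
    }


def init_subregion_matrices(image_matrix, subregions):
    """Give every subregion an independent copy of the initial coefficients."""
    return {key: image_matrix.copy() for key in subregions}


# Usage with a hypothetical 90 x 120 image and a 7 x 7 image matrix.
image = np.zeros((90, 120))
image_matrix = np.full((7, 7), 1.0 / 49.0)
subregions = split_subregions(image)
subregion_matrices = init_subregion_matrices(image_matrix, subregions)
```

Independent copies let each subregion's coefficients drift separately when they are later adjusted over sequencing cycles.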
- the blind-equalizer sequencing system 106 can improve the accuracy of the equalizer coefficients of an image matrix over the course of a sequencing run. For example, in some cases, the blind-equalizer sequencing system 106 can correct for systematic differences in a current sequencing run relative to offline training of the equalizer.
- the blind-equalizer sequencing system 106 can generate a base call for a target cluster of oligonucleotides.
- the blind-equalizer sequencing system 106 can apply the image matrix 712 comprising the equalizer coefficients to the image 702 from a first channel.
- the blind-equalizer sequencing system 106 can apply non-linear distortion to the equalized image and extract signal values from the equalized image.
- the blind-equalizer sequencing system 106 can spatially normalize and compress the signal values of the modified image.
- the blind-equalizer sequencing system 106 can further correct for phasing and pre-phasing of the signal values of the modified image and normalize the signal values.
- the blind-equalizer sequencing system 106 can use the corrected signal values to make a base call and generate a quality score for the target cluster of oligonucleotides.
- the blind-equalizer sequencing system 106 can determine channel-specific image matrices comprising channel-specific equalizer coefficients. For instance, as shown in FIG. 7, the blind-equalizer sequencing system 106 can determine an image matrix 714 comprising additional equalizer coefficients for an image 704 from a second channel. As FIG. 7 illustrates, the blind-equalizer sequencing system 106 can access or otherwise receive the image 704 from the second channel consistent with the description above. In some embodiments, the image 704 depicts one or more additional signal values in the second channel for an additional expected response signal from the target cluster of oligonucleotides within an additional region 705 of the nucleotide-sample slide.
- the additional region 705 of the nucleotide-sample slide depicted by the image 704 from the second channel can be the same region as the region 703 of the nucleotide-sample slide depicted by the image 702 from the first channel.
- the blind-equalizer sequencing system 106 can determine, for the second channel, estimated point-spread-function values 710 based on the image 704 depicting one or more additional signal values corresponding to the target cluster of oligonucleotides.
- the blind-equalizer sequencing system 106 can determine additional estimated noise values in the second channel consistent with the description above.
- the blind-equalizer sequencing system 106 determines, for the second channel, an image matrix 714 using a same or similar process as performed for the image matrix 712 for the first channel. Accordingly, as indicated above, the blind-equalizer sequencing system 106 can determine the image matrix 714 by combining the estimated point-spread-function values 710 with the estimated cluster locations from the patterned arrangement 706 or an unpatterned arrangement (not shown). As shown in FIG. 7, in some embodiments, the blind-equalizer sequencing system 106 can use the same estimated cluster locations (e.g., the patterned arrangement 706) for the image 704 from the second channel. As further shown in FIG. 7, in some embodiments, the blind-equalizer sequencing system 106 can likewise generate, for the second channel and the image 704, a set of subregion image matrices 718 comprising additional subregion equalizer coefficients, as described above.
- the blind-equalizer sequencing system 106 can determine a base call for the target cluster of oligonucleotides based on signal values that have been determined from multiple channels and from equalizer coefficients corresponding to the multiple channels. In particular, the blind-equalizer sequencing system 106 can determine a base call for a target cluster of oligonucleotides based on a single intensity value of the emitted signal in each channel for a given sequencing cycle.
- the blind-equalizer sequencing system 106 can use a first intensity value (X) for a target cluster of oligonucleotides in a first channel and a second intensity value (Y) for the target cluster of oligonucleotides in a second channel to determine a probability that the signal values are located within the intensity-value boundaries of a certain nucleobase (e.g., A, C, G, or T).
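To make the two-channel decision concrete, the sketch below assigns a probability to each nucleobase from a pair of intensity values (X, Y). It assumes, purely for illustration, that each nucleobase has a Gaussian intensity cloud around a fixed centroid; the centroid positions in `CENTROIDS`, the spread `sigma`, and the function name are hypothetical and are not taken from the disclosure.

```python
import numpy as np

# Hypothetical two-channel centroids (first channel, second channel).
CENTROIDS = {"A": (1.0, 1.0), "C": (1.0, 0.0), "T": (0.0, 1.0), "G": (0.0, 0.0)}


def base_probabilities(x, y, sigma=0.2):
    """Posterior probability of each base under equal priors (assumed model)."""
    bases = list(CENTROIDS)
    centers = np.array([CENTROIDS[b] for b in bases])
    sq_dist = np.sum((centers - np.array([x, y])) ** 2, axis=1)
    log_like = -sq_dist / (2.0 * sigma ** 2)
    # Subtract the maximum before exponentiating for numerical stability.
    weights = np.exp(log_like - log_like.max())
    probs = weights / weights.sum()
    return dict(zip(bases, probs))


# Usage: intensities close to the hypothetical "C" centroid.
probs = base_probabilities(0.95, 0.10)
call = max(probs, key=probs.get)
```

Equalization matters here because crosstalk shifts (X, Y) away from the true centroid, which lowers the winning probability and, in turn, the quality score.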
- FIG. 7 illustrates an example of equalizer coefficients that can be applied to modify such first and second intensity values for a target cluster of oligonucleotides in first and second channels.
- by applying the image matrix 712 to the image 702 depicting one or more signal values (e.g., a single signal value) for an expected response signal in a first channel and from the target cluster of oligonucleotides, and by applying the image matrix 714 to the image 704 depicting one or more additional signal values for an additional expected response signal in a second channel and from the target cluster of oligonucleotides, the blind-equalizer sequencing system 106 generates an estimated response signal that conveys more accurate intensity values for the target cluster.
- by applying a subregion image matrix of the set of subregion image matrices 716 to a subregion, from the set of subregions 720 of the image 702, depicting one or more signal values for an expected response signal in a first channel and from the target cluster of oligonucleotides, and by applying a subregion image matrix of the set of subregion image matrices 718 to a subregion, from a set of subregions 722 of the image 704, depicting one or more signal values for an expected response signal in a second channel and from the target cluster of oligonucleotides, the blind-equalizer sequencing system 106 generates an estimated response signal that conveys more accurate intensity values for the target cluster.
- the set of subregions 722 of the image 704 in the second channel depict the same subregions of the nucleotide-sample slide as the set of subregions 720 of the image 702 in the first channel.
- based on the estimated response signal that accounts for the equalizer coefficients or subregion equalizer coefficients in both the first and second channel, the blind-equalizer sequencing system 106 generates a base call for the target cluster of oligonucleotides depicted by both the image 702 and the image 704 for a given sequencing cycle.
- the blind-equalizer sequencing system 106 determines one or more additional base calls for additional target clusters of oligonucleotides depicted by the set of subregions 720 of the image 702.
- by applying the image matrix 712 to the image 702 depicting additional signal values for additional expected response signals in a first channel and from additional target clusters of oligonucleotides depicted by the set of subregions 720 of the image 702, and by applying the image matrix 714 to the image 704 depicting additional signal values for additional expected response signals in a second channel and from the additional target clusters of oligonucleotides, the blind-equalizer sequencing system 106 generates additional estimated response signals that convey more accurate intensity values for such additional target clusters.
- based on the additional estimated response signals that account for the equalizer coefficients or subregion equalizer coefficients in both the first and second channel, the blind-equalizer sequencing system 106 generates additional base calls for the additional target clusters of oligonucleotides depicted by both the image 702 and the image 704 for a given sequencing cycle.
- as just described, the blind-equalizer sequencing system 106 can utilize an image matrix comprising equalizer coefficients to determine a base call for a target cluster.
- the blind-equalizer sequencing system 106 can generate the image matrix comprising equalizer coefficients by utilizing the estimated point-spread-function values and estimated cluster locations of the target cluster of oligonucleotides and neighboring clusters of oligonucleotides. In some cases, the blind-equalizer sequencing system 106 utilizes the estimated cluster locations of the target cluster of oligonucleotides and neighboring clusters of oligonucleotides to apply a distribution function to signal values of the target cluster of oligonucleotides and neighboring clusters of oligonucleotides. In accordance with one or more embodiments of the present disclosure, FIG. 8 illustrates the blind-equalizer sequencing system 106 applying such a distribution function.
- the blind-equalizer sequencing system 106 can receive an arrangement (e.g., patterned or unpatterned) of estimated cluster locations of a target cluster of oligonucleotides and neighboring clusters of oligonucleotides.
- the blind-equalizer sequencing system 106 can set the estimated cluster location of the target cluster of oligonucleotides as a center coordinate 810 of an up-sampled arrangement 802 of estimated cluster locations. As shown in FIG.
- the blind-equalizer sequencing system 106 generates the up-sampled arrangement 802 of estimated cluster locations by applying an up-sampling factor to an initial arrangement (e.g., patterned arrangement) of estimated cluster locations within a region of a nucleotide-sample slide. For instance, as shown in FIG. 8, the blind-equalizer sequencing system 106 can multiply a pitch (e.g., ΔX, ΔY) between pixels representing estimated cluster locations by the up-sampling factor. As further shown in FIG. 8, the estimated cluster location of the target cluster of oligonucleotides can stay at the center coordinate 810 of the up-sampled arrangement 802 after application of the up-sampling factor.
- the blind-equalizer sequencing system 106 can determine equalizer coefficients in part by utilizing an arrangement of estimated cluster locations of a target cluster of oligonucleotides and neighboring clusters of oligonucleotides to modify signal values of the target cluster of oligonucleotides and the neighboring clusters of oligonucleotides. As shown in FIG. 8, the blind-equalizer sequencing system 106 can apply a distribution function 804 to modify one or more signal values 808 of the target cluster of oligonucleotides and additional signal values of the neighboring clusters of oligonucleotides as depicted in graph 806. As indicated by a value scale 812 shown in FIG.
- the distribution function 804 can comprise a Dirac delta function that sets (i) the one or more signal values 808 of the target cluster of oligonucleotides at the center coordinate 810 to one and (ii) the additional signal values of the neighboring clusters of oligonucleotides within a region of a nucleotide-sample slide to zero.
- the one or more signal values 808 of the target cluster of oligonucleotides at the center coordinate 810 of the up-sampled arrangement 802 is set to one and the additional signal values of the neighboring clusters of oligonucleotides at non-central coordinates are set to zero.
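The up-sampled arrangement and the Dirac delta assignment can be sketched together. The example below places the target cluster at the center of an odd-sized neighborhood, scales the pitch by the up-sampling factor to get neighbor offsets, and builds the delta target (one at the center, zero elsewhere). The function name `upsampled_delta` and the example pitch and factor are hypothetical.

```python
import numpy as np


def upsampled_delta(n_side, pitch_px, upsample):
    """Return neighbor offsets (in up-sampled pixels) and the delta target.

    `n_side` must be odd so that the target cluster sits exactly at the center.
    """
    if n_side % 2 == 0:
        raise ValueError("n_side must be odd")
    half = n_side // 2
    step = pitch_px * upsample  # pitch multiplied by the up-sampling factor
    idx = np.arange(-half, half + 1)
    dy, dx = np.meshgrid(idx * step, idx * step, indexing="ij")
    delta = np.zeros((n_side, n_side))
    delta[half, half] = 1.0  # target cluster: one; neighbors: zero
    return dy, dx, delta


# Usage: 3 x 3 neighborhood, hypothetical pitch of 2 pixels, factor of 4.
dy, dx, delta = upsampled_delta(3, pitch_px=2, upsample=4)
```

Because the offsets are measured from the target, the target stays at the center coordinate regardless of the up-sampling factor.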
- the blind-equalizer sequencing system 106 can determine an image matrix comprising the equalizer coefficients by modifying and combining the estimated point-spread-function values and estimated noise values.
- a size and/or dimensions of the image matrix can be based on the sequencing device and/or size of a region of a nucleotide-sample slide depicted in the image.
- the dimensions of the image mask can be 7 x 7 pixels.
- the size and/or dimensions of the estimated point-spread-function values (e.g., matrix) can differ from the size and/or dimensions of the image matrix.
- the blind-equalizer sequencing system 106 can compensate for different properties of a sequencing device and/or a nucleotide-sample slide by modifying the estimated point-spread-function values and the estimated noise values of the target cluster of oligonucleotides.
- the blind-equalizer sequencing system 106 can modify the estimated point-spread-function values by (i) sampling one or more values from the estimated point-spread-function values and generating convolved estimated point-spread-function values for a given channel with the sampled values from the estimated point-spread-function values and (ii) combining the estimated point-spread-function values with the distribution function.
- the blind-equalizer sequencing system 106 can determine the image matrix comprising equalizer coefficients by combining the modified estimated point-spread-function values and the modified estimated noise values. In certain embodiments, the blind-equalizer sequencing system 106 can modify the estimated noise values by sampling a random subset of noise values from the estimated noise values. In one or more implementations, the blind-equalizer sequencing system 106 can combine the modified estimated point-spread-function values and the modified estimated noise values by dividing the modified estimated point-spread-function values by the modified estimated noise values.
- the blind-equalizer sequencing system 106 can modify the estimated point-spread-function values and estimated noise values by transposing the estimated point-spread-function values and the estimated noise values.
- the blind-equalizer sequencing system 106 can modify the estimated point-spread-function values by generating transposed delta estimated point-spread-function values $(H_{\delta})^{T}$ and square and symmetric estimated point-spread-function values $(H^{T}H)$. Moreover, the blind-equalizer sequencing system 106 can modify the estimated noise values by generating square and symmetric transposed estimated noise values $(N^{T}N)$.
- inv represents a multiplicative inverse operation.
- the characteristics of a minimum phase channel, such as causality and stability, make the minimum phase channel’s inverse system unique. Consequently, after determining H representing or comprising the estimated point-spread-function values, the blind-equalizer sequencing system 106 can determine unique equalizer coefficients W based on the foregoing equation using a multiplicative inverse operation.
- the blind-equalizer sequencing system 106 can modify the estimated point-spread-function values by generating transposed delta estimated point-spread-function values.
- transposed delta estimated point-spread-function values refers to a transposed matrix of estimated point-spread-function values modified by a distribution function.
- the transposed delta estimated point-spread-function values can accordingly be a transposition of estimated point-spread-function values modified by a Dirac delta function.
- the blind-equalizer sequencing system 106 can further modify the estimated point-spread-function values by generating square and symmetric estimated point-spread-function values.
- the term “square and symmetric estimated point-spread-function values” refers to estimated point-spread-function values combined with transposed estimated point-spread-function values.
- the square and symmetric estimated point-spread-function values can include multiplying the estimated point-spread-function values with a transposed (e.g., flipped) version of the estimated point-spread-function values.
- the estimated point-spread-function values can be generated by up-sampling the intermediate PSF 620 with a Hamming interpolator.
- the blind-equalizer sequencing system 106 can modify estimated noise values by generating square and symmetric estimated noise values.
- square and symmetric transposed estimated noise values refers to estimated noise values combined with transposed estimated noise values.
- the blind-equalizer sequencing system 106 can multiply a matrix of estimated noise values with a transposed matrix of the estimated noise values.
- the estimated noise values are a randomly sampled subset of noise values.
- the estimated noise values can be generated by up-sampling the noise with a Hamming interpolator.
- the blind-equalizer sequencing system 106 can further determine the image matrix comprising the equalizer coefficients W indicated above by combining the transposed delta estimated point-spread-function values, the square and symmetric estimated point-spread-function values, and the square and symmetric transposed estimated noise values. For example, in one or more embodiments, the blind-equalizer sequencing system 106 can divide the transposed delta estimated point-spread-function values by the square and symmetric estimated point-spread-function values and the square and symmetric transposed estimated noise values.
- the blind-equalizer sequencing system 106 can generate the image matrix by reshaping the normalized equalizer coefficients in column major order.
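One way to read the combination described above is as a regularized least-squares (MMSE-style) solve of the form inv(HᵀH + NᵀN) applied to Hᵀ times a delta vector. The sketch below follows that reading; it is an interpretation under stated assumptions, not the disclosure's exact equation. Assumed here: each row of `H` holds the PSF of one cluster sampled over a k x k pixel window (flattened column-major), `noise` holds rows of sampled noise values (written N in this sketch), NᵀN is scaled by the number of noise samples, the PSF is Gaussian, and coefficients are normalized to unit response on the target. All names and numeric values are hypothetical.

```python
import numpy as np


def gaussian_psf_row(center, k, sigma):
    """Column-major flattened k x k samples of a Gaussian spot (assumed PSF)."""
    yy, xx = np.mgrid[0:k, 0:k]
    spot = np.exp(-((yy - center[0]) ** 2 + (xx - center[1]) ** 2) / (2 * sigma ** 2))
    return spot.ravel(order="F")


def equalizer_coefficients(H, noise, target):
    """Solve (H^T H + N^T N / n) w = H^T delta and reshape w column-major."""
    delta = np.zeros(H.shape[0])
    delta[target] = 1.0  # Dirac delta: keep the target, reject neighbors
    gram = H.T @ H + noise.T @ noise / noise.shape[0]
    w = np.linalg.solve(gram, H.T @ delta)
    w = w / (w @ H[target])  # assumed normalization: unit gain on the target
    k = int(round(np.sqrt(w.size)))
    return w.reshape((k, k), order="F")


# Usage: target at the window center plus four neighbors at a 2.5-pixel pitch.
k, sigma = 7, 1.0
centers = [(3.0, 3.0), (3.0, 0.5), (3.0, 5.5), (0.5, 3.0), (5.5, 3.0)]
H = np.stack([gaussian_psf_row(c, k, sigma) for c in centers])
rng = np.random.default_rng(0)
noise = 0.05 * rng.standard_normal((200, k * k))
W = equalizer_coefficients(H, noise, target=0)
crosstalk = H[1:] @ W.ravel(order="F")  # residual response to each neighbor
```

Using `np.linalg.solve` rather than forming an explicit inverse is a numerical-stability choice; the result is the same linear system the inverse would describe.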
- the blind-equalizer sequencing system 106 can update the equalizer coefficients during sequencing cycles.
- the blind-equalizer sequencing system 106 can utilize a decision-directed approach and/or feedback loop to fine-tune the equalizer coefficients that have been blindly initialized.
- the blind-equalizer sequencing system 106 can initialize an equalizer with the equalizer coefficients.
- an equalizer utilizing the equalizer coefficients can generate a more accurate estimated response signal.
- the blind-equalizer sequencing system 106 can determine an additional estimated response signal during a subsequent sequencing cycle.
- the blind-equalizer sequencing system 106 can utilize the additional estimated response signal to update the equalizer coefficients.
- the blind-equalizer sequencing system 106 can process one or more additional signal values and approximate an additional estimated response signal for at least one cluster of oligonucleotides. Based on determining the minimum mean squared error between the additional estimated response signal and the additional expected response signal, the blind-equalizer sequencing system 106 can update the equalizer coefficients.
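A decision-directed update of this kind can be sketched with a least-mean-squares (LMS) step, which is one standard stochastic-gradient way of reducing mean squared error. The disclosure does not state that LMS is the update rule used; the function name, the step size, and the set of allowed intensity levels below are assumptions made for illustration.

```python
import numpy as np


def decision_directed_update(w, patch, levels, step=0.1):
    """One LMS step: decide the nearest allowed level, then reduce the error.

    w      : current equalizer coefficients (same shape as `patch`)
    patch  : pixel intensities around the cluster for this cycle
    levels : allowed intensity levels used as the decision (assumed known)
    """
    estimate = float(np.sum(w * patch))  # estimated response signal
    decided = float(levels[np.argmin(np.abs(levels - estimate))])
    error = decided - estimate  # decision stands in for the expected signal
    return w + step * error * patch, error


# Usage: coefficients that under-estimate an "on" cluster are pulled toward it.
w = np.zeros((3, 3))
w[1, 1] = 0.8
patch = np.zeros((3, 3))
patch[1, 1] = 1.0
levels = np.array([0.0, 1.0])
for _ in range(200):
    w, err = decision_directed_update(w, patch, levels)
```

The update only helps when most decisions are already correct, which is why a reasonable blind initialization of the coefficients matters before this loop starts.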
- FIG. 9 illustrates improved performance of the blind-equalizer sequencing system 106 in terms of base-call-quality scores for nucleobase calls of a sequencing device.
- FIG. 9 shows a graph 902 depicting the percentage of nucleobase calls across a sequencing run that equal or exceed a base-call-quality score (Q score) of 30 across cycle 0 through approximately cycle 325.
- the blind-equalizer sequencing system 106 generates a higher percentage of base calls meeting or exceeding a base-call-quality score (Q score) of 30 across multiple cycles in relation to a baseline or existing sequencing system that initializes or adjusts equalizer coefficients primarily or exclusively through a training approach of determining differences or losses as a basis for updating equalizer coefficients based on a comparison of predicted base calls and assumed base calls.
- FIG. 9 depicts the improved performance in terms of base-call-quality scores through (i) plot lines 904a and 904b representing a percentage of nucleobase calls determined by the blind-equalizer sequencing system 106 that satisfy or exceed Q30 and (ii) plot lines 906a and 906b representing a percentage of nucleobase calls determined by a baseline sequencing system that satisfy or exceed Q30.
- plot lines 904a and 906a across cycle 0 through approximately cycle 150 for a first nucleotide read mate (R1)
- the blind-equalizer sequencing system 106 generates a higher percentage of nucleobase calls that satisfy Q30 for R1 relative to the baseline sequencing system.
- the blind-equalizer sequencing system 106 likewise generates a higher percentage of nucleobase calls that satisfy Q30 for R2 relative to the baseline sequencing system.
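For reference, the Q30 metric plotted in such a graph can be computed as below. The Phred relation Q = -10 log10(p), where p is the estimated error probability, is the standard definition of a base-call-quality score; the function names and the sample scores are hypothetical and do not reproduce any data from FIG. 9.

```python
import numpy as np


def phred(p_error):
    """Convert an error probability to a Phred-scaled quality score."""
    return -10.0 * np.log10(np.asarray(p_error, dtype=float))


def percent_q30(q_scores, threshold=30.0):
    """Percentage of base calls whose Q score meets or exceeds the threshold."""
    q = np.asarray(q_scores, dtype=float)
    return 100.0 * np.mean(q >= threshold)


# Usage with made-up quality scores for four base calls in one cycle.
cycle_scores = [35.0, 30.0, 20.0, 10.0]
pct = percent_q30(cycle_scores)
```

A Q score of 30 corresponds to an estimated error probability of one in one thousand, so the metric reports the share of calls at least that reliable.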
- FIGS. 1-9, the corresponding text, and the examples provide a number of different methods, systems, devices, and non-transitory computer-readable media of the blind-equalizer sequencing system 106.
- one or more embodiments can also be described in terms of flowcharts comprising acts for accomplishing particular results, as shown in FIGS. 10A-10B.
- the series of acts may be performed with more or fewer acts.
- the acts may be performed in different orders.
- the acts described herein may be repeated or performed in parallel with one another or in parallel with different instances of the same or similar acts.
- FIG. 10A illustrates a flowchart of a series of acts 1000 for determining a base call for a cluster of oligonucleotides utilizing the equalizer coefficients in accordance with one or more embodiments. While FIG. 10A illustrates acts according to one embodiment, alternative embodiments may omit, add to, reorder, and/or modify any of the acts shown in FIG. 10A. In some implementations, the acts of FIG. 10A are performed as part of a method. In some instances, a non-transitory computer-readable medium stores instructions thereon that, when executed by at least one processor, cause a computing device to perform the acts of FIG. 10A. In some implementations, a system performs the acts of FIG. 10A. For example, in one or more cases, a system includes at least one processor and a non-transitory computer readable medium comprising instructions that, when executed by the at least one processor, cause the system to perform the acts of FIG. 10A.
- the series of acts 1000 includes an act 1002 for receiving signal values for an estimated response signal from a cluster of oligonucleotides. Additionally, the series of acts 1000 includes an act 1004 of determining estimated point-spread-function values based on the signal values for the cluster of oligonucleotides. Further, the series of acts 1000 includes an act 1006 of determining estimated noise values. The series of acts 1000 further includes an act 1008 of determining equalizer coefficients based on the estimated point-spread-function values and the estimated noise values. In some cases, the series of acts includes an act 1010 of determining a base call for the cluster of oligonucleotides utilizing the equalizer coefficients.
- the series of acts 1000 depicted in FIG. 10A can include acts to perform any of the operations described in the following clauses:
- CLAUSE 1 A computer-implemented method comprising: receiving, for a sequencing cycle, signal values for an expected response signal from at least a cluster of oligonucleotides within a region of a nucleotide-sample slide; determining, for a channel, estimated point-spread-function values based on the signal values corresponding to at least the cluster of oligonucleotides; determining estimated noise values within the channel; determining, based on combining the estimated point-spread-function values and the estimated noise values within the channel, equalizer coefficients that compensate for the estimated point-spread-function values and the estimated noise values with respect to the expected response signal from at least the cluster of oligonucleotides; and determining a base call for at least the cluster of oligonucleotides utilizing the equalizer coefficients.
- CLAUSE 2 The computer-implemented method of clause 1, wherein the signal values from the cluster of oligonucleotides correspond to the estimated point-spread-function values combined with the expected response signal from the cluster of oligonucleotides summed with the estimated noise values.
- CLAUSE 3 The computer-implemented method of clause 1, further comprising determining the equalizer coefficients by adjusting the equalizer coefficients to minimize a mean squared error between one or more expected response signals corresponding to a set of neighboring clusters of oligonucleotides within the region of the nucleotide-sample slide and one or more estimated response signals across the set of neighboring clusters of oligonucleotides within the region of the nucleotide-sample slide.
- CLAUSE 4 The computer-implemented method of clause 3, wherein the expected response signal corresponds to the equalizer coefficients combined with the signal values from at least the cluster of oligonucleotides.
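- The relationships in clauses 2 through 4 can be written compactly. The following is a sketch only: the symbols are assumptions introduced here (y for the received signal values, h for the estimated point-spread-function values, x for the response signal from the clusters, n for the estimated noise values, w for the equalizer coefficients, and x-hat for the equalizer output), and "combined with" is read as convolution.

```latex
% Assumed notation, not taken from the source text.
\begin{align}
  y &= h * x + n
    && \text{(clause 2: signal values = PSF combined with response, plus noise)} \\
  \hat{x} &= w * y
    && \text{(clause 4: equalizer coefficients combined with signal values)} \\
  w^{\star} &= \arg\min_{w} \; \mathbb{E}\Big[ \sum_{k \in \mathcal{N}} \big( x_k - \hat{x}_k \big)^2 \Big]
    && \text{(clause 3: minimum mean squared error over neighboring clusters } \mathcal{N}\text{)}
\end{align}
```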
- CLAUSE 5 The computer-implemented method of clause 1, further comprising determining the estimated point-spread-function values by: receiving the signal values of the expected response signal from at least the cluster of oligonucleotides within a region of the nucleotide-sample slide in a spatial domain; converting the signal values of the expected response signal from at least the cluster of oligonucleotides to a frequency domain; determining a power spectral density of the signal values of the expected response signal in the frequency domain; and converting the power spectral density of the expected response signal in the frequency domain to the spatial domain by applying an inverse fast Fourier transformation (IFFT) to the power spectral density of the expected response signal.
- CLAUSE 7 The computer-implemented method of clause 5, wherein the power spectral density is an average measurement of energy within a range of spectral bands.
- CLAUSE 8 The computer-implemented method of clause 5, wherein converting the power spectral density of the expected response signal in the frequency domain to the spatial domain further comprises taking a square root of the power spectral density of the expected response signal in the frequency domain.
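- The steps in clauses 5 through 8 (transform to the frequency domain, take the power spectral density, take its square root, transform back) can be sketched in one dimension as follows. This is a minimal illustration under stated assumptions, not the claimed implementation: the function names are hypothetical, a naive discrete Fourier transform stands in for the FFT/IFFT so the block needs only the standard library, and a single signal is used where a practical system would presumably average the power spectral density over many clusters.

```python
import cmath
import math


def dft(values, inverse=False):
    """Naive O(n^2) discrete Fourier transform (stand-in for FFT/IFFT)."""
    n = len(values)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        acc = sum(
            v * cmath.exp(sign * 2j * math.pi * k * t / n)
            for t, v in enumerate(values)
        )
        out.append(acc / n if inverse else acc)
    return out


def estimate_psf_from_psd(signal):
    """Hypothetical helper: PSD -> square root -> inverse transform.

    Taking the magnitude discards phase, so the result is a zero-phase
    (symmetric) estimate centered on index 0 with wrap-around.
    """
    spectrum = dft(signal)                       # spatial -> frequency domain
    psd = [abs(s) ** 2 for s in spectrum]        # power spectral density
    magnitude = [math.sqrt(p) for p in psd]      # square root of the PSD
    psf = dft(magnitude, inverse=True)           # frequency -> spatial domain
    return [p.real for p in psf]


if __name__ == "__main__":
    # A symmetric kernel centered at index 0 (wrapping around the ends).
    kernel = [0.6, 0.2, 0.0, 0.0, 0.0, 0.0, 0.0, 0.2]
    print([round(v, 6) for v in estimate_psf_from_psd(kernel)])
```

- Because the phase is dropped, this sketch recovers only symmetric kernels exactly; how the source handles phase (for example via the minimum-phase assumption of clause 11) is not shown here.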
- CLAUSE 9 The computer-implemented method of clause 5, wherein the expected response signal comprises a measurement of an amplitude of the expected response signal.
- CLAUSE 10 The computer-implemented method of clause 1, wherein the estimated noise values comprise independent identically distributed Gaussian noise.
- CLAUSE 11 The computer-implemented method of clause 1, wherein the channel is a minimum phase response channel.
- CLAUSE 12 The computer-implemented method of clause 1, further comprising: determining an estimated response signal from at least the cluster of oligonucleotides; and based on the estimated response signal, determining a base call for at least the cluster of oligonucleotides.
- CLAUSE 13 The computer-implemented method of clause 1, further comprising: initializing an equalizer utilizing the equalizer coefficients; during a subsequent sequencing cycle, determining an additional estimated response signal from at least an additional cluster of oligonucleotides; and based on the additional estimated response signal, updating the equalizer coefficients.
- CLAUSE 14 The computer-implemented method of clause 1, further comprising: determining, for the channel, target estimated point-spread-function values based on target signal values corresponding to a target cluster of oligonucleotides; determining, based on combining the target estimated point-spread-function values and the estimated noise values within the channel, target equalizer coefficients that compensate for the target estimated point-spread-function values and the estimated noise values with respect to a target expected response signal from the target cluster of oligonucleotides; and determining the base call for the target cluster of oligonucleotides utilizing the target equalizer coefficients.
- CLAUSE 15 The computer-implemented method of clause 1, further comprising: initializing an equalizer of a sequencing device utilizing the equalizer coefficients; during a subsequent sequencing cycle on the sequencing device, determining an additional estimated response signal from at least an additional cluster of oligonucleotides; and based on the additional estimated response signal, modifying the equalizer coefficients for the equalizer of the sequencing device.
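- Clauses 13 and 15 describe initializing an equalizer with the computed coefficients and then updating them in later sequencing cycles. The clauses do not name an update rule; the sketch below assumes a least-mean-squares (LMS) step purely for illustration. The function name, the step size, and the way the target value is obtained (for example, the nearest ideal intensity level for the current estimate) are all assumptions.

```python
def lms_update(weights, window, target, step=0.05):
    """One assumed LMS step: nudge the coefficients to reduce the error.

    weights -- current equalizer coefficients
    window  -- signal values the coefficients are applied to
    target  -- value the equalizer output should have matched (assumed known)
    """
    estimate = sum(w * y for w, y in zip(weights, window))
    error = target - estimate
    updated = [w + step * error * y for w, y in zip(weights, window)]
    return updated, error


if __name__ == "__main__":
    coeffs = [0.0, 1.0, 0.0]          # initial coefficients (pass-through)
    observed = [0.3, 0.8, 0.3]        # blurred signal values around a cluster
    for cycle in range(5):
        coeffs, err = lms_update(coeffs, observed, target=1.0)
        print(cycle, round(err, 4))
```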
- FIG. 10B illustrates a flowchart of a series of acts 1011 for determining a base call for a target cluster of oligonucleotides utilizing an image matrix in accordance with one or more embodiments. While FIG. 10B illustrates acts according to one embodiment, alternative embodiments may omit, add to, reorder, and/or modify any of the acts shown in FIG. 10B. In some implementations, the acts of FIG. 10B are performed as part of a method. In some instances, a non-transitory computer-readable medium stores instructions thereon that, when executed by at least one processor, cause a computing device to perform the acts of FIG. 10B. In some implementations, a system performs the acts of FIG. 10B. For example, in one or more cases, a system includes at least one processor and a non-transitory computer-readable medium comprising instructions that, when executed by the at least one processor, cause the system to perform the acts of FIG. 10B.
- the series of acts 1011 can include an act 1012 of receiving an image depicting one or more signal values for an expected response signal from a target cluster of oligonucleotides.
- the series of acts 1011 can include an act 1014 of receiving estimated cluster locations for the target cluster of oligonucleotides and neighboring clusters of oligonucleotides.
- the series of acts 1011 can include an act 1016 of determining estimated point-spread-function values and estimated noise values corresponding to the target cluster of oligonucleotides.
- the series of acts 1011 can include an act 1018 of determining an image matrix comprising equalizer coefficients. In some implementations, the series of acts 1011 can include an act 1020 of generating a base call for the target cluster of oligonucleotides by applying the image matrix to the image.
- the series of acts 1011 depicted in FIG. 10B can include acts to perform any of the operations described in the following clauses:
- CLAUSE 16 A computer-implemented method comprising: receiving, for a sequencing cycle, an image depicting one or more signal values for an expected response signal from a target cluster of oligonucleotides within a region of a nucleotide-sample slide; receiving, for the region of the nucleotide-sample slide, estimated cluster locations for the target cluster of oligonucleotides and neighboring clusters of oligonucleotides within the region; determining, for a channel, estimated point-spread-function values and estimated noise values based on the one or more signal values corresponding to the target cluster of oligonucleotides; determining, based on combining the estimated point-spread-function values, the estimated cluster locations, and the estimated noise values, an image matrix comprising equalizer coefficients; and generating a base call for the target cluster of oligonucleotides by applying the image matrix to the image depicting the one or more signal values for the expected response signal from the target cluster of oligonucleotides.
- CLAUSE 17 The computer-implemented method of clause 16, further comprising receiving the estimated cluster locations by: receiving a patterned arrangement of the estimated cluster locations for the target cluster of oligonucleotides and the neighboring clusters of oligonucleotides arranged according to a pattern within the region of the nucleotide-sample slide; or receiving a non-patterned arrangement of the estimated cluster locations for the target cluster of oligonucleotides and the neighboring clusters of oligonucleotides arranged without a pattern within the region of the nucleotide-sample slide.
- CLAUSE 18 The computer-implemented method of clause 17, wherein the patterned arrangement of the estimated cluster locations comprises a grid of estimated nanowell locations for nanowells comprising the target cluster of oligonucleotides and the neighboring clusters of oligonucleotides.
- CLAUSE 19 The computer-implemented method of clause 16, further comprising determining the image matrix by determining an image mask comprising the equalizer coefficients.
- CLAUSE 20 The computer-implemented method of clause 16, further comprising determining the estimated point-spread-function values by: generating, from the image, a frequency domain matrix comprising values for a power spectral density of the one or more signal values of the expected response signal; generating, from the image, a spatial domain matrix comprising values for an up-sampled point-spread function by converting the power spectral density from a frequency domain to a spatial domain and combining corner regions from the spatial domain matrix; and determining the estimated point-spread-function values from the spatial domain matrix.
- CLAUSE 21 The computer-implemented method of clause 16, further comprising determining the estimated point-spread-function values by: generating, from the image, a frequency domain matrix comprising values for a power spectral density of the one or more signal values of the expected response signal; up-sampling the frequency domain matrix to generate an up-sampled power spectral density of the one or more signal values; generating, from the image, a spatial domain matrix comprising values for an up-sampled point-spread function by converting the up-sampled power spectral density from a frequency domain to a spatial domain; generating an intermediate spatial domain matrix comprising values for an intermediate point-spread function by combining corner regions from the spatial domain matrix; and up-sampling the intermediate spatial domain matrix.
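- One plausible reading of "combining corner regions" in clauses 20 and 21 is the usual re-centering step after an inverse transform: the point-spread function comes back with its peak at index (0, 0) and its tails wrapped into the four corners, so the corners are stitched together into one centered kernel. The sketch below illustrates that reading only; the function name and the `radius` parameter are hypothetical.

```python
def center_from_corners(matrix, radius):
    """Stitch the four wrapped corners of `matrix` into a centered kernel.

    The value at offset (dy, dx) from the peak is read from index
    (dy mod rows, dx mod cols), which lands in one of the four corners.
    Returns a (2 * radius + 1) square kernel with the peak in the middle.
    """
    rows, cols = len(matrix), len(matrix[0])
    size = 2 * radius + 1
    kernel = [[0.0] * size for _ in range(size)]
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            kernel[dy + radius][dx + radius] = matrix[dy % rows][dx % cols]
    return kernel


if __name__ == "__main__":
    # A 6x6 spatial-domain matrix with the peak at (0, 0) and wrapped tails.
    spatial = [[0.0] * 6 for _ in range(6)]
    spatial[0][0] = 1.0
    spatial[0][1] = spatial[0][5] = 0.5
    spatial[1][0] = spatial[5][0] = 0.25
    for row in center_from_corners(spatial, radius=1):
        print(row)
```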
- CLAUSE 22 The computer-implemented method of clause 20 or 21, further comprising generating the frequency domain matrix by: generating a convoluted matrix by combining the image of the region of the nucleotide-sample slide with a two-dimensional Hanning window; and applying a Fast Fourier Transform (FFT) to the convoluted matrix.
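- The windowing step of clause 22 can be sketched as below. The clause says only that the image is "combined" with a two-dimensional window; this sketch assumes element-wise multiplication by a separable Hann (Hanning) window, which is the common way to taper image edges before a Fourier transform. The function names are hypothetical, and the FFT itself is omitted.

```python
import math


def hann(n):
    """Symmetric 1-D Hann window of length n (zero at both ends)."""
    if n == 1:
        return [1.0]
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]


def apply_hann_2d(image):
    """Multiply each pixel by the product of its row and column weights."""
    rows, cols = len(image), len(image[0])
    row_w, col_w = hann(rows), hann(cols)
    return [
        [image[r][c] * row_w[r] * col_w[c] for c in range(cols)]
        for r in range(rows)
    ]


if __name__ == "__main__":
    flat = [[1.0] * 5 for _ in range(5)]
    for row in apply_hann_2d(flat):
        print([round(v, 3) for v in row])
```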
- CLAUSE 23 The computer-implemented method of clause 20 or 21, further comprising: up-sampling the frequency domain matrix by an up-sampling factor; and up-sampling an arrangement of the estimated cluster locations within the region by the up-sampling factor.
- CLAUSE 24 The computer-implemented method of clause 21, further comprising applying a two-dimensional Hamming window to the intermediate spatial domain matrix.
- CLAUSE 25 The computer-implemented method of clause 16, further comprising determining the image matrix comprising equalizer coefficients by: modifying the estimated point-spread-function values; modifying the estimated noise values; and combining the modified estimated point-spread-function values and the modified estimated noise values to generate the image matrix.
- CLAUSE 26 The computer-implemented method of clause 16, further comprising determining the image matrix comprising equalizer coefficients by: generating transposed delta estimated point-spread-function values by transposing a combination of the estimated point-spread-function values with a distribution function; generating square and symmetric estimated point-spread function values by combining the estimated point-spread-function values with transposed estimated point-spread-function values; generating square and symmetric transposed estimated noise values by combining the estimated noise values with transposed estimated noise values; and combining the transposed delta estimated point-spread-function values, the square and symmetric estimated point-spread-function values, and the square and symmetric transposed estimated noise values to generate the image matrix.
- CLAUSE 27 The computer-implemented method of clause 26, wherein the distribution function comprises a Dirac delta function that sets the one or more signal values of the target cluster of oligonucleotides to one and sets additional signal values of the neighboring clusters of oligonucleotides to zero.
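- The combination described in clauses 26 and 27 resembles the standard linear minimum-mean-squared-error solution. A minimal sketch under stated assumptions: `h[p][k]` holds the point-spread-function contribution of cluster k to pixel p, the delta vector is one for the target cluster and zero for its neighbors, and the noise term is simplified to a constant variance on the diagonal (independent, identically distributed noise). All names are hypothetical, and a small Gaussian-elimination solver is used so the block needs no external libraries.

```python
def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def mmse_equalizer(h, noise_var, target):
    """Coefficients w solving (H H^T + noise_var * I) w = H delta."""
    pixels, clusters = len(h), len(h[0])
    # Delta: one for the target cluster, zero for neighboring clusters.
    delta = [1.0 if k == target else 0.0 for k in range(clusters)]
    h_delta = [sum(h[p][k] * delta[k] for k in range(clusters))
               for p in range(pixels)]
    # Square, symmetric PSF term plus the (assumed diagonal) noise term.
    gram = [
        [sum(h[p][k] * h[q][k] for k in range(clusters))
         + (noise_var if p == q else 0.0)
         for q in range(pixels)]
        for p in range(pixels)
    ]
    return solve(gram, h_delta)


if __name__ == "__main__":
    psf = [[1.0, 0.3, 0.0], [0.3, 1.0, 0.3], [0.0, 0.3, 1.0]]
    print([round(v, 4) for v in mmse_equalizer(psf, noise_var=0.05, target=1)])
```

- With a very small noise variance the coefficients approach a zero-forcing solution that cancels the neighbors entirely; a larger variance trades some residual crosstalk for less noise amplification.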
- CLAUSE 28 The computer-implemented method of clause 16, further comprising: identifying, from the image, a set of subregions within the region of the nucleotide-sample slide; generating, from the image matrix comprising equalizer coefficients, a set of subregion image matrices comprising subregion equalizer coefficients; and determining one or more additional base calls for additional target clusters of oligonucleotides within the set of subregions by applying the set of subregion image matrices to the image depicting one or more additional signal values for additional expected response signals from the additional target clusters of oligonucleotides within the set of subregions.
- CLAUSE 29 The computer-implemented method of clause 28, wherein the subregion equalizer coefficients from a subregion image matrix of the set of subregion image matrices initially match the equalizer coefficients of the image matrix.
- CLAUSE 30 The computer-implemented method of clause 16, further comprising: receiving, for the sequencing cycle, an additional image depicting one or more additional signal values for an additional expected response signal from the target cluster of oligonucleotides; determining, for an additional channel, additional estimated point-spread-function values and additional estimated noise values based on the one or more additional signal values corresponding to the target cluster of oligonucleotides; determining, based on combining the additional estimated point-spread-function values, the estimated cluster locations, and the additional estimated noise values within the additional channel, an additional image matrix comprising additional equalizer coefficients for the additional channel; and generating the base call for the target cluster of oligonucleotides by applying the additional image matrix to the additional image depicting the one or more additional signal values for the additional expected response signal from the target cluster of oligonucleotides.
- The methods described herein can be used in conjunction with a variety of nucleic acid sequencing techniques. Particularly applicable techniques are those wherein nucleic acids are attached at fixed locations in an array such that their relative positions do not change and wherein the array is repeatedly imaged. Embodiments in which images are obtained in different color channels, for example, coinciding with different labels used to distinguish one nucleotide base type from another are particularly applicable.
- the process to determine the nucleotide sequence of a target nucleic acid i.e., a nucleic-acid polymer
- Preferred embodiments include sequencing-by-synthesis (SBS) techniques.
- SBS techniques generally involve the enzymatic extension of a nascent nucleic acid strand through the iterative addition of nucleotides against a template strand.
- a single nucleotide monomer may be provided to a target nucleotide in the presence of a polymerase in each delivery.
- more than one type of nucleotide monomer can be provided to a target nucleic acid in the presence of a polymerase in a delivery.
- the SBS techniques described below can utilize single-read sequencing or paired-end sequencing.
- In single-read sequencing, the sequencing device reads a fragment from one end to another to generate the sequence of base pairs.
- In paired-end sequencing, the sequencing device begins at one read, finishes reading a specified read length in the same direction, and begins another read from the opposite end of the fragment.
- SBS can utilize nucleotide monomers that have a terminator moiety or those that lack any terminator moieties.
- Methods utilizing nucleotide monomers lacking terminators include, for example, pyrosequencing and sequencing using γ-phosphate-labeled nucleotides, as set forth in further detail below.
- the number of nucleotides added in each cycle is generally variable and dependent upon the template sequence and the mode of nucleotide delivery.
- the terminator can be effectively irreversible under the sequencing conditions used as is the case for traditional Sanger sequencing which utilizes dideoxynucleotides, or the terminator can be reversible as is the case for sequencing methods developed by Solexa (now Illumina, Inc.).
- SBS techniques can utilize nucleotide monomers that have a label moiety or those that lack a label moiety. Accordingly, incorporation events can be detected based on a characteristic of the label, such as fluorescence of the label; a characteristic of the nucleotide monomer such as molecular weight or charge; a byproduct of incorporation of the nucleotide, such as release of pyrophosphate; or the like.
- the different nucleotides can be distinguishable from each other, or alternatively, the two or more different labels can be indistinguishable under the detection techniques being used.
- the different nucleotides present in a sequencing reagent can have different labels and they can be distinguished using appropriate optics as exemplified by
- Preferred embodiments include pyrosequencing techniques. Pyrosequencing detects the release of inorganic pyrophosphate (PPi) as particular nucleotides are incorporated into the nascent strand (Ronaghi, M., Karamohamed, S., Pettersson, B., Uhlen, M. and Nyren, P. (1996) "Real-time DNA sequencing using detection of pyrophosphate release." Analytical Biochemistry 242(1), 84-9; Ronaghi, M. (2001) "Pyrosequencing sheds light on DNA sequencing." Genome Res. 11(1), 3-11; Ronaghi, M., Uhlen, M. and Nyren, P.
- the nucleic acids to be sequenced can be attached to features in an array and the array can be imaged to capture the chemiluminescent signals that are produced due to incorporation of nucleotides at the features of the array.
- An image can be obtained after the array is treated with a particular nucleotide type (e.g., A, T, C or G). Images obtained after addition of each nucleotide type will differ with regard to which features in the array are detected. These differences in the image reflect the different sequence content of the features on the array. However, the relative locations of each feature will remain unchanged in the images.
- the images can be stored, processed and analyzed using the methods set forth herein. For example, images obtained after treatment of the array with each different nucleotide type can be handled in the same way as exemplified herein for images obtained from different detection channels for reversible terminator-based sequencing methods.
- cycle sequencing is accomplished by stepwise addition of reversible terminator nucleotides containing, for example, a cleavable or photobleachable dye label as described, for example, in WO 04/018497 and U.S. Pat. No. 7,057,026, the disclosures of which are incorporated herein by reference.
- This approach is being commercialized by Solexa (now Illumina Inc.), and is also described in WO 91/06678 and WO 07/123,744, each of which is incorporated herein by reference.
- the availability of fluorescently labeled terminators in which both the termination can be reversed and the fluorescent label cleaved facilitates efficient cyclic reversible termination (CRT) sequencing.
- Polymerases can also be co-engineered to efficiently incorporate and extend from these modified nucleotides.
- the labels do not substantially inhibit extension under SBS reaction conditions.
- the detection labels can be removable, for example, by cleavage or degradation. Images can be captured following incorporation of labels into arrayed nucleic acid features.
- each cycle involves simultaneous delivery of four different nucleotide types to the array and each nucleotide type has a spectrally distinct label. Four images can then be obtained, each using a detection channel that is selective for one of the four different labels. Alternatively, different nucleotide types can be added sequentially, and an image of the array can be obtained between each addition step.
- each image will show nucleic acid features that have incorporated nucleotides of a particular type. Different features will be present or absent in the different images due to the different sequence content of each feature. However, the relative position of the features will remain unchanged in the images. Images obtained from such reversible terminator-SBS methods can be stored, processed and analyzed as set forth herein. Following the image capture step, labels can be removed and reversible terminator moieties can be removed for subsequent cycles of nucleotide addition and detection. Removal of the labels after they have been detected in a particular cycle and prior to a subsequent cycle can provide the advantage of reducing background signal and crosstalk between cycles. Examples of useful labels and removal methods are set forth below.
- nucleotide monomers can include reversible terminators.
- reversible terminators/cleavable fluors can include fluor linked to the ribose moiety via a 3' ester linkage (Metzker, Genome Res. 15:1767-1776 (2005), which is incorporated herein by reference).
- Other approaches have separated the terminator chemistry from the cleavage of the fluorescence label (Ruparel et al., Proc Natl Acad Sci USA 102: 5932-7 (2005), which is incorporated herein by reference in its entirety).
- Ruparel et al described the development of reversible terminators that used a small 3' allyl group to block extension, but could easily be deblocked by a short treatment with a palladium catalyst.
- the fluorophore was attached to the base via a photocleavable linker that could easily be cleaved by a 30 second exposure to long wavelength UV light.
- disulfide reduction or photocleavage can be used as a cleavable linker.
- Another approach to reversible termination is the use of natural termination that ensues after placement of a bulky dye on a dNTP.
- the presence of a charged bulky dye on the dNTP can act as an effective terminator through steric and/or electrostatic hindrance.
- Some embodiments can utilize detection of four different nucleotides using fewer than four different labels.
- SBS can be performed utilizing methods and systems described in the incorporated materials of U.S. Patent Application Publication No. 2013/0079232.
- a pair of nucleotide types can be detected at the same wavelength, but distinguished based on a difference in intensity for one member of the pair compared to the other, or based on a change to one member of the pair (e.g. via chemical modification, photochemical modification or physical modification) that causes apparent signal to appear or disappear compared to the signal detected for the other member of the pair.
- nucleotide types can be detected under particular conditions while a fourth nucleotide type lacks a label that is detectable under those conditions, or is minimally detected under those conditions (e.g., minimal detection due to background fluorescence, etc.). Incorporation of the first three nucleotide types into a nucleic acid can be determined based on presence of their respective signals and incorporation of the fourth nucleotide type into the nucleic acid can be determined based on absence or minimal detection of any signal.
- one nucleotide type can include label(s) that are detected in two different channels, whereas other nucleotide types are detected in no more than one of the channels.
- An exemplary embodiment that combines all three examples is a fluorescent-based SBS method that uses a first nucleotide type that is detected in a first channel (e.g. dATP having a label that is detected in the first channel when excited by a first excitation wavelength), a second nucleotide type that is detected in a second channel (e.g. dCTP having a label that is detected in the second channel when excited by a second excitation wavelength), a third nucleotide type that is detected in both the first and the second channel (e.g. dTTP having at least one label that is detected in both channels when excited by the first and/or second excitation wavelength) and a fourth nucleotide type that lacks a label and is not, or is only minimally, detected in either channel (e.g. dGTP having no label).
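- The two-channel scheme in the exemplary embodiment above maps directly to a small decision table: signal in the first channel only indicates A, the second channel only indicates C, both channels indicate T, and neither indicates G. The sketch below illustrates that mapping; the function name and the fixed intensity threshold are assumptions made for illustration, and a practical base caller would use a more careful decision rule.

```python
def call_base_two_channel(ch1, ch2, threshold=0.5):
    """Map two channel intensities to a base using the dATP/dCTP/dTTP/dGTP
    labeling described in the text. `threshold` is a hypothetical cutoff
    separating "detected" from "not, or minimally, detected".
    """
    on1 = ch1 >= threshold
    on2 = ch2 >= threshold
    if on1 and on2:
        return "T"   # detected in both channels
    if on1:
        return "A"   # detected in the first channel only
    if on2:
        return "C"   # detected in the second channel only
    return "G"       # no label: dark in both channels


if __name__ == "__main__":
    for intensities in [(0.9, 0.1), (0.1, 0.8), (0.95, 0.9), (0.05, 0.02)]:
        print(intensities, call_base_two_channel(*intensities))
```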
- sequencing data can be obtained using a single channel.
- the first nucleotide type is labeled but the label is removed after the first image is generated, and the second nucleotide type is labeled only after a first image is generated.
- the third nucleotide type retains its label in both the first and second images, and the fourth nucleotide type remains unlabeled in both images.
- Some embodiments can utilize sequencing by ligation techniques. Such techniques utilize DNA ligase to incorporate oligonucleotides and identify the incorporation of such oligonucleotides.
- the oligonucleotides typically have different labels that are correlated with the identity of a particular nucleotide in a sequence to which the oligonucleotides hybridize.
- images can be obtained following treatment of an array of nucleic acid features with the labeled sequencing reagents. Each image will show nucleic acid features that have incorporated labels of a particular type. Different features will be present or absent in the different images due to the different sequence content of each feature, but the relative position of the features will remain unchanged in the images.
- Some embodiments can utilize nanopore sequencing (Deamer, D. W. & Akeson, M. "Nanopores and nucleic acids: prospects for ultrarapid sequencing." Trends Biotechnol. 18, 147-151 (2000); Deamer, D. and D. Branton, "Characterization of nucleic acids by nanopore analysis". Acc. Chem. Res. 35:817-825 (2002); Li, J., M. Gershow, D. Stein, E. Brandin, and J. A. Golovchenko, "DNA molecules and configurations in a solid-state nanopore microscope" Nat. Mater. 2:611-615 (2003), the disclosures of which are incorporated herein by reference in their entireties).
- the target nucleic acid passes through a nanopore.
- the nanopore can be a synthetic pore or biological membrane protein, such as α-hemolysin.
- each base-pair can be identified by measuring fluctuations in the electrical conductance of the pore.
- Some embodiments can utilize methods involving the real-time monitoring of DNA polymerase activity.
- Nucleotide incorporations can be detected through fluorescence resonance energy transfer (FRET) interactions between a fluorophore-bearing polymerase and γ-phosphate-labeled nucleotides as described, for example, in U.S. Pat. No. 7,329,492 and U.S. Pat. No. 7,211,414 (each of which is incorporated herein by reference) or nucleotide incorporations can be detected with zero-mode waveguides as described, for example, in U.S. Pat. No.
- the illumination can be restricted to a zeptoliter-scale volume around a surface-tethered polymerase such that incorporation of fluorescently labeled nucleotides can be observed with low background (Levene, M. J. et al. "Zero-mode waveguides for single-molecule analysis at high concentrations.” Science 299, 682-686 (2003); Lundquist, P. M. et al.
- Some SBS embodiments include detection of a proton released upon incorporation of a nucleotide into an extension product.
- sequencing based on detection of released protons can use an electrical detector and associated techniques that are commercially available from Ion Torrent (Guilford, CT, a Life Technologies subsidiary) or sequencing methods and systems described in US 2009/0026082 A1; US 2009/0127589 A1; US 2010/0137143 A1; or US 2010/0282617 A1, each of which is incorporated herein by reference.
- Methods set forth herein for amplifying target nucleic acids using kinetic exclusion can be readily applied to substrates used for detecting protons. More specifically, methods set forth herein can be used to produce clonal populations of amplicons that are used to detect protons.
- the above SBS methods can be advantageously carried out in multiplex formats such that multiple different target nucleic acids are manipulated simultaneously.
- different target nucleic acids can be treated in a common reaction vessel or on a surface of a particular substrate. This allows convenient delivery of sequencing reagents, removal of unreacted reagents and detection of incorporation events in a multiplex manner.
- the target nucleic acids can be in an array format. In an array format, the target nucleic acids can be typically bound to a surface in a spatially distinguishable manner.
- the target nucleic acids can be bound by direct covalent attachment, attachment to a bead or other particle or binding to a polymerase or other molecule that is attached to the surface.
- the array can include a single copy of a target nucleic acid at each site (also referred to as a feature) or multiple copies having the same sequence can be present at each site or feature. Multiple copies can be produced by amplification methods such as, bridge amplification or emulsion PCR as described in further detail below.
- the methods set forth herein can use arrays having features at any of a variety of densities including, for example, at least about 10 features/cm², 100 features/cm², 500 features/cm², 1,000 features/cm², 5,000 features/cm², 10,000 features/cm², 50,000 features/cm², 100,000 features/cm², 1,000,000 features/cm², 5,000,000 features/cm², or higher.
- an advantage of the methods set forth herein is that they provide for rapid and efficient detection of a plurality of target nucleic acids in parallel. Accordingly, the present disclosure provides integrated systems capable of preparing and detecting nucleic acids using techniques known in the art such as those exemplified above.
- an integrated system of the present disclosure can include fluidic components capable of delivering amplification reagents and/or sequencing reagents to one or more immobilized DNA fragments, the system comprising components such as pumps, valves, reservoirs, fluidic lines and the like.
- a flow cell can be configured and/or used in an integrated system for detection of target nucleic acids. Exemplary flow cells are described, for example, in US 2010/0111768 A1 and US Ser. No.
- one or more of the fluidic components of an integrated system can be used for an amplification method and for a detection method.
- one or more of the fluidic components of an integrated system can be used for an amplification method set forth herein and for the delivery of sequencing reagents in a sequencing method such as those exemplified above.
- an integrated system can include separate fluidic systems to carry out amplification methods and to carry out detection methods.
- Examples of integrated sequencing systems that are capable of creating amplified nucleic acids and also determining the sequence of the nucleic acids include, without limitation, the MiSeq™ platform (Illumina, Inc., San Diego, CA) and devices described in US Ser. No. 13/273,666, which is incorporated herein by reference.
- The term "sample," and its derivatives, is used in its broadest sense and includes any specimen, culture and the like that is suspected of including a target.
- the sample comprises DNA, RNA, PNA, LNA, chimeric or hybrid forms of nucleic acids.
- the sample can include any biological, clinical, surgical, agricultural, atmospheric or aquatic-based specimen containing one or more nucleic acids.
- the term also includes any isolated nucleic acid sample such as genomic DNA, fresh-frozen or formalin-fixed paraffin-embedded nucleic acid specimen.
- the sample can be from a single individual, a collection of nucleic acid samples from genetically related members, nucleic acid samples from genetically unrelated members, nucleic acid samples (matched) from a single individual such as a tumor sample and normal tissue sample, or sample from a single source that contains two distinct forms of genetic material such as maternal and fetal DNA obtained from a maternal subject, or the presence of contaminating bacterial DNA in a sample that contains plant or animal DNA.
- the source of nucleic acid material can include nucleic acids obtained from a newborn, for example as typically used for newborn screening.
- the nucleic acid sample can include high molecular weight material such as genomic DNA (gDNA).
- the sample can include low molecular weight material such as nucleic acid molecules obtained from FFPE or archived DNA samples.
- low molecular weight material includes enzymatically or mechanically fragmented DNA.
- the sample can include cell-free circulating DNA.
- the sample can include nucleic acid molecules obtained from biopsies, tumors, scrapings, swabs, blood, mucus, urine, plasma, semen, hair, laser capture micro-dissections, surgical resections, and other clinical or laboratory obtained samples.
- the sample can be an epidemiological, agricultural, forensic or pathogenic sample.
- the sample can include nucleic acid molecules obtained from an animal such as a human or mammalian source.
- the sample can include nucleic acid molecules obtained from a non-mammalian source such as a plant, bacteria, virus or fungus.
- the source of the nucleic acid molecules may be an archived or extinct sample or species.
- forensic samples can include nucleic acids obtained from a crime scene, nucleic acids obtained from a missing persons DNA database, nucleic acids obtained from a laboratory associated with a forensic investigation or include forensic samples obtained by law enforcement agencies, one or more military services or any such personnel.
- the nucleic acid sample may be a purified sample or a crude DNA containing lysate, for example derived from a buccal swab, paper, fabric or other substrate that may be impregnated with saliva, blood, or other bodily fluids.
- the nucleic acid sample may comprise low amounts of, or fragmented portions of DNA, such as genomic DNA.
- target sequences can be present in one or more bodily fluids including but not limited to, blood, sputum, plasma, semen, urine and serum.
- target sequences can be obtained from hair, skin, tissue samples, autopsy or remains of a victim.
- nucleic acids including one or more target sequences can be obtained from a deceased animal or human.
- target sequences can include nucleic acids obtained from non-human DNA such as microbial, plant or entomological DNA.
- target sequences or amplified target sequences are directed to purposes of human identification.
- the disclosure relates generally to methods for identifying characteristics of a forensic sample.
- the disclosure relates generally to human identification methods using one or more target specific primers disclosed herein or one or more target specific primers designed using the primer design criteria outlined herein.
- a forensic or human identification sample containing at least one target sequence can be amplified using any one or more of the target-specific primers disclosed herein or using the primer criteria outlined herein.
- the components of the blind-equalizer sequencing system 106 can include software, hardware, or both.
- the components of the blind-equalizer sequencing system 106 can include one or more instructions stored on a non-transitory computer readable storage medium and executable by processors of one or more computing devices (e.g., the user client device 108). When executed by the one or more processors, the computer-executable instructions of the blind-equalizer sequencing system 106 can cause the computing devices to perform the failure source identification methods described herein.
- the components of the blind-equalizer sequencing system 106 can comprise hardware, such as special purpose processing devices to perform a certain function or group of functions. Additionally, or alternatively, the components of the blind-equalizer sequencing system 106 can include a combination of computer-executable instructions and hardware.
- components of the blind-equalizer sequencing system 106 performing the functions described herein with respect to the blind-equalizer sequencing system 106 may, for example, be implemented as part of a stand-alone application, as a module of an application, as a plug-in for applications, as a library function or functions that may be called by other applications, and/or as a cloud-computing model.
- components of the blind-equalizer sequencing system 106 may be implemented as part of a stand-alone application on a personal computing device or a mobile device.
- the components of the blind-equalizer sequencing system 106 may be implemented in any application that provides sequencing services including, but not limited to Illumina BaseSpace, Illumina DRAGEN, or Illumina TruSight software. “Illumina,” “BaseSpace,” “DRAGEN,” and “TruSight,” are either registered trademarks or trademarks of Illumina, Inc. in the United States and/or other countries.
- Embodiments of the present disclosure may comprise or utilize a special purpose or general-purpose computer including computer hardware, such as, for example, one or more processors and system memory, as discussed in greater detail below.
- Embodiments within the scope of the present disclosure also include physical and other computer-readable media for carrying or storing computer-executable instructions and/or data structures.
- one or more of the processes described herein may be implemented at least in part as instructions embodied in a non-transitory computer readable medium and executable by one or more computing devices (e.g., any of the media content access devices described herein).
- a processor receives instructions, from a non-transitory computer readable medium, (e.g., a memory, etc.), and executes those instructions, thereby performing one or more processes, including one or more of the processes described herein.
- Computer-readable media can be any available media that can be accessed by a general purpose or special purpose computer system.
- Computer-readable media that store computer-executable instructions are non-transitory computer-readable storage media (devices).
- Computer-readable media that carry computer-executable instructions are transmission media.
- embodiments of the disclosure can comprise at least two distinctly different kinds of computer-readable media: non-transitory computer-readable storage media (devices) and transmission media.
- Non-transitory computer-readable storage media includes RAM, ROM, EEPROM, CD-ROM, solid state drives (SSDs) (e.g., based on RAM), Flash memory, phase-change memory (PCM), other types of memory, other optical disk storage, magnetic disk storage or other magnetic storage devices, or any other medium which can be used to store desired program code means in the form of computer-executable instructions or data structures and which can be accessed by a general purpose or special purpose computer.
- a “network” is defined as one or more data links that enable the transport of electronic data between computer systems and/or modules and/or other electronic devices.
- a network or another communications connection can include a network and/or data links which can be used to carry desired program code means in the form of computer-executable instructions or data structures and which can be accessed by a general purpose or special purpose computer. Combinations of the above should also be included within the scope of computer-readable media.
- program code means in the form of computer-executable instructions or data structures can be transferred automatically from transmission media to non-transitory computer-readable storage media (devices) (or vice versa).
- computer-executable instructions or data structures received over a network or data link can be buffered in RAM within a network interface module (e.g., a NIC), and then eventually transferred to computer system RAM and/or to less volatile computer storage media (devices) at a computer system.
- non-transitory computer-readable storage media (devices) can be included in computer system components that also (or even primarily) utilize transmission media.
- Computer-executable instructions comprise, for example, instructions and data which, when executed at a processor, cause a general purpose computer, special purpose computer, or special purpose processing device to perform a certain function or group of functions.
- computer-executable instructions are executed on a general-purpose computer to turn the general-purpose computer into a special purpose computer implementing elements of the disclosure.
- the computer executable instructions may be, for example, binaries, intermediate format instructions such as assembly language, or even source code.
- the disclosure may be practiced in network computing environments with many types of computer system configurations, including, personal computers, desktop computers, laptop computers, message processors, hand-held devices, multiprocessor systems, microprocessor-based or programmable consumer electronics, network PCs, minicomputers, mainframe computers, mobile telephones, PDAs, tablets, pagers, routers, switches, and the like.
- the disclosure may also be practiced in distributed system environments where local and remote computer systems, which are linked (either by hardwired data links, wireless data links, or by a combination of hardwired and wireless data links) through a network, both perform tasks.
- program modules may be located in both local and remote memory storage devices.
- Embodiments of the present disclosure can also be implemented in cloud computing environments.
- “cloud computing” is defined as a model for enabling on-demand network access to a shared pool of configurable computing resources.
- cloud computing can be employed in the marketplace to offer ubiquitous and convenient on-demand access to the shared pool of configurable computing resources.
- the shared pool of configurable computing resources can be rapidly provisioned via virtualization and released with low management effort or service provider interaction, and then scaled accordingly.
- a cloud-computing model can be composed of various characteristics such as, for example, on-demand self-service, broad network access, resource pooling, rapid elasticity, measured service, and so forth.
- a cloud-computing model can also expose various service models, such as, for example, Software as a Service (SaaS), Platform as a Service (PaaS), and Infrastructure as a Service (IaaS).
- a cloud-computing model can also be deployed using different deployment models such as private cloud, community cloud, public cloud, hybrid cloud, and so forth.
- a “cloud-computing environment” is an environment in which cloud computing is employed.
- FIG. 11 illustrates a block diagram of a computing device 1100 that may be configured to perform one or more of the processes described above.
- the computing device 1100 may implement the blind-equalizer sequencing system 106 and the sequencing system 104.
- the computing device 1100 can comprise a processor 1102, a memory 1104, a storage device 1106, an I/O interface 1108, and a communication interface 1110, which may be communicatively coupled by way of a communication infrastructure 1112.
- the computing device 1100 can include fewer or more components than those shown in FIG. 11. The following paragraphs describe components of the computing device 1100 shown in FIG. 11 in additional detail.
- the processor 1102 includes hardware for executing instructions, such as those making up a computer program.
- the processor 1102 may retrieve (or fetch) the instructions from an internal register, an internal cache, the memory 1104, or the storage device 1106 and decode and execute them.
- the memory 1104 may be a volatile or nonvolatile memory used for storing data, metadata, and programs for execution by the processor(s).
- the storage device 1106 includes storage, such as a hard disk, flash disk drive, or other digital storage device, for storing data or instructions for performing the methods described herein.
- the I/O interface 1108 allows a user to provide input to, receive output from, and otherwise transfer data to and receive data from computing device 1100.
- the I/O interface 1108 may include a mouse, a keypad or a keyboard, a touch screen, a camera, an optical scanner, network interface, modem, other known I/O devices or a combination of such I/O interfaces.
- the I/O interface 1108 may include one or more devices for presenting output to a user, including, but not limited to, a graphics engine, a display (e.g., a display screen), one or more output drivers (e.g., display drivers), one or more audio speakers, and one or more audio drivers.
- the I/O interface 1108 is configured to provide graphical data to a display for presentation to a user.
- the graphical data may be representative of one or more graphical user interfaces and/or any other graphical content as may serve a particular implementation.
- the communication interface 1110 can include hardware, software, or both. In any event, the communication interface 1110 can provide one or more interfaces for communication (such as, for example, packet-based communication) between the computing device 1100 and one or more other computing devices or networks. As an example, and not by way of limitation, the communication interface 1110 may include a network interface controller (NIC) or network adapter for communicating with an Ethernet or other wire-based network or a wireless NIC (WNIC) or wireless adapter for communicating with a wireless network, such as a WI-FI network.
- the communication interface 1110 may facilitate communications with various types of wired or wireless networks.
- the communication interface 1110 may also facilitate communications using various communication protocols.
- the communication infrastructure 1112 may also include hardware, software, or both that couples components of the computing device 1100 to each other.
- the communication interface 1110 may use one or more networks and/or protocols to enable a plurality of computing devices connected by a particular infrastructure to communicate with each other to perform one or more aspects of the processes described herein.
- the sequencing process can allow a plurality of devices (e.g., a client device, sequencing device, and server device(s)) to exchange information such as sequencing data and error notifications.
Landscapes
- Physics & Mathematics (AREA)
- Life Sciences & Earth Sciences (AREA)
- Health & Medical Sciences (AREA)
- Engineering & Computer Science (AREA)
- Spectroscopy & Molecular Physics (AREA)
- Medical Informatics (AREA)
- General Health & Medical Sciences (AREA)
- Theoretical Computer Science (AREA)
- Evolutionary Biology (AREA)
- Biophysics (AREA)
- Biotechnology (AREA)
- Bioinformatics & Computational Biology (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Artificial Intelligence (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Epidemiology (AREA)
- Public Health (AREA)
- Software Systems (AREA)
- Databases & Information Systems (AREA)
- Data Mining & Analysis (AREA)
- Evolutionary Computation (AREA)
- Bioethics (AREA)
- Signal Processing (AREA)
- Molecular Biology (AREA)
- Chemical & Material Sciences (AREA)
- Analytical Chemistry (AREA)
- Proteomics, Peptides & Aminoacids (AREA)
- Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)
Abstract
This disclosure describes embodiments of methods, systems, and non-transitory computer readable media that can quickly and accurately determine equalizer coefficients for an equalizer based on estimated point-spread-function values and estimated noise values derived from expected response signals of oligonucleotide clusters. For example, the disclosed systems can receive signal values for expected response signals from one or more clusters of oligonucleotides incorporating labeled nucleobases. Based on the signal values, the disclosed systems can determine estimated point-spread-function values for one or more such clusters of oligonucleotides and estimated noise values within a channel. From the estimated point-spread-function values and the estimated noise values, the disclosed systems can determine equalizer coefficients that compensate for the estimated point-spread-function values and the estimated noise values. The disclosed systems can further determine a base call for one or more such clusters of oligonucleotides by utilizing the equalizer coefficients.
Description
BLIND EQUALIZATION SYSTEMS FOR BASE CALLING APPLICATIONS
CROSS-REFERENCE TO RELATED APPLICATIONS
[0001] This application claims priority to and the benefit of U.S. Provisional Patent Application No. 63/649,210, entitled, “BLIND EQUALIZATION SYSTEMS FOR BASE CALLING APPLICATIONS,” filed on May 17, 2024, which is incorporated herein by reference in its entirety.
BACKGROUND
[0002] In recent years, biotechnology firms and research institutions have improved hardware and software platforms used for determining a sequence of nucleobases in a sample. For instance, some existing sequencing instruments and sequencing-data-analysis software (together “existing sequencing systems”) determine individual nucleotide bases of nucleic-acid sequences by using conventional Sanger sequencing or by using sequencing-by-synthesis (SBS). When using SBS, existing sequencing systems can monitor millions to billions of nucleic-acid polymers being synthesized in parallel to detect more accurate nucleobase calls. For instance, a camera in SBS platforms can capture images of irradiated fluorescent tags from nucleobases incorporated into such synthesized nucleic-acid sequences often grouped into clusters of oligonucleotides (e.g., clusters within nanowells of a flow cell). After capturing the images, a sequencing device uses specialized software to determine nucleobases that were detected in a given image based on the light signal captured in the image data. By iteratively incorporating nucleobases into the clusters of oligonucleotides and capturing images of the emitted light signals in various sequencing cycles, existing sequencing systems can determine the sequence of nucleobases present in clusters and determine nucleotide reads for the samples.
[0003] During a sequencing run on a sequencing instrument, some existing sequencing systems utilize an equalizer to process and analyze received images. For example, some existing systems convert energy depicted in images of light signals of oligonucleotide clusters into intensity values by applying an equalizer to the images. Some existing systems generate intensity values for oligonucleotide clusters by applying coefficients or weights associated with the equalizer to pixels in the images depicting energy intensities from the respective oligonucleotide cluster. By iteratively training the equalizer from sequencing cycle to sequencing cycle on predicted base call data and batch updating of coefficients, existing systems commonly determine which coefficients to apply to image pixels.
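By way of illustration only, and not by way of limitation, applying equalizer coefficients to the pixels around a cluster can be sketched as a weighted sum over a pixel patch. The function name `cluster_intensity`, the square patch centered on the cluster, and the use of NumPy are assumptions made for this sketch and are not taken from the disclosure.

```python
import numpy as np

def cluster_intensity(image, center, coefficients):
    """Return one intensity value for a cluster (hypothetical helper).

    image:        2-D array of pixel values for one channel
    center:       (row, col) pixel assumed to contain the cluster center
    coefficients: odd-sized 2-D array of equalizer weights
    """
    half_r = coefficients.shape[0] // 2
    half_c = coefficients.shape[1] // 2
    r, c = center
    # Pixel patch covered by the equalizer weights.
    patch = image[r - half_r:r + half_r + 1, c - half_c:c + half_c + 1]
    if patch.shape != coefficients.shape:
        raise ValueError("patch extends beyond the image border")
    # Weighted sum of the patch: each pixel is scaled by its coefficient.
    return float(np.sum(patch * coefficients))
```

With a coefficient matrix whose only nonzero entry is a 1 at its center, the function returns the center pixel unchanged; other coefficient matrices blend in neighboring pixels with positive or negative weights.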
[0004] Despite improving equalizer coefficients on a given sequencing instrument, some training approaches of existing sequencing systems face limitations regarding accuracy, flexibility, and efficiency of base calling. For example, during an initial sequencing cycle, some existing sequencing instruments directly process signal values and generate an estimated response signal. Such existing sequencing instruments subsequently use the estimated response signal to approximate the response signals actually emitted by clusters of oligonucleotides, where the response signals actually emitted by such clusters constitute signals that have yet to have their energy dispersed by an imaging device as part of image captures. However, existing systems cannot yet confidently determine if the estimated response signal represents the response signal actually emitted by oligonucleotide clusters before an imaging device disperses the response signal’s energy. In cases where the response signal actually emitted by the cluster does not match the estimated response signal, existing sequencing instruments generate equalizer coefficients that do not accurately compensate for the effects of the imaging device on the estimated response signal.
[0005] In part due to estimated response signals failing to match actually emitted response signals, existing sequencing systems that train equalizers (primarily or exclusively by determining differences or losses as a basis for updating equalizer coefficients based on a comparison of predicted base calls and assumed base calls) can rely on unverified base call training data. For example, in instances where the base call training data is incorrect or contains mistakes (e.g., includes low resolution images and/or clusters of oligonucleotides corrupted by high polyclonality), some existing sequencing systems (e.g., equalizers) are poorly calibrated, resulting in inaccurate base calls and limited throughput. This is especially true while training development platforms (e.g., sequencing instruments under development or beta testing) and ill-maintained systems because it is difficult to benchmark the performance of such development platforms. For example, during the early stages of platform development, the accurate coefficients for the equalizer are unknown.
[0006] In addition to producing some inaccurate base calls, existing sequencing systems can limit the utility or fitness of equalizer coefficients to specific sequencing instruments. Indeed, accurate coefficients of equalizers can vary from sequencing instrument to sequencing instrument due to the variation of point spread functions, temperatures, optic systems, reagents, etc. across different sequencing instruments (e.g., a NextSeq 1000/2000 ® instrument versus a NovaSeq X ® instrument). Thus, selecting a set of equalizer coefficients that accurately compensate for a point spread function (PSF) and noise in each sequencing instrument and/or configuration of a sequencing platform is impractical if not impossible for existing sequencing systems. Moreover, because existing sequencing instruments cannot quickly determine the accuracy of equalizer coefficients, it is difficult to identify if and how the equalizer coefficients are contributing to sub-optimal performance of the equalizer.
[0007] In addition to limited accuracy and flexibility, existing sequencing systems can inefficiently consume biochemical reagents and computing resources as part of training and calibrating an equalizer. As suggested above, some existing systems progressively improve equalizer accuracy by iteratively updating the coefficients of the equalizer after a set of sequencing cycles and/or after each sequencing run based on cross-cycle training. However, updating the coefficients to compensate for sub-optimal coefficients requires extra computing resources and takes up additional time. For example, in some existing systems, it might take several sequencing runs to identify accurate coefficients of the equalizer.
[0008] These, along with additional problems and issues exist in current sequencing systems.
SUMMARY
[0009] This disclosure describes embodiments of methods, non-transitory computer readable media, and systems that can solve one or more of the foregoing (or other) problems in the art. To solve such problems, the disclosed systems can quickly and accurately determine equalizer coefficients for an equalizer based on estimated point-spread-function values and estimated noise values derived from expected response signals from oligonucleotide clusters. For example, the disclosed systems can receive signal values for expected response signals from one or more clusters of oligonucleotides incorporating labeled nucleobases. Based on the signal values, the disclosed systems can determine estimated point-spread-function values for one or more such clusters of oligonucleotides and estimated noise values within a channel. From the estimated point-spread-function values and the estimated noise values, the disclosed systems can determine equalizer coefficients that compensate for the estimated point-spread-function values and the estimated noise values. The disclosed systems can further determine a base call for one or more such clusters of oligonucleotides by utilizing the equalizer coefficients.
[0010] The disclosed systems can utilize such equalizer coefficients for a variety of base-calling applications described further below. For example, the disclosed systems can more accurately determine response signals and their corresponding nucleobase calls for a given sequencing cycle by (i) initializing an equalizer with accurate equalizer coefficients and without batch updating coefficients based on predicted base calls and (ii) determining nucleobase calls for clusters of oligonucleotides based on corrected signal values adjusted in part by the equalizer coefficients.
[0011] Additional features and advantages of one or more embodiments of the present disclosure will be set forth in the description which follows, and in part will be obvious from the description, or may be learned by the practice of such example embodiments.
BRIEF DESCRIPTION OF THE DRAWINGS
[0012] The detailed description will describe various embodiments with additional specificity and detail through the use of the accompanying drawings, which are summarized below.
[0013] FIG. 1 illustrates an environment in which a blind-equalizer sequencing system can operate in accordance with one or more embodiments of the present disclosure.
[0014] FIG. 2 illustrates an overview diagram of the blind-equalizer sequencing system determining a base call for one or more clusters by determining and utilizing equalizer coefficients in accordance with one or more embodiments of the present disclosure.
[0015] FIGS. 3A-3B illustrate the blind-equalizer sequencing system generating an image of a region of a nucleotide-sample slide and the blind-equalizer sequencing system depicting response signals based on fluorescent responses in different channels in accordance with one or more embodiments of the present disclosure.
[0016] FIG. 4 illustrates a blind equalizer measuring signal values for a pixel depicting the cluster of oligonucleotides in accordance with one or more embodiments.
[0017] FIG. 5 illustrates the blind-equalizer sequencing system determining equalizer coefficients for one or more clusters of oligonucleotides in accordance with one or more embodiments of the present disclosure.
[0018] FIG. 6 illustrates the blind-equalizer sequencing system converting an image depicting signals from oligonucleotide clusters within a region of a nucleotide-sample slide from a frequency domain to a spatial domain and further determining estimated point-spread-function values based on an image representation in the spatial domain in accordance with one or more embodiments of the present disclosure.
[0019] FIG. 7 illustrates the blind-equalizer sequencing system applying an image matrix comprising equalizer coefficients to images depicting signals in one or more channels from oligonucleotide clusters in accordance with one or more embodiments of the present disclosure.
[0020] FIG. 8 illustrates the blind-equalizer sequencing system applying a distribution function (e.g., Dirac delta function) to one or more signal values of a target cluster of oligonucleotides and signal values of neighboring clusters of oligonucleotides within a region of a nucleotide-sample slide in accordance with one or more embodiments of the present disclosure.
[0021] FIG. 9 illustrates the improved performance of the blind-equalizer sequencing system in terms of base-call-quality scores for nucleobase calls relative to a baseline sequencing system in accordance with one or more embodiments of the present disclosure.
[0022] FIGS. 10A-10B illustrate a series of acts for determining a base call for one or more clusters of oligonucleotides using equalizer coefficients in accordance with one or more embodiments of the present disclosure.
[0023] FIG. 11 illustrates a block diagram of an example computing device in accordance with one or more embodiments of the present disclosure.
DETAILED DESCRIPTION
[0024] The disclosure describes one or more embodiments of a blind-equalizer sequencing system that quickly determines equalizer coefficients for an equalizer based on estimated point-spread-function values and estimated noise values derived from expected response signals of oligonucleotide clusters, but without relying on predicted base calls to initially adjust the equalizer coefficients. By determining and utilizing the equalizer coefficients, the blind-equalizer sequencing system can more precisely initialize the equalizer and more accurately determine nucleobase calls.
[0025] In some implementations, for instance, the blind-equalizer sequencing system receives signal values for an expected response signal in a sequencing cycle from one or more clusters of oligonucleotides. Based on the signal values, the blind-equalizer sequencing system determines estimated point-spread-function values corresponding to the expected response signal of one or more such clusters of oligonucleotides. In the same such sequencing cycle, the blind-equalizer sequencing system can determine estimated noise values for a given channel. By combining the estimated point-spread-function values and the estimated noise values, the blind-equalizer sequencing system can determine the equalizer coefficients. Such equalizer coefficients compensate for the estimated point-spread-function values and the estimated noise values with respect to the expected response signal from one or more such clusters of oligonucleotides. As discussed in more detail below, the blind-equalizer sequencing system compensates for the estimated point-spread-function values and the estimated noise values by (i) rectifying the effects of capturing the expected response signal with an imaging device and (ii) accounting for noise in the blind-equalizer sequencing system. The blind-equalizer sequencing system can utilize the equalizer coefficients to determine a base call accurately and quickly for one or more clusters of oligonucleotides.
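By way of illustration only, one way to combine estimated point-spread-function values with an estimated noise value into equalizer coefficients is a Wiener-style (minimum-mean-square-error) filter. The passage above does not specify a formula, so the Wiener form, the function name `equalizer_coefficients`, the `signal_var` parameter, and the 64 x 64 working grid are all assumptions of this sketch rather than the disclosed method.

```python
import numpy as np

def equalizer_coefficients(psf, noise_var, signal_var=1.0, size=9, grid=64):
    """Wiener-style equalizer taps for a PSF and a noise level (hypothetical helper).

    psf:        odd-sized 2-D array of estimated point-spread-function values
    noise_var:  estimated noise variance in the channel
    signal_var: assumed variance of the emitted signal (illustrative parameter)
    size:       odd side length of the returned coefficient matrix
    grid:       side length of the working FFT grid (assumed, must exceed size)
    """
    k0, k1 = psf.shape
    # Embed the PSF in the working grid with its center at the origin.
    kernel = np.zeros((grid, grid))
    kernel[:k0, :k1] = psf
    kernel = np.roll(kernel, (-(k0 // 2), -(k1 // 2)), axis=(0, 1))
    H = np.fft.fft2(kernel)
    # Wiener filter: inverts the PSF where the signal dominates and
    # attenuates frequencies where the noise dominates.
    W = np.conj(H) / (np.abs(H) ** 2 + noise_var / signal_var)
    taps = np.fft.fftshift(np.real(np.fft.ifft2(W)))
    c, h = grid // 2, size // 2
    # Keep only a finite window of taps around the center.
    return taps[c - h:c + h + 1, c - h:c + h + 1]
```

In this sketch, a larger `noise_var` pulls the filter away from a pure inverse of the point-spread function, which limits noise amplification at frequencies the imaging device has strongly attenuated.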
[0026] As suggested above, in one or more embodiments, the blind-equalizer sequencing system can identify or receive, for a sequencing cycle, signal values (e.g., pixel intensity, wavelength, and/or brightness values) for an expected response signal emitted by at least a cluster of oligonucleotides within a region of a nucleotide-sample slide (e.g., flow cell). Such an expected or ideal response signal represents a response signal emitted from a cluster of oligonucleotides before an imaging device disperses energy of the expected response signal through an act of capturing an image of the nucleotide-sample-slide region to which the cluster is immobilized. In certain embodiments, the blind-equalizer sequencing system can receive the signal values by determining, for a given channel, signal values that would accurately represent pixel intensity or wavelengths within one or more images of the expected response signal from an oligonucleotide cluster in a given channel — before image capture by an imaging device disperses energy of the expected response signal.
[0027] Based on the signal values corresponding to at least a cluster of oligonucleotides, the blind-equalizer sequencing system can determine estimated point-spread-function values within a channel. As discussed in more detail below, capturing an image of the expected response signal for the cluster of oligonucleotides with an imaging device distorts the expected response signal by dispersing the energy or light of the expected response signal. In one or more embodiments, the blind-equalizer sequencing system can represent the dispersed light with a point-spread function (PSF). However, when the blind-equalizer sequencing system captures an image of the expected response signal, the blind-equalizer sequencing system also typically captures the noise in the channel. Thus, in one or more embodiments, the blind-equalizer sequencing system depicts the signal values of the expected response signal in a captured image. In certain cases, the image can include the point-spread function of the expected response signal and noise within the channel.
[0028] As discussed in more detail below, the blind-equalizer sequencing system can determine the estimated point-spread-function values by converting the signal values from a spatial domain to a frequency domain and processing the signal values in the frequency domain while enforcing certain constraints on the signal values in the frequency domain. In one or more embodiments, the blind-equalizer sequencing system can utilize physical characteristics of the blind-equalizer sequencing system to simplify how the blind-equalizer sequencing system determines estimated point-spread-function values. For example, the imaging device can represent a minimum response channel where the response signals are a set of real coefficients limited to a specified area (e.g., number of pixels). In one or more embodiments, based on the features of the imaging device and an expected response signal, the blind-equalizer sequencing system can convert the signal values from a spatial domain into a frequency domain and measure the average energy (e.g., power spectral density) of the signal response in the frequency domain. Subsequently, in some cases, the blind-equalizer sequencing system can utilize the power spectral density along with Hermitian symmetry to generate real, two-dimensional, finite, symmetric estimated point-spread-function values.
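By way of illustration only, the frequency-domain estimate described above can be sketched in Python with NumPy. The sketch assumes that the point sources are sparse enough for their spectrum to be roughly flat, so that the averaged power spectral density of the image tiles is approximately proportional to the squared magnitude of the point-spread function's transform, and it assumes a zero-phase (symmetric) point-spread function. The function name `estimate_psf_from_psd` and the parameters `tiles`, `support`, and `noise_var` are hypothetical and do not come from the disclosure.

```python
import numpy as np

def estimate_psf_from_psd(tiles, support=5, noise_var=0.0):
    """Illustrative PSF estimate from the averaged power spectral density.

    tiles     : array of shape (n_tiles, height, width) for one channel
    support   : odd side length (in pixels) of the finite PSF to keep
    noise_var : assumed flat noise floor to subtract from the PSD
    """
    tiles = np.asarray(tiles, dtype=float)
    # Remove each tile's mean so the DC term does not dominate the spectrum.
    tiles = tiles - tiles.mean(axis=(1, 2), keepdims=True)
    # Average energy per frequency bin across tiles (a periodogram average).
    spectra = np.fft.fft2(tiles, axes=(1, 2))
    psd = np.mean(np.abs(spectra) ** 2, axis=0) / tiles[0].size
    psd = np.clip(psd - noise_var, 0.0, None)
    # Zero-phase assumption: a real, even magnitude spectrum transforms back
    # to a real, symmetric response in the spatial domain.
    mag = np.sqrt(psd)
    psf = np.fft.fftshift(np.real(np.fft.ifft2(mag)))
    # Keep a finite window around the zero-lag sample.
    c0, c1 = psf.shape[0] // 2, psf.shape[1] // 2
    r = support // 2
    psf = psf[c0 - r:c0 + r + 1, c1 - r:c1 + r + 1]
    # Enforce point symmetry explicitly against floating-point drift.
    psf = 0.5 * (psf + psf[::-1, ::-1])
    if psf[r, r] <= 0.0:
        raise ValueError("tiles carry no signal energy")
    # Peak-normalize so the center tap equals 1 (a choice made for this sketch).
    return psf / psf[r, r]
```

Averaging the spectra over several tiles reduces the variance of the estimate. Because the mean-removed spectrum has no DC term, the sketch normalizes by the peak rather than by the sum.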
[0029] In addition to determining estimated point-spread-function values, in some implementations, the blind-equalizer sequencing system can determine estimated noise values within the channel. As discussed in more detail below, sequencing devices have varying levels of noise and different sources of noise in the sequencing devices. Moreover, the blind-equalizer sequencing system generates equalizer coefficients that account for noise in the sequencing device because, without accounting for such noise, an equalizer will generate equalizer coefficients based on inaccurate signal-to-noise ratios. In some cases, the blind-equalizer sequencing system can determine the estimated noise values by measuring noise in the channel where a response signal is not present. In certain implementations, the blind-equalizer sequencing system can apply independent identically distributed (IID) Gaussian noise for each cluster of oligonucleotides within the region of the nucleotide-sample slide.
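As one hedged illustration of measuring noise where a response signal is not present, the following Python sketch computes the sample variance of pixels flagged as signal-free. The helper name `estimate_noise_variance` and the boolean `background_mask` are assumptions introduced for this example; how such a mask would be obtained is not specified here.

```python
import numpy as np

def estimate_noise_variance(image, background_mask):
    """Illustrative noise estimate: sample variance of signal-free pixels.

    image           : 2-D array of pixel intensities for one channel
    background_mask : boolean array, True where no response signal is expected
    """
    background = np.asarray(image, dtype=float)[background_mask]
    if background.size < 2:
        raise ValueError("need at least two background pixels")
    # ddof=1 gives the unbiased sample variance under an IID noise assumption.
    return float(np.var(background, ddof=1))
```

Under the IID Gaussian assumption mentioned above, a single variance per channel is enough to describe the noise.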
[0030] Having determined the estimated point-spread-function values and the estimated noise values, the blind-equalizer sequencing system can determine equalizer coefficients by combining the estimated point-spread-function values and the estimated noise values. In particular, the blind-equalizer sequencing system can determine equalizer coefficients that compensate for the estimated point-spread-function values and the estimated noise values with respect to the estimated response signal. For instance, during a sequencing cycle in which an expected response signal interacts with a camera or other imaging device capturing an image in a channel, the imaging device disperses energy from the expected response signal. In some embodiments, the blind-equalizer sequencing system utilizes an equalizer to generate equalizer coefficients that mitigate or reverse the dispersion effects that response signals from oligonucleotide clusters experience in a given channel. More specifically, the equalizer mitigates or reverses the effects of an imaging device dispersing the energy of the expected response signal by inverting and concentrating the energy from the expected response signal.
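One common way to combine a point-spread-function estimate with a noise estimate is a Wiener (minimum-mean-square-error) linear equalizer computed in the frequency domain. The Python sketch below uses that technique as an assumption for illustration; it is not presented as the specific combination used in any embodiment. The function name `wiener_equalizer` and the parameters `signal_var`, `shape`, and `taps` are hypothetical.

```python
import numpy as np

def wiener_equalizer(psf, noise_var, signal_var=1.0, shape=(32, 32), taps=7):
    """Illustrative Wiener equalizer taps from a PSF and a noise variance.

    psf        : 2-D array with odd side lengths, centered on its middle pixel
    noise_var  : estimated noise variance for the channel
    signal_var : assumed variance of the expected response signal
    shape      : size of the zero-padded FFT grid
    taps       : odd side length of the finite equalizer to return
    """
    psf = np.asarray(psf, dtype=float)
    # Embed the PSF in a zero-padded grid with its center at index (0, 0).
    h = np.zeros(shape)
    h[:psf.shape[0], :psf.shape[1]] = psf
    h = np.roll(h, (-(psf.shape[0] // 2), -(psf.shape[1] // 2)), axis=(0, 1))
    H = np.fft.fft2(h)
    # The noise-to-signal ratio regularizes the inverse; with noise_var == 0
    # this reduces to a plain inverse filter, which fails wherever H is zero.
    W = np.conj(H) / (np.abs(H) ** 2 + noise_var / signal_var)
    w = np.fft.fftshift(np.real(np.fft.ifft2(W)))
    # Keep a finite window of taps around the center.
    c0, c1 = shape[0] // 2, shape[1] // 2
    t = taps // 2
    return w[c0 - t:c0 + t + 1, c1 - t:c1 + t + 1]
```

A higher noise estimate shrinks the equalizer gain, which reflects the trade-off between reversing dispersion and amplifying noise.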
[0031] Based on the equalizer coefficients, the blind-equalizer sequencing system can determine a nucleobase call for one or more clusters. For example, in a same sequencing cycle or subsequent sequencing cycle, the blind-equalizer sequencing system can apply the equalizer coefficients to one or more signal values of an expected response signal from a cluster of oligonucleotides and determine accurate intensity values for the cluster of oligonucleotides. Based on the more accurate intensity values for the estimated response signal, the blind-equalizer sequencing system determines a nucleobase call for the cluster of oligonucleotides.
[0032] The blind-equalizer sequencing system provides several technical advantages over existing sequencing systems that primarily or exclusively train an equalizer across sequencing cycles. In particular, the blind-equalizer sequencing system can improve the accuracy of nucleobase calling, increase the efficiency of nucleobase calling, and improve the flexibility of sequencing systems. For example, the blind-equalizer sequencing system can improve the accuracy of a specialized or special-purpose computer (that is, a sequencing device) determining base calls for nucleobases incorporated into oligonucleotide clusters by estimating equalizer coefficients in a feedforward manner. Unlike existing sequencing systems that determine coefficients by relying on some inaccurate base calls and response signals output during sequencing cycles, the blind-equalizer sequencing system can utilize the properties of a physical sequencing device and a particular channel to determine accurate equalizer coefficients. For example, as discussed below with respect to FIGS. 7-8, the blind-equalizer sequencing system can account for a pitch and/or pattern of the nucleotide-sample slide and pixels of the image to generate the equalizer coefficients.
By estimating point-spread-function values and noise values from expected response signals of oligonucleotide clusters — rather than primarily or exclusively adjusting coefficients by determining differences or losses based on a comparison of predicted base calls and assumed base calls — the blind-equalizer sequencing system can determine equalizer coefficients that are blind to (or not directly dependent on) the characteristics (e.g., signal intensity, noise, etc.) of the base calls and/or the predicted base calls for a given sequencing cycle. Consequently, the blind-equalizer sequencing system can use equalizer coefficients that are not compromised by inaccurate base calls to measure more accurate and/or purer response signals for clusters of oligonucleotides. Based on the more accurate or purer cluster signal, the blind-equalizer sequencing system can likewise determine more accurate nucleobase calls for nucleobases incorporated by one or more clusters of oligonucleotides during a biochemical reaction.
[0033] In addition to increasing the accuracy of base calls, the blind-equalizer sequencing system can improve the computational and reagent-use efficiency of base calling. For example, the blind-equalizer sequencing system can accurately determine equalizer coefficients without going through several iterations of updating the equalizer coefficients, as existing sequencing systems do, based on predicted base calls in a series of sequencing cycles. As mentioned above, some existing sequencing systems rely on inaccurate base calls when determining coefficients for the equalizer. To compensate for relying on inaccurate base calls, such existing systems further process and update the coefficients in additional sequencing cycles or sequencing runs. But such approaches unnecessarily utilize computing resources to store, in memory, predicted base calls and corresponding losses from training and insert latencies of sequencing cycles consuming biochemical reagents to find improved coefficients for the equalizer. Moreover, some existing systems cannot identify the degree of inaccuracy of the coefficients for the equalizer, thereby making it difficult for existing systems to efficiently converge on accurate coefficients. Unlike such existing systems, the blind-equalizer sequencing system can quickly determine accurate equalizer coefficients with low error rates — without going through several iterations to initialize relatively more accurate equalizer coefficients. Moreover, unlike some existing systems that determine coefficients for the equalizer by utilizing iterations of training data, the blind-equalizer sequencing system can determine equalizer coefficients based on attributes of the sequencing system.
[0034] On top of improved base calls and processing efficiency, the blind-equalizer sequencing system improves the flexibility of sequencing systems. For instance, the blind-equalizer sequencing system provides an instrument-agnostic method for determining equalizer coefficients that is blind to (or not dependent on) specific base calls or actual response signals of oligonucleotide clusters in a specific sequencing device. By determining signal values for expected response signals of oligonucleotide clusters, and estimating point-spread-function values and noise values from such expected response signals, the blind-equalizer sequencing system can determine equalizer coefficients that are agnostic to a given sequencing device on which sequencing and base calling occurs. Regardless of the optical parameters, signal levels, temperature of a sequencing device, locations of clusters of oligonucleotides, etc., the blind-equalizer sequencing system can determine accurate equalizer coefficients that initialize point-spread-function and noise compensation of an equalizer on various sequencing devices.
[0035] As illustrated by the foregoing discussion, the present disclosure utilizes a variety of terms to describe features and advantages of the blind-equalizer sequencing system. Additional detail is now provided regarding the meaning of such terms. For example, as used herein, the term “nucleotide-sample slide” (or “nucleotide-sample substrate”) refers to a plate or substrate, such as a flow cell, comprising oligonucleotides for sequencing nucleotide sequences from genomic samples or other sample nucleic-acid polymers. In particular, a nucleotide-sample slide can refer to a substrate containing fluidic channels through which reagents and buffers can travel as part of sequencing. For example, in one or more embodiments, a flow cell (e.g., a patterned flow cell or non-patterned flow cell) may comprise small fluidic channels and oligonucleotide samples that can be bound to adapter sequences on the substrate. In other implementations, a nucleotide-sample slide can be an open substrate with one or more regions for oligonucleotide samples to be analyzed and the oligonucleotide samples may be positioned using charged pads or other means. In yet another implementation, the nucleotide-sample slide can be a membrane having a nanopore through which one or more oligonucleotide samples may pass.
[0036] As used herein, a flow cell or other nucleotide-sample slide can (i) include a device having a lid extending over a reaction structure to form a flow channel therebetween that is in communication with a plurality of reaction sites of the reaction structure and (ii) include a detection device that is configured to detect designated reactions that occur at or proximate to the reaction sites. A flow cell or other nucleotide-sample slide may include a solid-state light detection or “imaging” device, such as a Charge-Coupled Device (CCD) or Complementary Metal-Oxide Semiconductor (CMOS) (light) detection device. As one specific example, a flow cell may be configured to fluidically and electrically couple to a cartridge (having an integrated pump), which may be configured to fluidically and/or electrically couple to a bioassay system. A cartridge and/or bioassay system may deliver a reaction solution to reaction sites of a flow cell according to a predetermined protocol (e.g., sequencing-by-synthesis), and perform a plurality of imaging events. For example, a cartridge and/or bioassay system may direct one or more reaction solutions through the flow channel of the flow cell, and thereby along the reaction sites. At least one of the reaction solutions may include four types of nucleobases having the same or different fluorescent labels. The nucleobases may bind to the reaction sites of the flow cell, such as to corresponding oligonucleotides at the reaction sites. The cartridge and/or bioassay system may then illuminate the reaction sites using an excitation light source (e.g., solid-state light sources, such as light-emitting diodes (LEDs)). The excitation light may provide emission signals (e.g., light of a wavelength or wavelengths that differ from the excitation light and, potentially, each other) that may be detected by the light sensors of the flow cell.
[0037] Relatedly, as used herein, the term “region of a nucleotide-sample slide” (or “nucleotide-sample slide section”) refers to a section or part of a nucleotide-sample slide, such as a section of a surface of the nucleotide-sample slide. In particular, a region of a nucleotide-sample slide can refer to a discrete section of a nucleotide-sample slide that differs from other sections of the nucleotide-sample slide. For instance, a region of a nucleotide-sample slide can include a well (e.g., a nano-well) or wells of a patterned flow cell or a discrete subsection of a non-patterned flow cell (e.g., a subsection corresponding to a cluster). In some cases, a region of a nucleotide-sample slide includes a tile or a sub-tile having clusters of the same or similar oligonucleotide growing in parallel.
[0038] Additionally, as used herein, the term “labeled nucleobase” refers to a nucleobase having a fluorescent or light-based indicator or fluorescent dye indicator of the classification of the nucleobase. In particular, a labeled nucleobase can refer to a nucleobase that incorporates a fluorescent or light-based indicator or fluorescent dye indicator to identify the type of base (e.g., adenine, cytosine, thymine, or guanine). For example, in one or more embodiments, a labeled nucleobase includes a nucleobase having a fluorescent tag that emits a signal that either by itself or together with another fluorescent tag identifies the base type. Accordingly, a nucleobase may be identified by a mixture of dyes (or a mixture of fluorescent tags) that together indicate the nucleobase type (e.g., “ON”/ “ON” expected response signals and/or estimated response signals). Based on intensity values for a signal emitted by labeled nucleobases in a cluster of oligonucleotides, such as signals in 16 quadrature amplitude modulation (QAM) or pulse amplitude modulation (PAM) 4 format, the type of base (e.g., adenine, cytosine, thymine, or guanine) can be determined in certain embodiments of the crosstalk-aware-base-calling system.
[0039] Moreover, as used herein, the term “cluster of oligonucleotides” (or simply “cluster”) refers to a localized group or collection of DNA or RNA molecules on a nucleotide-sample slide, such as a flow cell, or other solid surface. In particular, a cluster includes tens, hundreds, thousands, or more copies of a cloned or the same DNA or RNA segment. For example, in one or more embodiments, a cluster includes a grouping of oligonucleotides immobilized in a section of a flow cell or other nucleotide-sample slide. In some embodiments, clusters are evenly spaced or organized in a systematic structure within a patterned flow cell. By contrast, in some cases, clusters are randomly organized within a non-patterned flow cell. A cluster of oligonucleotides can be imaged utilizing one or more light signals. For instance, an oligonucleotide-cluster image may be captured by a camera during a sequencing cycle of light emitted by irradiated fluorescent tags incorporated into oligonucleotides from one or more clusters on a flow cell.
[0040] As further used herein, the term “response signal” refers to a signal emitted, reflected, or otherwise communicated from a labeled nucleobase or a group of labeled nucleobases (e.g., labeled nucleobases added to a cluster of oligonucleotides). In particular, a response signal can refer to a signal indicating the type of base. For example, a response signal can include a light signal emitted or reflected from a fluorescent tag of a nucleobase or fluorescent tags of multiple nucleobases incorporated into oligonucleotides. As indicated above, a nucleobase incorporated into a cluster may (in response to a laser) likewise emit a signal that can be identified as a mixture of dyes (or a mixture of fluorescent tags) that together indicate the nucleobase type (e.g., a cluster with “ON”/ “ON” illumination indicators). In some implementations, the blind-equalizer sequencing system triggers the response signal through an external stimulus, such as a laser or other light source. In some cases, the blind-equalizer sequencing system triggers the response signal through some internal stimuli. Further, in some embodiments, the blind-equalizer sequencing system observes the response signal using a filter applied when capturing an image of the nucleotide-sample slide (e.g., section of the nucleotide-sample slide). As suggested above, in certain instances, a response signal includes an aggregate of the signals provided by each labeled nucleobase added to individual oligonucleotides in a cluster of oligonucleotides.
[0041] Relatedly, as used herein the term “expected response signal” refers to an expected signal emitted, reflected, or otherwise communicated from a labeled nucleobase or a group of labeled nucleobases (e.g., labeled nucleobases added to a cluster of oligonucleotides). In particular, an expected response signal can represent input data (e.g., sparse point sources, impulse responses) captured by a camera or other imaging device in one or more channels — before the camera or imaging device disperses the energy emitted by the labeled nucleobase(s). As explained further below, in some cases, an expected response signal is represented as a value or multiple values within a matrix, such as a matrix X.
[0042] As used herein, the term “estimated response signal” refers to a signal or indicator communicated from a labeled nucleobase or a group of labeled nucleobases (e.g., labeled nucleobases added to a cluster of oligonucleotides) transmitted, transferred, or otherwise relayed through an equalizer. For example, in some embodiments, the blind-equalizer sequencing system generates the estimated response signal by applying the equalizer coefficients (e.g., weights) to pixel intensities. In some embodiments, the estimated response signal can be an output of the equalizer for any position and/or location of one or more clusters of oligonucleotides. In some cases, the estimated response signal mimics the expected response signal from one or more clusters of oligonucleotides. Additionally, in some implementations, the estimated response signal can take on a binary format such as “0 or 1” or “ON or OFF.” In certain cases, the estimated response signal is a two-dimensional sparse matrix. As explained further below, in some cases, an estimated response signal is represented as a value or multiple values within a matrix, such as a matrix Z.
[0043] As used herein, the term “channel” refers to a range or filter of light, intensity, or color used to transmit, detect, and/or measure a response signal from a cluster of oligonucleotides. For example, the channel can include a particular range of light, intensity, or color of a laser used to elicit a fluorescent signal from fluorescent tags on nucleobases incorporated into oligonucleotides within a cluster. In some embodiments, the blind-equalizer sequencing system utilizes a two-channel implementation by, for instance, using two different ranges of light, intensities, or colors to elicit signals from clusters per sequencing cycle and capturing two corresponding images of a region of a nucleotide-sample slide per sequencing cycle. The first and second images can capture the intensity values of the emitted response signal from the clusters that correspond to first and second light ranges. In some cases, an equalizer corresponds to or is specific to a given channel. For example, a first channel corresponds to a first equalizer and a second channel corresponds to a second equalizer. In some embodiments, the blind-equalizer sequencing system can utilize a single-channel implementation, three-channel implementation, or four-channel implementation. In one or more cases, the channel can take on a matrix form describing the estimated response signal in multiple channels.
[0044] As used herein, the term “signal value” refers to a value indicating the intensity and/or energy emitted, reflected, or otherwise communicated from an expected response signal. In particular, a signal value measures energy emitted by labeled nucleobases incorporated by a cluster of oligonucleotides — as indicated by both a point-spread function and noise within a channel and/or sequencing device. In some embodiments, the signal value can refer to a value and/or measurement associated with a color intensity (e.g., wavelength) or a light intensity (e.g., brightness) of one or more pixels from an image of one or more expected response signals in the channel. In some cases, the blind-equalizer sequencing system captures several images of one or more clusters of oligonucleotides with labeled nucleobases using different filters (or intensity channels). Thus, a signal value can correspond to the intensity of the expected response signal as observed through a particular filter.
[0045] As used herein, the term “transmission medium” refers to a system or substance that acts as a pathway for transmitting or communicating information. In some embodiments, the transmission medium is a camera or other imaging device that transmits an expected response signal for a cluster of oligonucleotides from a nucleotide-sample slide to an equalizer by capturing an image of the expected response signal. Relatedly, the term “imaging device” refers to a device or sensor that detects, captures, and/or conveys information in the form of a visual image. For example, in some embodiments, a camera or other imaging device can capture an image of the expected response signal of one or more clusters emitted during SBS and transmit the image of the expected response signal to the equalizer. As mentioned above and discussed in more detail below, the transmission medium can distort and/or disperse the energy or light of response signals in a captured image.
[0046] As used herein, the term “point-spread function” (or simply “PSF”) refers to a function that describes a response of an imaging device or other optical system to a point source. In some embodiments, the point-spread function can measure the response — that is, a dispersion of energy from a response signal — caused by an imaging device capturing an image in a channel. For instance, in some cases, the point-spread function shows how light emitted from an input response signal blurs and/or spreads when the blind-equalizer sequencing system captures an image of the input response with an imaging device within a channel.
[0047] Relatedly, the term “estimated point-spread-function values” refers to estimated values representing the response of an imaging device (e.g., a form of transmission medium) to an expected response signal. For instance, the blind-equalizer sequencing system can use autocorrelation and physical aspects of a sequencing device to determine the estimated point-spread-function values for the expected response signal in a given channel. As a further example, the blind-equalizer sequencing system can convert the signal values of the expected response signal from a spatial domain to a frequency domain, and in some embodiments, based on the regular spacing of the expected response signals from one or more clusters of oligonucleotides, the blind-equalizer sequencing system can measure the signal values (e.g., signal energy) of one or more clusters of oligonucleotides in the frequency domain. Moreover, the blind-equalizer sequencing system can force the expected response signal to be real and symmetric by utilizing Hermitian symmetry in the frequency domain. As mentioned above, in one or more embodiments, the estimated point-spread-function values correspond to a specific channel. For instance, in some cases, first estimated point-spread-function values correspond to a first channel and second estimated point-spread-function values correspond to a second channel. In certain implementations, the blind-equalizer sequencing system utilizes the first estimated point-spread-function values to determine equalizer coefficients for the first channel and the second estimated point-spread-function values to determine equalizer coefficients for the second channel.
[0048] As used herein, the term “estimated noise values” refers to an estimated amount of interference or distortion affecting a quality or quantification of a response signal (e.g., expected response signal). For example, in some embodiments, the estimated noise values can be estimated background noise associated with a sequencing system. In some cases, the estimated noise values can be independent identically distributed (IID) Gaussian noise for each cluster of oligonucleotides within a region of a nucleotide-sample slide.
[0049] As used herein, the term “equalizer coefficients” refers to weights or values applied to an image of clusters of oligonucleotides that adjust for (or reduce) noise from adjacent clusters of oligonucleotides and/or the distortion and/or dispersion effects of a transmission medium. Accordingly, an equalizer coefficient can include a weighted value that, when applied to an image of oligonucleotide clusters, adjusts for inter-symbol interference (e.g., crosstalk) of one cluster of oligonucleotides on a target cluster of oligonucleotides. For example, in one or more embodiments, the equalizer coefficients apply weighted values to the image to measure one or more signal values of the target cluster of oligonucleotides while minimizing the signal values of the neighboring clusters of oligonucleotides. As a further example, the equalizer coefficients can represent an equalizer response that, when applied to one or more signal values, generates an estimated response signal (e.g., output) that mimics the expected response signal (input) by mitigating or reversing the effects of an imaging device on the expected response signal while capturing an image of the expected response signal. In one or more embodiments, the blind-equalizer sequencing system utilizes an equalizer to apply equalizer coefficients to an image with pixels that represent one or more signal values from one or more clusters of oligonucleotides. As discussed in more detail below, in certain implementations, the equalizer coefficients, when applied to pixels of the image, minimize the least mean square between the expected response signal and the estimated response signal for one or more clusters of oligonucleotides.
[0050] In one or more embodiments, equalizer coefficients can be pixel coefficients. As used herein, the term “pixel coefficients” refers to weighted coefficients that mix and/or combine one or more signal values of pixels that depict expected response signals from one or more clusters of oligonucleotides. For example, the blind-equalizer sequencing system can multiply signal values of one or more pixels with the pixel coefficients and calculate a weighted sum of the signal values of the pixels. The blind-equalizer sequencing system can use the weighted sum of the signal values to make a base call.
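The weighted sum described above can be sketched in Python as follows. This is a minimal illustration that assumes the pixel coefficients form a small odd-sized window centered on a known cluster location; the function name `cluster_intensity` and its parameters are hypothetical.

```python
import numpy as np

def cluster_intensity(image, coeffs, row, col):
    """Illustrative weighted sum of pixel signal values around one cluster.

    image  : 2-D array of pixel signal values for one channel
    coeffs : 2-D array of pixel coefficients with odd side lengths
    row, col : pixel location of the cluster center
    """
    image = np.asarray(image, dtype=float)
    coeffs = np.asarray(coeffs, dtype=float)
    t0, t1 = coeffs.shape[0] // 2, coeffs.shape[1] // 2
    # Reject locations whose coefficient window would leave the image.
    if (row - t0 < 0 or col - t1 < 0
            or row + t0 >= image.shape[0] or col + t1 >= image.shape[1]):
        raise ValueError("cluster too close to the image border")
    patch = image[row - t0:row + t0 + 1, col - t1:col + t1 + 1]
    # Multiply each pixel by its coefficient and sum the products.
    return float(np.sum(patch * coeffs))
```

The returned scalar is the per-channel intensity value that a downstream base-calling step would consume.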
[0051] As further used herein, the term “image matrix” refers to a matrix that includes one or more values, such as equalizer coefficients, that adjust (or reduce) intensity data from an image for noise or distortion. Accordingly, an image matrix can include equalizer coefficients that increase or maximize a signal-to-noise ratio of an expected response signal affected by noise and/or crosstalk. In some embodiments, the blind-equalizer sequencing system can modify the intensity data (e.g., signal values) by applying the image matrix to the pixels depicting signal values of the expected response signals. In one or more implementations, the image matrix constitutes an image mask that applies to pixel values in an image.
[0052] As used herein, the term “equalizer” refers to a model or system that can use a function to convert dispersed energy of a response signal into values representing one or more estimated response signals from one or more clusters of oligonucleotides and/or reduce noise that is part of such a response signal. In particular, an equalizer includes a model that converts received dispersed-over-pixels intensity energy (e.g., signal values) into an intensity value representing light emitted from a cluster and intensity values for adjacent clusters by linearly weighting pixel intensities and/or energy. For example, in some cases, the equalizer receives an input image and gathers signal values (e.g., light energy) across pixels in the image and converts the energy to an intensity value for one or more clusters during a sequencing cycle in a channel. In some embodiments, the equalizer utilizes equalizer coefficients (e.g., an image matrix comprising equalizer coefficients) to increase or maximize the signal-to-noise ratio of the intensity data by weighting signal values depicted in the image to determine a weighted sum of the signal values of the pixels. Accordingly, in one or more embodiments, the equalizer can combine the signal values from one or more clusters of oligonucleotides to increase or maximize one or more signal values of a target cluster of oligonucleotides and minimize the signal values (e.g., crosstalk) from adjacent clusters of oligonucleotides while accounting for the amplified noise in a sequencing device. In some embodiments, the equalizer is a linear equalizer that utilizes a linear filter that can be designed or optimized to filter out noise. In some embodiments, the linear filter can be applied to each cluster individually or across an entire image. In one or more embodiments, the equalizer can utilize different equalizer coefficients for different channels.
[0053] As used herein, the term “estimated cluster locations” refers to an approximated position of clusters of oligonucleotides and/or nanowell locations holding clusters of oligonucleotides on a nucleotide-sample slide. In one or more embodiments, the blind-equalizer sequencing system determines the estimated cluster locations based on the configuration of the nucleotide-sample slide. For example, in some implementations, the nucleotide-sample slide is patterned and distributes clusters of oligonucleotides across the nucleotide-sample slide according to a patterned arrangement. Alternatively, in one or more cases, the nucleotide-sample slide can have an unpatterned arrangement and randomly distribute clusters of oligonucleotides over the nucleotide-sample slide.
[0054] As used herein, the term “nucleobase call” (or simply “base call”) refers to a determination or prediction of a particular nucleobase (or nucleobase pair) for an oligonucleotide (e.g., nucleotide read) during a sequencing cycle or for a genomic coordinate of a sample genome. In particular, a nucleobase call can indicate a determination or prediction of the type of nucleobase that has been incorporated within an oligonucleotide on a nucleotide-sample slide (e.g., read-based nucleobase calls). In some cases, for a nucleotide read, a nucleobase call includes a determination or a prediction of a nucleobase based on intensity values resulting from fluorescent-tagged nucleotides added to an oligonucleotide of a nucleotide-sample slide (e.g., in a cluster of a flow cell). As suggested above, a single nucleobase call can be an adenine (A) call, a cytosine (C) call, a guanine (G) call, a thymine (T) call, or a uracil (U) call.
[0055] Additionally, as used herein, the term “sequencing cycle” (or “cycle”) refers to an iteration of adding or incorporating a nucleobase to an oligonucleotide or an iteration of adding or incorporating nucleobases to oligonucleotides in parallel. In particular, a cycle can include an iteration of taking and analyzing one or more images with data indicating individual nucleobases added or incorporated into an oligonucleotide or to oligonucleotides in parallel. Accordingly, cycles can be repeated as part of sequencing a nucleic-acid polymer. For example, in one or more embodiments, each sequencing cycle involves either single reads in which DNA or RNA strands are read in only a single direction or paired-end reads in which DNA or RNA strands are read from both ends. Further, in certain cases, each sequencing cycle involves a camera taking an image of the nucleotide-sample slide or multiple sections of the nucleotide-sample slide to generate image data for determining a particular nucleobase added or incorporated into particular oligonucleotides. Following the image capture stage, a sequencing system can remove certain fluorescent labels from incorporated nucleobases and perform another sequencing cycle until the nucleic-acid polymer has been completely sequenced. In one or more embodiments, a sequencing cycle includes a cycle within a Sequencing By Synthesis (SBS) run.
[0056] Additional detail will now be provided regarding the blind-equalizer sequencing system in relation to illustrative figures portraying example embodiments and implementations of the blind-equalizer sequencing system. For example, FIG. 1 illustrates a schematic diagram of a computing system 100 in which a blind-equalizer sequencing system 106 operates in accordance with one or more embodiments. As illustrated, the computing system 100 includes one or more server device(s) 102 connected to a user client device 108 and a sequencing device 114 via a network 112. While FIG. 1 shows an embodiment of the blind-equalizer sequencing system 106, alternative embodiments and configurations are possible.
[0057] As further shown in FIG. 1, the server device(s) 102, the user client device 108, and the sequencing device 114 are connected via the network 112. Each of the components of the computing system 100 can communicate via the network 112. The network 112 comprises any suitable network over which computing devices can communicate. Example networks are discussed in additional detail below in relation to FIG. 11.
[0058] As further shown in FIG. 1, the computing system 100 includes the sequencing device 114. The sequencing device 114 comprises a device for sequencing a whole genome or other nucleic-acid polymer. In some embodiments, the sequencing device 114 analyzes samples to generate data utilizing computer implemented methods and systems described herein either directly or indirectly on the sequencing device 114. In one or more embodiments, the sequencing device 114 utilizes Sequencing By Synthesis (SBS) to sequence whole genomes or other nucleic-acid polymers. As shown, in some embodiments, the sequencing device 114 bypasses the network 112 and communicates directly with the user client device 108.
[0059] As further depicted by FIG. 1, the computing system 100 includes the server device(s) 102. The server device(s) 102 may generate, receive, analyze, store, and transmit electronic data, such as data for sequencing nucleic-acid polymers. The server device(s) 102 may receive data from the sequencing device 114. For example, the server device(s) 102 may gather and/or receive sequencing data including nucleobase call data, quality data, and other data relevant to sequencing nucleic-acid polymers. The server device(s) 102 may also communicate with the user client device 108. In particular, the server device(s) 102 can send read data, nucleic-acid polymer sequences, error data, and other information to the user client device 108. In some embodiments, the server device(s) 102 comprise distributed servers, where the server device(s) 102 include a number of server devices distributed across the network 112 and located in different physical locations. The server device(s) 102 can comprise a content server, an application server, a communication server, a web-hosting server, or another type of server.
[0060] As further shown in FIG. 1, the server device(s) 102 can include a sequencing system 104. Generally, the sequencing system 104 analyzes sequencing data received from the sequencing device 114 to determine nucleotide sequences for whole genomic samples or other nucleic-acid polymers. For example, the sequencing system 104 can receive raw data (e.g., base-call data for nucleotide reads) from the sequencing device 114 and determine a nucleic acid sequence for a genomic sample. To illustrate, the sequencing system 104 can receive data for nucleotide reads from the sequencing device 114, and the sequencing system 104 generates variant calls (or other nucleobase calls) for a genomic sample from the nucleotide reads. In some embodiments, the sequencing system 104 determines the sequences of nucleobases in DNA and/or RNA.
[0061] As further illustrated in FIG. 1, the sequencing device 114 includes the blind-equalizer sequencing system 106. Generally, the blind-equalizer sequencing system 106 determines equalizer coefficients that compensate for the estimated point-spread-function values and estimated noise values and that minimize a mean squared error or other measured difference between the expected response signal and the estimated response signal for at least a cluster of oligonucleotides. More specifically, in some embodiments, the blind-equalizer sequencing system 106 receives signal values (e.g., intensity values) for at least a cluster of oligonucleotides in a given sequencing cycle. The blind-equalizer sequencing system 106 determines (i) estimated point-spread-function values based on the signal values corresponding to one or more clusters of oligonucleotides and (ii) estimated noise values for a given channel. The blind-equalizer sequencing system 106 further determines equalizer coefficients by combining the estimated point-spread-function values and the estimated noise values within the channel with respect to one or more clusters of oligonucleotides. The blind-equalizer sequencing system 106 further determines a base call for one or more clusters of oligonucleotides by utilizing the equalizer coefficients.
[0062] The computing system 100 illustrated in FIG. 1 further includes the user client device 108. The user client device 108 can generate, store, receive, and send digital data. In particular, the user client device 108 can receive sequencing data from the sequencing device 114. Furthermore, the user client device 108 may communicate with the server device(s) 102 to receive nucleobase calls, nucleotide sequences, and variant call files. The user client device 108 can present sequencing data to a user associated with the user client device 108.
[0063] The user client device 108 illustrated in FIG. 1 may comprise various types of client devices. For example, in some embodiments, the user client device 108 includes non-mobile devices, such as desktop computers or servers, or other types of client devices. In yet other embodiments, the user client device 108 includes mobile devices, such as laptops, tablets, mobile telephones, smartphones, etc. Additional details with regard to the user client device 108 are discussed below with respect to FIG. 11.
[0064] As further illustrated in FIG. 1, the user client device 108 includes a sequencing application 110. The sequencing application 110 may be a web application or a native application on the user client device 108 (e.g., a mobile application, desktop application, etc.). The sequencing application 110 can comprise instructions that (when executed) cause the user client device 108 to receive or request data from the blind-equalizer sequencing system 106 and present sequencing data. Furthermore, the sequencing application 110 can comprise instructions that (when executed) cause the user client device 108 to provide a graphical visualization of a read pileup or read alignment for nucleotide reads for a genomic sample.
[0065] As further illustrated in FIG. 1, the blind-equalizer sequencing system 106 may be located on the user client device 108 as part of the sequencing application 110. As illustrated, in some embodiments, the blind-equalizer sequencing system 106 is implemented by (e.g., located entirely or in part on) the user client device 108. In yet other embodiments, the blind-equalizer sequencing system 106 is implemented by one or more other components of the computing system 100. In particular, the blind-equalizer sequencing system 106 can be implemented in a variety of different ways across the server device(s) 102, the user client device 108, and the sequencing device 114. In one example, the blind-equalizer sequencing system 106 is located in part on the sequencing device 114 and also the server device(s) 102. In particular, the blind-equalizer sequencing system 106 can determine equalizer coefficients that compensate for the estimated point-spread-function values and the estimated noise values on the sequencing device 114 and make a base call for at least the cluster of oligonucleotides utilizing the equalizer coefficients as part of the server device(s) 102.
[0066] Though FIG. 1 illustrates the components of computing system 100 communicating via the network 112, in some embodiments, the components of computing system 100 communicate directly with each other, bypassing the network. For instance, and as previously mentioned, the user client device 108 can communicate directly with the sequencing device 114. Additionally, the user client device 108 can communicate directly with the blind-equalizer sequencing system 106, bypassing the network 112. Moreover, the blind-equalizer sequencing system 106 can access one or more databases housed on the server device(s) 102 or elsewhere in the computing system 100.
[0067] The following paragraphs provide further details concerning the blind-equalizer sequencing system 106. In accordance with one or more embodiments, FIG. 2 depicts an overview of the blind-equalizer sequencing system 106 generating equalizer coefficients and determining a base call for one or more clusters of oligonucleotides utilizing the equalizer coefficients. As an overview of FIG. 2, the blind-equalizer sequencing system 106 performs a series of acts that includes an act 202 of receiving signal values for an expected response signal from one or more clusters of oligonucleotides, an act 204 of determining estimated point-spread-function values, an act 206 of determining estimated noise values, an act 208 of determining equalizer coefficients, and an act 210 of determining a base call utilizing the equalizer coefficients.
[0068] As just mentioned, FIG. 2 illustrates the act 202 of receiving signal values for an expected response signal from one or more clusters of oligonucleotides. In some embodiments, the blind-equalizer sequencing system 106 may receive signal values for an expected response signal by capturing expected response signals for one or more clusters of oligonucleotides that the blind-equalizer sequencing system 106 excites with a laser (e.g., light). For example, during a sequencing cycle, the blind-equalizer sequencing system 106 can direct a light source with a specified wavelength at a nucleotide-sample slide (or portion of the nucleotide-sample slide) and capture with a camera or other imaging device an image of the clusters within the nucleotide-sample slide emitting an expected response signal. In some embodiments, the blind-equalizer sequencing system 106 captures multiple images of clusters emitting expected response signals. For instance, the blind-equalizer sequencing system 106 can capture multiple images using various filters or imaging devices. To illustrate, in some embodiments, the blind-equalizer sequencing system 106 utilizes a two-channel implementation by capturing two images of a section of the nucleotide-sample slide per sequencing cycle. In particular, the blind-equalizer sequencing system 106 captures a first image using a first filter and captures a second image using a second filter. The first and second images can capture the intensity of the emitted signal from one or more clusters that correspond to the filter.
[0069] In the alternative to two-channel sequencing, the blind-equalizer sequencing system 106 can implement sequencing runs using other channel-based approaches. In some implementations, the blind-equalizer sequencing system 106 utilizes a four-channel implementation and captures four different images of the section of the flow cell. Similar to the two-channel implementation, the blind-equalizer sequencing system 106 can capture each image for the four-channel implementation using a different filter. Each image can capture an intensity of the emitted signal (e.g., estimated response signal) based on the filter used for that image. Thus, in some cases, each of the four images depicts the emitted signal with a different intensity. Additionally, the blind-equalizer sequencing system 106 can utilize a three-channel implementation and capture three images of the section of the nucleotide-sample slide, each using a specific filter to capture the intensity of the emitted signal.
[0070] As further indicated in FIG. 2, the blind-equalizer sequencing system can perform an act 204 of determining estimated point-spread-function values. As indicated above, the estimated point-spread-function values estimate how the camera or imaging device distorts and/or disperses the expected response signal while capturing an image of the cluster of oligonucleotides emitting light in a given channel. In some embodiments, based on the captured image depicting signal values corresponding to one or more clusters, the blind-equalizer sequencing system can determine the estimated point-spread-function values by leveraging characteristics of the imaging device, sequencing device, and/or the expected response signal. For example, in some embodiments, the blind-equalizer sequencing system 106 can assume that the clusters of oligonucleotides depicted in a captured image are centered on pixels and regularly spaced according to the layout of the nucleotide-sample slide by combining the captured image with the arrangement of the nucleotide-sample slide as discussed in more detail below. In some cases, since the blind-equalizer sequencing system depicts the signal values in a captured image, the blind-equalizer sequencing system 106 receives the signal values in a spatial domain (e.g., two-dimensional matrix depicting the intensity of pixels in an image). As discussed in more detail below, in one or more embodiments, the blind-equalizer sequencing system 106 can analyze the signal values in the captured images by converting the image from the spatial domain to a frequency domain, which can be an alternate representation of the signal values of the expected response signal. In certain implementations, the blind-equalizer sequencing system can determine a power spectral density for the signal values within the frequency domain and determine the estimated point-spread-function values by converting the power spectral density from the frequency domain to the spatial domain with an inverse fast Fourier transformation.
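The frequency-domain step described above can be sketched as follows. This is a minimal illustration, assuming NumPy and a single-channel image held as a two-dimensional array; the function names `power_spectral_density` and `psd_to_spatial` and the mean-removal step are assumptions for illustration and do not come from the specification. Note that the inverse transform of a power spectral density is, mathematically, the circular autocorrelation of the image; how the estimated point-spread-function values are refined from that result is not specified in this passage.

```python
import numpy as np

def power_spectral_density(image):
    """Return the 2-D power spectral density |FFT(image)|^2 / N."""
    image = np.asarray(image, dtype=float)
    # Remove the mean (DC offset) first; this is an illustrative assumption.
    spectrum = np.fft.fft2(image - image.mean())
    return (np.abs(spectrum) ** 2) / image.size

def psd_to_spatial(psd):
    """Convert a PSD back to the spatial domain with an inverse FFT.

    The result is the circular autocorrelation of the (mean-removed) image,
    shifted so that the zero-lag term sits at the centre of the array.
    """
    spatial = np.fft.ifft2(psd).real
    return np.fft.fftshift(spatial)
```

The zero-lag term of the returned array equals the variance of the image, which gives a quick sanity check on the scaling.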
[0071] As further shown in FIG. 2, the blind-equalizer sequencing system can perform an act 206 of determining estimated noise values. For example, in some embodiments, the blind-equalizer sequencing system 106 can measure the power of the expected response signal at the corners of the power spectral density grid. Based on the measured power, the blind-equalizer sequencing system 106 can determine the estimated noise values. In some cases, the estimated noise values include independent identically distributed (IID) Gaussian noise, and the blind-equalizer sequencing system can use the IID Gaussian noise as the estimated noise values.
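One way to read the corner measurement is sketched below, assuming NumPy. With the zero-frequency term shifted to the centre of the grid, the corners hold the highest spatial frequencies, where an optically blurred signal carries little power and white noise dominates. The function name `estimate_noise_power` and the `corner` patch size are hypothetical, and the normalization is one choice among several.

```python
import numpy as np

def estimate_noise_power(image, corner=2):
    """Estimate white-noise power from the corners of the centred PSD grid."""
    image = np.asarray(image, dtype=float)
    psd = np.abs(np.fft.fft2(image)) ** 2 / image.size
    centred = np.fft.fftshift(psd)  # DC at the centre, highest frequencies at the corners
    c = corner
    corners = np.concatenate([
        centred[:c, :c].ravel(), centred[:c, -c:].ravel(),
        centred[-c:, :c].ravel(), centred[-c:, -c:].ravel(),
    ])
    # For IID noise with variance s^2, the expected PSD value under this scaling is s^2.
    return corners.mean()
```

Because a constant offset only affects the zero-frequency bin at the centre of the grid, the estimate is insensitive to a DC offset in the image.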
[0072] As further shown in FIG. 2, after determining the estimated point-spread-function values and the estimated noise values, the blind-equalizer sequencing system can perform an act 208 of determining equalizer coefficients. For example, the blind-equalizer sequencing system 106 can determine the equalizer coefficients based on combining the estimated point-spread-function values and the estimated noise values of the given channel. As described further below, in some embodiments, the blind-equalizer sequencing system 106 determines the equalizer coefficients based on an assumption that the expected response signal equals the estimated response signal. For example, based on the expected response signal equaling the estimated response signal, the blind-equalizer sequencing system 106 can determine equalizer coefficients that minimize the mean squared error between the expected response signal (e.g., system input) combined with the estimated point-spread-function values and estimated noise values and the estimated response signal (e.g., system output).
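To make the minimum-mean-squared-error idea concrete, the sketch below uses a textbook frequency-domain Wiener filter as a stand-in, assuming NumPy. It is not the construction described in the specification, which is discussed with regard to FIG. 8. The names `mmse_equalizer` and `equalize`, and the `signal_power` parameter, are hypothetical, and the blur is treated as circular for simplicity.

```python
import numpy as np

def mmse_equalizer(psf, noise_power, signal_power, shape):
    """Frequency-domain MMSE (Wiener) equalizer for a given PSF estimate."""
    psf = np.asarray(psf, dtype=float)
    kh, kw = psf.shape
    # Zero-pad the PSF to the image shape and move its centre to the origin
    # so that the equalizer introduces no spatial shift.
    padded = np.zeros(shape)
    padded[:kh, :kw] = psf
    padded = np.roll(padded, (-(kh // 2), -(kw // 2)), axis=(0, 1))
    H = np.fft.fft2(padded)
    # The noise-to-signal ratio regularizes the inverse where |H| is small,
    # which avoids amplifying noise the way a plain 1/H inverse would.
    return np.conj(H) / (np.abs(H) ** 2 + noise_power / signal_power)

def equalize(image, W):
    """Apply equalizer coefficients W (frequency domain) to an image."""
    return np.fft.ifft2(np.fft.fft2(image) * W).real
```

When the noise power approaches zero the filter reduces to a direct inverse of the point-spread function; larger noise estimates trade some residual blur for less noise amplification.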
[0073] After determining the equalizer coefficients, the blind-equalizer sequencing system 106 performs an act 210 of determining a base call utilizing the equalizer coefficients. For example, the blind-equalizer sequencing system 106 can apply the equalizer coefficients to the signal values depicted by pixels in an image of the region of the nucleotide-sample slide and generate an estimated response signal (e.g., output) that conveys more accurate intensity values by reducing the amount of crosstalk from adjacent clusters of oligonucleotides. Based on intensity values of one or more clusters of oligonucleotides having less inter-cluster interference, the blind-equalizer sequencing system 106 can make a more accurate nucleobase call for one or more clusters of oligonucleotides.
[0074] As indicated above, the blind-equalizer sequencing system 106 can determine base calls and corresponding response signals for a cluster of oligonucleotides. In accordance with one or more embodiments, FIGS. 3A-3B show the blind-equalizer sequencing system 106 capturing an image of expected response signals and determining a nucleobase call based on the estimated response signals for a cluster of oligonucleotides in different channels for a given sequencing cycle.
[0075] As mentioned above, an estimated response signal and an expected response signal indicate whether and/or to what degree a cluster provides a fluorescent response in a given channel during sequencing. However, as mentioned above, when the blind-equalizer sequencing system 106 captures an image of the expected response signal with a camera or other imaging device, the camera or imaging device disperses the light emitted by the expected response signal. FIG. 3A describes how the blind-equalizer sequencing system 106 captures an image of the expected response signal and how the camera or imaging system affects the expected response signal in accordance with one or more embodiments.
[0076] As shown in FIG. 3A, the blind-equalizer sequencing system 106 can include a nucleotide-sample slide 306 with one or more clusters of oligonucleotides 308. As discussed above, in some embodiments, the blind-equalizer sequencing system 106 can generate an expected response signal 310 by exciting a fluorescent tag with a laser. As further shown in FIG. 3A, the blind-equalizer sequencing system 106 can transmit or communicate the expected response signal 310 from the nucleotide-sample slide 306 to an equalizer by capturing an image 312 of the expected response signal 310 with a camera 302 or other imaging device. Generally, when the blind-equalizer sequencing system 106 transmits the expected response signal 310 with the camera 302, the camera 302 distorts the expected response signal 310 by dispersing the energy or light of the expected response signal 310. As indicated above, the equalizer aims to generate an accurate representation (e.g., estimated response signal) of the expected response signal 310 by undoing the distorting effects of the camera 302 while accounting for noise in the sequencing device. Generally, the blind-equalizer sequencing system 106 and equalizer undo the distortion effects of the camera 302 by identifying characteristics of the camera 302 and generating a mathematical model (e.g., matrix model) that represents the characteristics of the camera 302.
[0077] In many instances, identifying the characteristics of the camera 302 and generating a matrix model for the camera 302 can be complex, time consuming, and computationally intensive. However, the blind-equalizer sequencing system 106 circumvents these issues by simplifying the characteristics of the camera 302 and the expected response signal 310 (e.g., a transmitted, expected response signal). For example, the blind-equalizer sequencing system 106 can represent the expected response signal 310 for the cluster of oligonucleotides 304 as binary decisions (e.g., the cluster is either on or off). As discussed in more detail below, by representing expected response signals (e.g., including the expected response signal 310) for one or more clusters of oligonucleotides in a single channel as a 0 or 1, the blind-equalizer sequencing system 106 can generate a two-dimensional matrix representing the expected response signals of clusters of oligonucleotides within a region of the nucleotide-sample slide. In certain implementations, the blind-equalizer sequencing system 106 can utilize the two-dimensional matrix to reduce the interference between adjacent clusters and determine the estimated point-spread-function values. For instance, in some embodiments each expected response signal 310 for each cluster of the one or more clusters of oligonucleotides 308 in image 312 can have a corresponding PSF. Moreover, in one or more embodiments, the PSFs of neighboring clusters overlap. As another example, the blind-equalizer sequencing system 106 can simplify the characteristics of the camera 302 by representing the expected response signal 310 as finite and real. In other words, the blind-equalizer sequencing system 106 can generate an image 312 where the area of the expected response signal 310 is limited to a certain number of pixels.
[0078] By limiting the area of the expected response signal 310, the blind-equalizer sequencing system 106 can identify a characteristic of the camera 302. More specifically, the blind-equalizer sequencing system 106 can determine that the camera 302 is a minimum phase channel. For example, based on a minimum phase channel, the blind-equalizer sequencing system 106 can determine that the channel has causal and stable characteristics that make the channel’s inverse system unique (e.g., by applying a multiplicative inverse operation) that can be used to estimate channel-specific equalizer coefficients for an image depicting a region of a nucleotide-sample slide. After determining estimated point-spread-function values, the blind-equalizer sequencing system 106 can determine such equalizer coefficients based on estimated point-spread-function values, estimated noise values, and estimated cluster locations. For instance, the blind-equalizer sequencing system 106 can utilize the unique causality and stability characteristics of the channel — and combine sampled values of the estimated point-spread-function values and estimated noise values by applying a distribution function and/or a multiplicative inverse operation — to determine equalizer coefficients. Further detail regarding the blind-equalizer sequencing system 106 applying such a distribution function (e.g., Delta distribution function) and/or a multiplicative inverse operation (e.g., inv) in the context of determining equalizer coefficients is described below with regard to FIG. 8.
[0079] As mentioned above, the blind-equalizer sequencing system 106 can determine whether the estimated response signal is on or off. FIG. 3B shows the on/off status of sets of estimated response signals and expected response signals in two different intensity channels for a cluster of oligonucleotides corresponding to a particular type of nucleotide base in accordance with one or more embodiments. To illustrate, FIG. 3B depicts light intensity in a particular frequency (e.g., frequency band) emitting or not emitting from the cluster of oligonucleotides 322 in cropped images shown in rows alongside nucleobase calls of adenine (A) 328, cytosine (C) 330, thymine (T) 332, and guanine (G) 334.
[0080] For example, as shown in FIG. 3B, when making a nucleobase call of adenine (A) 328 for the cluster of oligonucleotides 322, the blind-equalizer sequencing system 106 determines that the expected response signals and the estimated response signals indicate that the cluster of oligonucleotides 322 is “on” (e.g., illuminated or emits light intensity in a particular frequency) in both a first channel captured by a first-channel image 324 and a second channel captured by a second-channel image 326. When making a nucleobase call of a cytosine (C) 330 for the cluster of oligonucleotides 322, by contrast, the blind-equalizer sequencing system 106 determines that an expected response signal and/or the estimated response signal of the cluster of oligonucleotides 322 is “on” in the first channel captured by the first-channel image 324 and “off” (e.g., not illuminated or not emitting light intensity in a particular frequency) in the second channel captured by the second-channel image 326. When making a nucleobase call of a thymine (T) 332 for the cluster of oligonucleotides 322, the blind-equalizer sequencing system 106 determines the expected response signal and/or the estimated response signal indicating that the cluster of oligonucleotides 322 is “off” in the first channel captured by the first-channel image 324 and “on” in the second channel captured by the second-channel image 326. Finally, when making a nucleobase call of a guanine (G) 334 for the cluster of oligonucleotides 322, the blind-equalizer sequencing system 106 determines the expected response signal and/or the estimated response signal indicating that the cluster of oligonucleotides 322 is “off” in both the first channel captured by the first-channel image 324 and the second channel captured by the second-channel image 326.
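The two-channel on/off combinations described for FIG. 3B amount to a small lookup table. The following sketch encodes exactly the four combinations stated above; the names `TWO_CHANNEL_CALLS` and `call_base` are hypothetical and introduced only for illustration.

```python
# On/off status in (first channel, second channel) mapped to a nucleobase call,
# following the combinations described for FIG. 3B.
TWO_CHANNEL_CALLS = {
    (1, 1): "A",  # on in both channels
    (1, 0): "C",  # on in the first channel only
    (0, 1): "T",  # on in the second channel only
    (0, 0): "G",  # off in both channels
}

def call_base(first_channel_on, second_channel_on):
    """Return the nucleobase call for a cluster's on/off status in two channels."""
    key = (int(bool(first_channel_on)), int(bool(second_channel_on)))
    return TWO_CHANNEL_CALLS[key]
```

For example, `call_base(True, False)` returns `"C"`.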
[0081] As previously discussed, the illumination status (e.g., on/active/detectable or off/inactive/undetectable status) of the expected response signal and/or the estimated response signal can take a couplet form or continuous form. In some instances, the illumination status of the expected response signal and/or the estimated response signal can be represented as an illumination indicator. For instance, if an illumination indicator is “on” (and emits light intensity in a particular frequency) in the intensity channel during sequencing, the “on” status can be represented by a 1. Conversely, if the illumination indicator is “off” (and does not emit detectable light intensity in a particular frequency) in the intensity channel during sequencing, the “off” status can be represented by a 0.
[0082] As discussed above, in one or more embodiments, the blind-equalizer sequencing system 106 can utilize signal values to determine equalizer coefficients. FIG. 4 illustrates a model for measuring signal values from an expected response signal in a given channel in accordance with one or more embodiments. As discussed above, in some embodiments, the blind-equalizer sequencing system 106 can excite a fluorescent tag that emits an expected response signal by directing a laser (e.g., light) at clusters of oligonucleotides within a region of the nucleotide-sample slide. As shown in FIG. 4, the blind-equalizer sequencing system 106 can perform the act of measuring signal values. As previously mentioned, the expected response signal 404 can be an input into a camera or other imaging device indicating the on or off status of the cluster of oligonucleotides in a channel. In one or more embodiments, the expected response signal 404 can be sparse point sources (e.g., impulse responses) that represent the locations and/or positions of clusters of oligonucleotides (e.g., wells) on the nucleotide-sample slide. In some cases, the expected response signals 404 can be binary signals, such as “on” or “off” (or alternatively 1 or 0) in a particular channel. In some embodiments, the expected response signals 404 of one or more clusters are regularly spaced based on the layout of the nucleotide-sample slide.
[0083] As indicated above, the blind-equalizer sequencing system 106 can input the expected response signal 404 into a camera 406 by capturing an image of the expected response signal 404. As discussed above, in some embodiments, the point-spread function 408 can depict the response of the camera 406 on the expected response signal 404 in an image. For instance, in one or more implementations, the response of the camera 406 disperses the expected response signal 404 so that it is no longer a sparse point or impulse but a point-spread function 408. As discussed above, in certain embodiments, the expected response signals 404 can be binary values representing the illumination of the expected response signal 404. As described in more detail below, in some cases, the blind-equalizer sequencing system 106 can generate a matrix of the expected response signals 404 of one or more clusters by utilizing a distribution function. For example, the blind-equalizer sequencing system 106 can generate a grid (e.g., matrix) of the locations of the cluster of oligonucleotides and set the signal values of the expected response signals of clusters of oligonucleotides to one or zero.
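A regularly spaced grid of binary expected response signals can be sketched as below, assuming NumPy. The function name `cluster_grid` and the `pitch`, `offset`, and `on_mask` parameters are hypothetical; the actual layout depends on the nucleotide-sample slide and is not specified in this passage.

```python
import numpy as np

def cluster_grid(shape, pitch, on_mask=None, offset=0):
    """Build a sparse matrix with a 1 or 0 at each regularly spaced cluster location.

    shape   -- (rows, cols) of the image in pixels
    pitch   -- spacing between cluster centres in pixels (assumed equal in both axes)
    on_mask -- optional 2-D array of 0/1 states, one per cluster; all "on" if omitted
    """
    grid = np.zeros(shape)
    rows = np.arange(offset, shape[0], pitch)
    cols = np.arange(offset, shape[1], pitch)
    if on_mask is None:
        states = np.ones((rows.size, cols.size))
    else:
        states = np.asarray(on_mask, dtype=float)
    # Every pixel that is not a cluster centre stays at zero.
    grid[np.ix_(rows, cols)] = states
    return grid
```

For a 6 x 6 image with a pitch of 3 pixels, the grid holds four cluster locations, at rows and columns 0 and 3.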
[0084] As further shown in FIG. 4, the blind-equalizer sequencing system can add noise values 410 to the point-spread function 408. For instance, as described above, when the equalizer directly inverts the point-spread function 408 without considering the noise values 410 in the channel, the blind-equalizer sequencing system 106 can amplify the noise in the channel and generate inaccurate intensity values for a cluster of oligonucleotides leading to inaccurate base calls. Thus, in one or more embodiments, the blind-equalizer sequencing system can account for noise in the system by adding noise values 410 to the convolution of the point-spread function 408 and expected response signal 404.
[0085] As further shown in FIG. 4, the blind-equalizer sequencing system 106 can combine the point-spread function 408 and the noise values 410 to generate the signal values 412. In some embodiments, the blind-equalizer sequencing system 106 can combine the point-spread function 408 and the noise values 410 according to a system model for measuring signal values 414 modeled as: $Y_{x_1,y_1} = \sum_{y_{psf}} \sum_{x_{psf}} h_{x_{psf},y_{psf}} \, x_{x_1-x_{psf},\,y_1-y_{psf}} + v_{x_1,y_1}$. In the aforementioned function, $Y_{x_1,y_1}$ represents the signal values (e.g., pixel intensities) for a given pixel, $h_{x_{psf},y_{psf}}$ represents the response (e.g., point-spread function) of the channel, $x_{x_1-x_{psf},\,y_1-y_{psf}}$ represents the expected response signal, and $v_{x_1,y_1}$ represents noise values in the system. The system model for measuring the signal values 414 can show that the signal values (e.g., pixel intensities) at a given pixel are the two-dimensional convolution of the point-spread function 408 and the expected response signal 404, plus the noise values 410.
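The convolution model above can be simulated directly. The sketch below assumes NumPy, a point-spread function with odd dimensions centred on its middle element, zero padding at the image borders, and IID Gaussian noise; the function name `measure` and its parameters are hypothetical.

```python
import numpy as np

def measure(expected, psf, noise_sigma=0.0, rng=None):
    """Simulate signal values: 2-D convolution of expected signals with a PSF, plus noise."""
    expected = np.asarray(expected, dtype=float)
    psf = np.asarray(psf, dtype=float)
    rows, cols = expected.shape
    kh, kw = psf.shape
    ph, pw = kh // 2, kw // 2
    padded = np.pad(expected, ((ph, ph), (pw, pw)))  # zeros outside the image
    out = np.zeros_like(expected)
    for i in range(kh):
        for j in range(kw):
            # psf[i, j] weights the expected signal displaced by (i - ph, j - pw),
            # which is a true convolution rather than a correlation.
            r0, c0 = 2 * ph - i, 2 * pw - j
            out += psf[i, j] * padded[r0:r0 + rows, c0:c0 + cols]
    rng = np.random.default_rng() if rng is None else rng
    return out + rng.normal(0.0, noise_sigma, out.shape)
```

With a single "on" cluster and no noise, the output is simply a copy of the point-spread function centred on that cluster, which matches the description of the camera dispersing a sparse point into a point-spread function.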
[0086] In some embodiments, the aforementioned equation can be re-written or re-organized. As shown in FIG. 4, the blind-equalizer sequencing system can utilize a system model for measuring signal values in matrix form 416. For example, the system model for measuring signal values in matrix form 416 can be modeled as $Y = H \cdot X + V$, where Y represents signal values as a matrix of pixel intensities, H represents a matrix that contains a point-spread function for the expected response signal 404, X represents the expected response signal 404 in a matrix format (e.g., sparse point matrix), and V represents noise values 410 in a matrix format.
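One common way to realize such a matrix form is to flatten the image into a vector and build H so that each row holds the point-spread-function weights feeding one output pixel. The sketch below takes that approach, assuming NumPy, row-major flattening, an odd-sized point-spread function, and zero values outside the image; the specification does not state how its matrices are arranged, and the name `psf_matrix` is hypothetical.

```python
import numpy as np

def psf_matrix(psf, shape):
    """Build H so that H @ x.ravel() equals the 2-D convolution of x with psf."""
    psf = np.asarray(psf, dtype=float)
    rows, cols = shape
    kh, kw = psf.shape
    ph, pw = kh // 2, kw // 2
    H = np.zeros((rows * cols, rows * cols))
    for r in range(rows):
        for c in range(cols):
            for i in range(kh):
                for j in range(kw):
                    # Source pixel whose expected signal contributes to output (r, c).
                    sr, sc = r - (i - ph), c - (j - pw)
                    if 0 <= sr < rows and 0 <= sc < cols:
                        H[r * cols + c, sr * cols + sc] += psf[i, j]
    return H
```

Given an expected-signal image `x` and a noise image `v` of the same shape, the signal values follow as `(psf_matrix(psf, x.shape) @ x.ravel() + v.ravel()).reshape(x.shape)`. The dense matrix grows with the square of the pixel count, so this form is practical only for small regions.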
[0087] In one or more embodiments, the blind-equalizer sequencing system 106 can capture one or more images of the expected response signal 404 in the channel where pixels of the image depict signal values 412 of the expected response signal 404 combined with noise values 410. As described in more detail below, in one or more cases, the blind-equalizer sequencing system can utilize the image to determine equalizer coefficients.
[0088] As indicated above, the blind-equalizer sequencing system can receive signal values from a captured image comprising signal values for at least a cluster of oligonucleotides and further invert estimated point-spread-function values and estimated noise values — corresponding to the signal values — together to determine equalizer coefficients. In accordance with one or more embodiments, FIG. 5 illustrates the blind-equalizer sequencing system determining equalizer coefficients. As shown in FIG. 5, the blind-equalizer sequencing system 106 can input an expected response signal 504 for a cluster of oligonucleotides into an imaging device 506 and generate signal values 512 by combining estimated point-spread-function values 508 and estimated noise values 510. As described in more detail below in FIG. 6, the blind-equalizer sequencing system can determine the estimated point-spread-function values 508 by leveraging physical characteristics of the system.
[0089] As shown in FIG. 5, the blind-equalizer sequencing system 106 can determine estimated noise values 510. As discussed above, the blind-equalizer sequencing system can determine more accurate base calls by accounting for noise in the sequencing device (e.g., the sequencing device 114). For instance, in some cases, noise from the sequencing device can originate from the method of illuminating the clusters of oligonucleotides, the sensor in the optical system, DC offset, spatial crosstalk, etc. In some cases, the blind-equalizer sequencing system 106 can account for one or more sources of noise by determining estimated noise values 510. For example, in one or more embodiments, the blind-equalizer sequencing system 106 determines the estimated noise values 510 by applying an independent identically distributed (IID) Gaussian noise. As used herein, the term “independent identically distributed Gaussian noise” refers to random signal disturbances (e.g., noise) that are statistically unrelated and identically distributed along a Gaussian (e.g., bell-shaped) distribution. Alternatively, in some embodiments, the blind-equalizer sequencing system 106 can determine the estimated noise values 510 by converting the signal values 512 from a spatial domain to a frequency domain and measuring a band within the frequency domain that does not have a signal.
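The frequency-band alternative can be sketched as follows. This is a minimal illustration, not the disclosed implementation: it assumes white noise, and the function name `estimate_noise_variance` and the cutoff `min_freq` that defines the signal-free band are hypothetical.

```python
import numpy as np

def estimate_noise_variance(image, min_freq=0.4):
    """Estimate white-noise variance from a band assumed to contain no signal."""
    image = np.asarray(image, dtype=float)
    rows, cols = image.shape
    spectrum = np.fft.fft2(image - image.mean())   # subtract the mean (DC offset)
    power = np.abs(spectrum) ** 2
    fy = np.abs(np.fft.fftfreq(rows))[:, None]     # cycles per pixel, 0 to 0.5
    fx = np.abs(np.fft.fftfreq(cols))[None, :]
    band = np.maximum(fy, fx) >= min_freq          # hypothetical signal-free band
    # For white noise, E[|FFT|^2] = (number of pixels) * variance.
    return power[band].mean() / (rows * cols)

rng = np.random.default_rng(0)
noise_only = rng.normal(0.0, 2.0, size=(128, 128))  # true variance is 4.0
sigma2 = estimate_noise_variance(noise_only)
```

In a real image the band would have to be chosen beyond the optical cutoff of the imaging device so that it holds noise only.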
[0090] As further shown in FIG. 5, the blind-equalizer sequencing system 106 can receive or determine signal values 512 for the expected response signal 504. In one or more embodiments, the signal values 512 correspond to the estimated point-spread-function values 508 combined with the expected response signal 504 summed with the estimated noise values 510. In one or more embodiments, the blind-equalizer sequencing system 106 can combine the estimated point-spread-function values 508 and the expected response signal 504 by performing a two-dimensional convolution of the estimated point-spread-function values 508 with the expected response signal 504.
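As a rough illustration of this forward model, the sketch below convolves a sparse point matrix with a small kernel and adds IID Gaussian noise. The helper names, the 3 x 3 kernel values, and the zero-padded border handling are assumptions made for the example only.

```python
import numpy as np

def convolve2d_same(signal, kernel):
    """Zero-padded 2-D convolution (odd-sized kernel), output shaped like `signal`."""
    kr, kc = kernel.shape
    pr, pc = kr // 2, kc // 2
    padded = np.pad(signal, ((pr, pr), (pc, pc)))
    flipped = kernel[::-1, ::-1]                   # convolution flips the kernel
    out = np.zeros(signal.shape, dtype=float)
    for i in range(kr):
        for j in range(kc):
            out += flipped[i, j] * padded[i:i + signal.shape[0], j:j + signal.shape[1]]
    return out

def simulate_signal_values(expected, psf, noise_sigma, rng):
    """Signal values = PSF convolved with expected response, plus Gaussian noise."""
    return convolve2d_same(expected, psf) + rng.normal(0.0, noise_sigma, expected.shape)

expected = np.zeros((9, 9))
expected[4, 4] = 1.0                               # one "on" cluster
psf = np.array([[0.0, 0.1, 0.0],
                [0.1, 0.5, 0.2],
                [0.0, 0.1, 0.0]])                  # hypothetical kernel
clean = simulate_signal_values(expected, psf, 0.0, np.random.default_rng(0))
```

With a single "on" cluster and no noise, the simulated image reproduces the kernel around the cluster position, which is the defining property of a point-spread function.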
[0091] Moreover, in certain implementations, the blind-equalizer sequencing system 106 can simplify the analysis of the signal values and expected response signal 504 by assuming that the pixels in the captured image align with the center of the estimated point-spread-function values 508. In some embodiments, where the pixel does not align with the center of the estimated point-spread-function values 508, the blind-equalizer sequencing system 106 can utilize one or more interpolation methods to determine signal values between pixel centers and/or align the center of the estimated point-spread-function values 508 with a pixel.
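The disclosure does not name a specific interpolation method. As one common possibility, the sketch below uses bilinear interpolation to read a signal value at a fractional pixel position; the function name `bilinear_sample` is hypothetical.

```python
import numpy as np

def bilinear_sample(image, row, col):
    """Interpolate `image` (at least 2 x 2) at a fractional (row, col) position."""
    image = np.asarray(image, dtype=float)
    r0 = int(np.clip(np.floor(row), 0, image.shape[0] - 2))
    c0 = int(np.clip(np.floor(col), 0, image.shape[1] - 2))
    dr, dc = row - r0, col - c0
    top = (1 - dc) * image[r0, c0] + dc * image[r0, c0 + 1]
    bottom = (1 - dc) * image[r0 + 1, c0] + dc * image[r0 + 1, c0 + 1]
    return (1 - dr) * top + dr * bottom

tiny = np.array([[0.0, 1.0],
                 [2.0, 3.0]])
midpoint = bilinear_sample(tiny, 0.5, 0.5)   # average of the four pixels
```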
[0092] As further shown in FIG. 5, the blind-equalizer sequencing system 106 can receive the signal values for the estimated response signal 518 from one or more clusters of oligonucleotides within a region of the nucleotide-sample slide. As discussed above, generally, existing systems can apply an equalizer to the signal values and generate an estimated response signal 518 or output from the equalizer. In some cases, existing system models for an equalizer can be modeled as:
$$z_{x_l,y_l} = \sum_{y_{eq}} \sum_{x_{eq}} w_{x_{eq},y_{eq}} \, y_{x_l - x_{eq},\, y_l - y_{eq}}.$$
In the existing system model, $z_{x_l,y_l}$ represents an output response signal, $w_{x_{eq},y_{eq}}$ represents weighting coefficients applied to pixels depicting the signal values 512, $y_{x_l - x_{eq},\, y_l - y_{eq}}$ represents signal values 512, and $v_{x_l,y_l}$ represents estimated noise values 510. Generally, the existing system model shows how the equalizer 514 can apply a weighting to each received pixel and generate an estimated response signal 518 by summing the weighted pixels.
[0093] As discussed above, some existing sequencing instruments that employ the existing system model are prone to inaccuracies and inefficiencies. However, in some embodiments, the blind-equalizer sequencing system 106 can utilize the existing system model to determine more accurate equalizer coefficients. For example, in one or more implementations, the blind-equalizer sequencing system 106 can limit the signal values to a certain number of pixels around the cluster of oligonucleotides. The limited system model can be modeled as:
$$z_{x_l,y_l} = \sum_{y_{eq}} \sum_{x_{eq}} w_{x_{eq},y_{eq}} \, y_{x_l - x_{eq},\, y_l - y_{eq}},$$
where the sums run only over the limited number of pixels around the cluster of oligonucleotides. In the limited system model, $z_{x_l,y_l}$ represents an estimated response signal 518, $w_{x_{eq},y_{eq}}$ represents weighting coefficients applied to pixels depicting the signal values, and $y_{x_l - x_{eq},\, y_l - y_{eq}}$ represents signal values 512. In some embodiments, the blind-equalizer sequencing system 106 can reformat the limited system model into a limited system model in matrix form where: $Z = W \cdot Y$. In the limited system model in matrix form, $Z$ represents the estimated response signal 518, $W$ represents equalizer coefficients 516, and $Y$ represents signal values 512 for at least the cluster of oligonucleotides. In some cases, the blind-equalizer sequencing system 106 can determine the estimated response signal 518 by multiplying the signal values (e.g., received pixel intensities) with the equalizer coefficients 516 that represent the equalizer response.
[0094] As further shown in FIG. 5, the blind-equalizer sequencing system 106 can utilize blind-equalizer sequencing system model 520. As discussed above, some existing systems utilize a decision-directed approach that determines and/or estimates an output response signal by directly processing the signal values 512. However, as discussed above, such an approach can lead to inaccurate weighting coefficients, output response signals, and base calls. In one or more implementations, the blind-equalizer sequencing system 106 can utilize the system model for measuring signal values in matrix form as discussed in FIG. 4 and the limited system model in matrix form as discussed above to determine accurate equalizer coefficients 516 in a feedforward approach.
[0095] As discussed in more detail below, the blind-equalizer sequencing system 106 can determine estimated point-spread-function values 508 ($\hat{H}$). As discussed above in reference to FIG. 4, the blind-equalizer sequencing system 106 can represent signal values 512 for an expected response signal 504 according to the following equation: $Y = H \cdot X + V$. Generally, in the given equation, the point-spread function ($H$) and noise values ($V$) are unknown variables. In some embodiments, the blind-equalizer sequencing system 106 can determine estimated point-spread-function values 508, replace the point-spread function ($H$) with the estimated point-spread-function values ($\hat{H}$) and the noise values ($V$) with estimated noise values ($\hat{V}$), and represent the signal values 512 according to a new estimated system model: $Y = \hat{H} \cdot X + \hat{V}$. In one or more embodiments, the blind-equalizer sequencing system 106 can generate a system model with a blind equalizer by combining the new estimated system model (e.g., $Y = \hat{H} \cdot X + \hat{V}$) with the limited system model in matrix form (e.g., $Z = W \cdot Y$). For instance, in some embodiments, the blind-equalizer sequencing system 106 can replace the signal values ($Y$) in the limited system model with the estimated point-spread-function values ($\hat{H}$), estimated noise values ($\hat{V}$), and the expected response signal ($X$). For example, in some cases, the blind-equalizer sequencing system model 520 can be modeled as: $Z = W \cdot (\hat{H} \cdot X + \hat{V})$, where $Z$ represents the estimated response signal 518, $W$ represents the equalizer coefficients 516, $\hat{H}$ represents the estimated point-spread-function values 508, $X$ represents the expected response signal 504, and $\hat{V}$ represents the estimated noise values 510.
[0096] In some embodiments, the blind-equalizer sequencing system 106 can determine equalizer coefficients on a cluster-by-cluster basis. For example, in one or more implementations, the blind-equalizer sequencing system 106 can determine, for a target cluster of oligonucleotides, a target estimated point-spread-function value based on target signal values corresponding to the target cluster of oligonucleotides. For example, as described above, the blind-equalizer sequencing system 106 can receive an image of a target cluster of oligonucleotides and determine a point-spread function 408 (as depicted in FIG. 4) based on the target signal values. Moreover, as previously mentioned, the blind-equalizer sequencing system 106 can determine target equalizer coefficients by combining the target estimated point-spread-function values and the estimated noise values within the channel. In one or more cases, the target equalizer coefficients can compensate for the target estimated point-spread-function values and the estimated noise values with respect to the target expected response signal. In certain implementations, the blind-equalizer sequencing system 106 can determine a base call for the target cluster by utilizing the target equalizer coefficients.
[0097] In some embodiments, the blind-equalizer sequencing system 106 determines the equalizer coefficients 516 by assuming that the expected response signal 504 equals the estimated response signal 518. In other words, in some embodiments, the blind-equalizer sequencing system 106 assumes that the imaging device 506 did not disperse the expected response signal 504 and the blind-equalizer sequencing system 106 outputs an estimated response signal 518 that mirrors the input (e.g., the expected response signal 504). In one or more implementations, based on the expected response signal 504 equaling the estimated response signal 518 and determining the estimated point-spread-function values 508 and the estimated noise values 510, the blind-equalizer sequencing system 106 can determine equalizer coefficients 516 that compensate for the estimated point-spread-function values 508 and the estimated noise values 510 because the equalizer coefficients 516 are the only unknown variable in the blind-equalizer sequencing system model 520.
[0098] As just mentioned, in some embodiments, the blind-equalizer sequencing system 106 can utilize the blind-equalizer sequencing system model 520 to determine accurate equalizer coefficients 516. For example, in some cases, the blind-equalizer sequencing system 106 can minimize the mean squared error between the estimated response signal 518 and the expected response signal 504 in the blind-equalizer sequencing system model 520. For example, in an embodiment where the estimated response signal 518 and the expected response signal 504 for at least a cluster of nucleotides both equal 1 (e.g., “on”), the blind-equalizer sequencing system 106 can determine equalizer coefficients 516 that minimize the error or distance between the estimated response signal 518 and the expected response signal 504 in the blind-equalizer sequencing system model 520. In some embodiments, the blind-equalizer sequencing system 106 can determine the equalizer coefficients 516 utilizing a least-squares approach.
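One way to write this minimization, offered as a sketch under added assumptions, is the standard linear minimum-mean-squared-error solution. The covariance symbols $R_X$ and $R_V$ do not appear in the disclosure; they are introduced here on the assumption that the expected response signal and the estimated noise are zero-mean and uncorrelated with each other.

```latex
\begin{aligned}
W^{\star} &= \arg\min_{W}\; \mathbb{E}\,\bigl\| W(\hat{H} X + \hat{V}) - X \bigr\|^{2} \\
          &= R_X \hat{H}^{\mathsf{T}} \bigl( \hat{H} R_X \hat{H}^{\mathsf{T}} + R_V \bigr)^{-1},
\qquad R_X = \mathbb{E}[X X^{\mathsf{T}}],\quad R_V = \mathbb{E}[\hat{V} \hat{V}^{\mathsf{T}}].
\end{aligned}
```

When the noise covariance $R_V$ vanishes and $\hat{H}$ is invertible, this reduces to $W^{\star} = \hat{H}^{-1}$, which matches the idea of the equalizer undoing the estimated point-spread function.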
[0099] Moreover, in some embodiments, the blind-equalizer sequencing system 106 can adjust the equalizer coefficients 516 to minimize the mean squared error between one or more expected response signals corresponding to a set of neighboring clusters of oligonucleotides within the region of the nucleotide-sample slide and one or more estimated response signals (e.g., including the estimated response signal 518) across the set of neighboring clusters of oligonucleotides. In some embodiments, the blind-equalizer sequencing system 106 can optimize the equalizer response so that the cluster at the center of the estimated point-spread-function values has an output of one and the neighboring clusters (e.g., clusters in adjacent wells) have an output of zero.
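The "one at the center, zero at the neighbors" target can be posed as a regularized least-squares problem. The sketch below is illustrative only: it assumes, hypothetically, one unit-power cluster on every pixel of a grid that extends past the equalizer window by the PSF half-width, a square odd-sized PSF, and white noise of variance `noise_var`; the function name is also hypothetical.

```python
import numpy as np

def equalizer_coefficients(psf, window, noise_var):
    """Solve min_w ||H^T w - d||^2 + noise_var * ||w||^2 for the center cluster."""
    half_w, half_p = window // 2, psf.shape[0] // 2
    reach = half_w + half_p                      # clusters that can reach the window
    columns, target = [], None
    for cr in range(-reach, reach + 1):
        for cc in range(-reach, reach + 1):
            response = np.zeros((window, window))
            for pr in range(window):
                for pc in range(window):
                    i = (pr - half_w) - cr + half_p
                    j = (pc - half_w) - cc + half_p
                    if 0 <= i < psf.shape[0] and 0 <= j < psf.shape[1]:
                        response[pr, pc] = psf[i, j]
            if cr == 0 and cc == 0:
                target = len(columns)            # column of the center cluster
            columns.append(response.ravel())
    H = np.stack(columns, axis=1)                # pixels x clusters
    desired = np.zeros(H.shape[1])
    desired[target] = 1.0                        # one for the target, zero elsewhere
    gram = H @ H.T + noise_var * np.eye(H.shape[0])
    return np.linalg.solve(gram, H @ desired).reshape(window, window)

delta_psf = np.array([[1.0]])                    # no blur at all
w_clean = equalizer_coefficients(delta_psf, window=3, noise_var=0.0)
w_noisy = equalizer_coefficients(delta_psf, window=3, noise_var=1.0)
```

With no blur and no noise the solution is a single unit weight on the center pixel; adding noise shrinks that weight, which is the expected trade-off between inverting the PSF and amplifying noise.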
[0100] As further shown in FIG. 5, the blind-equalizer sequencing system 106 can determine a base call 519 utilizing the equalizer coefficients 516. As described above, once the blind-equalizer sequencing system 106 determines the equalizer coefficients in some embodiments, the blind-equalizer sequencing system 106 can utilize the equalizer coefficients to make a base call for one or more clusters of oligonucleotides. For example, the blind-equalizer sequencing system 106 can utilize an equalizer that applies the equalizer coefficients to signal values and generates an estimated response signal. In some embodiments, the blind-equalizer sequencing system 106 may utilize a linear equalizer to determine an intensity value for one or more clusters by processing signal values depicted in received images. Generally, a linear equalizer is a linear filter that can be designed or optimized to filter out noise. In certain embodiments, the equalizer can convert signal values representing light or energy dispersed over one or more pixels into the estimated response signal representing accurate intensity values for at least a cluster of oligonucleotides by linearly weighting pixel intensities with the equalizer coefficients and summing the weighted pixel intensities. In some embodiments, the linear filter can be applied to each cluster individually or across an entire image. In certain implementations, the blind-equalizer sequencing system 106 can use the more accurate intensity values to determine a base call.
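The weighting-and-summing step for a single cluster can be sketched as below. The function name `equalize_cluster` is hypothetical, and the sketch assumes a square, odd-sized coefficient window that fits entirely inside the image.

```python
import numpy as np

def equalize_cluster(image, weights, row, col):
    """Weighted sum of the pixels in the window centered on (row, col)."""
    half = weights.shape[0] // 2
    patch = image[row - half:row + half + 1, col - half:col + half + 1]
    return float(np.sum(weights * patch))

image = np.arange(25, dtype=float).reshape(5, 5)
identity = np.zeros((3, 3))
identity[1, 1] = 1.0                       # passes the center pixel through
box = np.full((3, 3), 1.0 / 9.0)           # plain local average
center_value = equalize_cluster(image, identity, 2, 2)
local_mean = equalize_cluster(image, box, 2, 2)
```

Applying the same weights at every pixel position amounts to filtering the entire image, which corresponds to the whole-image option mentioned above.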
[0101] For example, in one or more embodiments, the blind-equalizer sequencing system 106 determines a base call by analyzing the intensity values associated with the cluster of oligonucleotides. As previously mentioned, the emitted signals of the cluster can indicate the type of nucleotide base. For instance, in some embodiments, the blind-equalizer sequencing system 106 can analyze the intensity values for signals from the given cluster in both channels or in each of multiple channels (e.g., concurrently) to determine the nucleobase call. In some embodiments, based on the intensity values of the signal of the cluster in each channel, the blind-equalizer sequencing system 106 can calculate, utilizing an expectation maximization and Gaussian probability distributions, the probability that the signal falls within the intensity-value boundaries of a certain base (A, C, G, or T). The blind-equalizer sequencing system 106 can then call the nucleobase incorporated into the cluster by selecting the intensity-value boundaries of the nucleobase with the highest probability. For example, based on the intensity values emitted by the signal of the cluster, the blind-equalizer sequencing system 106 can determine that the nucleobase whose intensity-value boundaries have the highest probability for the cluster is adenine (A).
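The probability comparison can be illustrated with fixed Gaussian distributions. In the sketch below, the two-channel centroids, the shared variance, and the equal priors are all hypothetical placeholders; in practice such parameters would be fitted (for example, by expectation maximization), and that fitting step is not shown.

```python
import numpy as np

# Hypothetical per-base centroids in (channel 1, channel 2) intensity space.
CENTROIDS = {"A": (1.0, 1.0), "C": (1.0, 0.0), "T": (0.0, 1.0), "G": (0.0, 0.0)}
VARIANCE = 0.05  # hypothetical shared isotropic variance

def call_base(intensities):
    """Return (base, posterior probability) for the most probable base."""
    point = np.asarray(intensities, dtype=float)
    bases = list(CENTROIDS)
    log_lik = np.array([
        -np.sum((point - np.asarray(CENTROIDS[b])) ** 2) / (2.0 * VARIANCE)
        for b in bases
    ])
    posterior = np.exp(log_lik - log_lik.max())   # subtract max for stability
    posterior /= posterior.sum()
    best = int(np.argmax(posterior))
    return bases[best], float(posterior[best])

base, probability = call_base([0.9, 1.1])
```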
[0102] As mentioned above, the blind-equalizer sequencing system 106 can determine the estimated point-spread-function values, such as the estimated point-spread-function values 508 depicted in FIG. 5. In accordance with one or more embodiments, FIG. 6 illustrates the blind-equalizer sequencing system 106 determining the estimated point-spread-function values by converting one or more signal values from one or more clusters of oligonucleotides from a spatial domain to a frequency domain. FIG. 6 shows the blind-equalizer sequencing system 106 receiving signal values by receiving an initial image 602 depicting signal values from clusters of oligonucleotides within a larger region (or super region) of a nucleotide-sample slide and cropping the initial image 602 to generate an image 604 of a region of the nucleotide-sample slide, where the image 604 depicts an estimated response signal from a target cluster of oligonucleotides located within the region of a flow cell or other nucleotide-sample slide and in the context of a spatial domain. As used herein, the term “spatial domain” refers to a two-dimensional matrix (e.g., grid) depicting or representing an image comprising one or more pixels, where each pixel (e.g., element) corresponds to the intensity and/or location of a pixel on the image. In some cases, the image and corresponding pixels can depict one or more signal values for an expected response signal from a target cluster of oligonucleotides and/or expected response signals from neighboring clusters of oligonucleotides. As part of or in the alternative to receiving an image depicting one or more signal values for an estimated response signal from a target cluster of oligonucleotides within a region of a nucleotide-sample slide, the blind-equalizer sequencing system 106 can receive a channel-specific image depicting a signal value for an expected response signal from a target cluster of oligonucleotides within a pixel or sub-pixel of the channel-specific image. In some such embodiments, the channel-specific image depicts (i) a single signal value for an expected response signal from a target cluster of oligonucleotides within a pixel or sub-pixel of the channel-specific image and (ii) additional signal values for additional expected response signals from additional target clusters of oligonucleotides within other pixels or other sub-pixels of the channel-specific image.
[0103] As further shown in FIG. 6, the blind-equalizer sequencing system 106 receives the image 604 by either cropping the image 604 of the region of the nucleotide-sample slide from the initial image 602 or accessing or receiving the image 604 of the region of the nucleotide-sample slide without such cropping. In some embodiments, for instance, the blind-equalizer sequencing system 106 selects a region (e.g., a sub-tile) of a nucleotide-sample slide from a larger or super region (e.g., tile) of the nucleotide-sample slide and crops the image 604 of the region of the nucleotide-sample slide from a center of the initial image 602. In addition or in the alternative to a center, in one or more embodiments, the blind-equalizer sequencing system 106 can select and crop an image of a region of the nucleotide-sample slide from any location of the initial image 602. In the alternative to cropping the initial image 602, the blind-equalizer sequencing system 106 can access or receive the image 604 of the region of the nucleotide-sample slide without such cropping. For instance, a camera or other imaging device of a sequencing device may initially capture the image 604 of the region of the nucleotide-sample slide and save the initially captured version of the image 604 for further processing.
[0104] Relatedly, the blind-equalizer sequencing system 106 can determine a size of the image 604 of the region of the nucleotide-sample slide. For example, in some embodiments, the size and/or dimension of the image 604 of the region of the nucleotide-sample slide can comprise 256 x 256 pixels. In one or more implementations, the blind-equalizer sequencing system 106 can select and/or crop a different size of the image 604 from the initial image 602. For example, the blind-equalizer sequencing system 106 can select an image depicting a particular side or region of a nucleotide-sample slide based on the particular sequencing device used to process a genomic sample and determining corresponding nucleotide reads.
[0105] As further shown in FIG. 6, the blind-equalizer sequencing system 106 can generate a convoluted matrix 608 depicting modified signal values of expected response signals from the target cluster of oligonucleotides and neighboring clusters of oligonucleotides by combining the image 604 of the region of the nucleotide-sample slide with a Hanning window 606 (e.g., two-dimensional Hanning window). As depicted in FIG. 6, the convoluted matrix 608 includes a matrix that modifies the values of pixels of an image. For example, the blind-equalizer sequencing system 106 can extract and/or highlight certain features and/or patterns of an image and include such features and/or patterns in the convoluted matrix 608. In one or more embodiments, the blind-equalizer sequencing system 106 can generate the convoluted matrix 608 by performing elementwise multiplication between the pixels within the image 604 of the region of the nucleotide-sample slide and the Hanning window 606. Moreover, in certain embodiments, the convoluted matrix 608 can comprise a convoluted image depicting the modified signal values of the pixels in the initial image 602. In some embodiments, prior to converting the signal values from the spatial domain to the frequency domain, the blind-equalizer sequencing system 106 can further remove the DC offset (e.g., low frequency noise) from the signal values.
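The windowing step can be sketched with NumPy's built-in Hann (Hanning) window. This is a minimal illustration: building the two-dimensional window as an outer product, and removing the DC offset by subtracting the mean before the window is applied, are assumptions about details the text leaves open; the function name is hypothetical.

```python
import numpy as np

def window_region(region):
    """Subtract the mean (DC offset), then multiply elementwise by a 2-D Hann window."""
    region = np.asarray(region, dtype=float)
    centered = region - region.mean()
    window = np.outer(np.hanning(region.shape[0]), np.hanning(region.shape[1]))
    return centered * window

rng = np.random.default_rng(1)
region = rng.uniform(0.0, 1.0, size=(256, 256))   # stand-in for image 604
windowed = window_region(region)
```

The taper falls to zero at the borders of the region, which reduces the edge discontinuities that would otherwise leak across frequencies in the subsequent transform.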
[0106] After generating the convoluted matrix 608 and as further shown in FIG. 6, the blind-equalizer sequencing system 106 can convert the signal values in the convoluted matrix 608 from the spatial domain to a frequency domain. As used herein, the term “frequency domain” refers to a domain that represents one or more signal values in terms of frequency components (e.g., sine and cosine components). In some cases, the frequency domain comprises points that represent a particular frequency (e.g., pixel intensity) present in an image or matrix in the spatial domain. For instance, in one or more embodiments, the frequency domain can express the rate of change of pixel intensities in an image. In some implementations, the blind-equalizer sequencing system 106 can perform additional image analysis in the frequency domain, as further described below.
[0107] As further shown in FIG. 6, in one or more embodiments, the blind-equalizer sequencing system 106 can transform the convoluted matrix 608 into a frequency domain matrix 610 depicting a power spectral density 612 of the signal values for the expected response signals. As used herein, the term “frequency domain matrix” refers to a matrix that includes one or more values representing power spectral density of a signal (e.g., an expected response signal) from a target cluster of oligonucleotides in a frequency domain. In some cases, the frequency domain matrix 610 constitutes or can be referred to as a frequency domain image. In one or more embodiments, the blind-equalizer sequencing system 106 generates the frequency domain matrix 610 by applying a Fast Fourier Transformation (FFT) to the convoluted matrix 608. For instance, in one or more cases, the blind-equalizer sequencing system 106 can apply a non-equispaced Fast Fourier transformation (NFFT). In some instances, the NFFT can preserve the original data while enabling analysis of the signal values in the initial image 602. Moreover, in certain implementations, the blind-equalizer sequencing system 106 can generate the convoluted matrix 608 in the form of a complex or real matrix by generating a conjugate (e.g., complement) matrix of the convoluted matrix 608 and combining (e.g., multiplying) the conjugate matrix with the convoluted matrix 608. Regardless of the format, in some cases, the blind-equalizer sequencing system 106 can normalize the convoluted matrix 608.
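A minimal sketch of the transform and the conjugate multiplication follows, using an ordinary FFT rather than an NFFT. The normalization to unit total power is an assumed convention, and the function name is hypothetical.

```python
import numpy as np

def power_spectral_density(windowed):
    """Return |FFT|^2 of a 2-D array, normalized to unit total power."""
    spectrum = np.fft.fft2(windowed)
    psd = (spectrum * np.conj(spectrum)).real     # conjugate product, real-valued
    total = psd.sum()
    return psd / total if total > 0 else psd      # normalization is an assumption

rng = np.random.default_rng(2)
sample = rng.normal(size=(64, 64))
psd = power_spectral_density(sample)
```

For a real-valued input the result is symmetric about the zero-frequency bin, which sits at index (0, 0) in the corner coordinate layout that the FFT produces.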
[0108] As further illustrated in FIG. 6, in some embodiments, as part of or when the blind-equalizer sequencing system 106 converts the signal values from the spatial domain to the frequency domain and generates the frequency domain matrix 610, the blind-equalizer sequencing system 106 can determine the power spectral density 612 of the signal values in the frequency domain and/or within the frequency domain matrix 610. As used herein, the term “power spectral density” refers to a distribution of power of signal values over frequency. For example, power spectral density can comprise or constitute an average measurement of energy (e.g., response) within a range of spectral bands (wavelengths or frequencies). In some cases, the power spectral density can be represented as a measure of the power spectral density or an average energy over a region of the nucleotide-sample slide. For example, the blind-equalizer sequencing system 106 can measure the power spectral density over a central tile or other tile of a nucleotide-sample slide. Thus, values for the power spectral density can indicate an accumulation and/or average of the energy from at least one cluster of oligonucleotides within a region of the nucleotide-sample slide. For example, in some embodiments, thousands to millions of clusters of oligonucleotides will be “on” or “off” (e.g., have on or off expected response signals). To represent a power spectral density in a given region of a nucleotide-sample slide, the blind-equalizer sequencing system 106 can convert a power spectral density from a frequency domain to a spatial domain to generate estimated point-spread-function values from a point-spread function based on the on/off status of the cluster of oligonucleotides within the given region of the nucleotide-sample slide.
[0109] As further indicated by FIG. 6, the power spectral density 612 can collect at the corners of the frequency domain matrix 610. For example, in a corner coordinate system, the power spectral density 612 gathers at the corners of the frequency domain matrix 610. As further shown in FIG. 6, the blind-equalizer sequencing system 106 can generate an up-sampled power spectral density 614 by up-sampling the frequency domain matrix 610. In some cases, the blind-equalizer sequencing system 106 up-samples the frequency domain matrix 610 according to an up-sampling factor. Such an up-sampling factor includes a value that scales or represents a degree to which a matrix and/or image is expanded. To illustrate, in one or more embodiments, when or as part of up-sampling the frequency domain matrix 610 that is 256 x 256 pixels with an up-sampling factor of four, the blind-equalizer sequencing system 106 scales the frequency domain matrix 610 to the dimensions of the up-sampled power spectral density 614 of 1024 x 1024 pixels. In one or more embodiments, the up-sampling factor reduces the interpolation between different frequencies and/or signal values in the frequency domain matrix 610.
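The disclosure does not say how the frequency domain matrix is up-sampled. One possibility, shown here purely as an assumption, is to zero-pad the inverse transform of the power spectral density, which interpolates between the original frequency samples while leaving them unchanged. The function name is hypothetical, and the sketch assumes even dimensions.

```python
import numpy as np

def upsample_psd(psd, factor=4):
    """Up-sample a corner-coordinate PSD by zero-padding its inverse transform."""
    n_rows, n_cols = psd.shape
    lagged = np.fft.fftshift(np.fft.ifft2(psd))    # zero lag moved to the center
    pad_r = n_rows * (factor - 1) // 2
    pad_c = n_cols * (factor - 1) // 2
    padded = np.pad(lagged, ((pad_r, pad_r), (pad_c, pad_c)))
    return np.fft.fft2(np.fft.ifftshift(padded)).real

rng = np.random.default_rng(3)
small_psd = np.abs(np.fft.fft2(rng.normal(size=(16, 16)))) ** 2
big_psd = upsample_psd(small_psd, factor=4)        # 16 x 16 becomes 64 x 64
```

Every fourth sample of the output equals the corresponding input sample, so a 256 x 256 matrix treated this way would yield a 1024 x 1024 matrix that preserves the original values.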
[0110] Generally, signals can be real or complex based on the components of the signal. In one or more embodiments, a signal is complex when it includes two different signals, such as where a first signal comprises real components and a second signal comprises imaginary components. In certain implementations, a signal is real when it only contains real numbers without any complex or imaginary components. As further background, signals can include an amplitude and phase. In one or more embodiments, the amplitude indicates the height or magnitude of the light emitted by a signal and the phase indicates the position or timing of the signal relative to a reference point. In one or more embodiments, the transmission medium can determine if a signal is real or complex. For example, the transmission medium can be real or complex. In some embodiments, the transmission medium is complex if it distorts the amplitude and phase while transmitting a signal.
[0111] For example, as described above, the blind-equalizer sequencing system 106 uses an imaging device as the transmission medium for transmitting the expected response signal. In some cases, based on the blind-equalizer sequencing system 106 using an imaging device as the transmission medium for the expected response signal, the blind-equalizer sequencing system 106 only measures the magnitude or amplitude of the expected response signal without considering the phase of the expected response signal in the frequency domain. For example, as discussed above, the blind-equalizer sequencing system 106 can measure one or more signal values (e.g., light intensity) of the expected response signal in a captured image. In one or more cases, based on the blind-equalizer sequencing system 106 solely measuring the amplitude of the expected response signal, the blind-equalizer sequencing system 106 can determine that the transmission medium (i.e., the imaging device) is real. By determining that the imaging device is real, the blind-equalizer sequencing system 106 can determine the estimated point-spread-function values by forcing them to be real.
[0112] As further illustrated in FIG. 6, the blind-equalizer sequencing system 106 can determine estimated point-spread-function values 624 by converting the up-sampled power spectral density 614 of the expected response signal from the frequency domain to the spatial domain. In one or more implementations, the blind-equalizer sequencing system 106 can enforce certain constraints while converting the power spectral density 612 and/or up-sampled power spectral density 614 from the frequency domain to the spatial domain. For example, as discussed above, the blind-equalizer sequencing system 106 can determine that the transmission medium (i.e., the imaging device) is real and finite. In one or more embodiments, based on characteristics of the imaging device, the blind-equalizer sequencing system 106 can ensure that the estimated point-spread-function values are real by enforcing Hermitian symmetry on the power spectral density 612 and/or up-sampled power spectral density 614 in the frequency domain.
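Hermitian symmetry means that each frequency bin equals the complex conjugate of the bin at the negated frequency, which is the condition for the inverse transform to be real. A minimal sketch of enforcing it is shown below; averaging a spectrum with its conjugate mirror is one standard way to do this, and the function name is hypothetical.

```python
import numpy as np

def enforce_hermitian(spectrum):
    """Project a 2-D spectrum onto the set where S[k] == conj(S[-k])."""
    # Reversing both axes and rolling by one maps index k to (-k) mod N.
    mirrored = np.roll(spectrum[::-1, ::-1], shift=(1, 1), axis=(0, 1))
    return 0.5 * (spectrum + np.conj(mirrored))

rng = np.random.default_rng(4)
raw = rng.normal(size=(8, 8)) + 1j * rng.normal(size=(8, 8))
symmetric = enforce_hermitian(raw)
spatial = np.fft.ifft2(symmetric)      # imaginary part vanishes up to round-off
```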
[0113] As mentioned above, in one or more cases, the blind-equalizer sequencing system 106 can determine the power spectral density 612 and/or the up-sampled power spectral density 614 in the frequency domain. Because signals can be complex or include different components, in one or more embodiments, the power spectral density 612 and/or up-sampled power spectral density 614 has an amplitude component that represents the energy of the signal values across different spectral bands. Accordingly, in one or more implementations, the blind-equalizer sequencing system 106 can determine the estimated point-spread-function values 624 in part by determining the amplitude of the estimated point-spread-function values 624. For instance, the blind-equalizer sequencing system 106 can determine the amplitude of the estimated point-spread-function values 624 by taking the square root of the amplitude components of the power spectral density 612 and/or the up-sampled power spectral density 614 and converting the amplitude of the estimated point-spread-function values 624 from the frequency domain to the spatial domain.
[0114] In addition to or independent of determining an amplitude, in some embodiments, the blind-equalizer sequencing system 106 can convert the up-sampled power spectral density 614 from the frequency domain to the spatial domain by taking the square root of the up-sampled power spectral density 614 and applying an inverse Fast Fourier transform (IFFT) to the square root of the up-sampled power spectral density 614 of the estimated response signal. As used herein, the term “inverse Fast Fourier transform” refers to a mathematical operation that reverses the transformation performed by a Fast Fourier transformation. For example, the IFFT can take the frequency and/or amplitude components in the frequency domain and reconstruct the image in the spatial domain.
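The square-root-then-IFFT step can be sketched as follows. The example assumes a zero-phase (symmetric) point-spread function, since the power spectral density carries no phase information; the function name and the Gaussian test kernel are illustrative choices, not values from the disclosure.

```python
import numpy as np

def psf_from_psd(psd):
    """Recover a zero-phase PSF estimate from a power spectral density."""
    amplitude = np.sqrt(np.clip(psd, 0.0, None))   # magnitude only, no phase
    return np.fft.ifft2(amplitude).real            # PSF in corner coordinates

# Round trip on a symmetric Gaussian kernel stored in corner coordinates.
n = 32
offsets = np.minimum(np.arange(n), n - np.arange(n))   # wrapped distance to index 0
profile = np.exp(-offsets ** 2 / (2.0 * 1.5 ** 2))
true_psf = np.outer(profile, profile)
psd_of_psf = np.abs(np.fft.fft2(true_psf)) ** 2
recovered = psf_from_psd(psd_of_psf)
```

The round trip recovers this kernel because its spectrum is real and positive; a PSF with a non-trivial phase could not be recovered from the power spectral density alone.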
[0115] For instance, as shown in FIG. 6, the blind-equalizer sequencing system 106 can generate a spatial domain matrix 616 depicting an up-sampled PSF 618. As used herein, the term “spatial domain matrix” refers to a matrix that includes one or more values representing a point spread function for a signal from a target cluster of oligonucleotides in a spatial domain. For instance, the spatial domain matrix 616 can include values representing changes to one or more signal values of the target cluster of oligonucleotides occurring from a change between the frequency domain to the spatial domain. In one or more embodiments, the spatial domain matrix 616 can be a spatial domain image depicting the PSF and/or up-sampled PSF 618 of the expected response signal. Relatedly, an up-sampled PSF comprises a function that describes an up-sampled response of an imaging device or other optical system to a point source. In one or more embodiments, the up-sampled PSF 618 can comprise a function that determines values for up-sampling the power spectral density 612 in the spatial domain. As shown in FIG. 6, in some embodiments, based on a corner-coordinate system, the up-sampled PSF 618 can be captured in corners of the spatial domain matrix 616.
[0116] In addition to or without up-sampled PSF values, in some embodiments, the blind-equalizer sequencing system 106 can further enforce symmetry on estimated point-spread-function values 624 in the spatial domain and normalize the amplitude of the estimated point-spread-function values. For instance, in one or more implementations, the blind-equalizer sequencing system 106 can generate an intermediate PSF 620. As depicted in FIG. 6, an intermediate PSF includes a transformed matrix or image of the up-sampled PSF 618 at a center coordinate of the matrix and/or image depicting values from the intermediate PSF 620. As indicated by FIG. 6, the blind-equalizer sequencing system 106 can generate the intermediate PSF by selecting (e.g., cropping) and combining data of the up-sampled PSF 618 in the corner regions of the spatial domain matrix 616. In one or more embodiments, the blind-equalizer sequencing system 106 can further normalize the cropped and combined data from the up-sampled PSF 618 and set the intermediate PSF 620 as a center coordinate within the matrix and/or image depicting the intermediate PSF 620.
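The cropping, combining, symmetrizing, and normalizing steps above can be sketched as follows. This is only an illustration under stated assumptions: the helper names are hypothetical, symmetry is enforced by averaging the matrix with its mirrored copies, and amplitude is normalized so that the peak equals one (the disclosure does not specify the normalization).

```python
def corners_to_center(spatial, radius):
    """Gather PSF data wrapped into the corners into one centered patch."""
    n_rows, n_cols = len(spatial), len(spatial[0])
    size = 2 * radius + 1
    # Negative offsets wrap around to the far rows and columns (the corners).
    return [
        [spatial[(r - radius) % n_rows][(c - radius) % n_cols]
         for c in range(size)]
        for r in range(size)
    ]


def enforce_symmetry(psf):
    """Average a square patch with its horizontal and vertical mirrors."""
    n = len(psf)
    return [
        [(psf[r][c] + psf[n - 1 - r][c]
          + psf[r][n - 1 - c] + psf[n - 1 - r][n - 1 - c]) / 4.0
         for c in range(n)]
        for r in range(n)
    ]


def normalize_peak(psf):
    """Scale so that the largest value equals one (assumed convention)."""
    peak = max(max(row) for row in psf)
    return [[value / peak for value in row] for row in psf]


# Toy 6 x 6 spatial-domain matrix with the PSF wrapped around index (0, 0).
spatial = [[0.0] * 6 for _ in range(6)]
spatial[0][0] = 4.0
spatial[0][1], spatial[0][5] = 2.0, 1.0   # right and left of the peak
spatial[1][0], spatial[5][0] = 2.0, 1.0   # below and above the peak
intermediate = normalize_peak(enforce_symmetry(corners_to_center(spatial, 1)))
```

In this toy case the asymmetric side values (2.0 and 1.0) are averaged to 1.5 and then scaled by the peak of 4.0.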
[0117] In addition to generating the intermediate PSF 620, as further shown in FIG. 6, the blind-equalizer sequencing system 106 can generate the estimated point-spread-function values 624 by applying a Hamming window 622 to the intermediate PSF 620. In some embodiments, applying the Hamming window 622 involves a two-dimensional convolution with a padding Hamming interpolator. In some cases, the blind-equalizer sequencing system 106 can up-sample the intermediate PSF 620 by the up-sampling factor. As shown in FIG. 6, by convolving the Hamming window 622 with the intermediate PSF 620 and up-scaling the intermediate PSF 620, the blind-equalizer sequencing system 106 generates the estimated point-spread-function values 624. Thus, in one or more embodiments, the blind-equalizer sequencing system 106 can determine estimated point-spread-function values that are real, two-dimensional, finite, padded, and/or symmetric.
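One possible reading of convolving with a Hamming interpolator is sketched below: the intermediate PSF is up-sampled by zero insertion and then convolved with a separable two-dimensional Hamming kernel. The helper names, the kernel length of `2 * factor + 1` taps, and the zero-padded boundary handling are assumptions made for illustration, not details taken from the disclosure.

```python
import math


def hamming(n):
    """Standard Hamming window of length n."""
    if n == 1:
        return [1.0]
    return [0.54 - 0.46 * math.cos(2 * math.pi * k / (n - 1)) for k in range(n)]


def zero_stuff(psf, factor):
    """Up-sample a square matrix by inserting zeros between samples."""
    n = len(psf)
    size = (n - 1) * factor + 1
    out = [[0.0] * size for _ in range(size)]
    for r in range(n):
        for c in range(n):
            out[r * factor][c * factor] = psf[r][c]
    return out


def convolve_same(image, kernel):
    """2-D convolution with zero padding; output has the input's size."""
    n, k = len(image), len(kernel)
    half = k // 2
    out = [[0.0] * n for _ in range(n)]
    for r in range(n):
        for c in range(n):
            acc = 0.0
            for i in range(k):
                for j in range(k):
                    rr, cc = r + half - i, c + half - j
                    if 0 <= rr < n and 0 <= cc < n:
                        acc += image[rr][cc] * kernel[i][j]
            out[r][c] = acc
    return out


def hamming_interpolate(psf, factor):
    """Zero-stuff, then smooth with a separable 2-D Hamming kernel."""
    taps = hamming(2 * factor + 1)
    kernel = [[a * b for b in taps] for a in taps]
    return convolve_same(zero_stuff(psf, factor), kernel)


point = [[0.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 0.0]]
estimated_psf = hamming_interpolate(point, 2)
```

Because the kernel is real and symmetric, a symmetric intermediate PSF stays real and symmetric after interpolation.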
[0118] By determining the estimated point-spread-function values 624 as depicted in FIG. 6, the blind-equalizer sequencing system 106 can determine such estimated PSF values relevant to a given channel for a camera or other imaging device of a sequencing device. In addition or in the alternative to determining estimated PSF values and equalizer coefficients for an image in an initial or a single channel, the blind-equalizer sequencing system 106 can determine such estimated PSF values — and determine equalizer coefficients based on such estimated PSF values — for images of a given region of a nucleotide-sample slide in different channels (e.g., a first channel and a second channel). As shown in FIG. 7, the blind-equalizer sequencing system 106 can utilize such estimated point-spread-function values to improve a signal-to-noise ratio of a target cluster of oligonucleotides. In accordance with one or more embodiments of the present disclosure, FIG. 7 illustrates the blind-equalizer sequencing system 106 generating an image matrix comprising equalizer coefficients for one or more channels and applying the image matrix to subregions of a flow cell or other nucleotide-sample slide within an image.
[0119] As shown in FIG. 7, the blind-equalizer sequencing system 106 can receive an image 702 from a first channel. As discussed above, the blind-equalizer sequencing system 106 can access or select the image 702 depicting a region 703 of the nucleotide-sample slide. In some cases, the image 702 from the first channel is selected and cropped from an initial image. Moreover, as shown in FIG. 7, the blind-equalizer sequencing system 106 can determine estimated point-spread-function values 708 for the first channel, consistent with the process depicted in FIG. 6 and described above.
[0120] As mentioned above, the blind-equalizer sequencing system 106 can utilize the physical characteristics of a sequencing device to determine equalizer coefficients. FIG. 7 depicts and the following paragraphs describe how the blind-equalizer sequencing system 106 utilizes the nucleotide-sample slide and aspects of the sequencing device to determine equalizer coefficients. FIG. 7 further illustrates that the blind-equalizer sequencing system 106 can receive estimated cluster locations for a target cluster and neighboring clusters of oligonucleotides. For example, in some embodiments, the blind-equalizer sequencing system 106 can receive the estimated cluster locations by receiving a patterned arrangement 706 of the estimated cluster locations for the target cluster and the neighboring clusters of oligonucleotides arranged according to a pattern within the region 703 of the nucleotide-sample slide. As used herein, the term “patterned arrangement” refers to a patterned configuration of estimated cluster locations or estimated well locations (e.g., comprising clusters of oligonucleotides or unseeded lawn). In one or more embodiments, the patterned arrangement can include a patterned distribution of the estimated cluster locations within a region of the nucleotide-sample slide. For example, in some cases, the patterned arrangement can take a form of, but is not limited to, a grid taking a shape of a square, rectangle, triangle, rhombus, hexagon, or diamond. In certain implementations, the patterned arrangement can include a pitch between estimated cluster locations. For example, the pitch can indicate an estimated distance between pixels depicting clusters of oligonucleotides.
[0121] In contrast to a patterned arrangement, in certain implementations, the blind-equalizer sequencing system 106 can receive the estimated cluster locations by receiving an unpatterned arrangement of estimated cluster locations for a target cluster and neighboring clusters of oligonucleotides. As used herein, the term “unpatterned arrangement” refers to a randomly or unevenly distributed configuration of estimated cluster locations or estimated well locations (e.g., comprising clusters of oligonucleotides or unseeded lawn).
[0122] Based on a patterned arrangement of estimated cluster locations within a region of a nucleotide-sample slide, however, the blind-equalizer sequencing system 106 can determine a grid of estimated nanowell locations for nanowells comprising a target cluster of oligonucleotides and neighboring clusters of oligonucleotides. To illustrate, based on a square patterned arrangement of estimated cluster locations, the blind-equalizer sequencing system 106 can generate a square grid of the estimated nanowell locations for nanowells including the target cluster of oligonucleotides and the neighboring clusters of oligonucleotides.
[0123] As mentioned above, the blind-equalizer sequencing system 106 can utilize the estimated cluster locations for the target cluster of oligonucleotides and the neighboring clusters of oligonucleotides and the estimated point-spread-function values 708 to generate an image matrix 712 with equalizer coefficients. As suggested by FIG. 7, for instance, the blind-equalizer sequencing system 106 can combine the estimated point-spread-function values 708 and the estimated cluster locations to determine the image matrix 712 comprising the equalizer coefficients. As discussed below in FIG. 8, in some cases, the blind-equalizer sequencing system 106 can generate the image matrix 712 comprising equalizer coefficients by identifying the pitch of the patterned arrangement 706 for the region 703 of the nucleotide-sample slide and utilizing the pitch of the patterned arrangement 706 to generate a convolution matrix for a given channel. In one or more embodiments, the blind-equalizer sequencing system 106 utilizes the convolution matrix for the channel to determine the equalizer coefficients.
[0124] From the image matrix 712, as further shown in FIG. 7, the blind-equalizer sequencing system 106 can generate, for the first channel, a set of subregion image matrices 716 to subsequently apply to a set of subregions 720 of the image 702 of the region 703 of the nucleotide-sample slide. As its name suggests, a subregion image matrix can constitute a subregion of a larger image matrix (e.g., the image matrix 712) and include equalizer coefficients for editing and/or processing a subregion of an image. Because it contains such equalizer coefficients, a subregion image matrix, when applied, increases or maximizes the signal-to-noise ratio of an expected response signal affected by noise and/or crosstalk.
[0125] Because a subregion image matrix includes equalizer coefficients, in one or more instances, the subregion image matrix can include subregion equalizer coefficients. Such subregion equalizer coefficients can accordingly include weighted values that can be applied to a subregion of an image of clusters of oligonucleotides that adjust for (or reduce) inter-symbol interference (e.g., crosstalk) between clusters of oligonucleotides.
[0126] To apply such subregion equalizer coefficients, as further shown in FIG. 7, the blind-equalizer sequencing system 106 identifies corresponding subregions of an image. In some cases, for instance, the blind-equalizer sequencing system 106 identifies a set of subregions 720 from the image 702 of the region 703 of the nucleotide-sample slide. As its name implies, a subregion of an image that is part of a set of subregions includes or corresponds to a subregion of a nucleotide-sample slide from a larger region. Accordingly, in some cases, the subregion differs in size, dimension, and location from the region 703 of the nucleotide-sample slide but can nevertheless be within the region 703.
[0127] Accordingly, as further shown in FIG. 7, the blind-equalizer sequencing system 106 can identify, for a channel, the set of subregions 720 within the image 702 of the region 703 of the nucleotide-sample slide. In some embodiments, the blind-equalizer sequencing system 106 can select the number, size, and/or dimensions of the subregions in the set of subregions 720. For example, as shown in FIG. 7, the blind-equalizer sequencing system 106 identifies nine such subregions (e.g., 3 x 3) for the set of subregions 720. By contrast, in some cases, the blind-equalizer sequencing system 106 can identify, from an image of a nucleotide-sample-slide region, nine subregions of a different layout (e.g., 1 x 9), five subregions of a different layout (e.g., 1 x 5), seven subregions (e.g., 1 x 7), or fifteen subregions (e.g., 3 x 5) for the set of subregions 720.
[0128] Having generated the image matrix 712 and/or identified the set of subregions 720, in some cases, the blind-equalizer sequencing system 106 generates or initializes the set of subregion image matrices 716 with subregion equalizer coefficients. For example, the blind-equalizer sequencing system 106 can generate the set of subregion image matrices 716 comprising equalizer coefficients that initially match the equalizer coefficients of the image matrix 712. Further, in some cases, the blind-equalizer sequencing system 106 can improve the accuracy of such subregion equalizer coefficients of a set of subregion image matrices over sequencing cycles of a sequencing run (e.g., by utilizing a decision-direct approach to adjust equalizer coefficients). Likewise, in one or more embodiments, the blind-equalizer sequencing system 106 can improve the accuracy of the equalizer coefficients of an image matrix over the course of a sequencing run. For example, in some cases, the blind-equalizer sequencing system 106 can correct for systematic differences in a current sequencing run relative to offline training of the equalizer.
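To make the subregion bookkeeping concrete, the following sketch splits an image into a rows-by-columns layout and initializes one subregion image matrix per subregion as a copy of the full image matrix, matching the initial state described above. The function names and the integer-division tiling are illustrative assumptions; the disclosure does not state how subregion boundaries are computed.

```python
import copy


def split_into_subregions(height, width, n_rows, n_cols):
    """Return (top, bottom, left, right) pixel bounds for each subregion."""
    bounds = []
    for i in range(n_rows):
        top = i * height // n_rows
        bottom = (i + 1) * height // n_rows
        for j in range(n_cols):
            left = j * width // n_cols
            right = (j + 1) * width // n_cols
            bounds.append((top, bottom, left, right))
    return bounds


def initialize_subregion_matrices(image_matrix, count):
    """Start every subregion with its own copy of the global coefficients."""
    return [copy.deepcopy(image_matrix) for _ in range(count)]


# A 3 x 3 layout over a 256 x 256-pixel image, as in the nine-subregion example.
subregions = split_into_subregions(256, 256, 3, 3)
global_matrix = [[0.0, -0.1, 0.0], [-0.1, 1.4, -0.1], [0.0, -0.1, 0.0]]  # toy values
subregion_matrices = initialize_subregion_matrices(global_matrix, len(subregions))
```

Keeping independent copies lets each subregion's coefficients drift apart later, for example when they are adjusted over sequencing cycles.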
[0129] In one or more embodiments, the blind-equalizer sequencing system 106 can generate a base call for a target cluster of oligonucleotides. In particular, the blind-equalizer sequencing system 106 can apply the image matrix 712 comprising the equalizer coefficients to the image 702 from a first channel. In one or more embodiments, the blind-equalizer sequencing system 106 can apply non-linear distortion to the equalized image and extract signal values from the equalized image. In certain cases, the blind-equalizer sequencing system 106 can spatially normalize and compress the signal values of the modified image. In some implementations, the blind-equalizer sequencing system 106 can further correct for phasing and pre-phasing of the signal values of the modified image and normalize the signal values. The blind-equalizer sequencing system 106 can use the corrected signal values to make a base call and generate a quality score for the target cluster of oligonucleotides.
[0130] As indicated above, the blind-equalizer sequencing system 106 can determine channel-specific image matrices comprising channel-specific equalizer coefficients. For instance, as shown in FIG. 7, the blind-equalizer sequencing system 106 can determine an image matrix 714 comprising additional equalizer coefficients for an image 704 from a second channel. As FIG. 7 illustrates, the blind-equalizer sequencing system 106 can access or otherwise receive the image 704 from the second channel consistent with the description above. In some embodiments, the image 704 depicts one or more additional signal values in the second channel for an additional expected response signal from the target cluster of oligonucleotides within an additional region 705 of the nucleotide-sample slide. In some embodiments, the additional region 705 of the nucleotide-sample slide depicted by the image 704 from the second channel can be the same region as the region 703 of the nucleotide-sample slide depicted by the image 702 from the first channel.
[0131] As further shown in FIG. 7 and previously described in FIG. 6, the blind-equalizer sequencing system 106 can determine, for the second channel, estimated point-spread-function values 710 based on the image 704 depicting one or more additional signal values corresponding to the target cluster of oligonucleotides. Relatedly, in one or more cases, the blind-equalizer sequencing system 106 can determine additional estimated noise values in the second channel consistent with the description above.
[0132] As depicted in FIG. 7, the blind-equalizer sequencing system 106 determines, for the second channel, an image matrix 714 using a same or similar process as performed for the image matrix 712 for the first channel. Accordingly, as indicated above, the blind-equalizer sequencing system 106 can determine the image matrix 714 by combining the estimated point-spread-function values 710 with the estimated cluster locations from the patterned arrangement 706 or an unpatterned arrangement (not shown). As shown in FIG. 7, in some embodiments, the blind-equalizer sequencing system 106 can use the same estimated cluster locations (e.g., the patterned arrangement 706) for the image 704 from the second channel. As further shown in FIG. 7, in some embodiments, the blind-equalizer sequencing system 106 can likewise generate, for the second channel and the image 704, a set of subregion image matrices 718 comprising additional subregion equalizer coefficients, as described above.
[0133] As previously indicated, in some cases, the blind-equalizer sequencing system 106 can determine a base call for the target cluster of oligonucleotides based on signal values that have been determined from multiple channels and from equalizer coefficients corresponding to the multiple channels. In particular, the blind-equalizer sequencing system 106 can determine a base call for a target cluster of oligonucleotides based on a single intensity value of the emitted signal in each channel for a given sequencing cycle. For example, in a two-channel implementation, the blind-equalizer sequencing system 106 can use a first intensity value (X) for a target cluster of oligonucleotides in a first channel and a second intensity value (Y) for the target cluster of oligonucleotides in a second channel to determine a probability that the signal values are located within the intensity-value boundaries of a certain nucleobase (e.g., A, C, G, or T).
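As a hedged illustration of a two-channel call from the intensity pair (X, Y), the sketch below scores each nucleobase with an isotropic Gaussian likelihood around an assumed centroid and normalizes the scores into probabilities. The centroid positions, the mapping of bases to channels, the value of `sigma`, and the Gaussian model itself are hypothetical stand-ins for the intensity-value boundaries mentioned above, not values from the disclosure.

```python
import math

# Hypothetical (X, Y) centroids; a real system would estimate these per run.
CENTROIDS = {
    "A": (1.0, 1.0),
    "C": (1.0, 0.0),
    "T": (0.0, 1.0),
    "G": (0.0, 0.0),
}


def base_probabilities(x, y, sigma=0.25):
    """Probability of each base given first- and second-channel intensities."""
    likelihoods = {
        base: math.exp(-((x - cx) ** 2 + (y - cy) ** 2) / (2 * sigma ** 2))
        for base, (cx, cy) in CENTROIDS.items()
    }
    total = sum(likelihoods.values())
    return {base: value / total for base, value in likelihoods.items()}


def call_base(x, y):
    """Return the most probable base and its probability."""
    probabilities = base_probabilities(x, y)
    best = max(probabilities, key=probabilities.get)
    return best, probabilities[best]


base, probability = call_base(0.9, 0.1)
```

Equalized intensities that sit closer to a single centroid yield a higher winning probability, which is one way reduced crosstalk could translate into more confident calls.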
[0134] FIG. 7 illustrates an example of equalizer coefficients that can be applied to modify such first and second intensity values for a target cluster of oligonucleotides in first and second channels. By applying the image matrix 712 to the image 702 depicting one or more signal values (e.g., a single signal value) for an expected response signal in a first channel and from the target cluster of oligonucleotides, and applying the image matrix 714 to the image 704 depicting one or more additional signal values for an additional expected response signal in a second channel and from the target cluster of oligonucleotides, the blind-equalizer sequencing system 106 generates an estimated response signal that conveys more accurate intensity values for the target cluster. Alternatively, by applying a subregion image matrix of the set of subregion image matrices 716 to a subregion, from the set of subregions 720 of the image 702, depicting one or more signal values for an expected response signal in a first channel and from the target cluster of oligonucleotides, and applying a subregion image matrix of the set of subregion image matrices 718 to a subregion, from a set of subregions 722 of the image 704, depicting one or more signal values for an expected response signal in a second channel and from the target cluster of oligonucleotides, the blind-equalizer sequencing system 106 generates an estimated response signal that conveys more accurate intensity values for the target cluster. In some embodiments, the set of subregions 722 of the image 704 in the second channel depicts the same subregions of the nucleotide-sample slide as the set of subregions 720 of the image 702 in the first channel. Based on the estimated response signal that accounts for the equalizer coefficients or subregion equalizer coefficients in both the first and second channel, the blind-equalizer sequencing system 106 generates a base call for the target cluster of oligonucleotides depicted by both the image 702 and the image 704 for a given sequencing cycle.
[0135] In addition to determining a base call for such an initial target cluster of oligonucleotides depicted by a subregion from the set of subregions 720 of the image 702, in some instances, the blind-equalizer sequencing system 106 also determines one or more additional base calls for additional target clusters of oligonucleotides depicted by the set of subregions 720 of the image 702. By applying the image matrix 712 to the image 702 depicting additional signal values for additional expected response signals in a first channel and from additional target clusters of oligonucleotides depicted by the set of subregions 720 of the image 702, and applying the image matrix 714 to the image 704 depicting additional signal values for additional expected response signals in a second channel and from the additional target clusters of oligonucleotides, the blind-equalizer sequencing system 106 generates additional estimated response signals that convey more accurate intensity values for such additional target clusters. Additionally or alternatively, by applying the set of subregion image matrices 716 to the set of subregions 720 of the image 702 depicting additional signal values for additional expected response signals in a first channel and from additional target clusters of oligonucleotides, and applying the set of subregion image matrices 718 to the set of subregions 722 of the image 704 depicting additional signal values for additional expected response signals in a second channel and from the additional target clusters of oligonucleotides, the blind-equalizer sequencing system 106 generates additional estimated response signals that convey more accurate intensity values for such additional target clusters. Based on the additional estimated response signals that account for the equalizer coefficients or subregion equalizer coefficients in both the first and second channel, the blind-equalizer sequencing system 106 generates additional base calls for the additional target clusters of oligonucleotides depicted by both the image 702 and the image 704 for a given sequencing cycle.
[0136] As just described, the blind-equalizer sequencing system 106 can utilize an image matrix comprising equalizer coefficients to determine a base call for a target cluster. In particular embodiments, the blind-equalizer sequencing system 106 can generate the image matrix comprising equalizer coefficients by utilizing the estimated point-spread-function values and estimated cluster locations of the target cluster of oligonucleotides and neighboring clusters of oligonucleotides. In some cases, the blind-equalizer sequencing system 106 utilizes the estimated cluster locations of the target cluster of oligonucleotides and neighboring clusters of oligonucleotides to apply a distribution function to signal values of the target cluster of oligonucleotides and neighboring clusters of oligonucleotides. In accordance with one or more embodiments of the present disclosure, FIG. 8 illustrates the blind-equalizer sequencing system 106 applying such a distribution function.
[0137] As discussed above, the blind-equalizer sequencing system 106 can receive an arrangement (e.g., patterned or unpatterned) of estimated cluster locations of a target cluster of oligonucleotides and neighboring clusters of oligonucleotides. In some embodiments, the blind-equalizer sequencing system 106 can set the estimated cluster location of the target cluster of oligonucleotides as a center coordinate 810 of an up-sampled arrangement 802 of estimated cluster locations. As shown in FIG. 8, the blind-equalizer sequencing system 106 generates the up-sampled arrangement 802 of estimated cluster locations by applying an up-sampling factor to an initial arrangement (e.g., patterned arrangement) of estimated cluster locations within a region of a nucleotide-sample slide. For instance, as shown in FIG. 8, the blind-equalizer sequencing system 106 can multiply a pitch (e.g., ΔX, ΔY) between pixels representing estimated cluster locations by the up-sampling factor. As further shown in FIG. 8, the estimated cluster location of the target cluster of oligonucleotides can stay at the center coordinate 810 of the up-sampled arrangement 802 after application of the up-sampling factor.
[0138] In some embodiments, the blind-equalizer sequencing system 106 can determine equalizer coefficients in part by utilizing an arrangement of estimated cluster locations of a target cluster of oligonucleotides and neighboring clusters of oligonucleotides to modify signal values of the target cluster of oligonucleotides and the neighboring clusters of oligonucleotides. As shown in FIG. 8, the blind-equalizer sequencing system 106 can apply a distribution function 804 to modify one or more signal values 808 of the target cluster of oligonucleotides and additional signal values of the neighboring clusters of oligonucleotides as depicted in graph 806. As indicated by a value scale 812 shown in FIG. 8, in some embodiments, the distribution function 804 can comprise a Dirac delta function that sets (i) the one or more signal values 808 of the target cluster of oligonucleotides at the center coordinate 810 to one and (ii) the additional signal values of the neighboring clusters of oligonucleotides within a region of a nucleotide-sample slide to zero. For example, as shown in FIG. 8, the one or more signal values 808 of the target cluster of oligonucleotides at the center coordinate 810 of the up-sampled arrangement 802 are set to one and the additional signal values of the neighboring clusters of oligonucleotides at non-central coordinates are set to zero.
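The up-sampled arrangement and the delta assignment described above can be sketched as follows, assuming a square patterned arrangement. The function name, the dictionary representation keyed by (x, y) offsets from the target cluster, and the `ring` parameter controlling how many neighbor rings are included are all hypothetical choices for illustration.

```python
def upsampled_delta_arrangement(pitch_x, pitch_y, upsample, ring=1):
    """Map up-sampled cluster offsets to delta values (1 at center, else 0)."""
    step_x = pitch_x * upsample   # pitch multiplied by the up-sampling factor
    step_y = pitch_y * upsample
    arrangement = {}
    for i in range(-ring, ring + 1):
        for j in range(-ring, ring + 1):
            is_target = (i == 0 and j == 0)
            arrangement[(j * step_x, i * step_y)] = 1.0 if is_target else 0.0
    return arrangement


# Pitch of 2 pixels in each direction, up-sampling factor of 4.
arrangement = upsampled_delta_arrangement(2, 2, 4)
```

The target cluster stays at offset (0, 0) regardless of the up-sampling factor, while its nearest neighbors move from 2 to 8 up-sampled pixels away.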
[0139] In some embodiments, the blind-equalizer sequencing system 106 can determine an image matrix comprising the equalizer coefficients by modifying and combining the estimated point-spread-function values and estimated noise values. For instance, in some cases, a size and/or dimensions of the image matrix can be based on the sequencing device and/or size of a region of a nucleotide-sample slide depicted in the image. To illustrate, based on the qualities of a first sequencing device and an image depicting 256 x 256-pixel region of the nucleotide-sample slide, the dimensions of the image mask can be 7 x 7 pixels. Relatedly, in one or more cases, the size and/or dimensions of the estimated point-spread-function values (e.g., matrix) can differ from the size and/or dimensions of the image matrix.
[0140] Accordingly, in certain embodiments, the blind-equalizer sequencing system 106 can compensate for different properties of a sequencing device and/or a nucleotide-sample slide by modifying the estimated point-spread-function values and the estimated noise values of the target cluster of oligonucleotides. For example, in some implementations, the blind-equalizer sequencing system 106 can modify the estimated point-spread-function values by (i) sampling one or more values from the estimated point-spread-function values and generating convolved estimated point-spread-function values for a given channel with the sampled values from the estimated point-spread-function values and (ii) combining the estimated point-spread-function values with the distribution function.
[0141] In some cases, the blind-equalizer sequencing system 106 can determine the image matrix comprising equalizer coefficients by combining the modified estimated point-spread-function values and the modified estimated noise values. In certain embodiments, the blind-equalizer sequencing system 106 can modify the estimated noise values by sampling a random subset of noise values from the estimated noise values. In one or more implementations, the blind-equalizer sequencing system 106 can combine the modified estimated point-spread-function values and the modified estimated noise values by dividing the modified estimated point-spread-function values by the modified estimated noise values.
[0142] As just described, in one or more embodiments, the blind-equalizer sequencing system 106 can modify the estimated point-spread-function values and estimated noise values by transposing the estimated point-spread-function values and the estimated noise values. For example, in some cases, the blind-equalizer sequencing system 106 can modify the estimated point-spread-function values and estimated noise values according to the following equation: $W = \frac{(H\delta)^{T}}{H^{T}H + V^{T}V}$, where $W$ represents the equalizer coefficients, $H$ represents the estimated point-spread-function values, $\delta$ represents the distribution function (e.g., a Dirac delta function), $T$ represents transposition, and $V$ represents estimated noise values. In some embodiments, as shown by the foregoing equation, the blind-equalizer sequencing system 106 can modify the estimated point-spread-function values by generating transposed delta estimated point-spread-function values $(H\delta)^{T}$ and square and symmetric estimated point-spread-function values $(H^{T}H)$. Moreover, the blind-equalizer sequencing system 106 can modify the estimated noise values by generating square and symmetric transposed estimated noise values $(V^{T}V)$.
[0143] In one or more embodiments, the blind-equalizer sequencing system 106 can also model the foregoing equation in the inverse as follows: $W = \frac{(H\delta)^{T}}{H^{T}H + V^{T}V} = \mathrm{inv}(H^{T}H + V^{T}V) * (H\delta)^{T}$, where inv represents a multiplicative inverse operation. As suggested above, the characteristics of a minimum phase channel, such as causality and stability, make the minimum phase channel’s inverse system unique. Consequently, after determining $H$ representing or comprising the estimated point-spread-function values, the blind-equalizer sequencing system 106 can determine unique equalizer coefficients $W$ based on the foregoing equation using a multiplicative inverse operation.
[0144] As indicated by the equation above for W, in some embodiments, the blind-equalizer sequencing system 106 can modify the estimated point-spread-function values by generating transposed delta estimated point-spread-function values. As used herein, the term “transposed delta estimated point-spread-function values” refers to a transposed matrix of estimated point-spread- function values modified by a distribution function. In some cases, the transposed delta estimated point-spread-function values can accordingly be a transposition of estimated point-spread-function values modified by a Dirac delta function.
[0145] As further indicated by the equation above for $W$ and in addition to modifying estimated point-spread-function values by transposition, in one or more cases, the blind-equalizer sequencing system 106 can further modify the estimated point-spread-function values by generating square and symmetric estimated point-spread-function values. As used herein, the term “square and symmetric estimated point-spread-function values” refers to estimated point-spread-function values combined with transposed estimated point-spread-function values. For example, generating the square and symmetric estimated point-spread-function values can include multiplying the estimated point-spread-function values with a transposed (e.g., flipped) version of the estimated point-spread-function values. As mentioned above, in one or more embodiments, the estimated point-spread-function values can be generated by up-sampling the intermediate PSF 620 with a Hamming interpolator.
[0146] In addition to modifying estimated point-spread-function values and as likewise indicated by the equation above for $W$, in certain cases, the blind-equalizer sequencing system 106 can modify estimated noise values by generating square and symmetric transposed estimated noise values. As used herein, the term “square and symmetric transposed estimated noise values” refers to estimated noise values combined with transposed estimated noise values. To illustrate, the blind-equalizer sequencing system 106 can multiply a matrix of estimated noise values with a transposed matrix of the estimated noise values. In some cases, the estimated noise values are a randomly sampled subset of noise values. For example, in some cases, the estimated noise values can be generated by up-sampling the noise with a Hamming interpolator.
[0147] In one or more cases, the blind-equalizer sequencing system 106 can further determine the image matrix comprising the equalizer coefficients $W$ indicated above by combining the transposed delta estimated point-spread-function values, the square and symmetric estimated point-spread-function values, and the square and symmetric transposed estimated noise values. For example, in one or more embodiments, the blind-equalizer sequencing system 106 can divide the transposed delta estimated point-spread-function values by the sum of the square and symmetric estimated point-spread-function values and the square and symmetric transposed estimated noise values.
[0148] In some embodiments, the blind-equalizer sequencing system 106 can further normalize the equalizer coefficients by forcing a combination of the equalizer coefficient and the channel to be one at a center pixel according to the following equation: $W_{\text{norm}} = \frac{W}{WH\delta} = \mathrm{inv}(WH\delta) * W$, where $W_{\text{norm}}$ represents normalized equalizer coefficients. In some embodiments, the blind-equalizer sequencing system 106 can generate the image matrix by reshaping the normalized equalizer coefficients in column major order.
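The equalizer equation and the center-pixel normalization can be illustrated with a small one-dimensional example. The sketch below rests on one interpretation that is an assumption here: each row of `H` is the response of one cluster sampled at the same pixels, the delta function selects the target cluster's row, and `V` holds noise samples so that V^T V acts as a noise term. The helper names, the toy values, and the use of Gaussian elimination in place of an explicit inverse are illustrative only.

```python
def transpose(m):
    return [list(col) for col in zip(*m)]


def matmul(a, b):
    bt = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in bt] for row in a]


def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    aug = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - tail) / aug[r][r]
    return x


def equalizer_coefficients(h, v, target):
    """W = inv(H^T H + V^T V) * (H delta)^T, normalized at the target."""
    gram = matmul(transpose(h), h)        # square and symmetric PSF term
    noise = matmul(transpose(v), v)       # square and symmetric noise term
    system = [[g + n for g, n in zip(g_row, n_row)]
              for g_row, n_row in zip(gram, noise)]
    h_delta = h[target]                   # delta picks the target response
    w = solve(system, h_delta)
    gain = sum(wi * hi for wi, hi in zip(w, h_delta))
    return [wi / gain for wi in w]        # unit response at the center


# Toy responses: left neighbor, target, right neighbor (three pixels each).
H = [[1.0, 0.2, 0.0], [0.2, 1.0, 0.2], [0.0, 0.2, 1.0]]
s = 0.1 ** 0.5
V = [[s, 0.0, 0.0], [0.0, s, 0.0], [0.0, 0.0, s]]
W_norm = equalizer_coefficients(H, V, target=1)
```

With these toy values the normalized coefficients respond with one to the target cluster and only weakly to the left neighbor, whereas simply correlating with the target response would pass a much larger share of the neighbor's signal.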
[0149] In addition to initially determining equalizer coefficients $W$ as indicated above or by the descriptions of FIGS. 2 - 8, in some embodiments, the blind-equalizer sequencing system 106 can update the equalizer coefficients during sequencing cycles. For example, the blind-equalizer sequencing system 106 can utilize a decision-directed approach and/or feedback loop to fine tune the equalizer coefficients that have been blindly initialized. For example, the blind-equalizer sequencing system 106 can initialize an equalizer with the equalizer coefficients. As discussed above, an equalizer utilizing the equalizer coefficients can generate a more accurate estimated response signal. Relatedly, the blind-equalizer sequencing system 106 can determine an additional estimated response signal during a subsequent sequencing cycle. Subsequently, the blind-equalizer sequencing system 106 can utilize the additional estimated response signal to update the equalizer coefficients. For example, the blind-equalizer sequencing system 106 can process one or more additional signal values and approximate an additional estimated response signal for at least one cluster of oligonucleotides. Based on determining the minimum mean squared error between the additional estimated response signal and the additional expected response signal, the blind-equalizer sequencing system 106 can update the equalizer coefficients.
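A decision-directed refinement of this kind is commonly approximated with a least-mean-squares (LMS) step, which is a stochastic-gradient stand-in for minimizing mean squared error and is named here as a substitute technique rather than the method of the disclosure. In the sketch below, the function names, the step size, and the two decision levels (0.0 for an "off" cluster, 1.0 for an "on" cluster) are hypothetical.

```python
def equalize(weights, pixels):
    """Estimated response: weighted sum of the pixels around a cluster."""
    return sum(w * p for w, p in zip(weights, pixels))


def nearest_level(value, levels=(0.0, 1.0)):
    """Decision step: snap the estimate to the closest assumed level."""
    return min(levels, key=lambda level: abs(level - value))


def decision_directed_update(weights, pixels, step=0.05):
    """One LMS step using the decision as the expected response."""
    estimate = equalize(weights, pixels)
    decision = nearest_level(estimate)
    error = decision - estimate
    updated = [w + step * error * p for w, p in zip(weights, pixels)]
    return updated, error


weights = [-0.1, 0.9, -0.1]      # blindly initialized coefficients (toy values)
pixels = [0.3, 1.0, 0.2]         # pixels observed in a later sequencing cycle
updated, error = decision_directed_update(weights, pixels)
```

Re-applying the updated coefficients to the same pixels gives an estimate slightly closer to the decided level, which is the feedback behavior the paragraph describes.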
[0150] As discussed above, the blind-equalizer sequencing system 106 can improve the accuracy of base calling relative to existing sequencing systems. In accordance with one or more embodiments of the present disclosure, FIG. 9 illustrates improved performance of the blind-equalizer sequencing system 106 in terms of base-call-quality scores for nucleobase calls of a sequencing device. In particular, FIG. 9 shows a graph 902 depicting the percentage of nucleobase calls across a sequencing run that equal or exceed a base-call-quality score (Q score) of 30 across cycle 0 through approximately cycle 325. As shown in FIG. 9, the blind-equalizer sequencing system 106 generates a higher percentage of base calls meeting or exceeding a base-call-quality score (Q score) of 30 across multiple cycles in relation to a baseline or existing sequencing system that initializes or adjusts equalizer coefficients primarily or exclusively through a training approach of determining differences or losses as a basis for updating equalizer coefficients based on a comparison of predicted base calls and assumed base calls.
[0151] FIG. 9 depicts the improved performance in terms of base-call-quality scores through (i) plot lines 904a and 904b representing a percentage of nucleobase calls determined by the blind-equalizer sequencing system 106 that satisfy or exceed Q30 and (ii) plot lines 906a and 906b representing a percentage of nucleobase calls determined by a baseline sequencing system that satisfy or exceed Q30. As shown by the plot lines 904a and 906a across cycle 0 through approximately cycle 150 for a first nucleotide read mate (R1), the blind-equalizer sequencing system 106 generates a higher percentage of nucleobase calls that satisfy Q30 for R1 relative to the baseline sequencing system. As further shown by the plot lines 904b and 906b across approximately cycle 150 through approximately cycle 325 for a second nucleotide read mate (R2), the blind-equalizer sequencing system 106 likewise generates a higher percentage of nucleobase calls that satisfy Q30 for R2 relative to the baseline sequencing system.
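For context on the Q30 metric, the following sketch assumes the conventional Phred scaling, Q = -10·log10(p), under which Q30 corresponds to an estimated error probability of 1 in 1,000. The disclosure does not state how its Q scores are computed, and the example scores are hypothetical.

```python
import math


def phred_q(p_error):
    """Phred-scaled quality score for an estimated error probability."""
    return -10.0 * math.log10(p_error)


def percent_at_or_above(q_scores, threshold=30.0):
    """Percentage of base calls whose Q score meets or exceeds the threshold."""
    if not q_scores:
        return 0.0
    return 100.0 * sum(q >= threshold for q in q_scores) / len(q_scores)


cycle_scores = [35, 30, 20, 40]           # hypothetical Q scores for one cycle
pct_q30 = percent_at_or_above(cycle_scores)
```

A plot such as graph 902 would then show one such percentage per sequencing cycle.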
[0152] FIGS. 1-9, the corresponding text, and the examples provide a number of different methods, systems, devices, and non-transitory computer-readable media of the blind-equalizer sequencing system 106. In addition to the foregoing, one or more embodiments can also be described in terms of flowcharts comprising acts for accomplishing particular results, as shown in FIGS. 10A-10B. In some embodiments, the series of acts may be performed with more or fewer acts. Further, the acts may be performed in different orders. Additionally, the acts described herein may be repeated or performed in parallel with one another or in parallel with different instances of the same or similar acts.
[0153] FIG. 10A illustrates a flowchart of a series of acts 1000 for determining a base call for a cluster of oligonucleotides utilizing the equalizer coefficients in accordance with one or more embodiments. While FIG. 10A illustrates acts according to one embodiment, alternative embodiments may omit, add to, reorder, and/or modify any of the acts shown in FIG. 10A. In some implementations, the acts of FIG. 10A are performed as part of a method. In some instances, a non-transitory computer-readable medium stores instructions thereon that, when executed by at least one processor, cause a computing device to perform the acts of FIG. 10A. In some implementations, a system performs the acts of FIG. 10A. For example, in one or more cases, a system includes at least one processor and a non-transitory computer readable medium comprising instructions that, when executed by the at least one processor, cause the system to perform the acts of FIG. 10A.
[0154] As shown in FIG. 10A, in one or more implementations, the series of acts 1000 includes an act 1002 for receiving signal values for an estimated response signal from a cluster of oligonucleotides. Additionally, the series of acts 1000 includes an act 1004 of determining estimated point-spread-function values based on the signal values for the cluster of oligonucleotides. Further, the series of acts 1000 includes an act 1006 of determining estimated noise values. The series of acts 1000 further includes an act 1008 of determining equalizer coefficients based on the estimated point-spread-function values and the estimated noise values. In some cases, the series of acts includes an act 1010 of determining a base call for the cluster of oligonucleotides utilizing the equalizer coefficients.
[0155] For example, the series of acts 1000 depicted in FIG. 10A (or a series of acts 1011 depicted in FIG. 10B) can include acts to perform any of the operations described in the following clauses:
CLAUSE 1. A computer-implemented method comprising: receiving, for a sequencing cycle, signal values for an expected response signal from at least a cluster of oligonucleotides within a region of a nucleotide-sample slide; determining, for a channel, estimated point-spread-function values based on the signal values corresponding to at least the cluster of oligonucleotides; determining estimated noise values within the channel; determining, based on combining the estimated point-spread-function values and the estimated noise values within the channel, equalizer coefficients that compensate for the estimated point-spread-function values and the estimated noise values with respect to the expected response signal from at least the cluster of oligonucleotides; and
determining a base call for at least the cluster of oligonucleotides utilizing the equalizer coefficients.
CLAUSE 2. The computer-implemented method of clause 1, wherein the signal values from the cluster of oligonucleotides correspond to the estimated point-spread-function values combined with the expected response signal from the cluster of oligonucleotides summed with the estimated noise values.
CLAUSE 3. The computer-implemented method of clause 1, further comprising determining the equalizer coefficients by adjusting the equalizer coefficients to minimize a mean squared error between one or more expected response signals corresponding to a set of neighboring clusters of oligonucleotides within the region of the nucleotide-sample slide and one or more estimated response signals across the set of neighboring clusters of oligonucleotides within the region of the nucleotide-sample slide.
CLAUSE 4. The computer-implemented method of clause 3, wherein the expected response signal corresponds to the equalizer coefficients combined with the signal values from at least the cluster of oligonucleotides.
CLAUSE 5. The computer-implemented method of clause 1, further comprising determining the estimated point-spread-function values by: receiving the signal values of the expected response signal from at least the cluster of oligonucleotides within a region of the nucleotide-sample slide in a spatial domain; converting the signal values of the expected response signal from at least the cluster of oligonucleotides to a frequency domain; determining a power spectral density of the signal values of the expected response signal in the frequency domain; and converting the power spectral density of the expected response signal from the frequency domain to the spatial domain by applying an inverse fast Fourier transformation (IFFT) to the power spectral density of the expected response signal.
CLAUSE 6. The computer-implemented method of clause 5, further comprising enforcing Hermitian symmetry for the expected response signal in the frequency domain.
CLAUSE 7. The computer-implemented method of clause 5, wherein the power spectral density is an average measurement of energy within a range of spectral bands.
CLAUSE 8. The computer-implemented method of clause 5, wherein converting the power spectral density of the expected response signal in the frequency domain to the spatial domain further comprises taking a square root of the power spectral density of the expected response signal in the frequency domain.
CLAUSE 9. The computer-implemented method of clause 5, wherein the expected response signal comprises a measurement of an amplitude of the expected response signal.
CLAUSE 10. The computer-implemented method of clause 1, wherein the estimated noise values comprise independent identically distributed Gaussian noise.
CLAUSE 11. The computer-implemented method of clause 1, wherein the channel is a minimum phase response channel.
CLAUSE 12. The computer-implemented method of clause 1, further comprising: determining an estimated response signal from at least the cluster of oligonucleotides; and based on the estimated response signal, determining a base call for at least the cluster of oligonucleotides.
CLAUSE 13. The computer-implemented method of clause 1, further comprising: initializing an equalizer utilizing the equalizer coefficients; during a subsequent sequencing cycle, determining an additional estimated response signal from at least an additional cluster of oligonucleotides; and based on the additional estimated response signal, updating the equalizer coefficients.
CLAUSE 14. The computer-implemented method of clause 1, further comprising: determining, for the channel, target estimated point-spread-function values based on target signal values corresponding to a target cluster of oligonucleotides; determining, based on combining the target estimated point-spread-function values and the estimated noise values within the channel, target equalizer coefficients that compensate for the target estimated point-spread-function values and the estimated noise values with respect to a target expected response signal from the target cluster of oligonucleotides; and determining the base call for the target cluster of oligonucleotides utilizing the target equalizer coefficients.
CLAUSE 15. The computer-implemented method of clause 1, further comprising: initializing an equalizer of a sequencing device utilizing the equalizer coefficients; during a subsequent sequencing cycle on the sequencing device, determining an additional estimated response signal from at least an additional cluster of oligonucleotides; and based on the additional estimated response signal, modifying the equalizer coefficients for the equalizer of the sequencing device.
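The point-spread-function estimation recited in clauses 5 and 8 (frequency-domain conversion, power spectral density, square root, inverse transform) can be sketched in one dimension as follows. This is a toy illustration under stated assumptions: a plain discrete Fourier transform replaces the FFT for self-containment, the signal contains a single cluster, and the helper names and example values are hypothetical.

```python
import cmath
import math


def dft(x, inverse=False):
    """Naive discrete Fourier transform (stand-in for an FFT/IFFT)."""
    n = len(x)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        s = sum(x[t] * cmath.exp(sign * 2j * cmath.pi * k * t / n)
                for t in range(n))
        out.append(s / n if inverse else s)
    return out


def estimate_psf(signal):
    """Zero-phase PSF estimate from the signal's power spectral density."""
    spectrum = dft(signal)                       # spatial -> frequency domain
    psd = [abs(c) ** 2 for c in spectrum]        # power spectral density
    magnitude = [math.sqrt(p) for p in psd]      # square root of the PSD
    # The magnitude is real and even for a real input, so the spectrum is
    # Hermitian-symmetric and the inverse transform is real-valued.
    return [c.real for c in dft(magnitude, inverse=True)]


# Hypothetical symmetric blur, stored with its peak at index 0 (wrap-around).
psf = [0.5, 0.25, 0.0, 0.0, 0.0, 0.0, 0.0, 0.25]
# The same blur observed from a cluster located three samples away.
shifted = psf[-3:] + psf[:-3]
estimate = estimate_psf(shifted)
```

Because the power spectral density discards phase, `estimate` matches `psf` regardless of where the cluster sits, which is why no known cluster position or training sequence is needed in this toy case.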
[0156] FIG. 10B illustrates a flowchart of a series of acts 1011 for determining a base call for a target cluster of oligonucleotides utilizing an image matrix in accordance with one or more embodiments. While FIG. 10B illustrates acts according to one embodiment, alternative embodiments may omit, add to, reorder, and/or modify any of the acts shown in FIG. 10B. In some implementations, the acts of FIG. 10B are performed as part of a method. In some instances, a non-transitory computer-readable medium stores instructions thereon that, when executed by at least one processor, cause a computing device to perform the acts of FIG. 10B. In some implementations, a system performs the acts of FIG. 10B. For example, in one or more cases, a system includes at least one processor and a non-transitory computer readable medium comprising instructions that, when executed by the at least one processor, cause the system to perform the acts of FIG. 10B.
[0157] As shown in FIG. 10B, in one or more implementations, the series of acts 1011 can include an act 1012 of receiving an image depicting one or more signal values for an expected response signal from a target cluster of oligonucleotides. In some cases, the series of acts 1011 can include an act 1014 of receiving estimated cluster locations for the target cluster of oligonucleotides and neighboring clusters of oligonucleotides. In certain embodiments, the series of acts 1011 can include an act 1016 of determining estimated point-spread-function values and estimated noise values corresponding to the target cluster of oligonucleotides. In one or more embodiments, the series of acts 1011 can include an act 1018 of determining an image matrix comprising equalizer coefficients. In some implementations, the series of acts 1011 can include an act 1020 of generating a base call for the target cluster of oligonucleotides by applying the image matrix to the image.
[0158] For example, the series of acts 1011 depicted in FIG. 10B (or the series of acts 1000 depicted in FIG. 10A) can include acts to perform any of the operations described in the following clauses:
CLAUSE 16. A computer-implemented method comprising: receiving, for a sequencing cycle, an image depicting one or more signal values for an expected response signal from a target cluster of oligonucleotides within a region of a nucleotide-sample slide; receiving, for the region of the nucleotide-sample slide, estimated cluster locations for the target cluster of oligonucleotides and neighboring clusters of oligonucleotides within the region; determining, for a channel, estimated point-spread-function values and estimated noise values based on the one or more signal values corresponding to the target cluster of oligonucleotides; determining, based on combining the estimated point-spread-function values, the estimated cluster locations, and the estimated noise values, an image matrix comprising equalizer coefficients; and generating a base call for the target cluster of oligonucleotides by applying the image matrix to the image depicting the one or more signal values for the expected response signal from the target cluster of oligonucleotides.
CLAUSE 17. The computer-implemented method of clause 16, further comprising receiving the estimated cluster locations by: receiving a patterned arrangement of the estimated cluster locations for the target cluster of oligonucleotides and the neighboring clusters of oligonucleotides arranged according to a pattern within the region of the nucleotide-sample slide; or receiving a non-patterned arrangement of the estimated cluster locations for the target cluster of oligonucleotides and the neighboring clusters of oligonucleotides arranged without a pattern within the region of the nucleotide-sample slide.
CLAUSE 18. The computer-implemented method of clause 17, wherein the patterned arrangement of the estimated cluster locations comprises a grid of estimated nanowell locations for nanowells comprising the target cluster of oligonucleotides and the neighboring clusters of oligonucleotides.
CLAUSE 19. The computer-implemented method of clause 16, further comprising determining the image matrix by determining an image mask comprising the equalizer coefficients.
CLAUSE 20. The computer-implemented method of clause 16, further comprising determining the estimated point-spread-function values by: generating, from the image, a frequency domain matrix comprising values for a power spectral density of the one or more signal values of the expected response signal; generating, from the image, a spatial domain matrix comprising values for an up-sampled point-spread function by converting the power spectral density from a frequency domain to a spatial domain and combining corner regions from the spatial domain matrix; and determining the estimated point-spread-function values from the spatial domain matrix.
CLAUSE 21. The computer-implemented method of clause 16, further comprising determining the estimated point-spread-function values by: generating, from the image, a frequency domain matrix comprising values for a power spectral density of the one or more signal values of the expected response signal; up-sampling the frequency domain matrix to generate an up-sampled power spectral density of the one or more signal values; generating, from the image, a spatial domain matrix comprising values for an up-sampled point-spread function by converting the up-sampled power spectral density from a frequency domain to a spatial domain; generating an intermediate spatial domain matrix comprising values for an intermediate point-spread function by combining corner regions from the spatial domain matrix; and up-sampling the intermediate spatial domain matrix.
CLAUSE 22. The computer-implemented method of clause 20 or 21, further comprising generating the frequency domain matrix by: generating a convoluted matrix by combining the image of the region of the nucleotide-sample slide with a two-dimensional Hanning window; and applying a Fast Fourier Transform (FFT) to the convoluted matrix.
CLAUSE 23. The computer-implemented method of clause 20 or 21, further comprising: up-sampling the frequency domain matrix by an up-sampling factor; and up-sampling an arrangement of the estimated cluster locations within the region by the up-sampling factor.
CLAUSE 24. The computer-implemented method of clause 21, further comprising applying a two-dimensional hamming window to the intermediate spatial domain matrix.
CLAUSE 25. The computer-implemented method of clause 16, further comprising determining the image matrix comprising equalizer coefficients by: modifying the estimated point-spread-function values; modifying the estimated noise values; and combining the modified estimated point-spread-function values and the modified estimated noise values to generate the image matrix.
CLAUSE 26. The computer-implemented method of clause 16, further comprising determining the image matrix comprising equalizer coefficients by: generating transposed delta estimated point-spread-function values by transposing a combination of the estimated point-spread-function values with a distribution function; generating square and symmetric estimated point-spread function values by combining the estimated point-spread-function values with transposed estimated point-spread-function values; generating square and symmetric transposed estimated noise values by combining the estimated noise values with transposed estimated noise values; and combining the transposed delta estimated point-spread-function values, the square and symmetric estimated point-spread-function values, and the square and symmetric transposed estimated noise values to generate the image matrix.
CLAUSE 27. The computer-implemented method of clause 26, wherein the distribution function comprises a Dirac delta function that sets the one or more signal values of the target cluster of oligonucleotides to one and sets additional signal values of the neighboring clusters of oligonucleotides to zero.
CLAUSE 28. The computer-implemented method of clause 16, further comprising: identifying, from the image, a set of subregions within the region of the nucleotide-sample slide;
generating, from the image matrix comprising equalizer coefficients, a set of subregion image matrices comprising subregion equalizer coefficients; and determining one or more additional base calls for additional target clusters of oligonucleotides within the set of subregions by applying the set of subregion image matrices to the image depicting one or more additional signal values for additional expected response signals from the additional target clusters of oligonucleotides within the set of subregions.
CLAUSE 29. The computer-implemented method of clause 28, wherein the subregion equalizer coefficients from a subregion image matrix of the set of subregion image matrices initially match the equalizer coefficients of the image matrix.
CLAUSE 30. The computer-implemented method of clause 16, further comprising: receiving, for the sequencing cycle, an additional image depicting one or more additional signal values for an additional expected response signal from the target cluster of oligonucleotides; determining, for an additional channel, additional estimated point-spread-function values and additional estimated noise values based on the one or more additional signal values corresponding to the target cluster of oligonucleotides; determining, based on combining the additional estimated point-spread-function values, the estimated cluster locations, and the additional estimated noise values within the additional channel, an additional image matrix comprising additional equalizer coefficients for the additional channel; and generating the base call for the target cluster of oligonucleotides by applying the additional image matrix to the additional image depicting the one or more additional signal values for the additional expected response signal from the target cluster of oligonucleotides.
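Two of the image-preparation steps recited in clauses 20 through 22 can be sketched as follows. The sketch assumes that combining an image with a two-dimensional window means an element-wise product (the usual practice before an FFT) and that "combining corner regions" means gathering the wrapped-around quadrants of an inverse-transform output into one centered kernel. Both readings, along with the function names and example values, are assumptions rather than details confirmed by the disclosure.

```python
import math


def hann_1d(n):
    """Symmetric Hann window of length n (zero at both ends)."""
    if n == 1:
        return [1.0]
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * k / (n - 1)) for k in range(n)]


def apply_hann_2d(image):
    """Element-wise product of an image with an outer-product 2-D Hann window."""
    wr = hann_1d(len(image))
    wc = hann_1d(len(image[0]))
    return [[image[r][c] * wr[r] * wc[c] for c in range(len(wc))]
            for r in range(len(wr))]


def combine_corners(matrix, radius):
    """Gather the four wrapped corners into a centered (2r+1) x (2r+1) kernel."""
    rows, cols = len(matrix), len(matrix[0])
    # Negative offsets wrap to the far edge, where an inverse transform places
    # the samples that lie "before" the zero-lag peak at index [0][0].
    return [[matrix[dy % rows][dx % cols] for dx in range(-radius, radius + 1)]
            for dy in range(-radius, radius + 1)]


# Hypothetical inverse-transform output with its peak wrapped to the corners.
wrapped = [[0.0] * 6 for _ in range(6)]
wrapped[0][0] = 4.0
wrapped[0][1] = wrapped[0][5] = wrapped[1][0] = wrapped[5][0] = 2.0
kernel = combine_corners(wrapped, radius=1)
```

Here `kernel` is a 3 x 3 matrix with the peak value at its center and the four neighboring values around it, which is the layout a point-spread function is normally handled in.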
[0159] The methods described herein can be used in conjunction with a variety of nucleic acid sequencing techniques. Particularly applicable techniques are those wherein nucleic acids are attached at fixed locations in an array such that their relative positions do not change and wherein the array is repeatedly imaged. Embodiments in which images are obtained in different color channels, for example, coinciding with different labels used to distinguish one nucleotide base type from another are particularly applicable. In some embodiments, the process to determine the nucleotide sequence of a target nucleic acid (i.e., a nucleic-acid polymer) can be an automated process. Preferred embodiments include sequencing-by-synthesis (SBS) techniques.
[0160] SBS techniques generally involve the enzymatic extension of a nascent nucleic acid strand through the iterative addition of nucleotides against a template strand. In traditional methods of SBS, a single nucleotide monomer may be provided to a target nucleotide in the presence of a polymerase in each delivery. However, in the methods described herein, more than one type of nucleotide monomer can be provided to a target nucleic acid in the presence of a polymerase in a delivery.
[0161] The SBS techniques described below can utilize single-read sequencing or paired-end sequencing. In single-read sequencing, the sequencing device reads a fragment from one end to another to generate the sequence of base pairs. In contrast, during paired-end sequencing, the sequencing device begins at one end of the fragment, finishes reading a specified read length in that direction, and then begins another read from the opposite end of the fragment.
[0162] SBS can utilize nucleotide monomers that have a terminator moiety or those that lack any terminator moieties. Methods utilizing nucleotide monomers lacking terminators include, for example, pyrosequencing and sequencing using γ-phosphate-labeled nucleotides, as set forth in further detail below. In methods using nucleotide monomers lacking terminators, the number of nucleotides added in each cycle is generally variable and dependent upon the template sequence and the mode of nucleotide delivery. For SBS techniques that utilize nucleotide monomers having a terminator moiety, the terminator can be effectively irreversible under the sequencing conditions used, as is the case for traditional Sanger sequencing, which utilizes dideoxynucleotides, or the terminator can be reversible, as is the case for sequencing methods developed by Solexa (now Illumina, Inc.).
[0163] SBS techniques can utilize nucleotide monomers that have a label moiety or those that lack a label moiety. Accordingly, incorporation events can be detected based on a characteristic of the label, such as fluorescence of the label; a characteristic of the nucleotide monomer such as molecular weight or charge; a byproduct of incorporation of the nucleotide, such as release of pyrophosphate; or the like. In embodiments where two or more different nucleotides are present in a sequencing reagent, the different nucleotides can be distinguishable from each other, or alternatively, the two or more different labels can be indistinguishable under the detection techniques being used. For example, the different nucleotides present in a sequencing reagent can have different labels and they can be distinguished using appropriate optics as exemplified by the sequencing methods developed by Solexa (now Illumina, Inc.).
[0164] Preferred embodiments include pyrosequencing techniques. Pyrosequencing detects the release of inorganic pyrophosphate (PPi) as particular nucleotides are incorporated into the nascent strand (Ronaghi, M., Karamohamed, S., Pettersson, B., Uhlen, M. and Nyren, P. (1996) "Real-time DNA sequencing using detection of pyrophosphate release." Analytical Biochemistry 242(1), 84-9; Ronaghi, M. (2001) "Pyrosequencing sheds light on DNA sequencing." Genome Res. 11(1), 3-11; Ronaghi, M., Uhlen, M. and Nyren, P. (1998) "A sequencing method based on real-time pyrophosphate." Science 281(5375), 363; U.S. Pat. No. 6,210,891; U.S. Pat. No. 6,258,568 and U.S. Pat. No. 6,274,320, the disclosures of which are incorporated herein by reference in their entireties). In pyrosequencing, released PPi can be detected by being immediately converted to adenosine triphosphate (ATP) by ATP sulfurylase, and the level of ATP generated is detected via luciferase-produced photons. The nucleic acids to be sequenced can be attached to features in an array and the array can be imaged to capture the chemiluminescent signals that are produced due to incorporation of nucleotides at the features of the array. An image can be obtained after the array is treated with a particular nucleotide type (e.g., A, T, C or G). Images obtained after addition of each nucleotide type will differ with regard to which features in the array are detected. These differences in the image reflect the different sequence content of the features on the array. However, the relative locations of each feature will remain unchanged in the images. The images can be stored, processed and analyzed using the methods set forth herein. For example, images obtained after treatment of the array with each different nucleotide type can be handled in the same way as exemplified herein for images obtained from different detection channels for reversible terminator-based sequencing methods.
[0165] In another exemplary type of SBS, cycle sequencing is accomplished by stepwise addition of reversible terminator nucleotides containing, for example, a cleavable or photobleachable dye label as described, for example, in WO 04/018497 and U.S. Pat. No. 7,057,026, the disclosures of which are incorporated herein by reference. This approach is being commercialized by Solexa (now Illumina Inc.), and is also described in WO 91/06678 and WO 07/123,744, each of which is incorporated herein by reference. The availability of fluorescently labeled terminators in which both the termination can be reversed and the fluorescent label cleaved facilitates efficient cyclic reversible termination (CRT) sequencing. Polymerases can also be co-engineered to efficiently incorporate and extend from these modified nucleotides.
[0166] Preferably in reversible terminator-based sequencing embodiments, the labels do not substantially inhibit extension under SBS reaction conditions. However, the detection labels can be removable, for example, by cleavage or degradation. Images can be captured following incorporation of labels into arrayed nucleic acid features. In particular embodiments, each cycle involves simultaneous delivery of four different nucleotide types to the array and each nucleotide type has a spectrally distinct label. Four images can then be obtained, each using a detection channel that is selective for one of the four different labels. Alternatively, different nucleotide types can be added sequentially, and an image of the array can be obtained between each addition step. In such embodiments, each image will show nucleic acid features that have incorporated nucleotides of a particular type. Different features will be present or absent in the different images due to the different sequence content of each feature. However, the relative position of the features will remain unchanged in the images. Images obtained from such reversible terminator-SBS methods can be stored, processed and analyzed as set forth herein. Following the image capture step, labels can be removed and reversible terminator moieties can be removed for subsequent cycles of nucleotide addition and detection. Removal of the labels after they have been detected in a particular cycle and prior to a subsequent cycle can provide the advantage of reducing background signal and crosstalk between cycles. Examples of useful labels and removal methods are set forth below.
[0167] In particular embodiments, some or all of the nucleotide monomers can include reversible terminators. In such embodiments, reversible terminators/cleavable fluors can include fluor linked to the ribose moiety via a 3' ester linkage (Metzker, Genome Res. 15:1767-1776 (2005), which is incorporated herein by reference). Other approaches have separated the terminator chemistry from the cleavage of the fluorescence label (Ruparel et al., Proc Natl Acad Sci USA 102: 5932-7 (2005), which is incorporated herein by reference in its entirety). Ruparel et al. described the development of reversible terminators that used a small 3' allyl group to block extension, but could easily be deblocked by a short treatment with a palladium catalyst. The fluorophore was attached to the base via a photocleavable linker that could easily be cleaved by a 30 second exposure to long wavelength UV light. Thus, either disulfide reduction or photocleavage can be used as a cleavable linker. Another approach to reversible termination is the use of natural termination that ensues after placement of a bulky dye on a dNTP. The presence of a charged bulky dye on the dNTP can act as an effective terminator through steric and/or electrostatic hindrance. The presence of one incorporation event prevents further incorporations unless the dye is removed. Cleavage of the dye removes the fluor and effectively reverses the termination. Examples of modified nucleotides are also described in U.S. Pat. No. 7,427,673, and U.S. Pat. No. 7,057,026, the disclosures of which are incorporated herein by reference in their entireties.
[0168] Additional exemplary SBS systems and methods which can be utilized with the methods and systems described herein are described in U.S. Patent Application Publication No. 2007/0166705, U.S. Patent Application Publication No. 2006/0188901, U.S. Pat. No. 7,057,026, U.S. Patent Application Publication No. 2006/0240439, U.S. Patent Application Publication No. 2006/0281109, PCT Publication No. WO 05/065814, U.S. Patent Application Publication No. 2005/0100900, PCT Publication No. WO 06/064199, PCT Publication No. WO 07/010,251, U.S. Patent Application Publication No. 2012/0270305 and U.S. Patent Application Publication No. 2013/0260372, the disclosures of which are incorporated herein by reference in their entireties.
[0169] Some embodiments can utilize detection of four different nucleotides using fewer than four different labels. For example, SBS can be performed utilizing methods and systems described in the incorporated materials of U.S. Patent Application Publication No. 2013/0079232. As a first example, a pair of nucleotide types can be detected at the same wavelength, but distinguished based on a difference in intensity for one member of the pair compared to the other, or based on a change to one member of the pair (e.g. via chemical modification, photochemical modification or physical modification) that causes apparent signal to appear or disappear compared to the signal detected for the other member of the pair. As a second example, three of four different nucleotide types can be detected under particular conditions while a fourth nucleotide type lacks a label that is detectable under those conditions, or is minimally detected under those conditions (e.g., minimal detection due to background fluorescence, etc.). Incorporation of the first three nucleotide types into a nucleic acid can be determined based on presence of their respective signals and incorporation of the fourth nucleotide type into the nucleic acid can be determined based on absence or minimal detection of any signal. As a third example, one nucleotide type can include label(s) that are detected in two different channels, whereas other nucleotide types are detected in no more than one of the channels. The aforementioned three exemplary configurations are not considered mutually exclusive and can be used in various combinations. An exemplary embodiment that combines all three examples is a fluorescent-based SBS method that uses a first nucleotide type that is detected in a first channel (e.g. dATP having a label that is detected in the first channel when excited by a first excitation wavelength), a second nucleotide type that is detected in a second channel (e.g. dCTP having a label that is detected in the second channel when excited by a second excitation wavelength), a third nucleotide type that is detected in both the first and the second channel (e.g. dTTP having at least one label that is detected in both channels when excited by the first and/or second excitation wavelength) and a fourth nucleotide type that lacks a label or that is not, or is only minimally, detected in either channel (e.g. dGTP having no label).
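The exemplary two-channel configuration above maps each pair of channel observations to one nucleotide type. The sketch below shows that lookup, assuming equalized intensities near 0 (off) or 1 (on) and a hypothetical fixed threshold of 0.5; a practical base caller would use a more robust classification than a fixed threshold.

```python
# (detected in first channel, detected in second channel) -> base,
# following the example of dATP, dCTP, dTTP and unlabeled dGTP above.
TWO_CHANNEL_CODE = {
    (True, False): "A",
    (False, True): "C",
    (True, True): "T",
    (False, False): "G",
}


def call_base(channel_1, channel_2, threshold=0.5):
    """Call a base from two equalized channel intensities."""
    return TWO_CHANNEL_CODE[(channel_1 >= threshold, channel_2 >= threshold)]


# Hypothetical equalized intensities for four sequencing cycles of one cluster.
read = "".join(call_base(c1, c2)
               for c1, c2 in [(0.9, 0.1), (0.0, 1.1), (0.95, 0.9), (0.05, 0.1)])
```

The lookup makes clear why residual crosstalk matters: leakage that pushes an "off" channel over the threshold changes the called base, which is the error that equalization aims to reduce.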
[0170] Further, as described in the incorporated materials of U.S. Patent Application Publication No. 2013/0079232, sequencing data can be obtained using a single channel. In such so-called one-dye sequencing approaches, the first nucleotide type is labeled but the label is removed after the first image is generated, and the second nucleotide type is labeled only after a first image is generated. The third nucleotide type retains its label in both the first and second images, and the fourth nucleotide type remains unlabeled in both images.
[0171] Some embodiments can utilize sequencing by ligation techniques. Such techniques utilize DNA ligase to incorporate oligonucleotides and identify the incorporation of such oligonucleotides. The oligonucleotides typically have different labels that are correlated with the identity of a particular nucleotide in a sequence to which the oligonucleotides hybridize. As with other SBS methods, images can be obtained following treatment of an array of nucleic acid features with the labeled sequencing reagents. Each image will show nucleic acid features that have incorporated labels of a particular type. Different features will be present or absent in the different images due to the different sequence content of each feature, but the relative position of the features will remain unchanged in the images. Images obtained from ligation-based sequencing methods can be stored, processed and analyzed as set forth herein. Exemplary SBS systems and methods which can be utilized with the methods and systems described herein are described in U.S. Pat. No. 6,969,488, U.S. Pat. No. 6,172,218, and U.S. Pat. No. 6,306,597, the disclosures of which are incorporated herein by reference in their entireties.
[0172] Some embodiments can utilize nanopore sequencing (Deamer, D. W. & Akeson, M. "Nanopores and nucleic acids: prospects for ultrarapid sequencing." Trends Biotechnol. 18, 147-151 (2000); Deamer, D. and D. Branton, "Characterization of nucleic acids by nanopore analysis". Acc. Chem. Res. 35:817-825 (2002); Li, J., M. Gershow, D. Stein, E. Brandin, and J. A. Golovchenko, "DNA molecules and configurations in a solid-state nanopore microscope" Nat. Mater. 2:611-615 (2003), the disclosures of which are incorporated herein by reference in their entireties). In such embodiments, the target nucleic acid passes through a nanopore. The nanopore can be a synthetic pore or biological membrane protein, such as α-hemolysin. As the target nucleic acid passes through the nanopore, each base-pair can be identified by measuring fluctuations in the electrical conductance of the pore. (U.S. Pat. No. 7,001,792; Soni, G. V. & Meller, A. "Progress toward ultrafast DNA sequencing using solid-state nanopores." Clin. Chem. 53, 1996-2001 (2007); Healy, K. "Nanopore-based single-molecule DNA analysis." Nanomed. 2, 459-481 (2007); Cockroft, S. L., Chu, J., Amorin, M. & Ghadiri, M. R. "A single-molecule nanopore device detects DNA polymerase activity with single-nucleotide resolution." J. Am. Chem. Soc. 130, 818-820 (2008), the disclosures of which are incorporated herein by reference in their entireties). Data obtained from nanopore sequencing can be stored, processed and analyzed as set forth herein. In particular, the data can be treated as an image in accordance with the exemplary treatment of optical images and other images that is set forth herein.
[0173] Some embodiments can utilize methods involving the real-time monitoring of DNA polymerase activity. Nucleotide incorporations can be detected through fluorescence resonance energy transfer (FRET) interactions between a fluorophore-bearing polymerase and γ-phosphate-labeled nucleotides as described, for example, in U.S. Pat. No. 7,329,492 and U.S. Pat. No. 7,211,414 (each of which is incorporated herein by reference) or nucleotide incorporations can be detected with zero-mode waveguides as described, for example, in U.S. Pat. No. 7,315,019 (which is incorporated herein by reference) and using fluorescent nucleotide analogs and engineered polymerases as described, for example, in U.S. Pat. No. 7,405,281 and U.S. Patent Application Publication No. 2008/0108082 (each of which is incorporated herein by reference). The illumination can be restricted to a zeptoliter-scale volume around a surface-tethered polymerase such that incorporation of fluorescently labeled nucleotides can be observed with low background (Levene, M. J. et al. "Zero-mode waveguides for single-molecule analysis at high concentrations." Science 299, 682-686 (2003); Lundquist, P. M. et al. "Parallel confocal detection of single molecules in real time." Opt. Lett. 33, 1026-1028 (2008); Korlach, J. et al. "Selective aluminum passivation for targeted immobilization of single DNA polymerase molecules in zero-mode waveguide nanostructures." Proc. Natl. Acad. Sci. USA 105, 1176-1181 (2008), the disclosures of which are incorporated herein by reference in their entireties). Images obtained from such methods can be stored, processed and analyzed as set forth herein.
[0174] Some SBS embodiments include detection of a proton released upon incorporation of a nucleotide into an extension product. For example, sequencing based on detection of released protons can use an electrical detector and associated techniques that are commercially available from Ion Torrent (Guilford, CT, a Life Technologies subsidiary) or sequencing methods and systems described in US 2009/0026082 A1; US 2009/0127589 A1; US 2010/0137143 A1; or US 2010/0282617 A1, each of which is incorporated herein by reference. Methods set forth herein for amplifying target nucleic acids using kinetic exclusion can be readily applied to substrates used for detecting protons. More specifically, methods set forth herein can be used to produce clonal populations of amplicons that are used to detect protons.
[0175] The above SBS methods can be advantageously carried out in multiplex formats such that multiple different target nucleic acids are manipulated simultaneously. In particular embodiments, different target nucleic acids can be treated in a common reaction vessel or on a surface of a particular substrate. This allows convenient delivery of sequencing reagents, removal of unreacted reagents and detection of incorporation events in a multiplex manner. In embodiments using surface-bound target nucleic acids, the target nucleic acids can be in an array format. In an array format, the target nucleic acids are typically bound to a surface in a spatially distinguishable manner. The target nucleic acids can be bound by direct covalent attachment, attachment to a bead or other particle or binding to a polymerase or other molecule that is attached to the surface. The array can include a single copy of a target nucleic acid at each site (also referred to as a feature) or multiple copies having the same sequence can be present at each site or feature. Multiple copies can be produced by amplification methods such as bridge amplification or emulsion PCR as described in further detail below.
[0176] The methods set forth herein can use arrays having features at any of a variety of densities including, for example, at least about 10 features/cm², 100 features/cm², 500 features/cm², 1,000 features/cm², 5,000 features/cm², 10,000 features/cm², 50,000 features/cm², 100,000 features/cm², 1,000,000 features/cm², 5,000,000 features/cm², or higher.
[0177] An advantage of the methods set forth herein is that they provide for rapid and efficient detection of a plurality of target nucleic acids in parallel. Accordingly, the present disclosure provides integrated systems capable of preparing and detecting nucleic acids using techniques known in the art such as those exemplified above. Thus, an integrated system of the present disclosure can include fluidic components capable of delivering amplification reagents and/or sequencing reagents to one or more immobilized DNA fragments, the system comprising components such as pumps, valves, reservoirs, fluidic lines and the like. A flow cell can be configured and/or used in an integrated system for detection of target nucleic acids. Exemplary flow cells are described, for example, in US 2010/0111768 A1 and US Ser. No. 13/273,666, each of which is incorporated herein by reference. As exemplified for flow cells, one or more of the fluidic components of an integrated system can be used for an amplification method and for a detection method. Taking a nucleic acid sequencing embodiment as an example, one or more of the fluidic components of an integrated system can be used for an amplification method set forth herein and for the delivery of sequencing reagents in a sequencing method such as those exemplified above. Alternatively, an integrated system can include separate fluidic systems to carry out amplification methods and to carry out detection methods. Examples of integrated sequencing systems that are capable of creating amplified nucleic acids and also determining the sequence of the nucleic acids include, without limitation, the MiSeq™ platform (Illumina, Inc., San Diego, CA) and devices described in US Ser. No. 13/273,666, which is incorporated herein by reference.
[0178] The sequencing system described above sequences nucleic-acid polymers present in samples received by a sequencing device. As defined herein, "sample" and its derivatives is used in its broadest sense and includes any specimen, culture and the like that is suspected of including a target. In some embodiments, the sample comprises DNA, RNA, PNA, LNA, chimeric or hybrid forms of nucleic acids. The sample can include any biological, clinical, surgical, agricultural, atmospheric or aquatic-based specimen containing one or more nucleic acids. The term also includes any isolated nucleic acid sample such as genomic DNA, fresh-frozen or formalin-fixed paraffin-embedded nucleic acid specimen. It is also envisioned that the sample can be from a single individual, a collection of nucleic acid samples from genetically related members, nucleic acid samples from genetically unrelated members, nucleic acid samples (matched) from a single individual such as a tumor sample and normal tissue sample, or sample from a single source that contains two distinct forms of genetic material such as maternal and fetal DNA obtained from a maternal subject, or the presence of contaminating bacterial DNA in a sample that contains plant or animal DNA. In some embodiments, the source of nucleic acid material can include nucleic acids obtained from a newborn, for example as typically used for newborn screening.
[0179] The nucleic acid sample can include high molecular weight material such as genomic DNA (gDNA). The sample can include low molecular weight material such as nucleic acid molecules obtained from FFPE or archived DNA samples. In another embodiment, low molecular weight material includes enzymatically or mechanically fragmented DNA. The sample can include cell-free circulating DNA. In some embodiments, the sample can include nucleic acid molecules obtained from biopsies, tumors, scrapings, swabs, blood, mucus, urine, plasma, semen, hair, laser capture micro-dissections, surgical resections, and other clinical or laboratory obtained samples. In some embodiments, the sample can be an epidemiological, agricultural, forensic or pathogenic sample. In some embodiments, the sample can include nucleic acid molecules obtained from an animal such as a human or mammalian source. In another embodiment, the sample can include nucleic acid molecules obtained from a non-mammalian source such as a plant, bacteria, virus or fungus. In some embodiments, the source of the nucleic acid molecules may be an archived or extinct sample or species.
[0180] Further, the methods and compositions disclosed herein may be useful to amplify a nucleic acid sample having low-quality nucleic acid molecules, such as degraded and/or fragmented genomic DNA from a forensic sample. In one embodiment, forensic samples can include nucleic acids obtained from a crime scene, nucleic acids obtained from a missing persons DNA database, nucleic acids obtained from a laboratory associated with a forensic investigation or include forensic samples obtained by law enforcement agencies, one or more military services or any such personnel. The nucleic acid sample may be a purified sample or a crude DNA containing lysate, for example derived from a buccal swab, paper, fabric or other substrate that may be impregnated with saliva, blood, or other bodily fluids. As such, in some embodiments, the nucleic acid sample may comprise low amounts of, or fragmented portions of DNA, such as genomic DNA. In some embodiments, target sequences can be present in one or more bodily fluids including but not limited to, blood, sputum, plasma, semen, urine and serum. In some embodiments, target sequences can be obtained from hair, skin, tissue samples, autopsy or remains of a victim. In some embodiments, nucleic acids including one or more target sequences can be obtained from a deceased animal or human. In some embodiments, target sequences can include nucleic acids obtained from non-human DNA such as microbial, plant or entomological DNA. In some embodiments, target sequences or amplified target sequences are directed to purposes of human identification. In some embodiments, the disclosure relates generally to methods for identifying characteristics of a forensic sample. In some embodiments, the disclosure relates generally to human identification methods using one or more target specific primers disclosed herein or one or more target specific primers designed using the primer design criteria outlined herein. In one embodiment, a forensic or human identification sample containing at least one target sequence can be amplified using any one or more of the target-specific primers disclosed herein or using the primer criteria outlined herein.
[0181] The components of the blind-equalizer sequencing system 106 can include software, hardware, or both. For example, the components of the blind-equalizer sequencing system 106 can include one or more instructions stored on a non-transitory computer readable storage medium and executable by processors of one or more computing devices (e.g., the user client device 108). When executed by the one or more processors, the computer-executable instructions of the blind-equalizer sequencing system 106 can cause the computing devices to perform the failure source identification methods described herein. Alternatively, the components of the blind-equalizer sequencing system 106 can comprise hardware, such as special purpose processing devices to perform a certain function or group of functions. Additionally, or alternatively, the components of the blind-equalizer sequencing system 106 can include a combination of computer-executable instructions and hardware.
[0182] Furthermore, the components of the blind-equalizer sequencing system 106 performing the functions described herein with respect to the blind-equalizer sequencing system 106 may, for example, be implemented as part of a stand-alone application, as a module of an application, as a plug-in for applications, as a library function or functions that may be called by other applications, and/or as a cloud-computing model. Thus, components of the blind-equalizer sequencing system 106 may be implemented as part of a stand-alone application on a personal computing device or a mobile device. Additionally, or alternatively, the components of the blind-equalizer sequencing system 106 may be implemented in any application that provides sequencing services including, but not limited to Illumina BaseSpace, Illumina DRAGEN, or Illumina TruSight software. “Illumina,” “BaseSpace,” “DRAGEN,” and “TruSight,” are either registered trademarks or trademarks of Illumina, Inc. in the United States and/or other countries.
[0183] Embodiments of the present disclosure may comprise or utilize a special purpose or general-purpose computer including computer hardware, such as, for example, one or more processors and system memory, as discussed in greater detail below. Embodiments within the scope of the present disclosure also include physical and other computer-readable media for carrying or storing computer-executable instructions and/or data structures. In particular, one or more of the processes described herein may be implemented at least in part as instructions embodied in a non-transitory computer readable medium and executable by one or more computing devices (e.g., any of the media content access devices described herein). In general, a processor (e.g., a microprocessor) receives instructions from a non-transitory computer readable medium (e.g., a memory, etc.) and executes those instructions, thereby performing one or more processes, including one or more of the processes described herein.
[0184] Computer-readable media can be any available media that can be accessed by a general purpose or special purpose computer system. Computer-readable media that store computer-executable instructions are non-transitory computer-readable storage media (devices). Computer-readable media that carry computer-executable instructions are transmission media. Thus, by way of example, and not limitation, embodiments of the disclosure can comprise at least two distinctly different kinds of computer-readable media: non-transitory computer-readable storage media (devices) and transmission media.
[0185] Non-transitory computer-readable storage media (devices) include RAM, ROM, EEPROM, CD-ROM, solid state drives (SSDs) (e.g., based on RAM), Flash memory, phase-change memory (PCM), other types of memory, other optical disk storage, magnetic disk storage or other magnetic storage devices, or any other medium which can be used to store desired program code means in the form of computer-executable instructions or data structures and which can be accessed by a general purpose or special purpose computer.
[0186] A “network” is defined as one or more data links that enable the transport of electronic data between computer systems and/or modules and/or other electronic devices. When information is transferred or provided over a network or another communications connection (either hardwired, wireless, or a combination of hardwired or wireless) to a computer, the computer properly views the connection as a transmission medium. Transmission media can include a network and/or data links which can be used to carry desired program code means in the form of computer-executable instructions or data structures and which can be accessed by a general purpose or special purpose computer. Combinations of the above should also be included within the scope of computer-readable media.
[0187] Further, upon reaching various computer system components, program code means in the form of computer-executable instructions or data structures can be transferred automatically from transmission media to non-transitory computer-readable storage media (devices) (or vice versa). For example, computer-executable instructions or data structures received over a network or data link can be buffered in RAM within a network interface module (e.g., a NIC), and then eventually transferred to computer system RAM and/or to less volatile computer storage media (devices) at a computer system. Thus, it should be understood that non-transitory computer-readable storage media (devices) can be included in computer system components that also (or even primarily) utilize transmission media.
[0188] Computer-executable instructions comprise, for example, instructions and data which, when executed at a processor, cause a general purpose computer, special purpose computer, or special purpose processing device to perform a certain function or group of functions. In some embodiments, computer-executable instructions are executed on a general-purpose computer to turn the general-purpose computer into a special purpose computer implementing elements of the disclosure. The computer executable instructions may be, for example, binaries, intermediate format instructions such as assembly language, or even source code. Although the subject matter has been described in language specific to structural features and/or methodological acts, it is to be understood that the subject matter defined in the appended claims is not necessarily limited to the features or acts described above. Rather, the described features and acts are disclosed as example forms of implementing the claims.
[0189] Those skilled in the art will appreciate that the disclosure may be practiced in network computing environments with many types of computer system configurations, including personal computers, desktop computers, laptop computers, message processors, hand-held devices, multiprocessor systems, microprocessor-based or programmable consumer electronics, network PCs, minicomputers, mainframe computers, mobile telephones, PDAs, tablets, pagers, routers, switches, and the like. The disclosure may also be practiced in distributed system environments where local and remote computer systems, which are linked (either by hardwired data links, wireless data links, or by a combination of hardwired and wireless data links) through a network, both perform tasks. In a distributed system environment, program modules may be located in both local and remote memory storage devices.
[0190] Embodiments of the present disclosure can also be implemented in cloud computing environments. In this description, “cloud computing” is defined as a model for enabling on-demand network access to a shared pool of configurable computing resources. For example, cloud computing can be employed in the marketplace to offer ubiquitous and convenient on-demand access to the shared pool of configurable computing resources. The shared pool of configurable computing resources can be rapidly provisioned via virtualization and released with low management effort or service provider interaction, and then scaled accordingly.
[0191] A cloud-computing model can be composed of various characteristics such as, for example, on-demand self-service, broad network access, resource pooling, rapid elasticity, measured service, and so forth. A cloud-computing model can also expose various service models, such as, for example, Software as a Service (SaaS), Platform as a Service (PaaS), and Infrastructure as a Service (IaaS). A cloud-computing model can also be deployed using different deployment models such as private cloud, community cloud, public cloud, hybrid cloud, and so forth. In this description and in the claims, a “cloud-computing environment” is an environment in which cloud computing is employed.
[0192] FIG. 11 illustrates a block diagram of a computing device 1100 that may be configured to perform one or more of the processes described above. One will appreciate that one or more computing devices such as the computing device 1100 may implement the blind-equalizer sequencing system 106 and the sequencing system 104. As shown by FIG. 11, the computing device 1100 can comprise a processor 1102, a memory 1104, a storage device 1106, an I/O interface 1108, and a communication interface 1110, which may be communicatively coupled by way of a communication infrastructure 1112. In certain embodiments, the computing device 1100 can include fewer or more components than those shown in FIG. 11. The following paragraphs describe components of the computing device 1100 shown in FIG. 11 in additional detail.
[0193] In one or more embodiments, the processor 1102 includes hardware for executing instructions, such as those making up a computer program. As an example, and not by way of limitation, to execute instructions for dynamically modifying workflows, the processor 1102 may retrieve (or fetch) the instructions from an internal register, an internal cache, the memory 1104, or the storage device 1106 and decode and execute them. The memory 1104 may be a volatile or nonvolatile memory used for storing data, metadata, and programs for execution by the processor(s). The storage device 1106 includes storage, such as a hard disk, flash disk drive, or other digital storage device, for storing data or instructions for performing the methods described herein.
[0194] The I/O interface 1108 allows a user to provide input to, receive output from, and otherwise transfer data to and receive data from computing device 1100. The I/O interface 1108 may include a mouse, a keypad or a keyboard, a touch screen, a camera, an optical scanner, network interface, modem, other known I/O devices or a combination of such I/O interfaces. The I/O interface 1108 may include one or more devices for presenting output to a user, including, but not limited to, a graphics engine, a display (e.g., a display screen), one or more output drivers (e.g., display drivers), one or more audio speakers, and one or more audio drivers. In certain embodiments, the I/O interface 1108 is configured to provide graphical data to a display for presentation to a user. The graphical data may be representative of one or more graphical user interfaces and/or any other graphical content as may serve a particular implementation.
[0195] The communication interface 1110 can include hardware, software, or both. In any event, the communication interface 1110 can provide one or more interfaces for communication (such as, for example, packet-based communication) between the computing device 1100 and one or more other computing devices or networks. As an example, and not by way of limitation, the communication interface 1110 may include a network interface controller (NIC) or network adapter for communicating with an Ethernet or other wire-based network or a wireless NIC (WNIC) or wireless adapter for communicating with a wireless network, such as WI-FI.
[0196] Additionally, the communication interface 1110 may facilitate communications with various types of wired or wireless networks. The communication interface 1110 may also facilitate communications using various communication protocols. The communication infrastructure 1112 may also include hardware, software, or both that couples components of the computing device 1100 to each other. For example, the communication interface 1110 may use one or more networks and/or protocols to enable a plurality of computing devices connected by a particular infrastructure to communicate with each other to perform one or more aspects of the processes described herein. To illustrate, the sequencing process can allow a plurality of devices (e.g., a client device, sequencing device, and server device(s)) to exchange information such as sequencing data and error notifications.
[0197] In the foregoing specification, the present disclosure has been described with reference to specific exemplary embodiments thereof. Various embodiments and aspects of the present disclosure(s) are described with reference to details discussed herein, and the accompanying drawings illustrate the various embodiments. The description above and drawings are illustrative of the disclosure and are not to be construed as limiting the disclosure. Numerous specific details are described to provide a thorough understanding of various embodiments of the present disclosure.
[0198] The present disclosure may be embodied in other specific forms without departing from its spirit or essential characteristics. The described embodiments are to be considered in all respects only as illustrative and not restrictive. For example, the methods described herein may be performed with fewer or more steps/acts or the steps/acts may be performed in differing orders. Additionally, the steps/acts described herein may be repeated or performed in parallel with one another or in parallel with different instances of the same or similar steps/acts. The scope of the present application is, therefore, indicated by the appended claims rather than by the foregoing description. All changes that come within the meaning and range of equivalency of the claims are to be embraced within their scope.
Claims
1. A system comprising: at least one processor; and a non-transitory computer readable medium comprising instructions that, when executed by the at least one processor, cause the system to: receive, for a sequencing cycle, signal values for an expected response signal from at least a cluster of oligonucleotides within a region of a nucleotide-sample slide; determine, for a channel, estimated point-spread-function values based on the signal values corresponding to at least the cluster of oligonucleotides; determine estimated noise values within the channel; determine, based on combining the estimated point-spread-function values and the estimated noise values within the channel, equalizer coefficients that compensate for the estimated point-spread-function values and the estimated noise values with respect to the expected response signal from at least the cluster of oligonucleotides; and determine a base call for at least the cluster of oligonucleotides utilizing the equalizer coefficients.
2. The system of claim 1, wherein the signal values from the cluster of oligonucleotides correspond to the estimated point-spread-function values combined with the expected response signal from the cluster of oligonucleotides summed with the estimated noise values.
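For illustration only, the relationship recited in claim 2 can be written as a short formula. The symbols are assumptions introduced here: y for the received signal values, x for the expected response signal, h for the estimated point-spread-function values, and n for the estimated noise values. Reading "combined with" as a convolution is likewise an assumption, and the Gaussian noise term follows the option recited in claim 10.

```latex
y[i] = (h \ast x)[i] + n[i], \qquad n[i] \sim \mathcal{N}(0, \sigma^{2}) \ \text{i.i.d.}
```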
3. The system of claim 1, further comprising instructions that, when executed by the at least one processor, cause the system to determine the equalizer coefficients by adjusting the equalizer coefficients to minimize a mean squared error between one or more expected response signals corresponding to a set of neighboring clusters of oligonucleotides within the region of the nucleotide-sample slide and one or more estimated response signals across the set of neighboring clusters of oligonucleotides within the region of the nucleotide-sample slide.
4. The system of claim 3, wherein the estimated response signal corresponds to the equalizer coefficients combined with the signal values from at least the cluster of oligonucleotides.
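The mean-squared-error criterion of claims 3 and 4 can be made concrete with a small numerical sketch. It is a simplified, non-blind toy: it works in one dimension, assumes a known three-tap point-spread function `h`, and fits the coefficients `w` against known expected responses `x`. The claimed system instead derives its coefficients from estimated point-spread-function and noise values. All names and parameter values are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed toy data: on/off expected responses for a row of neighboring clusters.
x = rng.integers(0, 2, size=2000).astype(float)
h = np.array([0.2, 1.0, 0.2])  # assumed point-spread function (3 taps)
noise = 0.05 * rng.standard_normal(x.size)
y = np.convolve(x, h, mode="same") + noise  # observed signal values

def window_matrix(signal: np.ndarray, taps: int) -> np.ndarray:
    """Return a matrix whose row i is the zero-padded window centered on sample i."""
    half = taps // 2
    padded = np.pad(signal, half)
    return np.stack([padded[i:i + taps] for i in range(signal.size)])

taps = 7
Y = window_matrix(y, taps)

# Least squares picks the coefficients w that minimize mean((Y @ w - x) ** 2).
w, *_ = np.linalg.lstsq(Y, x, rcond=None)

x_hat = Y @ w  # estimated response: equalizer coefficients combined with signal values
mse_raw = np.mean((y - x) ** 2)
mse_equalized = np.mean((x_hat - x) ** 2)
```

Because leaving the signal untouched (a single unit center tap) is one of the candidate coefficient sets, the least-squares fit can never do worse than the raw signal on the data it was fitted to.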
5. The system of claim 1, further comprising instructions that, when executed by the at least one processor, cause the system to determine the estimated point-spread-function values by: receiving the signal values of the expected response signal from at least the cluster of oligonucleotides within a region of the nucleotide-sample slide in a spatial domain; converting the signal values of the expected response signal from at least the cluster of oligonucleotides to a frequency domain; determining a power spectral density of the signal values of the expected response signal in the frequency domain; and converting the power spectral density of the expected response signal in the frequency domain to the spatial domain by applying an inverse fast Fourier transformation (IFFT) to the power spectral density of the expected response signal.
6. The system of claim 5, further comprising enforcing Hermitian symmetry for the expected response signal in the frequency domain.
7. The system of claim 5, wherein the power spectral density is an average measurement of energy within a range of spectral bands.
8. The system of claim 5, wherein converting the power spectral density of the expected response signal in the frequency domain to the spatial domain further comprises taking a square root of the power spectral density of the expected response signal in the frequency domain.
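The sequence of steps in claims 5 through 8 (spatial domain to frequency domain, power spectral density, square root, inverse FFT) can be sketched with NumPy as below. The sketch rests on several assumptions that are not taken from the claims: a zero-phase symmetric Gaussian point-spread function, white zero-mean source intensities, circular convolution, and no additive noise. The tile size, tile count, and function name `estimate_psf` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 64  # tile size in pixels (assumed)

# Assumed ground-truth PSF: a symmetric Gaussian blur centered in the tile.
ax = np.arange(n) - n // 2
g = np.exp(-0.5 * (ax / 1.5) ** 2)
psf_true = np.outer(g, g)
H_true = np.fft.fft2(np.fft.ifftshift(psf_true))

# Simulated tiles: white source intensities blurred by the PSF.
tiles = [
    np.real(np.fft.ifft2(np.fft.fft2(rng.standard_normal((n, n))) * H_true))
    for _ in range(200)
]

def estimate_psf(images):
    # Spatial -> frequency domain, then average |FFT|^2 as a PSD estimate.
    psd = np.mean([np.abs(np.fft.fft2(img)) ** 2 for img in images], axis=0)
    # The square root of the PSD is a magnitude response; phase is assumed zero.
    magnitude = np.sqrt(psd)
    # Real-valued tiles give a symmetric PSD, so the IFFT is real up to rounding.
    psf = np.fft.fftshift(np.real(np.fft.ifft2(magnitude)))
    return psf / psf.max()  # unit-peak normalization for easy comparison

psf_est = estimate_psf(tiles)
psf_ref = psf_true / psf_true.max()
```

A power spectral density carries no phase information, so the recovered shape is only meaningful under a phase assumption such as the zero-phase one used here.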
9. The system of claim 5, wherein the expected response signal comprises a measurement of an amplitude of the expected response signal.
10. The system of claim 1, wherein the estimated noise values comprise independent identically distributed Gaussian noise.
11. The system of claim 1, wherein the channel is a minimum phase response channel.
12. The system of claim 1, further comprising instructions that, when executed by the at least one processor, cause the system to: determine an estimated response signal from at least the cluster of oligonucleotides; and based on the estimated response signal, determine a base call for at least the cluster of oligonucleotides.
13. The system of claim 1, further comprising instructions that, when executed by the at least one processor, cause the system to: initialize an equalizer utilizing the equalizer coefficients; during a subsequent sequencing cycle, determine an additional estimated response signal from at least an additional cluster of oligonucleotides; and based on the additional estimated response signal, update the equalizer coefficients.
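One possible, non-limiting way to update initialized equalizer coefficients from an additional estimated response signal, as recited in claim 13, is a decision-directed least-mean-squares (LMS) step. The claim does not name an update rule; LMS, the two intensity levels (0.0 and 1.0), the step size, and all identifiers below are assumptions chosen for illustration.

```python
import numpy as np


def decision_directed_update(weights, pixels, levels=(0.0, 1.0), step=0.05):
    """One decision-directed LMS update of equalizer coefficients.

    weights: current coefficients (1-D array).
    pixels:  flattened pixel intensities around a cluster (1-D array).
    levels:  hypothetical ideal "off"/"on" intensities.
    """
    # Estimated response signal: weighted sum of the surrounding pixels.
    estimate = float(weights @ pixels)
    # Blind "decision": snap the estimate to the nearest ideal level.
    decision = min(levels, key=lambda level: abs(level - estimate))
    # Move the coefficients so the estimate gets closer to the decision.
    error = decision - estimate
    return weights + step * error * pixels, estimate, decision


if __name__ == "__main__":
    w = np.array([1.0, 0.0, 0.0])   # initialized coefficients
    x = np.array([0.8, 0.1, 0.1])   # pixels from a later sequencing cycle
    w, est, dec = decision_directed_update(w, x)
    print(w, est, dec)
```

Because the update uses the equalizer's own decision in place of a known training value, it remains blind: no reference sequence is needed during the subsequent cycle.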
14. The system of claim 1, further comprising instructions that, when executed by the at least one processor, cause the system to: determine, for the channel, target estimated point-spread-function values based on target signal values corresponding to a target cluster of oligonucleotides; determine, based on combining the target estimated point-spread-function values and the estimated noise values within the channel, target equalizer coefficients that compensate for the target estimated point-spread-function values and the estimated noise values with respect to a target expected response signal from the target cluster of oligonucleotides; and determine the base call for the target cluster of oligonucleotides utilizing the target equalizer coefficients.
15. The system of claim 1, further comprising instructions that, when executed by the at least one processor, cause the system to: initialize an equalizer of a sequencing device utilizing the equalizer coefficients; during a subsequent sequencing cycle on the sequencing device, determine an additional estimated response signal from at least an additional cluster of oligonucleotides; and based on the additional estimated response signal, modify the equalizer coefficients for the equalizer of the sequencing device.
16. A system comprising: at least one processor; and a non-transitory computer readable medium comprising instructions that, when executed by the at least one processor, cause the system to: receive, for a sequencing cycle, an image depicting one or more signal values for an expected response signal from a target cluster of oligonucleotides within a region of a nucleotide-sample slide; receive, for the region of the nucleotide-sample slide, estimated cluster locations for the target cluster of oligonucleotides and neighboring clusters of oligonucleotides within the region; determine, for a channel, estimated point-spread-function values and estimated noise values based on the one or more signal values corresponding to the target cluster of oligonucleotides; determine, based on combining the estimated point-spread-function values, the estimated cluster locations, and the estimated noise values, an image matrix comprising equalizer coefficients; and generate a base call for the target cluster of oligonucleotides by applying the image matrix to the image depicting the one or more signal values for the expected response signal from the target cluster of oligonucleotides.
17. The system of claim 16, further comprising instructions that, when executed by the at least one processor, cause the system to receive the estimated cluster locations by: receiving a patterned arrangement of the estimated cluster locations for the target cluster of oligonucleotides and the neighboring clusters of oligonucleotides arranged according to a pattern within the region of the nucleotide-sample slide; or receiving a non-patterned arrangement of the estimated cluster locations for the target cluster of oligonucleotides and the neighboring clusters of oligonucleotides arranged without a pattern within the region of the nucleotide-sample slide.
18. The system of claim 17, wherein the patterned arrangement of the estimated cluster locations comprises a grid of estimated nanowell locations for nanowells comprising the target cluster of oligonucleotides and the neighboring clusters of oligonucleotides.
19. The system of claim 16, further comprising instructions that, when executed by the at least one processor, cause the system to determine the image matrix by determining an image mask comprising the equalizer coefficients.
20. The system of claim 16, further comprising instructions that, when executed by the at least one processor, cause the system to determine the estimated point-spread-function values by: generating, from the image, a frequency domain matrix comprising values for a power spectral density of the one or more signal values of the expected response signal; generating, from the image, a spatial domain matrix comprising values for an up-sampled point-spread function by converting the power spectral density from a frequency domain to a spatial domain and combining corner regions from the spatial domain matrix; and determining the estimated point-spread-function values from the spatial domain matrix.
21. The system of claim 16, further comprising instructions that, when executed by the at least one processor, cause the system to determine the estimated point-spread-function values by: generating, from the image, a frequency domain matrix comprising values for a power spectral density of the one or more signal values of the expected response signal; up-sampling the frequency domain matrix to generate an up-sampled power spectral density of the one or more signal values; generating, from the image, a spatial domain matrix comprising values for an up-sampled point-spread function by converting the up-sampled power spectral density from a frequency domain to a spatial domain; generating an intermediate spatial domain matrix comprising values for an intermediate point-spread function by combining corner regions from the spatial domain matrix; and up-sampling the intermediate spatial domain matrix.
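The corner-combining step recited in claims 20 and 21 may be understood, as a non-limiting illustration, from the fact that an inverse FFT places the zero-offset value at index (0, 0), so the point-spread-function energy wraps around into the four corners of the spatial domain matrix. The sketch below stitches those corners into one centered kernel. The function name `combine_corners` and the parameter `half_width` are assumptions, and the up-sampling steps of the claims are omitted.

```python
import numpy as np


def combine_corners(spatial, half_width):
    """Assemble a centered (2*half_width + 1)-square kernel from the corners.

    spatial:    2-D array with the zero offset at index (0, 0).
    half_width: number of taps kept on each side of the center (must be >= 1).
    """
    h = half_width
    top_left = spatial[:h + 1, :h + 1]   # row offsets 0..h,  col offsets 0..h
    top_right = spatial[:h + 1, -h:]     # row offsets 0..h,  col offsets -h..-1
    bottom_left = spatial[-h:, :h + 1]   # row offsets -h..-1, col offsets 0..h
    bottom_right = spatial[-h:, -h:]     # row offsets -h..-1, col offsets -h..-1
    # Order rows and columns by increasing offset: negatives first, then 0..h.
    upper = np.hstack([bottom_right, bottom_left])
    lower = np.hstack([top_right, top_left])
    return np.vstack([upper, lower])


if __name__ == "__main__":
    demo = np.arange(64, dtype=float).reshape(8, 8)
    kernel = combine_corners(demo, half_width=2)
    print(kernel.shape, kernel[2, 2])  # center holds demo[0, 0]
```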
22. The system of claim 20 or 21, further comprising instructions that, when executed by the at least one processor, cause the system to generate the frequency domain matrix by: generating a convoluted matrix by combining the image of the region of the nucleotide-sample slide with a two-dimensional hanning window; and applying a Fast Fourier Transform (FFT) to the convoluted matrix.
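As a hedged illustration of the windowing and FFT steps of claim 22, the example below tapers an image with a separable two-dimensional Hann window before transforming it. Interpreting "combining" as element-wise multiplication is an assumption (it is the usual way a window is applied to reduce spectral leakage), as is the function name `windowed_spectrum`.

```python
import numpy as np


def windowed_spectrum(image):
    """Apply a separable 2-D Hann window to `image`, then take a 2-D FFT."""
    rows, cols = image.shape
    # The outer product of two 1-D Hann windows gives a 2-D taper that falls
    # to zero at the image border.
    window = np.outer(np.hanning(rows), np.hanning(cols))
    # Assumed combination: element-wise product of image and window.
    windowed = image * window
    return np.fft.fft2(windowed), window


if __name__ == "__main__":
    spectrum, window = windowed_spectrum(np.ones((8, 8)))
    print(spectrum.shape, window[0, 0], window.sum())
```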
23. The system of claim 20 or 21, further comprising instructions that, when executed by the at least one processor, cause the system to: up-sample the frequency domain matrix by an up-sampling factor; and up-sample an arrangement of the estimated cluster locations within the region by the up-sampling factor.
24. The system of claim 21, further comprising instructions that, when executed by the at least one processor, cause the system to apply a two-dimensional hamming window to the intermediate spatial domain matrix.
25. The system of claim 16, further comprising instructions that, when executed by the at least one processor, cause the system to determine the image matrix comprising equalizer coefficients by: modifying the estimated point-spread-function values; modifying the estimated noise values; and combining the modified estimated point-spread-function values and the modified estimated noise values to generate the image matrix.
26. The system of claim 16, further comprising instructions that, when executed by the at least one processor, cause the system to determine the image matrix comprising equalizer coefficients by: generating transposed delta estimated point-spread-function values by transposing a combination of the estimated point-spread-function values with a distribution function; generating square and symmetric estimated point-spread function values by combining the estimated point-spread-function values with transposed estimated point-spread-function values; generating square and symmetric transposed estimated noise values by combining the estimated noise values with transposed estimated noise values; and combining the transposed delta estimated point-spread-function values, the square and symmetric estimated point-spread-function values, and the square and symmetric transposed estimated noise values to generate the image matrix.
27. The system of claim 26, wherein the distribution function comprises a Dirac delta function that sets the one or more signal values of the target cluster of oligonucleotides to one and sets additional signal values of the neighboring clusters of oligonucleotides to zero.
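One plausible, non-limiting reading of claims 26 and 27 is a linear minimum-mean-square-error (MMSE) equalizer, in which a square and symmetric point-spread-function term and a square and symmetric noise term are combined and then applied to the point-spread-function column selected by a delta vector for the target cluster. The MMSE interpretation, the diagonal noise model, and every identifier below (`psf_matrix`, `noise_variance`, `target_index`) are assumptions and are not taken from the claims.

```python
import numpy as np


def mmse_equalizer(psf_matrix, noise_variance, target_index):
    """Equalizer coefficients w = (H H^T + sigma^2 I)^-1 H delta.

    psf_matrix:     H, shape (pixels, clusters); column j holds the assumed
                    point-spread-function footprint of cluster j.
    noise_variance: sigma^2 of assumed i.i.d. noise per pixel.
    target_index:   column of H belonging to the target cluster.
    """
    n_pixels, n_clusters = psf_matrix.shape
    # Delta vector: one for the target cluster, zero for its neighbors.
    delta = np.zeros(n_clusters)
    delta[target_index] = 1.0
    # Square, symmetric point-spread-function term and noise term.
    psf_gram = psf_matrix @ psf_matrix.T
    noise_cov = noise_variance * np.eye(n_pixels)
    # Solve the linear system instead of forming an explicit inverse.
    return np.linalg.solve(psf_gram + noise_cov, psf_matrix @ delta)


if __name__ == "__main__":
    H = np.array([[1.0, 0.2],
                  [0.2, 1.0]])  # two pixels, two mutually leaking clusters
    w = mmse_equalizer(H, noise_variance=0.01, target_index=0)
    # Response of the equalizer to each cluster: near 1 for the target and
    # near 0 for the neighbor.
    print(w @ H)
```

With a very small noise variance this sketch approaches a zero-forcing solution that fully cancels the neighboring cluster; a larger noise variance shrinks the coefficients and trades residual crosstalk for less noise amplification.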
28. The system of claim 16, further comprising instructions that, when executed by the at least one processor, cause the system to: identify, from the image, a set of subregions within the region of the nucleotide-sample slide; generate, from the image matrix comprising equalizer coefficients, a set of subregion image matrices comprising subregion equalizer coefficients; and determine one or more additional base calls for additional target clusters of oligonucleotides within the set of subregions by applying the set of subregion image matrices to the image depicting one or more additional signal values for additional expected response signals from the additional target clusters of oligonucleotides within the set of subregions.
29. The system of claim 28, wherein the subregion equalizer coefficients from a subregion image matrix of the set of subregion image matrices initially match the equalizer coefficients of the image matrix.
30. The system of claim 16, further comprising instructions that, when executed by the at least one processor, cause the system to: receive, for the sequencing cycle, an additional image depicting one or more additional signal values for an additional expected response signal from the target cluster of oligonucleotides; determine, for an additional channel, additional estimated point-spread-function values and additional estimated noise values based on the one or more additional signal values corresponding to the target cluster of oligonucleotides; determine, based on combining the additional estimated point-spread-function values, the estimated cluster locations, and the additional estimated noise values within the additional channel, an additional image matrix comprising additional equalizer coefficients for the additional channel; and determine the base call for the target cluster of oligonucleotides by applying the additional image matrix to the additional image depicting the one or more additional signal values for the additional expected response signal from the target cluster of oligonucleotides.
Applications Claiming Priority (2)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| US202463649210P | 2024-05-17 | 2024-05-17 | |
| US63/649,210 | 2024-05-17 | | |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| WO2025240924A1 (en) | 2025-11-20 |
Family
ID=96013169
Family Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| PCT/US2025/029859 (Pending), WO2025240924A1 (en) | 2024-05-17 | 2025-05-16 | Blind equalization systems for base calling applications |
Country Status (1)
| Country | Link |
|---|---|
| WO (1) | WO2025240924A1 (en) |
Citations (31)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| WO1991006678A1 (en) | 1989-10-26 | 1991-05-16 | Sri International | Dna sequencing |
| US6172218B1 (en) | 1994-10-13 | 2001-01-09 | Lynx Therapeutics, Inc. | Oligonucleotide tags for sorting and identification |
| US6210891B1 (en) | 1996-09-27 | 2001-04-03 | Pyrosequencing Ab | Method of sequencing DNA |
| US6258568B1 (en) | 1996-12-23 | 2001-07-10 | Pyrosequencing Ab | Method of sequencing DNA based on the detection of the release of pyrophosphate and enzymatic nucleotide degradation |
| US6274320B1 (en) | 1999-09-16 | 2001-08-14 | Curagen Corporation | Method of sequencing a nucleic acid |
| US6306597B1 (en) | 1995-04-17 | 2001-10-23 | Lynx Therapeutics, Inc. | DNA sequencing by parallel oligonucleotide extensions |
| WO2004018497A2 (en) | 2002-08-23 | 2004-03-04 | Solexa Limited | Modified nucleotides for polynucleotide sequencing |
| US20050100900A1 (en) | 1997-04-01 | 2005-05-12 | Manteia Sa | Method of nucleic acid amplification |
| WO2005065814A1 (en) | 2004-01-07 | 2005-07-21 | Solexa Limited | Modified molecular arrays |
| US6969488B2 (en) | 1998-05-22 | 2005-11-29 | Solexa, Inc. | System and apparatus for sequential processing of analytes |
| US7001792B2 (en) | 2000-04-24 | 2006-02-21 | Eagle Research & Development, Llc | Ultra-fast nucleic acid sequencing device and a method for making and using the same |
| US7057026B2 (en) | 2001-12-04 | 2006-06-06 | Solexa Limited | Labelled nucleotides |
| WO2006064199A1 (en) | 2004-12-13 | 2006-06-22 | Solexa Limited | Improved method of nucleotide detection |
| US20060240439A1 (en) | 2003-09-11 | 2006-10-26 | Smith Geoffrey P | Modified polymerases for improved incorporation of nucleotide analogues |
| US20060281109A1 (en) | 2005-05-10 | 2006-12-14 | Barr Ost Tobias W | Polymerases |
| WO2007010251A2 (en) | 2005-07-20 | 2007-01-25 | Solexa Limited | Preparation of templates for nucleic acid sequencing |
| US7211414B2 (en) | 2000-12-01 | 2007-05-01 | Visigen Biotechnologies, Inc. | Enzymatic nucleic acid synthesis: compositions and methods for altering monomer incorporation fidelity |
| WO2007123744A2 (en) | 2006-03-31 | 2007-11-01 | Solexa, Inc. | Systems and devices for sequence by synthesis analysis |
| US7315019B2 (en) | 2004-09-17 | 2008-01-01 | Pacific Biosciences Of California, Inc. | Arrays of optical confinements and uses thereof |
| US7329492B2 (en) | 2000-07-07 | 2008-02-12 | Visigen Biotechnologies, Inc. | Methods for real-time single molecule sequence determination |
| US20080108082A1 (en) | 2006-10-23 | 2008-05-08 | Pacific Biosciences Of California, Inc. | Polymerase enzymes and reagents for enhanced nucleic acid sequencing |
| US7405281B2 (en) | 2005-09-29 | 2008-07-29 | Pacific Biosciences Of California, Inc. | Fluorescent nucleotide analogs and uses therefor |
| US20090026082A1 (en) | 2006-12-14 | 2009-01-29 | Ion Torrent Systems Incorporated | Methods and apparatus for measuring analytes using large scale FET arrays |
| US20090127589A1 (en) | 2006-12-14 | 2009-05-21 | Ion Torrent Systems Incorporated | Methods and apparatus for measuring analytes using large scale FET arrays |
| US20100137143A1 (en) | 2008-10-22 | 2010-06-03 | Ion Torrent Systems Incorporated | Methods and apparatus for measuring analytes |
| US20100282617A1 (en) | 2006-12-14 | 2010-11-11 | Ion Torrent Systems Incorporated | Methods and apparatus for detecting molecular interactions using fet arrays |
| US20120270305A1 (en) | 2011-01-10 | 2012-10-25 | Illumina Inc. | Systems, methods, and apparatuses to image a sample for biological or chemical analysis |
| US20130079232A1 (en) | 2011-09-23 | 2013-03-28 | Illumina, Inc. | Methods and compositions for nucleic acid sequencing |
| US20130260372A1 (en) | 2012-04-03 | 2013-10-03 | Illumina, Inc. | Integrated optoelectronic read head and fluidic cartridge useful for nucleic acid sequencing |
| US20160110499A1 (en) * | 2014-10-21 | 2016-04-21 | Life Technologies Corporation | Methods, systems, and computer-readable media for blind deconvolution dephasing of nucleic acid sequencing data |
| WO2021226285A1 (en) * | 2020-05-05 | 2021-11-11 | Illumina, Inc. | Equalization-based image processing and spatial crosstalk attenuator |
- 2025-05-16 WO PCT/US2025/029859 patent/WO2025240924A1/en active Pending
Patent Citations (35)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| WO1991006678A1 (en) | 1989-10-26 | 1991-05-16 | Sri International | Dna sequencing |
| US6172218B1 (en) | 1994-10-13 | 2001-01-09 | Lynx Therapeutics, Inc. | Oligonucleotide tags for sorting and identification |
| US6306597B1 (en) | 1995-04-17 | 2001-10-23 | Lynx Therapeutics, Inc. | DNA sequencing by parallel oligonucleotide extensions |
| US6210891B1 (en) | 1996-09-27 | 2001-04-03 | Pyrosequencing Ab | Method of sequencing DNA |
| US6258568B1 (en) | 1996-12-23 | 2001-07-10 | Pyrosequencing Ab | Method of sequencing DNA based on the detection of the release of pyrophosphate and enzymatic nucleotide degradation |
| US20050100900A1 (en) | 1997-04-01 | 2005-05-12 | Manteia Sa | Method of nucleic acid amplification |
| US6969488B2 (en) | 1998-05-22 | 2005-11-29 | Solexa, Inc. | System and apparatus for sequential processing of analytes |
| US6274320B1 (en) | 1999-09-16 | 2001-08-14 | Curagen Corporation | Method of sequencing a nucleic acid |
| US7001792B2 (en) | 2000-04-24 | 2006-02-21 | Eagle Research & Development, Llc | Ultra-fast nucleic acid sequencing device and a method for making and using the same |
| US7329492B2 (en) | 2000-07-07 | 2008-02-12 | Visigen Biotechnologies, Inc. | Methods for real-time single molecule sequence determination |
| US7211414B2 (en) | 2000-12-01 | 2007-05-01 | Visigen Biotechnologies, Inc. | Enzymatic nucleic acid synthesis: compositions and methods for altering monomer incorporation fidelity |
| US7057026B2 (en) | 2001-12-04 | 2006-06-06 | Solexa Limited | Labelled nucleotides |
| US7427673B2 (en) | 2001-12-04 | 2008-09-23 | Illumina Cambridge Limited | Labelled nucleotides |
| US20060188901A1 (en) | 2001-12-04 | 2006-08-24 | Solexa Limited | Labelled nucleotides |
| WO2004018497A2 (en) | 2002-08-23 | 2004-03-04 | Solexa Limited | Modified nucleotides for polynucleotide sequencing |
| US20070166705A1 (en) | 2002-08-23 | 2007-07-19 | John Milton | Modified nucleotides |
| US20060240439A1 (en) | 2003-09-11 | 2006-10-26 | Smith Geoffrey P | Modified polymerases for improved incorporation of nucleotide analogues |
| WO2005065814A1 (en) | 2004-01-07 | 2005-07-21 | Solexa Limited | Modified molecular arrays |
| US7315019B2 (en) | 2004-09-17 | 2008-01-01 | Pacific Biosciences Of California, Inc. | Arrays of optical confinements and uses thereof |
| WO2006064199A1 (en) | 2004-12-13 | 2006-06-22 | Solexa Limited | Improved method of nucleotide detection |
| US20060281109A1 (en) | 2005-05-10 | 2006-12-14 | Barr Ost Tobias W | Polymerases |
| WO2007010251A2 (en) | 2005-07-20 | 2007-01-25 | Solexa Limited | Preparation of templates for nucleic acid sequencing |
| US7405281B2 (en) | 2005-09-29 | 2008-07-29 | Pacific Biosciences Of California, Inc. | Fluorescent nucleotide analogs and uses therefor |
| WO2007123744A2 (en) | 2006-03-31 | 2007-11-01 | Solexa, Inc. | Systems and devices for sequence by synthesis analysis |
| US20100111768A1 (en) | 2006-03-31 | 2010-05-06 | Solexa, Inc. | Systems and devices for sequence by synthesis analysis |
| US20080108082A1 (en) | 2006-10-23 | 2008-05-08 | Pacific Biosciences Of California, Inc. | Polymerase enzymes and reagents for enhanced nucleic acid sequencing |
| US20090127589A1 (en) | 2006-12-14 | 2009-05-21 | Ion Torrent Systems Incorporated | Methods and apparatus for measuring analytes using large scale FET arrays |
| US20090026082A1 (en) | 2006-12-14 | 2009-01-29 | Ion Torrent Systems Incorporated | Methods and apparatus for measuring analytes using large scale FET arrays |
| US20100282617A1 (en) | 2006-12-14 | 2010-11-11 | Ion Torrent Systems Incorporated | Methods and apparatus for detecting molecular interactions using fet arrays |
| US20100137143A1 (en) | 2008-10-22 | 2010-06-03 | Ion Torrent Systems Incorporated | Methods and apparatus for measuring analytes |
| US20120270305A1 (en) | 2011-01-10 | 2012-10-25 | Illumina Inc. | Systems, methods, and apparatuses to image a sample for biological or chemical analysis |
| US20130079232A1 (en) | 2011-09-23 | 2013-03-28 | Illumina, Inc. | Methods and compositions for nucleic acid sequencing |
| US20130260372A1 (en) | 2012-04-03 | 2013-10-03 | Illumina, Inc. | Integrated optoelectronic read head and fluidic cartridge useful for nucleic acid sequencing |
| US20160110499A1 (en) * | 2014-10-21 | 2016-04-21 | Life Technologies Corporation | Methods, systems, and computer-readable media for blind deconvolution dephasing of nucleic acid sequencing data |
| WO2021226285A1 (en) * | 2020-05-05 | 2021-11-11 | Illumina, Inc. | Equalization-based image processing and spatial crosstalk attenuator |
Non-Patent Citations (15)
| Title |
|---|
| ACC. CHEM. RES., vol. 35, 2002, pages 817 - 825 |
| COCKROFT, S. L.; CHU, J.; AMORIN, M.; GHADIRI, M. R.: "A single-molecule nanopore device detects DNA polymerase activity with single-nucleotide resolution", J. AM. CHEM. SOC., vol. 130, 2008, pages 818 - 820, XP055097434, DOI: 10.1021/ja077082c |
| DEAMER, D. W.; AKESON, M.: "Nanopores and nucleic acids: prospects for ultrarapid sequencing", TRENDS BIOTECHNOL, vol. 18, 2000, pages 147 - 151, XP004194002, DOI: 10.1016/S0167-7799(00)01426-8 |
| HEALY, K.: "Nanopore-based single-molecule DNA analysis", NANOMED, vol. 2, 2007, pages 459 - 481, XP009111262, DOI: 10.2217/17435889.2.4.459 |
| KORLACH, J. ET AL.: "Selective aluminum passivation for targeted immobilization of single DNA polymerase molecules in zero-mode waveguide nanostructures", PROC. NATL. ACAD. SCI., vol. 105, 2008, pages 1176 - 1181 |
| LEVENE, M. J. ET AL.: "Zero-mode waveguides for single-molecule analysis at high concentrations", SCIENCE, vol. 299, 2003, pages 682 - 686, XP002341055, DOI: 10.1126/science.1079700 |
| LI, J.; GERSHOW, M.; STEIN, D.; BRANDIN, E.; GOLOVCHENKO, J. A.: "DNA molecules and configurations in a solid-state nanopore microscope", NAT. MATER., vol. 2, 2003, pages 611 - 615, XP009039572, DOI: 10.1038/nmat965 |
| LUNDQUIST, P. M. ET AL.: "Parallel confocal detection of single molecules in real time", OPT. LETT., vol. 33, 2008, pages 1026 - 1028, XP001522593, DOI: 10.1364/OL.33.001026 |
| METZKER, GENOME RES, vol. 15, 2005, pages 1767 - 1776 |
| RONAGHI, M.: "Pyrosequencing sheds light on DNA sequencing", GENOME RES, vol. 11, no. 1, 2001, pages 3 - 11, XP000980886, DOI: 10.1101/gr.11.1.3 |
| RONAGHI, M.; KARAMOHAMED, S.; PETTERSSON, B.; UHLEN, M.; NYREN, P.: "Real-time DNA sequencing using detection of pyrophosphate release", ANALYTICAL BIOCHEMISTRY, vol. 242, no. 1, 1996, pages 84 - 9, XP002388725, DOI: 10.1006/abio.1996.0432 |
| RONAGHI, M.; UHLEN, M.; NYREN, P.: "A sequencing method based on real-time pyrophosphate", SCIENCE, vol. 281, no. 5375, 1998, pages 363, XP002135869, DOI: 10.1126/science.281.5375.363 |
| RUPAREL ET AL., PROC NATL ACAD SCI, vol. 102, 2005, pages 5932 - 7 |
| SONI, G. V.; MELLER, A.: "Progress toward ultrafast DNA sequencing using solid-state nanopores", CLIN. CHEM., vol. 53, 2007, pages 1996 - 2001, XP055076185, DOI: 10.1373/clinchem.2007.091231 |
| VASEGHI SAEED: "Channel equalization and blind deconvolution", ADVANCED DIGITAL SIGNAL PROCESSING AND NOISE REDUCTION, 1 January 2000 (2000-01-01), pages 416 - 466, XP093303136, Retrieved from the Internet <URL:https://onlinelibrary.wiley.com/doi/pdf/10.1002/0470841621.ch15> * |
Similar Documents
| Publication | Title |
|---|---|
| JP2023547298A (en) | System and method for cluster-wise intensity correction and base calling |
| JP2025170247A (en) | Nucleotide for sequencing - Machine learning model for detecting bubbles in specimen slides |
| US20230343415A1 (en) | Generating cluster-specific-signal corrections for determining nucleotide-base calls | |
| US20220415442A1 (en) | Signal-to-noise-ratio metric for determining nucleotide-base calls and base-call quality | |
| WO2024026356A1 (en) | Rapid single-cell multiomics processing using an executable file | |
| WO2025240924A1 (en) | Blind equalization systems for base calling applications | |
| US20240266003A1 (en) | Determining and removing inter-cluster light interference | |
| US20250111898A1 (en) | Tracking and modifying cluster location on nucleotide-sample slides in real time | |
| US20230410944A1 (en) | Calibration sequences for nucelotide sequencing | |
| US20250210137A1 (en) | Directly determining signal-to-noise-ratio metrics for accelerated convergence in determining nucleotide-base calls and base-call quality | |
| US20240127906A1 (en) | Detecting and correcting methylation values from methylation sequencing assays | |
| US20230368866A1 (en) | Adaptive neural network for nucelotide sequencing | |
| WO2025193747A1 (en) | Machine-learning models for ordering and expediting sequencing tasks or corresponding nucleotide-sample slides | |
| WO2024206848A1 (en) | Tandem repeat genotyping | |
| WO2025174774A1 (en) | Determining offline corrections for sequence specific errors caused by low complexity nucleotide sequences | |
| WO2025072833A1 (en) | Predicting insert lengths using primary analysis metrics | |
| WO2025250996A2 (en) | Call generation and recalibration models for implementing personalized diploid reference haplotypes in genotype calling |