
WO2025240241A1 - Modifying sequencing cycles during a sequencing run to meet customized coverage estimations for a target genomic region - Google Patents

Modifying sequencing cycles during a sequencing run to meet customized coverage estimations for a target genomic region

Info

Publication number
WO2025240241A1
Authority
WO
WIPO (PCT)
Prior art keywords
sequencing
target
genomic
read
sequence
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
PCT/US2025/028563
Other languages
French (fr)
Inventor
Paul Smith
Michael Carney
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Illumina Inc
Original Assignee
Illumina Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Illumina Inc
Publication of WO2025240241A1

Classifications

    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B30/00 ICT specially adapted for sequence analysis involving nucleotides or amino acids
    • G16B30/10 Sequence alignment; Homology search
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q1/00 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
    • C12Q1/68 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
    • C12Q1/6869 Methods for sequencing

Definitions

  • existing sequencing systems predict individual nucleobases within sequences by using conventional Sanger sequencing or sequencing-by-synthesis (SBS) methods.
  • SBS sequencing-by-synthesis
  • existing sequencing systems can monitor many thousands to billions of oligonucleotides being synthesized in parallel from templates to predict nucleobase calls for growing nucleotide reads.
  • a camera captures images of irradiated fluorescent tags incorporated into oligonucleotides.
  • some existing sequencing systems determine nucleobase calls for nucleotide reads corresponding to respective clusters of oligonucleotides on a flow cell (or other nucleotide-sample substrate) for a given sequencing run. For example, some existing sequencing systems utilize sequencing-data-analysis software to analyze image data captured during sequencing cycles to determine nucleobase calls for given clusters of oligonucleotides and sequence such calls across sequencing cycles to determine nucleotide reads for the given clusters. Additionally, some existing sequencing systems utilize targeted sequencing to analyze genomic regions of interest. For example, existing sequencing systems can sequence and analyze mutations in specific genomic regions that correspond with various diseases.
  • Existing sequencing systems may pool genetic samples from different individuals into clusters on a single flow cell (or other nucleotide-sample substrate) to increase the number of samples analyzed in a single sequencing run.
  • existing sequencing systems may utilize sample multiplexing (or multiplex sequencing) to add individual “barcode” or indexing sequences to each deoxyribonucleic acid (DNA) fragment during library preparation.
  • the indexing sequences correspond to individual genomic samples within the sample pool.
  • existing sequencing systems may perform demultiplexing to identify which indexing sequences — and which clusters of oligonucleotides on a flow cell — correspond with which genomic samples.
  • Because sequencing devices can under-sequence DNA reads extracted from some samples, sequencing devices can sometimes execute an excessive number of sequencing cycles or images (or otherwise over-sequence) for a sequencing run to generate the requisite number or length of nucleotide reads to satisfy the target genomic region coverage level.
  • existing sequencing systems can consume additional reagents, processing materials, and sample material during additional sequencing cycles or runs.
  • such additional processing materials include sequencing reagents, library preparation kits, cluster amplification materials, flow cells or other nucleotide-sample substrates, scarce real estate on such flow cells, and other materials.
  • In addition to consuming such materials, existing sequencing systems sometimes require re-extracting genomic material from an individual and re-performing the library preparation necessary to seed oligonucleotide clusters on an additional flow cell, so that an additional sequencing run can compensate for a previous sequencing run that failed to produce a target genomic region coverage for variant calling (or other secondary analysis) of the individual.
  • the relationship between number of cycles and processing materials consumed is a linear function.
  • many existing sequencing systems consume additional processing materials and sample materials to compensate for the coverage uncertainty and variation outlined above.
  • This disclosure describes one or more embodiments of systems, methods, and non-transitory computer-readable storage media that solve one or more of the problems described above or provide other advantages over the art.
  • the disclosed systems estimate read coverage for a target genomic region of each sample in a flow cell or other nucleotide-sample substrate and adjust a number of sequencing cycles to satisfy a target coverage for the target genomic region based on the estimated read coverage.
  • the disclosed systems receive user input for a sequencing run that indicates a target coverage level for a target genomic region.
  • the disclosed systems may provisionally map reads of the samples — which can be identified using index sequences — to a reference sequence corresponding to the target genomic region. Based on the reads provisionally mapped to the reference sequence and a currently selected number of sequencing cycles for the run, the system estimates a read-coverage level for the target genomic region in each sample of the run. Having estimated the read-coverage level, the disclosed systems may generate an adjusted number of sequencing cycles sufficient to satisfy the target read-coverage level within the target genomic region for one or more of the genomic samples. The disclosed systems can further execute the sequencing run according to the adjusted number of sequencing cycles.
  • FIG. 1 illustrates a computing system in which a sequencing device and a corresponding target-sequence-coverage system can operate in accordance with one or more embodiments of the present disclosure.
  • FIG. 2 illustrates an overview of the target-sequence-coverage system generating an adjusted number of sequencing cycles and executing a sequencing run based on the adjusted number of sequencing cycles in accordance with one or more implementations of the present disclosure.
  • FIG. 3 illustrates an overview of the target-sequence-coverage system estimating an updated read-coverage level in accordance with one or more implementations of the present disclosure.
  • FIG. 4 illustrates the target-sequence-coverage system provisionally mapping nucleotide reads to a reference sequence in accordance with one or more embodiments of the present disclosure.
  • FIG. 5 illustrates the target-sequence-coverage system identifying clusters of oligonucleotides provisionally mapped to a reference sequence corresponding with the target genomic region and identifying genomic samples to which the nucleotide sequence reads belong in accordance with one or more embodiments of the present disclosure.
  • FIG. 6 illustrates an example decision flowchart by which the target-sequence-coverage system executes sequencing runs according to an adjusted number of sequencing cycles in accordance with one or more embodiments of the present disclosure.
  • FIG. 7 illustrates a schematic view of an example of a system that may be used to provide biological or chemical analysis in accordance with one or more embodiments of the present disclosure.
  • FIG. 8 illustrates a schematic view of an example of a set of components that may cooperate to provide a fluid path in the system of FIG. 7 in accordance with one or more embodiments of the present disclosure.
  • FIG. 9 illustrates a flowchart of a series of acts for executing a sequencing run according to an adjusted number of sequencing cycles in accordance with one or more embodiments of the present disclosure.
  • FIG. 10 illustrates a block diagram of an example computing device in accordance with one or more embodiments of the present disclosure.
  • This disclosure describes one or more embodiments of a target-sequence-coverage system that can efficiently identify read data for one or more genomic samples during a sequencing run, estimate read coverage for a target genomic region of the one or more samples, and adjust a number of sequencing cycles during the sequencing run to satisfy a target coverage for the target genomic region of the one or more genomic samples.
  • the target-sequence-coverage system can accordingly adjust, on the fly, a number of sequencing cycles that likely satisfy a target coverage for a target genomic region.
  • the target-sequence-coverage system can receive data input for a sequencing run that identifies a target genomic region for genomic samples and a target read-coverage level for the target genomic region.
  • the target-sequence-coverage system can provisionally map nucleotide reads of the genomic samples to a reference sequence corresponding to the target genomic region.
  • a reference sequence may represent the target genomic region within a reference genome.
  • the target-sequence-coverage system can identify to which sample certain nucleotide reads belong based on index sequences within the nucleotide reads.
  • Based on the nucleotide reads being provisionally mapped to the reference sequence and a currently selected number of sequencing cycles for the sequencing run, the target-sequence-coverage system estimates a read-coverage level for the target genomic region in each sample of the run. Having estimated the read-coverage level, the target-sequence-coverage system can generate an adjusted number of sequencing cycles sufficient to satisfy the target read-coverage level within the target genomic region for one or more of the samples. The target-sequence-coverage system can further execute the sequencing run according to the adjusted number of sequencing cycles or, upon further update, an updated adjusted number of sequencing cycles.
  • the target-sequence-coverage system can receive data input identifying a target genomic region for genomic samples.
  • a researcher or clinician may choose to identify a target genomic region for sequencing by inputting a range of genomic coordinates or a particular name or code identifying the target genomic region.
  • the target-sequence-coverage system can receive input identifying the target genomic region.
  • the target-sequence-coverage system can receive data input identifying a target read-coverage level for the target genomic region.
  • the target-sequence-coverage system can identify a desired depth or number of times the target genomic region is sequenced during a sequencing run.
  • the coverage level of a target genomic region can indicate the reliability and accuracy of downstream analysis, such as variant calling.
  • the target-sequence-coverage system can identify a target read-coverage level that results in the desired quality level of additional sequencing.
  • the target-sequence-coverage system can provisionally map nucleotide reads of the genomic samples to a reference sequence corresponding to the target genomic region. After an initial set of sequencing cycles, the target-sequence-coverage system can identify nucleotide reads that contribute to the final coverage of the targeted genomic region. The target-sequence-coverage system can map the nucleotide reads of the genomic sample from the initial set of sequencing cycles to a reference genome. Based on the provisional mapping, the target-sequence-coverage system can identify nucleotide reads that correspond or map to the target genomic region.
  • the target-sequence-coverage system can estimate read-coverage levels of the target genomic region.
  • the target-sequence-coverage system identifies a number of nucleotide reads provisionally mapped to the target genomic region or adjacent regions to estimate the read coverage.
  • the target-sequence-coverage system estimates the read coverage based on the growth direction of nucleotide reads mapped to the target genomic region and/or adjacent regions.
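  • As a purely illustrative sketch of growth-direction-based estimation (not the disclosed implementation), the following Python assumes each provisionally mapped read is described by a hypothetical start coordinate and strand, and that a read grows by one base per sequencing cycle. It projects the mean coverage of the target genomic region at a given cycle count. The names `ProvisionalRead`, `projected_span`, and `projected_mean_coverage` are assumptions introduced here.

```python
from typing import Iterable, NamedTuple, Tuple


class ProvisionalRead(NamedTuple):
    """Hypothetical record for a provisionally mapped read."""
    start: int   # 1-based coordinate of the first sequenced base
    strand: int  # +1 grows toward higher coordinates, -1 toward lower ones


def projected_span(read: ProvisionalRead, cycles: int) -> Tuple[int, int]:
    """Inclusive span the read would cover after `cycles` cycles,
    assuming one base is added per cycle in the growth direction."""
    if read.strand > 0:
        return read.start, read.start + cycles - 1
    return read.start - cycles + 1, read.start


def projected_mean_coverage(reads: Iterable[ProvisionalRead],
                            region_start: int, region_end: int,
                            cycles: int) -> float:
    """Mean depth over the target region if every read reaches `cycles` bases."""
    region_length = region_end - region_start + 1
    covered_bases = 0
    for read in reads:
        lo, hi = projected_span(read, cycles)
        # Overlap between the projected read span and the target region.
        overlap = min(hi, region_end) - max(lo, region_start) + 1
        covered_bases += max(0, overlap)
    return covered_bases / region_length


reads = [ProvisionalRead(91, +1), ProvisionalRead(210, -1), ProvisionalRead(150, +1)]
short_run = projected_mean_coverage(reads, 101, 200, cycles=10)
long_run = projected_mean_coverage(reads, 101, 200, cycles=50)
```

  • Under these assumptions, a read that starts in an adjacent region contributes nothing at a low cycle count but begins to contribute once it has grown into the target genomic region.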
  • Based on the estimated read-coverage level for the target genomic region, in some embodiments, the target-sequence-coverage system generates an adjusted number of sequencing cycles sufficient to satisfy the target read-coverage level within the target genomic region. In particular, the target-sequence-coverage system can adjust a number of sequencing cycles before the sequencing run concludes. For example, the target-sequence-coverage system can determine, for each genomic sample, a number of sequencing cycles likely required to satisfy the target read coverage.
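  • One hypothetical way to turn a per-sample estimate into an adjusted cycle count is sketched below. It assumes a simplified model that the disclosure does not state: mean coverage is approximately (on-target reads × cycles) / region length, so the cycles required are the target coverage times the region length divided by the on-target read count. The function names and the sample labels are illustrative only.

```python
import math
from typing import Dict, Tuple


def cycles_needed(on_target_reads: int, region_length: int,
                  target_coverage: float) -> int:
    """Cycles required under the assumed model
    coverage ~= on_target_reads * cycles / region_length."""
    if on_target_reads <= 0:
        raise ValueError("no provisionally mapped reads for this sample")
    return math.ceil(target_coverage * region_length / on_target_reads)


def adjusted_cycles(reads_per_sample: Dict[str, int], region_length: int,
                    target_coverage: float) -> Tuple[int, Dict[str, int]]:
    """Per-sample requirement plus the run-level count that satisfies
    every sample (the maximum of the per-sample requirements)."""
    per_sample = {
        sample: cycles_needed(count, region_length, target_coverage)
        for sample, count in reads_per_sample.items()
    }
    return max(per_sample.values()), per_sample


run_cycles, per_sample = adjusted_cycles(
    {"sample_A": 4000, "sample_B": 2500}, region_length=10_000, target_coverage=30
)
```

  • Taking the maximum across samples is one possible policy; a run could instead be tuned to a single sample of interest.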
  • the target-sequence-coverage system can determine (i) whether the number of sequencing cycles needed is beyond the maximum number of cycles for a sequencing run, (ii) whether the sequencing run could be terminated at the adjusted number of sequencing cycles for the sequencing run, or (iii) whether the sequencing run should be continued with additional cycles.
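  • The three determinations above can be sketched as a small decision function. This is a minimal illustration under assumed inputs (a required cycle count, the cycles completed so far, and the run maximum); the `Decision` names are hypothetical.

```python
from enum import Enum


class Decision(Enum):
    EXCEEDS_MAXIMUM = "required cycles exceed the maximum for the run"
    TERMINATE_AT_ADJUSTED = "run can stop at the adjusted cycle count"
    CONTINUE = "run should continue with additional cycles"


def decide(cycles_required: int, cycles_completed: int,
           max_cycles: int) -> Decision:
    """Classify a run into one of the three outcomes (i) to (iii)."""
    if cycles_required > max_cycles:
        return Decision.EXCEEDS_MAXIMUM        # case (i)
    if cycles_completed >= cycles_required:
        return Decision.TERMINATE_AT_ADJUSTED  # case (ii)
    return Decision.CONTINUE                   # case (iii)
```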
  • the target-sequence-coverage system can efficiently eliminate under-sequenced genomic samples and thereby avoid performing additional and unnecessary sequencing runs.
  • the target-sequence-coverage system can iteratively update the adjusted number of sequencing cycles at various checkpoint sequencing cycles of a sequencing run. During such an iterative update, the target-sequence-coverage system can perform a new provisional mapping of nucleotide reads for genomic samples by mapping nucleotide reads (or fragments of nucleotide reads) not mapped previously to the target genomic region. Likewise, the target-sequence-coverage system can provisionally remap nucleotide reads to the reference sequence corresponding to the target genomic region at a checkpoint sequencing cycle.
  • the target-sequence-coverage system can use the provisional remapping of nucleotide reads to predict whether a previously determined adjusted number of sequencing cycles is sufficient to meet the target read-coverage levels across genomic samples and/or for individual genomic samples.
  • the target-sequence-coverage system can determine an updated read-coverage level and further adjust the number of sequencing cycles within the sequencing run.
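  • The checkpoint-based updating described above might be organized as the loop below. This is only a sketch: `estimate_cycles_needed` stands in for the provisional remapping and coverage estimation at a checkpoint and is a hypothetical callable, as are the checkpoint values used in the example.

```python
from typing import Callable, Iterable, List, Tuple


def plan_cycles(initial_cycles: int, max_cycles: int,
                checkpoints: Iterable[int],
                estimate_cycles_needed: Callable[[int], int]
                ) -> Tuple[int, List[Tuple[int, int]]]:
    """Re-estimate the planned cycle count at each checkpoint cycle.

    Returns the final planned count and a history of
    (checkpoint, planned count after that checkpoint) pairs.
    """
    planned = min(initial_cycles, max_cycles)
    history: List[Tuple[int, int]] = []
    for checkpoint in sorted(checkpoints):
        if checkpoint >= planned:
            break  # the run ends before this checkpoint is reached
        needed = estimate_cycles_needed(checkpoint)
        # Never plan fewer cycles than already executed or more than allowed.
        planned = min(max(needed, checkpoint), max_cycles)
        history.append((checkpoint, planned))
    return planned, history


# Hypothetical estimates produced at three checkpoint cycles.
estimates = {25: 140, 50: 120, 100: 110}
final_cycles, history = plan_cycles(150, 300, [25, 50, 100, 125],
                                    estimates.__getitem__)
```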
  • the target-sequence-coverage system provides several technical advantages by, for example, improving computational and resource efficiency of a specialized computing device — that is, a sequencing device — relative to some existing sequencing systems.
  • the target-sequence-coverage system may reduce the amount of compute time and consumed memory on a sequencing device for a given sequencing run to reach target read-coverage levels for target genomic regions relative to existing sequencing systems.
  • the target-sequence-coverage system may more accurately execute a number of sequencing cycles required for each genomic sample (or an individual genomic sample) to reach a target read-coverage level of a target genomic sequence.
  • the target-sequence-coverage system can perform additional provisional re-mappings and generate updated read-coverage levels that may more-precisely predict, in real time or near real time, the number of sequencing cycles required to meet the target read-coverage level.
  • the target-sequence-coverage system can execute a lower number of sequencing cycles that may consume less processing and memory — while still achieving acceptable read-coverage levels for each genomic sample (or an individual genomic sample) at a target genomic region. Accordingly, the target-sequence-coverage system can reduce the amount of compute time required to perform a sequencing run that satisfies a target read-coverage level for target genomic regions for genomic samples.
  • the target-sequence-coverage system may also conserve consumables and other physical resources, and may reduce overuse of fluidics devices and other hardware within a sequencing device, relative to some existing sequencing systems.
  • some existing sequencing systems may be used to preset additional or buffer sequencing cycles and sometimes perform one or more additional sequencing runs (e.g., top-off runs) with a particular sample to ensure sufficient coverage for data analysis. Such additional preset sequencing cycles or additional sequencing runs can consume additional sequencing reagents, processing materials, and sample materials.
  • the target-sequence-coverage system can (i) efficiently generate an adjusted number of sequencing cycles before a sequencing run concludes and (ii) thereby execute the sequencing run on a specialized computing device (that is, a sequencing device) according to the adjusted number of sequencing cycles.
  • the present disclosure utilizes a variety of terms to describe features and advantages of the target-sequence-coverage system.
  • the term “sequencing run” refers to an iterative process on a sequencing device to determine a primary structure of nucleotide sequences from a sample (e.g., genomic sample).
  • a sequencing run includes cycles of sequencing chemistry and imaging performed by a sequencing device that incorporate nucleobases into growing oligonucleotides to determine nucleotide reads from nucleotide sequences extracted from a sample (or other sequences within a library fragment) and seeded throughout a flow cell or other nucleotide-sample substrate.
  • a sequencing run includes replicating oligonucleotides derived or extracted from one or more genomic samples seeded in clusters or other structures, such as circularized nucleotide strands, throughout a flow cell.
  • a sequencing device can generate base-call data in a file, such as a binary base call (BCL) sequence file or a fast-all quality (FASTQ) file.
  • BCL binary base call
  • FASTQ fast-all quality
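  • For readers unfamiliar with the FASTQ output mentioned above, the following sketch reads FASTQ records from a text stream. The disclosure names FASTQ only as an example file type; the four-line record layout and Phred+33 quality encoding used here are the commonly used FASTQ convention and are stated as an assumption, and `read_fastq` is a hypothetical helper.

```python
import io
from typing import Iterator, List, TextIO, Tuple


def read_fastq(handle: TextIO) -> Iterator[Tuple[str, str, List[int]]]:
    """Yield (read name, bases, Phred quality scores) per record,
    assuming four lines per record and Phred+33 quality characters."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            return  # end of stream
        bases = handle.readline().rstrip("\n")
        handle.readline()  # '+' separator line, ignored
        qualities = handle.readline().rstrip("\n")
        yield header[1:], bases, [ord(ch) - 33 for ch in qualities]


example = io.StringIO("@read_1\nACGT\n+\nIIII\n@read_2\nGG\n+\n!#\n")
records = list(read_fastq(example))
```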
  • sequencing cycle refers to an iteration of adding or incorporating one or more nucleobases to one or more oligonucleotides representing or corresponding to a sample’s sequence (e.g., a genomic or transcriptomic sequence from a sample) or a corresponding adapter sequence.
  • a sequencing cycle includes an iteration of both incorporating nucleobases into clusters or other structures of oligonucleotides using sequencing chemistry and capturing images of detectable elements, such as fluorescence of fluorophores, associated with the nucleotides of such clusters or other structures attached to or located on a flow cell or other nucleotide-sample substrate.
  • a sequencing cycle can include one or both of an indexing cycle and a genomic sequencing cycle. For instance, one cluster of oligonucleotides or a set of clusters of oligonucleotides may be undergoing a genomic sequencing cycle in which nucleobases corresponding to a sample genomic sequence are incorporated and another cluster of oligonucleotides or another set of clusters of oligonucleotides may be concurrently undergoing an indexing cycle in which nucleobases corresponding to an indexing sequence for a nucleotide read are incorporated.
  • a sequencing device progresses through sequencing cycles that determine nucleobase calls for a nucleotide read from an oligonucleotide comprising an adapter sequence (e.g., p5/p7 primers), a first indexing sequence, a sample genomic sequence (e.g., gDNA), a second indexing sequence, and another adapter sequence (e.g., p5/p7 primers).
  • genomic sequencing cycle refers to an iteration of adding or incorporating one or more nucleobases to one or more oligonucleotides representing or corresponding to a sample genomic sequence (or cDNA sequence).
  • a genomic sequencing cycle can include an iteration of capturing and analyzing one or more images with data indicating individual nucleobases added or incorporated into an oligonucleotide or to oligonucleotides (in parallel) representing or corresponding to one or more sample genomic sequences.
  • image analysis can include analyzing data from signals output from an image sensor (e.g., an area capture sensor or a time delayed integration (TDI) sensor).
  • each genomic sequencing cycle involves capturing and analyzing images to determine either single reads or paired-end reads of DNA (or RNA) strands representing part of a genomic sample (or transcribed sequence from a genomic sample).
  • a genomic sequencing cycle in some cases, is specific to a cluster of oligonucleotides or a set of clusters of oligonucleotides.
  • indexing cycle refers to an iteration of adding or incorporating one or more nucleobases to one or more oligonucleotides representing or corresponding to one or more indexing sequences.
  • an indexing cycle can include an iteration of capturing and analyzing one or more images of clusters of oligonucleotides indicating one or more nucleobases added or incorporated into an oligonucleotide or to oligonucleotides (in parallel) representing or corresponding to one or more indexing sequences.
  • An indexing cycle differs from a genomic sequencing cycle in that an indexing cycle includes sequencing of at least a nucleobase (or a majority of nucleobases) from one or more indexing sequences that identify or encode one or more sample library fragments.
  • genomic sequencing cycles may be specific to a cluster or clusters of oligonucleotides or other structures of oligonucleotides.
  • an indexing cycle for one cluster of oligonucleotides may be performed at a same time as a genomic sequencing cycle for another cluster of oligonucleotides.
  • the term “currently selected number of sequencing cycles” refers to an adjustable value that represents a number of sequencing cycles to be performed during a sequencing run.
  • a currently selected number of sequencing cycles can be automatically determined, determined based on user selection, or preset according to a default number.
  • the currently selected number of sequencing cycles may be fixed based on the reagent kit used with a sequencing system.
  • reagent kits may specify 50, 100, 150, 200, 300, 400, 500, 600, or more genomic sequencing cycles in addition to indexing cycles, primer cycles, and/or other cycles.
  • the target-sequence-coverage system can determine a currently selected number of sequencing cycles equaling 150 sequencing cycles.
  • the target-sequence-coverage system can adjust the number of sequencing cycles by increasing the number of sequencing cycles or reducing the number of sequencing cycles.
  • genomic sample refers to a target oligonucleotide sample.
  • the oligonucleotide sample may be a genome or portion of a genome undergoing an assay or sequencing.
  • a genomic sample includes one or more sequences of nucleotides isolated or extracted from a sample organism (or a copy of such an isolated or extracted sequence) or any other source of oligonucleotides.
  • a genomic sample may include a full genome that is isolated or extracted (in whole or in part) from a sample organism and composed of nitrogenous heterocyclic bases.
  • a genomic sample can include a segment of deoxyribonucleic acid (DNA), ribonucleic acid (RNA), or other polymeric forms of nucleic acids or chimeric or hybrid forms of nucleic acids noted below.
  • the genomic sample is found in a sample prepared or isolated by a kit and received by a sequencing device.
  • read-coverage level refers to a measure or value that indicates a depth or redundancy of nucleotide-sequence information for a particular genomic coordinate or genomic region of a sample.
  • read-coverage level refers to a number of times a specific genomic coordinate or genomic region for a sample is covered or spanned by nucleotide reads.
  • Read-coverage level can be relevant when describing the depth of sequencing data obtained for a particular genomic region of interest or a particular genomic sample.
  • read-coverage level may comprise a numeric value (e.g., 10x, 30x, 45x, 50x, 60x) indicating an average number of unique nucleotide reads for a genomic sample that span or cover genomic coordinates or regions of a human genomic sample.
  • read-coverage level is limited to an average number of unique nucleotide reads across a non-N portion of a human or non-human genome (e.g., non-N portion of a PAR-masked human genome or non-human genome, such as genomes of primates, viruses, bacteria, or other organisms).
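  • To make the notion of read-coverage level concrete, the sketch below computes per-position depth and the mean depth over a region from read spans. It is a minimal illustration that assumes inclusive 1-based spans and that duplicate reads have already been removed; the function names are hypothetical.

```python
from typing import Iterable, List, Tuple


def depth_per_position(read_spans: Iterable[Tuple[int, int]],
                       region_start: int, region_end: int) -> List[int]:
    """Number of reads covering each position of the region."""
    depth = [0] * (region_end - region_start + 1)
    for lo, hi in read_spans:
        # Count only the part of the read that falls inside the region.
        for pos in range(max(lo, region_start), min(hi, region_end) + 1):
            depth[pos - region_start] += 1
    return depth


def mean_coverage(depth: List[int]) -> float:
    """Average depth across the region, e.g. 30.0 for '30x'."""
    return sum(depth) / len(depth)


depth = depth_per_position([(1, 5), (3, 8), (9, 12)], region_start=1, region_end=10)
average = mean_coverage(depth)
```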
  • target read-coverage level refers to a desired or intended depth of sequencing coverage for a specific genomic coordinate or genomic region within a genomic sample.
  • a target read-coverage level represents a minimum number of times a position within a genomic sample should be sequenced to achieve a desired level of confidence in the accuracy of the obtained sequence data.
  • a target-read-coverage level can comprise a numeric value (e.g., 40, 50, 60, etc.) indicating a desired read-coverage level for a given position within a genomic sample.
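  • Where the target read-coverage level is treated as a per-position minimum, a check against a depth profile could look like the following sketch. The depth values and helper names are hypothetical and are given only to illustrate the comparison.

```python
from typing import List


def positions_below_target(depth: List[int], region_start: int,
                           target: int) -> List[int]:
    """Genomic positions whose depth falls short of the target."""
    return [region_start + i for i, d in enumerate(depth) if d < target]


def satisfies_target(depth: List[int], target: int) -> bool:
    """True when every position in the region meets the minimum depth."""
    return min(depth) >= target


profile = [42, 40, 38, 45]  # hypothetical depths at positions 1000..1003
shortfall = positions_below_target(profile, region_start=1000, target=40)
```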
  • genomic coordinate refers to a particular location or position of a nucleobase within a genome (e.g., an organism’s genome or a reference genome).
  • a genomic coordinate includes an identifier for a particular chromosome of a genome and an identifier for a position of a nucleobase within the particular chromosome.
  • a genomic coordinate or coordinates may include a number, name, or other identifier for a chromosome (e.g., chr1, chrX, chrM) and a particular position or positions, such as numbered positions following the identifier for a chromosome (e.g., chr1:1234570 or chr1:1234570-1234870).
  • a genomic coordinate refers to a genomic coordinate on a sex chromosome (e.g., chrX or chrY) or mitochondrial DNA (e.g., chrM).
  • a genomic coordinate refers to a source of a reference genome (e.g., mt for a mitochondrial DNA reference genome or SARS-CoV-2 for a reference genome for the SARS-CoV-2 virus) and a position of a nucleobase within the source for the reference genome (e.g., mt:16568 or SARS-CoV-2:29001).
  • a genomic coordinate refers to a position of a nucleobase within a reference genome without reference to a chromosome or source (e.g., 29727).
  • genomic region refers to a range of genomic coordinates. Like genomic coordinates, in certain implementations, a genomic region may be identified by an identifier for a chromosome and a particular position or positions, such as numbered positions following the identifier for a chromosome (e.g., chr1:1234570-1234870). In various implementations, a genomic coordinate includes a position within a reference genome. In some cases, a genomic coordinate is specific to a particular reference genome.
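  • The coordinate notations described above (a source or chromosome identifier, a colon, and a position or position range, or a bare position) can be parsed as in the sketch below. `GenomicRegion` and `parse_region` are hypothetical names; the parser splits on the last colon so that identifiers containing hyphens, such as SARS-CoV-2, are handled.

```python
from typing import NamedTuple, Optional


class GenomicRegion(NamedTuple):
    source: Optional[str]  # chromosome or reference source, None if absent
    start: int
    end: int


def parse_region(text: str) -> GenomicRegion:
    """Parse 'chr1:1234570', 'chr1:1234570-1234870', 'mt:16568', or '29727'."""
    # Split on the last colon so hyphens inside the source name are preserved.
    source, colon, positions = text.strip().rpartition(":")
    start_text, dash, end_text = positions.partition("-")
    start = int(start_text)
    end = int(end_text) if dash else start
    if end < start:
        raise ValueError(f"end precedes start in {text!r}")
    return GenomicRegion(source if colon else None, start, end)
```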
  • target genomic region refers to a specific segment or range of genomic coordinates that is a focus of interest for analysis.
  • a target genomic region comprises a range of genomic coordinates of interest to be sequenced during a sequencing run.
  • a target genomic region can correspond with a gene of interest, a chromosomal region, an epigenetic region, a region having functional elements, or a region having other traits.
  • the target genomic region can correspond to a reference sequence.
  • a reference genome refers to a digital nucleic acid sequence assembled as a representative example (or representative examples) of genes and other genetic sequences of an organism. Regardless of the sequence length, in some cases, a reference genome represents an example set of genes or a set of nucleic acid sequences in a digital nucleic acid sequence determined as representative of an organism.
  • a linear human reference genome may be GRCh38 (or other versions of reference genomes) from the Genome Reference Consortium. GRCh38 may include alternate contiguous sequences representing alternate haplotypes, such as SNPs and small indels (e.g., 10 or fewer base pairs, 50 or fewer base pairs).
  • a reference sequence refers to a sequence of nucleobases at a specific location corresponding to a target genomic region within a reference genome.
  • a reference sequence can refer to a span of reference bases in a reference genome that includes a target genomic region and extends upstream or downstream from the target genomic region by a threshold number of nucleobases.
  • a reference sequence can refer to a specific sequence of nucleotides within the reference genome that represents a gene, a promoter region, a chromosomal segment or other target genomic region.
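  • A reference sequence that extends upstream and downstream of the target genomic region by a threshold number of nucleobases could be derived as in this sketch. The padding value, the contig length, and the helper names are assumptions chosen for illustration; coordinates are 1-based and inclusive.

```python
from typing import Tuple


def padded_reference_span(region_start: int, region_end: int,
                          padding: int, contig_length: int) -> Tuple[int, int]:
    """Extend the target region by `padding` bases on each side,
    clipped to the bounds of the reference contig."""
    start = max(1, region_start - padding)
    end = min(contig_length, region_end + padding)
    return start, end


def extract_reference(contig: str, span: Tuple[int, int]) -> str:
    """Return the bases of `contig` within an inclusive 1-based span."""
    start, end = span
    return contig[start - 1:end]


span = padded_reference_span(1234570, 1234870, padding=150, contig_length=2_000_000)
```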
  • nucleobase call refers to a determination or prediction of a particular nucleobase (or nucleobase pair) for an oligonucleotide (e.g., read) during a sequencing cycle.
  • a nucleobase call can indicate a determination or prediction of the type of nucleobase that has been incorporated within an oligonucleotide on a flow cell (e.g., read-based nucleobase calls).
  • a nucleobase call includes a determination or a prediction of a nucleobase based on intensity values resulting from fluorescent-tagged nucleotides added to one or more oligonucleotides of a flow cell (e.g., in a cluster of a flow cell).
  • a nucleobase call includes a determination or a prediction of a nucleobase from chromatogram peaks or electrical current changes resulting from nucleotides passing through a nanopore of a flow cell.
  • a single nucleobase call can be an adenine (A) call, a cytosine (C) call, a guanine (G) call, a thymine (T) call, or a uracil (U) call.
  • A adenine
  • C cytosine
  • G guanine
  • T thymine
  • U uracil
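  • As a deliberately simplified illustration of calling a nucleobase from intensity values, the sketch below picks the base with the highest intensity and returns a no-call when the two strongest signals are too close. The one-channel-per-base model, the margin threshold, and the use of "N" for a no-call are assumptions made here and are not taken from the disclosure.

```python
from typing import Dict


def call_base(intensities: Dict[str, float], min_margin: float = 0.0) -> str:
    """Return the base with the strongest signal, or 'N' when the lead
    over the runner-up does not exceed `min_margin`."""
    ranked = sorted(intensities.items(), key=lambda item: item[1], reverse=True)
    (best_base, best_value), (_, second_value) = ranked[0], ranked[1]
    return best_base if best_value - second_value > min_margin else "N"


clear_call = call_base({"A": 0.9, "C": 0.1, "G": 0.05, "T": 0.2})
ambiguous_call = call_base({"A": 0.5, "C": 0.45, "G": 0.1, "T": 0.1}, min_margin=0.2)
```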
  • sample genomic sequence refers to a nucleotide sequence extracted from, copied from, or complementary to a sample’s chromosome.
  • a sample genomic sequence includes a nucleotide sequence that has been separated or copied from chromosomal DNA of a sample or has been sequenced to be complementary to an extracted or copied nucleotide sequence.
  • a sample genomic sequence includes genomic DNA (gDNA) for a particular unknown sample.
  • the target-sequence-coverage system can use a sample complementary sequence comprising cDNA rather than a sample genomic sequence comprising gDNA in a sample library fragment or wherever suitable cDNA may replace gDNA as understood by a skilled artisan.
  • any embodiment or nucleotide read in this disclosure that uses or includes a sample genomic sequence can also use or include a cDNA sequence corresponding to a genomic sample.
  • indexing sequence refers to a unique and artificial nucleotide sequence that identifies nucleotide reads for a sample and that is ligated to a sample’s nucleotide sequence (e.g., a gDNA fragment or cDNA fragment) or to another sequence within a sample library fragment.
  • an indexing sequence can be part of a sample library fragment.
  • an indexing sequence can be used to sort nucleotide reads by sample or into different files, among other things, such as part of a demultiplexing process.
  • a sample library fragment includes an indexing primer sequence that differs from a read priming sequence and that indicates a starting point or starting nucleobase for determining nucleobases of an indexing sequence.
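  • As a minimal sketch of the demultiplexing use described above, the following Python groups nucleotide reads by an exact match on the indexing sequence. The function name `demultiplex`, the `(index_sequence, read_sequence)` tuple layout, and the index-to-sample mapping are illustrative assumptions and are not part of the disclosure; practical demultiplexers usually also tolerate a small number of index mismatches.

```python
from collections import defaultdict


def demultiplex(reads, index_to_sample):
    """Group reads by sample using an exact match on the indexing sequence.

    reads: iterable of (index_sequence, read_sequence) tuples (hypothetical layout).
    index_to_sample: dict mapping an indexing sequence to a sample name.
    Reads whose index is not in the mapping are collected under "undetermined".
    """
    by_sample = defaultdict(list)
    for index_seq, read_seq in reads:
        sample = index_to_sample.get(index_seq, "undetermined")
        by_sample[sample].append(read_seq)
    return dict(by_sample)
```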
  • the term “cluster of oligonucleotides” refers to a localized collection of DNA or RNA molecules immobilized or located on a solid surface.
  • a cluster of oligonucleotides can refer to a collection of fragment nucleotide sequences immobilized or located on a flow cell region of a flow cell.
  • a cluster of oligonucleotides can refer to a collection of nucleotide fragments originating from a genomic sample.
  • a cluster of oligonucleotides can be imaged utilizing one or more light signals.
  • during a sequencing cycle, an image sensor may capture an oligonucleotide-cluster image of the light emitted by irradiated fluorescent tags incorporated into oligonucleotides from one or more clusters on a flow cell.
  • nucleotide read refers to an inferred sequence of one or more nucleobases (or nucleobase pairs) from all or part of a sample nucleotide sequence (e.g., a sample genomic sequence, complementary DNA).
  • a nucleotide read includes a determined or predicted sequence of nucleobase calls for a nucleotide sequence (or group of monoclonal nucleotide sequences) from a sample library fragment corresponding to a genomic sample.
  • a sequencing device determines a nucleotide read by generating nucleobase calls for nucleobases passed through a nanopore of a flow cell, determined via fluorescent tagging, or determined from a cluster in a flow cell.
  • nucleobase refers to a nitrogenous base.
  • nucleobases comprise components of nucleotides.
  • a nucleobase may be an adenine (A), cytosine (C), guanine (G), thymine (T), or uracil (U).
  • one or more non-naturally occurring nucleobases may also be used.
  • a sequencing device refers to an instrument or platform used to perform a sequencing process.
  • a sequencing device refers to an instrument or platform used to perform a sequencing process based on sequencing by synthesis (SBS) technology, single-molecule real-time sequencing (SMRT) technology using magnetic beads or nanopores or other suitable medium.
  • a sequencing device may comprise components including, but not limited to, a flow cell receptacle, fluidics systems, imaging systems, and/or computational capabilities for acquiring, processing, and analyzing image data (e.g., illumination lasers or SLEDs, focus tracking, emission optics or objective, image sensor or camera) during a sequencing run.
  • FIG. 1 illustrates a schematic diagram of a computing system 100 in which a target-sequence-coverage system 106 operates in accordance with one or more embodiments.
  • the computing system 100 includes a local server device 102 connected to one or more server device(s) 110, a sequencing device 108, and a client device 114 via a network 112. While FIG. 1 shows an embodiment of the target-sequence-coverage system 106, this disclosure describes alternative embodiments and configurations below.
  • the local server device 102, the sequencing device 108, the server device(s) 110, and the client device 114 can communicate with each other via the network 112.
  • the network 112 comprises any suitable network over which computing devices can communicate. Example networks are discussed in additional detail below with respect to FIG. 10.
  • the sequencing device 108 comprises a device for sequencing a genomic sample or other nucleic-acid polymer.
  • the sequencing device 108 implements a sequencing device system 118 that analyzes nucleic-acid segments or oligonucleotides extracted from genomic samples to generate nucleotide reads or other data utilizing computer implemented methods and systems (described herein) either directly or indirectly on the sequencing device 108. More particularly, the sequencing device 108 receives nucleotide-sample substrates (e.g., flow cells) comprising nucleotide fragments extracted from samples and then copies and determines the nucleotide-base sequence of such extracted nucleotide fragments.
  • the sequencing device 108 utilizes SBS to sequence nucleic-acid polymers into nucleotide reads. Additionally, the sequencing device 108 can determine base calls for indexing sequences. In addition, or in the alternative to communicating across the network 112, in some embodiments, the sequencing device 108 bypasses the network 112 and communicates directly with the local server device 102 or the client device 114.
  • the local server device 102 is located at or near the same physical location as the sequencing device 108. Indeed, in some embodiments, the local server device 102 and the sequencing device 108 are integrated into the same computing device, or the local server device 102 is part of the sequencing device 108, as indicated by dotted lines 122.
  • the local server device 102 may run a sequencing system 104 to generate, receive, analyze, store, and transmit digital data, such as by receiving base-call data or determining indexing sequence data or filter metric data based on analyzing such base-call data.
  • the sequencing device 108 may send (and the local server device 102 may receive) base-call data generated during a sequencing run of the sequencing device 108.
  • the local server device 102 may estimate read-coverage levels for genomic samples in a pool of genomic samples.
  • the local server device 102 may also communicate with the client device 114.
  • the local server device 102 can send data to the client device 114, including read-coverage information for genomic samples, filter metric data, estimated read-coverage levels, a variant call file (VCF), or other information indicating nucleobase calls, genotype calls, sequencing metrics, error data, or other metrics.
  • the server device(s) 110 are located remotely from the local server device 102 and the sequencing device 108.
  • the sequencing device 108 may send (and the server device(s) 110 may receive) base-call data from the sequencing device 108.
  • the server device(s) 110 may also communicate with the client device 114.
  • the server device(s) 110 can send data to the client device 114, including estimated read-coverage levels for target genomic regions within genomic samples, VCFs, or other sequencing related information.
  • the server device(s) 110 comprise a distributed collection of servers where the server device(s) 110 include a number of server devices distributed across the network 112 and located in the same or different physical locations. Further, the server device(s) 110 can comprise a content server, an application server, a communication server, a web-hosting server, or another type of server.
  • the client device 114 can generate, store, receive, and send digital data.
  • the client device 114 can receive read-coverage data from the local server device 102 or receive sequencing metrics from the sequencing device 108.
  • the client device 114 may communicate with the local server device 102 or the server device(s) 110 to receive a VCF comprising variant or genotype calls and/or other metrics, such as base-call-quality metrics or pass-filter metrics.
  • the client device 114 can accordingly present or display information pertaining to variant calls or other genotype calls within a graphical user interface to a user associated with the client device 114.
  • the client device 114 can present a target read-coverage interface comprising elements indicating potential target read-coverage levels for genomic samples and/or target genomic sequences of the genomic samples.
  • FIG. 1 depicts the client device 114 as a desktop or laptop computer
  • the client device 114 may comprise various types of client devices.
  • the client device 114 includes non-mobile devices, such as desktop computers or servers, or other types of client devices.
  • the client device 114 includes mobile devices, such as laptops, tablets, mobile telephones, or smartphones. Additional details regarding the client device 114 are discussed below with respect to FIG. 10.
  • the client device 114 includes a sequencing application 116.
  • the sequencing application 116 may be a web application or a native application stored and executed on the client device 114 (e.g., a mobile application, desktop application).
  • the sequencing application 116 can include instructions that (when executed) cause the client device 114 to receive data from the target-sequence-coverage system 106 and present, for display at the client device 114, data concerning read-coverage data for a sequencing run, data from a VCF, or other information. Furthermore, the sequencing application 116 can instruct the client device 114 to display graphical user interfaces for receiving input indicating a target genomic sequence and a target read-coverage level for the target genomic sequence.
  • a version of the target-sequence-coverage system 106 may be located on the client device 114 as part of the sequencing application 116. Accordingly, in some embodiments, the target-sequence-coverage system 106 is implemented by (e.g., located entirely or in part on) the client device 114. In yet other embodiments, the target-sequence-coverage system 106 is implemented by one or more other components of the computing system 100, such as the server device(s) 110. In particular, the target-sequence-coverage system 106 can be implemented in a variety of different ways across the local server device 102, the sequencing device 108, the client device 114, and the server device(s) 110.
  • the target-sequence-coverage system 106 can be downloaded from the server device(s) 110 to the local server device 102 and/or the client device 114 where all or part of the functionality of the target-sequence-coverage system 106 is performed at each respective device within the computing system 100.
  • aspects of the present disclosure relate generally to devices, systems, and methods providing biological or chemical analysis.
  • Various protocols in biological or chemical research involve performing a large number of controlled reactions on local support surfaces or within predefined reaction chambers. The designated reactions may then be observed or detected, and subsequent analysis may help identify or reveal properties of chemicals involved in the reaction.
  • an unknown analyte having an identifiable label e.g., fluorescent label
  • Each known probe may be deposited into a corresponding well of a flow cell channel. Observing any chemical reactions that occur between the known probes and the unknown analyte within the wells may help identify or reveal properties of the analyte.
  • Other examples of such protocols include known DNA sequencing processes, such as sequencing-by-synthesis (SBS) or cyclic-array sequencing.
  • the target-sequence-coverage system 106 can generate an adjusted number of sequencing cycles sufficient to satisfy a target read-coverage level of a target genomic region.
  • FIG. 2 illustrates an overview of the target-sequence-coverage system 106 generating an adjusted number of sequencing cycles and executing a sequencing run based on the adjusted number of sequencing cycles in accordance with one or more implementations of the present disclosure.
  • the target-sequence-coverage system 106 can identify a target genomic region and determine a target read-coverage level for the target genomic region.
  • the target-sequence-coverage system 106 can further provisionally map nucleotide reads to a reference sequence corresponding to the target genomic region and estimate read-coverage levels for the target genomic region.
  • the target-sequence-coverage system 106 can generate an adjusted number of sequencing cycles based on the estimated read-coverage levels and execute a sequencing run based on the adjusted number of sequencing cycles.
  • the target-sequence-coverage system 106 performs an act 202 of receiving data input identifying a target genomic region.
  • the target-sequence-coverage system 106 receives a user indication of a target genomic region.
  • the target genomic region comprises genomic coordinates or a region of one or more genomic samples that is the focus of sequencing.
  • a target genomic region can be identified as part of a target array that focuses on a panel of genes, exons, or other genomic regions of interest.
  • Target arrays can be useful in targeted gene sequencing, exome sequencing, custom panels for disease research, epigenetic studies, and other applications.
  • As shown in FIG. 2, the target-sequence-coverage system 106 receives data input identifying a target genomic region 216 for a genomic sample 214. In some implementations, the target-sequence-coverage system 106 identifies one or more target genomic regions for which the target-sequence-coverage system 106 estimates read-coverage levels and generates an adjusted number of sequencing cycles.
  • FIG. 2 further illustrates the target-sequence-coverage system 106 performing an act 204 of determining a target read-coverage level for the target genomic region. In some embodiments, the target-sequence-coverage system 106 automatically determines a target read-coverage level for the target genomic region across samples.
  • the target-sequence-coverage system 106 may predetermine a target read-coverage level of 20x, 30x, 40x, 50x, 60x, or more.
  • the target-sequence-coverage system 106 determines the target read-coverage level based on receiving data input. For instance, the target-sequence-coverage system 106 can receive an indication of a desired target read-coverage level.
  • the target-sequence-coverage system 106 determines multiple target read-coverage levels.
  • the target-sequence-coverage system 106 may identify or determine different target read-coverage levels for different target genomic regions and/or different target read-coverage levels for different genomic samples.
  • the target-sequence-coverage system 106 may determine higher or lower target read-coverage levels for different genomic samples.
  • the target-sequence-coverage system 106 can perform an act 206 of provisionally mapping nucleotide reads to a reference sequence corresponding to the target genomic region.
  • the target-sequence-coverage system 106 provisionally maps nucleotide reads of the genomic samples to a reference sequence after an initial set of sequencing cycles and during the sequencing run.
  • the initial set of sequencing cycles comprises enough sequencing cycles such that the nucleotide read is long enough for the target-sequence-coverage system 106 to provisionally map the nucleotide read.
  • the target-sequence-coverage system 106 automatically determines the initial set of sequencing cycles.
  • the target-sequence-coverage system 106 can determine that the initial set of sequencing cycles comprises 15, 20, 32, 50, or another number of sequencing cycles. In some embodiments, the target-sequence-coverage system 106 determines the number of sequencing cycles in the initial set of sequencing cycles based on user input. For example, the target-sequence-coverage system 106 can receive a user indication of a desired number of initial sequencing cycles.
  • the target-sequence-coverage system 106 provisionally maps nucleotide reads of the genomic samples to the reference sequence corresponding to the target genomic region. As shown in FIG. 2, the target-sequence-coverage system 106 provisionally maps nucleotide reads 222 to a reference genome 220 after an initial set of sequencing cycles. Based on mapping the nucleotide reads 222 to the reference genome, the target-sequence-coverage system 106 can identify nucleotide reads that are provisionally mapped to a reference sequence corresponding to the target genomic region 218. As shown in FIG. 2, the target-sequence-coverage system 106 maps reads originating from all samples to the reference genome 220.
  • the target-sequence-coverage system 106 maps the nucleotide reads 222 to the reference genome 220 in real time during the sequencing run. Real-time mapping of nucleotide reads before completion of a sequencing run is described by U.S. Pat. No. 11,646,102 B2, the disclosure of which is incorporated herein by reference in its entirety.
  • the target-sequence-coverage system 106 can perform a secondary analysis iteratively while nucleotide reads (also called sequence reads) are generated by the target-sequence-coverage system 106 or other sequencing system.
  • the target-sequence-coverage system 106 can align nucleotide reads to the reference sequence and, based on such alignment, detect differences between the nucleotide reads of a genomic sample and a reference genome. More specifically, the target-sequence-coverage system 106 can receive imaging data for sequencing cycles and determine whether a certain minimum number of sequencing cycles have been performed. Based on determining that the minimum sequencing cycles have been performed, the target-sequence-coverage system 106 can align nucleotide reads to a reference genome.
  • the target-sequence-coverage system 106 can repeat the process of accessing additional imaging data and mapping nucleotide reads from the minimum sequencing cycles to the reference genome.
  • the target-sequence-coverage system 106 can iteratively repeat this process until all sequencing cycles are complete.
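  • The iterative mapping loop described above can be sketched as a schedule of cycles at which a provisional mapping pass runs: the first pass once the minimum number of cycles has completed, then repeated passes until the run ends. The Python below is illustrative only; the function name and the fixed `interval` between passes are assumptions, since the disclosure does not state how often the process repeats.

```python
def mapping_pass_cycles(total_cycles, min_cycles, interval):
    """Return the cycle numbers at which a provisional mapping pass would run.

    The first pass runs once min_cycles have completed; later passes run every
    `interval` cycles until the run ends. The fixed interval is a hypothetical
    choice made for this sketch.
    """
    if min_cycles < 1 or interval < 1:
        raise ValueError("min_cycles and interval must be positive")
    # No pass is scheduled if the run ends before min_cycles are available.
    return list(range(min_cycles, total_cycles + 1, interval))
```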
  • FIG. 4 and the corresponding discussion further detail how the target-sequence-coverage system 106 provisionally maps nucleotide reads to the reference sequence in accordance with one or more embodiments of the present disclosure.
  • the target-sequence-coverage system 106 may also perform the act 208 of estimating, during the sequencing run, read-coverage levels of the target genomic region. As part of performing the act 208, the target-sequence-coverage system 106 identifies nucleotide reads or clusters of oligonucleotides belonging to respective genomic samples. The target-sequence-coverage system 106 can determine an estimated read-coverage level for each genomic sample based on the number of nucleotide reads provisionally mapped to the reference sequence (or “# of target region nucleotide reads”) and a currently selected number of sequencing cycles for the sequencing run.
  • the target-sequence-coverage system 106 determines an estimated read-coverage level for each genomic sample by estimating a number of nucleotide reads of a genomic sample for each of the initial set of sequencing cycles.
  • FIG. 5 and the corresponding paragraph provide additional detail regarding how the target-sequence-coverage system 106 estimates read-coverage levels in accordance with one or more embodiments of the present disclosure.
  • the target-sequence-coverage system 106 performs the act 208 by utilizing the following equation:
  • the # of target region nucleotide reads represents an estimated number of nucleotide reads of a genomic sample based on the nucleotide reads provisionally mapped to the reference sequence
  • the # of initial sequencing cycles represents a number of sequencing cycles in the initial set of sequencing cycles
  • the currently selected # of sequencing cycles represents the currently selected number of sequencing cycles.
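  • The terms defined above suggest a simple proportional estimate: nucleotide reads mapped to the target region per initial sequencing cycle, scaled to the currently selected number of sequencing cycles. The Python below is a minimal sketch under that assumption; the function name, argument names, and the exact arithmetic form are illustrative and are not taken from the disclosure.

```python
def estimate_read_coverage(target_region_reads, initial_cycles, selected_cycles):
    """Estimate a read-coverage level for one genomic sample.

    Assumed form: (target region reads / initial cycles) * selected cycles.
    """
    if initial_cycles <= 0:
        raise ValueError("initial_cycles must be positive")
    reads_per_cycle = target_region_reads / initial_cycles
    return reads_per_cycle * selected_cycles
```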
  • the target-sequence-coverage system 106 performs an act 210 of generating an adjusted number of sequencing cycles sufficient to satisfy the target read-coverage level.
  • the target-sequence-coverage system 106 can generate an adjusted number for each genomic sample of the genomic samples.
  • the target-sequence-coverage system 106 can generate an adjusted number of sequencing cycles.
  • the target-sequence-coverage system 106 can generate an adjusted number of sequencing cycles that is lower than the currently selected number of sequencing cycles or a preset number of sequencing cycles to avoid over-sequencing one or more genomic samples. In another example, the target-sequence-coverage system 106 generates an adjusted number of sequencing cycles that is higher than the currently selected number of sequencing cycles or a preset number of sequencing cycles to satisfy the target read-coverage level.
  • the target-sequence-coverage system 106 performs the act 210 and generates an adjusted number of sequencing cycles by utilizing the following equation:

    $$N_{cyc} \times \frac{\text{\# of target region nucleotide reads}}{\text{\# of initial sequencing cycles}} \geq \text{target read-coverage level}$$

  • N_cyc represents the adjusted number of sequencing cycles
  • # of target region nucleotide reads represents the number of nucleotide reads of a genomic sample provisionally mapped to the reference sequence corresponding to the target genomic region
  • # of initial sequencing cycles represents the number of sequencing cycles in the initial set of sequencing cycles.
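  • Reading the relation above as N_cyc multiplied by the target region reads per initial cycle needing to reach the target read-coverage level, the smallest whole number of cycles that satisfies it can be computed as below. This is a sketch under that reading; the function and argument names are illustrative, and rounding up to a whole cycle is an assumption.

```python
import math


def adjusted_cycles(target_coverage, target_region_reads, initial_cycles):
    """Smallest whole N_cyc with N_cyc * (reads / initial_cycles) >= target_coverage."""
    if target_region_reads <= 0 or initial_cycles <= 0:
        raise ValueError("reads and initial_cycles must be positive")
    # Solve the relation for N_cyc, then round up to a whole sequencing cycle.
    return math.ceil(target_coverage * initial_cycles / target_region_reads)
```

    For example, with a target of 30, 4 target region reads, and 32 initial cycles, the sketch returns 240 cycles.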
  • the target-sequence-coverage system 106 determines a plurality of target read-coverage levels for different target genomic regions and/or different genomic samples. In such instances, the target-sequence-coverage system 106 may generate a plurality of adjusted numbers of sequencing cycles sufficient to satisfy the plurality of target read-coverage levels. In some implementations, the target-sequence-coverage system 106 selects an adjusted number of sequencing cycles from the plurality of adjusted numbers of sequencing cycles. For example, the target-sequence-coverage system 106 may select the highest adjusted number of sequencing cycles from the plurality of adjusted numbers of sequencing cycles. In another example, the target-sequence-coverage system 106 selects the highest adjusted number of sequencing cycles that is below a maximum number of cycles within a sequencing run.
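  • The selection among several adjusted numbers of sequencing cycles can be sketched as follows. The function name and the dictionary keyed by sample or region are illustrative assumptions. The fallback used when every candidate exceeds the maximum (returning the maximum itself) is also an assumption, since the disclosure does not address that case.

```python
def select_run_cycles(adjusted_by_key, max_cycles):
    """Pick the highest adjusted cycle count that does not exceed max_cycles.

    adjusted_by_key: dict mapping a sample or region label to its adjusted
    number of sequencing cycles (hypothetical layout).
    """
    eligible = [n for n in adjusted_by_key.values() if n <= max_cycles]
    if not eligible:
        # Assumption: cap the run at the maximum when no candidate fits.
        return max_cycles
    return max(eligible)
```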
  • the target-sequence-coverage system 106 includes buffer sequencing cycles in the adjusted number of sequencing cycles.
  • the target-sequence-coverage system 106 can determine to perform a predetermined number of buffer sequencing cycles to ensure that the target read-coverage levels will be met.
  • the target-sequence-coverage system 106 can perform a threshold number of buffer sequencing cycles on top of target sequencing cycles predicted to meet the target read-coverage level.
  • the threshold number of buffer sequencing cycles can be a percentage of the target sequencing cycles.
  • the target-sequence-coverage system 106 can determine that the threshold number of buffer sequencing cycles equals 10% of the target sequencing cycles.
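  • A percentage-based buffer such as the 10% example above can be sketched with integer arithmetic. The function name, the default percentage argument, and the choice to round the buffer up to a whole cycle are illustrative assumptions.

```python
def cycles_with_buffer(target_cycles, buffer_percent=10):
    """Add buffer cycles equal to buffer_percent of target_cycles, rounded up."""
    if target_cycles < 0 or buffer_percent < 0:
        raise ValueError("arguments must be non-negative")
    # Integer ceiling division avoids floating-point rounding surprises.
    buffer_cycles = -(-target_cycles * buffer_percent // 100)
    return target_cycles + buffer_cycles
```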
  • the target-sequence-coverage system 106 performs an act 212 of executing the sequencing run.
  • the target-sequence-coverage system 106 executes the sequencing run according to the adjusted number of sequencing cycles.
  • the target-sequence-coverage system 106 executes the sequencing run based on the highest adjusted number of sequencing cycles for a genomic sample.
  • the target-sequence-coverage system 106 can ensure that the genomic sample having the lowest estimated read-coverage level is sequenced to the target read-coverage level.
  • FIG. 6 and the corresponding discussion provide additional details on how the target-sequence-coverage system 106 executes the sequencing run in accordance with one or more implementations of the present disclosure.
  • the target-sequence-coverage system 106 generates and executes an adjusted number of sequencing cycles sufficient to satisfy the target read-coverage level for a given genomic sample. For example, in some implementations, the target-sequence-coverage system 106 performs the act 208 and estimates read-coverage levels of the target genomic region for all of the genomic samples. The target-sequence-coverage system 106 can improve efficiency by generating an adjusted number of sequencing cycles sufficient to satisfy the target read-coverage level for the genomic sample having the lowest estimated read-coverage levels. In one or more additional embodiments, the target-sequence-coverage system 106 generates adjusted numbers of sequencing cycles sufficient to satisfy target read-coverage levels for all genomic samples.
  • the target-sequence-coverage system 106 generates and executes an adjusted number of sequencing cycles sufficient to satisfy the target read-coverage level for a given target genomic region. For example, in some implementations, the target-sequence-coverage system 106 performs the act 208 by estimating read-coverage levels of the target genomic region for all target genomic regions. The target-sequence-coverage system 106 can generate an adjusted number of sequencing cycles sufficient to satisfy the target read-coverage level for the target genomic region having the lowest estimated read-coverage levels. In one or more additional embodiments, the target-sequence-coverage system 106 generates adjusted numbers of sequencing cycles sufficient to satisfy target read-coverage levels for all target genomic regions.
  • the target-sequence-coverage system 106 provisionally re-maps nucleotide reads of the genomic samples to the reference sequence and estimates an updated read-coverage level. By re-mapping the nucleotide reads and estimating an updated read-coverage level, the target-sequence-coverage system 106 can more accurately predict whether the adjusted number of sequencing cycles is sufficient to satisfy the target read-coverage level across genomic samples.
  • FIG. 3 illustrates an overview of the target-sequence-coverage system 106 estimating an updated read-coverage level in accordance with one or more implementations of the present disclosure.
  • the target-sequence-coverage system 106 determines a checkpoint sequencing cycle at which to perform the provisional re-mapping of the nucleotide reads.
  • the target-sequence-coverage system 106 can provisionally re-map nucleotide reads to the reference sequence and estimate an updated read-coverage level based on the provisionally re-mapped nucleotide reads.
  • FIG. 3 illustrates the target-sequence-coverage system 106 performing an act 302 of determining a checkpoint sequencing cycle.
  • the target-sequence-coverage system 106 can determine a checkpoint sequencing cycle within a threshold number of sequencing cycles before a last sequencing cycle of the adjusted number of sequencing cycles. As shown in FIG. 3, for instance, the target-sequence-coverage system 106 can use the following equation to determine the checkpoint sequencing cycle:
  • N_cyc represents the adjusted number of sequencing cycles and “Threshold Number” represents a threshold number of sequencing cycles before a last sequencing cycle of the adjusted number of sequencing cycles.
  • the target-sequence-coverage system 106 automatically determines the threshold number of sequencing cycles before the last sequencing cycle of the adjusted number of sequencing cycles. For example, the target-sequence-coverage system 106 can automatically determine that the threshold number equals 5, 10, 20, 50, etc. sequencing cycles before the last sequencing cycle of the adjusted number of sequencing cycles. In another embodiment, the target-sequence-coverage system 106 determines the threshold number of sequencing cycles based on user input. For instance, the target-sequence-coverage system 106 may receive a user indication of a desired threshold number of sequencing cycles.
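  • One straightforward reading of the checkpoint definition above is the adjusted number of sequencing cycles minus the threshold number. The Python below sketches that reading; the subtraction form, the function name, and the argument names are assumptions rather than the published equation.

```python
def checkpoint_cycle(adjusted_cycles, threshold_number):
    """Return the cycle at which provisional re-mapping would be performed.

    Assumed form: checkpoint = adjusted_cycles - threshold_number.
    """
    if not 0 <= threshold_number < adjusted_cycles:
        raise ValueError("threshold_number must be in [0, adjusted_cycles)")
    return adjusted_cycles - threshold_number
```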
  • the target-sequence-coverage system 106 performs an act 304 of provisionally re-mapping nucleotide reads to the reference sequence. As shown, the target-sequence-coverage system 106 provisionally re-maps, at or before the checkpoint sequencing cycle, nucleotide reads of the genomic samples to the reference sequence corresponding to a target genomic region 310. FIG. 3 illustrates nucleotide reads 312 that have previously been mapped to a reference genome 308. The target-sequence-coverage system 106 maps the nucleotide reads 312 after the set of initial sequencing cycles. Nucleotide reads 314 shown in FIG.
  • the target-sequence-coverage system 106 maps the nucleotide reads 314 to the target genomic region 310.
  • As further shown in FIG. 3, the target-sequence-coverage system 106 performs an act 306 of estimating an updated read-coverage level. The target-sequence-coverage system 106 estimates the updated read-coverage level at or after the checkpoint sequencing cycle. More specifically, the updated read-coverage level indicates a predicted read-coverage level for the target genomic region of the genomic samples after execution of the adjusted number of sequencing cycles.
  • the target-sequence-coverage system 106 estimates, for the target genomic region of the genomic samples, an updated read-coverage level based on the nucleotide reads of the genomic samples provisionally re-mapped to the reference sequence. In some implementations, the target-sequence-coverage system 106 estimates the updated read-coverage level based on the following equation:
  • # of Remapped Nucleotide Reads represents a number of nucleotide reads of the genomic samples provisionally re-mapped to the reference sequence
  • # of Cycles at Checkpoint represents a number of total sequencing cycles from the beginning of the sequencing run to the checkpoint sequencing cycle
  • Adjusted # of Sequencing Cycles represents the number of sequencing cycles within the adjusted number of sequencing cycles.
  • the target-sequence-coverage system 106 further estimates the updated read-coverage level based on a margin number of sequencing cycles represented by N Sequencing Cycles.
  • N Sequencing Cycles comprises a predetermined number of sequencing cycles that accounts for some variability in coverage gained between sequencing cycles.
  • the target-sequence-coverage system 106 may automatically determine the margin number of sequencing cycles N Sequencing Cycles. In some examples, the target-sequence-coverage system 106 receives the margin number of sequencing cycles as input from a client device.
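  • The updated estimate can be sketched by projecting the reads re-mapped by the checkpoint forward to the end of the adjusted run. The Python below assumes a proportional form and assumes that the margin N Sequencing Cycles is subtracted from the adjusted number of cycles to make the projection conservative. Both the form and the way the margin enters are assumptions, as are all names used.

```python
def updated_read_coverage(remapped_reads, cycles_at_checkpoint,
                          adjusted_cycles, margin_cycles=0):
    """Project read coverage at the end of the adjusted run.

    Assumed form:
        remapped_reads * (adjusted_cycles - margin_cycles) / cycles_at_checkpoint
    """
    if cycles_at_checkpoint <= 0:
        raise ValueError("cycles_at_checkpoint must be positive")
    if not 0 <= margin_cycles <= adjusted_cycles:
        raise ValueError("margin_cycles must be in [0, adjusted_cycles]")
    effective_cycles = adjusted_cycles - margin_cycles
    return remapped_reads * effective_cycles / cycles_at_checkpoint
```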
  • the target-sequence-coverage system 106 can compare the updated read-coverage levels for the genomic samples to the target read-coverage level.
  • the target-sequence-coverage system 106 can perform various actions based on comparing the updated read-coverage levels and the target read-coverage level. For example, in some implementations, the target-sequence-coverage system 106 generates an updated adjusted number of sequencing cycles and executes the sequencing run according to the updated adjusted number of sequencing cycles. In particular, the target-sequence-coverage system 106 may perform sequencing cycles until finishing a safety threshold number of sequencing cycles comprising the updated adjusted number of sequencing cycles and a buffer number of sequencing cycles based on the updated read-coverage level.
  • the target-sequence-coverage system 106 can determine that the adjusted number of sequencing cycles is sufficient to meet the target read-coverage level and execute the sequencing run according to the adjusted number of sequencing cycles.
  • FIG. 6 and the corresponding paragraphs detail various actions that the target-sequence-coverage system 106 can perform based on comparing the updated read-coverage levels with the target read-coverage level.
  • the target-sequence-coverage system 106 provisionally maps nucleotide reads of genomic samples to a reference sequence.
  • FIG. 4 illustrates the target-sequence-coverage system 106 provisionally mapping nucleotide reads 408a - 408e to a reference sequence in accordance with one or more embodiments of the present disclosure.
  • FIG. 4 illustrates a reference genome 402 comprising a target genomic region 404 and adjacent genomic regions 406a - 406b.
  • FIG. 4 illustrates how the target-sequence-coverage system 106 can identify nucleotide reads that are provisionally mapped to the target genomic region 404 or near the target genomic region.
  • the target-sequence-coverage system 106 can identify read mates of paired-end reads that are provisionally mapped to the reference genome 402.
  • FIG. 4 illustrates the target-sequence-coverage system 106 mapping various paired-end nucleotide reads to the reference genome 402.
  • nucleotide reads 408a - 408b represent a pair of paired-end reads comprising a first read mate and a second read mate
  • nucleotide reads 408c - 408d represent a different pair of paired-end reads comprising a first read mate and a second read mate that have been provisionally mapped to the reference genome 402.
  • the target-sequence-coverage system 106 utilizes sequencing methods that index and begin determining base calls for both read mates early in a sequencing cycle. Accordingly, the target-sequence-coverage system 106 can access and provisionally map both read mates in a paired-end read to the reference genome 402 after the initial set of sequencing cycles. For instance, the target-sequence-coverage system 106 can access and provisionally map both read mates of the nucleotide reads 408a - 408b and nucleotide reads 408c - 408d after an initial set of sequencing cycles.
  • the target-sequence-coverage system 106 can provisionally map a single read to the target genomic region 404.
  • the target-sequence-coverage system 106 may map a single nucleotide read resulting from single-end sequencing or a first read mate from paired-end sequencing.
  • some sequencing systems generate base calls for a second read only after determining nucleotide reads for a first read mate.
  • sequencing systems index a first read mate before indexing a second read mate — accordingly, the target-sequence-coverage system 106 cannot access information identifying a second read mate from paired-end sequencing.
  • nucleotide reads 408e - 408f can comprise nucleotide reads from single-end sequencing or a first read mate (e.g., R1) of paired-end nucleotide reads.
  • the target-sequence-coverage system 106 can identify nucleotide reads that are provisionally mapped to the target genomic region 404 within the reference genome 402. More specifically, the target-sequence-coverage system 106 may determine locations of nucleotide reads of the genomic samples within the reference sequence corresponding to the target genomic region. For instance, the target-sequence-coverage system 106 provisionally maps the nucleotide reads 408a - 408b to the target genomic region 404. Additionally, the target-sequence-coverage system 106 can map a first read mate from paired-end nucleotide reads to the target genomic region 404.
  • the target-sequence-coverage system 106 can map the nucleotide read 408d to the target genomic region 404 while mapping its corresponding second read mate, the nucleotide read 408c, to a segment of the reference genome 402 that does not overlap with the target genomic region 404.
  • the target-sequence-coverage system 106 identifies nucleotide reads that are mapped near the target genomic region that will likely contribute to the coverage of the target genomic region 404.
  • the target-sequence-coverage system 106 provisionally maps nucleotide reads to an adjacent genomic region within a threshold number of nucleobases of the target genomic region.
  • FIG. 4 illustrates adjacent genomic regions 406a - 406b that surround the target genomic region 404.
  • the target-sequence-coverage system 106 automatically determines a threshold number of nucleobases for the adjacent genomic regions.
  • the target-sequence-coverage system 106 can automatically determine that the threshold number of nucleobases equals 25, 50, 100, etc. nucleobases. In some implementations, the target-sequence-coverage system 106 determines the threshold number of nucleobases for the adjacent genomic regions based on user input. In other implementations, the target-sequence-coverage system 106 determines the threshold number of nucleobases based on the number of sequencing cycles in the initial set of sequencing cycles. A higher number of initial sequencing cycles may indicate that nucleotide reads are nearer to sequencing completion and, thus, close to their final read length. Accordingly, the target-sequence-coverage system 106 can determine lower threshold numbers of nucleobases for higher numbers of initial sequencing cycles.
  • nucleotide read 408e is provisionally mapped to the adjacent genomic region 406b.
  • Nucleotide reads that provisionally map to the adjacent genomic regions 406a - 406b may provide additional coverage for the target genomic region.
  • the target-sequence-coverage system 106 can estimate read-coverage levels of a target genomic region based on the mapped locations and read-growth directions of the nucleotide reads of the genomic samples. Generally, the target-sequence-coverage system 106 can evaluate the location of nucleotide reads within or near the target genomic region 404 and whether the read direction is toward (or away from) the center of the target genomic region 404 to determine if those nucleotide reads will contribute to coverage of the target genomic region 404. The target-sequence-coverage system 106 determines that nucleotide reads that are in closer proximity to the target genomic region 404 are more likely to affect the coverage of the target genomic region 404. In some embodiments, and as illustrated in FIG. 4, the target-sequence-coverage system 106 determines read-growth directions of nucleotide reads as part of provisionally mapping nucleotide reads to the target genomic region 404.
  • the target-sequence-coverage system 106 can determine read-growth directions of nucleotide reads growing upstream or downstream with respect to the target genomic region.
  • the left side of the reference genome 402 can represent the 5’ end, and the right side of the reference genome 402 can represent the 3’ end.
  • the target-sequence-coverage system 106 determines that the read-growth directions of one or more of the nucleotide reads 408b, 408d, and 408e are upstream based on the nucleotide reads growing toward the 5’ end.
  • the target-sequence-coverage system 106 determines that the nucleotide read 408e is located downstream relative to the target genomic region 404.
  • the target-sequence-coverage system 106 can determine that the read-growth direction of the nucleotide read 408e is upstream and pointed toward the center of the target genomic region 404. In contrast, the target-sequence-coverage system 106 determines that while the nucleotide read 408f is in a similar location as the nucleotide read 408e, the read-growth direction of the nucleotide read 408f is opposite to the center of the target genomic region 404. Accordingly, the target-sequence-coverage system 106 may determine that the nucleotide read 408f is unlikely to influence coverage of the target genomic region 404.
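The location and read-growth-direction test described above can be sketched as follows. This is a minimal illustration only: the `ProvisionalRead` type, the half-open coordinate convention, and the rule that a non-overlapping read must lie within the threshold distance and grow toward the target are assumptions chosen to mirror the 408e and 408f example, with the 5’ end on the left and the 3’ end on the right.

```python
from dataclasses import dataclass


@dataclass
class ProvisionalRead:
    """Hypothetical record for a provisionally mapped read."""
    start: int                 # leftmost mapped coordinate (inclusive)
    end: int                   # rightmost mapped coordinate (exclusive)
    grows_toward_3prime: bool  # True if the read extends rightward (downstream)


def may_contribute(read, target_start, target_end, threshold):
    """Return True if the read overlaps the target or can grow into it."""
    # Case 1: the read already overlaps the target region.
    if read.start < target_end and read.end > target_start:
        return True
    # Case 2: the read lies on the 5' side; it must grow downstream.
    if read.end <= target_start:
        return read.grows_toward_3prime and (target_start - read.end) <= threshold
    # Case 3: the read lies on the 3' side; it must grow upstream.
    return (not read.grows_toward_3prime) and (read.start - target_end) <= threshold


# Reads placed like 408e and 408f: same location, opposite growth directions.
read_408e = ProvisionalRead(start=1010, end=1035, grows_toward_3prime=False)
read_408f = ProvisionalRead(start=1010, end=1035, grows_toward_3prime=True)
```

With a target spanning coordinates 500 to 1000 and a threshold of 50 nucleobases, the read placed like 408e qualifies and the read placed like 408f does not.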
  • the target-sequence-coverage system 106 estimates read-coverage levels of the target genomic region within one or more genomic samples.
  • FIG. 5 illustrates the target-sequence-coverage system 106 identifying clusters of oligonucleotides provisionally mapped to a reference sequence corresponding with a target genomic region and identifying genomic samples to which the nucleotide sequence reads belong.
  • the target-sequence-coverage system 106 determines an estimated total number of clusters belonging to each genomic sample. As illustrated in FIG. 5, the target-sequence-coverage system 106 performs an act 501 of determining filtered clusters corresponding to the target genomic region. The target-sequence-coverage system 106 can identify clusters of oligonucleotides satisfying a filtering threshold for signals of the subset of clusters of oligonucleotides. More specifically, the target-sequence-coverage system 106 determines quality metrics 520 for the subset of clusters of oligonucleotides.
  • the target-sequence-coverage system 106 captures images of clusters of oligonucleotides within a flow cell region and evaluates the signals emitted from the clusters of oligonucleotides to determine the quality metrics 520.
  • the target-sequence-coverage system 106 can determine quality metrics 520 for clusters of oligonucleotides up to an evaluation cycle of the initial set of sequencing cycles. For example, the target-sequence-coverage system 106 can determine the quality metrics 520 for each sequencing cycle up to the 25th cycle comprising the evaluation cycle of the sequencing run.
  • the quality metrics 520 comprise a chastity value.
  • the term “chastity value” refers to a quality measurement used to assess a confidence or purity of a called nucleobase from a sequencing cycle.
  • the chastity value is a measure of the confidence of the called base at each position within a nucleotide read.
  • the chastity value may be calculated based on the intensity of the fluorescent signals emitted from the clusters of oligonucleotides.
  • the target-sequence-coverage system 106 measures the intensity of each of the four nucleotide-specific fluorescent signals (e.g., within two channels).
  • the target-sequence-coverage system 106 may determine the chastity value by determining a ratio of the brightest base intensity divided by the sum of the brightest and second brightest base intensities. In some examples, the target-sequence-coverage system 106 can report the chastity value as a percent value ranging from 0%-100%.
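The chastity ratio described above can be illustrated with a short sketch. The function name and the sample intensities are hypothetical; only the ratio itself (brightest intensity divided by the sum of the brightest and second-brightest intensities) comes from the description.

```python
def chastity_value(intensities):
    """Brightest intensity over the sum of the two brightest intensities."""
    if len(intensities) < 2:
        raise ValueError("need at least two base intensities")
    brightest, second = sorted(intensities, reverse=True)[:2]
    total = brightest + second
    if total <= 0:
        raise ValueError("intensities must sum to a positive value")
    return brightest / total


# Hypothetical per-base intensities for one cluster in one cycle.
value = chastity_value([800.0, 200.0, 50.0, 10.0])
as_percent = 100.0 * value
```

For a non-negative set of intensities the ratio lies between 0.5 (two equally bright bases) and 1.0 (a single clean signal), and multiplying by 100 gives the percent form mentioned above.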
  • the quality metrics 520 comprise a base-call-quality score.
  • base-call-quality score refers to a specific score or other measurement indicating an accuracy of a nucleobase call.
  • a base-call-quality score comprises a value indicating a likelihood that one or more predicted nucleobase calls for a genomic coordinate contain errors.
  • a base-call-quality score can comprise a Q score (e.g., a Phil’s Read Editor (PhRED) quality score) predicting the error probability of any given nucleobase call.
  • a base-call-quality score may indicate that a probability of an incorrect nucleobase call at a genomic coordinate is equal to 1 in 100 for a Q20 score, 1 in 1,000 for a Q30 score, 1 in 10,000 for a Q40 score, etc.
  • the base-call-quality score is generated by a machine-learning model or an algorithm, either of which can be scaled to be consistent with a PhRED scale.
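The relationship between a Q score and its error probability follows the standard PhRED definition, Q = -10 log10(P). The sketch below shows the conversion in both directions; the function names are illustrative only.

```python
import math


def phred_to_error_probability(q_score):
    """Convert a PhRED-scaled quality score to an error probability."""
    return 10 ** (-q_score / 10)


def error_probability_to_phred(error_probability):
    """Convert an error probability (0 < p <= 1) to a PhRED-scaled score."""
    if not 0 < error_probability <= 1:
        raise ValueError("error probability must be in (0, 1]")
    return -10 * math.log10(error_probability)


q30_error = phred_to_error_probability(30)
```

The same logarithmic scaling is what allows a score produced by a machine-learning model or another algorithm to be placed on a PhRED-consistent scale.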
  • the quality metrics 520 comprise mapping-quality scores and are determined either during provisional mapping or later mapping of nucleotide reads to a reference sequence or reference genome.
  • mapping-quality score refers to a metric or other measurement quantifying a quality or certainty of an alignment of nucleotide reads (or other nucleotide sequences or subsequences) with a reference sequence or reference genome.
  • a mapping-quality score includes mapping quality (MAPQ) scores for nucleobase calls at genomic coordinates, where a MAPQ score represents $-10 \log_{10} \Pr\{\text{mapping position is wrong}\}$, rounded to the nearest integer.
  • a mapping-quality score includes a full distribution of mapping qualities for all nucleotide reads aligning with a reference genome at a genomic coordinate.
  • the target-sequence-coverage system 106 may generate other types of quality metrics 520 for identifying the filtered clusters. For example, in some implementations, the target-sequence-coverage system 106 evaluates signal-to-noise ratio (SNR) for clusters of oligonucleotides.
  • the target-sequence-coverage system 106 can generate a pass filter map 522.
  • the pass filter map 522 is limited to clusters of oligonucleotides corresponding to one or more target genomic regions.
  • the target-sequence-coverage system 106 aggregates the quality metrics 520 for the initial set of sequencing cycles to generate the pass filter map 522 for the subset of clusters of oligonucleotides identified in the genomic sample map 518.
  • the pass filter map 522 provides information about the outcome of quality filtering applied to the subset of clusters of oligonucleotides for the initial set of sequencing cycles.
  • the pass filter map 522 indicates a percentage of clusters at a location that satisfy a filtering threshold over sequencing cycles up to, and in some instances including, the evaluation cycle. For example, the target-sequence-coverage system 106 determines a percent of filter-passing clusters for each cluster of the subset of clusters of oligonucleotides. To illustrate, the target-sequence-coverage system 106 determines that, across the sequencing cycles up to the evaluation cycle, 75% of the oligonucleotide cluster 512a comprises filter-passing clusters. Additionally, the target-sequence-coverage system 106 determines that, up to the evaluation cycle, 93% of the oligonucleotide cluster 512b comprises filter-passing clusters.
  • the target-sequence-coverage system 106 may utilize one or more filtering thresholds based on the generated base-call-quality metrics. For example, the target-sequence-coverage system 106 may utilize a chastity filtering threshold, a quality score filtering threshold, and/or a mapping-quality filtering threshold to identify clusters of oligonucleotides that satisfy the filtering threshold.
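One way to aggregate per-cycle quality metrics into a pass-filter percentage is sketched below. This is an interpretation rather than the disclosed method: it assumes a single chastity threshold, treats the percentage as the share of cycles up to the evaluation cycle in which a cluster meets that threshold, and uses a hypothetical threshold of 0.6 and hypothetical chastity values.

```python
def build_pass_filter_map(chastity_by_cluster, threshold=0.6, evaluation_cycle=25):
    """Map each cluster id to the percent of evaluated cycles passing the filter.

    ASSUMPTIONS: one chastity threshold (0.6 is hypothetical) and a
    per-cycle pass/fail aggregation over cycles up to the evaluation cycle.
    """
    pass_filter_map = {}
    for cluster_id, per_cycle_values in chastity_by_cluster.items():
        window = per_cycle_values[:evaluation_cycle]
        if not window:
            continue  # no cycles observed yet for this cluster
        passing = sum(1 for value in window if value >= threshold)
        pass_filter_map[cluster_id] = 100.0 * passing / len(window)
    return pass_filter_map


# Hypothetical chastity values for two clusters over four cycles.
example_map = build_pass_filter_map({
    "512a": [0.9, 0.9, 0.9, 0.5],
    "512b": [0.7, 0.7, 0.7, 0.7],
})
```

Additional thresholds, such as a quality-score or mapping-quality threshold, could be combined with the chastity test in the same loop.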
  • the target-sequence-coverage system 106 performs an act 502 of determining a subset of clusters of oligonucleotides producing nucleotide reads provisionally mapped to the reference sequence. More specifically, the target-sequence-coverage system 106 can disregard clusters that did not pass filter and evaluate passing filter oligonucleotide clusters. For instance, the target-sequence-coverage system 106 determines, from filter-passing clusters, a subset of clusters of oligonucleotides provisionally mapped to the reference sequence.
  • the target-sequence-coverage system 106 provisionally maps nucleotide reads to the reference genome.
  • the target-sequence-coverage system 106 captures images of a flow cell comprising clusters of oligonucleotides.
  • the target-sequence-coverage system 106 can estimate clusters of oligonucleotides that originate the nucleotide reads provisionally mapped to the reference sequence corresponding to the target genomic region.
  • the target-sequence-coverage system 106 determines that clusters of oligonucleotides 512a-512b within a flow cell region 510 produce nucleotide reads provisionally mapped to the reference sequence.
  • the target-sequence-coverage system 106 improves efficiency of analysis by limiting its analysis to the subset of clusters of oligonucleotides provisionally mapped to a reference sequence (or mapped within a threshold number of nucleobases within the reference sequence) and ignoring or discarding the other clusters of oligonucleotides not provisionally mapped to the reference sequence (or mapped within the threshold number of nucleobases within the reference sequence).
  • the target-sequence-coverage system 106 may perform an act 504 of determining respective numbers of clusters of oligonucleotides belonging to respective genomic samples. More specifically, the target-sequence-coverage system 106 determines numbers of clusters of oligonucleotides belonging to each genomic sample for the initial set of sequencing cycles. The target-sequence-coverage system 106 determines, based on indexing sequences within the subset of clusters of oligonucleotides, respective numbers of clusters of oligonucleotides belonging to respective genomic samples.
  • the target-sequence-coverage system 106 compares index sequences of nucleotide reads in clusters of oligonucleotides to a reference (or database) of known indexes to determine the genomic sample origin of each nucleotide read.
  • the target-sequence-coverage system 106 may sort clusters of oligonucleotides based on their originating samples. For instance, and as shown in FIG. 5, the target-sequence-coverage system 106 accesses raw sequencing data comprising indexing sequences 514 associated with a sample genomic sequence 524 and indexing sequences 516 associated with a sample genomic sequence 526.
  • the indexing sequences 514-516 can comprise “barcodes” that act as unique identifiers for each genomic sample, allowing for differentiation and sorting of the nucleotide reads during demultiplexing.
  • the indexing sequences 514 indicate that the sample genomic sequence 524 comes from genomic sample I.
  • the indexing sequences 516 indicate that the sample genomic sequence 526 originates from genomic sample II.
  • the target-sequence-coverage system 106 demultiplexes nucleotide reads by utilizing a reference of known indexes.
  • FIG. 5 illustrates a reference of registered indexes 528.
  • the target-sequence-coverage system 106 compares indexing sequences with known indexing sequences in the reference of registered indexes 528.
  • the reference of registered indexes 528 associates each index barcode or sequence with its respective genomic sample.
  • the reference of registered indexes 528 stores indexing sequences with their corresponding genomic samples. As shown, genomic samples may correspond with one or more unique barcodes.
  • the target-sequence-coverage system 106 generates and stores a genomic sample map indicating the locations of clusters corresponding with each genomic sample. More particularly, the target-sequence-coverage system 106 generates a genomic sample map 518 that indicates locations of clusters corresponding to each of the genomic samples. As mentioned, the target-sequence-coverage system 106 improves efficiency by analyzing clusters of oligonucleotides producing nucleotide reads provisionally mapped to the reference sequence. Accordingly, the target-sequence-coverage system 106 analyzes indexing sequences for the oligonucleotide clusters 512a-512b to identify their originating genomic samples.
  • the target-sequence-coverage system 106 determines that the oligonucleotide cluster 512a originates from genomic sample I and the oligonucleotide cluster 512b originates from genomic sample II. Based on this analysis, the target-sequence-coverage system 106 can determine numbers of clusters of oligonucleotides belonging to each genomic sample of the genomic samples.
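The index lookup and per-sample counting described above can be sketched as a dictionary lookup. The barcode strings, cluster identifiers, and function name below are hypothetical, and the sketch uses exact matching only, which is a simplification; the description does not state how mismatched index reads are handled.

```python
from collections import Counter


def count_clusters_per_sample(cluster_indexes, registered_indexes):
    """Count clusters per genomic sample by exact index-sequence lookup.

    cluster_indexes:    cluster id -> observed indexing sequence
    registered_indexes: known indexing sequence -> genomic sample label
    """
    counts = Counter()
    for _cluster_id, index_sequence in cluster_indexes.items():
        sample = registered_indexes.get(index_sequence)
        if sample is not None:  # unrecognized indexes are left unassigned
            counts[sample] += 1
    return dict(counts)


# Hypothetical reference of registered indexes; sample II has two barcodes.
registered = {"ACGTAC": "I", "TTGCAA": "II", "GGATCC": "II"}
observed = {"512a": "ACGTAC", "512b": "TTGCAA", "512c": "GGATCC", "512d": "NNNNNN"}
per_sample = count_clusters_per_sample(observed, registered)
```

Mapping several barcodes to one sample reflects the note above that a genomic sample may correspond with one or more unique barcodes.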
  • the target-sequence-coverage system 106 determines base calls for indexing sequences before determining base calls for nucleotide reads as described by U.S. Patent Application No. 63/517,160, entitled “Modifying Sequencing Cycles or Imaging During a Sequencing Run to Meet Customized Coverage Estimation,” filed August 2, 2023, for Alexander Fuhrmann et al. and assigned to Illumina, Inc., the disclosure of which is incorporated herein by reference in its entirety.
  • the target-sequence-coverage system 106 performs an act 508 of determining an estimated total number of clusters.
  • the target-sequence-coverage system 106 uses the observed respective numbers of clusters of oligonucleotides for the initial set of sequencing cycles to infer an estimated total number of clusters for each genomic sample for the entire sequencing run.
  • the target-sequence-coverage system 106 generates the estimated total number of filtered clusters corresponding to one or more target genomic samples based on the currently selected number of sequencing cycles for the sequencing run and the numbers of clusters of oligonucleotides belonging to each genomic sample of the genomic samples.
  • the target-sequence-coverage system 106 estimates a total number of clusters for a genomic sample for a sequencing run using the following equation:
  • $$\text{Estimated total \# of clusters} = \frac{\text{\# of clusters for a sample} \times \text{currently selected \# of sequencing cycles}}{\text{\# of initial sequencing cycles}}$$
  • where # of clusters for a sample represents the number of clusters belonging to a genomic sample determined as part of the act 504.
  • # of initial sequencing cycles represents the number of sequencing cycles in the initial set of sequencing cycles, and currently selected # of sequencing cycles represents the currently selected number of sequencing cycles.
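Reading the term definitions above as a proportional scaling of the observed per-sample cluster count by the ratio of currently selected cycles to initial cycles, the estimate can be sketched as follows. The function name and the sample values are hypothetical, and the proportional form is an interpretation of the defined terms.

```python
def estimate_total_clusters(clusters_for_sample, initial_cycles, selected_cycles):
    """Scale the observed cluster count to the currently selected run length.

    ASSUMPTION: proportional scaling inferred from the term definitions.
    """
    if initial_cycles <= 0:
        raise ValueError("initial_cycles must be positive")
    return clusters_for_sample * selected_cycles / initial_cycles


# Hypothetical values: 1,200 clusters after 25 initial cycles, 150 cycles selected.
estimated = estimate_total_clusters(1200, initial_cycles=25, selected_cycles=150)
```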
  • FIG. 6 illustrates an example decision flowchart by which the target-sequence-coverage system 106 executes a sequencing run according to an adjusted number of sequencing cycles in accordance with one or more embodiments of the present disclosure.
  • the target-sequence-coverage system 106 can adjust a number of sequencing cycles to (i) satisfy a single target read-coverage level for a single target genomic region or (ii) satisfy multiple target read-coverage levels for multiple target genomic regions, respectively.
  • this disclosure focuses its description of FIG.
  • target-sequence-coverage system 106 can adjust the following operations to account for nucleotide reads mapping to multiple target genomic regions and/or multiple corresponding target read-coverage levels.
  • the target-sequence-coverage system 106 may begin at the start 602 by provisionally mapping nucleotide reads to the reference sequence and estimating read-coverage levels of the target genomic region. As further illustrated in FIG. 6, the target-sequence-coverage system 106 can perform an act 604 of generating an adjusted number of sequencing cycles.
  • the target-sequence-coverage system 106 can perform an evaluation 606 of the adjusted number of sequencing cycles by determining whether the adjusted number of sequencing cycles is higher than the currently selected number of sequencing cycles. More particularly, the target-sequence-coverage system 106 evaluates the highest adjusted number of sequencing cycles across genomic samples.
  • the target-sequence-coverage system 106 can perform an act 608 of performing the adjusted number of sequencing cycles or the currently selected number of sequencing cycles. For example, the target-sequence-coverage system 106 determines that both the adjusted number of sequencing cycles and the currently selected number of sequencing cycles will likely result in sufficient coverage of the target genomic region across all genomic samples. The target-sequence-coverage system 106 may elect to perform the adjusted number of sequencing cycles to further reduce the amount of consumable resources required to execute additional sequencing cycles. Accordingly, the target-sequence-coverage system 106 can more efficiently utilize consumables by reducing the number of fluidic reagent cycles and reagents required to meet the target coverage level relative to some existing sequencing systems.
  • the target-sequence-coverage system 106 can perform an evaluation 610 of whether the adjusted number of sequencing cycles is above a maximum number of sequencing cycles within a run. If the adjusted number of sequencing cycles is above a maximum number of cycles within a run, the target-sequence-coverage system 106 may determine that an additional sequencing run is required to meet the target read-coverage level for the target genomic region. Accordingly, the target-sequence-coverage system 106 may perform an act 612 of stopping sequencing or completing the maximum number of sequencing cycles. The target-sequence-coverage system 106 may determine to terminate sequencing after the initial set of sequencing cycles to conserve sequencing resources. More specifically, the target-sequence-coverage system 106 can cause the fluidics systems described below with respect to FIGS. 7-8 to terminate the sequencing run.
  • the target-sequence-coverage system 106 may perform the act 612 of stopping sequencing or completing the maximum number of sequencing cycles based on an assessment of reagent volume.
  • the target-sequence-coverage system 106 may determine that a sequencing device includes an insufficient amount of reagent(s) to complete an adjusted number of sequencing cycles.
  • the target-sequence-coverage system 106 evaluates a known amount of reagent in the reagent source or a detected amount of reagents in a sequencing device based on a detection sensor to determine whether the target-sequence-coverage system 106 may successfully complete the adjusted number of sequencing cycles with the known amount or detected amount of reagent.
  • the target-sequence-coverage system 106 performs an additional act of providing, to a client device (e.g., the client device 114), a notification that existing levels of reagent are insufficient to complete the adjusted number of sequencing cycles such that an automated system or an operator can provide additional reagent.
  • the target-sequence-coverage system 106 may then continue to perform the adjusted number of sequencing cycles.
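The reagent assessment can be illustrated with a simple volume check. The disclosure does not specify how reagent consumption is modeled, so the sketch assumes a fixed volume consumed per sequencing cycle; the function name, the microliter units, and the numeric values are hypothetical.

```python
def reagent_is_sufficient(available_volume_ul, volume_per_cycle_ul, remaining_cycles):
    """Return True if the available reagent covers the remaining cycles.

    ASSUMPTION: every cycle consumes the same fixed reagent volume.
    """
    if volume_per_cycle_ul <= 0:
        raise ValueError("volume_per_cycle_ul must be positive")
    return remaining_cycles * volume_per_cycle_ul <= available_volume_ul


# Hypothetical values: 5,000 uL available and 40 uL consumed per cycle.
can_finish = reagent_is_sufficient(5000.0, 40.0, remaining_cycles=100)
needs_refill = not reagent_is_sufficient(5000.0, 40.0, remaining_cycles=130)
```

A negative result is the condition under which a notification could be provided to the client device so that additional reagent can be supplied.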
  • the target-sequence-coverage system 106 may generate and execute an adjusted number of sequencing cycles sufficient to satisfy target read-coverage levels for the maximum number of target genomic regions and genomic samples. For example, the target-sequence-coverage system 106 may determine that an adjusted number of sequencing cycles to satisfy target read-coverage levels for all target genomic regions and genomic samples exceeds the maximum number of sequencing cycles within a run. In some examples, the target-sequence-coverage system 106 may determine an adjusted number of sequencing cycles that falls within a maximum number of cycles within a sequencing run.
  • the target-sequence-coverage system 106 may further determine deficient target genomic regions and/or deficient genomic samples that are unlikely to meet their target read-coverage levels.
  • the target-sequence-coverage system 106 may provide, to a client device (e.g., the client device 114), a notification indicating the deficient target genomic regions and/or deficient genomic samples that are unlikely to meet their target read-coverage levels.
  • the target-sequence-coverage system 106 may receive, from the client device, an indication to stop sequencing or to complete the maximum number of cycles.
  • the target-sequence-coverage system 106 can determine to complete the maximum number of cycles in the sequencing run as part of the act 612.
  • the target-sequence-coverage system 106 can estimate maximum read-coverage levels of the target genomic region based on the maximum number of sequencing cycles in a sequencing run. Based on determining that the maximum estimated read-coverage levels are within a threshold range of the target read-coverage level, the target-sequence-coverage system 106 can complete the maximum number of sequencing cycles.
  • the target-sequence-coverage system 106 can perform an act 614 of continuing the adjusted number of sequencing cycles. Generally, the target-sequence-coverage system 106 continues to perform sequencing cycles after the initial set of sequencing cycles. In some implementations, the target-sequence-coverage system 106 performs the adjusted number of sequencing cycles to completion of the sequencing run.
  • FIG. 6 illustrates the target-sequence-coverage system 106 performing an act 616 of estimating an updated read-coverage level at a checkpoint sequencing cycle.
  • the target-sequence-coverage system 106 performs a provisional re-mapping of the nucleotide reads at a checkpoint sequencing cycle.
  • the act 616 is an optional act and can be performed iteratively. For instance, the target-sequence-coverage system 106 can continue to estimate updated read-coverage levels at different checkpoint sequencing cycles to ensure that the target read-coverage level will be met.
  • the target-sequence-coverage system 106 can further perform an evaluation 618 of whether additional sequencing cycles are required to meet the target read-coverage level based on the updated read-coverage level. If additional sequencing cycles are required to meet the target read-coverage level, the target-sequence-coverage system 106 can perform an act 620 of generating and performing an updated adjusted number of sequencing cycles. In some embodiments, the target-sequence-coverage system 106 evaluates the updated adjusted number of sequencing cycles in a similar way to how it evaluates the adjusted number of sequencing cycles. For instance, the target-sequence-coverage system 106 can determine whether the updated adjusted number of sequencing cycles is above a maximum number of cycles within a run. Based on this determination, the target-sequence-coverage system 106 can elect to terminate the sequencing run or complete the maximum number of sequencing cycles.
  • the target-sequence-coverage system 106 can perform the act 608 of performing the adjusted number of sequencing cycles.
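The first two evaluations of the FIG. 6 flow can be summarized in a short sketch. The branch order is inferred from the description of evaluations 606 and 610 and acts 608, 612, and 614; the enum, the function name, and the cycle counts are hypothetical, and the checkpoint loop of acts 616 through 620 is omitted.

```python
from enum import Enum


class Decision(Enum):
    PERFORM_ADJUSTED_OR_SELECTED = "act 608"
    STOP_OR_COMPLETE_MAXIMUM = "act 612"
    CONTINUE_ADJUSTED = "act 614"


def decide(adjusted_cycles_by_sample, selected_cycles, max_cycles):
    """Choose the next act from the highest adjusted cycle count across samples."""
    # Raises ValueError if no samples are supplied.
    highest_adjusted = max(adjusted_cycles_by_sample.values())
    # Evaluation 606: is the adjusted number higher than the selected number?
    if highest_adjusted <= selected_cycles:
        return Decision.PERFORM_ADJUSTED_OR_SELECTED
    # Evaluation 610: does the adjusted number exceed the per-run maximum?
    if highest_adjusted > max_cycles:
        return Decision.STOP_OR_COMPLETE_MAXIMUM
    return Decision.CONTINUE_ADJUSTED


# Hypothetical adjusted cycle counts for two genomic samples.
decision = decide({"I": 200, "II": 180}, selected_cycles=150, max_cycles=300)
```

Using the highest adjusted number across samples ensures that the sample needing the most cycles drives the decision.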
  • the target-sequence-coverage system 106 can terminate a sequencing run or adjust a number of sequencing cycles to meet a target read-coverage level.
  • the target-sequence-coverage system 106 may execute the determined number of sequencing cycles by utilizing fluidics systems.
  • FIG. 7 illustrates a schematic diagram of an example of a system (700) that may be used to perform an analysis on one or more samples of interest.
  • the sample may include one or more clusters of nucleotides (e.g., DNA) that have been linearized to form a single stranded DNA (sstDNA).
  • system (700) is configured to receive a flow cell cartridge assembly (702) including a flow cell assembly (703) and a sample cartridge (704).
  • System (700) includes a flow cell receptacle (722) that receives flow cell cartridge assembly (702), a vacuum chuck (724) that supports flow cell assembly (703), and a flow cell interface (726) that is used to establish a fluidic coupling between system (700) and flow cell assembly (703).
  • Flow cell interface (726) may include one or more manifolds.
  • System (700) further includes a sipper manifold assembly (706), a sample loading manifold assembly (708), and a pump manifold assembly (710).
  • System (700) also includes a drive assembly (712), a controller (714), an imaging system (716), and a waste reservoir (718).
  • Controller (714) is electrically and/or communicatively coupled to drive assembly (712) and to imaging system (716); and is configured to cause drive assembly (712) and/or the imaging system (716) to perform various functions as disclosed herein.
  • flow cell assembly (703) includes a flow cell (728) having a channel (730) and defining a plurality of first openings (732), which are fluidically coupled to the channel (730) and arranged on a first side (734) of the channel (730).
  • Flow cell (728) further includes a plurality of second openings (736) fluidically coupled to the channel (730) and arranged on a second side (738) of the channel (730). Fluid may thus flow through flow cell (728) via the channel (730). While the flow cell (728) is shown including one channel (730), flow cell (728) may include two or more channels (730).
  • Flow cell assembly (703) also includes a flow cell manifold assembly (740) coupled to flow cell (728) and having a first manifold fluidic line (742) and a second manifold fluidic line (744).
  • Flow cell manifold assembly (740) may be in the form of a laminate including a plurality of layers as discussed in more detail below.
  • first manifold fluidic line (742) has a first fluidic line opening (746) and is fluidically coupled to each of the plurality of first openings (732) of flow cell (728); and second manifold fluidic line (744) has a second fluidic line opening (748) and is fluidically coupled to each of the second openings (736).
  • flow cell assembly (703) includes gaskets (750) coupled to flow cell manifold assembly (740) and fluidically coupled to fluidic line openings (746, 748).
  • flow cell manifold assembly (740) may include additional fluidic lines (752) that couple first fluidic line openings (746) to a single manifold port (754).
  • a single gasket (750) may be coupled to flow cell manifold assembly (740) that surrounds the manifold port (754) and is in fluidic communication with a plurality of channels (730).
  • flow cell interface (726) engages with corresponding gaskets (750) to establish a fluidic coupling between system (700) and flow cell (728).
  • the engagement between flow cell interface (726) and gaskets (750) reduces or eliminates fluid leakage between flow cell interface (726) and flow cell (728).
  • first manifold fluidic line (742) has a portion (756) that is substantially parallel to a longitudinal axis (758) of channel (730); and second manifold fluidic line (744) has a portion (760) that is substantially parallel to longitudinal axis (758) of channel (730). Additionally, first manifold fluidic line (742) is shown being at least partially adjacent a first end (762) of flow cell (728) and spaced from a second end (764) of flow cell (728); and second manifold fluidic line (744) is shown being at least partially adjacent second end (764) of flow cell (728) and spaced from first end (762). Other arrangements of manifold fluidic lines (742, 744) may prove suitable, however.
  • system (700) includes a sample cartridge receptacle (766) that receives sample cartridge (704) that carries one or more samples of interest (e.g., an analyte).
  • System (700) also includes a sample cartridge interface (768) that establishes a fluidic connection with sample cartridge (704).
  • Sample loading manifold assembly (708) includes one or more sample valves (770).
  • Pump manifold assembly (710) includes one or more pumps (772), one or more pump valves (774), and a cache (776). Valves (770, 774) and pumps (772) may take any suitable form.
  • Cache (776) may include a serpentine cache and may temporarily store one or more reaction components during, for example, bypass manipulations of the system (700).
  • While cache (776) is shown being included in pump manifold assembly (710), cache (776) may alternatively be located elsewhere (e.g., in sipper manifold assembly (706) or in another manifold downstream of a bypass fluidic line (778), etc.).
  • Sample loading manifold assembly (708) and pump manifold assembly (710) flow one or more samples of interest from sample cartridge (704) through a fluidic line (780) toward flow cell cartridge assembly (702).
  • sample loading manifold assembly (708) may individually load or address each channel (730) of flow cell (728) with a respective sample of interest. The process of loading channel (730) with a sample of interest may occur automatically using system (700).
  • sample cartridge (704) and sample loading manifold assembly (708) are positioned downstream of flow cell cartridge assembly (702).
  • sample loading manifold assembly (708) is coupled between flow cell cartridge assembly (702) and pump manifold assembly (710).
  • sample valves (770), pump valves (774), and/or pumps (772) may be selectively actuated to urge the sample of interest toward pump manifold assembly (710).
  • Sample cartridge (704) may include a plurality of sample reservoirs that are selectively fluidically accessible via the corresponding sample valves (770).
  • sample valves (770), pump valves (774), and/or pumps (772) may be selectively actuated to urge the sample of interest toward flow cell cartridge assembly (702) and into respective channels (730) of flow cell (728).
  • Drive assembly (712) interfaces with sipper manifold assembly (706) and pump manifold assembly (710) to flow one or more reagents that interact with the sample within flow cell (728).
  • a reversible terminator is attached to the reagent to allow a single nucleotide to be incorporated onto a growing DNA strand.
  • one or more of the nucleotides has a unique fluorescent label that emits a color when excited. The color (or absence thereof) is used to detect the corresponding nucleotide.
  • imaging system (716) excites one or more of the identifiable labels (e.g., a fluorescent label) and thereafter obtains image data for the identifiable labels.
  • the labels may be excited by incident light and/or a laser and the image data may include one or more colors emitted by the respective labels in response to the excitation.
  • the image data (e.g., detection data) may be analyzed by system (700). Examples of features and functionalities that may be incorporated into imaging system (716) will be described in greater detail below.
  • drive assembly (712) interfaces with sipper manifold assembly (706) and pump manifold assembly (710) to flow another reaction component (e.g., a reagent) through flow cell (728) that is thereafter received by waste reservoir (718) via a primary waste fluidic line (782) and/or otherwise exhausted by system (700).
  • reaction components may perform a flushing operation that chemically cleaves the fluorescent label and the reversible terminator from the sstDNA. The sstDNA may then be ready for another cycle.
  • the primary waste fluidic line (782) is coupled between pump manifold assembly (710) and waste reservoir (718).
  • pumps (772) and/or pump valves (774) of pump manifold assembly (710) selectively flow the reaction components from flow cell cartridge assembly (702), through fluidic line (780) and sample loading manifold assembly (708) to primary waste fluidic line (782).
  • Flow cell cartridge assembly (702) is coupled to a central valve (784) via flow cell interface (726).
  • Central valve (784) is coupled with flow cell interface (726) via a fluidic line (785).
  • An auxiliary waste fluidic line (786) is coupled to central valve (784) and to waste reservoir (718).
  • auxiliary waste fluidic line (786) receives excess fluid of a sample of interest from flow cell cartridge assembly (702), via central valve (784), and flows the excess fluid of the sample of interest to waste reservoir (718) when back loading the sample of interest into flow cell (728), as described herein.
  • Sipper manifold assembly (706) includes a shared line valve (788) and a bypass valve (790). Shared line valve (788) may be referred to as a reagent selector valve. Central valve (784) and the valves (788, 790) of sipper manifold assembly (706) may be selectively actuated to control the flow of fluid through fluidic lines (792, 794, 796). Sipper manifold assembly (706) may be coupled to a corresponding number of reagent reservoirs (798) via reagent sippers (800). Reagent reservoirs (798) may contain fluid (e.g., reagent and/or another reaction component). In some implementations, sipper manifold assembly (706) includes a plurality of ports.
  • Each port of sipper manifold assembly (706) may receive one of the reagent sippers (800).
  • Reagent sippers (800) may be referred to as fluidic lines.
  • Some forms of reagent sippers (800) may include an array of sipper tubes extending downwardly along the z-dimension from ports in the body of sipper manifold assembly (706).
  • Reagent reservoirs (798) may be provided in a cartridge, and the tubes of reagent sippers (800) may be configured to be inserted into corresponding reagent reservoirs (798) in the reagent cartridge so that liquid reagent may be drawn from each reagent reservoir (798) into the sipper manifold assembly (706).
  • Shared line valve (788) of sipper manifold assembly (706) is coupled to central valve (784) via shared reagent fluidic line (796). Different reagents may flow through shared reagent fluidic line (796) at different times.
  • pump manifold assembly (710) may draw wash buffer through shared reagent fluidic line (796), central valve (784), and flow cell cartridge assembly (702).
  • Bypass valve (790) of sipper manifold assembly (706) is coupled to central valve (784) via dedicated reagent fluidic lines (794, 796).
  • Each of the dedicated reagent fluidic lines (794, 796) may be associated with a single reagent.
  • the fluids that may flow through dedicated reagent fluidic lines (794, 796) may be used during sequencing operations and may include a cleave reagent, an incorporation reagent, a scan reagent, a cleave wash, and/or a wash buffer.
  • Bypass valve (790) is also coupled to cache (776) of pump manifold assembly (710) via bypass fluidic line (778).
  • One or more reagent priming operations, hydration operations, mixing operations, and/or transfer operations may be performed using bypass fluidic line (778).
  • the priming operations, the hydration operations, the mixing operations, and/or the transfer operations may be performed independent of flow cell cartridge assembly (702).
  • the operations using bypass fluidic line (778) may occur during, for example, incubation of one or more samples of interest within flow cell cartridge assembly (702).
  • Drive assembly (712) includes a pump drive assembly (802) and a valve drive assembly (804).
  • Pump drive assembly (802) may be adapted to interface with one or more pumps (772) to pump fluid through flow cell (728) and/or to load one or more samples of interest into flow cell (728).
  • Valve drive assembly (804) may be adapted to interface with one or more of the valves (770, 774, 784, 788, 790) to control the position of the corresponding valves (770, 774, 784, 788, 790).
  • FIG. 8 shows an example of a fluidic arrangement (820) that may be incorporated into a variation of system (700).
  • Fluidic arrangement (820) of this example includes a pump manifold assembly (822), which may operate similar to pump manifold assembly (710) described above; a sample loading manifold assembly (828), which may operate similar to sample loading manifold assembly (708) described above; a flow cell interface (840), which may operate similar to flow cell interface (726) described above; a sipper manifold assembly (850), which may operate similar to sipper manifold assembly (706) described above; and a waste reservoir (870), which may operate similar to waste reservoir (718) described above.
  • Pump manifold assembly (822) is coupled with a port assembly (858) of sipper manifold assembly (850) via a fluidic line (824), which may be similar to fluidic line (778); and with sample loading manifold assembly (828) via a fluidic line (826).
  • Sample loading manifold assembly (828) is coupled with flow cell interface (840) via fluidic line (830), which may be similar to fluidic line (780); and with port assembly (858) via fluidic lines (832, 834).
  • Flow cell interface (840) is coupled with sipper manifold assembly (850) via fluidic line (842), which may be similar to fluidic line (785).
  • Sipper manifold assembly (850) includes a manifold body (852) and a common output port (856), which provides fluid communication via fluidic line
  • a valve assembly (854) controls fluid flow through common output port (856) and may operate similar to central valve (784).
  • Port assembly (858) of sipper manifold assembly (850) is coupled with waste reservoir (870) via fluidic line (872), which may be similar to fluidic line
  • a plurality of reagent sippers (860) extend from manifold body (852) and are fluidically coupled with valve assembly (854) via respective fluid channels (862) in manifold body (852).
  • Reagent sippers (860) may operate similar to reagent sippers (800).
  • Valve assembly (854) is operable to selectively couple fluid channels (862) with flow cell interface (840) via common output port (856) and fluidic line (830), to thereby selectively provide various reagents to flow cell interface (840).
  • controller (714) of the present example includes a user interface (806), a communication interface (808), one or more processors (810), and a memory (812) storing instructions executable by the one or more processors (810) to perform various functions including the disclosed implementations.
  • User interface (806), communication interface (808), and memory (812) are electrically and/or communicatively coupled to the one or more processors (810).
  • User interface (806) may be adapted to receive input from a user and to provide information to the user associated with the operation of system (700) and/or an analysis taking place.
  • User interface (806) may include a touch screen, a display, a keyboard, a speaker(s), a mouse, a track ball, and/or a voice recognition system.
  • Communication interface (808) is adapted to enable communication between system (700) and a remote system(s) (e.g., computers) via a network(s) (e.g., the Internet, an intranet, a local-area network (LAN), a wide-area network (WAN), a coaxial-cable network, a wireless network, a wired network, a satellite network, a digital subscriber line (DSL) network, a cellular network, a Bluetooth connection, a near field communication (NFC) connection, etc.).
  • Some of the communications provided to the remote system may be associated with analysis results, imaging data, etc.
  • the one or more processors (810) and/or system (700) may include one or more of a processor-based system(s) or a microprocessor-based system(s).
  • the one or more processors (810) and/or system (700) includes one or more of a programmable processor, a programmable controller, a microprocessor, a microcontroller, a graphics processing unit (GPU), a digital signal processor (DSP), a reduced-instruction set computer (RISC), an application specific integrated circuit (ASIC), a field programmable gate array (FPGA), a field programmable logic device (FPLD), a logic circuit, and/or another logic-based device executing various functions including the ones described herein.
  • Memory (812) may include one or more of a semiconductor memory, a magnetically readable memory, an optical memory, a hard disk drive (HDD), an optical storage drive, a solid-state storage device, a solid-state drive (SSD), a flash memory, a read-only memory (ROM), erasable programmable read-only memory (EPROM), electrically erasable programmable read-only memory (EEPROM), a random-access memory (RAM), a nonvolatile RAM (NVRAM) memory, a compact disc (CD), a compact disc read-only memory (CD-ROM), a digital versatile disk (DVD), a Blu-ray disk, a redundant array of independent disks (RAID) system, a cache and/or any other storage device or storage disk in which information is stored for any duration (e.g., permanently, temporarily, for extended periods of time, for buffering, for caching).
  • FIGS. 1-8, the corresponding text, and the examples provide a number of different methods, systems, devices, and non-transitory computer-readable media of the target-sequence-coverage system 106.
  • FIG. 9 illustrates a flowchart of a series of acts 900 for executing a sequencing run according to an adjusted number of sequencing cycles in accordance with one or more embodiments of the present disclosure. While FIG. 9 illustrates acts according to one embodiment, alternative embodiments may omit, add to, reorder, and/or modify any of the acts shown in FIG. 9. The acts of FIG.
  • a non-transitory computer readable storage medium can comprise instructions that, when executed by one or more processors, cause a computing device or a system to perform the acts depicted in FIG. 9.
  • a system comprising an imaging system, a fluidic system, and a computer comprising: at least one processor; and a non-transitory computer readable medium comprising instructions that, when executed by one or more processors, cause the system to perform the acts of FIG. 9.
  • the series of acts 900 includes an act 902 of receiving data input identifying a target genomic region, an act 904 of provisionally mapping nucleotide reads to a reference sequence corresponding to the target genomic region, an act 906 of estimating read-coverage levels of the target genomic region, an act 908 of generating an adjusted number of sequencing cycles, and an act 910 of executing the sequencing run according to the adjusted number of sequencing cycles.
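The estimating and adjusting acts (906, 908) can be illustrated with a minimal sketch. It is not the disclosed implementation: it assumes each provisionally mapped read grows by one base per sequencing cycle in a known direction, uses mean depth over the target region as the read-coverage level, and searches for the smallest cycle count that reaches the target. The names `ProvisionalRead`, `estimate_mean_coverage`, and `adjusted_cycle_count` are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ProvisionalRead:
    # Hypothetical record for a provisionally mapped read.
    start: int      # reference coordinate of the first sequenced base
    direction: int  # +1 if the read grows downstream, -1 if upstream


def covered_bases(read: ProvisionalRead, cycles: int,
                  region_start: int, region_end: int) -> int:
    """Bases of the inclusive region [region_start, region_end] that the
    read would cover after `cycles` cycles (assumes one base per cycle)."""
    if read.direction > 0:
        lo, hi = read.start, read.start + cycles - 1
    else:
        lo, hi = read.start - cycles + 1, read.start
    return max(0, min(hi, region_end) - max(lo, region_start) + 1)


def estimate_mean_coverage(reads: List[ProvisionalRead], cycles: int,
                           region_start: int, region_end: int) -> float:
    """Mean depth over the target region at a given cycle count."""
    length = region_end - region_start + 1
    total = sum(covered_bases(r, cycles, region_start, region_end)
                for r in reads)
    return total / length


def adjusted_cycle_count(reads: List[ProvisionalRead], target_coverage: float,
                         region_start: int, region_end: int,
                         max_cycles: int) -> Optional[int]:
    """Smallest cycle count whose estimated coverage meets the target,
    or None if the target is not reachable within `max_cycles`."""
    for cycles in range(1, max_cycles + 1):
        if estimate_mean_coverage(reads, cycles,
                                  region_start, region_end) >= target_coverage:
            return cycles
    return None


reads = [ProvisionalRead(start=90, direction=+1),
         ProvisionalRead(start=119, direction=-1)]
print(adjusted_cycle_count(reads, 2.0, 100, 109, max_cycles=50))
```

A linear search is used here only for readability; because the estimated coverage never decreases as cycles are added in this model, a binary search would work equally well.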
  • the series of acts 900 can include acts to perform any of the operations described in the following clauses:
  • CLAUSE 1 A method comprising: receiving, for a sequencing run, data input identifying a target genomic region for one or more genomic samples and a target read-coverage level for the target genomic region; provisionally mapping, after an initial set of sequencing cycles and during the sequencing run, nucleotide reads of the one or more genomic samples to a reference sequence corresponding to the target genomic region; estimating, during the sequencing run, read-coverage levels of the target genomic region within the one or more genomic samples based on the nucleotide reads of the one or more genomic samples provisionally mapped to the reference sequence and a currently selected number of sequencing cycles for the sequencing run; generating, for the sequencing run and based on the estimated read-coverage levels, an adjusted number of sequencing cycles sufficient to satisfy the target read-coverage level within the target genomic region for one or more genomic samples of the one or more genomic samples; and executing the sequencing run according to the adjusted number of sequencing cycles.
  • CLAUSE 2 The method of clause 1, further comprising: determining a checkpoint sequencing cycle within a threshold number of sequencing cycles before a last sequencing cycle of the adjusted number of sequencing cycles; provisionally re-mapping, at the checkpoint sequencing cycle, nucleotide reads of the one or more genomic samples to the reference sequence corresponding to the target genomic region; and estimating, for the target genomic region of the one or more genomic samples and at or after the checkpoint sequencing cycle, an updated read-coverage level based on the nucleotide reads of the one or more genomic samples provisionally re-mapped to the reference sequence.
  • CLAUSE 3 The method of clause 1, further comprising provisionally mapping the nucleotide reads of the one or more genomic samples to the reference sequence by: provisionally mapping one or more nucleotide reads of the one or more genomic samples to the target genomic region within a reference genome; or provisionally mapping one or more nucleotide reads of the one or more genomic samples to an adjacent genomic region within a threshold number of nucleobases of the target genomic region within the reference genome.
  • CLAUSE 4 The method of clause 1, further comprising terminating sequencing cycles of the sequencing run after the adjusted number of sequencing cycles finish.
  • CLAUSE 5 The method of clause 1, further comprising: determining locations of the nucleotide reads of the one or more genomic samples within the reference sequence corresponding to the target genomic region; determining read-growth directions of the nucleotide reads growing upstream or downstream with respect to the target genomic region; and estimating the read-coverage levels of the target genomic region within the one or more genomic samples based on the locations and read-growth directions of the nucleotide reads of the one or more genomic samples provisionally mapped to the reference sequence.
  • CLAUSE 6 The method of clause 1, further comprising estimating the read-coverage levels of the target genomic region within the one or more genomic samples by: determining, from a set of clusters of oligonucleotides for the sequencing run, a subset of clusters of oligonucleotides producing nucleotide reads provisionally mapped to the reference sequence; determining, based on indexing sequences within the subset of clusters of oligonucleotides, respective numbers of clusters of oligonucleotides belonging to respective genomic samples of the one or more genomic samples; and generating, for each genomic sample of the one or more genomic samples, an estimated total number of clusters of oligonucleotides based on the respective numbers of clusters of oligonucleotides and the currently selected number of sequencing cycles for the sequencing run.
  • CLAUSE 7 The method of clause 1, further comprising: determining, based on an estimated read-coverage level for a genomic sample of the one or more genomic samples at the target genomic region, the genomic sample is unlikely to satisfy the target read-coverage level within a threshold number of sequencing cycles of the adjusted number of sequencing cycles; and based on the genomic sample being unlikely to satisfy the target read-coverage level within the threshold number of sequencing cycles, terminating the sequencing run after the adjusted number of sequencing cycles.
  • CLAUSE 8 The method of clause 1, further comprising: determining, based on an estimated read-coverage level for a genomic sample of the one or more genomic samples at the target genomic region, the genomic sample is likely to satisfy the target read-coverage level within a threshold number of sequencing cycles after the adjusted number of sequencing cycles; and based on the genomic sample being likely to satisfy the target read-coverage level within the threshold number of sequencing cycles, continuing the sequencing run until the threshold number of sequencing cycles after the adjusted number of sequencing cycles.
  • CLAUSE 9 The method of clause 1, further comprising performing sequencing cycles of the sequencing run by: determining, for clusters of oligonucleotides immobilized on a nucleotide-sample substrate, base calls as part of paired-end nucleotide reads comprising first read mates and second read mates; or determining, for clusters of oligonucleotides immobilized on the nucleotide-sample substrate, base calls as part of single-end nucleotide reads.
  • CLAUSE 10 The method of clause 1, further comprising: provisionally re-mapping, during a subset of sequencing cycles for first read mates and second read mates of paired-end nucleotide reads, a subset of paired-end nucleotide reads of the one or more genomic samples to the reference sequence corresponding to the target genomic region; and estimating, for the target genomic region of the one or more genomic samples and during the subset of sequencing cycles for the first read mates and the second read mates, updated read-coverage levels based on the subset of paired-end nucleotide reads provisionally re-mapped to the reference sequence.
  • CLAUSE 11 The method of clause 10, further comprising: estimating the read-coverage levels during the sequencing run in part by estimating the read-coverage levels for the target genomic region based on first read mates of the subset of paired-end nucleotide reads provisionally mapped to the reference sequence; provisionally re-mapping the subset of paired-end nucleotide reads in part by remapping, during a first subset of sequencing cycles for the first read mates of the paired-end nucleotide reads, a first subset of paired-end nucleotide reads of the one or more genomic samples to the reference sequence; estimating the updated read-coverage levels in part by estimating, for the target genomic region of the one or more genomic samples, a first updated read-coverage level before a last sequencing cycle of the first subset of sequencing cycles for the first read mates; and performing the first subset of sequencing cycles for the first read mates until finishing a safety threshold number of sequencing cycles based on the first updated read-cover
  • CLAUSE 12 The method of clause 11, further comprising: provisionally re-mapping the subset of paired-end nucleotide reads in part by remapping, during a second subset of sequencing cycles for the second read mates of the paired-end nucleotide reads, a second subset of paired-end nucleotide reads of the one or more genomic samples to the reference sequence; estimating the updated read-coverage levels in part by estimating, for the target genomic region of the one or more genomic samples, a second updated read-coverage level before a last sequencing cycle of the second subset of sequencing cycles for the second read mates; and generating, for the sequencing run and based on the second updated read-coverage level, an updated adjusted number of sequencing cycles sufficient to satisfy the target read-coverage level within the target genomic region for the one or more genomic samples of the one or more genomic samples.
  • CLAUSE 13 The method of clause 1, further comprising: provisionally mapping, after the initial set of sequencing cycles and during the sequencing run, additional nucleotide reads of the one or more genomic samples to an additional reference sequence corresponding to an additional target genomic region; estimating, during the sequencing run, additional read-coverage levels of the additional target genomic region within the one or more genomic samples based on the additional nucleotide reads of the one or more genomic samples provisionally mapped to the additional reference sequence and the currently selected number of sequencing cycles for the sequencing run; and generating, for the sequencing run and based on the estimated read-coverage levels and the additional read-coverage levels, the adjusted number of sequencing cycles sufficient to satisfy the target read-coverage level within the target genomic region and the additional target genomic region for one or more genomic samples of the one or more genomic samples.
  • CLAUSE 14 The method of clause 1, further comprising: receiving, for the sequencing run, data input identifying the target read-coverage level for a first genomic sample of the one or more genomic samples and an additional target read-coverage level for the target genomic region of a second genomic sample of the one or more genomic samples; generating, for the sequencing run and based on the estimated read-coverage levels, an adjusted number of sequencing cycles sufficient to satisfy the target read-coverage level within the target genomic region for the first genomic sample and the additional target read-coverage level within the target genomic region for the second genomic sample; and executing the sequencing run according to the adjusted number of sequencing cycles.
  • CLAUSE 15 The method of clause 1, further comprising performing sequencing cycles of the sequencing run according to an order of indexing cycles before genomic sequencing cycles by: determining base calls for a first indexing sequence appended to a sample genomic sequence of a genomic sample; determining base calls for a second indexing sequence appended to the sample genomic sequence of the genomic sample; and after determining the base calls for the first indexing sequence and the second indexing sequence, determining base calls for a first nucleotide read corresponding to a first portion of the sample genomic sequence and determining base calls for a second nucleotide read corresponding to a second portion of the sample genomic sequence.
  • CLAUSE 16 The method of clause 1, further comprising generating the adjusted number of sequencing cycles for the sequencing run by increasing or decreasing a preset number of sequencing cycles for the sequencing run.
  • CLAUSE 17 The method of clause 1, further comprising detecting a reagent volume of a reagent cartridge in fluid communication with the fluidic system and operating the fluidic system to perform one or more additional sequencing cycles relative to the currently selected number of sequencing cycles until finishing the adjusted number of sequencing cycles by aspirating one or more reagents from the reagent cartridge.
  • CLAUSE 18 The method of clause 1, further comprising terminating operation of the fluidic system from performing one or more sequencing cycles of the currently selected number of sequencing cycles to finish the sequencing run after performing the adjusted number of sequencing cycles.
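The continue-or-terminate decision described in clauses 7 and 8 can be sketched as follows. This is an illustrative sketch under stated assumptions, not the claimed method: it assumes that a per-sample estimate of the cycles needed to reach the target read-coverage level is already available (with `None` meaning the target is not expected to be reached), and the function name `final_cycle_count` and its parameters are hypothetical.

```python
from typing import Dict, Optional


def final_cycle_count(required_cycles: Dict[str, Optional[int]],
                      adjusted_cycles: int,
                      threshold: int) -> int:
    """Decide the last sequencing cycle of the run.

    required_cycles maps a sample name to the estimated number of cycles
    that sample needs to reach its target coverage (None if unreachable).
    If any sample is expected to reach the target within `threshold`
    cycles after `adjusted_cycles`, the run continues until
    adjusted_cycles + threshold (as in clause 8). Otherwise the run ends
    after `adjusted_cycles` (as in clause 7).
    """
    limit = adjusted_cycles + threshold
    for needed in required_cycles.values():
        if needed is not None and adjusted_cycles < needed <= limit:
            return limit
    return adjusted_cycles


# Sample B needs 155 cycles, which is within 10 cycles of the adjusted 150.
print(final_cycle_count({"A": 140, "B": 155}, adjusted_cycles=150, threshold=10))
```

Samples that already meet the target at the adjusted cycle count, and samples that would need more than the threshold allows, do not extend the run in this sketch.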
  • The methods described herein can be used in conjunction with a variety of nucleic acid sequencing techniques. Particularly applicable techniques are those wherein nucleic acids are attached at fixed locations in an array such that their relative positions do not change and wherein the array is repeatedly imaged. Embodiments in which images are obtained in different color channels, for example, coinciding with different labels used to distinguish one nucleobase type from another are particularly applicable.
  • the process to determine the nucleotide sequence of a target nucleic acid i.e., a nucleic-acid polymer
  • Preferred embodiments include sequencing-by-synthesis (SBS) techniques.
  • SBS techniques generally involve the enzymatic extension of a nascent nucleic acid strand through the iterative addition of nucleotides against a template strand.
  • a single nucleotide monomer may be provided to a target nucleotide in the presence of a polymerase in each delivery.
  • more than one type of nucleotide monomer can be provided to a target nucleic acid in the presence of a polymerase in a delivery.
  • SBS can utilize nucleotide monomers that have a terminator moiety or those that lack any terminator moieties.
  • Methods utilizing nucleotide monomers lacking terminators include, for example, pyrosequencing and sequencing using γ-phosphate-labeled nucleotides, as set forth in further detail below.
  • the number of nucleotides added in each cycle is generally variable and dependent upon the template sequence and the mode of nucleotide delivery.
  • the terminator can be effectively irreversible under the sequencing conditions used as is the case for traditional Sanger sequencing which utilizes dideoxynucleotides, or the terminator can be reversible as is the case for sequencing methods developed by Solexa (now Illumina, Inc.).
  • SBS techniques can utilize nucleotide monomers that have a label moiety or those that lack a label moiety. Accordingly, incorporation events can be detected based on a characteristic of the label, such as fluorescence of the label; a characteristic of the nucleotide monomer such as molecular weight or charge; a byproduct of incorporation of the nucleotide, such as release of pyrophosphate; or the like.
  • the different nucleotides can be distinguishable from each other, or alternatively, the two or more different labels can be indistinguishable under the detection techniques being used.
  • the different nucleotides present in a sequencing reagent can have different labels and they can be distinguished using appropriate optics as exemplified by
  • Preferred embodiments include pyrosequencing techniques. Pyrosequencing detects the release of inorganic pyrophosphate (PPi) as particular nucleotides are incorporated into the nascent strand (Ronaghi, M., Karamohamed, S., Pettersson, B., Uhlen, M. and Nyren, P. (1996) "Real-time DNA sequencing using detection of pyrophosphate release.” Analytical Biochemistry 242(1), 84-9; Ronaghi, M. (2001) "Pyrosequencing sheds light on DNA sequencing.” Genome Res. 11(1), 3-11; Ronaghi, M., Uhlen, M. and Nyren, P.
  • the nucleic acids to be sequenced can be attached to features in an array and the array can be imaged to capture the chemiluminescent signals that are produced due to incorporation of a nucleotide at the features of the array.
  • An image can be obtained after the array is treated with a particular nucleotide type (e.g., A, T, C or G). Images obtained after addition of each nucleotide type will differ with regard to which features in the array are detected. These differences in the image reflect the different sequence content of the features on the array. However, the relative locations of each feature will remain unchanged in the images.
  • the images can be stored, processed and analyzed using the methods set forth herein. For example, images obtained after treatment of the array with each different nucleotide type can be handled in the same way as exemplified herein for images obtained from different detection channels for reversible terminator-based sequencing methods.
  • cycle sequencing is accomplished by stepwise addition of reversible terminator nucleotides containing, for example, a cleavable or photobleachable dye label as described, for example, in WO 04/018497 and U.S. Pat. No. 7,057,026, the disclosures of which are incorporated herein by reference.
  • This approach is being commercialized by Solexa (now Illumina Inc.), and is also described in WO 91/06678 and WO 07/123,744, each of which is incorporated herein by reference.
  • the availability of fluorescently-labeled terminators in which both the termination can be reversed and the fluorescent label cleaved facilitates efficient cyclic reversible termination (CRT) sequencing.
  • Polymerases can also be co-engineered to efficiently incorporate and extend from these modified nucleotides.
  • the labels do not substantially inhibit extension under SBS reaction conditions.
  • the detection labels can be removable, for example, by cleavage or degradation. Images can be captured following incorporation of labels into arrayed nucleic acid features.
  • each cycle involves simultaneous delivery of four different nucleotide types to the array and each nucleotide type has a spectrally distinct label. Four images can then be obtained, each using a detection channel that is selective for one of the four different labels. Alternatively, different nucleotide types can be added sequentially and an image of the array can be obtained between each addition step.
  • each image will show nucleic acid features that have incorporated nucleotides of a particular type. Different features are present or absent in the different images due to the different sequence content of each feature. However, the relative position of the features will remain unchanged in the images. Images obtained from such reversible terminator-SBS methods can be stored, processed and analyzed as set forth herein. Following the image capture step, labels can be removed and reversible terminator moieties can be removed for subsequent cycles of nucleotide addition and detection. Removal of the labels after they have been detected in a particular cycle and prior to a subsequent cycle can provide the advantage of reducing background signal and crosstalk between cycles. Examples of useful labels and removal methods are set forth below.
  • nucleotide monomers can include reversible terminators.
  • reversible terminators/cleavable fluors can include fluor linked to the ribose moiety via a 3' ester linkage (Metzker, Genome Res. 15:1767-1776 (2005), which is incorporated herein by reference).
  • Other approaches have separated the terminator chemistry from the cleavage of the fluorescence label (Ruparel et al., Proc Natl Acad Sci USA 102: 5932-7 (2005), which is incorporated herein by reference in its entirety).
  • Ruparel et al described the development of reversible terminators that used a small 3' allyl group to block extension, but could easily be deblocked by a short treatment with a palladium catalyst.
  • the fluorophore was attached to the base via a photocleavable linker that could easily be cleaved by a 30 second exposure to long wavelength UV light.
  • disulfide reduction or photocleavage can be used as a cleavable linker.
  • Another approach to reversible termination is the use of natural termination that ensues after placement of a bulky dye on a dNTP.
  • the presence of a charged bulky dye on the dNTP can act as an effective terminator through steric and/or electrostatic hindrance.
  • Some embodiments can utilize detection of four different nucleotides using fewer than four different labels.
  • SBS can be performed utilizing methods and systems described in the incorporated materials of U.S. Patent Application Publication No. 2013/0079232.
  • a pair of nucleotide types can be detected at the same wavelength, but distinguished based on a difference in intensity for one member of the pair compared to the other, or based on a change to one member of the pair (e.g., via chemical modification, photochemical modification or physical modification) that causes apparent signal to appear or disappear compared to the signal detected for the other member of the pair.
  • nucleotide types can be detected under particular conditions while a fourth nucleotide type lacks a label that is detectable under those conditions, or is minimally detected under those conditions (e.g., minimal detection due to background fluorescence, etc.). Incorporation of the first three nucleotide types into a nucleic acid can be determined based on presence of their respective signals and incorporation of the fourth nucleotide type into the nucleic acid can be determined based on absence or minimal detection of any signal.
  • one nucleotide type can include label(s) that are detected in two different channels, whereas other nucleotide types are detected in no more than one of the channels.
  • An exemplary embodiment that combines all three examples is a fluorescent-based SBS method that uses a first nucleotide type that is detected in a first channel (e.g., dATP having a label that is detected in the first channel when excited by a first excitation wavelength), a second nucleotide type that is detected in a second channel (e.g., dCTP having a label that is detected in the second channel when excited by a second excitation wavelength), a third nucleotide type that is detected in both the first and the second channel (e.g., dTTP having at least one label that is detected in both channels when excited by the first and/or second excitation wavelength) and a fourth nucleotide type that lacks a label and is not, or is only minimally, detected in either channel (e.g., dGTP having no label).
  • sequencing data can be obtained using a single channel.
  • the first nucleotide type is labeled but the label is removed after the first image is generated, and the second nucleotide type is labeled only after a first image is generated.
  • the third nucleotide type retains its label in both the first and second images, and the fourth nucleotide type remains unlabeled in both images.
  • Some embodiments can utilize sequencing by ligation techniques. Such techniques utilize DNA ligase to incorporate oligonucleotides and identify the incorporation of such oligonucleotides.
  • the oligonucleotides typically have different labels that are correlated with the identity of a particular nucleotide in a sequence to which the oligonucleotides hybridize.
  • images can be obtained following treatment of an array of nucleic acid features with the labeled sequencing reagents. Each image will show nucleic acid features that have incorporated labels of a particular type. Different features are present or absent in the different images due the different sequence content of each feature, but the relative position of the features will remain unchanged in the images.
  • Some embodiments can utilize nanopore sequencing (Deamer, D. W. & Akeson, M. "Nanopores and nucleic acids: prospects for ultrarapid sequencing.” Trends Biotechnol. 18, 147-151 (2000); Deamer, D. and D. Branton, “Characterization of nucleic acids by nanopore analysis”. Acc. Chem. Res. 35:817-825 (2002); Li, J., M. Gershow, D. Stein, E. Brandin, and J. A. Golovchenko, "DNA molecules and configurations in a solid-state nanopore microscope” Nat. Mater. 2:611-615 (2003), the disclosures of which are incorporated herein by reference in their entireties).
  • the target nucleic acid passes through a nanopore.
  • the nanopore can be a synthetic pore or biological membrane protein, such as α-hemolysin.
  • each base-pair can be identified by measuring fluctuations in the electrical conductance of the pore.
  • Some embodiments can utilize methods involving the real-time monitoring of DNA polymerase activity.
  • Nucleotide incorporations can be detected through fluorescence resonance energy transfer (FRET) interactions between a fluorophore-bearing polymerase and γ-phosphate-labeled nucleotides as described, for example, in U.S. Pat. No. 7,329,492 and U.S. Pat. No. 7,211,414 (each of which is incorporated herein by reference) or nucleotide incorporations can be detected with zero-mode waveguides as described, for example, in U.S. Pat. No.
  • the illumination can be restricted to a zeptoliter-scale volume around a surface-tethered polymerase such that incorporation of fluorescently labeled nucleotides can be observed with low background (Levene, M. J. et al. "Zero-mode waveguides for single-molecule analysis at high concentrations.” Science 299, 682-686 (2003); Lundquist, P. M. et al.
  • Some SBS embodiments include detection of a proton released upon incorporation of a nucleotide into an extension product.
  • sequencing based on detection of released protons can use an electrical detector and associated techniques that are commercially available from Ion Torrent (Guilford, CT, a Life Technologies subsidiary) or sequencing methods and systems described in US 2009/0026082 A1; US 2009/0127589 A1; US 2010/0137143 A1; or US 2010/0282617 A1, each of which is incorporated herein by reference.
  • Methods set forth herein for amplifying target nucleic acids using kinetic exclusion can be readily applied to substrates used for detecting protons. More specifically, methods set forth herein can be used to produce clonal populations of amplicons that are used to detect protons.
  • the above SBS methods can be advantageously carried out in multiplex formats such that multiple different target nucleic acids are manipulated simultaneously.
  • different target nucleic acids can be treated in a common reaction vessel or on a surface of a particular substrate. This allows convenient delivery of sequencing reagents, removal of unreacted reagents and detection of incorporation events in a multiplex manner.
  • the target nucleic acids can be in an array format. In an array format, the target nucleic acids can be typically bound to a surface in a spatially distinguishable manner.
  • the target nucleic acids can be bound by direct covalent attachment, attachment to a bead or other particle or binding to a polymerase or other molecule that is attached to the surface.
  • the array can include a single copy of a target nucleic acid at each site (also referred to as a feature) or multiple copies having the same sequence can be present at each site or feature. Multiple copies can be produced by amplification methods such as bridge amplification or emulsion PCR as described in further detail below.
  • the methods set forth herein can use arrays having features at any of a variety of densities including, for example, at least about 10 features/cm², 100 features/cm², 500 features/cm², 1,000 features/cm², 5,000 features/cm², 10,000 features/cm², 50,000 features/cm², 100,000 features/cm², 1,000,000 features/cm², 5,000,000 features/cm², or higher.
  • an advantage of the methods set forth herein is that they provide for rapid and efficient detection of a plurality of target nucleic acids in parallel. Accordingly, the present disclosure provides integrated systems capable of preparing and detecting nucleic acids using techniques known in the art such as those exemplified above.
  • an integrated system of the present disclosure can include fluidic components capable of delivering amplification reagents and/or sequencing reagents to one or more immobilized DNA fragments, the system comprising components such as pumps, valves, reservoirs, fluidic lines and the like.
  • a flow cell can be configured and/or used in an integrated system for detection of target nucleic acids. Exemplary flow cells are described, for example, in US 2010/0111768 A1 and US Ser. No.
  • one or more of the fluidic components of an integrated system can be used for an amplification method and for a detection method.
  • one or more of the fluidic components of an integrated system can be used for an amplification method set forth herein and for the delivery of sequencing reagents in a sequencing method such as those exemplified above.
  • an integrated system can include separate fluidic systems to carry out amplification methods and to carry out detection methods.
  • Examples of integrated sequencing systems that are capable of creating amplified nucleic acids and also determining the sequence of the nucleic acids include, without limitation, the MiSeq™ platform (Illumina, Inc., San Diego, CA) and devices described in US Ser. No. 13/273,666, which is incorporated herein by reference.
  • The term “sample,” and its derivatives, is used in its broadest sense and includes any specimen, culture and the like that is suspected of including a target.
  • the sample comprises DNA, RNA, PNA, LNA, chimeric or hybrid forms of nucleic acids.
  • the sample can include any biological, clinical, surgical, agricultural, atmospheric or aquatic-based specimen containing one or more nucleic acids.
  • the term also includes any isolated nucleic acid sample such as genomic DNA, fresh-frozen or formalin-fixed paraffin-embedded nucleic acid specimen.
  • the sample can be from a single individual, a collection of nucleic acid samples from genetically related members, nucleic acid samples from genetically unrelated members, nucleic acid samples (matched) from a single individual such as a tumor sample and normal tissue sample, or sample from a single source that contains two distinct forms of genetic material such as maternal and fetal DNA obtained from a maternal subject, or the presence of contaminating bacterial DNA in a sample that contains plant or animal DNA.
  • the source of nucleic acid material can include nucleic acids obtained from a newborn, for example as typically used for newborn screening.
  • the nucleic acid sample can include high molecular weight material such as genomic DNA (gDNA).
  • the sample can include low molecular weight material such as nucleic acid molecules obtained from FFPE or archived DNA samples.
  • low molecular weight material includes enzymatically or mechanically fragmented DNA.
  • the sample can include cell-free circulating DNA.
  • the sample can include nucleic acid molecules obtained from biopsies, tumors, scrapings, swabs, blood, mucus, urine, plasma, semen, hair, laser capture micro-dissections, surgical resections, and other clinical or laboratory obtained samples.
  • the sample can be an epidemiological, agricultural, forensic or pathogenic sample.
  • the sample can include nucleic acid molecules obtained from an animal such as a human or mammalian source.
  • the sample can include nucleic acid molecules obtained from a nonmammalian source such as a plant, bacteria, virus or fungus.
  • the source of the nucleic acid molecules may be an archived or extinct sample or species.
  • forensic samples can include nucleic acids obtained from a crime scene, nucleic acids obtained from a missing persons DNA database, nucleic acids obtained from a laboratory associated with a forensic investigation or include forensic samples obtained by law enforcement agencies, one or more military services or any such personnel.
  • the nucleic acid sample may be a purified sample or a crude DNA containing lysate, for example derived from a buccal swab, paper, fabric or other substrate that may be impregnated with saliva, blood, or other bodily fluids.
  • the nucleic acid sample may comprise low amounts of, or fragmented portions of DNA, such as genomic DNA.
  • target sequences can be present in one or more bodily fluids including but not limited to, blood, sputum, plasma, semen, urine and serum.
  • target sequences can be obtained from hair, skin, tissue samples, autopsy or remains of a victim.
  • nucleic acids including one or more target sequences can be obtained from a deceased animal or human.
  • target sequences can include nucleic acids obtained from non-human DNA such as microbial, plant or entomological DNA.
  • target sequences or amplified target sequences are directed to purposes of human identification.
  • the disclosure relates generally to methods for identifying characteristics of a forensic sample.
  • the disclosure relates generally to human identification methods using one or more target specific primers disclosed herein or one or more target specific primers designed using the primer design criteria outlined herein.
  • a forensic or human identification sample containing at least one target sequence can be amplified using any one or more of the target-specific primers disclosed herein or using the primer criteria outlined herein.
  • the components of the target-sequence-coverage system 106 can include software, hardware, or both.
  • the components of the target-sequence-coverage system 106 can include one or more instructions stored on a computer-readable storage medium and executable by processors of one or more computing devices (e.g., the local server device 102). When executed by the one or more processors, the computer-executable instructions of the target-sequence-coverage system 106 can cause the computing devices to perform the bubble detection methods described herein.
  • the components of the target-sequence-coverage system 106 can comprise hardware, such as special purpose processing devices to perform a certain function or group of functions. Additionally, or alternatively, the components of the target-sequence-coverage system 106 can include a combination of computer-executable instructions and hardware.
  • the components of the target-sequence-coverage system 106 performing the functions described herein with respect to the target-sequence-coverage system 106 may, for example, be implemented as part of a stand-alone application, as a module of an application, as a plug-in for applications, as a library function or functions that may be called by other applications, and/or as a cloud-computing model.
  • components of the target-sequence-coverage system 106 may be implemented as part of a stand-alone application on a personal computing device or a mobile device.
  • the components of the target-sequence-coverage system 106 may be implemented in any application that provides sequencing services including, but not limited to Illumina BaseSpace, Illumina MiSeq, Illumina NovaSeq, Illumina NextSeq, Illumina TruSeq, or Illumina TruSight software.
  • “Illumina,” “BaseSpace,” “MiSeq,” “NovaSeq,” “NextSeq,” “TruSeq,” and “TruSight” are either registered trademarks or trademarks of Illumina, Inc. in the United States and/or other countries.
  • Embodiments of the present disclosure may comprise or utilize a special purpose or general-purpose computer including computer hardware, such as, for example, one or more processors and system memory, as discussed in greater detail below.
  • Embodiments within the scope of the present disclosure also include physical and other computer-readable media for carrying or storing computer-executable instructions and/or data structures.
  • one or more of the processes described herein may be implemented at least in part as instructions embodied in a non-transitory computer-readable medium and executable by one or more computing devices (e.g., any of the media content access devices described herein).
  • a processor receives instructions from a non-transitory computer-readable medium (e.g., a memory) and executes those instructions, thereby performing one or more processes, including one or more of the processes described herein.
  • Computer-readable media can be any available media that can be accessed by a general purpose or special purpose computer system.
  • Computer-readable media that store computer-executable instructions are non-transitory computer-readable storage media (devices).
  • Computer-readable media that carry computer-executable instructions are transmission media.
  • embodiments of the disclosure can comprise at least two distinctly different kinds of computer-readable media: non-transitory computer-readable storage media (devices) and transmission media.
  • Non-transitory computer-readable storage media includes RAM, ROM, EEPROM, CD-ROM, solid state drives (SSDs) (e.g., based on RAM), Flash memory, phase-change memory (PCM), other types of memory, other optical disk storage, magnetic disk storage or other magnetic storage devices, or any other medium which can be used to store desired program code means in the form of computer-executable instructions or data structures and which can be accessed by a general purpose or special purpose computer.
  • a “network” is defined as one or more data links that enable the transport of electronic data between computer systems and/or modules and/or other electronic devices.
  • a network or another communications connection can include a network and/or data links which can be used to carry desired program code means in the form of computer-executable instructions or data structures and which can be accessed by a general purpose or special purpose computer. Combinations of the above should also be included within the scope of computer-readable media.
  • program code means in the form of computer-executable instructions or data structures can be transferred automatically from transmission media to non-transitory computer-readable storage media (devices) (or vice versa).
  • computer-executable instructions or data structures received over a network or data link can be buffered in RAM within a network interface module (e.g., a NIC), and then eventually transferred to computer system RAM and/or to less volatile computer storage media (devices) at a computer system.
  • non-transitory computer-readable storage media (devices) can be included in computer system components that also (or even primarily) utilize transmission media.
  • Computer-executable instructions comprise, for example, instructions and data which, when executed at a processor, cause a general-purpose computer, special purpose computer, or special purpose processing device to perform a certain function or group of functions.
  • computer-executable instructions are executed on a general-purpose computer to turn the general-purpose computer into a special purpose computer implementing elements of the disclosure.
  • the computer executable instructions may be, for example, binaries, intermediate format instructions such as assembly language, or even source code.
  • the disclosure may be practiced in network computing environments with many types of computer system configurations, including, personal computers, desktop computers, laptop computers, message processors, hand-held devices, multi-processor systems, microprocessor-based or programmable consumer electronics, network PCs, minicomputers, mainframe computers, mobile telephones, PDAs, tablets, pagers, routers, switches, and the like.
  • the disclosure may also be practiced in distributed system environments where local and remote computer systems, which are linked (either by hardwired data links, wireless data links, or by a combination of hardwired and wireless data links) through a network, both perform tasks.
  • program modules may be located in both local and remote memory storage devices.
  • Embodiments of the present disclosure can also be implemented in cloud computing environments.
  • “cloud computing” is defined as a model for enabling on-demand network access to a shared pool of configurable computing resources.
  • cloud computing can be employed in the marketplace to offer ubiquitous and convenient on-demand access to the shared pool of configurable computing resources.
  • the shared pool of configurable computing resources can be rapidly provisioned via virtualization and released with low management effort or service provider interaction, and then scaled accordingly.
  • a cloud-computing model can be composed of various characteristics such as, for example, on-demand self-service, broad network access, resource pooling, rapid elasticity, measured service, and so forth.
  • a cloud-computing model can also expose various service models, such as, for example, Software as a Service (SaaS), Platform as a Service (PaaS), and Infrastructure as a Service (IaaS).
  • a cloud-computing model can also be deployed using different deployment models such as private cloud, community cloud, public cloud, hybrid cloud, and so forth.
  • a “cloud-computing environment” is an environment in which cloud computing is employed.
  • FIG. 10 illustrates a block diagram of a computing device 1000 that may be configured to perform one or more of the processes described above.
  • the computing device 1000 may implement the target-sequence-coverage system 106.
  • the computing device 1000 can comprise a processor 1002, a memory 1004, a storage device 1006, an I/O interface 1008, and a communication interface 1010, which may be communicatively coupled by way of a communication infrastructure 1012.
  • the computing device 1000 can include fewer or more components than those shown in FIG. 10. The following paragraphs describe components of the computing device 1000 shown in FIG. 10 in additional detail.
  • the processor 1002 includes hardware for executing instructions, such as those making up a computer program.
  • the processor 1002 may retrieve (or fetch) the instructions from an internal register, an internal cache, the memory 1004, or the storage device 1006 and decode and execute them.
  • the memory 1004 may be a volatile or non-volatile memory used for storing data, metadata, and programs for execution by the processor(s).
  • the storage device 1006 includes storage, such as a hard disk, flash disk drive, or other digital storage device, for storing data or instructions for performing the methods described herein.
  • the I/O interface 1008 allows a user to provide input to, receive output from, and otherwise transfer data to and receive data from computing device 1000.
  • the I/O interface 1008 may include a mouse, a keypad or a keyboard, a touch screen, a camera, an optical scanner, network interface, modem, other known I/O devices or a combination of such I/O interfaces.
  • the I/O interface 1008 may include one or more devices for presenting output to a user, including, but not limited to, a graphics engine, a display (e.g., a display screen), one or more output drivers (e.g., display drivers), one or more audio speakers, and one or more audio drivers.
  • the I/O interface 1008 is configured to provide graphical data to a display for presentation to a user.
  • the graphical data may be representative of one or more graphical user interfaces and/or any other graphical content as may serve a particular implementation.
  • the communication interface 1010 can include hardware, software, or both. In any event, the communication interface 1010 can provide one or more interfaces for communication (such as, for example, packet-based communication) between the computing device 1000 and one or more other computing devices or networks. As an example, and not by way of limitation, the communication interface 1010 may include a network interface controller (NIC) or network adapter for communicating with an Ethernet or other wire-based network or a wireless NIC (WNIC) or wireless adapter for communicating with a wireless network, such as a WI-FI network.
  • the communication interface 1010 may facilitate communications with various types of wired or wireless networks.
  • the communication interface 1010 may also facilitate communications using various communication protocols.
  • the communication infrastructure 1012 may also include hardware, software, or both that couples components of the computing device 1000 to each other.
  • the communication interface 1010 may use one or more networks and/or protocols to enable a plurality of computing devices connected by a particular infrastructure to communicate with each other to perform one or more aspects of the processes described herein.
  • the sequencing process can allow a plurality of devices (e.g., a client device, sequencing device, and server device(s)) to exchange information such as sequencing data and error notifications.
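
The two-channel detection scheme described in the list above (one nucleotide type seen only in the first channel, one only in the second, one in both, and one in neither) can be illustrated with a short sketch. This is not the disclosed implementation: the intensity threshold, the function names, and the specific assignment of A, C, T, and G to channel patterns (taken from the dATP/dCTP/dTTP/dGTP example in the text) are assumptions made for illustration only.

```python
from typing import Iterable, Tuple

# Hypothetical lookup table based on the example in the text:
# (signal in channel 1, signal in channel 2) -> nucleotide type.
TWO_CHANNEL_CODE = {
    (True, False): "A",   # detected in the first channel only
    (False, True): "C",   # detected in the second channel only
    (True, True): "T",    # detected in both channels
    (False, False): "G",  # unlabeled: not (or minimally) detected
}


def call_base(ch1: float, ch2: float, threshold: float = 0.5) -> str:
    """Call one base from two channel intensities.

    `threshold` is an assumed cutoff separating signal from background.
    """
    return TWO_CHANNEL_CODE[(ch1 >= threshold, ch2 >= threshold)]


def call_read(intensities: Iterable[Tuple[float, float]],
              threshold: float = 0.5) -> str:
    """Call one base per cycle and join the calls into a read."""
    return "".join(call_base(c1, c2, threshold) for c1, c2 in intensities)


# One (channel 1, channel 2) intensity pair per sequencing cycle.
example = [(0.9, 0.1), (0.1, 0.8), (0.9, 0.9), (0.05, 0.02)]
read = call_read(example)
```

A fixed threshold keeps the sketch small; a production base caller would normally model intensity distributions per cluster rather than apply a single cutoff.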

Landscapes

  • Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Chemical & Material Sciences (AREA)
  • Analytical Chemistry (AREA)
  • Biophysics (AREA)
  • Proteomics, Peptides & Aminoacids (AREA)
  • Health & Medical Sciences (AREA)
  • Engineering & Computer Science (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Biotechnology (AREA)
  • Evolutionary Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Theoretical Computer Science (AREA)
  • Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)

Abstract

This disclosure describes methods, non-transitory computer-readable media, and systems that can estimate read coverage for a target genomic region and adjust a number of sequencing cycles to satisfy a target read-coverage level for the target genomic region across genomic samples. For instance, the disclosed systems can receive user input indicating a target genomic region for genomic samples and a target read-coverage level for the target genomic region within the one or more genomic samples. Based on reads provisionally mapped to a reference sequence corresponding to the target genomic region, the disclosed systems can estimate a read-coverage level for the target genomic region. Having estimated read-coverage levels, the disclosed systems can generate and execute an adjusted number of sequencing cycles sufficient to satisfy the target read-coverage level within the target genomic region for one or more of the samples.
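
As a rough illustration of the idea in the abstract, the sketch below estimates read coverage for a target region and picks a number of sequencing cycles that would meet a target coverage level for every sample. It is not the method claimed in this publication. It assumes the common approximation coverage ≈ (mapped reads × read length) / region length, assumes each sequencing cycle adds one base to each read, and uses hypothetical names (`estimate_coverage`, `cycles_needed`, `adjust_cycles`, `max_cycles`).

```python
import math
from typing import Dict


def estimate_coverage(mapped_reads: int, read_length: int,
                      region_length: int) -> float:
    """Approximate mean read coverage over the target region."""
    return mapped_reads * read_length / region_length


def cycles_needed(mapped_reads: int, region_length: int,
                  target_coverage: float) -> int:
    """Smallest read length (in cycles) at which coverage meets the target."""
    if mapped_reads <= 0:
        raise ValueError("no reads mapped to the target region")
    return math.ceil(target_coverage * region_length / mapped_reads)


def adjust_cycles(mapped_reads_by_sample: Dict[str, int], region_length: int,
                  target_coverage: float, max_cycles: int) -> int:
    """Pick a cycle count that satisfies the target for every sample.

    The result is capped at `max_cycles`, an assumed instrument or reagent limit.
    """
    needed = max(
        cycles_needed(n, region_length, target_coverage)
        for n in mapped_reads_by_sample.values()
    )
    return min(needed, max_cycles)


# Two pooled samples with reads provisionally mapped to a 10 kb target region.
samples = {"sample_1": 2000, "sample_2": 1500}
cycles = adjust_cycles(samples, region_length=10_000,
                       target_coverage=30.0, max_cycles=300)
```

Taking the maximum across samples means the sample with the fewest mapped reads drives the cycle count; a real system could instead weigh per-sample targets or stop early once all targets are met.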

Description

MODIFYING SEQUENCING CYCLES DURING A SEQUENCING RUN TO MEET CUSTOMIZED COVERAGE ESTIMATIONS FOR A TARGET GENOMIC REGION
CROSS-REFERENCE TO RELATED APPLICATION
[0001] This application claims the benefit of U.S. Provisional Application No. 63/646,237, filed May 13, 2024, the contents of which are incorporated by reference herein in their entirety.
BACKGROUND
[0002] In recent years, biotechnology firms and research institutions have improved hardware and software for sequencing nucleotides and determining nucleobase calls for genomic samples. For instance, some existing sequencing devices and sequencing-data-analysis software (together “existing sequencing systems”) predict individual nucleobases within sequences by using conventional Sanger sequencing or sequencing-by-synthesis (SBS) methods. When using SBS, existing sequencing systems can monitor many thousands to billions of oligonucleotides being synthesized in parallel from templates to predict nucleobase calls for growing nucleotide reads. During a sequencing run in many existing sequencing systems, a camera captures images of irradiated fluorescent tags incorporated into oligonucleotides. After capturing such images, some existing sequencing systems determine nucleobase calls for nucleotide reads corresponding to respective clusters of oligonucleotides on a flow cell (or other nucleotide-sample substrate) for a given sequencing run. For example, some existing sequencing systems utilize sequencing-data-analysis software to analyze image data captured during sequencing cycles to determine nucleobase calls for given clusters of oligonucleotides and sequence such calls across sequencing cycles to determine nucleotide reads for the given clusters. Additionally, some existing sequencing systems utilize targeted sequencing to analyze genomic regions of interest. For example, existing sequencing systems can sequence and analyze mutations in specific genomic regions that correspond with various diseases.
[0003] As part of such improved genomic sequencing, biotechnology firms and research institutions have also improved methods of simultaneously pooling and sequencing large numbers of genomic samples. Existing sequencing systems may pool genetic samples from different individuals into clusters on a single flow cell (or other nucleotide-sample substrate) to increase the number of samples analyzed in a single sequencing run. For instance, existing sequencing systems may utilize sample multiplexing (or multiplex sequencing) to add individual “barcode” or indexing sequences to each deoxyribonucleic acid (DNA) fragment during library preparation. The indexing sequences correspond to individual genomic samples within the sample pool. After the indexing sequences have been identified, existing sequencing systems may perform demultiplexing to identify which indexing sequences, and which clusters of oligonucleotides on a flow cell, correspond with which genomic samples.
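By way of illustration only, the demultiplexing step described above can be sketched as a lookup from index sequence to sample. The index sequences, sample names, and the `demultiplex` helper below are hypothetical and are not taken from any particular sequencing system; reads with an unrecognized index are simply grouped as undetermined.

```python
from collections import defaultdict

# Hypothetical index-to-sample assignments; real assignments come from library preparation.
INDEX_TO_SAMPLE = {"ACGTAC": "sample_A", "TTGCAA": "sample_B"}


def demultiplex(reads, index_to_sample):
    """Group (index_sequence, read_sequence) pairs by the sample each index identifies."""
    by_sample = defaultdict(list)
    for index_seq, read_seq in reads:
        # Reads whose index is not in the sheet are kept under "undetermined".
        sample = index_to_sample.get(index_seq, "undetermined")
        by_sample[sample].append(read_seq)
    return dict(by_sample)


reads = [
    ("ACGTAC", "GATTACA"),
    ("TTGCAA", "CCGGA"),
    ("ACGTAC", "TTAGG"),
    ("NNNNNN", "ACGT"),
]
groups = demultiplex(reads, INDEX_TO_SAMPLE)
```

An exact-match lookup is the simplest possible choice here; tolerating mismatches within an index sequence would require a different matching strategy.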
[0004] Despite recent advances in multiplexing and per-cycle image analysis, existing sequencing systems sometimes face challenges in accurately determining coverage for a genomic region of interest across samples until after concluding a sequencing run. Additionally, existing sequencing systems face other technical difficulties that vary the level of genomic region coverage for samples provided by read data from a given sequencing run. In multiplexed sequencing, for example, the number of nucleotide reads from each genomic sample in clusters may not be evenly distributed, leading to variations in nucleotide-read depth or coverage. Furthermore, the nucleotide reads may align to different segments of a reference genome, which results in additional variations in coverage across genomic regions. This uneven representation sometimes results in a sequencing device executing an insufficient number of sequencing cycles or images (or otherwise under-sequencing) for a sequencing run to generate the requisite numbers or length of nucleotide reads that satisfy a target level of coverage for a genomic region of interest for a given sample. While sequencing devices can under-sequence DNA reads extracted from some samples, sequencing devices can sometimes execute an excessive number of sequencing cycles or images (or otherwise over-sequence) for a sequencing run to generate the requisite numbers or length of nucleotide reads to satisfy the target genomic region coverage level.
[0005] Incorrect or inefficient sample loading on a nucleotide-sample substrate for existing sequencing systems can result in run-to-run variations, such as variations in read-data coverage for a genomic region of interest produced by a given sequencing run. Existing sequencing systems often consume additional computing time, memory, and consumable materials to compensate for these run-to-run variations. Some existing sequencing systems consume additional computing time and memory to address under-sequenced genomic regions. For instance, existing sequencing systems often perform preset additional or buffer sequencing cycles during a sequencing run to ensure sufficient coverage of a target genomic region across all samples. As a result of performing additional sequencing cycles within a sequencing run, existing sequencing systems can over-sequence some genomic regions of samples within a sample pool while still under-sequencing other genomic regions. Thus, in addition to expending additional computing time and memory to over-sequence certain genomic regions, existing systems often also expend additional computing time to perform one or more additional sequencing runs (also known as top-up runs) to compensate for under-sequenced genomic regions of interest for one or more samples of a previous sequencing run.
[0006] Because of run-to-run variations arising from incorrect or inefficient sample loading, existing sequencing systems can consume additional reagents, processing materials, and sample material during additional sequencing cycles or runs. By extending sequencing cycles to compensate for coverage uncertainty caused by incorrect or inefficient sample loading and sometimes performing additional sequencing runs to compensate for under-sequenced samples, existing sequencing systems can consume additional processing materials, including sequencing reagents, library preparation kits, cluster amplification materials, flow cells or other nucleotide-sample substrates, scarce real estate on such flow cells, and other materials. In addition to consuming such materials, existing sequencing systems sometimes require reextracting genomic material from an individual and re-performing library preparation necessary to seed oligonucleotide clusters on an additional flow cell to perform an additional sequencing run to compensate for a previous sequencing run that failed to produce a target genomic region coverage for variant calling (or other secondary analysis) of the individual. For many existing systems, the relationship between number of cycles and processing materials consumed is a linear function. Thus, many existing sequencing systems consume additional processing materials and sample materials to compensate for the coverage uncertainty and variation outlined above.
[0007] These, along with additional problems and issues, exist in existing sequencing systems.
SUMMARY
[0008] This disclosure describes one or more embodiments of systems, methods, and non-transitory computer readable storage media that solve one or more of the problems described above or provide other advantages over the art. For example, the disclosed systems estimate read coverage for a target genomic region of each sample in a flow cell or other nucleotide-sample substrate and adjust a number of sequencing cycles to satisfy a target coverage for the target genomic region based on the estimated read coverage.
[0009] To illustrate, in some embodiments, the disclosed systems receive user input for a sequencing run that indicates a target coverage level for a target genomic region. After performing an initial set of sequencing cycles, the disclosed systems may provisionally map reads of the samples — which can be identified using index sequences — to a reference sequence corresponding to the target genomic region. Based on the reads provisionally mapped to the reference sequence and a currently selected number of sequencing cycles for the run, the system estimates a read-coverage level for the target genomic region in each sample of the run. Having estimated the read-coverage level, the disclosed systems may generate an adjusted number of sequencing cycles sufficient to satisfy the target read-coverage level within the target genomic region for one or more of the genomic samples. The disclosed systems can further execute the sequencing run according to the adjusted number of sequencing cycles.
[0010] Additional features and advantages of one or more embodiments of the present disclosure will be set forth in the description which follows, and in part will be obvious from the description, or may be learned by the practice of such example embodiments.
BRIEF DESCRIPTION OF THE DRAWINGS
[0011] The detailed description refers to the drawings briefly described below.
[0012] FIG. 1 illustrates a computing system in which a sequencing device and a corresponding target-sequence-coverage system can operate in accordance with one or more embodiments of the present disclosure.
[0013] FIG. 2 illustrates an overview of the target-sequence-coverage system generating an adjusted number of sequencing cycles and executing a sequencing run based on the adjusted number of sequencing cycles in accordance with one or more implementations of the present disclosure.
[0014] FIG. 3 illustrates an overview of the target-sequence-coverage system estimating an updated read-coverage level in accordance with one or more implementations of the present disclosure.
[0015] FIG. 4 illustrates the target-sequence-coverage system provisionally mapping nucleotide reads to a reference sequence in accordance with one or more embodiments of the present disclosure.
[0016] FIG. 5 illustrates the target-sequence-coverage system identifying clusters of oligonucleotides provisionally mapped to a reference sequence corresponding with the target genomic region and identifying genomic samples to which the nucleotide sequence reads belong in accordance with one or more embodiments of the present disclosure.
[0017] FIG. 6 illustrates an example decision flowchart by which the target-sequence-coverage system executes sequencing runs according to an adjusted number of sequencing cycles in accordance with one or more embodiments of the present disclosure.
[0018] FIG. 7 illustrates a schematic view of an example of a system that may be used to provide biological or chemical analysis in accordance with one or more embodiments of the present disclosure.
[0019] FIG. 8 illustrates a schematic view of an example of a set of components that may cooperate to provide a fluid path in the system of FIG. 7 in accordance with one or more embodiments of the present disclosure.
[0020] FIG. 9 illustrates a flowchart of a series of acts for executing a sequencing run according to an adjusted number of sequencing cycles in accordance with one or more embodiments of the present disclosure.
[0021] FIG. 10 illustrates a block diagram of an example computing device in accordance with one or more embodiments of the present disclosure.
DETAILED DESCRIPTION
[0022] This disclosure describes one or more embodiments of a target-sequence-coverage system that can efficiently identify read data for one or more genomic samples during a sequencing run, estimate read coverage for a target genomic region of the one or more samples, and adjust a number of sequencing cycles during the sequencing run to satisfy a target coverage for the target genomic region of the one or more genomic samples. The target-sequence-coverage system can accordingly adjust, on the fly, a number of sequencing cycles that likely satisfy a target coverage for a target genomic region.
[0023] For instance, the target-sequence-coverage system can receive data input for a sequencing run that identifies a target genomic region for genomic samples and a target read-coverage level for the target genomic region. After performing an initial set of sequencing cycles, in some embodiments, the target-sequence-coverage system can provisionally map nucleotide reads of the genomic samples to a reference sequence corresponding to the target genomic region. Such a reference sequence may represent the target genomic region within a reference genome. The target-sequence-coverage system can identify to which sample certain nucleotide reads belong based on index sequences within the nucleotide reads. Based on the nucleotide reads being provisionally mapped to the reference sequence and a currently selected number of sequencing cycles for the sequencing run, the target-sequence-coverage system estimates a read-coverage level for the target genomic region in each sample of the run. Having estimated the read-coverage level, the target-sequence-coverage system can generate an adjusted number of sequencing cycles sufficient to satisfy the target read-coverage level within the target genomic region for one or more of the samples. The target-sequence-coverage system can further execute the sequencing run according to the adjusted number of sequencing cycles or, upon further update, an updated adjusted number of sequencing cycles.
[0024] As just noted, the target-sequence-coverage system can receive data input identifying a target genomic region for genomic samples. A researcher or clinician may choose to identify a target genomic region for sequencing by inputting a range of genomic coordinates or a particular name or code identifying the target genomic region. For example, a researcher may be specifically interested in particular genes or genomic regions associated with a particular disease, trait, or biological process as part of a sequencing array or whole genome sequencing. The target-sequence-coverage system can receive input identifying the target genomic region.
[0025] In addition to input identifying the target genomic region, the target-sequence-coverage system can receive data input identifying a target read-coverage level for the target genomic region. More particularly, the target-sequence-coverage system can identify a desired depth or number of times the target genomic region is sequenced during a sequencing run. The coverage level of a target genomic region can indicate the reliability and accuracy of downstream analysis, such as variant calling. Accordingly, the target-sequence-coverage system can identify a target read-coverage level that results in the desired quality level of additional sequencing.
[0026] Having identified both the target genomic region and the target read-coverage level, the target-sequence-coverage system can provisionally map nucleotide reads of the genomic samples to a reference sequence corresponding to the target genomic region. After an initial set of sequencing cycles, the target-sequence-coverage system can identify nucleotide reads that contribute to the final coverage of the targeted genomic region. The target-sequence-coverage system can map the nucleotide reads of the genomic sample from the initial set of sequencing cycles to a reference genome. Based on the provisional mapping, the target-sequence-coverage system can identify nucleotide reads that correspond or map to the target genomic region.
[0027] As described further below, during the sequencing run, the target-sequence-coverage system can estimate read-coverage levels of the target genomic region. In some examples, the target-sequence-coverage system identifies a number of nucleotide reads provisionally mapped to the target genomic region or adjacent regions to estimate the read coverage. Furthermore, in some implementations, the target-sequence-coverage system estimates the read coverage based on the growth direction of nucleotide reads mapped to the target genomic region and/or adjacent regions.
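To illustrate why growth direction matters, the following minimal sketch projects where each provisionally mapped read would end once it reaches a planned length and counts the reads whose projected span overlaps the target genomic region. The `ProvisionalRead` record, the one-base-per-cycle assumption, and the inclusive 1-based coordinates are assumptions made for this example only; the disclosure does not prescribe this representation.

```python
from dataclasses import dataclass


@dataclass
class ProvisionalRead:
    anchor: int    # coordinate of the first sequenced base (1-based, assumed)
    length: int    # bases sequenced so far
    forward: bool  # True if the read grows toward higher coordinates


def projected_interval(read, planned_length):
    """Inclusive interval the read would span once it reaches planned_length bases."""
    length = max(read.length, planned_length)
    if read.forward:
        return read.anchor, read.anchor + length - 1
    return read.anchor - length + 1, read.anchor


def count_projected_overlaps(reads, region_start, region_end, planned_length):
    """Count reads whose projected span overlaps the inclusive target region."""
    count = 0
    for read in reads:
        start, end = projected_interval(read, planned_length)
        if start <= region_end and end >= region_start:
            count += 1
    return count


reads = [
    ProvisionalRead(anchor=950, length=30, forward=True),   # adjacent, growing toward the target
    ProvisionalRead(anchor=950, length=30, forward=False),  # adjacent, growing away from the target
    ProvisionalRead(anchor=1010, length=30, forward=True),  # already inside the target
]
now = count_projected_overlaps(reads, 1000, 1100, planned_length=30)
later = count_projected_overlaps(reads, 1000, 1100, planned_length=100)
```

In this toy input, only the read already inside the region counts at 30 bases, while the adjacent read growing toward the region also counts at 100 bases; the adjacent read growing away never contributes.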
[0028] Based on the estimated read-coverage level for the target genomic region, in some embodiments, the target-sequence-coverage system generates an adjusted number of sequencing cycles sufficient to satisfy the target read-coverage level within the target genomic region. In particular, the target-sequence-coverage system can adjust a number of sequencing cycles before the sequencing run concludes. For example, the target-sequence-coverage system can determine, for each genomic sample, a number of sequencing cycles likely required to satisfy the target read coverage. As part of such a determination, the target-sequence-coverage system can determine (i) whether the number of sequencing cycles needed is beyond the maximum number of cycles for a sequencing run, (ii) whether the sequencing run could be terminated at the adjusted number of sequencing cycles for the sequencing run, or (iii) whether the sequencing run should be continued with additional cycles. By generating the adjusted number of sequencing cycles, the target-sequence-coverage system can efficiently eliminate under-sequenced genomic samples and thereby avoid performing additional and unnecessary sequencing runs.
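The three determinations (i) through (iii) can be sketched as a small decision function. This is one possible reading under stated assumptions: the per-sample required cycle counts are taken as given inputs (how they are estimated is not shown), the run is governed by the most demanding sample, and the action labels and the `decide_run_action` name are hypothetical.

```python
def decide_run_action(required_cycles_by_sample, cycles_completed, max_cycles):
    """Return (action, adjusted_cycles) for one planning decision.

    required_cycles_by_sample maps each sample name to the number of cycles
    estimated as likely required to satisfy the target read coverage.
    """
    # Assumption: the sample needing the most cycles governs the whole run.
    needed = max(required_cycles_by_sample.values())
    if needed > max_cycles:
        # (i) the requirement is beyond the maximum number of cycles for the run
        return "exceeds_maximum", max_cycles
    if needed <= cycles_completed:
        # (ii) the run could be terminated now
        return "terminate", cycles_completed
    # (iii) the run should continue with additional cycles
    return "continue", needed
```

For example, with 100 cycles completed and a 150-cycle maximum, samples needing 120 and 140 cycles yield a decision to continue to 140 cycles.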
[0029] As indicated above, the target-sequence-coverage system can iteratively update the adjusted number of sequencing cycles at various checkpoint sequencing cycles of a sequencing run. During such an iterative update, the target-sequence-coverage system can perform a new provisional mapping of nucleotide reads for genomic samples by mapping nucleotide reads (or fragments of nucleotide reads) not mapped previously to the target genomic region. Likewise, the target-sequence-coverage system can provisionally remap nucleotide reads to the reference sequence corresponding to the target genomic region at a checkpoint sequencing cycle. The target-sequence-coverage system can use the provisional remapping of nucleotide reads to predict whether a previously determined adjusted number of sequencing cycles is sufficient to meet the target read-coverage levels across genomic samples and/or for individual genomic samples. The target-sequence-coverage system can determine an updated read-coverage level and further adjust the number of sequencing cycles within the sequencing run.
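The iterative update at checkpoint sequencing cycles can be sketched as a loop that re-estimates the planned cycle count each time a checkpoint is reached. The `estimate_required_cycles` callback stands in for the provisional remapping and coverage estimation and is purely hypothetical, as are the function name and the clamping rules shown here.

```python
def run_with_checkpoints(initial_cycles, checkpoints, max_cycles, estimate_required_cycles):
    """Re-estimate the planned cycle count at each checkpoint; return the final plan."""
    planned = initial_cycles
    for checkpoint in sorted(checkpoints):
        if checkpoint >= planned:
            break  # the run ends before this checkpoint is reached
        # The plan cannot drop below cycles already executed or exceed the run maximum.
        planned = min(max_cycles, max(checkpoint, estimate_required_cycles(checkpoint)))
    return planned


# Hypothetical estimates returned at each checkpoint cycle.
estimates = {25: 180, 75: 160, 125: 130}
final_plan = run_with_checkpoints(150, [25, 75, 125], 200, estimates.__getitem__)
```

Here the plan moves from 150 to 180, then 160, then 130 cycles as later checkpoints refine the estimate.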
[0030] As indicated above, the target-sequence-coverage system provides several technical advantages by, for example, improving computational and resource efficiency of a specialized computing device — that is, a sequencing device — relative to some existing sequencing systems. As mentioned, the target-sequence-coverage system may reduce the amount of compute time and consumed memory on a sequencing device for a given sequencing run to reach target read-coverage levels for target genomic regions relative to existing sequencing systems. By estimating read-coverage levels and adjusting a number of sequencing cycles before finishing a sequencing run, the target-sequence-coverage system may more accurately execute a number of sequencing cycles required for each genomic sample (or an individual genomic sample) to reach a target read-coverage level of a target genomic sequence. As part of further iterative updates, in some embodiments, the target-sequence-coverage system can perform additional provisional re-mappings and generate updated read-coverage levels that may more-precisely predict, in real time or near real time, the number of sequencing cycles required to meet the target read-coverage level. Relative to some existing sequencing systems operating on existing sequencing devices, the target-sequence-coverage system can execute a lower number of sequencing cycles that may consume less processing and memory — while still achieving acceptable read-coverage levels for each genomic sample (or an individual genomic sample) at a target genomic region. Accordingly, the target-sequence-coverage system can reduce the amount of compute time required to perform a sequencing run that satisfies a target read-coverage level for target genomic regions for genomic samples.
[0031] Beyond conserved computing resources, in some implementations, the target-sequence-coverage system may also conserve consumables and other physical resources (and may reduce overuse of fluidics devices and other hardware within a sequencing device) for a sequencing device relative to some existing sequencing systems. To compensate for the read-data-coverage uncertainty and variation and otherwise satisfy target read-coverage levels for multiplexed samples described above, some existing sequencing systems may be used to preset additional or buffer sequencing cycles and sometimes perform one or more additional sequencing runs (e.g., top-off runs) with a particular sample to ensure sufficient coverage for data analysis. Such additional preset sequencing cycles or additional sequencing runs can consume additional sequencing reagents, processing materials, and sample materials. In contrast to some such existing sequencing systems, as described below, the target-sequence-coverage system can (i) efficiently generate an adjusted number of sequencing cycles before a sequencing run concludes and (ii) thereby execute the sequencing run on a specialized computing device (that is, a sequencing device) according to the adjusted number of sequencing cycles. By tailoring parameters of a sequencing run based on an estimated read-coverage level, the target-sequence-coverage system can efficiently consume physical resources required to achieve a target read-coverage level for a target genomic region and avoid unnecessary wear and tear on the physical components of a sequencing device.
[0032] As illustrated by the foregoing discussion, the present disclosure utilizes a variety of terms to describe features and advantages of the target-sequence-coverage system. As used herein, for example, the term “sequencing run” refers to an iterative process on a sequencing device to determine a primary structure of nucleotide sequences from a sample (e.g., genomic sample). In particular, a sequencing run includes cycles of sequencing chemistry and imaging performed by a sequencing device that incorporate nucleobases into growing oligonucleotides to determine nucleotide reads from nucleotide sequences extracted from a sample (or other sequences within a library fragment) and seeded throughout a flow cell or other nucleotide-sample substrate. In some cases, a sequencing run includes replicating oligonucleotides derived or extracted from one or more genomic samples seeded in clusters or other structures, such as circularized nucleotide strands, throughout a flow cell. Upon completing a sequencing run, a sequencing device can generate base-call data in a file, such as a binary base call (BCL) sequence file or a fast-all quality (FASTQ) file.
[0033] Relatedly, as used herein, for example, the term “sequencing cycle” refers to an iteration of adding or incorporating one or more nucleobases to one or more oligonucleotides representing or corresponding to a sample’s sequence (e.g., a genomic or transcriptomic sequence from a sample) or a corresponding adapter sequence. In some cases, a sequencing cycle includes an iteration of both incorporating nucleobases into clusters or other structures of oligonucleotides using sequencing chemistry and capturing images of detectable elements, such as fluorescence of fluorophores, associated with the nucleotides of such clusters or other structures attached to or located on a flow cell or other nucleotide-sample substrate. A sequencing cycle can include one or both of an indexing cycle and a genomic sequencing cycle. For instance, one cluster of oligonucleotides or a set of clusters of oligonucleotides may be undergoing a genomic sequencing cycle in which nucleobases corresponding to a sample genomic sequence are incorporated and another cluster of oligonucleotides or another set of clusters of oligonucleotides may be concurrently undergoing an indexing cycle in which nucleobases corresponding to an indexing sequence for a nucleotide read are incorporated. In some cases, a sequencing device progresses through sequencing cycles that determine nucleobase calls for a nucleotide read from an oligonucleotide comprising an adapter sequence (e.g., p5/p7 primers), a first indexing sequence, a sample genomic sequence (e.g., gDNA), a second indexing sequence, and another adapter sequence (e.g., p5/p7 primers).
[0034] As further used herein, the term “genomic sequencing cycle” refers to an iteration of adding or incorporating one or more nucleobases to one or more oligonucleotides representing or corresponding to a sample genomic sequence (or cDNA sequence). In particular, a genomic sequencing cycle can include an iteration of capturing and analyzing one or more images with data indicating individual nucleobases added or incorporated into an oligonucleotide or to oligonucleotides (in parallel) representing or corresponding to one or more sample genomic sequences. Such image analysis can include analyzing data from signals output from an image sensor (e.g., an area capture sensor or a time delayed integration (TDI) sensor). For example, in one or more embodiments, each genomic sequencing cycle involves capturing and analyzing images to determine either single reads or paired-end reads of DNA (or RNA) strands representing part of a genomic sample (or transcribed sequence from a genomic sample). As suggested above, however, a genomic sequencing cycle, in some cases, is specific to a cluster of oligonucleotides or a set of clusters of oligonucleotides.
[0035] By contrast, the term “indexing cycle” refers to an iteration of adding or incorporating one or more nucleobases to one or more oligonucleotides representing or corresponding to one or more indexing sequences. In particular, an indexing cycle can include an iteration of capturing and analyzing one or more images of clusters of oligonucleotides indicating one or more nucleobases added or incorporated into an oligonucleotide or to oligonucleotides (in parallel) representing or corresponding to one or more indexing sequences. An indexing cycle differs from a genomic sequencing cycle in that an indexing cycle includes sequencing of at least a nucleobase (or a majority of nucleobases) from one or more indexing sequences that identify or encode one or more sample library fragments. Because genomic sequencing cycles may be specific to a cluster or clusters of oligonucleotides or other structures of oligonucleotides, an indexing cycle for one cluster of oligonucleotides may be performed at a same time as a genomic sequencing cycle for another cluster of oligonucleotides.
[0036] Relatedly, the term “currently selected number of sequencing cycles” refers to an adjustable value that represents a number of sequencing cycles to be performed during a sequencing run. In particular, a currently selected number of sequencing cycles can be automatically determined, determined based on user selection, or preset according to a default number. In some instances, the currently selected number of sequencing cycles may be fixed based on the reagent kit used with a sequencing system. For instance, reagent kits may specify 50, 100, 150, 200, 300, 400, 500, 600, or more genomic sequencing cycles in addition to indexing cycles, primer cycles, and/or other cycles. In one example, the target-sequence-coverage system can determine a currently selected number of sequencing cycles equaling 150 sequencing cycles. The target-sequence-coverage system can adjust the number of sequencing cycles by increasing the number of sequencing cycles or reducing the number of sequencing cycles.
[0037] As used herein, the term “genomic sample” refers to a target oligonucleotide sample. The oligonucleotide sample may be a genome or portion of a genome undergoing an assay or sequencing. For example, a genomic sample includes one or more sequences of nucleotides isolated or extracted from a sample organism (or a copy of such an isolated or extracted sequence) or any other source of oligonucleotides. In particular, a genomic sample may include a full genome that is isolated or extracted (in whole or in part) from a sample organism and composed of nitrogenous heterocyclic bases. A genomic sample can include a segment of deoxyribonucleic acid (DNA), ribonucleic acid (RNA), or other polymeric forms of nucleic acids or chimeric or hybrid forms of nucleic acids noted below. In some cases, the genomic sample is found in a sample prepared or isolated by a kit and received by a sequencing device.
[0038] As used herein, the term “read-coverage level” refers to a measure or value that indicates a depth or redundancy of nucleotide-sequence information for a particular genomic coordinate or genomic region of a sample. In particular, read-coverage level refers to a number of times a specific genomic coordinate or genomic region for a sample is covered or spanned by nucleotide reads. Read-coverage level can be relevant when describing the depth of sequencing data obtained for a particular genomic region of interest or a particular genomic sample. For example, read-coverage level may comprise a numeric value (e.g., 10x, 30x, 45x, 50x, 60x) indicating an average number of unique nucleotide reads for a genomic sample that span or cover genomic coordinates or regions of a human genomic sample. In some cases, read-coverage level is limited to an average number of unique nucleotide reads across a non-N portion of a human or non-human genome (e.g., non-N portion of a PAR-masked human genome or non-human genome, such as genomes of primates, viruses, bacteria, or other organisms).
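As a worked illustration of this definition, the sketch below counts how many reads span each coordinate of a region and averages those counts. The inclusive interval representation and the function names are assumptions for the example; a production pipeline would typically derive the intervals from alignment records.

```python
def per_position_depth(read_intervals, region_start, region_end):
    """Depth at each coordinate of an inclusive region, from inclusive read intervals."""
    depth = [0] * (region_end - region_start + 1)
    for start, end in read_intervals:
        # Clip each read to the region; reads outside it contribute nothing.
        lo = max(start, region_start)
        hi = min(end, region_end)
        for pos in range(lo, hi + 1):
            depth[pos - region_start] += 1
    return depth


def mean_coverage(read_intervals, region_start, region_end):
    """Average number of reads covering each coordinate of the region."""
    depth = per_position_depth(read_intervals, region_start, region_end)
    return sum(depth) / len(depth)


reads = [(95, 104), (100, 109), (105, 114), (200, 250)]
depth = per_position_depth(reads, 100, 109)
level = mean_coverage(reads, 100, 109)
```

With these four reads, every coordinate from 100 through 109 is spanned by exactly two reads, so the region has a read-coverage level of 2x.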
[0039] As used herein, the term “target read-coverage level” refers to a desired or intended depth of sequencing coverage for a specific genomic coordinate or genomic region within a genomic sample. In particular, a target read-coverage level represents a minimum number of times a position within a genomic sample should be sequenced to achieve a desired level of confidence in the accuracy of the obtained sequence data. For example, a target read-coverage level can comprise a numeric value (e.g., 40, 50, 60, etc.) indicating a desired read-coverage level for a given position within a genomic sample.
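Because the target is framed as a minimum per position, a simple check can compare the lowest observed depth in each sample against the target. The sketch below assumes per-position depths are already available as lists; the sample names, depth values, and function name are hypothetical.

```python
def samples_below_target(depth_by_sample, target_level):
    """Return the names of samples whose minimum per-position depth is under the target."""
    return sorted(
        name for name, depth in depth_by_sample.items() if min(depth) < target_level
    )


# Hypothetical per-position depths over a three-coordinate target region.
depth_by_sample = {"sample_A": [40, 42, 41], "sample_B": [40, 38, 45]}
short = samples_below_target(depth_by_sample, target_level=40)
```

In this example only `sample_B` falls short, because one of its positions is covered 38 times against a target of 40.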
[0040] As further used herein, the term “genomic coordinate” (or sometimes simply “coordinate”) refers to a particular location or position of a nucleobase within a genome (e.g., an organism’s genome or a reference genome). In some cases, a genomic coordinate includes an identifier for a particular chromosome of a genome and an identifier for a position of a nucleobase within the particular chromosome. For instance, a genomic coordinate or coordinates may include a number, name, or other identifier for a chromosome (e.g., chrl, chrX, chrM) and a particular position or positions, such as numbered positions following the identifier for a chromosome (e.g., chrl: 1234570 or chrl: 1234570-1234870). In some cases, a genomic coordinate refers to a genomic coordinate on a sex chromosome (e.g., chrX or chrY) or mitochondrial DNA (e.g., chrM). Further, in certain implementations, a genomic coordinate refers to a source of a reference genome (e.g., mt for a mitochondrial DNA reference genome or SARS-CoV-2 for a reference genome for the SARS-CoV-2 virus) and a position of a nucleobase within the source for the reference genome (e.g., mt: 16568 or SARS-CoV- 2:29001). By contrast, in certain cases, a genomic coordinate refers to a position of a nucleobase within a reference genome without reference to a chromosome or source (e.g., 29727).
[0041] As used herein, a “genomic region” refers to a range of genomic coordinates. Like genomic coordinates, in certain implementations, a genomic region may be identified by an identifier for a chromosome and a particular position or positions, such as numbered positions following the identifier for a chromosome (e.g., chrl: 1234570-1234870). In various implementations, a genomic coordinate includes a position within a reference genome. In some cases, a genomic coordinate is specific to a particular reference genome.
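As an illustration of the coordinate notation in the preceding paragraphs, the following sketch parses strings of the form `source:position` or `source:start-end` into a structured region. The `GenomicRegion` type, the `parse_region` helper, and the regular expression are assumptions for this example and do not handle a bare position without a chromosome or source.

```python
import re
from typing import NamedTuple


class GenomicRegion(NamedTuple):
    contig: str  # chromosome or other source identifier
    start: int
    end: int


# An identifier without colons or spaces, then a position or an inclusive range.
_REGION = re.compile(r"^(?P<contig>[^:\s]+):\s*(?P<start>\d+)(?:-(?P<end>\d+))?$")


def parse_region(text):
    """Parse 'contig:pos' or 'contig:start-end' into a GenomicRegion."""
    match = _REGION.match(text.strip())
    if match is None:
        raise ValueError(f"unrecognized region: {text!r}")
    start = int(match["start"])
    # A single coordinate is treated as a one-base region.
    end = int(match["end"]) if match["end"] else start
    if end < start:
        raise ValueError("region end precedes start")
    return GenomicRegion(match["contig"], start, end)


region = parse_region("chr1:1234570-1234870")
single = parse_region("SARS-CoV-2:29001")
```

Hyphens inside the source identifier, as in `SARS-CoV-2`, are accepted because only the colon separates the identifier from the position.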
[0042] As used herein, the term “target genomic region” refers to a specific segment or range of genomic coordinates that is a focus of interest for analysis. In particular, a target genomic region comprises a range of genomic coordinates of interest to be sequenced during a sequencing run. For example, a target genomic region can correspond with a gene of interest, a chromosomal region, or an epigenetic region, can include functional elements, or can have other traits. The target genomic region can correspond to a reference sequence.
[0043] As used herein, the term “reference genome” refers to a digital nucleic acid sequence assembled as a representative example (or representative examples) of genes and other genetic sequences of an organism. Regardless of the sequence length, in some cases, a reference genome represents an example set of genes or a set of nucleic acid sequences in a digital nucleic acid sequence determined as representative of an organism. For example, a linear human reference genome may be GRCh38 (or other versions of reference genomes) from the Genome Reference Consortium. GRCh38 may include alternate contiguous sequences representing alternate haplotypes, such as SNPs and small indels (e.g., 10 or fewer base pairs, 50 or fewer base pairs).
[0044] Relatedly, as used herein, the term “reference sequence” refers to a sequence of nucleobases at a specific location corresponding to a target genomic region within a reference genome. In particular, a reference sequence can refer to a span of reference bases in a reference genome that includes a target genomic region and extends upstream or downstream from the target genomic region by a threshold number of nucleobases. For example, a reference sequence can refer to a specific sequence of nucleotides within the reference genome that represents a gene, a promoter region, a chromosomal segment or other target genomic region.
[0045] As used herein, the term “nucleobase call” (or simply “base call”) refers to a determination or prediction of a particular nucleobase (or nucleobase pair) for an oligonucleotide (e.g., read) during a sequencing cycle. In particular, a nucleobase call can indicate a determination or prediction of the type of nucleobase that has been incorporated within an oligonucleotide on a flow cell (e.g., read-based nucleobase calls). In some cases, for a nucleotide read, a nucleobase call includes a determination or a prediction of a nucleobase based on intensity values resulting from fluorescent-tagged nucleotides added to one or more oligonucleotides of a flow cell (e.g., in a cluster of a flow cell). Alternatively, a nucleobase call includes a determination or a prediction of a nucleobase from chromatogram peaks or electrical current changes resulting from nucleotides passing through a nanopore of a flow cell. As suggested above, a single nucleobase call can be an adenine (A) call, a cytosine (C) call, a guanine (G) call, a thymine (T) call, or a uracil (U) call.
[0046] As used herein, the term “sample genomic sequence” refers to a nucleotide sequence extracted from, copied from, or complementary to a sample’s chromosome. For example, a sample genomic sequence includes a nucleotide sequence that has been separated or copied from chromosomal DNA of a sample or has been sequenced to be complementary to an extracted or copied nucleotide sequence. Accordingly, a sample genomic sequence includes genomic DNA (gDNA) for a particular unknown sample. Accordingly, as described herein, in some embodiments, the target-sequence-coverage system can use a sample complementary sequence comprising cDNA rather than a sample genomic sequence comprising gDNA in a sample library fragment or wherever suitable cDNA may replace gDNA as understood by a skilled artisan. Indeed, any embodiment or nucleotide read in this disclosure that uses or includes a sample genomic sequence can also use or include a cDNA sequence corresponding to a genomic sample.
[0047] As used herein, the term “indexing sequence” refers to a unique and artificial nucleotide sequence that identifies nucleotide reads for a sample and that is ligated to a sample’s nucleotide sequence (e.g., a gDNA fragment or cDNA fragment) or to another sequence within a sample library fragment. As indicated above, an indexing sequence can be part of a sample library fragment. Similarly, an indexing sequence can be used to sort nucleotide reads by sample or into different files, among other things, such as part of a demultiplexing process. In some cases, a sample library fragment includes an indexing primer sequence that differs from a read priming sequence and that indicates a starting point or starting nucleobase for determining nucleobases of an indexing sequence.
[0048] As used herein, the term “cluster of oligonucleotides” refers to a localized collection of DNA or RNA molecules immobilized or located on a solid surface. In particular, a cluster of oligonucleotides can refer to a collection of fragment nucleotide sequences immobilized or located on a flow cell region of a flow cell. For example, a cluster of oligonucleotides can refer to a collection of nucleotide fragments originating from a genomic sample. A cluster of oligonucleotides can be imaged utilizing one or more light signals. For instance, an oligonucleotide-cluster image may be captured by an image sensor during a sequencing cycle of light emitted by irradiated fluorescent tags incorporated into oligonucleotides from one or more clusters on a flow cell.
[0049] As used herein, the term “nucleotide read” (or simply “read”) refers to an inferred sequence of one or more nucleobases (or nucleobase pairs) from all or part of a sample nucleotide sequence (e.g., a sample genomic sequence, complementary DNA). In particular, a nucleotide read includes a determined or predicted sequence of nucleobase calls for a nucleotide sequence (or group of monoclonal nucleotide sequences) from a sample library fragment corresponding to a genomic sample. For example, in some cases, a sequencing device determines a nucleotide read by generating nucleobase calls for nucleobases passed through a nanopore of a flow cell, determined via fluorescent tagging, or determined from a cluster in a flow cell.
[0050] As used herein, the term “nucleobase” refers to a nitrogenous base. In particular, nucleobases comprise components of nucleotides. For example, a nucleobase may be an adenine (A), cytosine (C), guanine (G), thymine (T), or uracil (U). In some instances, one or more non-naturally occurring nucleobases may also be used.
[0051] As used herein, the term “sequencing device” refers to an instrument or platform used to perform a sequencing process. In particular, a sequencing device refers to an instrument or platform used to perform a sequencing process based on sequencing by synthesis (SBS) technology, single-molecule real-time sequencing (SMRT) technology using magnetic beads or nanopores or other suitable medium. For example, a sequencing device may comprise components including, but not limited to, a flow cell receptacle, fluidics systems, imaging systems, and/or computational capabilities for acquiring, processing, and analyzing image data (e.g., illumination lasers or SLEDs, focus tracking, emission optics or objective, image sensor or camera) during a sequencing run.
[0052] The following paragraphs describe a target-sequence-coverage system with respect to illustrative figures that portray example embodiments and implementations. For example, FIG. 1 illustrates a schematic diagram of a computing system 100 in which a target-sequence-coverage system 106 operates in accordance with one or more embodiments. As illustrated, the computing system 100 includes a local server device 102 connected to one or more server device(s) 110, a sequencing device 108, and a client device 114 via a network 112. While FIG. 1 shows an embodiment of the target-sequence-coverage system 106, this disclosure describes alternative embodiments and configurations below.
[0053] As shown in FIG. 1, the local server device 102, the sequencing device 108, the server device(s) 110, and the client device 114 can communicate with each other via the network 112. The network 112 comprises any suitable network over which computing devices can communicate. Example networks are discussed in additional detail below with respect to FIG. 10.
[0054] As indicated by FIG. 1, the sequencing device 108 comprises a device for sequencing a genomic sample or other nucleic-acid polymer. In some embodiments, the sequencing device 108 implements a sequencing device system 118 that analyzes nucleic-acid segments or oligonucleotides extracted from genomic samples to generate nucleotide reads or other data utilizing computer implemented methods and systems (described herein) either directly or indirectly on the sequencing device 108. More particularly, the sequencing device 108 receives nucleotide-sample substrates (e.g., flow cells) comprising nucleotide fragments extracted from samples and then copies and determines the nucleotide-base sequence of such extracted nucleotide fragments. In one or more embodiments, the sequencing device 108 utilizes SBS to sequence nucleic-acid polymers into nucleotide reads. Additionally, the sequencing device 108 can determine base calls for indexing sequences. In addition, or in the alternative to communicating across the network 112, in some embodiments, the sequencing device 108 bypasses the network 112 and communicates directly with the local server device 102 or the client device 114.
[0055] As further indicated by FIG. 1, the local server device 102 is located at or near a same physical location of the sequencing device 108. Indeed, in some embodiments, the local server device 102 and the sequencing device 108 are integrated into a same computing device or are part of the sequencing device 108, as indicated by dotted lines 122. The local server device 102 may run a sequencing system 104 to generate, receive, analyze, store, and transmit digital data, such as by receiving base-call data or determining indexing sequence data or filter metric data based on analyzing such base-call data. As shown in FIG. 1, the sequencing device 108 may send (and the local server device 102 may receive) base-call data generated during a sequencing run of the sequencing device 108. By executing software in the form of the sequencing system 104, the local server device 102 may estimate read-coverage levels for genomic samples in a pool of genomic samples. The local server device 102 may also communicate with the client device 114. In particular, the local server device 102 can send data to the client device 114, including read-coverage information for genomic samples, filter metric data, estimated read-coverage levels, a variant call file (VCF), or other information indicating nucleobase calls, genotype calls, sequencing metrics, error data, or other metrics.
[0056] As further indicated by FIG. 1, the server device(s) 110 are located remotely from the local server device 102 and the sequencing device 108. The sequencing device 108 may send (and the server device(s) 110 may receive) base-call data from the sequencing device 108. The server device(s) 110 may also communicate with the client device 114. In particular, the server device(s) 110 can send data to the client device 114, including estimated read-coverage levels for target genomic regions within genomic samples, VCFs, or other sequencing related information.
[0057] In some embodiments, the server device(s) 110 comprise a distributed collection of servers where the server device(s) 110 include a number of server devices distributed across the network 112 and located in the same or different physical locations. Further, the server device(s) 110 can comprise a content server, an application server, a communication server, a web-hosting server, or another type of server.
[0058] As further illustrated and indicated in FIG. 1, the client device 114 can generate, store, receive, and send digital data. In particular, the client device 114 can receive read-coverage data from the local server device 102 or receive sequencing metrics from the sequencing device 108. Furthermore, the client device 114 may communicate with the local server device 102 or the server device(s) 110 to receive a VCF comprising variant or genotype calls and/or other metrics, such as base-call-quality metrics or pass-filter metrics. The client device 114 can accordingly present or display information pertaining to variant calls or other genotype calls within a graphical user interface to a user associated with the client device 114. For example, the client device 114 can present a target read-coverage interface comprising elements indicating potential target read-coverage levels for genomic samples and/or target genomic sequences of the genomic samples.
[0059] Although FIG. 1 depicts the client device 114 as a desktop or laptop computer, the client device 114 may comprise various types of client devices. For example, in some embodiments, the client device 114 includes non-mobile devices, such as desktop computers or servers, or other types of client devices. In yet other embodiments, the client device 114 includes mobile devices, such as laptops, tablets, mobile telephones, or smartphones. Additional details regarding the client device 114 are discussed below with respect to FIG. 10.
[0060] As further illustrated in FIG. 1, the client device 114 includes a sequencing application 116. The sequencing application 116 may be a web application or a native application stored and executed on the client device 114 (e.g., a mobile application, desktop application). The sequencing application 116 can include instructions that (when executed) cause the client device 114 to receive data from the target-sequence-coverage system 106 and present, for display at the client device 114, data concerning read-coverage data for a sequencing run, data from a VCF, or other information. Furthermore, the sequencing application 116 can instruct the client device 114 to display graphical user interfaces for receiving input indicating a target genomic sequence and a target read-coverage level for the target genomic sequence.
[0061] As further illustrated in FIG. 1, a version of the target-sequence-coverage system 106 may be located on the client device 114 as part of the sequencing application 116. Accordingly, in some embodiments, the target-sequence-coverage system 106 is implemented by (e.g., located entirely or in part on) the client device 114. In yet other embodiments, the target-sequence-coverage system 106 is implemented by one or more other components of the computing system 100, such as the server device(s) 110. In particular, the target-sequence-coverage system 106 can be implemented in a variety of different ways across the local server device 102, the sequencing device 108, the client device 114, and the server device(s) 110. For example, the target-sequence-coverage system 106 can be downloaded from the server device(s) 110 to the local server device 102 and/or the client device 114 where all or part of the functionality of the target-sequence-coverage system 106 is performed at each respective device within the computing system 100.
[0062] Aspects of the present disclosure relate generally to devices, systems, and methods providing biological or chemical analysis. Various protocols in biological or chemical research involve performing a large number of controlled reactions on local support surfaces or within predefined reaction chambers. The designated reactions may then be observed or detected, and subsequent analysis may help identify or reveal properties of chemicals involved in the reaction. For example, in some multiplex assays, an unknown analyte having an identifiable label (e.g., fluorescent label) may be exposed to thousands of known probes under controlled conditions. Each known probe may be deposited into a corresponding well of a flow cell channel. Observing any chemical reactions that occur between the known probes and the unknown analyte within the wells may help identify or reveal properties of the analyte. Other examples of such protocols include known DNA sequencing processes, such as sequencing-by-synthesis (SBS) or cyclic-array sequencing.
[0063] While a variety of devices, systems, and methods have been made and used to perform biological or chemical analysis, it is believed that no one prior to the inventor(s) has made or used the devices and techniques described herein.
[0064] As mentioned, the target-sequence-coverage system 106 can generate an adjusted number of sequencing cycles sufficient to satisfy a target read-coverage level of a target genomic region. FIG. 2 illustrates an overview of the target-sequence-coverage system 106 generating an adjusted number of sequencing cycles and executing a sequencing run based on the adjusted number of sequencing cycles in accordance with one or more implementations of the present disclosure. By way of overview, the target-sequence-coverage system 106 can identify a target genomic region and determine a target read-coverage level for the target genomic region. The target-sequence-coverage system 106 can further provisionally map nucleotide reads to a reference sequence corresponding to the target genomic region and estimate read-coverage levels for the target genomic region. The target-sequence-coverage system 106 can generate an adjusted number of sequencing cycles based on the estimated read-coverage levels and execute a sequencing run based on the adjusted number of sequencing cycles.
[0065] As shown in FIG. 2, the target-sequence-coverage system 106 performs an act 202 of receiving data input identifying a target genomic region. For example, in some embodiments, the target-sequence-coverage system 106 receives a user indication of a target genomic region. More specifically, the target genomic region comprises genomic coordinates or a region of one or more genomic samples that is the focus of sequencing. For instance, a target genomic region can be identified as part of a target array that focuses on a panel of genes, exons, or other genomic regions of interest. Target arrays can be useful in targeted gene sequencing, exome sequencing, custom panels for disease research, epigenetic studies, and other applications. As shown in FIG. 2, the target-sequence-coverage system 106 receives data input identifying a target genomic region 216 for a genomic sample 214. In some implementations, the target-sequence-coverage system 106 identifies one or more target genomic regions for which the target-sequence-coverage system 106 estimates read-coverage levels and generates an adjusted number of sequencing cycles.
[0066] FIG. 2 further illustrates the target-sequence-coverage system 106 performing an act 204 of determining a target read-coverage level for the target genomic region. In some embodiments, the target-sequence-coverage system 106 automatically determines a target read-coverage level for the target genomic region across samples. For example, the target-sequence-coverage system 106 may predetermine a target read-coverage level of 20x, 30x, 40x, 50x, 60x, or more. In some implementations, the target-sequence-coverage system 106 determines the target read-coverage level based on receiving data input. For instance, the target-sequence-coverage system 106 can receive an indication of a desired target read-coverage.
[0067] In some implementations, the target-sequence-coverage system 106 determines multiple target read-coverage levels. In particular, the target-sequence-coverage system 106 may identify or determine different target read-coverage levels for different target genomic regions and/or different target read-coverage levels for different genomic samples. For example, the target-sequence-coverage system 106 may determine higher or lower target read-coverage levels for different genomic samples.
[0068] As further illustrated in FIG. 2, the target-sequence-coverage system 106 can perform an act 206 of provisionally mapping nucleotide reads to a reference sequence corresponding to the target genomic region. In particular, the target-sequence-coverage system 106 provisionally maps nucleotide reads of the genomic samples to a reference sequence after an initial set of sequencing cycles and during the sequencing run. Generally, the initial set of sequencing cycles comprises enough sequencing cycles such that the nucleotide read is long enough for the target-sequence-coverage system 106 to provisionally map the nucleotide read. In some examples, the target-sequence-coverage system 106 automatically determines the initial set of sequencing cycles. For instance, the target-sequence-coverage system 106 can determine that the initial set of sequencing cycles comprises 15, 20, 32, 50, or another number of sequencing cycles. In some embodiments, the target-sequence-coverage system 106 determines the number of sequencing cycles in the initial set of sequencing cycles based on user input. For example, the target-sequence-coverage system 106 can receive a user indication of a desired number of initial sequencing cycles.
[0069] After the initial set of sequencing cycles, and as part of performing the act 206, the target-sequence-coverage system 106 provisionally maps nucleotide reads of the genomic samples to the reference sequence corresponding to the target genomic region. As shown in FIG. 2, the target-sequence-coverage system 106 provisionally maps nucleotide reads 222 to a reference genome 220 after an initial set of sequencing cycles. Based on mapping the nucleotide reads 222 to the reference genome, the target-sequence-coverage system 106 can identify nucleotide reads that are provisionally mapped to a reference sequence corresponding to the target genomic region 218. As shown in FIG. 2, the target-sequence-coverage system 106 maps reads originating from all samples to the reference genome 220.
[0070] In some implementations, the target-sequence-coverage system 106 maps the nucleotide reads 222 to the reference genome 220 in real time during the sequencing run. Real-time mapping of nucleotide reads before completion of a sequencing run is described by U.S. Pat. No. 11,646,102 B2, the disclosure of which is incorporated herein by reference in its entirety. For instance, the target-sequence-coverage system 106 can perform a secondary analysis iteratively while nucleotide reads (also called sequence reads) are generated by the target-sequence-coverage system 106 or other sequencing system. By performing secondary analysis, the target-sequence-coverage system 106 can align nucleotide reads to the reference sequence and, based on such alignment, detect differences between the nucleotide reads of a genomic sample and a reference genome. More specifically, the target-sequence-coverage system 106 can receive imaging data for sequencing cycles and determine whether a certain number of minimum sequencing cycles have been performed. Based on determining that the minimum sequencing cycles have been performed, the target-sequence-coverage system 106 can align nucleotide reads to a reference genome. Based on determining that there are more nucleotides to be read, the target-sequence-coverage system 106 can repeat the process of accessing additional imaging data and mapping nucleotide reads from the minimum sequencing cycles to the reference genome. The target-sequence-coverage system 106 can iteratively repeat this process until all sequencing cycles are complete. FIG. 4 and the corresponding discussion further detail how the target-sequence-coverage system 106 provisionally maps nucleotide reads to the reference sequence in accordance with one or more embodiments of the present disclosure.
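The iterative loop described above (accumulate base calls, wait for a minimum number of cycles, then map and repeat until the run ends) can be sketched in Python as follows. This is a toy illustration under stated assumptions: exact substring matching stands in for a real aligner, and the names `find_exact_matches`, `map_during_run`, and `base_calls_per_cycle` are hypothetical rather than taken from this disclosure or the incorporated reference.

```python
from typing import Dict, Iterator, List, Tuple


def find_exact_matches(prefix: str, reference: str) -> List[int]:
    """Return every 0-based offset at which `prefix` occurs in `reference`.

    Exact matching is a stand-in for a real aligner (assumption).
    """
    offsets = []
    start = reference.find(prefix)
    while start != -1:
        offsets.append(start)
        start = reference.find(prefix, start + 1)
    return offsets


def map_during_run(
    base_calls_per_cycle: List[Dict[str, str]],
    reference: str,
    min_cycles: int,
) -> Iterator[Tuple[int, Dict[str, List[int]]]]:
    """Yield (cycle number, provisional mappings) after each cycle once
    `min_cycles` cycles have been performed.

    `base_calls_per_cycle[i]` maps a read identifier to the base called for
    that read in cycle i + 1.
    """
    reads: Dict[str, str] = {}
    for cycle, calls in enumerate(base_calls_per_cycle, start=1):
        # Extend each partial read with the base called in this cycle.
        for read_id, base in calls.items():
            reads[read_id] = reads.get(read_id, "") + base
        if cycle < min_cycles:
            continue  # reads are still too short to map
        yield cycle, {
            read_id: find_exact_matches(seq, reference)
            for read_id, seq in reads.items()
        }
```

Because mappings are recomputed on the growing reads at every cycle, a read that matches several locations early in the run can resolve to fewer candidate locations as more bases are called.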
[0071] As further shown in FIG. 2, the target-sequence-coverage system 106 may also perform the act 208 of estimating, during the sequencing run, read-coverage levels of the target genomic region. As part of performing the act 208, the target-sequence-coverage system 106 identifies nucleotide reads or clusters of oligonucleotides belonging to respective genomic samples. The target-sequence-coverage system 106 can determine an estimated read-coverage level for each genomic sample based on the number of nucleotide reads provisionally mapped to the reference sequence (or “# of target region nucleotide reads”) and a currently selected number of sequencing cycles for the sequencing run. In one embodiment, the target-sequence-coverage system 106 determines an estimated read-coverage level for each genomic sample by estimating a number of nucleotide reads of a genomic sample for each of the initial set of sequencing cycles. FIG. 5 and the corresponding paragraph provide additional detail regarding how the target-sequence-coverage system 106 estimates read-coverage levels in accordance with one or more embodiments of the present disclosure. For example, in some implementations, the target-sequence-coverage system 106 performs the act 208 by utilizing the following equation:
$$\frac{\text{\# of target region nucleotide reads}}{\text{\# of initial sequencing cycles}} \times \text{Currently selected \# of sequencing cycles}$$
Where the # of target region nucleotide reads represents an estimated number of nucleotide reads of a genomic sample based on the nucleotide reads provisionally mapped to the reference sequence, the # of initial sequencing cycles represents a number of sequencing cycles in the initial set of sequencing cycles, and the currently selected # of sequencing cycles represents the currently selected number of sequencing cycles.
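As a rough illustration of this extrapolation, the following Python sketch scales the number of target region nucleotide reads by the ratio of currently selected cycles to initial cycles. The function and parameter names are hypothetical, and the numeric values in the usage note are invented for illustration.

```python
def estimate_read_coverage(
    target_region_reads: float,
    initial_cycles: int,
    selected_cycles: int,
) -> float:
    """Extrapolate a read-coverage level for the currently selected run length.

    Implements (# of target region nucleotide reads / # of initial sequencing
    cycles) x currently selected # of sequencing cycles.
    """
    if initial_cycles <= 0:
        raise ValueError("initial_cycles must be positive")
    if selected_cycles < 0:
        raise ValueError("selected_cycles must not be negative")
    return target_region_reads / initial_cycles * selected_cycles
```

For instance, with 8 target region nucleotide reads after 32 initial cycles and 150 currently selected cycles, `estimate_read_coverage(8, 32, 150)` evaluates to 37.5.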
[0072] As further illustrated in FIG. 2, the target-sequence-coverage system 106 performs an act 210 of generating an adjusted number of sequencing cycles sufficient to satisfy the target read-coverage level. The target-sequence-coverage system 106 can generate an adjusted number for each genomic sample of the genomic samples. In particular, if the target-sequence-coverage system 106 determines that the estimated read-coverage levels of the target genomic region are lower or significantly higher than the target read-coverage levels, the target-sequence-coverage system 106 can generate an adjusted number of sequencing cycles. For instance, in some implementations, the target-sequence-coverage system 106 can generate an adjusted number of sequencing cycles that is lower than the currently selected number of sequencing cycles or a preset number of sequencing cycles to avoid over-sequencing one or more genomic samples. In another example, the target-sequence-coverage system 106 generates an adjusted number of sequencing cycles that is higher than the currently selected number of sequencing cycles or a preset number of sequencing cycles to satisfy the target read-coverage level.
[0073] As shown in FIG. 2, in some implementations, the target-sequence-coverage system 106 performs the act 210 and generates an adjusted number of sequencing cycles by utilizing the following equation:
$$N_{\text{cyc}} \times \frac{\text{\# of target region nucleotide reads}}{\text{\# of initial sequencing cycles}} = \text{target read-coverage level}$$
Where Ncyc represents the adjusted number of sequencing cycles, # of target region nucleotide reads represents the number of nucleotide reads of a genomic sample provisionally mapped to the reference sequence corresponding to the target genomic region, and # of initial sequencing cycles represents the number of sequencing cycles in the initial set of sequencing cycles.
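Solving the relation above for Ncyc gives Ncyc = target read-coverage level × (# of initial sequencing cycles) / (# of target region nucleotide reads). The Python sketch below applies that rearrangement. The function name is hypothetical, and rounding up to a whole cycle is an assumption made here because a run executes an integer number of cycles; the disclosure does not specify a rounding rule.

```python
import math


def adjusted_cycle_count(
    target_coverage: float,
    target_region_reads: float,
    initial_cycles: int,
) -> int:
    """Solve Ncyc * (reads / initial_cycles) = target_coverage for Ncyc.

    The result is rounded up to a whole cycle (assumption).
    """
    if target_region_reads <= 0:
        raise ValueError("no reads mapped to the target region; cannot extrapolate")
    if initial_cycles <= 0:
        raise ValueError("initial_cycles must be positive")
    return math.ceil(target_coverage * initial_cycles / target_region_reads)
```

With invented values of 8 target region nucleotide reads after 32 initial cycles, a target read-coverage level of 45 yields 180 cycles. A sample with fewer mapped reads needs proportionally more cycles to reach the same target.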
[0074] As mentioned previously, in some examples, the target-sequence-coverage system 106 determines a plurality of target read-coverage levels for different target genomic regions and/or different genomic samples. In such instances, the target-sequence-coverage system 106 may generate a plurality of adjusted numbers of sequencing cycles sufficient to satisfy the plurality of target read-coverage levels. In some implementations, the target-sequence-coverage system 106 selects an adjusted number of sequencing cycles from the plurality of adjusted numbers of sequencing cycles. For example, the target-sequence-coverage system 106 may select the highest adjusted number of sequencing cycles from the plurality of adjusted numbers of sequencing cycles. In another example, the target-sequence-coverage system 106 selects the highest adjusted number of sequencing cycles that is below a maximum number of cycles within a sequencing run.
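The two selection rules just described can be sketched in Python as shown below. The function name `select_adjusted_cycles` is hypothetical. Treating a candidate equal to the maximum as acceptable, and returning `None` when no candidate fits, are assumptions made for this sketch.

```python
from typing import Iterable, Optional


def select_adjusted_cycles(
    candidates: Iterable[int],
    max_cycles: Optional[int] = None,
) -> Optional[int]:
    """Pick the highest adjusted cycle count, optionally capped by a run maximum.

    Returns None when no candidate satisfies the cap (assumption).
    """
    values = list(candidates)
    if max_cycles is not None:
        # A candidate equal to the maximum is allowed here (assumption).
        values = [count for count in values if count <= max_cycles]
    return max(values) if values else None
```

Choosing the highest count means the region or sample that needs the most cycles determines the run length; the cap keeps that choice within what the run can support.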
[0075] In some embodiments, the target-sequence-coverage system 106 includes buffer sequencing cycles in the adjusted number of sequencing cycles. Generally, the target-sequence-coverage system 106 can determine to perform a predetermined number of buffer sequencing cycles to ensure that the target read-coverage levels will be met. For example, the target-sequence-coverage system 106 can perform a threshold number of buffer sequencing cycles on top of target sequencing cycles predicted to meet the target read-coverage level. The threshold number of buffer sequencing cycles can be a percentage of the target sequencing cycles. For instance, the target-sequence-coverage system 106 can determine that the threshold number of buffer sequencing cycles equals 10% of the target sequencing cycles.
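A minimal Python sketch of the percentage-based buffer follows. The function name and the choice to round the buffer up to a whole cycle are assumptions for illustration; integer arithmetic is used to avoid floating-point rounding surprises.

```python
def cycles_with_buffer(target_cycles: int, buffer_percent: int = 10) -> int:
    """Add buffer cycles equal to `buffer_percent` percent of `target_cycles`.

    The buffer is rounded up to a whole cycle (assumption).
    """
    if target_cycles < 0 or buffer_percent < 0:
        raise ValueError("arguments must not be negative")
    # Ceiling division on integers: -(-a // b) == ceil(a / b) for b > 0.
    buffer_cycles = -(-target_cycles * buffer_percent // 100)
    return target_cycles + buffer_cycles
```

For example, 160 target sequencing cycles with a 10% buffer gives 176 cycles, and 183 target sequencing cycles gives 202 cycles because the 18.3-cycle buffer is rounded up to 19.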
[0076] As further illustrated in FIG. 2, the target-sequence-coverage system 106 performs an act 212 of executing the sequencing run. In some implementations, the target-sequence-coverage system 106 executes the sequencing run according to the adjusted number of sequencing cycles. In some embodiments, the target-sequence-coverage system 106 executes the sequencing run based on the highest adjusted number of sequencing cycles for a genomic sample. Thus, the target-sequence-coverage system 106 can ensure that the genomic sample having the lowest estimated read-coverage level is sequenced to the target read-coverage level. FIG. 6 and the corresponding discussion provide additional details on how the target-sequence-coverage system 106 executes the sequencing run in accordance with one or more implementations of the present disclosure.
[0077] In some implementations, the target-sequence-coverage system 106 generates and executes an adjusted number of sequencing cycles sufficient to satisfy the target read-coverage level for a given genomic sample. For example, in some implementations, the target-sequence-coverage system 106 performs the act 208 and estimates read-coverage levels of the target genomic region for all of the genomic samples. The target-sequence-coverage system 106 can improve efficiency by generating an adjusted number of sequencing cycles sufficient to satisfy the target read-coverage level for the genomic sample having the lowest estimated read-coverage levels. In one or more additional embodiments, the target-sequence-coverage system 106 generates adjusted numbers of sequencing cycles sufficient to satisfy target read-coverage levels for all genomic samples.
[0078] In some implementations, the target-sequence-coverage system 106 generates and executes an adjusted number of sequencing cycles sufficient to satisfy the target read-coverage level for a given target genomic region. For example, in some implementations, the target-sequence-coverage system 106 performs the act 208 by estimating read-coverage levels for all target genomic regions. The target-sequence-coverage system 106 can generate an adjusted number of sequencing cycles sufficient to satisfy the target read-coverage level for the target genomic region having the lowest estimated read-coverage levels. In one or more additional embodiments, the target-sequence-coverage system 106 generates adjusted numbers of sequencing cycles sufficient to satisfy target read-coverage levels for all target genomic regions.
[0079] As previously mentioned, in some implementations, the target-sequence-coverage system 106 provisionally re-maps nucleotide reads of the genomic samples to the reference sequence and estimates an updated read-coverage level. By re-mapping the nucleotide reads and estimating an updated read-coverage level, the target-sequence-coverage system 106 can more accurately predict whether the adjusted number of sequencing cycles is sufficient to satisfy the target read-coverage level across genomic samples. FIG. 3 illustrates an overview of the target-sequence-coverage system 106 estimating an updated read-coverage level in accordance with one or more implementations of the present disclosure. By way of overview, the target-sequence-coverage system 106 determines a checkpoint sequencing cycle at which to perform the provisional re-mapping of the nucleotide reads. The target-sequence-coverage system 106 can provisionally re-map nucleotide reads to the reference sequence and estimate an updated read-coverage level based on the provisionally re-mapped nucleotide reads.
[0080] FIG. 3 illustrates the target-sequence-coverage system 106 performing an act 302 of determining a checkpoint sequencing cycle. The target-sequence-coverage system 106 can determine a checkpoint sequencing cycle within a threshold number of sequencing cycles before a last sequencing cycle of the adjusted number of sequencing cycles. As shown in FIG. 3, for instance, the target-sequence-coverage system 106 can use the following equation to determine the checkpoint sequencing cycle:
$$N_{cyc} - \text{Threshold Number} = \text{Checkpoint sequencing cycle}$$
where $N_{cyc}$ represents the adjusted number of sequencing cycles and “Threshold Number” represents a threshold number of sequencing cycles before a last sequencing cycle of the adjusted number of sequencing cycles.
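The checkpoint relation above reduces to a single subtraction. The following is a minimal Python sketch; the names `checkpoint_cycle`, `adjusted_cycles`, and `threshold_cycles` are hypothetical and do not appear in the disclosure, and rejecting a checkpoint that falls before the first cycle is an assumption made only for illustration.

```python
def checkpoint_cycle(adjusted_cycles: int, threshold_cycles: int) -> int:
    """Return Ncyc - Threshold Number, the cycle at which reads are re-mapped."""
    if threshold_cycles < 0:
        raise ValueError("threshold_cycles must be non-negative")
    checkpoint = adjusted_cycles - threshold_cycles
    if checkpoint < 1:
        # Assumption: a checkpoint before the first cycle is treated as invalid.
        raise ValueError("threshold exceeds the adjusted number of cycles")
    return checkpoint
```

For instance, with 150 adjusted cycles and a threshold of 20 cycles, the sketch places the checkpoint at cycle 130.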
[0081] In some implementations, the target-sequence-coverage system 106 automatically determines the threshold number of sequencing cycles before the last sequencing cycle of the adjusted number of sequencing cycles. For example, the target-sequence-coverage system 106 can automatically determine that the threshold number equals 5, 10, 20, 50, etc. sequencing cycles before the last sequencing cycle of the adjusted number of sequencing cycles. In another embodiment, the target-sequence-coverage system 106 determines the threshold number of sequencing cycles based on user input. For instance, the target-sequence-coverage system 106 may receive a user indication of a desired threshold number of sequencing cycles.
[0082] As further illustrated in FIG. 3, the target-sequence-coverage system 106 performs an act 304 of provisionally re-mapping nucleotide reads to the reference sequence. As shown, the target-sequence-coverage system 106 provisionally re-maps, at or before the checkpoint sequencing cycle, nucleotide reads of the genomic samples to the reference sequence corresponding to a target genomic region 310. FIG. 3 illustrates nucleotide reads 312 that have previously been mapped to a reference genome 308. The target-sequence-coverage system 106 maps the nucleotide reads 312 after the set of initial sequencing cycles. Nucleotide reads 314 shown in FIG. 3 comprise nucleotide reads not mapped previously to the target genomic region 310. At the checkpoint sequencing cycle, the target-sequence-coverage system 106 maps the nucleotide reads 314 to the target genomic region 310.
[0083] As further shown in FIG. 3, the target-sequence-coverage system 106 performs an act 306 of estimating an updated read-coverage level. The target-sequence-coverage system 106 estimates the updated read-coverage level at or after the checkpoint sequencing cycle. More specifically, the updated read-coverage level indicates a predicted read-coverage level for the target genomic region of the genomic samples after execution of the adjusted number of sequencing cycles. The target-sequence-coverage system 106 estimates, for the target genomic region of the genomic samples, an updated read-coverage level based on the nucleotide reads of the genomic samples provisionally re-mapped to the reference sequence. In some implementations, the target-sequence-coverage system 106 estimates the updated read-coverage level based on the following equation:
$$\text{Updated Read-Coverage Level} = \frac{\text{\# of Remapped Nucleotide Reads}}{\text{\# of Cycles at Checkpoint}} \times \text{Adjusted \# of Sequencing Cycles} + \text{N Sequencing Cycles}$$
[0084] Where the # of Remapped Nucleotide Reads represents a number of nucleotide reads of the genomic samples provisionally re-mapped to the reference sequence, # of Cycles at Checkpoint represents a number of total sequencing cycles from the beginning of the sequencing run to the checkpoint sequencing cycle, and Adjusted # of Sequencing Cycles represents the number of sequencing cycles within the adjusted number of sequencing cycles. In some implementations, the target-sequence-coverage system 106 further estimates the updated read-coverage level based on a margin number of sequencing cycles represented by N Sequencing Cycles. N Sequencing Cycles comprises a predetermined number of sequencing cycles that accounts for some variability in coverage gained between sequencing cycles. For example, the target-sequence-coverage system 106 may automatically determine the margin number of sequencing cycles N Sequencing Cycles. In some examples, the target-sequence-coverage system 106 receives the margin number of sequencing cycles as input from a client device.
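The updated read-coverage equation can be illustrated with a short Python sketch. All function and parameter names below are hypothetical. The sketch adds the margin term after the product, which is how the equation reads as printed; whether the margin instead belongs inside the cycle count cannot be recovered from the text, so this grouping is an assumption.

```python
def updated_read_coverage(remapped_reads: int,
                          cycles_at_checkpoint: int,
                          adjusted_cycles: int,
                          margin_cycles: int = 0) -> float:
    """Extrapolate coverage at the checkpoint to the end of the adjusted run."""
    if cycles_at_checkpoint <= 0:
        raise ValueError("cycles_at_checkpoint must be positive")
    # Re-mapped reads gained per sequencing cycle up to the checkpoint.
    reads_per_cycle = remapped_reads / cycles_at_checkpoint
    # Assumption: the margin (N Sequencing Cycles) is added outside the product.
    return reads_per_cycle * adjusted_cycles + margin_cycles
```

As an example, 1,300 re-mapped reads at a checkpoint of 130 cycles, extrapolated to 150 adjusted cycles with a margin of 5, gives 1505.0 under this reading.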
[0085] The target-sequence-coverage system 106 can compare the updated read-coverage levels for the genomic samples to the target read-coverage level. The target-sequence-coverage system 106 can perform various actions based on comparing the updated read-coverage levels and the target read-coverage level. For example, in some implementations, the target-sequence-coverage system 106 generates an updated adjusted number of sequencing cycles and executes the sequencing run according to the updated adjusted number of sequencing cycles. In particular, the target-sequence-coverage system 106 may perform sequencing cycles until finishing a safety threshold number of sequencing cycles comprising the updated adjusted number of sequencing cycles and a buffer number of sequencing cycles based on the updated read-coverage level. In other implementations, the target-sequence-coverage system 106 can determine that the adjusted number of sequencing cycles is sufficient to meet the target read-coverage level and execute the sequencing run according to the adjusted number of sequencing cycles. FIG. 6 and the corresponding paragraphs detail various actions that the target-sequence-coverage system 106 can perform based on comparing the updated read-coverage levels with the target read-coverage level.
[0086] As mentioned previously, the target-sequence-coverage system 106 provisionally maps nucleotide reads of genomic samples to a reference sequence. FIG. 4 illustrates the target-sequence-coverage system 106 provisionally mapping nucleotide reads 408a - 408e to a reference sequence in accordance with one or more embodiments of the present disclosure. FIG. 4 illustrates a reference genome 402 comprising a target genomic region 404 and adjacent genomic regions 406a - 406b. FIG. 4 illustrates how the target-sequence-coverage system 106 can identify nucleotide reads that are provisionally mapped to the target genomic region 404 or near the target genomic region.
[0087] The target-sequence-coverage system 106 can identify read mates of paired-end reads that are provisionally mapped to the reference genome 402. FIG. 4 illustrates the target-sequence-coverage system 106 mapping various paired-end nucleotide reads to the reference genome 402. For example, as shown in FIG. 4, nucleotide reads 408a - 408b represent a pair of paired-end reads comprising a first read mate and a second read mate and nucleotide reads 408c - 408d represent a different pair of paired-end reads comprising a first read mate and a second read mate that have been provisionally mapped to the reference genome 402. In some implementations, the target-sequence-coverage system 106 utilizes sequencing methods that index and begin determining base calls for both read mates early in a sequencing cycle. Accordingly, the target-sequence-coverage system 106 can access and provisionally map both read mates in a paired-end read to the reference genome 402 after the initial set of sequencing cycles. For instance, the target-sequence-coverage system 106 can access and provisionally map both read mates of the nucleotide reads 408a - 408b and nucleotide reads 408c - 408d after an initial set of sequencing cycles.
[0088] In some examples, the target-sequence-coverage system 106 can provisionally map a single read to the target genomic region 404. For instance, the target-sequence-coverage system 106 may map a single nucleotide read resulting from single-end sequencing or a first read mate from paired-end sequencing. For example, some sequencing systems generate base calls for a second read mate only after determining nucleotide reads for a first read mate. In another example, sequencing systems index a first read mate before indexing a second read mate, so the target-sequence-coverage system 106 cannot access information identifying a second read mate from paired-end sequencing. Accordingly, the target-sequence-coverage system 106 can provisionally map a first read mate. As shown in FIG. 4, nucleotide reads 408e - 408f can comprise nucleotide reads from single-end sequencing or a first read mate (e.g., R1) of paired-end nucleotide reads.
[0089] As further shown in FIG. 4, the target-sequence-coverage system 106 can identify nucleotide reads that are provisionally mapped to the target genomic region 404 within the reference genome 402. More specifically, the target-sequence-coverage system 106 may determine locations of nucleotide reads of the genomic samples within the reference sequence corresponding to the target genomic region. For instance, the target-sequence-coverage system 106 provisionally maps the nucleotide reads 408a - 408b to the target genomic region 404. Additionally, the target-sequence-coverage system 106 can map a first read mate from paired-end nucleotide reads to the target genomic region 404. For example, the target-sequence-coverage system 106 can map the nucleotide read 408d to the target genomic region 404 while mapping its corresponding second read mate, the nucleotide read 408c, to a segment of the reference genome 402 that does not overlap with the target genomic region 404.
[0090] Based on such a provisional mapping, the target-sequence-coverage system 106 identifies nucleotide reads that are mapped near the target genomic region that will likely contribute to the coverage of the target genomic region 404. In some implementations, the target-sequence-coverage system 106 provisionally maps nucleotide reads to an adjacent genomic region within a threshold number of nucleobases of the target genomic region. FIG. 4 illustrates adjacent genomic regions 406a - 406b that surround the target genomic region 404. In some embodiments, the target-sequence-coverage system 106 automatically determines a threshold number of nucleobases for the adjacent genomic regions. For instance, the target-sequence-coverage system 106 can automatically determine that the threshold number of nucleobases equals 25, 50, 100, etc. nucleobases. In some implementations, the target-sequence-coverage system 106 determines the threshold number of nucleobases for the adjacent genomic regions based on user input. In other implementations, the target-sequence-coverage system 106 determines the threshold number of nucleobases based on the number of sequencing cycles in the initial set of sequencing cycles. A higher number of initial sequencing cycles may indicate that nucleotide reads are nearer to sequencing completion and, thus, close to their final read length. Accordingly, the target-sequence-coverage system 106 can determine lower threshold numbers of nucleobases for higher numbers of initial sequencing cycles. As shown in FIG. 4, the nucleotide read 408e is provisionally mapped to the adjacent genomic region 406b. Nucleotide reads that provisionally map to the adjacent genomic regions 406a - 406b may provide additional coverage for the target genomic region.
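The distinction between reads mapped to the target genomic region, reads mapped to an adjacent genomic region, and all other reads can be sketched as a simple interval test. The Python below is illustrative only: the function name, the half-open coordinate convention, and the three labels are assumptions and are not taken from the disclosure.

```python
def classify_read(read_start: int, read_end: int,
                  target_start: int, target_end: int,
                  threshold_bases: int) -> str:
    """Label a provisionally mapped read using half-open intervals [start, end)."""
    # The read overlaps the target region itself.
    if read_start < target_end and read_end > target_start:
        return "target"
    # Distance in nucleobases between the read and the nearest target edge.
    if read_end <= target_start:
        gap = target_start - read_end
    else:
        gap = read_start - target_end
    # The read overlaps an adjacent region of width threshold_bases.
    return "adjacent" if gap < threshold_bases else "outside"
```

With a target spanning positions 1000 to 2000 and a threshold of 50 nucleobases, a read at 2030 to 2060 falls in the downstream adjacent region, while a read ending at position 950 does not reach the upstream adjacent region.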
[0091] In some implementations, the target-sequence-coverage system 106 can estimate read-coverage levels of a target genomic region based on the mapped locations and read-growth directions of the nucleotide reads of the genomic samples. Generally, the target-sequence-coverage system 106 can evaluate the location of nucleotide reads within or near the target genomic region 404 and whether the read direction is toward (or away from) the center of the target genomic region 404 to determine if those nucleotide reads will contribute to coverage of the target genomic region 404. The target-sequence-coverage system 106 determines that nucleotide reads that are in closer proximity to the target genomic region 404 are more likely to affect the coverage of the target genomic region 404. In some embodiments, and as illustrated in FIG. 4, the target-sequence-coverage system 106 determines read-growth directions of nucleotide reads as part of provisionally mapping nucleotide reads to the target genomic region 404.
[0092] The target-sequence-coverage system 106 can determine read-growth directions of nucleotide reads growing upstream or downstream with respect to the target genomic region. For example, the left side of the reference genome 402 can represent the 5’ end, and the right side of the reference genome 402 can represent the 3’ end. In some embodiments, the target-sequence-coverage system 106 determines that the read-growth directions of one or more of the nucleotide reads 408b, 408d, and 408e are upstream based on the nucleotide reads growing toward the 5’ end. For example, the target-sequence-coverage system 106 determines that the nucleotide read 408e is located downstream relative to the target genomic region 404. Though the nucleotide read 408e is not provisionally mapped to the target genomic region 404, the target-sequence-coverage system 106 can determine that the read-growth direction of the nucleotide read 408e is upstream and pointed toward the center of the target genomic region 404. In contrast, the target-sequence-coverage system 106 determines that while the nucleotide read 408f is in a similar location as the nucleotide read 408e, the read-growth direction of the nucleotide read 408f is opposite to the center of the target genomic region 404. Accordingly, the target-sequence-coverage system 106 may determine that the nucleotide read 408f is unlikely to influence coverage of the target genomic region 404.
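The read-growth reasoning above can be sketched as a comparison between a read's position and the center of the target genomic region. The following Python is a hedged illustration: the function name and the string labels are hypothetical, and it assumes coordinates increase from the 5’ end to the 3’ end, matching the left-to-right orientation described for FIG. 4.

```python
def grows_toward_target(read_position: int, growth_direction: str,
                        target_start: int, target_end: int) -> bool:
    """Return True when a read grows toward the center of the target region."""
    center = (target_start + target_end) / 2
    if growth_direction == "upstream":
        # Upstream growth moves toward the 5' end (lower coordinates here),
        # so it approaches the center only from the downstream side.
        return read_position > center
    if growth_direction == "downstream":
        # Downstream growth moves toward the 3' end (higher coordinates here).
        return read_position < center
    raise ValueError("growth_direction must be 'upstream' or 'downstream'")
```

Under these assumptions, a read located downstream of the target and growing upstream (like the nucleotide read 408e) returns True, and a read at the same location growing downstream (like the nucleotide read 408f) returns False.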
[0093] As mentioned, the target-sequence-coverage system 106 estimates read-coverage levels of the target genomic region within one or more genomic samples. In accordance with one or more embodiments of the present disclosure, FIG. 5 illustrates the target-sequence-coverage system 106 identifying clusters of oligonucleotides provisionally mapped to a reference sequence corresponding with a target genomic region and identifying genomic samples to which the nucleotide sequence reads belong.
[0094] In some embodiments, as part of estimating read-coverage levels for individual genomic samples, the target-sequence-coverage system 106 determines an estimated total number of clusters belonging to each genomic sample. As illustrated in FIG. 5, the target-sequence-coverage system 106 performs an act 501 of determining filtered clusters corresponding to the target genomic region. The target-sequence-coverage system 106 can identify clusters of oligonucleotides satisfying a filtering threshold for signals of the subset of clusters of oligonucleotides. More specifically, the target-sequence-coverage system 106 determines quality metrics 520 for the subset of clusters of oligonucleotides. To illustrate, during the initial set of sequencing cycles, the target-sequence-coverage system 106 captures images of clusters of oligonucleotides within a flow cell region and evaluates the signals emitted from the clusters of oligonucleotides to determine the quality metrics 520. In some embodiments, the target-sequence-coverage system 106 can determine quality metrics 520 for clusters of oligonucleotides up to an evaluation cycle of the initial set of sequencing cycles. For example, the target-sequence-coverage system 106 can determine the quality metrics 520 for each sequencing cycle up to the 25th cycle comprising the evaluation cycle of the sequencing run.
[0095] In some embodiments, the quality metrics 520 comprise a chastity value. The term “chastity value” refers to a quality measurement used to assess a confidence or purity of a called nucleobase from a sequencing cycle. In particular, the chastity value is a measure of the confidence of the called base at each position within a nucleotide read. For example, the chastity value may be calculated based on the intensity of the fluorescent signals emitted from the clusters of oligonucleotides. The target-sequence-coverage system 106 measures the intensity of each of the four nucleotide-specific fluorescent signals (e.g., within two channels). The target-sequence-coverage system 106 may determine the chastity value by determining a ratio of the brightest base intensity divided by the sum of the brightest and second brightest base intensities. In some examples, the target-sequence-coverage system 106 can report the chastity value as a percent value ranging from 0%-100%.
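The chastity ratio described above can be written out directly. The following Python is a minimal sketch in which the function name and the list-of-intensities input are assumptions made for illustration; it is not presented as the system's implementation.

```python
from typing import Sequence


def chastity(intensities: Sequence[float]) -> float:
    """Brightest intensity divided by the sum of the two brightest intensities."""
    if len(intensities) < 2:
        raise ValueError("at least two base intensities are required")
    brightest, second = sorted(intensities, reverse=True)[:2]
    total = brightest + second
    if total <= 0:
        raise ValueError("intensities must sum to a positive value")
    return brightest / total
```

For base intensities of 80, 20, 5, and 3, the sketch returns 0.8, which corresponds to 80% when reported as a percent value.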
[0096] In some embodiments, the quality metrics 520 comprise a base-call-quality score. The term “base-call-quality score” refers to a specific score or other measurement indicating an accuracy of a nucleobase call. In particular, a base-call-quality score comprises a value indicating a likelihood that one or more predicted nucleobase calls for a genomic coordinate contain errors. For example, in certain implementations, a base-call-quality score can comprise a Q score (e.g., a Phil’s Read Editor (PhRED) quality score) predicting the error probability of any given nucleobase call. To illustrate, a base-call-quality score (or Q score) may indicate that a probability of an incorrect nucleobase call at a genomic coordinate is equal to 1 in 100 for a Q20 score, 1 in 1,000 for a Q30 score, 1 in 10,000 for a Q40 score, etc. In some cases, the base-call-quality score is generated by a machine-learning model or an algorithm, either of which can be scaled to be consistent with a PhRED scale.
[0097] In yet other embodiments, the quality metrics 520 comprise mapping-quality scores and are determined either during provisional mapping or later mapping of nucleotide reads to a reference sequence or reference genome. The term “mapping-quality score” refers to a metric or other measurement quantifying a quality or certainty of an alignment of nucleotide reads (or other nucleotide sequences or subsequences) with a reference sequence or reference genome. In some embodiments, for example, a mapping-quality score includes mapping quality (MAPQ) scores for nucleobase calls at genomic coordinates, where a MAPQ score represents -10 log10 Pr{mapping position is wrong}, rounded to the nearest integer. In the alternative to a mean or median mapping quality, in some implementations, a mapping-quality score includes a full distribution of mapping qualities for all nucleotide reads aligning with a reference genome at a genomic coordinate.
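Both the Q score and the MAPQ score described above use the same -10 log10 scale, so one pair of conversions covers them. The Python below is a sketch with hypothetical function names; it uses Python's built-in `round`, which is an assumption about how "rounded to the nearest integer" treats exact halves.

```python
import math


def phred_to_error_probability(score: float) -> float:
    """Convert a Phred-scaled score (Q score or MAPQ) to an error probability."""
    return 10.0 ** (-score / 10.0)


def error_probability_to_phred(probability: float) -> int:
    """Convert an error probability to a Phred-scaled score, rounded to an integer."""
    if not 0.0 < probability <= 1.0:
        raise ValueError("probability must be in the interval (0, 1]")
    return round(-10.0 * math.log10(probability))
```

An error probability of 1 in 1,000 maps to a score of 30, consistent with the Q30 example given above.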
[0098] The target-sequence-coverage system 106 may generate other types of quality metrics 520 for identifying the filtered clusters. For example, in some implementations, the target-sequence-coverage system 106 evaluates signal-to-noise ratio (SNR) for clusters of oligonucleotides.
[0099] As further illustrated in FIG. 5, as part of determining filtered clusters, the target-sequence-coverage system 106 can generate a pass filter map 522. In some such cases, the pass filter map 522 is limited to clusters of oligonucleotides corresponding to one or more target genomic regions. The target-sequence-coverage system 106 aggregates the quality metrics 520 for the initial set of sequencing cycles to generate the pass filter map 522 for the subset of clusters of oligonucleotides identified in the genomic sample map 518. Generally, the pass filter map 522 provides information about the outcome of quality filtering applied to the subset of clusters of oligonucleotides for the initial set of sequencing cycles. The pass filter map 522 indicates a percentage of clusters at a location that satisfy a filtering threshold over sequencing cycles up to, and in some instances including, the evaluation cycle. For example, the target-sequence-coverage system 106 determines a percent of filter-passing clusters for each cluster of the subset of clusters of oligonucleotides. For example, the target-sequence-coverage system 106 determines that across the sequencing cycles up to the evaluation cycle, 75% of the oligonucleotide cluster 512a comprise filter-passing clusters. Additionally, the target-sequence-coverage system 106 determines that up to the evaluation cycle, 93% of the oligonucleotide cluster 512b comprise filter-passing clusters. As part of generating the pass filter map 522, the target-sequence-coverage system 106 may utilize one or more filtering thresholds based on the generated base-call-quality metrics. For example, the target-sequence-coverage system 106 may utilize a chastity filtering threshold, a quality score filtering threshold, and/or a mapping-quality filtering threshold to identify clusters of oligonucleotides that satisfy the filtering threshold.
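One way to picture the pass filter map is as a per-cluster percentage of cycles, up to the evaluation cycle, whose quality metric meets a filtering threshold. The Python sketch below makes several assumptions that are not stated in the disclosure: the metric is a per-cycle chastity value, the default threshold is 0.6, the evaluation cycle is 25, and the names `pass_filter_map` and `chastity_by_cluster` are hypothetical.

```python
from typing import Dict, List


def pass_filter_map(chastity_by_cluster: Dict[str, List[float]],
                    threshold: float = 0.6,
                    evaluation_cycle: int = 25) -> Dict[str, float]:
    """Return, per cluster, the percent of evaluated cycles meeting the threshold."""
    result: Dict[str, float] = {}
    for cluster_id, values in chastity_by_cluster.items():
        # Only cycles up to and including the evaluation cycle are considered.
        window = values[:evaluation_cycle]
        if not window:
            continue  # No cycles observed for this cluster yet.
        passing = sum(1 for value in window if value >= threshold)
        result[cluster_id] = 100.0 * passing / len(window)
    return result
```

A cluster with per-cycle values of 0.9, 0.7, 0.5, and 0.8 would be reported at 75.0 percent under these assumptions.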
[0100] As shown in FIG. 5, the target-sequence-coverage system 106 performs an act 502 of determining a subset of clusters of oligonucleotides producing nucleotide reads provisionally mapped to the reference sequence. More specifically, the target-sequence-coverage system 106 can disregard clusters that did not pass filter and evaluate passing filter oligonucleotide clusters. For instance, the target-sequence-coverage system 106 determines, from filter-passing clusters, a subset of clusters of oligonucleotides provisionally mapped to the reference sequence. As mentioned, at or after an initial set of sequencing cycles, the target-sequence-coverage system 106 provisionally maps nucleotide reads to the reference genome. In at least one implementation, and as part of provisionally mapping the nucleotide reads, the target-sequence-coverage system 106 captures images of a flow cell comprising clusters of oligonucleotides. The target-sequence-coverage system 106 can estimate clusters of oligonucleotides that originate the nucleotide reads provisionally mapped to the reference sequence corresponding to the target genomic region. For example, the target-sequence-coverage system 106 determines that clusters of oligonucleotides 512a-512b within a flow cell region 510 produce nucleotide reads provisionally mapped to the reference sequence. In some implementations, the target-sequence-coverage system 106 improves efficiency of analysis by limiting its analysis to the subset of clusters of oligonucleotides provisionally mapped to a reference sequence (or mapped within a threshold number of nucleobases within the reference sequence) and ignoring or discarding the other clusters of oligonucleotides not provisionally mapped to the reference sequence (or mapped within the threshold number of nucleobases within the reference sequence).
[0101] As further shown in FIG. 5, the target-sequence-coverage system 106 may perform an act 504 of determining respective numbers of clusters of oligonucleotides belonging to respective genomic samples. More specifically, the target-sequence-coverage system 106 determines numbers of clusters of oligonucleotides belonging to each genomic sample for the initial set of sequencing cycles. The target-sequence-coverage system 106 determines, based on indexing sequences within the subset of clusters of oligonucleotides, respective numbers of clusters of oligonucleotides belonging to respective genomic samples. In some implementations, the target-sequence-coverage system 106 compares index sequences of nucleotide reads in clusters of oligonucleotides to a reference (or database) of known indexes to determine the genomic sample origin of each nucleotide read. The target-sequence-coverage system 106 may sort clusters of oligonucleotides based on their originating samples.
[0102] For instance, and as shown in FIG. 5, the target-sequence-coverage system 106 accesses raw sequencing data comprising indexing sequences 514 associated with a sample genomic sequence 524 and indexing sequences 516 associated with a sample genomic sequence 526. The indexing sequences 514-516 can comprise “barcodes” that act as unique identifiers for each genomic sample, allowing for differentiation and sorting of the nucleotide reads during demultiplexing. For example, the indexing sequences 514 indicate that the sample genomic sequence 524 comes from genomic sample I. The indexing sequences 516 indicate that the sample genomic sequence 526 originates from genomic sample II.
[0103] In one or more implementations, the target-sequence-coverage system 106 demultiplexes nucleotide reads by utilizing a reference of known indexes. FIG. 5 illustrates a reference of registered indexes 528. The target-sequence-coverage system 106 compares indexing sequences with known indexing sequences in the reference of registered indexes 528. The reference of registered indexes 528 associates each index barcode or sequence with its respective genomic sample. For example, and as illustrated, the reference of registered indexes 528 stores indexing sequences with their corresponding genomic samples. As shown, genomic samples may correspond with one or more unique barcodes.
[0104] In one or more embodiments, as part of performing the act 504, the target-sequence-coverage system 106 generates and stores a genomic sample map indicating the locations of clusters corresponding with each genomic sample. More particularly, the target-sequence-coverage system 106 generates a genomic sample map 518 that indicates locations of clusters corresponding to each of the genomic samples. As mentioned, the target-sequence-coverage system 106 improves efficiency by analyzing clusters of oligonucleotides producing nucleotide reads provisionally mapped to the reference sequence. Accordingly, the target-sequence-coverage system 106 analyzes indexing sequences for the oligonucleotide clusters 512a-512b to identify their originating genomic samples. As shown in FIG. 5, the target-sequence-coverage system 106 determines that the oligonucleotide cluster 512a originates from genomic sample I and the oligonucleotide cluster 512b originates from genomic sample II. Based on this analysis, the target-sequence-coverage system 106 can determine numbers of clusters of oligonucleotides belonging to each genomic sample of the genomic samples.
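The index lookup and per-sample counting described for the act 504 can be sketched with a dictionary standing in for the reference of registered indexes 528. The Python below is illustrative: the function name, the example barcodes, and the exact-match rule are assumptions, and a practical demultiplexer may tolerate mismatches in the index sequence.

```python
from collections import Counter
from typing import Dict


def count_clusters_per_sample(cluster_indexes: Dict[str, str],
                              registered_indexes: Dict[str, str]) -> Counter:
    """Count clusters per genomic sample by exact lookup of index sequences."""
    counts: Counter = Counter()
    for cluster_id, index_sequence in cluster_indexes.items():
        sample = registered_indexes.get(index_sequence)
        if sample is not None:
            counts[sample] += 1  # Clusters with unknown indexes are skipped.
    return counts


# Hypothetical barcodes; a sample may be registered under more than one index.
registered = {"ACGT": "I", "TTAG": "I", "GGCA": "II"}
clusters = {"c1": "ACGT", "c2": "GGCA", "c3": "TTAG", "c4": "NNNN"}
counts = count_clusters_per_sample(clusters, registered)
```

In this example, genomic sample I receives two clusters, genomic sample II receives one, and the cluster with the unregistered index is left unassigned.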
[0105] In some implementations, the target-sequence-coverage system 106 determines base calls for indexing sequences before determining base calls for nucleotide reads as described by U.S. Patent Application No. 63/517,160, entitled “Modifying Sequencing Cycles or Imaging During a Sequencing Run to Meet Customized Coverage Estimation,” filed August 2, 2023, for Alexander Fuhrmann et al. and assigned to Illumina, Inc., the disclosure of which is incorporated herein by reference in its entirety.
[0106] As further illustrated in FIG. 5, the target-sequence-coverage system 106 performs an act 508 of determining an estimated total number of clusters. Generally, the target-sequence-coverage system 106 uses the observed respective numbers of clusters of oligonucleotides for the initial set of sequencing cycles to infer an estimated total number of clusters for each genomic sample for the entire sequencing run. The target-sequence-coverage system 106 generates the estimated total number of filtered clusters corresponding to one or more target genomic samples based on the currently selected number of sequencing cycles for the sequencing run and the numbers of clusters of oligonucleotides belonging to each genomic sample of the genomic samples. In some examples, the target-sequence-coverage system 106 estimates a total number of clusters for a genomic sample for a sequencing run using the following equation:
$$\frac{\text{\# of clusters for a sample}}{\text{\# of initial sequencing cycles}} \times \text{currently selected \# of sequencing cycles}$$
where # of clusters for a sample represents the number of clusters belonging to a genomic sample determined as part of the act 504, # of initial sequencing cycles represents the number of sequencing cycles in the initial set of sequencing cycles, and currently selected # of sequencing cycles represents the currently selected number of sequencing cycles.
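The cluster-count extrapolation above is a linear scaling from the initial set of sequencing cycles to the currently selected number of cycles. A minimal Python sketch follows; the function and parameter names are hypothetical.

```python
def estimated_total_clusters(clusters_for_sample: int,
                             initial_cycles: int,
                             selected_cycles: int) -> float:
    """Scale the observed per-sample cluster count to the full sequencing run."""
    if initial_cycles <= 0:
        raise ValueError("initial_cycles must be positive")
    return clusters_for_sample / initial_cycles * selected_cycles
```

For example, 5,000 clusters observed over 25 initial cycles, scaled to a currently selected 150 cycles, yields an estimate of 30000.0.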
[0107] As described previously, the target-sequence-coverage system 106 performs various actions based on the adjusted number of sequencing cycles. FIG. 6 illustrates an example decision flowchart by which the target-sequence-coverage system 106 executes a sequencing run according to an adjusted number of sequencing cycles in accordance with one or more embodiments of the present disclosure. As indicated above, the target-sequence-coverage system 106 can adjust a number of sequencing cycles to (i) satisfy a single target read-coverage level for a single target genomic region or (ii) satisfy multiple target read-coverage levels for multiple target genomic regions, respectively. As described below, for ease of explanation, this disclosure focuses its description of FIG. 6 with respect to a single target read-coverage level for a single target genomic region. But the target-sequence-coverage system 106 can adjust the following operations to account for nucleotide reads mapping to multiple target genomic regions and/or multiple corresponding target read-coverage levels.
[0108] As shown in FIG. 6, the target-sequence-coverage system 106 may begin at the start 602 by provisionally mapping nucleotide reads to the reference sequence and estimating read-coverage levels of the target genomic region. As further illustrated in FIG. 6, the target-sequence-coverage system 106 can perform an act 604 of generating an adjusted number of sequencing cycles.
[0109] The target-sequence-coverage system 106 can perform an evaluation 606 of the adjusted number of sequencing cycles by determining whether the adjusted number of sequencing cycles is higher than the currently selected number of sequencing cycles. More particularly, the target-sequence-coverage system 106 evaluates the highest adjusted number of sequencing cycles across genomic samples.
[0110] Based on determining that the adjusted number of sequencing cycles is not higher than the currently selected number of sequencing cycles, the target-sequence-coverage system 106 can perform an act 608 of performing the adjusted number of sequencing cycles or the currently selected number of sequencing cycles. For example, the target-sequence-coverage system 106 determines that both the adjusted number of sequencing cycles and the currently selected number of sequencing cycles will likely result in sufficient coverage of the target genomic region across all genomic samples. The target-sequence-coverage system 106 may elect to perform the adjusted number of sequencing cycles to further reduce the amount of consumable resources required to execute additional sequencing cycles. Accordingly, the target-sequence-coverage system 106 can more efficiently utilize consumables by reducing the number of fluidic reagent cycles and reagents required to meet the target coverage level relative to some existing sequencing systems.
[0111] Based on determining that the adjusted number of sequencing cycles is higher than the currently selected number of sequencing cycles, the target-sequence-coverage system 106 can perform an evaluation 610 of whether the adjusted number of sequencing cycles is above a maximum number of sequencing cycles within a run. If the adjusted number of sequencing cycles is above a maximum number of cycles within a run, the target-sequence-coverage system 106 may determine that an additional sequencing run is required to meet the target read-coverage level for the target genomic region. Accordingly, the target-sequence-coverage system 106 may perform an act 612 of stopping sequencing or completing the maximum number of sequencing cycles. The target-sequence-coverage system 106 may determine to terminate sequencing after the initial set of sequencing cycles to conserve sequencing resources. More specifically, the target-sequence-coverage system 106 can cause the fluidics systems described below with respect to FIGS. 7-8 to terminate the sequencing run.
[0112] In some embodiments, the target-sequence-coverage system 106 may perform the act 612 of stopping sequencing or completing the maximum number of sequencing cycles based on an assessment of reagent volume. In particular, the target-sequence-coverage system 106 may determine that a sequencing device includes an insufficient amount of reagent(s) to complete an adjusted number of sequencing cycles. In some examples, the target-sequence-coverage system 106 evaluates a known amount of reagent in the reagent source or a detected amount of reagents in a sequencing device based on a detection sensor to determine whether the target-sequence-coverage system 106 may successfully complete the adjusted number of sequencing cycles with the known amount or detected amount of reagent. In some implementations, the target-sequence-coverage system 106 performs an additional act of providing, to a client device (e.g., the client device 114), a notification that existing levels of reagent are insufficient to complete the adjusted number of sequencing cycles such that an automated system or an operator can provide additional reagent. The target-sequence-coverage system 106 may then continue to perform the adjusted number of sequencing cycles.
[0113] As mentioned previously, the target-sequence-coverage system 106 may generate and execute an adjusted number of sequencing cycles sufficient to satisfy target read-coverage levels for the maximum number of target genomic regions and genomic samples. For example, the target-sequence-coverage system 106 may determine that an adjusted number of sequencing cycles to satisfy target read-coverage levels for all target genomic regions and genomic samples exceeds the maximum number of sequencing cycles within a run. In some examples, the target-sequence-coverage system 106 may determine an adjusted number of sequencing cycles that falls within a maximum number of cycles within a sequencing run. Based on this adjusted number of sequencing cycles, the target-sequence-coverage system 106 may further determine deficient target genomic regions and/or deficient genomic samples that are unlikely to meet their target read-coverage levels. In some examples, the target-sequence-coverage system 106 may provide, to a client device (e.g., the client device 114), a notification indicating the deficient target genomic regions and/or deficient genomic samples that are unlikely to meet their target read-coverage levels. The target-sequence-coverage system 106 may receive, from the client device, an indication to stop sequencing or to complete the maximum number of cycles.
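The identification of deficient target genomic regions and genomic samples can be illustrated with a short Python sketch. It assumes, hypothetically, that estimated read-coverage levels at the capped number of cycles are already available as a dictionary keyed by (sample, region) pairs; the function name and data layout are not taken from this disclosure.

```python
from typing import Dict, List, Tuple

SampleRegion = Tuple[str, str]  # (genomic sample id, target genomic region id)


def find_deficient(estimated: Dict[SampleRegion, float],
                   targets: Dict[SampleRegion, float]) -> List[SampleRegion]:
    """List (sample, region) pairs whose estimated coverage is below target.

    A pair with no estimate is treated as having zero coverage, which is an
    illustrative assumption rather than a rule from the source text.
    """
    return sorted(key for key, target in targets.items()
                  if estimated.get(key, 0.0) < target)
```

For example, with targets of 30x for two samples at one region and estimates of 31x and 22x, only the second sample would be reported as deficient in the notification.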
[0114] In some examples, the target-sequence-coverage system 106 can determine to complete the maximum number of cycles in the sequencing run as part of the act 612. The target-sequence-coverage system 106 can estimate maximum read-coverage levels of the target genomic region based on the maximum number of sequencing cycles in a sequencing run. Based on determining that the maximum estimated read-coverage levels are within a threshold range of the target read-coverage level, the target-sequence-coverage system 106 can complete the maximum number of sequencing cycles.
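One way to picture this threshold test is the following Python sketch. It assumes, for illustration only, that the threshold range is expressed as an absolute coverage shortfall that can be tolerated; the disclosure does not fix how the range is defined, and the names used here are hypothetical.

```python
def should_complete_max_cycles(max_estimated_coverage: float,
                               target_coverage: float,
                               tolerated_shortfall: float) -> bool:
    """Decide whether finishing the maximum number of cycles is worthwhile.

    Returns True when the coverage estimated at the maximum number of cycles
    reaches the target or falls short of it by no more than the tolerated
    shortfall (an assumed, absolute definition of the threshold range).
    """
    shortfall = target_coverage - max_estimated_coverage
    return shortfall <= tolerated_shortfall
```

Under these assumptions, an estimate of 28x against a 30x target with a tolerated shortfall of 3x would lead to completing the maximum number of cycles, while an estimate of 20x would not.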
[0115] As shown in FIG. 6, based on determining that the number of sequencing cycles is equal to or less than the maximum number of cycles within a run, the target-sequence-coverage system 106 can perform an act 614 of continuing the adjusted number of sequencing cycles. Generally, the target-sequence-coverage system 106 continues to perform sequencing cycles after the initial set of sequencing cycles. In some implementations, the target-sequence-coverage system 106 performs the adjusted number of sequencing cycles to completion of the sequencing run.
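The branch between the act 612 and the act 614 can be summarized in a brief Python sketch. The enum and function names are hypothetical labels chosen for illustration; only the comparison against the maximum number of cycles within a run comes from the description above.

```python
from enum import Enum


class RunDecision(Enum):
    CONTINUE_ADJUSTED = "act 614: continue the adjusted number of cycles"
    STOP_OR_COMPLETE_MAX = "act 612: stop sequencing or complete the maximum"


def evaluate_adjusted_cycles(adjusted_cycles: int, max_cycles_per_run: int) -> RunDecision:
    """Evaluation 610: compare the adjusted cycle count with the run maximum."""
    if adjusted_cycles > max_cycles_per_run:
        # An additional sequencing run may be required to reach the target.
        return RunDecision.STOP_OR_COMPLETE_MAX
    # Equal to or less than the maximum: the run can continue as adjusted.
    return RunDecision.CONTINUE_ADJUSTED
```

Note that an adjusted number of cycles exactly equal to the maximum falls on the continue branch, consistent with the "equal to or less than" wording above.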
[0116] As further shown, FIG. 6 illustrates the target-sequence-coverage system 106 performing an act 616 of estimating an updated read-coverage level at a checkpoint sequencing cycle. In some implementations, and as described above, the target-sequence-coverage system 106 performs a provisional re-mapping of the nucleotide reads at a checkpoint sequencing cycle. As indicated by FIG. 6, the act 616 is an optional act and can be performed iteratively. For instance, the target-sequence-coverage system 106 can continue to estimate updated read-coverage levels at different checkpoint sequencing cycles to ensure that the target read-coverage level will be met.
[0117] In addition to performing the act 614 or the act 616, the target-sequence-coverage system 106 can further perform an evaluation 618 of whether additional sequencing cycles are required to meet the target read-coverage level based on the updated read-coverage level. If additional sequencing cycles are required to meet the target read-coverage level, the target-sequence-coverage system 106 can perform an act 620 of generating and performing an updated adjusted number of sequencing cycles. In some embodiments, the target-sequence-coverage system 106 evaluates the updated adjusted number of sequencing cycles in a similar way to how it evaluates the adjusted number of sequencing cycles. For instance, the target-sequence-coverage system 106 can determine whether the updated adjusted number of sequencing cycles is above a maximum number of cycles within a run. Based on this determination, the target-sequence-coverage system 106 can elect to terminate the sequencing run or complete the maximum number of sequencing cycles.
[0118] As further shown in FIG. 6, if the target-sequence-coverage system 106 determines that additional sequencing cycles are not required to meet the target read-coverage level, the target-sequence-coverage system 106 can perform the act 608 of performing the adjusted number of sequencing cycles.
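The iterative checkpoint logic of the acts 616 through 620 can be sketched in Python as follows. The callback `required_cycles_at` is a hypothetical stand-in for provisional re-mapping and coverage estimation at a checkpoint; it is assumed to return the total number of cycles currently estimated as necessary. The return convention (planned cycles plus a flag) is likewise an illustrative choice.

```python
from typing import Callable, Iterable, Tuple


def run_with_checkpoints(adjusted_cycles: int,
                         max_cycles: int,
                         checkpoints: Iterable[int],
                         required_cycles_at: Callable[[int], int]) -> Tuple[int, bool]:
    """Return (planned number of cycles, whether the run maximum was exceeded).

    When the flag is True, the caller would elect either to terminate the
    run or to complete the maximum number of cycles (act 612).
    """
    planned = adjusted_cycles
    for cycle in sorted(checkpoints):
        if cycle >= planned:
            break  # checkpoint falls at or after the planned end of the run
        needed = required_cycles_at(cycle)  # act 616 (hypothetical estimator)
        if needed > planned:                # evaluation 618
            if needed > max_cycles:
                return max_cycles, True
            planned = needed                # act 620: updated adjusted number
    return planned, False
```

If no checkpoint indicates that more cycles are needed, the function returns the original adjusted number, which corresponds to simply performing the adjusted number of sequencing cycles.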
[0119] As mentioned, the target-sequence-coverage system 106 can terminate a sequencing run or adjust a number of sequencing cycles to meet a target read-coverage level. The target-sequence-coverage system 106 may execute the determined number of sequencing cycles by utilizing fluidics systems. FIG. 7 illustrates a schematic diagram of an example of a system (700) that may be used to perform an analysis on one or more samples of interest. In some implementations, the sample may include one or more clusters of nucleotides (e.g., DNA) that have been linearized to form a single stranded DNA (sstDNA). In the implementation shown, system (700) is configured to receive a flow cell cartridge assembly (702) including a flow cell assembly (703) and a sample cartridge (704). System (700) includes a flow cell receptacle (722) that receives flow cell cartridge assembly (702), a vacuum chuck (724) that supports flow cell assembly (703), and a flow cell interface (726) that is used to establish a fluidic coupling between system (700) and flow cell assembly (703). Flow cell interface (726) may include one or more manifolds. System (700) further includes a sipper manifold assembly (706), a sample loading manifold assembly (708), and a pump manifold assembly (710). System (700) also includes a drive assembly (712), a controller (714), an imaging system (716), and a waste reservoir (718). Controller (714) is electrically and/or communicatively coupled to drive assembly (712) and to imaging system (716); and is configured to cause drive assembly (712) and/or the imaging system (716) to perform various functions as disclosed herein.
[0120] In the present example, flow cell assembly (703) includes a flow cell (728) having a channel (730) and defining a plurality of first openings (732), which are fluidically coupled to the channel (730) and arranged on a first side (734) of the channel (730). Flow cell (728) further includes a plurality of second openings (736) fluidically coupled to the channel (730) and arranged on a second side (738) of the channel (730). Fluid may thus flow through flow cell (728) via channel (730). While the flow cell (728) is shown including one channel (730), flow cell (728) may include two or more channels (730). Flow cell assembly (703) also includes a flow cell manifold assembly (740) coupled to flow cell (728) and having a first manifold fluidic line (742) and a second manifold fluidic line (744). Flow cell manifold assembly (740) may be in the form of a laminate including a plurality of layers as discussed in more detail below.
[0121] In the implementation shown, first manifold fluidic line (742) has a first fluidic line opening (746) and is fluidically coupled to each of the plurality of first openings (732) of flow cell (728); and second manifold fluidic line (744) has a second fluidic line opening (748) and is fluidically coupled to each of the second openings (736). As shown, flow cell assembly (703) includes gaskets (750) coupled to flow cell manifold assembly (740) and fluidically coupled to fluidic line openings (746, 748). In some implementations where flow cell (728) includes a plurality of channels (730), flow cell manifold assembly (740) may include additional fluidic lines (752) that couple first fluidic line openings (746) to a single manifold port (754). In such implementations, a single gasket (750) may be coupled to flow cell manifold assembly (740) that surrounds the manifold port (754) and is in fluidic communication with a plurality of channels (730). In operation, flow cell interface (726) engages with corresponding gaskets (750) to establish a fluidic coupling between system (700) and flow cell (728). The engagement between flow cell interface (726) and gaskets (750) reduces or eliminates fluid leakage between flow cell interface (726) and flow cell (728).
[0122] In the implementation shown, first manifold fluidic line (742) has a portion (756) that is substantially parallel to a longitudinal axis (758) of channel (730); and second manifold fluidic line (744) has a portion (760) that is substantially parallel to longitudinal axis (758) of channel (730). Additionally, first manifold fluidic line (742) is shown being at least partially adjacent a first end (762) of flow cell (728) and spaced from a second end (764) of flow cell (728); and second manifold fluidic line (744) is shown being at least partially adjacent second end (764) of flow cell (728) and spaced from first end (762). Other arrangements of manifold fluidic lines (742, 744) may prove suitable, however.
[0123] In the implementation shown, system (700) includes a sample cartridge receptacle (766) that receives sample cartridge (704) that carries one or more samples of interest (e.g., an analyte). System (700) also includes a sample cartridge interface (768) that establishes a fluidic connection with sample cartridge (704). Sample loading manifold assembly (708) includes one or more sample valves (770). Pump manifold assembly (710) includes one or more pumps (772), one or more pump valves (774), and a cache (776). Valves (770, 774) and pumps (772) may take any suitable form. Cache (776) may include a serpentine cache and may temporarily store one or more reaction components during, for example, bypass manipulations of the system (700). While cache (776) is shown being included in pump manifold assembly (710), cache (776) may alternatively be located elsewhere (e.g., in sipper manifold assembly (706) or in another manifold downstream of a bypass fluidic line (778), etc.).
[0124] Sample loading manifold assembly (708) and pump manifold assembly (710) flow one or more samples of interest from sample cartridge (704) through a fluidic line (780) toward flow cell cartridge assembly (702). In some implementations, sample loading manifold assembly (708) may individually load or address each channel (730) of flow cell (728) with a respective sample of interest. The process of loading channel (730) with a sample of interest may occur automatically using system (700). As shown in FIG. 7, sample cartridge (704) and sample loading manifold assembly (708) are positioned downstream of flow cell cartridge assembly (702). In the implementation shown, sample loading manifold assembly (708) is coupled between flow cell cartridge assembly (702) and pump manifold assembly (710). To draw a sample of interest from sample cartridge (704) and toward pump manifold assembly (710), sample valves (770), pump valves (774), and/or pumps (772) may be selectively actuated to urge the sample of interest toward pump manifold assembly (710). Sample cartridge (704) may include a plurality of sample reservoirs that are selectively fluidically accessible via the corresponding sample valves (770). To individually flow the sample of interest toward channel (730) of flow cell (728) and away from pump manifold assembly (710), sample valves (770), pump valves (774), and/or pumps (772) may be selectively actuated to urge the sample of interest toward flow cell cartridge assembly (702) and into respective channels (730) of flow cell (728).
[0125] Drive assembly (712) interfaces with sipper manifold assembly (706) and pump manifold assembly (710) to flow one or more reagents that interact with the sample within flow cell (728). In some scenarios, a reversible terminator is attached to the reagent to allow a single nucleotide to be incorporated onto a growing DNA strand. In some such implementations, one or more of the nucleotides has a unique fluorescent label that emits a color when excited. The color (or absence thereof) is used to detect the corresponding nucleotide. In the implementation shown, imaging system (716) excites one or more of the identifiable labels (e.g., a fluorescent label) and thereafter obtains image data for the identifiable labels. The labels may be excited by incident light and/or a laser and the image data may include one or more colors emitted by the respective labels in response to the excitation. The image data (e.g., detection data) may be analyzed by system (700). Examples of features and functionalities that may be incorporated into imaging system (716) will be described in greater detail below.
[0126] After the image data is obtained, drive assembly (712) interfaces with sipper manifold assembly (706) and pump manifold assembly (710) to flow another reaction component (e.g., a reagent) through flow cell (728) that is thereafter received by waste reservoir (718) via a primary waste fluidic line (782) and/or otherwise exhausted by system (700). Some reaction components may perform a flushing operation that chemically cleaves the fluorescent label and the reversible terminator from the sstDNA. The sstDNA may then be ready for another cycle.
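The per-cycle sequence described in the two preceding paragraphs (incorporation of a single labeled nucleotide, imaging of the label, and cleavage of the label and reversible terminator) can be laid out as a simple schedule. The following Python sketch is illustrative only: the step names and the function are hypothetical, and the sketch does not model wash steps or actual fluidic control.

```python
from typing import List, Tuple

# Per-cycle steps paraphrased from the description above (illustrative names).
CYCLE_STEPS = ("incorporate", "image", "cleave")


def plan_cycles(number_of_cycles: int) -> List[Tuple[int, str]]:
    """Expand a number of sequencing cycles into an ordered list of steps.

    Each entry is (cycle number starting at 1, step name). Executing an
    adjusted number of cycles amounts to repeating these steps that many times.
    """
    return [(cycle, step)
            for cycle in range(1, number_of_cycles + 1)
            for step in CYCLE_STEPS]
```

Under this simplified view, raising or lowering the adjusted number of sequencing cycles lengthens or shortens the schedule by three steps per cycle.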
[0127] The primary waste fluidic line (782) is coupled between pump manifold assembly (710) and waste reservoir (718). In some implementations, pumps (772) and/or pump valves (774) of pump manifold assembly (710) selectively flow the reaction components from flow cell cartridge assembly (702), through fluidic line (780) and sample loading manifold assembly (708) to primary waste fluidic line (782). Flow cell cartridge assembly (702) is coupled to a central valve (784) via flow cell interface (726). Central valve (784) is coupled with flow cell interface (726) via a fluidic line (785). An auxiliary waste fluidic line (786) is coupled to central valve (784) and to waste reservoir (718). In some implementations, auxiliary waste fluidic line (786) receives excess fluid of a sample of interest from flow cell cartridge assembly (702), via central valve (784), and flows the excess fluid of the sample of interest to waste reservoir (718) when back loading the sample of interest into flow cell (728), as described herein.
[0128] Sipper manifold assembly (706) includes a shared line valve (788) and a bypass valve (790). Shared line valve (788) may be referred to as a reagent selector valve. Central valve (784) and the valves (788, 790) of sipper manifold assembly (706) may be selectively actuated to control the flow of fluid through fluidic lines (792, 794, 796). Sipper manifold assembly (706) may be coupled to a corresponding number of reagent reservoirs (798) via reagent sippers (800). Reagent reservoirs (798) may contain fluid (e.g., reagent and/or another reaction component). In some implementations, sipper manifold assembly (706) includes a plurality of ports. Each port of sipper manifold assembly (706) may receive one of the reagent sippers (800). Reagent sippers (800) may be referred to as fluidic lines. Some forms of reagent sippers (800) may include an array of sipper tubes extending downwardly along the z-dimension from ports in the body of sipper manifold assembly (706). Reagent reservoirs (798) may be provided in a cartridge, and the tubes of reagent sippers (800) may be configured to be inserted into corresponding reagent reservoirs (798) in the reagent cartridge so that liquid reagent may be drawn from each reagent reservoir (798) into the sipper manifold assembly (706).
[0129] Shared line valve (788) of sipper manifold assembly (706) is coupled to central valve (784) via shared reagent fluidic line (796). Different reagents may flow through shared reagent fluidic line (796) at different times. In some versions, when performing a flushing operation before changing between one reagent and another, pump manifold assembly (710) may draw wash buffer through shared reagent fluidic line (796), central valve (784), and flow cell cartridge assembly (702).
[0130] Bypass valve (790) of sipper manifold assembly (706) is coupled to central valve (784) via dedicated reagent fluidic lines (794, 796). Each of the dedicated reagent fluidic lines (794, 796) may be associated with a single reagent. The fluids that may flow through dedicated reagent fluidic lines (794, 796) may be used during sequencing operations and may include a cleave reagent, an incorporation reagent, a scan reagent, a cleave wash, and/or a wash buffer.
[0131] Bypass valve (790) is also coupled to cache (776) of pump manifold assembly (710) via bypass fluidic line (778). One or more reagent priming operations, hydration operations, mixing operations, and/or transfer operations may be performed using bypass fluidic line (778). The priming operations, the hydration operations, the mixing operations, and/or the transfer operations may be performed independent of flow cell cartridge assembly (702). Thus, the operations using bypass fluidic line (778) may occur during, for example, incubation of one or more samples of interest within flow cell cartridge assembly (702). That is, shared line valve (788) may be utilized independently of bypass valve (790) such that bypass valve (790) may utilize bypass fluidic line (778) and/or cache (776) to perform one or more operations while shared line valve (788) and/or central valve (784) simultaneously, substantially simultaneously, or offset synchronously perform other operations.
[0132] Drive assembly (712) includes a pump drive assembly (802) and a valve drive assembly (804). Pump drive assembly (802) may be adapted to interface with one or more pumps (772) to pump fluid through flow cell (728) and/or to load one or more samples of interest into flow cell (728). Valve drive assembly (804) may be adapted to interface with one or more of the valves (770, 774, 784, 788, 790) to control the position of the corresponding valves (770, 774, 784, 788, 790).
[0133] FIG. 8 shows an example of a fluidic arrangement (820) that may be incorporated into a variation of system (700). Fluidic arrangement (820) of this example includes a pump manifold assembly (822), which may operate similar to pump manifold assembly (710) described above; a sample loading manifold assembly (828), which may operate similar to sample loading manifold assembly (708) described above; a flow cell interface (840), which may operate similar to flow cell interface (726) described above; a sipper manifold assembly (850), which may operate similar to sipper manifold assembly (706) described above; and a waste reservoir (870), which may operate similar to waste reservoir (718) described above. Pump manifold assembly (822) is coupled with a port assembly (858) of sipper manifold assembly (850) via a fluidic line (824), which may be similar to fluidic line (778); and with sample loading manifold assembly (828) via a fluidic line (826). Sample loading manifold assembly (828) is coupled with flow cell interface (840) via fluidic line (830), which may be similar to fluidic line (780); and with port assembly (858) via fluidic lines (832, 834). Flow cell interface (840) is coupled with sipper manifold assembly (850) via fluidic line (842), which may be similar to fluidic line (785). Sipper manifold assembly (850) includes a manifold body (852) and a common output port (856), which provides fluid communication via fluidic line (785). A valve assembly (854) controls fluid flow through common output port (856) and may operate similar to central valve (784). Port assembly (858) of sipper manifold assembly (850) is coupled with waste reservoir (870) via fluidic line (872), which may be similar to fluidic line (786).
[0134] A plurality of reagent sippers (860) extend from manifold body (852) and are fluidically coupled with valve assembly (854) via respective fluid channels (862) in manifold body (852). Reagent sippers (860) may operate similar to reagent sippers (800). Valve assembly (854) is operable to selectively couple fluid channels (862) with flow cell interface (840) via common output port (856) and fluidic line (830), to thereby selectively provide various reagents to flow cell interface (840). In other words, when each reagent sipper (860) is disposed in a different respective reagent (e.g., in a respective reagent reservoir (798)), a flow cell (e.g., like flow cell (728)) that is coupled with flow cell interface (840) may selectively receive those different reagents based on control of valve assembly (854).
[0136] Referring back to FIG. 7, controller (714) of the present example includes a user interface (806), a communication interface (808), one or more processors (810), and a memory (812) storing instructions executable by the one or more processors (810) to perform various functions including the disclosed implementations. User interface (806), communication interface (808), and memory (812) are electrically and/or communicatively coupled to the one or more processors (810). User interface (806) may be adapted to receive input from a user and to provide information to the user associated with the operation of system (700) and/or an analysis taking place. User interface (806) may include a touch screen, a display, a keyboard, a speaker(s), a mouse, a track ball, and/or a voice recognition system.
[0137] Communication interface (808) is adapted to enable communication between system (700) and a remote system(s) (e.g., computers) via a network(s) (e.g., the Internet, an intranet, a local-area network (LAN), a wide-area network (WAN), a coaxial-cable network, a wireless network, a wired network, a satellite network, a digital subscriber line (DSL) network, a cellular network, a Bluetooth connection, a near field communication (NFC) connection, etc.). Some of the communications provided to the remote system may be associated with analysis results, imaging data, etc. generated or otherwise obtained by system (700). Some of the communications provided to system (700) may be associated with a fluidics analysis operation, patient records, and/or a protocol(s) to be executed by system (700).
[0138] The one or more processors (810) and/or system (700) may include one or more of a processor-based system(s) or a microprocessor-based system(s). In some implementations, the one or more processors (810) and/or system (700) includes one or more of a programmable processor, a programmable controller, a microprocessor, a microcontroller, a graphics processing unit (GPU), a digital signal processor (DSP), a reduced-instruction set computer (RISC), an application specific integrated circuit (ASIC), a field programmable gate array (FPGA), a field programmable logic device (FPLD), a logic circuit, and/or another logic-based device executing various functions including the ones described herein.
[0139] Memory (812) may include one or more of a semiconductor memory, a magnetically readable memory, an optical memory, a hard disk drive (HDD), an optical storage drive, a solid-state storage device, a solid-state drive (SSD), a flash memory, a read-only memory (ROM), erasable programmable read-only memory (EPROM), electrically erasable programmable read-only memory (EEPROM), a random-access memory (RAM), a nonvolatile RAM (NVRAM) memory, a compact disc (CD), a compact disc read-only memory (CD-ROM), a digital versatile disk (DVD), a Blu-ray disk, a redundant array of independent disks (RAID) system, a cache and/or any other storage device or storage disk in which information is stored for any duration (e.g., permanently, temporarily, for extended periods of time, for buffering, for caching).
[0140] FIGS. 1-8, the corresponding text, and the examples provide a number of different methods, systems, devices, and non-transitory computer-readable media of the target-sequence-coverage system 106. In addition to the foregoing, one or more implementations can also be described in terms of flowcharts comprising acts for accomplishing a particular result, as shown in FIG. 9. FIG. 9 illustrates a flowchart of a series of acts 900 for executing a sequencing run according to an adjusted number of sequencing cycles in accordance with one or more embodiments of the present disclosure. While FIG. 9 illustrates acts according to one embodiment, alternative embodiments may omit, add to, reorder, and/or modify any of the acts shown in FIG. 9. The acts of FIG. 9 can be performed as part of a method. Alternatively, a non-transitory computer readable storage medium can comprise instructions that, when executed by one or more processors, cause a computing device or a system to perform the acts depicted in FIG. 9. In still further embodiments, a system comprises an imaging system, a fluidic system, and a computer comprising at least one processor and a non-transitory computer readable medium comprising instructions that, when executed by the at least one processor, cause the system to perform the acts of FIG. 9.
[0141] As shown in FIG. 9, the series of acts 900 includes an act 902 of receiving data input identifying a target genomic region, an act 904 of provisionally mapping nucleotide reads to a reference sequence corresponding to the target genomic region, an act 906 of estimating read-coverage levels of the target genomic region, an act 908 of generating an adjusted number of sequencing cycles, and an act 910 of executing the sequencing run according to the adjusted number of sequencing cycles. For example, the series of acts 900 can include acts to perform any of the operations described in the following clauses:
CLAUSE 1. A method comprising: receiving, for a sequencing run, data input identifying a target genomic region for one or more genomic samples and a target read-coverage level for the target genomic region; provisionally mapping, after an initial set of sequencing cycles and during the sequencing run, nucleotide reads of the one or more genomic samples to a reference sequence corresponding to the target genomic region; estimating, during the sequencing run, read-coverage levels of the target genomic region within the one or more genomic samples based on the nucleotide reads of the one or more genomic samples provisionally mapped to the reference sequence and a currently selected number of sequencing cycles for the sequencing run; generating, for the sequencing run and based on the estimated read-coverage levels, an adjusted number of sequencing cycles sufficient to satisfy the target read-coverage level within the target genomic region for one or more genomic samples of the one or more genomic samples; and executing the sequencing run according to the adjusted number of sequencing cycles.
CLAUSE 2. The method of clause 1, further comprising: determining a checkpoint sequencing cycle within a threshold number of sequencing cycles before a last sequencing cycle of the adjusted number of sequencing cycles; provisionally re-mapping, at the checkpoint sequencing cycle, nucleotide reads of the one or more genomic samples to the reference sequence corresponding to the target genomic region; and estimating, for the target genomic region of the one or more genomic samples and at or after the checkpoint sequencing cycle, an updated read-coverage level based on the nucleotide reads of the one or more genomic samples provisionally re-mapped to the reference sequence.
CLAUSE 3. The method of clause 1, further comprising provisionally mapping the nucleotide reads of the one or more genomic samples to the reference sequence by: provisionally mapping one or more nucleotide reads of the one or more genomic samples to the target genomic region within a reference genome; or provisionally mapping one or more nucleotide reads of the one or more genomic samples to an adjacent genomic region within a threshold number of nucleobases of the target genomic region within the reference genome.
CLAUSE 4. The method of clause 1, further comprising terminating sequencing cycles of the sequencing run after the adjusted number of sequencing cycles finish.
CLAUSE 5. The method of clause 1, further comprising: determining locations of the nucleotide reads of the one or more genomic samples within the reference sequence corresponding to the target genomic region; determining read-growth directions of the nucleotide reads growing upstream or downstream with respect to the target genomic region; and estimating the read-coverage levels of the target genomic region within the one or more genomic samples based on the locations and read-growth directions of the nucleotide reads of the one or more genomic samples provisionally mapped to the reference sequence.
CLAUSE 6. The method of clause 1, further comprising estimating the read-coverage levels of the target genomic region within the one or more genomic samples by: determining, from a set of clusters of oligonucleotides for the sequencing run, a subset of clusters of oligonucleotides producing nucleotide reads provisionally mapped to the reference sequence; determining, based on indexing sequences within the subset of clusters of oligonucleotides, respective numbers of clusters of oligonucleotides belonging to respective genomic samples of the one or more genomic samples; and generating, for each genomic sample of the one or more genomic samples, an estimated total number of clusters of oligonucleotides based on the respective numbers of clusters of oligonucleotides and the currently selected number of sequencing cycles for the sequencing run.
CLAUSE 7. The method of clause 1, further comprising: determining, based on an estimated read-coverage level for a genomic sample of the one or more genomic samples at the target genomic region, the genomic sample is unlikely to satisfy the target read-coverage level within a threshold number of sequencing cycles of the adjusted number of sequencing cycles; and based on the genomic sample being unlikely to satisfy the target read-coverage level within the threshold number of sequencing cycles, terminating the sequencing run after the adjusted number of sequencing cycles.
CLAUSE 8. The method of clause 1, further comprising: determining, based on an estimated read-coverage level for a genomic sample of the one or more genomic samples at the target genomic region, the genomic sample is likely to satisfy the target read-coverage level within a threshold number of sequencing cycles after the adjusted number of sequencing cycles; and based on the genomic sample being likely to satisfy the target read-coverage level within the threshold number of sequencing cycles, continuing the sequencing run until the threshold number of sequencing cycles after the adjusted number of sequencing cycles.
CLAUSE 9. The method of clause 1, further comprising performing sequencing cycles of the sequencing run by: determining, for clusters of oligonucleotides immobilized on a nucleotide-sample substrate, base calls as part of paired-end nucleotide reads comprising first read mates and second read mates; or determining, for clusters of oligonucleotides immobilized on the nucleotide-sample substrate, base calls as part of single-end nucleotide reads.
CLAUSE 10. The method of clause 1, further comprising: provisionally re-mapping, during a subset of sequencing cycles for first read mates and second read mates of paired-end nucleotide reads, a subset of paired-end nucleotide reads of the one or more genomic samples to the reference sequence corresponding to the target genomic region; and estimating, for the target genomic region of the one or more genomic samples and during the subset of sequencing cycles for the first read mates and the second read mates, updated read-coverage levels based on the subset of paired-end nucleotide reads provisionally re-mapped to the reference sequence.
CLAUSE 11. The method of clause 10, further comprising: estimating the read-coverage levels during the sequencing run in part by estimating the read-coverage levels for the target genomic region based on first read mates of the subset of paired-end nucleotide reads provisionally mapped to the reference sequence; provisionally re-mapping the subset of paired-end nucleotide reads in part by remapping, during a first subset of sequencing cycles for the first read mates of the paired-end nucleotide reads, a first subset of paired-end nucleotide reads of the one or more genomic samples to the reference sequence; estimating the updated read-coverage levels in part by estimating, for the target genomic region of the one or more genomic samples, a first updated read-coverage level before a last sequencing cycle of the first subset of sequencing cycles for the first read mates; and performing the first subset of sequencing cycles for the first read mates until finishing a safety threshold number of sequencing cycles based on the first updated read-coverage level.
CLAUSE 12. The method of clause 11, further comprising: provisionally re-mapping the subset of paired-end nucleotide reads in part by remapping, during a second subset of sequencing cycles for the second read mates of the paired-end nucleotide reads, a second subset of paired-end nucleotide reads of the one or more genomic samples to the reference sequence; estimating the updated read-coverage levels in part by estimating, for the target genomic region of the one or more genomic samples, a second updated read-coverage level before a last sequencing cycle of the second subset of sequencing cycles for the second read mates; and generating, for the sequencing run and based on the second updated read-coverage level, an updated adjusted number of sequencing cycles sufficient to satisfy the target read-coverage level within the target genomic region for the one or more genomic samples of the one or more genomic samples.
CLAUSE 13. The method of clause 1, further comprising: provisionally mapping, after the initial set of sequencing cycles and during the sequencing run, additional nucleotide reads of the one or more genomic samples to an additional reference sequence corresponding to an additional target genomic region; estimating, during the sequencing run, additional read-coverage levels of the additional target genomic region within the one or more genomic samples based on the additional nucleotide reads of the one or more genomic samples provisionally mapped to the additional reference sequence and the currently selected number of sequencing cycles for the sequencing run; and generating, for the sequencing run and based on the estimated read-coverage levels and the additional read-coverage levels, the adjusted number of sequencing cycles sufficient to satisfy the target read-coverage level within the target genomic region and the additional target genomic region for one or more genomic samples of the one or more genomic samples.
CLAUSE 14. The method of clause 1, further comprising: receiving, for the sequencing run, data input identifying the target read-coverage level for a first genomic sample of the one or more genomic samples and an additional target read-coverage level for the target genomic region of a second genomic sample of the one or more genomic samples; generating, for the sequencing run and based on the estimated read-coverage levels, an adjusted number of sequencing cycles sufficient to satisfy the target read-coverage level within the target genomic region for the first genomic sample and the additional target read-coverage level within the target genomic region for the second genomic sample; and executing the sequencing run according to the adjusted number of sequencing cycles.
CLAUSE 15. The method of clause 1, further comprising performing sequencing cycles of the sequencing run according to an order of indexing cycles before genomic sequencing cycles by: determining base calls for a first indexing sequence appended to a sample genomic sequence of a genomic sample; determining base calls for a second indexing sequence appended to the sample genomic sequence of the genomic sample; and after determining the base calls for the first indexing sequence and the second indexing sequence, determining base calls for a first nucleotide read corresponding to a first portion of the sample genomic sequence and determining base calls for a second nucleotide read corresponding to a second portion of the sample genomic sequence.
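One way to picture a single adjusted cycle count that satisfies a different target for each sample, as in clause 14, is to compute the cycles each sample needs and take the largest. The sketch below is illustrative only and rests on a simplifying assumption that is not stated in the clauses: read-coverage is approximated as (mapped reads × cycles) / region length, with each cycle adding one base to every read. The names `cycles_for_target` and `adjusted_cycles`, and the dictionary layout, are hypothetical.

```python
import math


def cycles_for_target(target_coverage, mapped_reads, region_length):
    """Cycles needed under the assumed model: coverage = reads * cycles / length."""
    if mapped_reads <= 0:
        raise ValueError("no reads mapped to the target genomic region")
    return math.ceil(target_coverage * region_length / mapped_reads)


def adjusted_cycles(samples, region_length):
    """Smallest cycle count that meets every sample's own target.

    samples: mapping of sample name to a dict with keys
        "target" (target read-coverage level) and
        "mapped_reads" (reads provisionally mapped to the region).
    """
    return max(
        cycles_for_target(s["target"], s["mapped_reads"], region_length)
        for s in samples.values()
    )
```

Under this assumed model, a first sample with a 30x target and 1,000 mapped reads and a second sample with a 50x target and 2,000 mapped reads on a 5,000-base region need 150 and 125 cycles respectively, so the run would be set to 150 cycles.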
CLAUSE 16. The method of clause 1, further comprising generating the adjusted number of sequencing cycles for the sequencing run by increasing or decreasing a preset number of sequencing cycles for the sequencing run.
CLAUSE 17. The method of clause 1, further comprising detecting a reagent volume of a reagent cartridge in fluid communication with the fluidic system and operating the fluidic system to perform one or more additional sequencing cycles relative to the currently selected number of sequencing cycles until finishing the adjusted number of sequencing cycles by aspirating one or more reagents from the reagent cartridge.
CLAUSE 18. The method of clause 1, further comprising terminating operation of the fluidic system from performing one or more sequencing cycles of the currently selected number of sequencing cycles to finish the sequencing run after performing the adjusted number of sequencing cycles.
[0142] The methods described herein can be used in conjunction with a variety of nucleic acid sequencing techniques. Particularly applicable techniques are those wherein nucleic acids are attached at fixed locations in an array such that their relative positions do not change and wherein the array is repeatedly imaged. Embodiments in which images are obtained in different color channels, for example, coinciding with different labels used to distinguish one nucleobase type from another are particularly applicable. In some embodiments, the process to determine the nucleotide sequence of a target nucleic acid (i.e., a nucleic-acid polymer) can be an automated process. Preferred embodiments include sequencing-by-synthesis (SBS) techniques.
[0143] SBS techniques generally involve the enzymatic extension of a nascent nucleic acid strand through the iterative addition of nucleotides against a template strand. In traditional methods of SBS, a single nucleotide monomer may be provided to a target nucleotide in the presence of a polymerase in each delivery. However, in the methods described herein, more than one type of nucleotide monomer can be provided to a target nucleic acid in the presence of a polymerase in a delivery.
[0144] SBS can utilize nucleotide monomers that have a terminator moiety or those that lack any terminator moieties. Methods utilizing nucleotide monomers lacking terminators include, for example, pyrosequencing and sequencing using γ-phosphate-labeled nucleotides, as set forth in further detail below. In methods using nucleotide monomers lacking terminators, the number of nucleotides added in each cycle is generally variable and dependent upon the template sequence and the mode of nucleotide delivery. For SBS techniques that utilize nucleotide monomers having a terminator moiety, the terminator can be effectively irreversible under the sequencing conditions used as is the case for traditional Sanger sequencing which utilizes dideoxynucleotides, or the terminator can be reversible as is the case for sequencing methods developed by Solexa (now Illumina, Inc.).
[0145] SBS techniques can utilize nucleotide monomers that have a label moiety or those that lack a label moiety. Accordingly, incorporation events can be detected based on a characteristic of the label, such as fluorescence of the label; a characteristic of the nucleotide monomer such as molecular weight or charge; a byproduct of incorporation of the nucleotide, such as release of pyrophosphate; or the like. In embodiments where two or more different nucleotides are present in a sequencing reagent, the different nucleotides can be distinguishable from each other, or alternatively, the two or more different labels can be indistinguishable under the detection techniques being used. For example, the different nucleotides present in a sequencing reagent can have different labels and they can be distinguished using appropriate optics as exemplified by the sequencing methods developed by Solexa (now Illumina, Inc.).
[0146] Preferred embodiments include pyrosequencing techniques. Pyrosequencing detects the release of inorganic pyrophosphate (PPi) as particular nucleotides are incorporated into the nascent strand (Ronaghi, M., Karamohamed, S., Pettersson, B., Uhlen, M. and Nyren, P. (1996) "Real-time DNA sequencing using detection of pyrophosphate release." Analytical Biochemistry 242(1), 84-9; Ronaghi, M. (2001) "Pyrosequencing sheds light on DNA sequencing." Genome Res. 11(1), 3-11; Ronaghi, M., Uhlen, M. and Nyren, P. (1998) "A sequencing method based on real-time pyrophosphate." Science 281(5375), 363; U.S. Pat. No. 6,210,891; U.S. Pat. No. 6,258,568 and U.S. Pat. No. 6,274,320, the disclosures of which are incorporated herein by reference in their entireties). In pyrosequencing, released PPi can be detected by being immediately converted to adenosine triphosphate (ATP) by ATP sulfurylase, and the level of ATP generated is detected via luciferase-produced photons. The nucleic acids to be sequenced can be attached to features in an array and the array can be imaged to capture the chemiluminescent signals that are produced due to incorporation of a nucleotide at the features of the array. An image can be obtained after the array is treated with a particular nucleotide type (e.g., A, T, C or G). Images obtained after addition of each nucleotide type will differ with regard to which features in the array are detected. These differences in the image reflect the different sequence content of the features on the array. However, the relative locations of each feature will remain unchanged in the images. The images can be stored, processed and analyzed using the methods set forth herein. For example, images obtained after treatment of the array with each different nucleotide type can be handled in the same way as exemplified herein for images obtained from different detection channels for reversible terminator-based sequencing methods.
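As an illustration of how per-nucleotide images could be turned into a sequence for one feature, the sketch below decodes a series of chemiluminescent signals recorded after successive nucleotide deliveries. It is a minimal sketch under stated assumptions, not a method set forth in this disclosure: the signals are assumed to be normalized so that a value near 1.0 corresponds to one incorporation, and rounding to the nearest integer is assumed to give the number of identical nucleotides added in that delivery. The function name `flowgram_to_sequence` and the flow order are hypothetical.

```python
def flowgram_to_sequence(flow_order, signals):
    """Decode one feature's signals into a nucleotide sequence.

    flow_order: nucleotide types in the order they are delivered, repeated
        cyclically (for example "TACG"); an illustrative assumption.
    signals: normalized signal for the feature after each delivery, where
        about 1.0 is assumed to mean a single incorporation.
    """
    bases = []
    for i, signal in enumerate(signals):
        nucleotide = flow_order[i % len(flow_order)]
        # Round to the nearest whole number of incorporations, never below 0.
        count = max(0, int(signal + 0.5))
        bases.append(nucleotide * count)
    return "".join(bases)
```

With a flow order of T, A, C, G and signals of 1.02, 0.05, 0.0, 2.1, 0.0 and 0.97, the feature would be read as one T, two G and one A.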
[0147] In another exemplary type of SBS, cycle sequencing is accomplished by stepwise addition of reversible terminator nucleotides containing, for example, a cleavable or photobleachable dye label as described, for example, in WO 04/018497 and U.S. Pat. No. 7,057,026, the disclosures of which are incorporated herein by reference. This approach is being commercialized by Solexa (now Illumina Inc.), and is also described in WO 91/06678 and WO 07/123,744, each of which is incorporated herein by reference. The availability of fluorescently-labeled terminators in which both the termination can be reversed and the fluorescent label cleaved facilitates efficient cyclic reversible termination (CRT) sequencing. Polymerases can also be co-engineered to efficiently incorporate and extend from these modified nucleotides.
[0148] Preferably in reversible terminator-based sequencing embodiments, the labels do not substantially inhibit extension under SBS reaction conditions. However, the detection labels can be removable, for example, by cleavage or degradation. Images can be captured following incorporation of labels into arrayed nucleic acid features. In particular embodiments, each cycle involves simultaneous delivery of four different nucleotide types to the array and each nucleotide type has a spectrally distinct label. Four images can then be obtained, each using a detection channel that is selective for one of the four different labels. Alternatively, different nucleotide types can be added sequentially and an image of the array can be obtained between each addition step. In such embodiments, each image will show nucleic acid features that have incorporated nucleotides of a particular type. Different features are present or absent in the different images due to the different sequence content of each feature. However, the relative position of the features will remain unchanged in the images. Images obtained from such reversible terminator-SBS methods can be stored, processed and analyzed as set forth herein. Following the image capture step, labels can be removed and reversible terminator moieties can be removed for subsequent cycles of nucleotide addition and detection. Removal of the labels after they have been detected in a particular cycle and prior to a subsequent cycle can provide the advantage of reducing background signal and crosstalk between cycles. Examples of useful labels and removal methods are set forth below.
[0149] In particular embodiments some or all of the nucleotide monomers can include reversible terminators. In such embodiments, reversible terminators/cleavable fluors can include fluor linked to the ribose moiety via a 3' ester linkage (Metzker, Genome Res. 15: 1767-1776 (2005), which is incorporated herein by reference). Other approaches have separated the terminator chemistry from the cleavage of the fluorescence label (Ruparel et al., Proc Natl Acad Sci USA 102: 5932-7 (2005), which is incorporated herein by reference in its entirety). Ruparel et al. described the development of reversible terminators that used a small 3' allyl group to block extension, but could easily be deblocked by a short treatment with a palladium catalyst. The fluorophore was attached to the base via a photocleavable linker that could easily be cleaved by a 30 second exposure to long wavelength UV light. Thus, either disulfide reduction or photocleavage can be used as a cleavable linker. Another approach to reversible termination is the use of natural termination that ensues after placement of a bulky dye on a dNTP. The presence of a charged bulky dye on the dNTP can act as an effective terminator through steric and/or electrostatic hindrance. The presence of one incorporation event prevents further incorporations unless the dye is removed. Cleavage of the dye removes the fluor and effectively reverses the termination. Examples of modified nucleotides are also described in U.S. Pat. No. 7,427,673, and U.S. Pat. No. 7,057,026, the disclosures of which are incorporated herein by reference in their entireties.
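For the four-label case described above, one simple way to turn the four per-cycle images into a base call for a feature is to pick the channel with the strongest signal. The sketch below is illustrative only: it assumes the four channel intensities for a feature have already been extracted and corrected, and the channel-to-base order, the function names `call_base_four_channel` and `call_read`, and the tuple layout are all hypothetical.

```python
def call_base_four_channel(intensities, channel_to_base=("A", "C", "G", "T")):
    """Call one base from four channel intensities for a single feature.

    intensities: four numbers, one per detection channel, for one cycle.
    channel_to_base: assumed mapping from channel index to nucleotide type.
    """
    if len(intensities) != 4:
        raise ValueError("expected one intensity per detection channel")
    # The brightest channel is taken as the incorporated nucleotide type.
    brightest = max(range(4), key=lambda i: intensities[i])
    return channel_to_base[brightest]


def call_read(cycles):
    """Concatenate per-cycle calls for one feature into a nucleotide read."""
    return "".join(call_base_four_channel(c) for c in cycles)
```

Because the relative position of each feature is unchanged between images, the same feature can be followed across cycles and its calls joined into a read; a run of n cycles yields a read of n bases in this sketch.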
[0150] Additional exemplary SBS systems and methods which can be utilized with the methods and systems described herein are described in U.S. Patent Application Publication No. 2007/0166705, U.S. Patent Application Publication No. 2006/0188901, U.S. Pat. No. 7,057,026, U.S. Patent Application Publication No. 2006/0240439, U.S. Patent Application Publication No. 2006/0281109, PCT Publication No. WO 05/065814, U.S. Patent Application Publication No. 2005/0100900, PCT Publication No. WO 06/064199, PCT Publication No. WO 07/010,251, U.S. Patent Application Publication No. 2012/0270305 and U.S. Patent Application Publication No. 2013/0260372, the disclosures of which are incorporated herein by reference in their entireties.
[0151] Some embodiments can utilize detection of four different nucleotides using fewer than four different labels. For example, SBS can be performed utilizing methods and systems described in the incorporated materials of U.S. Patent Application Publication No. 2013/0079232. As a first example, a pair of nucleotide types can be detected at the same wavelength, but distinguished based on a difference in intensity for one member of the pair compared to the other, or based on a change to one member of the pair (e.g., via chemical modification, photochemical modification or physical modification) that causes apparent signal to appear or disappear compared to the signal detected for the other member of the pair. As a second example, three of four different nucleotide types can be detected under particular conditions while a fourth nucleotide type lacks a label that is detectable under those conditions, or is minimally detected under those conditions (e.g., minimal detection due to background fluorescence, etc.). Incorporation of the first three nucleotide types into a nucleic acid can be determined based on presence of their respective signals and incorporation of the fourth nucleotide type into the nucleic acid can be determined based on absence or minimal detection of any signal. As a third example, one nucleotide type can include label(s) that are detected in two different channels, whereas other nucleotide types are detected in no more than one of the channels. The aforementioned three exemplary configurations are not considered mutually exclusive and can be used in various combinations. An exemplary embodiment that combines all three examples is a fluorescent-based SBS method that uses a first nucleotide type that is detected in a first channel (e.g., dATP having a label that is detected in the first channel when excited by a first excitation wavelength), a second nucleotide type that is detected in a second channel (e.g., dCTP having a label that is detected in the second channel when excited by a second excitation wavelength), a third nucleotide type that is detected in both the first and the second channel (e.g., dTTP having at least one label that is detected in both channels when excited by the first and/or second excitation wavelength) and a fourth nucleotide type that lacks a label that is not, or minimally, detected in either channel (e.g., dGTP having no label).
[0152] Further, as described in the incorporated materials of U.S. Patent Application Publication No. 2013/0079232, sequencing data can be obtained using a single channel. In such so-called one-dye sequencing approaches, the first nucleotide type is labeled but the label is removed after the first image is generated, and the second nucleotide type is labeled only after a first image is generated. The third nucleotide type retains its label in both the first and second images, and the fourth nucleotide type remains unlabeled in both images.
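Both the two-channel example in paragraph [0151] and the one-dye, two-image approach in paragraph [0152] can be pictured as a two-bit lookup: each nucleotide type is identified by whether a signal is present in the first and in the second measurement. The sketch below is a minimal illustration, assuming normalized intensities and a fixed detection threshold of 0.5, neither of which comes from this disclosure; the names `TWO_CHANNEL`, `ONE_DYE` and `call_two_signal` are hypothetical.

```python
# (signal in first measurement, signal in second measurement) -> call.
# Two-channel example of [0151]: dATP in channel 1, dCTP in channel 2,
# dTTP in both channels, dGTP unlabeled.
TWO_CHANNEL = {
    (True, False): "A",
    (False, True): "C",
    (True, True): "T",
    (False, False): "G",
}

# One-dye approach of [0152]: first type seen only in image 1, second type
# only in image 2, third type in both images, fourth type in neither.
ONE_DYE = {
    (True, False): "first",
    (False, True): "second",
    (True, True): "third",
    (False, False): "fourth",
}


def call_two_signal(signal_1, signal_2, threshold=0.5, table=TWO_CHANNEL):
    """Classify one feature from two intensity measurements.

    threshold is an assumed cutoff separating "detected" from
    "not, or minimally, detected".
    """
    return table[(signal_1 >= threshold, signal_2 >= threshold)]
```

The same function serves both schemes because only the meaning of the two measurements changes: two channels of one image set in the first case, and two successive images of a single channel in the second.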
[0153] Some embodiments can utilize sequencing by ligation techniques. Such techniques utilize DNA ligase to incorporate oligonucleotides and identify the incorporation of such oligonucleotides. The oligonucleotides typically have different labels that are correlated with the identity of a particular nucleotide in a sequence to which the oligonucleotides hybridize. As with other SBS methods, images can be obtained following treatment of an array of nucleic acid features with the labeled sequencing reagents. Each image will show nucleic acid features that have incorporated labels of a particular type. Different features are present or absent in the different images due to the different sequence content of each feature, but the relative position of the features will remain unchanged in the images. Images obtained from ligation-based sequencing methods can be stored, processed and analyzed as set forth herein. Exemplary SBS systems and methods which can be utilized with the methods and systems described herein are described in U.S. Pat. No. 6,969,488, U.S. Pat. No. 6,172,218, and U.S. Pat. No. 6,306,597, the disclosures of which are incorporated herein by reference in their entireties.
[0154] Some embodiments can utilize nanopore sequencing (Deamer, D. W. & Akeson, M. "Nanopores and nucleic acids: prospects for ultrarapid sequencing." Trends Biotechnol. 18, 147-151 (2000); Deamer, D. and D. Branton, "Characterization of nucleic acids by nanopore analysis". Acc. Chem. Res. 35:817-825 (2002); Li, J., M. Gershow, D. Stein, E. Brandin, and J. A. Golovchenko, "DNA molecules and configurations in a solid-state nanopore microscope" Nat. Mater. 2:611-615 (2003), the disclosures of which are incorporated herein by reference in their entireties). In such embodiments, the target nucleic acid passes through a nanopore. The nanopore can be a synthetic pore or biological membrane protein, such as α-hemolysin. As the target nucleic acid passes through the nanopore, each base-pair can be identified by measuring fluctuations in the electrical conductance of the pore. (U.S. Pat. No. 7,001,792; Soni, G. V. & Meller, A. "Progress toward ultrafast DNA sequencing using solid-state nanopores." Clin. Chem. 53, 1996-2001 (2007); Healy, K. "Nanopore-based single-molecule DNA analysis." Nanomed. 2, 459-481 (2007); Cockroft, S. L., Chu, J., Amorin, M. & Ghadiri, M. R. "A single-molecule nanopore device detects DNA polymerase activity with single-nucleotide resolution." J. Am. Chem. Soc. 130, 818-820 (2008), the disclosures of which are incorporated herein by reference in their entireties). Data obtained from nanopore sequencing can be stored, processed and analyzed as set forth herein. In particular, the data can be treated as an image in accordance with the exemplary treatment of optical images and other images that is set forth herein.
[0155] Some embodiments can utilize methods involving the real-time monitoring of DNA polymerase activity. Nucleotide incorporations can be detected through fluorescence resonance energy transfer (FRET) interactions between a fluorophore-bearing polymerase and γ-phosphate-labeled nucleotides as described, for example, in U.S. Pat. No. 7,329,492 and U.S. Pat. No. 7,211,414 (each of which is incorporated herein by reference) or nucleotide incorporations can be detected with zero-mode waveguides as described, for example, in U.S. Pat. No. 7,315,019 (which is incorporated herein by reference) and using fluorescent nucleotide analogs and engineered polymerases as described, for example, in U.S. Pat. No. 7,405,281 and U.S. Patent Application Publication No. 2008/0108082 (each of which is incorporated herein by reference). The illumination can be restricted to a zeptoliter-scale volume around a surface-tethered polymerase such that incorporation of fluorescently labeled nucleotides can be observed with low background (Levene, M. J. et al. "Zero-mode waveguides for single-molecule analysis at high concentrations." Science 299, 682-686 (2003); Lundquist, P. M. et al. "Parallel confocal detection of single molecules in real time." Opt. Lett. 33, 1026-1028 (2008); Korlach, J. et al. "Selective aluminum passivation for targeted immobilization of single DNA polymerase molecules in zero-mode waveguide nanostructures." Proc. Natl. Acad. Sci. USA 105, 1176-1181 (2008), the disclosures of which are incorporated herein by reference in their entireties). Images obtained from such methods can be stored, processed and analyzed as set forth herein.
[0156] Some SBS embodiments include detection of a proton released upon incorporation of a nucleotide into an extension product. For example, sequencing based on detection of released protons can use an electrical detector and associated techniques that are commercially available from Ion Torrent (Guilford, CT, a Life Technologies subsidiary) or sequencing methods and systems described in US 2009/0026082 A1; US 2009/0127589 A1; US 2010/0137143 A1; or US 2010/0282617 A1, each of which is incorporated herein by reference. Methods set forth herein for amplifying target nucleic acids using kinetic exclusion can be readily applied to substrates used for detecting protons. More specifically, methods set forth herein can be used to produce clonal populations of amplicons that are used to detect protons.
[0157] The above SBS methods can be advantageously carried out in multiplex formats such that multiple different target nucleic acids are manipulated simultaneously. In particular embodiments, different target nucleic acids can be treated in a common reaction vessel or on a surface of a particular substrate. This allows convenient delivery of sequencing reagents, removal of unreacted reagents and detection of incorporation events in a multiplex manner. In embodiments using surface-bound target nucleic acids, the target nucleic acids can be in an array format. In an array format, the target nucleic acids can be typically bound to a surface in a spatially distinguishable manner. The target nucleic acids can be bound by direct covalent attachment, attachment to a bead or other particle or binding to a polymerase or other molecule that is attached to the surface. The array can include a single copy of a target nucleic acid at each site (also referred to as a feature) or multiple copies having the same sequence can be present at each site or feature. Multiple copies can be produced by amplification methods such as bridge amplification or emulsion PCR as described in further detail below.
[0158] The methods set forth herein can use arrays having features at any of a variety of densities including, for example, at least about 10 features/cm², 100 features/cm², 500 features/cm², 1,000 features/cm², 5,000 features/cm², 10,000 features/cm², 50,000 features/cm², 100,000 features/cm², 1,000,000 features/cm², 5,000,000 features/cm², or higher.
[0159] An advantage of the methods set forth herein is that they provide for rapid and efficient detection of a plurality of target nucleic acids in parallel. Accordingly, the present disclosure provides integrated systems capable of preparing and detecting nucleic acids using techniques known in the art such as those exemplified above. Thus, an integrated system of the present disclosure can include fluidic components capable of delivering amplification reagents and/or sequencing reagents to one or more immobilized DNA fragments, the system comprising components such as pumps, valves, reservoirs, fluidic lines and the like. A flow cell can be configured and/or used in an integrated system for detection of target nucleic acids. Exemplary flow cells are described, for example, in US 2010/0111768 A1 and US Ser. No. 13/273,666, each of which is incorporated herein by reference. As exemplified for flow cells, one or more of the fluidic components of an integrated system can be used for an amplification method and for a detection method. Taking a nucleic acid sequencing embodiment as an example, one or more of the fluidic components of an integrated system can be used for an amplification method set forth herein and for the delivery of sequencing reagents in a sequencing method such as those exemplified above. Alternatively, an integrated system can include separate fluidic systems to carry out amplification methods and to carry out detection methods. Examples of integrated sequencing systems that are capable of creating amplified nucleic acids and also determining the sequence of the nucleic acids include, without limitation, the MiSeq™ platform (Illumina, Inc., San Diego, CA) and devices described in US Ser. No. 13/273,666, which is incorporated herein by reference.
[0160] The sequencing system described above sequences nucleic-acid polymers present in samples received by a sequencing device. As defined herein, "sample" and its derivatives, is used in its broadest sense and includes any specimen, culture and the like that is suspected of including a target. In some embodiments, the sample comprises DNA, RNA, PNA, LNA, chimeric or hybrid forms of nucleic acids. The sample can include any biological, clinical, surgical, agricultural, atmospheric or aquatic-based specimen containing one or more nucleic acids. The term also includes any isolated nucleic acid sample such as genomic DNA, fresh-frozen or formalin-fixed paraffin-embedded nucleic acid specimen. It is also envisioned that the sample can be from a single individual, a collection of nucleic acid samples from genetically related members, nucleic acid samples from genetically unrelated members, nucleic acid samples (matched) from a single individual such as a tumor sample and normal tissue sample, or sample from a single source that contains two distinct forms of genetic material such as maternal and fetal DNA obtained from a maternal subject, or the presence of contaminating bacterial DNA in a sample that contains plant or animal DNA. In some embodiments, the source of nucleic acid material can include nucleic acids obtained from a newborn, for example as typically used for newborn screening.
[0161] The nucleic acid sample can include high molecular weight material such as genomic DNA (gDNA). The sample can include low molecular weight material such as nucleic acid molecules obtained from FFPE or archived DNA samples. In another embodiment, low molecular weight material includes enzymatically or mechanically fragmented DNA. The sample can include cell-free circulating DNA. In some embodiments, the sample can include nucleic acid molecules obtained from biopsies, tumors, scrapings, swabs, blood, mucus, urine, plasma, semen, hair, laser capture micro-dissections, surgical resections, and other clinical or laboratory obtained samples. In some embodiments, the sample can be an epidemiological, agricultural, forensic or pathogenic sample. In some embodiments, the sample can include nucleic acid molecules obtained from an animal such as a human or mammalian source. In another embodiment, the sample can include nucleic acid molecules obtained from a nonmammalian source such as a plant, bacteria, virus or fungus. In some embodiments, the source of the nucleic acid molecules may be an archived or extinct sample or species.
[0162] Further, the methods and compositions disclosed herein may be useful to amplify a nucleic acid sample having low-quality nucleic acid molecules, such as degraded and/or fragmented genomic DNA from a forensic sample. In one embodiment, forensic samples can include nucleic acids obtained from a crime scene, nucleic acids obtained from a missing persons DNA database, nucleic acids obtained from a laboratory associated with a forensic investigation or include forensic samples obtained by law enforcement agencies, one or more military services or any such personnel. The nucleic acid sample may be a purified sample or a crude DNA containing lysate, for example derived from a buccal swab, paper, fabric or other substrate that may be impregnated with saliva, blood, or other bodily fluids. As such, in some embodiments, the nucleic acid sample may comprise low amounts of, or fragmented portions of DNA, such as genomic DNA. In some embodiments, target sequences can be present in one or more bodily fluids including but not limited to, blood, sputum, plasma, semen, urine and serum. In some embodiments, target sequences can be obtained from hair, skin, tissue samples, autopsy or remains of a victim. In some embodiments, nucleic acids including one or more target sequences can be obtained from a deceased animal or human. In some embodiments, target sequences can include nucleic acids obtained from non-human DNA such a microbial, plant or entomological DNA. In some embodiments, target sequences or amplified target sequences are directed to purposes of human identification. In some embodiments, the disclosure relates generally to methods for identifying characteristics of a forensic sample. In some embodiments, the disclosure relates generally to human identification methods using one or more target specific primers disclosed herein or one or more target specific primers designed using the primer design criteria outlined herein. 
In one embodiment, a forensic or human identification sample containing at least one target sequence can be amplified using any one or more of the target-specific primers disclosed herein or using the primer criteria outlined herein.
[0163] The components of the target-sequence-coverage system 106 can include software, hardware, or both. For example, the components of the target-sequence-coverage system 106 can include one or more instructions stored on a computer-readable storage medium and executable by processors of one or more computing devices (e.g., the local server device 102). When executed by the one or more processors, the computer-executable instructions of the target-sequence-coverage system 106 can cause the computing devices to perform the bubble detection methods described herein. Alternatively, the components of the target-sequence-coverage system 106 can comprise hardware, such as special purpose processing devices to perform a certain function or group of functions. Additionally, or alternatively, the components of the target-sequence-coverage system 106 can include a combination of computer-executable instructions and hardware.
[0164] Furthermore, the components of the target-sequence-coverage system 106 performing the functions described herein with respect to the target-sequence-coverage system 106 may, for example, be implemented as part of a stand-alone application, as a module of an application, as a plug-in for applications, as a library function or functions that may be called by other applications, and/or as a cloud-computing model. Thus, components of the target-sequence-coverage system 106 may be implemented as part of a stand-alone application on a personal computing device or a mobile device. Additionally, or alternatively, the components of the target-sequence-coverage system 106 may be implemented in any application that provides sequencing services including, but not limited to, Illumina BaseSpace, Illumina MiSeq, Illumina NovaSeq, Illumina NextSeq, Illumina TruSeq, or Illumina TruSight software. “Illumina,” “BaseSpace,” “MiSeq,” “NovaSeq,” “NextSeq,” “TruSeq,” and “TruSight,” are either registered trademarks or trademarks of Illumina, Inc. in the United States and/or other countries.
[0165] Embodiments of the present disclosure may comprise or utilize a special purpose or general-purpose computer including computer hardware, such as, for example, one or more processors and system memory, as discussed in greater detail below. Embodiments within the scope of the present disclosure also include physical and other computer-readable media for carrying or storing computer-executable instructions and/or data structures. In particular, one or more of the processes described herein may be implemented at least in part as instructions embodied in a non-transitory computer-readable medium and executable by one or more computing devices (e.g., any of the media content access devices described herein). In general, a processor (e.g., a microprocessor) receives instructions from a non-transitory computer-readable medium (e.g., a memory) and executes those instructions, thereby performing one or more processes, including one or more of the processes described herein.
[0166] Computer-readable media can be any available media that can be accessed by a general purpose or special purpose computer system. Computer-readable media that store computer-executable instructions are non-transitory computer-readable storage media (devices). Computer-readable media that carry computer-executable instructions are transmission media. Thus, by way of example, and not limitation, embodiments of the disclosure can comprise at least two distinctly different kinds of computer-readable media: non-transitory computer-readable storage media (devices) and transmission media.
[0167] Non-transitory computer-readable storage media (devices) include RAM, ROM, EEPROM, CD-ROM, solid state drives (SSDs) (e.g., based on RAM), Flash memory, phase-change memory (PCM), other types of memory, other optical disk storage, magnetic disk storage or other magnetic storage devices, or any other medium which can be used to store desired program code means in the form of computer-executable instructions or data structures and which can be accessed by a general purpose or special purpose computer.
[0168] A “network” is defined as one or more data links that enable the transport of electronic data between computer systems and/or modules and/or other electronic devices. When information is transferred or provided over a network or another communications connection (either hardwired, wireless, or a combination of hardwired or wireless) to a computer, the computer properly views the connection as a transmission medium. Transmission media can include a network and/or data links which can be used to carry desired program code means in the form of computer-executable instructions or data structures and which can be accessed by a general purpose or special purpose computer. Combinations of the above should also be included within the scope of computer-readable media.
[0169] Further, upon reaching various computer system components, program code means in the form of computer-executable instructions or data structures can be transferred automatically from transmission media to non-transitory computer-readable storage media (devices) (or vice versa). For example, computer-executable instructions or data structures received over a network or data link can be buffered in RAM within a network interface module (e.g., a NIC), and then eventually transferred to computer system RAM and/or to less volatile computer storage media (devices) at a computer system. Thus, it should be understood that non-transitory computer-readable storage media (devices) can be included in computer system components that also (or even primarily) utilize transmission media.
[0170] Computer-executable instructions comprise, for example, instructions and data which, when executed at a processor, cause a general-purpose computer, special purpose computer, or special purpose processing device to perform a certain function or group of functions. In some embodiments, computer-executable instructions are executed on a general-purpose computer to turn the general-purpose computer into a special purpose computer implementing elements of the disclosure. The computer executable instructions may be, for example, binaries, intermediate format instructions such as assembly language, or even source code. Although the subject matter has been described in language specific to structural features and/or methodological acts, it is to be understood that the subject matter defined in the appended claims is not necessarily limited to the described features or acts described above. Rather, the described features and acts are disclosed as example forms of implementing the claims.
[0171] Those skilled in the art will appreciate that the disclosure may be practiced in network computing environments with many types of computer system configurations, including, personal computers, desktop computers, laptop computers, message processors, hand-held devices, multi-processor systems, microprocessor-based or programmable consumer electronics, network PCs, minicomputers, mainframe computers, mobile telephones, PDAs, tablets, pagers, routers, switches, and the like. The disclosure may also be practiced in distributed system environments where local and remote computer systems, which are linked (either by hardwired data links, wireless data links, or by a combination of hardwired and wireless data links) through a network, both perform tasks. In a distributed system environment, program modules may be located in both local and remote memory storage devices.
[0172] Embodiments of the present disclosure can also be implemented in cloud computing environments. In this description, “cloud computing” is defined as a model for enabling on-demand network access to a shared pool of configurable computing resources. For example, cloud computing can be employed in the marketplace to offer ubiquitous and convenient on-demand access to the shared pool of configurable computing resources. The shared pool of configurable computing resources can be rapidly provisioned via virtualization and released with low management effort or service provider interaction, and then scaled accordingly.
[0173] A cloud-computing model can be composed of various characteristics such as, for example, on-demand self-service, broad network access, resource pooling, rapid elasticity, measured service, and so forth. A cloud-computing model can also expose various service models, such as, for example, Software as a Service (SaaS), Platform as a Service (PaaS), and Infrastructure as a Service (IaaS). A cloud-computing model can also be deployed using different deployment models such as private cloud, community cloud, public cloud, hybrid cloud, and so forth. In this description and in the claims, a “cloud-computing environment” is an environment in which cloud computing is employed.
[0174] FIG. 10 illustrates a block diagram of a computing device 1000 that may be configured to perform one or more of the processes described above. One will appreciate that one or more computing devices such as the computing device 1000 may implement the target-sequence-coverage system 106. As shown by FIG. 10, the computing device 1000 can comprise a processor 1002, a memory 1004, a storage device 1006, an I/O interface 1008, and a communication interface 1010, which may be communicatively coupled by way of a communication infrastructure 1012. In certain embodiments, the computing device 1000 can include fewer or more components than those shown in FIG. 10. The following paragraphs describe components of the computing device 1000 shown in FIG. 10 in additional detail.
[0175] In one or more embodiments, the processor 1002 includes hardware for executing instructions, such as those making up a computer program. As an example, and not by way of limitation, to execute instructions for dynamically modifying workflows, the processor 1002 may retrieve (or fetch) the instructions from an internal register, an internal cache, the memory 1004, or the storage device 1006 and decode and execute them. The memory 1004 may be a volatile or non-volatile memory used for storing data, metadata, and programs for execution by the processor(s). The storage device 1006 includes storage, such as a hard disk, flash disk drive, or other digital storage device, for storing data or instructions for performing the methods described herein.
[0176] The I/O interface 1008 allows a user to provide input to, receive output from, and otherwise transfer data to and receive data from computing device 1000. The I/O interface 1008 may include a mouse, a keypad or a keyboard, a touch screen, a camera, an optical scanner, network interface, modem, other known I/O devices or a combination of such I/O interfaces. The I/O interface 1008 may include one or more devices for presenting output to a user, including, but not limited to, a graphics engine, a display (e.g., a display screen), one or more output drivers (e.g., display drivers), one or more audio speakers, and one or more audio drivers. In certain embodiments, the I/O interface 1008 is configured to provide graphical data to a display for presentation to a user. The graphical data may be representative of one or more graphical user interfaces and/or any other graphical content as may serve a particular implementation.
[0177] The communication interface 1010 can include hardware, software, or both. In any event, the communication interface 1010 can provide one or more interfaces for communication (such as, for example, packet-based communication) between the computing device 1000 and one or more other computing devices or networks. As an example, and not by way of limitation, the communication interface 1010 may include a network interface controller (NIC) or network adapter for communicating with an Ethernet or other wire-based network or a wireless NIC (WNIC) or wireless adapter for communicating with a wireless network, such as a WI-FI.
[0178] Additionally, the communication interface 1010 may facilitate communications with various types of wired or wireless networks. The communication interface 1010 may also facilitate communications using various communication protocols. The communication infrastructure 1012 may also include hardware, software, or both that couples components of the computing device 1000 to each other. For example, the communication interface 1010 may use one or more networks and/or protocols to enable a plurality of computing devices connected by a particular infrastructure to communicate with each other to perform one or more aspects of the processes described herein. To illustrate, the sequencing process can allow a plurality of devices (e.g., a client device, sequencing device, and server device(s)) to exchange information such as sequencing data and error notifications.
[0179] In the foregoing specification, the present disclosure has been described with reference to specific exemplary embodiments thereof. Various embodiments and aspects of the present disclosure(s) are described with reference to details discussed herein, and the accompanying drawings illustrate the various embodiments. The description above and drawings are illustrative of the disclosure and are not to be construed as limiting the disclosure. Numerous specific details are described to provide a thorough understanding of various embodiments of the present disclosure.
[0180] The present disclosure may be embodied in other specific forms without departing from its spirit or essential characteristics. The described embodiments are to be considered in all respects only as illustrative and not restrictive. For example, the methods described herein may be performed with less or more steps/acts or the steps/acts may be performed in differing orders. Additionally, the steps/acts described herein may be repeated or performed in parallel with one another or in parallel with different instances of the same or similar steps/acts. The scope of the present application is, therefore, indicated by the appended claims rather than by the foregoing description. All changes that come within the meaning and range of equivalency of the claims are to be embraced within their scope.

Claims

CLAIMS We Claim:
1. A system comprising: an imaging system and a fluidic system; and a computing engine comprising: at least one processor; and a non-transitory computer readable medium comprising instructions that, when executed by the at least one processor, cause the system to: receive, for a sequencing run, data input identifying a target genomic region for one or more genomic samples and a target read-coverage level for the target genomic region; provisionally map, after an initial set of sequencing cycles and during the sequencing run, nucleotide reads of the one or more genomic samples to a reference sequence corresponding to the target genomic region; estimate, during the sequencing run, read-coverage levels of the target genomic region within the one or more genomic samples based on the nucleotide reads of the one or more genomic samples provisionally mapped to the reference sequence and a currently selected number of sequencing cycles for the sequencing run; generate, for the sequencing run and based on the estimated read-coverage levels, an adjusted number of sequencing cycles sufficient to satisfy the target read-coverage level within the target genomic region for one or more genomic samples of the one or more genomic samples; and execute the sequencing run according to the adjusted number of sequencing cycles.
2. The system of claim 1, further comprising instructions that, when executed by the at least one processor, cause the system to: determine a checkpoint sequencing cycle within a threshold number of sequencing cycles before a last sequencing cycle of the adjusted number of sequencing cycles; provisionally re-map, at the checkpoint sequencing cycle, nucleotide reads of the one or more genomic samples to the reference sequence corresponding to the target genomic region; and estimate, for the target genomic region of the one or more genomic samples and at or after the checkpoint sequencing cycle, an updated read-coverage level based on the nucleotide reads of the one or more genomic samples provisionally re-mapped to the reference sequence.
3. The system of claim 1, further comprising instructions that, when executed by the at least one processor, cause the system to provisionally map the nucleotide reads of the one or more genomic samples to the reference sequence by: provisionally mapping one or more nucleotide reads of the one or more genomic samples to the target genomic region within a reference genome; or provisionally mapping one or more nucleotide reads of the one or more genomic samples to an adjacent genomic region within a threshold number of nucleobases of the target genomic region within the reference genome.
4. The system of claim 1, further comprising instructions that, when executed by the at least one processor, cause the system to terminate sequencing cycles of the sequencing run after the adjusted number of sequencing cycles finish.
5. The system of claim 1, further comprising instructions that, when executed by the at least one processor, cause the system to: determine locations of the nucleotide reads of the one or more genomic samples within the reference sequence corresponding to the target genomic region; determine read-growth directions of the nucleotide reads growing upstream or downstream with respect to the target genomic region; and estimate the read-coverage levels of the target genomic region within the one or more genomic samples based on the locations and read-growth directions of the nucleotide reads of the one or more genomic samples provisionally mapped to the reference sequence.
6. The system of claim 1, further comprising instructions that, when executed by the at least one processor, cause the system to estimate the read-coverage levels of the target genomic region within the one or more genomic samples by: determining, from a set of clusters of oligonucleotides for the sequencing run, a subset of clusters of oligonucleotides producing nucleotide reads provisionally mapped to the reference sequence; determining, based on indexing sequences within the subset of clusters of oligonucleotides, respective numbers of clusters of oligonucleotides belonging to respective genomic samples of the one or more genomic samples; and generating, for each genomic sample of the one or more genomic samples, an estimated total number of clusters of oligonucleotides based on the respective numbers of clusters of oligonucleotides and the currently selected number of sequencing cycles for the sequencing run.
7. The system of claim 1, further comprising instructions that, when executed by the at least one processor, cause the system to: determine, based on an estimated read-coverage level for a genomic sample of the one or more genomic samples at the target genomic region, the genomic sample is unlikely to satisfy the target read-coverage level within a threshold number of sequencing cycles of the adjusted number of sequencing cycles; and based on the genomic sample being unlikely to satisfy the target read-coverage level within the threshold number of sequencing cycles, terminate the sequencing run after the adjusted number of sequencing cycles.
8. The system of claim 1, further comprising instructions that, when executed by the at least one processor, cause the system to: determine, based on an estimated read-coverage level for a genomic sample of the one or more genomic samples at the target genomic region, the genomic sample is likely to satisfy the target read-coverage level within a threshold number of sequencing cycles after the adjusted number of sequencing cycles; and based on the genomic sample being likely to satisfy the target read-coverage level within the threshold number of sequencing cycles, continue the sequencing run until the threshold number of sequencing cycles after the adjusted number of sequencing cycles.
9. The system of claim 1, further comprising instructions that, when executed by the at least one processor, cause the system to perform sequencing cycles of the sequencing run by: determining, for clusters of oligonucleotides immobilized on a nucleotide-sample substrate, base calls as part of paired-end nucleotide reads comprising first read mates and second read mates; or determining, for clusters of oligonucleotides immobilized on the nucleotide-sample substrate, base calls as part of single-end nucleotide reads.
10. The system of claim 1, further comprising instructions that, when executed by the at least one processor, cause the system to: provisionally re-map, during a subset of sequencing cycles for first read mates and second read mates of paired-end nucleotide reads, a subset of paired-end nucleotide reads of the one or more genomic samples to the reference sequence corresponding to the target genomic region; and estimate, for the target genomic region of the one or more genomic samples and during the subset of sequencing cycles for the first read mates and the second read mates, updated read-coverage levels based on the subset of paired-end nucleotide reads provisionally re-mapped to the reference sequence.
11. The system of claim 10, further comprising instructions that, when executed by the at least one processor, cause the system to: estimate the read-coverage levels during the sequencing run in part by estimating the read-coverage levels for the target genomic region based on first read mates of the subset of paired-end nucleotide reads provisionally mapped to the reference sequence; provisionally re-map the subset of paired-end nucleotide reads in part by re-mapping, during a first subset of sequencing cycles for the first read mates of the paired-end nucleotide reads, a first subset of paired-end nucleotide reads of the one or more genomic samples to the reference sequence; estimate the updated read-coverage levels in part by estimating, for the target genomic region of the one or more genomic samples, a first updated read-coverage level before a last sequencing cycle of the first subset of sequencing cycles for the first read mates; and perform the first subset of sequencing cycles for the first read mates until finishing a safety threshold number of sequencing cycles based on the first updated read-coverage level.
12. The system of claim 11, further comprising instructions that, when executed by the at least one processor, cause the system to: provisionally re-map the subset of paired-end nucleotide reads in part by re-mapping, during a second subset of sequencing cycles for the second read mates of the paired-end nucleotide reads, a second subset of paired-end nucleotide reads of the one or more genomic samples to the reference sequence; estimate the updated read-coverage levels in part by estimating, for the target genomic region of the one or more genomic samples, a second updated read-coverage level before a last sequencing cycle of the second subset of sequencing cycles for the second read mates; and generate, for the sequencing run and based on the second updated read-coverage level, an updated adjusted number of sequencing cycles sufficient to satisfy the target read-coverage level within the target genomic region for the one or more genomic samples of the one or more genomic samples.
13. The system of claim 1, further comprising instructions that, when executed by the at least one processor, cause the system to: provisionally map, after the initial set of sequencing cycles and during the sequencing run, additional nucleotide reads of the one or more genomic samples to an additional reference sequence corresponding to an additional target genomic region; estimate, during the sequencing run, additional read-coverage levels of the additional target genomic region within the one or more genomic samples based on the additional nucleotide reads of the one or more genomic samples provisionally mapped to the additional reference sequence and the currently selected number of sequencing cycles for the sequencing run; and generate, for the sequencing run and based on the estimated read-coverage levels and the additional read-coverage levels, the adjusted number of sequencing cycles sufficient to satisfy the target read-coverage level within the target genomic region and the additional target genomic region for the one or more genomic samples.
14. The system of claim 1, further comprising instructions that, when executed by the at least one processor, cause the system to: receive, for the sequencing run, data input identifying the target read-coverage level for a first genomic sample of the one or more genomic samples and an additional target read-coverage level for the target genomic region of a second genomic sample of the one or more genomic samples; generate, for the sequencing run and based on the estimated read-coverage levels, an adjusted number of sequencing cycles sufficient to satisfy the target read-coverage level within the target genomic region for the first genomic sample and the additional target read-coverage level within the target genomic region for the second genomic sample; and execute the sequencing run according to the adjusted number of sequencing cycles.
15. The system of claim 1, further comprising instructions that, when executed by the at least one processor, cause the system to perform sequencing cycles of the sequencing run according to an order of indexing cycles before genomic sequencing cycles by: determining base calls for a first indexing sequence appended to a sample genomic sequence of a genomic sample of the one or more genomic samples; determining base calls for a second indexing sequence appended to the sample genomic sequence of the genomic sample; and after determining the base calls for the first indexing sequence and the second indexing sequence, determining base calls for a first nucleotide read corresponding to a first portion of the sample genomic sequence and determining base calls for a second nucleotide read corresponding to a second portion of the sample genomic sequence.
16. The system of claim 1, further comprising instructions that, when executed by the at least one processor, cause the system to generate the adjusted number of sequencing cycles for the sequencing run by increasing or decreasing a preset number of sequencing cycles for the sequencing run.
17. The system of claim 1, further comprising instructions that, when executed by the at least one processor, cause the system to detect a reagent volume of a reagent cartridge in fluid communication with the fluidic system and operate the fluidic system to perform one or more additional sequencing cycles relative to the currently selected number of sequencing cycles until finishing the adjusted number of sequencing cycles by aspirating one or more reagents from the reagent cartridge.
18. The system of claim 1, further comprising instructions that, when executed by the at least one processor, cause the system to terminate operation of the fluidic system from performing one or more sequencing cycles of the currently selected number of sequencing cycles to finish the sequencing run after performing the adjusted number of sequencing cycles.
19. A method comprising: receiving, for a sequencing run, data input identifying a target genomic region for one or more genomic samples and a target read-coverage level for the target genomic region; provisionally mapping, after an initial set of sequencing cycles and during the sequencing run, nucleotide reads of the one or more genomic samples to a reference sequence corresponding to the target genomic region; estimating, during the sequencing run, read-coverage levels of the target genomic region within the one or more genomic samples based on the nucleotide reads of the one or more genomic samples provisionally mapped to the reference sequence and a currently selected number of sequencing cycles for the sequencing run; generating, for the sequencing run and based on the estimated read-coverage levels, an adjusted number of sequencing cycles sufficient to satisfy the target read-coverage level within the target genomic region for one or more genomic samples of the one or more genomic samples; and executing the sequencing run according to the adjusted number of sequencing cycles.
20. The method of claim 19, further comprising: determining a checkpoint sequencing cycle within a threshold number of sequencing cycles before a last sequencing cycle of the adjusted number of sequencing cycles; provisionally re-mapping, at the checkpoint sequencing cycle, nucleotide reads of the one or more genomic samples to the reference sequence corresponding to the target genomic region; and estimating, for the target genomic region of the one or more genomic samples and at or after the checkpoint sequencing cycle, an updated read-coverage level based on the nucleotide reads of the one or more genomic samples provisionally re-mapped to the reference sequence.
21. The method of claim 19, further comprising provisionally mapping the nucleotide reads of the one or more genomic samples to the reference sequence by: provisionally mapping one or more nucleotide reads of the one or more genomic samples to the target genomic region within a reference genome; or provisionally mapping one or more nucleotide reads of the one or more genomic samples to an adjacent genomic region within a threshold number of nucleobases of the target genomic region within the reference genome.
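By way of illustration only, the estimating and generating steps recited in claims 19 and 16, together with the read-growth directions of claim 5, can be sketched in Python as follows. The sketch is not the claimed implementation. It assumes that the read-coverage level is the mean depth across the target genomic region, that every provisionally mapped read keeps growing by one base per sequencing cycle, and that no further reads map later in the run. The names `ProvisionalRead`, `covered_bases`, `estimate_mean_coverage`, and `adjusted_cycles` are hypothetical and do not appear in the disclosure.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class ProvisionalRead:
    """Hypothetical record for one provisionally mapped nucleotide read."""
    start: int     # 0-based reference position of the first sequenced base
    forward: bool  # True if the read grows toward higher coordinates


def covered_bases(read: ProvisionalRead, cycles: int,
                  region_start: int, region_end: int) -> int:
    """Bases of [region_start, region_end) covered once the read is `cycles` long."""
    if read.forward:
        lo, hi = read.start, read.start + cycles
    else:
        lo, hi = read.start - cycles + 1, read.start + 1
    return max(0, min(hi, region_end) - max(lo, region_start))


def estimate_mean_coverage(reads: Iterable[ProvisionalRead], cycles: int,
                           region_start: int, region_end: int) -> float:
    """Assumed coverage metric: total covered bases divided by region length."""
    length = region_end - region_start
    if length <= 0:
        raise ValueError("target region must have positive length")
    total = sum(covered_bases(r, cycles, region_start, region_end) for r in reads)
    return total / length


def adjusted_cycles(reads: Iterable[ProvisionalRead], target_coverage: float,
                    region_start: int, region_end: int,
                    min_cycles: int, max_cycles: int) -> Optional[int]:
    """Smallest cycle count in [min_cycles, max_cycles] that meets the target.

    Returns None when the target cannot be met within max_cycles.
    """
    reads = list(reads)
    for cycles in range(min_cycles, max_cycles + 1):
        if estimate_mean_coverage(reads, cycles, region_start, region_end) >= target_coverage:
            return cycles
    return None


# Example: four reads flank a 100-base target region after 25 cycles.
reads = [ProvisionalRead(100, True), ProvisionalRead(100, True),
         ProvisionalRead(199, False), ProvisionalRead(199, False)]
preset = 150  # currently selected number of sequencing cycles (illustrative)
needed = adjusted_cycles(reads, 2.0, 100, 200, min_cycles=25, max_cycles=preset)
```

In this example the adjusted number of sequencing cycles falls below the preset number, which corresponds to the decreasing case of claim 16. A result of `None` would correspond to a sample that is unlikely to satisfy the target read-coverage level within the available cycles.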
PCT/US2025/028563 2024-05-13 2025-05-09 Modifying sequencing cycles during a sequencing run to meet customized coverage estimations for a target genomic region Pending WO2025240241A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US202463646237P 2024-05-13 2024-05-13
US63/646,237 2024-05-13

Publications (1)

Publication Number Publication Date
WO2025240241A1 (en) 2025-11-20

Family

ID=96141265

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2025/028563 Pending WO2025240241A1 (en) 2024-05-13 2025-05-09 Modifying sequencing cycles during a sequencing run to meet customized coverage estimations for a target genomic region

Country Status (1)

Country Link
WO (1) WO2025240241A1 (en)

US20130079232A1 (en) 2011-09-23 2013-03-28 Illumina, Inc. Methods and compositions for nucleic acid sequencing
US20130260372A1 (en) 2012-04-03 2013-10-03 Illumina, Inc. Integrated optoelectronic read head and fluidic cartridge useful for nucleic acid sequencing
US20190385699A1 (en) * 2016-10-07 2019-12-19 Illumina, Inc. System and method for secondary analysis of nucleotide sequencing data
WO2022119812A1 (en) * 2020-12-02 2022-06-09 Illumina Software, Inc. System and method for detection of genetic alterations
WO2023035110A1 (en) * 2021-09-07 2023-03-16 深圳华大智造科技股份有限公司 Method for analyzing sequence of target polynucleotide
WO2024073519A1 (en) * 2022-09-30 2024-04-04 Illumina, Inc. Machine-learning model for refining structural variant calls

Patent Citations (37)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO1991006678A1 (en) 1989-10-26 1991-05-16 Sri International Dna sequencing
US6172218B1 (en) 1994-10-13 2001-01-09 Lynx Therapeutics, Inc. Oligonucleotide tags for sorting and identification
US6306597B1 (en) 1995-04-17 2001-10-23 Lynx Therapeutics, Inc. DNA sequencing by parallel oligonucleotide extensions
US6210891B1 (en) 1996-09-27 2001-04-03 Pyrosequencing Ab Method of sequencing DNA
US6258568B1 (en) 1996-12-23 2001-07-10 Pyrosequencing Ab Method of sequencing DNA based on the detection of the release of pyrophosphate and enzymatic nucleotide degradation
US20050100900A1 (en) 1997-04-01 2005-05-12 Manteia Sa Method of nucleic acid amplification
US6969488B2 (en) 1998-05-22 2005-11-29 Solexa, Inc. System and apparatus for sequential processing of analytes
US6274320B1 (en) 1999-09-16 2001-08-14 Curagen Corporation Method of sequencing a nucleic acid
US7001792B2 (en) 2000-04-24 2006-02-21 Eagle Research & Development, Llc Ultra-fast nucleic acid sequencing device and a method for making and using the same
US7329492B2 (en) 2000-07-07 2008-02-12 Visigen Biotechnologies, Inc. Methods for real-time single molecule sequence determination
US7211414B2 (en) 2000-12-01 2007-05-01 Visigen Biotechnologies, Inc. Enzymatic nucleic acid synthesis: compositions and methods for altering monomer incorporation fidelity
US7057026B2 (en) 2001-12-04 2006-06-06 Solexa Limited Labelled nucleotides
US20060188901A1 (en) 2001-12-04 2006-08-24 Solexa Limited Labelled nucleotides
US7427673B2 (en) 2001-12-04 2008-09-23 Illumina Cambridge Limited Labelled nucleotides
US20070166705A1 (en) 2002-08-23 2007-07-19 John Milton Modified nucleotides
WO2004018497A2 (en) 2002-08-23 2004-03-04 Solexa Limited Modified nucleotides for polynucleotide sequencing
US20060240439A1 (en) 2003-09-11 2006-10-26 Smith Geoffrey P Modified polymerases for improved incorporation of nucleotide analogues
US7315019B2 (en) 2004-09-17 2008-01-01 Pacific Biosciences Of California, Inc. Arrays of optical confinements and uses thereof
WO2006064199A1 (en) 2004-12-13 2006-06-22 Solexa Limited Improved method of nucleotide detection
US20060281109A1 (en) 2005-05-10 2006-12-14 Barr Ost Tobias W Polymerases
WO2007010251A2 (en) 2005-07-20 2007-01-25 Solexa Limited Preparation of templates for nucleic acid sequencing
US7405281B2 (en) 2005-09-29 2008-07-29 Pacific Biosciences Of California, Inc. Fluorescent nucleotide analogs and uses therefor
WO2007123744A2 (en) 2006-03-31 2007-11-01 Solexa, Inc. Systems and devices for sequence by synthesis analysis
US20100111768A1 (en) 2006-03-31 2010-05-06 Solexa, Inc. Systems and devices for sequence by synthesis analysis
US20080108082A1 (en) 2006-10-23 2008-05-08 Pacific Biosciences Of California, Inc. Polymerase enzymes and reagents for enhanced nucleic acid sequencing
US20090026082A1 (en) 2006-12-14 2009-01-29 Ion Torrent Systems Incorporated Methods and apparatus for measuring analytes using large scale FET arrays
US20090127589A1 (en) 2006-12-14 2009-05-21 Ion Torrent Systems Incorporated Methods and apparatus for measuring analytes using large scale FET arrays
US20100282617A1 (en) 2006-12-14 2010-11-11 Ion Torrent Systems Incorporated Methods and apparatus for detecting molecular interactions using fet arrays
US20100137143A1 (en) 2008-10-22 2010-06-03 Ion Torrent Systems Incorporated Methods and apparatus for measuring analytes
US20120270305A1 (en) 2011-01-10 2012-10-25 Illumina Inc. Systems, methods, and apparatuses to image a sample for biological or chemical analysis
US20130079232A1 (en) 2011-09-23 2013-03-28 Illumina, Inc. Methods and compositions for nucleic acid sequencing
US20130260372A1 (en) 2012-04-03 2013-10-03 Illumina, Inc. Integrated optoelectronic read head and fluidic cartridge useful for nucleic acid sequencing
US20190385699A1 (en) * 2016-10-07 2019-12-19 Illumina, Inc. System and method for secondary analysis of nucleotide sequencing data
US11646102B2 (en) 2016-10-07 2023-05-09 Illumina, Inc. System and method for secondary analysis of nucleotide sequencing data
WO2022119812A1 (en) * 2020-12-02 2022-06-09 Illumina Software, Inc. System and method for detection of genetic alterations
WO2023035110A1 (en) * 2021-09-07 2023-03-16 深圳华大智造科技股份有限公司 Method for analyzing sequence of target polynucleotide
WO2024073519A1 (en) * 2022-09-30 2024-04-04 Illumina, Inc. Machine-learning model for refining structural variant calls

Non-Patent Citations (14)

* Cited by examiner, † Cited by third party
Title
COCKROFT, S. L.; CHU, J.; AMORIN, M.; GHADIRI, M. R.: "A single-molecule nanopore device detects DNA polymerase activity with single-nucleotide resolution", J. AM. CHEM. SOC., vol. 130, 2008, pages 818 - 820, XP055097434, DOI: 10.1021/ja077082c
DEAMER, D. W.; AKESON, M.: "Nanopores and nucleic acids: prospects for ultrarapid sequencing", TRENDS BIOTECHNOL., vol. 18, 2000, pages 147 - 151, XP004194002, DOI: 10.1016/S0167-7799(00)01426-8
DEAMER, D.; BRANTON, D.: "Characterization of nucleic acids by nanopore analysis", ACC. CHEM. RES., vol. 35, 2002, pages 817 - 825, XP002226144, DOI: 10.1021/ar000138m
HEALY, K.: "Nanopore-based single-molecule DNA analysis", NANOMED, vol. 2, 2007, pages 459 - 481, XP009111262, DOI: 10.2217/17435889.2.4.459
KORLACH, J. ET AL.: "Selective aluminum passivation for targeted immobilization of single DNA polymerase molecules in zero-mode waveguide nanostructures", PROC. NATL. ACAD. SCI. USA, vol. 105, 2008, pages 1176 - 1181
LEVENE, M. J. ET AL.: "Zero-mode waveguides for single-molecule analysis at high concentrations", SCIENCE, vol. 299, 2003, pages 682 - 686, XP002341055, DOI: 10.1126/science.1079700
LI, J.; GERSHOW, M.; STEIN, D.; BRANDIN, E.; GOLOVCHENKO, J. A.: "DNA molecules and configurations in a solid-state nanopore microscope", NAT. MATER., vol. 2, 2003, pages 611 - 615, XP009039572, DOI: 10.1038/nmat965
LUNDQUIST, P. M. ET AL.: "Parallel confocal detection of single molecules in real time", OPT. LETT., vol. 33, 2008, pages 1026 - 1028, XP001522593, DOI: 10.1364/OL.33.001026
METZKER, GENOME RES., vol. 15, 2005, pages 1767 - 1776
RONAGHI, M.: "Pyrosequencing sheds light on DNA sequencing", GENOME RES., vol. 11, no. 1, 2001, pages 3 - 11, XP000980886, DOI: 10.1101/gr.11.1.3
RONAGHI, M.; KARAMOHAMED, S.; PETTERSSON, B.; UHLEN, M.; NYREN, P.: "Real-time DNA sequencing using detection of pyrophosphate release", ANALYTICAL BIOCHEMISTRY, vol. 242, no. 1, 1996, pages 84 - 9, XP002388725, DOI: 10.1006/abio.1996.0432
RONAGHI, M.; UHLEN, M.; NYREN, P.: "A sequencing method based on real-time pyrophosphate", SCIENCE, vol. 281, no. 5375, 1998, pages 363, XP002135869, DOI: 10.1126/science.281.5375.363
RUPAREL ET AL., PROC NATL ACAD SCI USA, vol. 102, 2005, pages 5932 - 7
SONI, G. V.; MELLER, A.: "Progress toward ultrafast DNA sequencing using solid-state nanopores", CLIN. CHEM., vol. 53, 2007, pages 1996 - 2001, XP055076185, DOI: 10.1373/clinchem.2007.091231

Similar Documents

Publication Publication Date Title
US20240038327A1 (en) Rapid single-cell multiomics processing using an executable file
US20220415442A1 (en) Signal-to-noise-ratio metric for determining nucleotide-base calls and base-call quality
US20240404624A1 (en) Structural variant alignment and variant calling by utilizing a structural-variant reference genome
US20230420082A1 (en) Generating and implementing a structural variation graph genome
US20240112753A1 (en) Target-variant-reference panel for imputing target variants
US20230095961A1 (en) Graph reference genome and base-calling approach using imputed haplotypes
EP4544554A1 (en) Improved human leukocyte antigen (hla) genotyping
WO2025240241A1 (en) Modifying sequencing cycles during a sequencing run to meet customized coverage estimations for a target genomic region
US20250210141A1 (en) Enhanced mapping and alignment of nucleotide reads utilizing an improved haplotype data structure with allele-variant differences
US20240127906A1 (en) Detecting and correcting methylation values from methylation sequencing assays
US20240177802A1 (en) Accurately predicting variants from methylation sequencing data
WO2025006570A2 (en) Modifying sequencing cycles or imaging during a sequencing run to meet customized coverage estimation
US20250210137A1 (en) Directly determining signal-to-noise-ratio metrics for accelerated convergence in determining nucleotide-base calls and base-call quality
US20230420080A1 (en) Split-read alignment by intelligently identifying and scoring candidate split groups
US20230313271A1 (en) Machine-learning models for detecting and adjusting values for nucleotide methylation levels
US20230340571A1 (en) Machine-learning models for selecting oligonucleotide probes for array technologies
US20230410944A1 (en) Calibration sequences for nucelotide sequencing
WO2025184234A1 (en) A personalized haplotype database for improved mapping and alignment of nucleotide reads and improved genotype calling
WO2025006565A1 (en) Variant calling with methylation-level estimation
WO2024206848A1 (en) Tandem repeat genotyping
WO2025250996A2 (en) Call generation and recalibration models for implementing personalized diploid reference haplotypes in genotype calling
WO2025193747A1 (en) Machine-learning models for ordering and expediting sequencing tasks or corresponding nucleotide-sample slides
WO2025090883A1 (en) Detecting variants in nucleotide sequences based on haplotype diversity