
WO2025024791A1 - Systems and methods for mobile-enabled targeted nanopore sequencing with mobile-enabled real-time fusion detection - Google Patents

Systems and methods for mobile-enabled targeted nanopore sequencing with mobile-enabled real-time fusion detection

Info

Publication number
WO2025024791A1
Authority
WO
WIPO (PCT)
Prior art keywords
computing device
base pair
pair information
chunks
electrical signal
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
PCT/US2024/039820
Other languages
French (fr)
Inventor
Ka Yee Yeung-Rhee
Ling-Hong Hung
Jerald Radich
Shishir REDDY
Olga SALA-TORRA
Cecilia YEUNG
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
University of Washington
Fred Hutchinson Cancer Center
Original Assignee
University of Washington
Fred Hutchinson Cancer Center
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by University of Washington, Fred Hutchinson Cancer Center

Classifications

    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01N INVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
    • G01N33/00 Investigating or analysing materials by specific methods not covered by groups G01N1/00 - G01N31/00
    • G01N33/48 Biological material, e.g. blood, urine; Haemocytometers
    • G01N33/483 Physical analysis of biological material
    • G01N33/487 Physical analysis of biological material of liquid biological material
    • G01N33/48707 Physical analysis of biological material of liquid biological material by electrical means
    • G01N33/48721 Investigating individual macromolecules, e.g. by translocation through nanopores
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B30/00 ICT specially adapted for sequence analysis involving nucleotides or amino acids
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B40/00 ICT specially adapted for biostatistics; ICT specially adapted for bioinformatics-related machine learning or data mining, e.g. knowledge discovery or pattern finding
    • G16B40/10 Signal processing, e.g. from mass spectrometry [MS] or from PCR
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q1/00 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
    • C12Q1/68 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
    • C12Q1/6869 Methods for sequencing
    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01N INVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
    • G01N27/00 Investigating or analysing materials by the use of electric, electrochemical, or magnetic means
    • G01N27/26 Investigating or analysing materials by the use of electric, electrochemical, or magnetic means by investigating electrochemical variables; by using electrolysis or electrophoresis
    • G01N27/416 Systems
    • G01N27/447 Systems using electrophoresis
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B20/00 ICT specially adapted for functional genomics or proteomics, e.g. genotype-phenotype associations

Definitions

  • This disclosure relates generally to nucleic acid sequencing, and in particular but not exclusively, relates to efficient processing of sequencing data to detect features including fusion genes or genetic variants.
  • acute leukemia (AL) is a cancer of the blood that has high mortality rates despite a plethora of available treatments.
  • the variation of treatment response and survival are largely based on distinct cytogenetic and molecular aberrations that characterize AL subtypes.
  • AL frequently presents with recurrent gene fusions that impact risk stratification and therapy choice.
  • a computer-implemented method of processing nanopore signals to detect one or more predetermined features of a feature library in a sample used to generate the nanopore signals is provided.
  • a computing device receives an electrical signal record generated by a flow cell.
  • the computing device breaks the electrical signal record into a plurality of chunks.
  • the computing device generates partial base pair information by basecalling electrical signals of a first chunk of the plurality of chunks without basecalling electrical signals of a remainder of the chunks of the plurality of chunks.
  • the computing device generates an alignment for the partial base pair information.
  • the computing device executes a feature detection pipeline on the electrical signal record.
  • the computing device discards the electrical signal record.
  • a mobile computing device is provided, comprising at least one processor, a wireless communication interface, a display, and a non-transitory computer-readable medium.
  • the non-transitory computer-readable medium has computer-executable instructions stored thereon that, in response to execution by the at least one processor, cause the mobile computing device to perform actions for processing nanopore signals to detect one or more predetermined features of a feature library in a sample used to generate the nanopore signals.
  • the actions comprise: receiving, by the mobile computing device via the wireless communication interface, an electrical signal record generated by a flow cell; breaking, by the mobile computing device, the electrical signal record into a plurality of chunks; generating, by the mobile computing device, partial base pair information by basecalling electrical signals of a first chunk of the plurality of chunks without basecalling electrical signals of a remainder of the chunks of the plurality of chunks; generating, by the mobile computing device, an alignment for the partial base pair information; in response to determining that the alignment corresponds to at least one of the one or more predetermined features in the feature library, executing, by the mobile computing device, a feature detection pipeline on the electrical signal record; and in response to determining that the alignment does not correspond to at least one of the one or more predetermined features in the feature library, discarding, by the mobile computing device, the electrical signal record.
  • a non-transitory computer-readable medium having computer-executable instructions stored thereon is provided.
  • the instructions in response to execution by one or more processors of a computing device, cause the computing device to perform actions for processing nanopore signals to detect one or more predetermined features of a feature library in a sample used to generate the nanopore signals, the actions comprising: receiving, by the computing device, an electrical signal record generated by a flow cell; breaking, by the computing device, the electrical signal record into a plurality of chunks; generating, by the computing device, partial base pair information by basecalling electrical signals of a first chunk of the plurality of chunks without basecalling electrical signals of a remainder of the chunks of the plurality of chunks; generating, by the computing device, an alignment for the partial base pair information; in response to determining that the alignment corresponds to at least one of the one or more predetermined features in the feature library, executing, by the computing device, a feature detection pipeline on the electrical signal record; and in response to determining that the
  • FIG. 1 is a schematic illustration of a system for nanopore-based analysis according to various aspects of the present disclosure.
  • FIG. 2 is a schematic illustration of a non-limiting example embodiment of a flow cell according to various aspects of the present disclosure.
  • FIG. 3 is a block diagram that illustrates aspects of a non-limiting example embodiment of an analysis computing device according to various aspects of the present disclosure.
  • FIG. 4 is a schematic illustration of a previously used, naive approach to processing nanopore signals that is incapable of providing adequate performance, particularly when executed using a mobile computing device for analysis.
  • FIG. 5 is a schematic illustration of a non-limiting example embodiment of a new approach to processing nanopore signals, according to various aspects of the present disclosure.
  • FIG. 6A - FIG. 6B are a flowchart that illustrates a non-limiting example embodiment of a method of processing nanopore signals to detect one or more predetermined features of a sequence library in a sample, according to various aspects of the present disclosure.
  • the present disclosure describes ultra-rapid techniques for processing sequencing data generated by nanopore sequencing devices to detect features such as fusion genes or genetic variants in a sample.
  • the fast turnaround time and utilization of portable, easily accessible hardware can help oncologists and other physicians overcome time-related challenges and get patients on appropriate treatments faster.
  • FIG. 1 is a schematic illustration of a system for nanopore-based analysis according to various aspects of the present disclosure.
  • a sample 108 is obtained from a subject 102 using known techniques.
  • the sample 108 may be a tissue biopsy, a swab, a blood sample, or any other suitable type of sample 108.
  • the sample 108 is prepared (e.g., combined with one or more buffers, enzymes, etc.), and the prepared sample 108 is provided to a flow cell 104 of a sequencing device.
  • one example of a sequencing device is a MinION sequencing device provided by Oxford Nanopore Technologies plc.
  • Some non-limiting examples of devices for implementing a flow cell 104 are a Flongle Flow Cell, a MinION Flow Cell, and the PromethION Flow Cell, each also provided by Oxford Nanopore Technologies plc.
  • the flow cell 104 generates signals based on interactions between the sample 108 and the nanopores of the flow cell 104, and provides the signals to the analysis computing device 106 for analysis.
  • FIG. 2 is a schematic illustration of a non-limiting example embodiment of a flow cell according to various aspects of the present disclosure.
  • the flow cell 104 includes a sample well 204, a plurality of nanopores 202, a processor 206, and a communication interface 208.
  • the sample well 204 is configured to accept the sample 108 (e.g., to receive drops of sample 108 from a pipette) and to provide the sample 108 to the plurality of nanopores 202.
  • the processor 206 is configured to control a voltage applied to the plurality of nanopores 202 and to read electrical signals generated by the nanopores 202.
  • the processor 206 may also be configured to generate and store records of the electrical signals generated by the nanopores 202, each record including electrical signals representing an interaction of a molecule with a nanopore 202 of the plurality of nanopores 202.
  • the communication interface 208 is configured to transmit the signals detected by the processor 206 and/or the records generated by the processor 206 to another device, such as the analysis computing device 106, using a wired or wireless network, a USB connection, or any other suitable communication technique.
  • the processor 206, communication interface 208, and potentially other components may be implemented on an ASIC or FPGA that is part of the flow cell 104.
  • FIG. 3 is a block diagram that illustrates aspects of a non-limiting example embodiment of an analysis computing device according to various aspects of the present disclosure.
  • the analysis computing device 106 is configured to receive records of electrical signals from a flow cell 104 and to analyze the signals to determine whether one or more predetermined features (e.g., fusion genes or genetic variants) are present in the sample 108 represented by the electrical signals.
  • the illustrated analysis computing device 106 may be implemented by any computing device or collection of computing devices, including but not limited to a desktop computing device, a laptop computing device, a mobile computing device, a server computing device, a computing device of a cloud computing device, and/or combinations thereof.
  • the techniques described herein are designed to be particularly efficient, such that the techniques may be successfully and efficiently executed using the reduced processing power available on mobile computing devices such as smartphones or tablet computing devices.
  • the analysis computing device 106 includes one or more processors 302, one or more communication interfaces 304, a library data store 308, and a computer-readable medium 306.
  • data store refers to any suitable device configured to store data for access by a computing device.
  • one example of a data store is a highly reliable, high-speed relational database management system (DBMS) executing on one or more computing devices and accessible over a high-speed network.
  • Another example of a data store is a key-value store.
  • any other suitable storage technique and/or device capable of quickly and reliably providing the stored data in response to queries may be used, and the computing device may be accessible locally instead of over a network, or may be provided as a cloud-based service.
  • a data store may also include data stored in an organized manner on a computer-readable storage medium, such as a hard disk drive, a flash memory, RAM, ROM.
  • the library data store 308 is configured to store a sequence library that includes information representing one or more predetermined features that the analysis computing device 106 is configured to detect.
  • the predetermined features may include one or more fusion genes, genetic variants (e.g., single nucleotide polymorphisms, structural variants, duplications, deletions, insertions, etc.), or other features that are associated with a cancer type, a copy number variant, a base modification, or another characteristic.
  • the library data store 308 may be stored on a computer-readable medium within the analysis computing device 106. In other embodiments, the library data store 308 may be present on another device, and the components of the analysis computing device 106 may query the library data store 308 using a communication interface 208 when information from the sequence library is desired.
  • the processors 302 may include any suitable type of general-purpose computer processor.
  • the processors 302 may include one or more special-purpose computer processors or AI accelerators optimized for specific computing tasks, including but not limited to graphical processing units (GPUs), vision processing units (VPUs), and tensor processing units (TPUs).
  • the communication interfaces 304 include one or more hardware and/or software interfaces suitable for providing communication links between components.
  • the communication interfaces 304 may support one or more wired communication technologies (including but not limited to Ethernet, FireWire, and USB), one or more wireless communication technologies (including but not limited to Wi-Fi, WiMAX, Bluetooth, mobile hotspot, 2G, 3G, 4G, 5G, and LTE), and/or combinations thereof.
  • the computer-readable medium 306 has stored thereon logic that, in response to execution by the one or more processors 302, causes the analysis computing device 106 to provide a signal gathering engine 310, a speed-optimized basecalling engine 312, an accuracy-optimized basecalling engine 314, and a feature detection pipeline engine 316.
  • a computer-readable medium refers to a removable or nonremovable device that implements any technology capable of storing information in a volatile or non-volatile manner to be read by a processor of a computing device, including but not limited to: a hard drive; a flash memory; a solid state drive; random-access memory (RAM); read-only memory (ROM); a CD-ROM, a DVD, or other disk storage; a magnetic cassette; a magnetic tape; and a magnetic disk storage.
  • a computer-readable medium may include one or more devices that provide a cloud storage bucket.
  • the signal gathering engine 310 is configured to retrieve files including electrical signal records from the flow cell 104, and to organize the electrical signal records for further processing.
  • the feature detection pipeline engine 316 is configured to perform various actions for determining whether base pair information indicates the presence of one or more features of the predetermined features in the sample.
  • the speed-optimized basecalling engine 312 is configured to determine base pair information from electrical signals in the electrical signal records using techniques that are optimized to quickly determine the base pair information, with a tradeoff of a lower accuracy rate.
  • the accuracy-optimized basecalling engine 314 is also configured to determine base pair information from electrical signals in the electrical signal records, but using techniques that are optimized to accurately determine the base pair information, with a tradeoff of slower operation if executed for the same amount of data as the speed-optimized basecalling engine 312.
  • both the speed-optimized basecalling engine 312 and the accuracy-optimized basecalling engine 314 may use similar techniques, such as deep neural networks, to generate base pair information, with adjustments to the techniques between the two engines in order to provide the tradeoffs of speed versus accuracy.
  • the speed-optimized basecalling engine 312 and the accuracy-optimized basecalling engine 314 may use deep neural networks having different architectures, such that the speed-optimized basecalling engine 312 uses a deep neural network having an architecture that operates faster than the deep neural network of the accuracy-optimized basecalling engine 314 by having fewer layers or that is otherwise smaller than the deep neural network for the accuracy-optimized basecalling engine 314.
  • the speed-optimized basecalling engine 312 may perform fewer evaluations of the model than the accuracy-optimized basecalling engine 314, may downsample the electrical signal records before processing, or may learn on smaller sets of pre-evaluated data in order to perform faster than the accuracy-optimized basecalling engine 314.
  • the speed-optimized basecalling engine 312 may use a first basecalling tool that is known to be fast with some sacrifice of accuracy, and the accuracy-optimized basecalling engine 314 may use a second basecalling tool that is known to be slower but more accurate. Any suitable basecalling tools or techniques may be used by the speed-optimized basecalling engine 312 and the accuracy-optimized basecalling engine 314, including but not limited to Guppy, Bonito, Nanocall, Chiron, or any other appropriate basecalling tool.
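  • One of the speed-oriented options mentioned above is downsampling the electrical signal before basecalling. The disclosure does not say how the downsampling is done, so the following is only a minimal sketch that assumes simple block averaging; the function name `downsample` and the `factor` parameter are hypothetical and not part of any basecalling tool.

```python
from typing import List, Sequence


def downsample(signal: Sequence[float], factor: int) -> List[float]:
    """Reduce a signal by averaging consecutive blocks of `factor` samples.

    Assumption: block averaging is used here purely for illustration; the
    source text does not specify a downsampling method.
    """
    if factor <= 0:
        raise ValueError("factor must be a positive integer")
    reduced = []
    for start in range(0, len(signal), factor):
        block = signal[start:start + factor]  # the last block may be shorter
        reduced.append(sum(block) / len(block))
    return reduced
```

  • With a factor of 2, a basecaller would see roughly half as many samples per record, which is the kind of speed-for-accuracy tradeoff described for the speed-optimized basecalling engine 312.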
  • engine refers to logic embodied in hardware or software instructions, which can be written in one or more programming languages, including but not limited to C, C++, C#, COBOL, JAVA™, PHP, Perl, HTML, CSS, JavaScript, VBScript, ASPX, Go, and Python.
  • An engine may be compiled into executable programs or written in interpreted programming languages.
  • Software engines may be callable from other engines or from themselves.
  • the engines described herein refer to logical modules that can be merged with other engines, or can be divided into sub-engines.
  • the engines can be implemented by logic stored in any type of computer-readable medium or computer storage device and be stored on and executed by one or more general purpose computers, thus creating a special purpose computer configured to provide the engine or the functionality thereof.
  • the engines can be implemented by logic programmed into an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA), or another hardware device.
  • FIG. 4 is a schematic illustration of a previously used, naive approach to processing nanopore signals that is incapable of providing adequate performance, particularly when executed using a mobile computing device for analysis.
  • an electrical signal record 402 is obtained from the flow cell 104.
  • the electrical signal record 402 may represent electrical signals generated during a read of a molecule by a nanopore 202 of the flow cell 104, and may include information corresponding to 150,000 or more base pairs of the molecule.
  • base calling is performed on the entirety of the information in the electrical signal record 402, to generate base pair information 404 for the entire electrical signal record 402.
  • the electrical signal record 402 may be broken into chunks prior to performing base calling in order to allow parallel processing of the chunks, but in the technique illustrated in FIG. 4, base pair information 404 for the entirety of the electrical signal record 402 is determined, whether or not the electrical signal record 402 is broken into chunks to facilitate the computation.
  • This base pair information 404 is then provided to a feature detection pipeline 406, which performs one or more actions for detecting predetermined features from the feature library 408 within the base pair information 404.
  • the actions may include, but are not limited to, one or more of alignment and detection of fusion genes, genetic variants, or other features.
  • the feature detection pipeline 406 then produces one or more detected features 410, if any were detected in the base pair information 404.
  • the amount of time and computing resources consumed by the feature detection pipeline 406 is dependent on the size of the base pair information 404 provided to it and the number of predetermined features in the feature library 408.
  • when the feature detection pipeline 406 processes the base pair information 404 for the entirety of the electrical signal record 402 against all of the predetermined features in the feature library 408, the maximum amount of computing power and time is consumed for every received electrical signal record 402. This may on its own be enough to make this technique unsuitable for efficiently providing results in diagnostic settings. This problem is exacerbated even further when considering that more than one electrical signal record 402 will be processed: in the processing of a typical sample 108, the flow cell 104 will generate on the order of thousands of electrical signal records 402 per hour. Executing basecalling and performing the actions of the feature detection pipeline 406 for the entirety of every electrical signal record 402 produced by the flow cell 104 for all of the features in the feature library 408 consumes an extraordinary amount of computing resources and takes an impractically long amount of time, even when using high-powered computing hardware. What is desired are techniques that reduce this computational burden enough to be able to efficiently detect features using even computing hardware with low-powered or otherwise limited computing resources, such as mobile computing devices.
  • FIG. 5 is a schematic illustration of a non-limiting example embodiment of a new approach to processing nanopore signals, according to various aspects of the present disclosure.
  • the approach illustrated in FIG. 5 introduces enough efficiency gain into the processing of the nanopore signals that the techniques can be successfully and efficiently executed on a mobile computing device.
  • an electrical signal record 502 is received, similar to FIG. 4. However, instead of basecalling the entire electrical signal record 502, the electrical signal record 502 is broken into a plurality of chunks 504, 506, 508, 510. Though four chunks are illustrated, the electrical signal record 502 is typically divided into a greater number of chunks. In some embodiments, the electrical signal record 502 may be divided into a predetermined number of chunks.
  • the electrical signal record 502 may be divided into a number of chunks in a range between 90 chunks and 110 chunks, such as one hundred chunks, so that an electrical signal record 502 that includes information related to about 150,000 base pairs is divided into chunks that each include information related to about 1,500 base pairs.
  • the electrical signal record 502 may be divided into chunks based on a chunk size, such that each chunk includes information related to a predetermined number of base pairs, and the number of chunks is based on the size of the electrical signal record 502.
  • the electrical signal record 502 may be divided based on a predetermined chunk size in a range between 900 base pairs and 1100 base pairs, such as 1000 base pairs, so that an electrical signal record 502 that includes information related to about 150,000 base pairs is divided into about 150 chunks.
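  • The two chunking strategies described above (a predetermined number of chunks, or a predetermined chunk size) can be sketched as follows. This is an illustrative sketch, not the disclosed implementation: it assumes the record is a plain sequence with one element per base pair of information, whereas a real electrical signal record holds raw current samples, typically several per base. The function names are hypothetical.

```python
from typing import List, Sequence


def split_into_n_chunks(record: Sequence[float], n_chunks: int) -> List[Sequence[float]]:
    """Divide a record into `n_chunks` contiguous, nearly equal pieces."""
    if n_chunks <= 0:
        raise ValueError("n_chunks must be a positive integer")
    base, extra = divmod(len(record), n_chunks)
    chunks, start = [], 0
    for index in range(n_chunks):
        # The first `extra` chunks each take one additional element.
        end = start + base + (1 if index < extra else 0)
        chunks.append(record[start:end])
        start = end
    return chunks


def split_by_chunk_size(record: Sequence[float], chunk_size: int) -> List[Sequence[float]]:
    """Divide a record into pieces of `chunk_size`; the last may be shorter."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be a positive integer")
    return [record[i:i + chunk_size] for i in range(0, len(record), chunk_size)]
```

  • Under the stated assumption, a record of 150,000 elements split with `split_into_n_chunks(record, 100)` yields 100 chunks of 1,500 elements each, and `split_by_chunk_size(record, 1000)` yields 150 chunks, matching the figures given in the text.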
  • instead of basecalling the entire electrical signal record 502, the illustrated technique only performs basecalling on the first chunk 504, and thus generates partial base pair information 512 for the limited amount of data represented by the first chunk 504. Alignment 514 is then performed on this partial base pair information 512, either as part of a pipeline or separately, in order to align the partial base pair information 512 to the predetermined features of the feature library 516. Because the basecalling and alignment are performed over a much smaller amount of information, the amount of computing resources used is greatly reduced.
  • the alignment of the partial base pair information 512 indicates that the electrical signal record 502 is relevant to one or more of the predetermined features.
  • the predetermined features will be related to a limited number of portions of the genome, but the molecule used to generate the electrical signal record 502 may be from any portion of the genome.
  • the alignment 514 of the first chunk 504 indicates whether the electrical signal record 502 is likely to be associated with a portion of the genome relevant to any of the predetermined features in the feature library 516, or is instead not likely to include relevant information.
  • if the alignment 514 indicates that the electrical signal record 502 is not likely to be associated with a relevant portion of the genome, then further processing of the electrical signal record 502 stops, and the electrical signal record 502 is discarded. It has been determined in testing that roughly 95% of all electrical signal records 502 can be discarded at this step, meaning that for 95% of the electrical signal records 502, the processing may be limited to basecalling and aligning information from a single chunk 504, thus dramatically reducing the amount of computing resources used.
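  • The first-chunk filter described above can be sketched as a single decision function. This is a hedged illustration only: `toy_basecall` and `toy_align` are hypothetical stand-ins (a real system would call a neural-network basecaller and a sequence aligner), and the library entry used in any example is made-up data rather than a real reference sequence.

```python
from typing import Callable, Dict, List, Optional, Sequence

Chunk = Sequence[int]


def toy_basecall(chunk: Chunk) -> str:
    """Stand-in basecaller (assumption): maps integers 0-3 to A, C, G, T."""
    return "".join("ACGT"[value % 4] for value in chunk)


def toy_align(bases: str, library: Dict[str, str]) -> Optional[str]:
    """Stand-in aligner (assumption): exact substring match against each
    library reference; returns the first matching feature name, or None."""
    for name, reference in library.items():
        if bases and bases in reference:
            return name
    return None


def should_keep_record(
    chunks: List[Chunk],
    library: Dict[str, str],
    basecall: Callable[[Chunk], str] = toy_basecall,
    align: Callable[[str, Dict[str, str]], Optional[str]] = toy_align,
) -> bool:
    """Basecall and align only the first chunk; keep the record on a hit."""
    if not chunks:
        return False
    partial_bases = basecall(chunks[0])  # remaining chunks are not basecalled
    return align(partial_bases, library) is not None
```

  • A record for which `should_keep_record` returns False would be discarded without any further basecalling, which is where the bulk of the savings described in the text comes from.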
  • the techniques illustrated in FIG. 5 optimize the pipelines in additional ways.
  • the electrical signal record 502 (or the chunks 504, 506, 508, 510 previously created) is provided to a speed-optimized pipeline 518 that uses the speed-optimized basecalling engine 312 to create base pair information for the entire electrical signal record 502, and then uses the feature detection pipeline engine 316 to determine whether any of the predetermined features in the feature library 516 are detected in the base pair information.
  • the speed-optimized basecalling engine 312 uses various techniques, described in further detail below, for which a reduction in accuracy is accepted for a tradeoff in greater speed in producing results.
  • the result of the speed-optimized pipeline 518 is a set of candidate features 520, indicating predetermined features from the feature library 516 that were detected by the speed-optimized pipeline 518. If no predetermined features were detected by the speed-optimized pipeline 518, the technique may cease processing the base pair information, but if one or more candidate features 520 are detected, then the electrical signal record 502 is provided to an accuracy-optimized pipeline 522 that uses the accuracy-optimized basecalling engine 314 to re-create the base pair information for the entire electrical signal record 502, and then uses the feature detection pipeline engine 316 to determine whether the base pair information indicates the presence of the candidate features 520. These are then provided as one or more detected features 524.
  • the accuracy-optimized basecalling engine 314 uses various techniques, described in further detail below, for which an increase in processing time is accepted for a tradeoff in higher accuracy results. Even though the accuracy-optimized pipeline 522 may use more computing resources than the speed-optimized pipeline 518, the overall computing cost is nevertheless reduced by searching only for the candidate features 520 (instead of all predetermined features in the feature library 516), and by allowing both the alignment 514 and the speed-optimized pipeline 518 to filter out electrical signal records 502 that are unlikely to be successfully identified as including any predetermined features.
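  • The two-stage arrangement described above (a fast pass that proposes candidate features, then an accurate pass that checks only those candidates) can be sketched as follows. The basecallers and the detector here are toy, hypothetical stand-ins chosen so the control flow is visible; they are not the engines 312, 314, or 316 themselves.

```python
from typing import Callable, Dict, Set

Basecaller = Callable[[str], str]
Detector = Callable[[str, Dict[str, str]], Set[str]]


def two_stage_detect(
    record: str,
    library: Dict[str, str],
    fast_basecall: Basecaller,
    accurate_basecall: Basecaller,
    detect: Detector,
) -> Set[str]:
    """Run the fast pipeline first; confirm only its candidates accurately."""
    candidates = detect(fast_basecall(record), library)
    if not candidates:
        return set()  # nothing to confirm, so the slow pass is skipped
    # Restrict the second pass to the candidate features only.
    candidate_library = {name: library[name] for name in candidates}
    return detect(accurate_basecall(record), candidate_library)


def toy_accurate_basecall(record: str) -> str:
    """Stand-in (assumption): the record already holds the true bases."""
    return record


def toy_fast_basecall(record: str) -> str:
    """Stand-in (assumption): a faster caller that miscalls every G as T."""
    return record.replace("G", "T")


def toy_detect(bases: str, library: Dict[str, str]) -> Set[str]:
    """Stand-in (assumption): a feature is detected on an exact marker match."""
    return {name for name, marker in library.items() if marker in bases}
```

  • For example, with the made-up library `{"feature_1": "AACC", "feature_2": "CCTT"}` and the record `"AACCGGTT"`, the fast pass reports both features as candidates (the miscalled bases create a false match for `feature_2`), and the accurate pass confirms only `feature_1`.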
  • FIG. 6A - FIG. 6B are a flowchart that illustrates a non-limiting example embodiment of a method of processing nanopore signals to detect one or more predetermined features of a sequence library in a sample, according to various aspects of the present disclosure.
  • the method 600 is a non-limiting example embodiment of the techniques illustrated schematically in FIG. 5. By filtering out irrelevant electrical signal records using alignments of a small portion of the data, and by further filtering out irrelevant electrical signal records using a speed-optimized pipeline, the method 600 provides dramatic increases in the speed of the analysis, making the method 600 suitable for use even on low-powered computing hardware such as mobile computing devices.
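  • A rough sense of the savings can be had from a back-of-the-envelope estimate. The sketch below is an assumption-laden illustration, not a measured result: it assumes basecalling work is proportional to the amount of signal processed, that every record has its first chunk basecalled, and that each retained record is basecalled in full exactly once. It ignores alignment cost and the second, accuracy-optimized pass. The function name is hypothetical.

```python
def relative_basecalling_work(discard_fraction: float, n_chunks: int) -> float:
    """Estimate basecalling work relative to basecalling every record in full.

    A value of 1.0 corresponds to the naive approach of basecalling the
    entirety of every record.
    """
    if not 0.0 <= discard_fraction <= 1.0:
        raise ValueError("discard_fraction must be between 0 and 1")
    if n_chunks <= 0:
        raise ValueError("n_chunks must be a positive integer")
    first_chunk_work = 1.0 / n_chunks          # paid by every record
    full_record_work = 1.0 - discard_fraction  # paid only by retained records
    return first_chunk_work + full_record_work
```

  • Using the figures from the text (about 95% of records discarded, about 100 chunks per record), the estimate comes to roughly 0.06, that is, on the order of 6% of the naive basecalling workload under these simplifying assumptions.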
  • the method 600 proceeds through a continuation terminal ("terminal A") to block 602, where a signal gathering engine 310 of an analysis computing device 106 checks a directory for a new file that includes a record of electrical signals generated by a flow cell 104.
  • the signal gathering engine 310 may connect to the flow cell 104 via a network using a communication interface 304, and check a file storage location within the flow cell 104.
  • the flow cell 104 may be connected to the analysis computing device 106 via a network and a communication interface 304, and may write the new file to a file storage location on the analysis computing device 106.
  • the signal gathering engine 310 may check the file storage location local to the analysis computing device 106 for the new file.
  • the signal gathering engine 310 may check a file storage location external to both the analysis computing device 106 and the flow cell 104, where the flow cell 104 is configured to store new files.
  • the method 600 then proceeds to a decision block 604, where a decision is made based on whether a new file was found in the directory. If no new file was found, then the result of decision block 604 is NO, and the method 600 returns to block 602. In some embodiments, the signal gathering engine 310 may be configured to wait a predetermined amount of time before again performing the actions of block 602. Otherwise, if a new file was found, then the result of decision block 604 is YES, and the method 600 proceeds to block 606.
  • the actions of block 602 may be initiated by a notification, a callback, or another non-polling technique such that the actions of block 602 will be initiated in response to the new file being created.
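As a non-limiting illustration, the polling variant of blocks 602 and 604 might be sketched as follows. The function names `find_new_file` and `wait_for_new_file`, the `pattern` and `max_polls` parameters, and the use of file names to track what has already been seen are assumptions made for this sketch, not details taken from the disclosure.

```python
import time
from pathlib import Path
from typing import Optional, Set


def find_new_file(directory: Path, seen: Set[str], pattern: str = "*") -> Optional[Path]:
    """Return one file in `directory` that is not yet in `seen`, or None (block 602)."""
    for path in sorted(directory.glob(pattern)):
        if path.is_file() and path.name not in seen:
            seen.add(path.name)  # remember the file so it is reported only once
            return path
    return None


def wait_for_new_file(directory: Path, seen: Set[str], poll_seconds: float = 1.0,
                      max_polls: Optional[int] = None) -> Optional[Path]:
    """Poll until a new file appears (decision block 604) or max_polls is reached."""
    polls = 0
    while max_polls is None or polls < max_polls:
        found = find_new_file(directory, seen)
        if found is not None:
            return found          # YES branch: the caller proceeds to retrieval
        polls += 1
        time.sleep(poll_seconds)  # NO branch: wait, then check the directory again
    return None
```

A notification-driven variant would replace the sleep loop with a callback that invokes `find_new_file` when the file system reports a change.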
  • the signal gathering engine 310 retrieves the new file from the directory.
  • the signal gathering engine 310 may retrieve the new file by copying it to a local storage location via a network using a communication interface 304.
  • the signal gathering engine 310 may copy the new file from an initial directory to an in-process directory in order to indicate that the file has been retrieved.
  • the signal gathering engine 310 breaks the record of electrical signals generated by the flow cell 104 in the new file into a plurality of chunks.
  • the signal gathering engine 310 may create separate files for each chunk of the plurality of chunks.
  • the signal gathering engine 310 may use pointers or other indexes into the new file to define the chunks.
  • the signal gathering engine 310 may create a first chunk at block 608, and may wait until further chunks are needed for processing before creating the other chunks.
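As a non-limiting illustration, index-based chunking with on-demand creation of later chunks might be sketched as follows. The names `chunk_bounds` and `iter_chunks` and the fixed `chunk_size` are assumptions for this sketch; the disclosure does not prescribe how chunk boundaries are chosen.

```python
from typing import Iterator, List, Sequence, Tuple


def chunk_bounds(n_samples: int, chunk_size: int) -> List[Tuple[int, int]]:
    """Return half-open (start, end) index pairs that together cover n_samples."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    return [(start, min(start + chunk_size, n_samples))
            for start in range(0, n_samples, chunk_size)]


def iter_chunks(signal: Sequence[float], chunk_size: int) -> Iterator[Sequence[float]]:
    """Yield chunks lazily: each slice is created only when it is requested."""
    for start, end in chunk_bounds(len(signal), chunk_size):
        yield signal[start:end]
```

Calling `next(iter_chunks(signal, chunk_size))` produces only the first chunk, so the remaining chunks are never materialized for a record that is later discarded.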
  • a speed-optimized basecalling engine 312 of the analysis computing device 106 generates partial base pair information using a first chunk of the plurality of chunks.
  • the partial base pair information includes base pair information for the electrical signals represented by the first chunk, but not the remainder of the chunks of the plurality of chunks.
  • the first chunk is a chunk representing either a start or an end of the electrical signal record. In some embodiments, the first chunk may be any of the chunks in the plurality of chunks.
  • any suitable technique or tool may be used by the speed-optimized basecalling engine 312 to generate the partial base pair information, including but not limited to using a Bonito basecalling utility, a Nanocall basecalling utility, or any other suitable basecalling utility or technique.
  • the accuracy-optimized basecalling engine 314 may be used to generate the partial base pair information for the first chunk, since the amount of information to be processed is relatively small, and so the longer computation time used by the accuracy-optimized basecalling engine 314 may be a fair trade-off for the increase in accuracy.
  • a feature detection pipeline engine 316 of the analysis computing device 106 generates an alignment for the partial base pair information.
  • the alignment is an attempt to find a position within the genome associated with the one or more predetermined features of the feature library within which the partial base pair information appears, or at least which the partial base pair information closely matches. Any suitable technique or tool may be used by the feature detection pipeline engine 316 for generating the alignment, including but not limited to one or more of Minimap2, QAlign, or PyPore.
  • a stand-alone or otherwise separate alignment engine may be used instead of providing the partial base pair information to the feature detection pipeline engine 316.
  • the method 600 then proceeds to a continuation terminal ("terminal B"). From terminal B (FIG. 6B), the method 600 proceeds to block 614, where the feature detection pipeline engine 316 compares the alignment for the partial base pair information to one or more predetermined features in a library data store 308.
  • the alignment generated at block 612 may indicate a location within an entire genome with which the partial base pair information is closely matched, and the comparison at block 614 may compare the indicated location to locations associated with the predetermined features.
  • the alignment generated at block 612 may indicate a location within one or more of the predetermined features with which the partial base pair information is closely matched, if any, and the comparison at block 614 may check whether a successful alignment within the predetermined features was determined or not.
  • the method 600 then proceeds to a decision block 616, where a decision is made based on whether the alignment of the partial base pair information indicates that the record of electrical signals generated by the flow cell 104 is relevant to the predetermined features. If the record of electrical signals generated by the flow cell 104 is not relevant to the predetermined features (e.g., an alignment of the partial base pair information to the predetermined features was not possible, or the alignment position for the partial base pair information was not associated with at least one of the predetermined features), then the result of decision block 616 is NO, and the method 600 proceeds to block 618, where the signal gathering engine 310 discards the record of electrical signals, and then to a continuation terminal ("terminal C").
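As a non-limiting illustration, the relevance decision of blocks 614 and 616 might be sketched as an interval-overlap test between the alignment position and the locations of the predetermined features. The `Region` and `Alignment` types, the half-open coordinate convention, and the function name `relevant_features` are assumptions for this sketch; the contig names and coordinates used with it would be placeholders, not real genomic positions.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class Region:
    """Location of one predetermined feature (hypothetical representation)."""
    name: str
    contig: str
    start: int  # 0-based, inclusive
    end: int    # exclusive


@dataclass(frozen=True)
class Alignment:
    """Position reported by an aligner for the partial base pair information."""
    contig: str
    start: int
    end: int


def relevant_features(aln: Optional[Alignment], features: Iterable[Region]) -> List[str]:
    """Return names of features whose location overlaps the alignment.

    An empty list corresponds to the NO branch (discard the record); a
    non-empty list corresponds to the YES branch (continue processing).
    """
    if aln is None:  # no alignment was possible
        return []
    return [f.name for f in features
            if f.contig == aln.contig and aln.start < f.end and f.start < aln.end]
```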
  • the speed-optimized basecalling engine 312 generates fast base pair information using a remainder of the chunks of the plurality of chunks.
  • each of the chunks may be processed by the speed-optimized basecalling engine 312 separately, and two or more chunks may be processed concurrently in order to reduce the wall clock time for the overall operation.
  • the previously computed partial base pair information may be combined with the base pair information for the remainder of the chunks to create the fast base pair information.
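As a non-limiting illustration, concurrent basecalling of the remaining chunks and recombination with the partial base pair information might be sketched as follows. The `basecall` callable is a stand-in for whichever basecalling utility is used, and the function name `basecall_chunks`, the `max_workers` default, and the assumption that the first chunk came from the start of the record (so its bases are prepended) are choices made for this sketch.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Sequence


def basecall_chunks(chunks: Sequence[Sequence[float]],
                    basecall: Callable[[Sequence[float]], str],
                    partial: str = "",
                    max_workers: int = 4) -> str:
    """Basecall the remaining chunks concurrently and join them in order.

    `partial` holds the bases already computed for the first chunk, which is
    assumed here to be the start of the record.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Executor.map returns results in input order, regardless of which
        # chunk finishes first, so the bases can simply be concatenated.
        rest = list(pool.map(basecall, chunks))
    return partial + "".join(rest)
```

Threads suit a basecaller that runs as an external process or releases the interpreter lock; a pure-Python basecaller would likely need a process pool to gain wall clock time.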
  • the feature detection pipeline engine 316 of the analysis computing device 106 analyzes the fast base pair information to determine one or more candidate features of the predetermined features in the library data store 308.
  • the feature detection pipeline engine 316 may use an alignment tool to determine whether the fast base pair information can be aligned to any of the predetermined features in the library data store 308, and if the fast base pair information can be aligned to a predetermined feature, that predetermined feature is determined to be a candidate feature.
  • the method 600 then proceeds to a decision block 624, where a decision is made based on whether the feature detection pipeline engine 316 found any candidate features. If no candidate features were found, then the result of decision block 624 is NO, and the method 600 proceeds to block 626, where the signal gathering engine 310 discards the electrical signal record, and then to a continuation terminal ("terminal C"). Otherwise, if one or more candidate features were found, then the result of decision block 624 is YES, and the method 600 proceeds to block 628.
  • At block 628, the accuracy-optimized basecalling engine 314 generates accurate base pair information using the chunks of the plurality of chunks.
  • The use of the terms "fast base pair information" and "accurate base pair information" is for purposes of identifying which basecalling engine was used to generate the respective base pair information, in that the speed-optimized basecalling engine 312 generally produces base pair information faster than the accuracy-optimized basecalling engine 314, and the accuracy-optimized basecalling engine 314 generally produces base pair information that is more accurate than the speed-optimized basecalling engine 312. That said, in some embodiments, the accuracy of the fast base pair information may be similar to the accuracy of the accurate base pair information, and the speed of computation of the accurate base pair information may be similar to the speed of computation of the fast base pair information, despite the use of these terms.
  • the feature detection pipeline engine 316 analyzes the accurate base pair information and the one or more candidate features to determine one or more detected predetermined features in the sample. As with the processing at block 622, the feature detection pipeline engine 316 may use an alignment tool to determine whether the accurate base pair information can be aligned to any of the candidate features, and if so, then the candidate feature is determined to be a detected predetermined feature.
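As a non-limiting illustration, the two-stage cascade (fast basecalling to nominate candidate features, then accurate basecalling checked only against those candidates) might be sketched as follows. The callables `fast_basecall`, `accurate_basecall`, and `detect` are stand-ins for the engines described above, and their signatures are assumptions for this sketch.

```python
from typing import Callable, Iterable, Sequence, Set

Signal = Sequence[float]
Basecaller = Callable[[Signal], str]
# A detector takes base pair information plus the feature names to search
# for, and returns the subset of those names that it finds.
Detector = Callable[[str, Iterable[str]], Set[str]]


def detect_features(signal: Signal,
                    library: Iterable[str],
                    fast_basecall: Basecaller,
                    accurate_basecall: Basecaller,
                    detect: Detector) -> Set[str]:
    """Run the speed-optimized stage, then the accuracy-optimized stage."""
    fast_bases = fast_basecall(signal)
    candidates = detect(fast_bases, library)    # search the whole library
    if not candidates:
        return set()                            # discard: skip the slow stage
    accurate_bases = accurate_basecall(signal)  # slow stage runs only when needed
    return detect(accurate_bases, candidates)   # search only the candidates
```

The cost saving comes from two places: the accurate basecaller is skipped entirely for records with no candidates, and the final search is limited to the candidates rather than the full library.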
  • the method 600 then proceeds to terminal C, and from terminal C to a decision block 632, where a determination is made regarding whether the method 600 should terminate.
  • the method 600 may operate for a predetermined number of iterations, or for a predetermined length of time. In some embodiments, the method 600 may operate until a predetermined feature has been detected a predetermined number of times, thus confirming the presence of the predetermined feature in the sample. In some embodiments, the method 600 may operate until the flow cell 104 has not provided a new file for a threshold amount of time. In some embodiments, the method 600 may operate until manually terminated by an operator.
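As a non-limiting illustration, the termination criteria listed above might be combined in a single check as follows. The `RunState` fields, the function name `should_terminate`, and the convention that a limit of `None` disables a criterion are assumptions for this sketch.

```python
import time
from collections import Counter
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class RunState:
    """Bookkeeping for one run of the method (hypothetical structure)."""
    iterations: int = 0
    started_at: float = field(default_factory=time.monotonic)
    last_file_at: float = field(default_factory=time.monotonic)
    detections: Counter = field(default_factory=Counter)  # feature name -> count
    stop_requested: bool = False  # set when an operator terminates manually


def should_terminate(state: RunState, now: float,
                     max_iterations: Optional[int] = None,
                     max_seconds: Optional[float] = None,
                     min_detections: Optional[int] = None,
                     idle_seconds: Optional[float] = None) -> bool:
    """Return True if any enabled termination criterion is satisfied."""
    if state.stop_requested:
        return True
    if max_iterations is not None and state.iterations >= max_iterations:
        return True
    if max_seconds is not None and now - state.started_at >= max_seconds:
        return True
    if min_detections is not None and any(
            count >= min_detections for count in state.detections.values()):
        return True
    if idle_seconds is not None and now - state.last_file_at >= idle_seconds:
        return True
    return False
```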
  • a non-limiting example embodiment of the systems and techniques described above was tested to identify fusion genes related to acute leukemia.
  • one BCR-ABL1 fusion was detected in a 20% dilution of a Ph+ cell line, after 2.07h of sequencing.
  • fusion genes are present in approximately 30% of acute myeloid leukemias and 30% of acute lymphoblastic leukemias. These fusion genes serve a role driving prognosis as well as treatment, for example the use of ATRA and arsenic in acute promyelocytic leukemia, and the addition of tyrosine kinase inhibitors in Philadelphia chromosome positive leukemia.
  • CRISPR-Cas9 based enrichment of a small gene panel was performed for sequencing using the Oxford Nanopore Technologies MinION Mk1C as a flow cell 104.
  • A non-limiting example embodiment of the method 600 was implemented using a mobile phone as the analysis computing device 106. Execution of this test demonstrated that real-time data streaming from the Mk1C flow cell 104 to the mobile phone analysis computing device 106 allows simultaneous analyses while sequencing is occurring on the flow cell 104.
  • the tested embodiment demonstrated that diagnoses can be fast, inexpensive, and portable by reducing resources, sample requirements, and time to clinical diagnosis.
  • library preparation includes an initial dephosphorylation step that renders the 5’ ends of the DNA inaccessible to adapter ligation and is followed by the addition of directional, target-specific RNA guides complexed with tracrRNA and Cas9 enzyme to generate double strand DNA breaks on both ends of our region of interest.
  • the Cas9 complex remains bound to the 5’ end of the guide, and the resulting new DNA ends contain a phosphorylated 5’ end that is available for dA tailing. This results in preferential adapter ligation to these new ends.
  • the libraries were sequenced on a MinION flow cell version 9.4 (Oxford Nanopore Technologies, Oxford, UK) on an Mk1C nanopore sequencer.
  • a real-time experiment was performed where electrical signal records were streamed directly from the sequencing device to the mobile phone for fusion detection, as described in method 600.
  • Mobile processing was carried out using an Ubuntu Linux virtual machine hosted on a Pixel 6 Pro Android phone. Data from the Mk1C during sequencing was transferred to the phone in real time using a local hotspot connection.
  • computational analyses including basecalling, alignment, and variant calling were completed using processors of the phone.
  • Reliable fusion detection is achieved when 3 fusion reads are detected.
  • the test can detect three fusions in 9 hours from sample receipt to reporting.
  • These turnaround times are significantly faster than any current fusion detection assays used in clinical laboratories: the fastest NGS platform currently available takes 48-72 hours from sample to reporting, whereas typical NGS assays take 5-10 days.
  • the technology disclosed herein can be implemented using low-cost hardware in emergency rooms and oncology clinics to deliver fast diagnosis and delivery of personalized oncology treatment.
  • phenotype refers to an appearance of an organism based on a multifactorial combination of genetic traits and environmental factors; a tissue type (e.g., heart tissue vs. adrenal tissue); an organism type (e.g., a strain of bacteria); or an expressed gene.
  • nanopore refers to a pore of nanometer size used to generate ionic current changes in response to interactions with molecules present therein.
  • nucleic acid refers to a polymer of monomer units or "residues".
  • the monomer subunits, or residues, of the nucleic acids each contain a nitrogenous base (i.e., nucleobase), a five-carbon sugar, and a phosphate group.
  • the identity of each residue is typically indicated herein with reference to the identity of the nucleobase (or nitrogenous base) structure of each residue.
  • Canonical nucleobases include adenine (A), guanine (G), thymine (T), uracil (U) (in RNA instead of thymine (T) residues) and cytosine (C).
  • nucleic acids of the present disclosure can include any modified nucleobase, nucleobase analogs, and/or non-canonical nucleobase, as are well-known in the art.
  • Modifications to the nucleic acid monomers, or residues encompass any chemical change in the structure of the nucleic acid monomer, or residue, that results in a noncanonical subunit structure. Such chemical changes can result from, for example, epigenetic modifications (such as to genomic DNA or RNA), or damage resulting from radiation, chemical, or other means.
  • noncanonical subunits, which can result from a modification, include uracil (for DNA), 5-methylcytosine, 5-hydroxymethylcytosine, 5-formethylcytosine.
  • abasic lesion is a location along the deoxyribose backbone but lacking a base.
  • peptide nucleic acids PNAs
  • phosphorothioate DNA PNAs
  • the five-carbon sugar to which the nucleobases are attached can vary depending on the type of nucleic acid.
  • the sugar is deoxyribose in DNA and is ribose in RNA.
  • the nucleic acid residues can also be referred with respect to the nucleoside structure, such as adenosine, guanosine, 5-methyluridine, uridine, and cytidine.
  • alternative nomenclature for the nucleoside also includes indicating a "ribo" or "deoxyribo" prefix before the nucleobase to infer the type of five-carbon sugar.
  • nucleic acid polymer can be or comprise a deoxyribonucleotide (DNA) polymer, a ribonucleotide (RNA) polymer.
  • the nucleic acids can also be or comprise a PNA polymer, or a combination of any of the polymer types described herein (e.g., contain residues with different sugars).
  • peptide refers to natural biological or artificially manufactured short chains of amino acid monomers linked by peptide (amide) bonds. As used herein, a peptide has at least 2 amino acid repeating units.
  • polypeptide or protein refers to a polymer in which the monomers are amino acid residues that are joined together through amide bonds.
  • amino acids are alpha-amino acids
  • either the L-optical isomer or the D-optical isomer can be used, the L-isomers being preferred.
  • polypeptide or protein as used herein encompasses any amino acid sequence and includes modified sequences such as glycoproteins.
  • polypeptide is specifically intended to cover naturally occurring proteins, as well as those that are recombinantly or synthetically produced.
  • Protein can be any of various naturally occurring substances that consist of amino-acid residues joined by peptide bonds, contain the elements carbon, hydrogen, nitrogen, oxygen, usually sulfur, and occasionally other elements (such as phosphorus or iron), and include many essential biological compounds (such as enzymes, hormones, or antibodies).
  • tissue refers to an aggregate of similar cells and cell products forming a definite kind of structural material with a specific function, in a multicellular organism.
  • organ refers to a group of tissues in a living organism that have been adapted to perform a specific function.
  • Example 1 A computer-implemented method of processing nanopore signals to detect one or more predetermined features of a feature library in a sample used to generate the nanopore signals, the method comprising: receiving, by a computing device, an electrical signal record generated by a flow cell; breaking, by the computing device, the electrical signal record into a plurality of chunks; generating, by the computing device, partial base pair information by basecalling electrical signals of a first chunk of the plurality of chunks without basecalling electrical signals of a remainder of the chunks of the plurality of chunks; generating, by the computing device, an alignment for the partial base pair information; in response to determining that the alignment corresponds to at least one of the one or more predetermined features in the feature library, executing, by the computing device, a feature detection pipeline on the electrical signal record; and in response to determining that the alignment does not correspond to at least one of the one or more predetermined features in the feature library, discarding, by the computing device, the electrical signal record.
  • Example 2 The computer-implemented method of Example 1, wherein executing the feature detection pipeline on the electrical signal record includes: generating, by the computing device, fast base pair information by basecalling electrical signals of the remainder of the chunks of the plurality of chunks using a speed-optimized basecalling engine; executing, by the computing device, a feature detection pipeline on the fast base pair information to determine one or more candidate features of the predetermined features of the feature library; and in response to determining the one or more candidate features: generating, by the computing device, accurate base pair information by basecalling the electrical signals of the chunks of the plurality of chunks using an accuracy-optimized basecalling engine; and executing, by the computing device, the feature detection pipeline on the accurate base pair information and the one or more candidate features to determine the detected one or more predetermined features in the sample.
  • Example 3 The computer-implemented method of Example 2, wherein executing the feature detection pipeline on the fast base pair information to determine the one or more candidate features of the predetermined features of the feature library includes: generating an alignment for the fast base pair information; and comparing the alignment for the fast base pair information to locations associated with the predetermined features of the feature library.
  • Example 4 The computer-implemented method of any one of Examples 2-3, wherein executing the feature detection pipeline on the accurate base pair information and the one or more candidate features to determine the detected one or more predetermined features in the sample includes: generating an alignment for the accurate base pair information; and comparing the alignment for the accurate base pair information to locations associated with the one or more candidate features.
  • Example 5 The computer-implemented method of any one of Examples 1-4, wherein breaking the electrical signal record into the plurality of chunks includes breaking the electrical signal record into at least one hundred chunks.
  • Example 6 The computer-implemented method of any one of Examples 1-5, wherein receiving the electrical signal record generated by the flow cell includes: monitoring a directory for new files that include electrical signal records generated by the flow cell; and in response to detecting a new file in the directory, retrieving the new file from the directory to receive the electrical signal record.
  • Example 7 The computer-implemented method of any one of Examples 1-6, wherein the one or more predetermined features include at least one of one or more sequence variants or one or more fusion genes.
  • Example 8 The computer-implemented method of any one of Examples 1-7, wherein the computing device is a mobile computing device.
  • Example 9. A mobile computing device, comprising: at least one processor; a wireless communication interface; a display; and a non-transitory computer-readable medium having computer-executable instructions stored thereon that, in response to execution by the at least one processor, cause the mobile computing device to perform actions for processing nanopore signals to detect one or more predetermined features of a feature library in a sample used to generate the nanopore signals, the actions comprising: receiving, by the mobile computing device via the wireless communication interface, an electrical signal record generated by a flow cell; breaking, by the mobile computing device, the electrical signal record into a plurality of chunks; generating, by the mobile computing device, partial base pair information by basecalling electrical signals of a first chunk of the plurality of chunks without basecalling electrical signals of a remainder of the chunks of the plurality of chunks; generating, by the mobile computing device, an alignment for the partial base pair information
  • Example 10 The mobile computing device of Example 9, wherein executing the feature detection pipeline on the electrical signal record includes: generating, by the mobile computing device, fast base pair information by basecalling electrical signals of the remainder of the chunks of the plurality of chunks using a speed-optimized basecalling engine; executing, by the mobile computing device, a feature detection pipeline on the fast base pair information to determine one or more candidate features of the predetermined features of the feature library; and in response to determining the one or more candidate features: generating, by the mobile computing device, accurate base pair information by basecalling the electrical signals of the chunks of the plurality of chunks using an accuracy-optimized basecalling engine; and executing, by the mobile computing device, the feature detection pipeline on the accurate base pair information and the one or more candidate features to determine the detected one or more predetermined features in the sample.
  • Example 11 The mobile computing device of Example 10, wherein executing the feature detection pipeline on the fast base pair information to determine the one or more candidate features of the predetermined features of the feature library includes: generating an alignment for the fast base pair information; and comparing the alignment for the fast base pair information to locations associated with the predetermined features of the feature library.
  • Example 12 The mobile computing device of any one of Examples 10-11, wherein executing the feature detection pipeline on the accurate base pair information and the one or more candidate features to determine the detected one or more predetermined features in the sample includes: generating an alignment for the accurate base pair information; and comparing the alignment for the accurate base pair information to locations associated with the one or more candidate features.
  • Example 13 The mobile computing device of any one of Examples 9-12, wherein breaking the electrical signal record into the plurality of chunks includes breaking the electrical signal record into at least one hundred chunks.
  • Example 14 The mobile computing device of any one of Examples 9-13, wherein receiving the electrical signal record generated by the flow cell includes: monitoring a directory for new files that include electrical signal records generated by the flow cell; and in response to detecting a new file in the directory, retrieving the new file from the directory to receive the electrical signal record.
  • Example 15 The mobile computing device of any one of Examples 9-14, wherein the one or more predetermined features include at least one of one or more sequence variants or one or more fusion genes.
  • Example 16 A non-transitory computer-readable medium having computer-executable instructions stored thereon that, in response to execution by one or more processors of a computing device, cause the computing device to perform actions for processing nanopore signals to detect one or more predetermined features of a feature library in a sample used to generate the nanopore signals, the actions comprising: receiving, by the computing device, an electrical signal record generated by a flow cell; breaking, by the computing device, the electrical signal record into a plurality of chunks; generating, by the computing device, partial base pair information by basecalling electrical signals of a first chunk of the plurality of chunks without basecalling electrical signals of a remainder of the chunks of the plurality of chunks; generating, by the computing device, an alignment for the partial base pair information; in response to determining that the alignment corresponds to at least one of the one or more predetermined features in the feature library, executing, by the computing device, a feature detection pipeline on the electrical signal record; and in response to determining that the alignment does not correspond to at least one
  • Example 17 The non-transitory computer-readable medium of Example 16, wherein executing the feature detection pipeline on the electrical signal record includes: generating, by the computing device, fast base pair information by basecalling electrical signals of the remainder of the chunks of the plurality of chunks using a speed-optimized basecalling engine; executing, by the computing device, a feature detection pipeline on the fast base pair information to determine one or more candidate features of the predetermined features of the feature library; and in response to determining the one or more candidate features: generating, by the computing device, accurate base pair information by basecalling the electrical signals of the chunks of the plurality of chunks using an accuracy-optimized basecalling engine; and executing, by the computing device, the feature detection pipeline on the accurate base pair information and the one or more candidate features to determine the detected one or more predetermined features in the sample.
  • Example 18 The non-transitory computer-readable medium of Example 17, wherein executing the feature detection pipeline on the fast base pair information to determine the one or more candidate features of the predetermined features of the feature library includes: generating an alignment for the fast base pair information; and comparing the alignment for the fast base pair information to locations associated with the predetermined features of the feature library.
  • Example 19 The non-transitory computer-readable medium of any one of Examples 17-18, wherein executing the feature detection pipeline on the accurate base pair information and the one or more candidate features to determine the detected one or more predetermined features in the sample includes: generating an alignment for the accurate base pair information; and comparing the alignment for the accurate base pair information to locations associated with the one or more candidate features.
  • Example 20 The non-transitory computer-readable medium of any one of Examples 16-19, wherein breaking the electrical signal record into the plurality of chunks includes breaking the electrical signal record into at least one hundred chunks.

Abstract

In some embodiments, a computer-implemented method of processing nanopore signals to detect one or more predetermined features of a feature library in a sample used to generate the nanopore signals is provided. A computing device receives an electrical signal record generated by a flow cell, and breaks the electrical signal record into a plurality of chunks. The computing device generates partial base pair information by basecalling electrical signals of a first chunk of the plurality of chunks, and generates an alignment for the partial base pair information. In response to determining that the alignment corresponds to at least one predetermined feature in the feature library, the computing device executes a feature detection pipeline on the electrical signal record. In response to determining that the alignment does not correspond to at least one predetermined feature in the feature library, the computing device discards the electrical signal record.

Description

SYSTEMS AND METHODS FOR MOBILE-ENABLED TARGETED NANOPORE SEQUENCING WITH MOBILE-ENABLED REAL-TIME FUSION DETECTION
CROSS-REFERENCE TO RELATED APPLICATION
[0001] This application claims the benefit of Provisional Application No. 63/529,329, filed July 27, 2023, the entire disclosure of which is hereby incorporated by reference herein for all purposes.
STATEMENT OF GOVERNMENT LICENSE RIGHTS
[0002] This invention was made with government support under CA280520 awarded by the National Institutes of Health. The government has certain rights in the invention.
TECHNICAL FIELD
[0003] This disclosure relates generally to nucleic acid sequencing, and in particular but not exclusively, relates to efficient processing of sequencing data to detect features including fusion genes or genetic variants.
BACKGROUND
[0004] Effective treatments for many conditions may be selected if an accurate genetic characterization of the condition can be obtained. For example, acute leukemia (AL) is cancer of the blood that has high mortality rates despite a plethora of available treatments. The variation of treatment response and survival are largely based on distinct cytogenetic and molecular aberrations that characterize AL subtypes. AL frequently presents with recurrent gene fusions that impact risk stratification and therapy choice.
[0005] Unfortunately, current fusion detection methods require a long turnaround time (7-10 days) or advance knowledge of the genes involved in the fusions. Fast and cost-effective methods of detecting AL gene fusions, mutations, and other genetic features are desired to improve clinical outcomes for patients.
SUMMARY
[0006] This summary is provided to introduce a selection of concepts in a simplified form that are further described below in the Detailed Description. This summary is not intended to identify key features of the claimed subject matter, nor is it intended to be used as an aid in determining the scope of the claimed subject matter.
[0007] In some embodiments, a computer-implemented method of processing nanopore signals to detect one or more predetermined features of a feature library in a sample used to generate the nanopore signals is provided. A computing device receives an electrical signal record generated by a flow cell. The computing device breaks the electrical signal record into a plurality of chunks. The computing device generates partial base pair information by basecalling electrical signals of a first chunk of the plurality of chunks without basecalling electrical signals of a remainder of the chunks of the plurality of chunks. The computing device generates an alignment for the partial base pair information. In response to determining that the alignment corresponds to at least one of the one or more predetermined features in the feature library, the computing device executes a feature detection pipeline on the electrical signal record. In response to determining that the alignment does not correspond to at least one of the one or more predetermined features in the feature library, the computing device discards the electrical signal record.
[0008] In some embodiments, a mobile computing device comprising at least one processor, a wireless communication interface, a display, and a non-transitory computer-readable medium is provided. The non-transitory computer-readable medium has computer-executable instructions stored thereon that, in response to execution by the at least one processor, cause the mobile computing device to perform actions for processing nanopore signals to detect one or more predetermined features of a feature library in a sample used to generate the nanopore signals. The actions comprise: receiving, by the mobile computing device via the wireless communication interface, an electrical signal record generated by a flow cell; breaking, by the mobile computing device, the electrical signal record into a plurality of chunks; generating, by the mobile computing device, partial base pair information by basecalling electrical signals of a first chunk of the plurality of chunks without basecalling electrical signals of a remainder of the chunks of the plurality of chunks; generating, by the mobile computing device, an alignment for the partial base pair information; in response to determining that the alignment corresponds to at least one of the one or more predetermined features in the feature library, executing, by the mobile computing device, a feature detection pipeline on the electrical signal record; and in response to determining that the alignment does not correspond to at least one of the one or more predetermined features in the feature library, discarding, by the mobile computing device, the electrical signal record.
[0009] In some embodiments, a non-transitory computer-readable medium having computer-executable instructions stored thereon is provided. The instructions, in response to execution by one or more processors of a computing device, cause the computing device to perform actions for processing nanopore signals to detect one or more predetermined features of a feature library in a sample used to generate the nanopore signals, the actions comprising: receiving, by the computing device, an electrical signal record generated by a flow cell; breaking, by the computing device, the electrical signal record into a plurality of chunks; generating, by the computing device, partial base pair information by basecalling electrical signals of a first chunk of the plurality of chunks without basecalling electrical signals of a remainder of the chunks of the plurality of chunks; generating, by the computing device, an alignment for the partial base pair information; in response to determining that the alignment corresponds to at least one of the one or more predetermined features in the feature library, executing, by the computing device, a feature detection pipeline on the electrical signal record; and in response to determining that the alignment does not correspond to at least one of the one or more predetermined features in the feature library, discarding, by the computing device, the electrical signal record.
BRIEF DESCRIPTION OF THE DRAWINGS
[0010] The foregoing aspects and many of the attendant advantages of this disclosure will become more readily appreciated as the same become better understood by reference to the following detailed description, when taken in conjunction with the accompanying drawings, wherein:
[0011] FIG. 1 is a schematic illustration of a system for nanopore-based analysis according to various aspects of the present disclosure.
[0012] FIG. 2 is a schematic illustration of a non-limiting example embodiment of a flow cell according to various aspects of the present disclosure.
[0013] FIG. 3 is a block diagram that illustrates aspects of a non-limiting example embodiment of an analysis computing device according to various aspects of the present disclosure.
[0014] FIG. 4 is a schematic illustration of a previously used, naive approach to processing nanopore signals that is incapable of providing adequate performance, particularly when executed using a mobile computing device for analysis.
[0015] FIG. 5 is a schematic illustration of a non-limiting example embodiment of a new approach to processing nanopore signals, according to various aspects of the present disclosure.
[0016] FIG. 6A - FIG. 6B are a flowchart that illustrates a non-limiting example embodiment of a method of processing nanopore signals to detect one or more predetermined features of a sequence library in a sample, according to various aspects of the present disclosure.
DETAILED DESCRIPTION
[0017] The present disclosure describes ultra-rapid techniques for processing sequencing data generated by nanopore sequencing devices to detect features such as fusion genes or genetic variants in a sample. The fast turnaround time and the use of portable, easily accessible hardware can help oncologists and other physicians overcome time-related challenges and get patients on appropriate treatments faster.
[0018] FIG. 1 is a schematic illustration of a system for nanopore-based analysis according to various aspects of the present disclosure. As shown, in the system 100, a sample 108 is obtained from a subject 102 using known techniques. The sample 108 may be a tissue biopsy, a swab, a blood sample, or any other suitable type of sample 108. The sample 108 is prepared (e.g., combined with one or more buffers, enzymes, etc.), and the prepared sample 108 is provided to a flow cell 104 of a sequencing device. One non-limiting example of a sequencing device is a MinION sequencing device provided by Oxford Nanopore Technologies plc. Some non-limiting examples of devices for implementing a flow cell 104 are a Flongle Flow Cell, a MinION Flow Cell, and the PromethION Flow Cell, each also provided by Oxford Nanopore Technologies plc. The flow cell 104 generates signals based on interactions between the sample 108 and the nanopores of the flow cell 104, and provides the signals to the analysis computing device 106 for analysis.
[0019] FIG. 2 is a schematic illustration of a non-limiting example embodiment of a flow cell according to various aspects of the present disclosure. As shown, the flow cell 104 includes a sample well 204, a plurality of nanopores 202, a processor 206, and a communication interface 208. The sample well 204 is configured to accept the sample 108 (e.g., to receive drops of sample 108 from a pipette) and to provide the sample 108 to the plurality of nanopores 202. The processor 206 is configured to control a voltage applied to the plurality of nanopores 202 and to read electrical signals generated by the nanopores 202. In some embodiments, the processor 206 may also be configured to generate and store records of the electrical signals generated by the nanopores 202, each record including electrical signals representing an interaction of a molecule with a nanopore 202 of the plurality of nanopores 202. In some embodiments, the communication interface 208 is configured to transmit the signals detected by the processor 206 and/or the records generated by the processor 206 to another device, such as the analysis computing device 106, using a wired or wireless network, a USB connection, or any other suitable communication technique. In some embodiments, the processor 206, communication interface 208, and potentially other components (such as a computer-readable medium for storing the records of the electrical signals and/or other information) may be implemented on an ASIC or FPGA that is part of the flow cell 104.
[0020] FIG. 3 is a block diagram that illustrates aspects of a non-limiting example embodiment of an analysis computing device according to various aspects of the present disclosure. The analysis computing device 106 is configured to receive records of electrical signals from a flow cell 104 and to analyze the signals to determine whether one or more predetermined features (e.g., fusion genes or genetic variants) are present in the sample 108 represented by the electrical signals.
[0021] The illustrated analysis computing device 106 may be implemented by any computing device or collection of computing devices, including but not limited to a desktop computing device, a laptop computing device, a mobile computing device, a server computing device, a computing device of a cloud computing device, and/or combinations thereof. However, the techniques described herein are designed to be particularly efficient, such that the techniques may be successfully and efficiently executed using the reduced processing power available on mobile computing devices such as smartphones or tablet computing devices.
[0022] As shown, the analysis computing device 106 includes one or more processors 302, one or more communication interfaces 304, a library data store 308, and a computer-readable medium 306.
[0023] As used herein, "data store" refers to any suitable device configured to store data for access by a computing device. One example of a data store is a highly reliable, high-speed relational database management system (DBMS) executing on one or more computing devices and accessible over a high-speed network. Another example of a data store is a key-value store. However, any other suitable storage technique and/or device capable of quickly and reliably providing the stored data in response to queries may be used, and the computing device may be accessible locally instead of over a network, or may be provided as a cloud-based service. A data store may also include data stored in an organized manner on a computer-readable storage medium, such as a hard disk drive, a flash memory, RAM, ROM, or any other type of computer-readable storage medium. One of ordinary skill in the art will recognize that separate data stores described herein may be combined into a single data store, and/or a single data store described herein may be separated into multiple data stores, without departing from the scope of the present disclosure.
[0024] In some embodiments, the library data store 308 is configured to store a sequence library that includes information representing one or more predetermined features that the analysis computing device 106 is configured to detect. The predetermined features may include one or more fusion genes, genetic variants (e.g., single nucleotide polymorphisms, structural variants, duplications, deletions, insertions, etc.), or other features that are associated with a cancer type, a copy number variant, a base modification, or another characteristic. In some embodiments, the library data store 308 may be stored on a computer-readable medium within the analysis computing device 106. In other embodiments, the library data store 308 may be present on another device, and the components of the analysis computing device 106 may query the library data store 308 using a communication interface 208 when information from the sequence library is desired.
[0025] In some embodiments, the processors 302 may include any suitable type of general-purpose computer processor. In some embodiments, the processors 302 may include one or more special-purpose computer processors or AI accelerators optimized for specific computing tasks, including but not limited to graphical processing units (GPUs), vision processing units (VPUs), and tensor processing units (TPUs).
[0026] In some embodiments, the communication interfaces 304 include one or more hardware and/or software interfaces suitable for providing communication links between components. The communication interfaces 304 may support one or more wired communication technologies (including but not limited to Ethernet, FireWire, and USB), one or more wireless communication technologies (including but not limited to Wi-Fi, WiMAX, Bluetooth, mobile hotspot, 2G, 3G, 4G, 5G, and LTE), and/or combinations thereof.
[0027] As shown, the computer-readable medium 306 has stored thereon logic that, in response to execution by the one or more processors 302, causes the analysis computing device 106 to provide a signal gathering engine 310, a speed-optimized basecalling engine 312, an accuracy-optimized basecalling engine 314, and a feature detection pipeline engine 316.
[0028] As used herein, "computer-readable medium" refers to a removable or non-removable device that implements any technology capable of storing information in a volatile or non-volatile manner to be read by a processor of a computing device, including but not limited to: a hard drive; a flash memory; a solid state drive; random-access memory (RAM); read-only memory (ROM); a CD-ROM, a DVD, or other disk storage; a magnetic cassette; a magnetic tape; and a magnetic disk storage. In some embodiments, a computer-readable medium may include one or more devices that provide a cloud storage bucket.
[0029] In some embodiments, the signal gathering engine 310 is configured to retrieve files including electrical signal records from the flow cell 104, and to organize the electrical signal records for further processing. In some embodiments, the feature detection pipeline engine 316 is configured to perform various actions for determining whether base pair information indicates the presence of one or more features of the predetermined features in the sample.
[0030] In some embodiments, the speed-optimized basecalling engine 312 is configured to determine base pair information from electrical signals in the electrical signal records using techniques that are optimized to quickly determine the base pair information, with a tradeoff of a lower accuracy rate. In some embodiments, the accuracy-optimized basecalling engine 314 is also configured to determine base pair information from electrical signals in the electrical signal records, but using techniques that are optimized to accurately determine the base pair information, with a tradeoff of slower operation if executed for the same amount of data as the speed-optimized basecalling engine 312.
[0031] In some embodiments, both the speed-optimized basecalling engine 312 and the accuracy-optimized basecalling engine 314 may use similar techniques, such as deep neural networks, to generate base pair information, with adjustments to the techniques between the two engines in order to provide the tradeoffs of speed versus accuracy. As one non-limiting example, the speed-optimized basecalling engine 312 and the accuracy-optimized basecalling engine 314 may use deep neural networks having different architectures, such that the speed-optimized basecalling engine 312 uses a deep neural network having an architecture that operates faster than the deep neural network of the accuracy-optimized basecalling engine 314 by having fewer layers or that is otherwise smaller than the deep neural network for the accuracy-optimized basecalling engine 314. As another non-limiting example, the speed-optimized basecalling engine 312 may perform fewer evaluations of the model than the accuracy-optimized basecalling engine 314, may downsample the electrical signal records before processing, or may learn on smaller sets of pre-evaluated data in order to perform faster than the accuracy-optimized basecalling engine 314. As another non-limiting example, the speed-optimized basecalling engine 312 may use a first basecalling tool that is known to be fast with some sacrifice of accuracy, and the accuracy-optimized basecalling engine 314 may use a second basecalling tool that is known to be slower but more accurate. Any suitable basecalling tools or techniques may be used by the speed-optimized basecalling engine 312 and the accuracy-optimized basecalling engine 314, including but not limited to Guppy, Bonito, Nanocall, Chiron, or any other appropriate basecalling tool.
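As a minimal, non-limiting sketch of the downsampling option mentioned above, a signal may be reduced by averaging consecutive samples. The function name `downsample_mean` and the block-averaging approach are assumptions made for illustration only; the disclosure does not specify how downsampling is performed.

```python
from typing import List, Sequence


def downsample_mean(signal: Sequence[float], factor: int) -> List[float]:
    """Reduce a signal by averaging each run of `factor` consecutive samples.

    Hypothetical helper: block averaging is only one of many possible
    downsampling schemes and is not taken from the disclosure.
    """
    if factor <= 0:
        raise ValueError("factor must be a positive integer")
    reduced = []
    for start in range(0, len(signal), factor):
        block = signal[start:start + factor]  # the last block may be shorter
        reduced.append(sum(block) / len(block))
    return reduced
```

A larger `factor` gives the basecaller fewer samples to process, at the cost of discarding fine detail in the signal.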
[0032] Further description of the configuration of each of these components is provided below.
[0033] As used herein, "engine" refers to logic embodied in hardware or software instructions, which can be written in one or more programming languages, including but not limited to C, C++, C#, COBOL, JAVA™, PHP, Perl, HTML, CSS, JavaScript, VBScript, ASPX, Go, and Python. An engine may be compiled into executable programs or written in interpreted programming languages. Software engines may be callable from other engines or from themselves. Generally, the engines described herein refer to logical modules that can be merged with other engines, or can be divided into sub-engines. The engines can be implemented by logic stored in any type of computer-readable medium or computer storage device and be stored on and executed by one or more general purpose computers, thus creating a special purpose computer configured to provide the engine or the functionality thereof. The engines can be implemented by logic programmed into an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA), or another hardware device.
[0034] FIG. 4 is a schematic illustration of a previously used, naive approach to processing nanopore signals that is incapable of providing adequate performance, particularly when executed using a mobile computing device for analysis. In FIG. 4, an electrical signal record 402 is obtained from the flow cell 104. The electrical signal record 402 may represent electrical signals generated during a read of a molecule by a nanopore 202 of the flow cell 104, and may include information corresponding to 150,000 or more base pairs of the molecule.
[0035] Once the electrical signal record 402 is obtained, base calling is performed on the entirety of the information in the electrical signal record 402, to generate base pair information 404 for the entire electrical signal record 402. In some embodiments, the electrical signal record 402 may be broken into chunks prior to performing base calling in order to allow parallel processing of the chunks, but in the technique illustrated in FIG. 4, base pair information 404 for the entirety of the electrical signal record 402 is determined, whether or not the electrical signal record 402 is broken into chunks to facilitate the computation.
[0036] This base pair information 404 is then provided to a feature detection pipeline 406, which performs one or more actions for detecting predetermined features from the feature library 408 within the base pair information 404. The actions may include, but are not limited to, one or more of alignment and detection of fusion genes, genetic variants, or other features. The feature detection pipeline 406 then produces one or more detected features 410, if any were detected in the base pair information 404.
[0037] Unfortunately, while this approach is relatively simple and produces accurate results for a given electrical signal record 402, it is fairly slow and cumbersome. First, the amount of time and computing resources consumed by the feature detection pipeline 406 is dependent on the size of the base pair information 404 provided to it and the number of predetermined features in the feature library 408. By using the base pair information 404 for the entirety of the electrical signal record 402 and all of the predetermined features in the feature library 408, the maximum amount of computing power and time is consumed for every received electrical signal record 402. This may on its own be enough to make this technique unsuitable for efficiently providing results in diagnostic settings. This problem is exacerbated even further when considering that more than one electrical signal record 402 will be processed: in the processing of a typical sample 108, the flow cell 104 will generate on the order of thousands of electrical signal records 402 per hour. Executing basecalling and performing the actions of the feature detection pipeline 406 for the entirety of every electrical signal record 402 produced by the flow cell 104 for all of the features in the feature library 408 consumes an extraordinary amount of computing resources and takes an impractically long amount of time, even when using high-powered computing hardware. What is desired are techniques that reduce this computational burden enough to be able to efficiently detect features using even computing hardware with low-powered or otherwise limited computing resources, such as mobile computing devices.
[0038] FIG. 5 is a schematic illustration of a non-limiting example embodiment of a new approach to processing nanopore signals, according to various aspects of the present disclosure. The approach illustrated in FIG. 5 introduces enough efficiency gain into the processing of the nanopore signals that the techniques can be successfully and efficiently executed on a mobile computing device.
[0039] As shown, an electrical signal record 502 is received, similar to FIG. 4. However, instead of basecalling the entire electrical signal record 502, the electrical signal record 502 is broken into a plurality of chunks 504, 506, 508, 510. Though four chunks are illustrated, the electrical signal record 502 is typically divided into a greater number of chunks. In some embodiments, the electrical signal record 502 may be divided into a predetermined number of chunks. As one non-limiting example, the electrical signal record 502 may be divided into a number of chunks in a range between 90 chunks and 110 chunks, such as one hundred chunks, so that an electrical signal record 502 that includes information related to about 150,000 base pairs is divided into chunks that each include information related to about 1,500 base pairs. As another example, the electrical signal record 502 may be divided into chunks based on a chunk size, such that each chunk includes information related to a predetermined number of base pairs, and the number of chunks is based on the size of the electrical signal record 502. As one non-limiting example, the electrical signal record 502 may be divided based on a predetermined chunk size in a range between 900 base pairs and 1100 base pairs, such as 1000 base pairs, so that an electrical signal record 502 that includes information related to about 150,000 base pairs is divided into about 150 chunks.
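The two ways of dividing a record described above (by a predetermined number of chunks, or by a predetermined chunk size) can be sketched as follows. This is an illustrative sketch only: the function names are hypothetical, and the functions operate on an abstract record length, because the disclosure does not specify how positions in the raw electrical signal map to base pairs.

```python
from typing import List, Tuple

Bounds = Tuple[int, int]  # half-open (start, end) range within a record


def chunk_bounds_by_count(record_length: int, n_chunks: int) -> List[Bounds]:
    """Divide [0, record_length) into n_chunks nearly equal ranges."""
    if n_chunks <= 0:
        raise ValueError("n_chunks must be positive")
    base, extra = divmod(record_length, n_chunks)
    bounds = []
    start = 0
    for i in range(n_chunks):
        size = base + (1 if i < extra else 0)  # spread any remainder evenly
        bounds.append((start, start + size))
        start += size
    return bounds


def chunk_bounds_by_size(record_length: int, chunk_size: int) -> List[Bounds]:
    """Divide [0, record_length) into ranges of at most chunk_size each."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    return [
        (start, min(start + chunk_size, record_length))
        for start in range(0, record_length, chunk_size)
    ]
```

With a record length of 150,000, `chunk_bounds_by_count(150_000, 100)` yields one hundred ranges of 1,500 each, and `chunk_bounds_by_size(150_000, 1000)` yields 150 ranges, matching the examples in the text.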
[0040] Once broken into chunks, instead of basecalling the entire electrical signal record 502, the illustrated technique only performs basecalling on the first chunk 504, and thus generates partial base pair information 512 for the limited amount of data represented by the first chunk 504. Alignment 514 is then performed on this partial base pair information 512, either as part of a pipeline or separately, in order to align the partial base pair information 512 to the predetermined features of the feature library 516. Because the basecalling and alignment are performed over a much smaller amount of information, the amount of computing resources used is greatly reduced.
[0041] At this point, it is determined whether or not the alignment of the partial base pair information 512 indicates that the electrical signal record 502 is relevant to one or more of the predetermined features. The predetermined features will be related to a limited number of portions of the genome, but the molecule used to generate the electrical signal record 502 may be from any portion of the genome. As such, the alignment 514 of the first chunk 504 indicates whether the electrical signal record 502 is likely to be associated with a portion of the genome relevant to any of the predetermined features in the feature library 516, or is instead not likely to include relevant information.
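The relevance determination described above can be sketched as a single function. This is a hedged illustration rather than the disclosed implementation: `basecall` and `align` are hypothetical callables standing in for whatever basecalling and alignment tools are used, and features are represented simply by name.

```python
from typing import Callable, Iterable, Sequence, Set


def record_is_relevant(
    first_chunk: Sequence[float],
    basecall: Callable[[Sequence[float]], str],
    align: Callable[[str], Iterable[str]],
    feature_names: Set[str],
) -> bool:
    """Return True when the first chunk aligns to any feature of interest.

    `basecall` maps raw signal to a base string, and `align` maps a base
    string to the names of the library features it aligns to. Both are
    placeholders supplied by the caller (assumptions, not a real tool's API).
    """
    partial_bases = basecall(first_chunk)   # basecall the first chunk only
    aligned_to = set(align(partial_bases))  # features this read aligns to
    return bool(aligned_to & feature_names)
```

Only the first chunk is passed in, so the cost of this check does not grow with the length of the full record.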
[0042] If the alignment 514 indicates that the electrical signal record 502 is not likely to be associated with a relevant portion of the genome, then further processing of the electrical signal record 502 stops, and the electrical signal record 502 is discarded. It has been determined in testing that roughly 95% of all electrical signal records 502 can be discarded at this step, meaning that for 95% of the electrical signal records 502, the processing may be limited to basecalling and aligning information from a single chunk 504, thus dramatically reducing the amount of computing resources used.
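A rough, back-of-the-envelope estimate of the basecalling savings can be computed from the figures above. The sketch below rests on simplifying assumptions that are not part of the disclosure: chunks are equal in size, exactly one chunk is basecalled per record for triage, each retained record receives exactly one additional full basecalling pass, and the costs of alignment and feature detection are ignored.

```python
def relative_basecalling_work(n_chunks: int, discard_fraction: float) -> float:
    """Estimate basecalling work relative to basecalling every record in full.

    Hypothetical model: every record pays 1/n_chunks for the triage chunk,
    and the retained fraction (1 - discard_fraction) pays one full pass.
    """
    if n_chunks <= 0:
        raise ValueError("n_chunks must be positive")
    if not 0.0 <= discard_fraction <= 1.0:
        raise ValueError("discard_fraction must be between 0 and 1")
    triage_cost = 1.0 / n_chunks
    full_pass_cost = 1.0 - discard_fraction
    return triage_cost + full_pass_cost
```

Under these assumptions, one hundred chunks and a 95% discard rate give a result of about 0.06, that is, roughly 6% of the basecalling work of the approach of FIG. 4. Any second basecalling pass on retained records would add to this figure.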
[0043] If the alignment 514 indicates that the electrical signal record 502 is likely to be associated with a relevant portion of the genome, then further processing is applied to the electrical signal record 502. While performing any type of processing pipeline is already optimized compared to the techniques illustrated in FIG. 4 by virtue of being able to ignore 95% of the input data, the techniques illustrated in FIG. 5 optimize the pipelines in additional ways. For example, the electrical signal record 502 (or the chunks 504, 506, 508, 510 previously created) is provided to a speed-optimized pipeline 518 that uses the speed-optimized basecalling engine 312 to create base pair information for the entire electrical signal record 502, and then uses the feature detection pipeline engine 316 to determine whether any of the predetermined features in the feature library 516 are detected in the base pair information. The speed-optimized basecalling engine 312 uses various techniques, described in further detail below, for which a reduction in accuracy is accepted for a tradeoff in greater speed in producing results.
[0044] The result of the speed-optimized pipeline 518 is a set of candidate features 520, indicating predetermined features from the feature library 516 that were detected by the speed-optimized pipeline 518. If no predetermined features were detected by the speed-optimized pipeline 518, the technique may cease processing the base pair information, but if one or more candidate features 520 are detected, then the electrical signal record 502 is provided to an accuracy-optimized pipeline 522 that uses the accuracy-optimized basecalling engine 314 to re-create the base pair information for the entire electrical signal record 502, and then uses the feature detection pipeline engine 316 to determine whether the base pair information indicates the presence of the candidate features 520. Any candidate features 520 confirmed in this way are then provided as one or more detected features 524.
[0045] The accuracy-optimized basecalling engine 314 uses various techniques, described in further detail below, for which an increase in processing time is accepted for a tradeoff in higher accuracy results. Even though the accuracy-optimized pipeline 522 may use more computing resources than the speed-optimized pipeline 518, the overall computing cost is nevertheless reduced by searching only for the candidate features 520 (instead of all predetermined features in the feature library 516), and by allowing both the alignment 514 and the speed-optimized pipeline 518 to filter out electrical signal records 502 that are unlikely to be successfully identified as including any predetermined features.
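The two-stage flow of the speed-optimized pipeline 518 and the accuracy-optimized pipeline 522 can be sketched as follows. This is a minimal sketch under stated assumptions: the callables `fast_basecall`, `accurate_basecall`, and `detect` are hypothetical placeholders for the engines described above, and features are represented as plain strings.

```python
from typing import Callable, Set

Basecaller = Callable[[object], str]
Detector = Callable[[str, Set[str]], Set[str]]


def two_stage_detect(
    record: object,
    feature_library: Set[str],
    fast_basecall: Basecaller,
    accurate_basecall: Basecaller,
    detect: Detector,
) -> Set[str]:
    """Screen with a fast basecaller, then confirm with an accurate one.

    `detect(bases, features)` returns the subset of `features` found in
    `bases`. All three callables are supplied by the caller (assumptions).
    """
    # Stage 1: fast basecalling, searching the whole feature library.
    candidates = detect(fast_basecall(record), feature_library)
    if not candidates:
        return set()  # nothing to confirm, so skip the slower stage
    # Stage 2: accurate basecalling, searching only for the candidates.
    return detect(accurate_basecall(record), candidates)
```

The second stage searches only for the candidate features, which is what keeps its cost below that of searching the entire feature library with the slower basecaller.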
[0046] FIG. 6A - FIG. 6B are a flowchart that illustrates a non-limiting example embodiment of a method of processing nanopore signals to detect one or more predetermined features of a sequence library in a sample, according to various aspects of the present disclosure. The method 600 is a non-limiting example embodiment of the techniques illustrated schematically in FIG. 5. By filtering out irrelevant electrical signal records using alignments of a small portion of the data, and by further filtering out irrelevant electrical signal records using a speed-optimized pipeline, the method 600 provides dramatic increases in the speed of the analysis, making the method 600 suitable for use even on low-powered computing hardware such as mobile computing devices.
[0047] From a start block, the method 600 proceeds through a continuation terminal ("terminal A") to block 602, where a signal gathering engine 310 of an analysis computing device 106 checks a directory for a new file that includes a record of electrical signals generated by a flow cell 104. In some embodiments, the signal gathering engine 310 may connect to the flow cell 104 via a network using a communication interface 304, and check a file storage location within the flow cell 104. In some embodiments, the flow cell 104 may be connected to the analysis computing device 106 via a network and a communication interface 304, and may write the new file to a file storage location on the analysis computing device 106, such that the signal gathering engine 310 may check the file storage location local to the analysis computing device 106 for the new file. In some embodiments, the signal gathering engine 310 may check a file storage location external to both the analysis computing device 106 and the flow cell 104, where the flow cell 104 is configured to store new files.
[0048] The method 600 then proceeds to a decision block 604, where a decision is made based on whether a new file was found in the directory. If no new file was found, then the result of decision block 604 is NO, and the method 600 returns to block 602. In some embodiments, the signal gathering engine 310 may be configured to wait a predetermined amount of time before again performing the actions of block 602. Otherwise, if a new file was found, then the result of decision block 604 is YES, and the method 600 proceeds to block 606. In some embodiments, instead of using the decision block 604 to loop back to block 602 when a new file is not found, the actions of block 602 may be initiated by a notification, a callback, or another non-polling technique such that the actions of block 602 will be initiated in response to the new file being created.
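The polling loop of blocks 602 and 604 can be sketched as a generator that yields each new file once. This is an illustrative sketch only: the function name, the `poll_seconds` and `max_polls` parameters, and the in-memory set of seen files are assumptions, and the sketch does not address files that are still being written when they are first observed.

```python
import time
from pathlib import Path
from typing import Iterator, Optional, Set


def watch_for_new_files(
    directory: Path,
    poll_seconds: float = 5.0,
    max_polls: Optional[int] = None,
) -> Iterator[Path]:
    """Yield each file in `directory` the first time it is seen.

    Polls until `max_polls` checks have been made (forever when None),
    waiting `poll_seconds` after any check that finds nothing new.
    """
    seen: Set[Path] = set()
    polls = 0
    while max_polls is None or polls < max_polls:
        new_files = sorted(
            p for p in directory.iterdir() if p.is_file() and p not in seen
        )
        for path in new_files:
            seen.add(path)
            yield path  # corresponds to decision block 604 = YES
        if not new_files:
            time.sleep(poll_seconds)  # block 604 = NO: wait, then re-check
        polls += 1
```

A notification-based alternative, as mentioned above, would replace the sleep-and-recheck loop with a callback from the operating system or storage service.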
[0049] At block 606, the signal gathering engine 310 retrieves the new file from the directory. In some embodiments in which the new file is stored on another device, the signal gathering engine 310 may retrieve the new file by copying it to a local storage location via a network using a communication interface 304. In some embodiments in which the new file is already stored on the analysis computing device 106, the signal gathering engine 310 may copy the new file from an initial directory to an in-process directory in order to indicate that the file has been retrieved.
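The polling loop of blocks 602 through 606 can be sketched as follows. This is a minimal illustration under stated assumptions, not the disclosed implementation: the function names, the `*.fast5` file pattern, and the default polling interval are hypothetical, and the sketch moves the file (rather than copying it) so that the same file is not found again on the next poll.

```python
import shutil
import time
from pathlib import Path
from typing import Optional


def find_new_file(incoming: Path, pattern: str = "*.fast5") -> Optional[Path]:
    """Blocks 602/604: return the oldest matching file, or None if there is none.

    The "*.fast5" pattern is an assumption; the text does not name a file format.
    """
    candidates = sorted(incoming.glob(pattern), key=lambda p: p.stat().st_mtime)
    return candidates[0] if candidates else None


def retrieve(new_file: Path, in_process: Path) -> Path:
    """Block 606: relocate the file so that it is not picked up a second time."""
    in_process.mkdir(parents=True, exist_ok=True)
    destination = in_process / new_file.name
    shutil.move(str(new_file), str(destination))
    return destination


def wait_for_record(incoming: Path, in_process: Path,
                    poll_seconds: float = 5.0, max_polls: int = 12) -> Optional[Path]:
    """Poll until a file appears or the (hypothetical) poll budget is exhausted."""
    for _ in range(max_polls):
        found = find_new_file(incoming)
        if found is not None:
            return retrieve(found, in_process)
        time.sleep(poll_seconds)  # the predetermined wait before re-checking
    return None
```

A notification-based variant, as the text mentions, would replace `wait_for_record` with a callback registered with the operating system's file-change facility; that variant is not shown here.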
[0050] At block 608, the signal gathering engine 310 breaks the record of electrical signals generated by the flow cell 104 in the new file into a plurality of chunks. In some embodiments, the signal gathering engine 310 may create separate files for each chunk of the plurality of chunks. In some embodiments, the signal gathering engine 310 may use pointers or other indexes into the new file to define the chunks. In some embodiments, the signal gathering engine 310 may create a first chunk at block 608, and may wait until further chunks are needed for processing before creating the other chunks.
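The index-based variant of block 608 can be illustrated with a short sketch. It assumes the electrical signal record can be addressed as a flat sequence of samples; the function name and the even-split policy are illustrative choices, not details taken from the disclosure.

```python
from typing import List, Tuple


def chunk_bounds(n_samples: int, n_chunks: int) -> List[Tuple[int, int]]:
    """Split the index range [0, n_samples) into n_chunks contiguous (start, stop) pairs.

    Only indexes are produced, so no signal data is copied; a chunk is
    materialized later by slicing the record with one of these pairs.
    """
    if n_chunks <= 0:
        raise ValueError("n_chunks must be positive")
    base, extra = divmod(n_samples, n_chunks)
    bounds = []
    start = 0
    for i in range(n_chunks):
        # The first `extra` chunks take one additional sample each.
        stop = start + base + (1 if i < extra else 0)
        bounds.append((start, stop))
        start = stop
    return bounds
```

With one hundred chunks, the first pair covers roughly one one-hundredth of the record, which is the portion basecalled at block 610.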
[0051] At block 610, a speed-optimized basecalling engine 312 of the analysis computing device 106 generates partial base pair information using a first chunk of the plurality of chunks. The partial base pair information includes base pair information for the electrical signals represented by the first chunk, but not the remainder of the chunks of the plurality of chunks. In some embodiments, the first chunk is a chunk representing either a start or an end of the electrical signal record. In some embodiments, the first chunk may be any of the chunks in the plurality of chunks. Any suitable technique or tool may be used by the speed-optimized basecalling engine 312 to generate the partial base pair information, including but not limited to using a Bonito basecalling utility, a Nanocall basecalling utility, or any other suitable basecalling utility or technique. In some embodiments, instead of using the speed-optimized basecalling engine 312, the accuracy-optimized basecalling engine 314 may be used to generate the partial base pair information for the first chunk, since the amount of information to be processed is relatively small, and so the longer computation time used by the accuracy-optimized basecalling engine 314 may be a fair trade-off for the increase in accuracy.
[0052] At block 612, a feature detection pipeline engine 316 of the analysis computing device 106 generates an alignment for the partial base pair information. In some embodiments, the alignment is an attempt to find a position within the genome associated with the one or more predetermined features of the feature library within which the partial base pair information appears, or at least which the partial base pair information closely matches. Any suitable technique or tool may be used by the feature detection pipeline engine 316 for generating the alignment, including but not limited to one or more of Minimap2, QAlign, or PyPore. In some embodiments, instead of providing the partial base pair information to the feature detection pipeline engine 316, a stand-alone or otherwise separate alignment engine may be used.
[0053] The method 600 then proceeds to a continuation terminal ("terminal B"). From terminal B (FIG. 6B), the method 600 proceeds to block 614, where the feature detection pipeline engine 316 compares the alignment for the partial base pair information to one or more predetermined features in a library data store 308. In some embodiments, the alignment generated at block 612 may indicate a location within an entire genome with which the partial base pair information is closely matched, and the comparison at block 614 may compare the indicated location to locations associated with the predetermined features. In some embodiments, the alignment generated at block 612 may indicate a location within one or more of the predetermined features with which the partial base pair information is closely matched, if any, and the comparison at block 614 may check whether a successful alignment within the predetermined features was determined or not.
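The whole-genome variant of the comparison at block 614 amounts to an interval-overlap test between the aligned location and the locations stored for each predetermined feature. The sketch below assumes half-open coordinates and uses hypothetical `Region` and `Alignment` types; the contig names and coordinates in the usage are placeholders, not real genomic positions.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class Region:
    """A location associated with a predetermined feature (hypothetical type)."""
    name: str
    contig: str
    start: int  # 0-based, inclusive
    end: int    # exclusive


@dataclass(frozen=True)
class Alignment:
    """Where the partial base pair information aligned (hypothetical type)."""
    contig: str
    start: int
    end: int


def matching_features(aln: Optional[Alignment], features: Iterable[Region]) -> List[str]:
    """Return the names of features whose region overlaps the alignment.

    An empty list corresponds to the NO branch of decision block 616,
    including the case in which no alignment was possible (aln is None).
    """
    if aln is None:
        return []
    return [
        f.name
        for f in features
        if f.contig == aln.contig and aln.start < f.end and f.start < aln.end
    ]
```

For example, with a library of `Region("feature_1", "ctg1", 100, 200)`, an alignment at `Alignment("ctg1", 150, 400)` matches `feature_1`, while an alignment on a different contig or ending at position 100 does not.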
[0054] The method 600 then proceeds to a decision block 616, where a decision is made based on whether the alignment of the partial base pair information indicates that the record of electrical signals generated by the flow cell 104 is relevant to the predetermined features. If the record of electrical signals generated by the flow cell 104 is not relevant to the predetermined features (e.g., an alignment of the partial base pair information to the predetermined features was not possible, or the alignment position for the partial base pair information was not associated with at least one of the predetermined features), then the result of decision block 616 is NO, and the method 600 proceeds to block 618, where the signal gathering engine 310 discards the record of electrical signals, and then to a continuation terminal ("terminal C"). As discussed above, in testing it was determined that about 95% of the electrical signal records may be discarded at this point, and since the basecalling and alignment to this point have been limited to one one-hundredth of the data of the electrical signal record, the amount of computing resources utilized is drastically reduced for this vast majority of the electrical signal records.
[0055] Returning to decision block 616, if it is determined that the record of electrical signals generated by the flow cell 104 is relevant to the predetermined features (e.g., an alignment of the partial base pair information to a predetermined feature was possible, or the alignment position for the partial base pair information was associated with at least one of the predetermined features), then the result of decision block 616 is YES, and the method 600 proceeds to block 620.
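The effect of the early discard on basecalling workload can be estimated with simple arithmetic. The sketch below uses the two figures stated above (about 95% of records discarded, after basecalling one one-hundredth of each record) and makes simplifying assumptions that are not from the disclosure: all records are the same size, every retained record is basecalled in full exactly once, and the cost of alignment and of any later accuracy-optimized pass is ignored.

```python
def expected_basecalling_fraction(discard_rate: float, first_chunk_fraction: float) -> float:
    """Fraction of all signal data that is basecalled under the early-discard scheme.

    Discarded records cost only their first chunk; retained records cost
    their whole length (first chunk plus remainder).
    """
    if not 0.0 <= discard_rate <= 1.0:
        raise ValueError("discard_rate must be between 0 and 1")
    if not 0.0 < first_chunk_fraction <= 1.0:
        raise ValueError("first_chunk_fraction must be in (0, 1]")
    return discard_rate * first_chunk_fraction + (1.0 - discard_rate) * 1.0


# 0.95 * 0.01 + 0.05 * 1.0 = 0.0595, i.e. roughly 6% of the data is basecalled.
fraction = expected_basecalling_fraction(0.95, 0.01)
```

Under these assumptions the first basecalling pass touches roughly 6% of the signal data, compared with 100% when every record is basecalled in full.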
[0056] At block 620, the speed-optimized basecalling engine 312 generates fast base pair information using a remainder of the chunks of the plurality of chunks. In some embodiments, each of the chunks may be processed by the speed-optimized basecalling engine 312 separately, and two or more chunks may be processed concurrently in order to reduce the wall clock time for the overall operation. In some embodiments, the previously computed partial base pair information may be combined with the base pair information for the remainder of the chunks to create the fast base pair information.
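Concurrent processing of the remaining chunks at block 620 can be sketched with a worker pool. The sketch is illustrative only: `basecall_chunk` stands in for whatever basecalling utility is used, the first chunk is assumed to be at index 0, and combining per-chunk results by plain concatenation is a simplifying assumption (a real basecaller may need to reconcile chunk boundaries).

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Sequence, TypeVar

Chunk = TypeVar("Chunk")


def basecall_remainder(chunks: Sequence[Chunk],
                       basecall_chunk: Callable[[Chunk], str],
                       partial: str,
                       max_workers: int = 4) -> str:
    """Basecall chunks[1:] concurrently and prepend the earlier partial result.

    `partial` is the base pair information already computed for chunks[0],
    so the first chunk is not basecalled a second time.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Executor.map yields results in input order, so the pieces
        # can be joined directly without re-sorting.
        rest = list(pool.map(basecall_chunk, chunks[1:]))
    return partial + "".join(rest)
```

Threads suit a basecaller that runs as an external process or releases the interpreter lock; a CPU-bound pure-Python basecaller would be better served by a process pool.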
[0057] At block 622, the feature detection pipeline engine 316 of the analysis computing device 106 analyzes the fast base pair information to determine one or more candidate features of the predetermined features in the library data store 308. In some embodiments, the feature detection pipeline engine 316 may use an alignment tool to determine whether the fast base pair information can be aligned to any of the predetermined features in the library data store 308, and if the fast base pair information can be aligned to a predetermined feature, that predetermined feature is determined to be a candidate feature.
[0058] The method 600 then proceeds to a decision block 624, where a decision is made based on whether the feature detection pipeline engine 316 found any candidate features. If no candidate features were found, then the result of decision block 624 is NO, and the method 600 proceeds to block 626, where the signal gathering engine 310 discards the electrical signal record, and then to a continuation terminal ("terminal C"). Otherwise, if one or more candidate features were found, then the result of decision block 624 is YES, and the method 600 proceeds to block 628.
[0059] At block 628, the accuracy-optimized basecalling engine 314 generates accurate base pair information using the chunks of the plurality of chunks. The use of the terms "fast base pair information" and "accurate base pair information" is for purposes of identifying which basecalling engine was used to generate the respective base pair information, in that the speed-optimized basecalling engine 312 generally produces base pair information faster than the accuracy-optimized basecalling engine 314, and the accuracy-optimized basecalling engine 314 generally produces base pair information that is more accurate than the speed-optimized basecalling engine 312. That said, in some embodiments, the accuracy of the fast base pair information may be similar to the accuracy of the accurate base pair information, and the speed of computation of the accurate base pair information may be similar to the speed of computation of the fast base pair information, despite the use of these terms.
[0060] At block 630, the feature detection pipeline engine 316 analyzes the accurate base pair information and the one or more candidate features to determine one or more detected predetermined features in the sample. As with the processing at block 622, the feature detection pipeline engine 316 may use an alignment tool to determine whether the accurate base pair information can be aligned to any of the candidate features, and if so, then the candidate feature is determined to be a detected predetermined feature.
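The staged control flow of blocks 610 through 630 can be summarized in a single function. This is a sketch under stated assumptions rather than the disclosed implementation: the basecallers and the aligner are passed in as hypothetical callables, `align` is assumed to return the set of feature names to which a read aligns, and the first chunk is assumed to be at index 0.

```python
from typing import Callable, Sequence, Set, TypeVar

Chunk = TypeVar("Chunk")


def detect_features(chunks: Sequence[Chunk],
                    fast_basecall: Callable[[Sequence[Chunk]], str],
                    accurate_basecall: Callable[[Sequence[Chunk]], str],
                    align: Callable[[str], Set[str]]) -> Set[str]:
    """Return the detected features for one record, or an empty set if discarded."""
    # Blocks 610-616: basecall and align only the first chunk.
    partial = fast_basecall(chunks[:1])
    if not align(partial):
        return set()  # block 618: record is not relevant, discard it

    # Block 620: basecall the remainder and combine with the partial result.
    fast = partial + fast_basecall(chunks[1:])

    # Blocks 622-624: look for candidate features in the fast base pair information.
    candidates = align(fast)
    if not candidates:
        return set()  # block 626: no candidates, discard the record

    # Blocks 628-630: re-basecall all chunks accurately and confirm the candidates.
    accurate = accurate_basecall(chunks)
    return align(accurate) & candidates
```

The accuracy-optimized engine is invoked only for records that survive both earlier filters, which is where the savings in computation come from.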
[0061] The method 600 then proceeds to terminal C, and from terminal C to a decision block 632, where a determination is made regarding whether the method 600 should terminate. In some embodiments, the method 600 may operate for a predetermined number of iterations, or for a predetermined length of time. In some embodiments, the method 600 may operate until a predetermined feature has been detected a predetermined number of times, thus confirming the presence of the predetermined feature in the sample. In some embodiments, the method 600 may operate until the flow cell 104 has not provided a new file for a threshold amount of time. In some embodiments, the method 600 may operate until manually terminated by an operator.
[0062] If it is determined that the method 600 should continue, then the result of decision block 632 is NO, and the method 600 returns to block 602 via a continuation terminal ("terminal A"). Otherwise, if it is determined that the method 600 should terminate, then the result of decision block 632 is YES, and the method 600 proceeds to an end block and terminates.
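The alternative termination conditions of decision block 632 can be expressed as a single predicate. The `RunState` and `StopPolicy` types and their field names below are hypothetical; any condition left as `None` is treated as disabled, so an embodiment may enable any combination of them.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class RunState:
    """Bookkeeping for one run of the method (hypothetical type)."""
    iterations: int = 0
    elapsed_s: float = 0.0
    idle_s: float = 0.0  # time since the flow cell last provided a new file
    detections: Dict[str, int] = field(default_factory=dict)  # feature -> count
    stop_requested: bool = False  # set when an operator terminates manually


@dataclass
class StopPolicy:
    """Thresholds for decision block 632; None disables a condition."""
    max_iterations: Optional[int] = None
    max_elapsed_s: Optional[float] = None
    max_idle_s: Optional[float] = None
    min_detections: Optional[int] = None


def should_terminate(state: RunState, policy: StopPolicy) -> bool:
    """Return True for the YES branch of decision block 632."""
    if state.stop_requested:
        return True
    if policy.max_iterations is not None and state.iterations >= policy.max_iterations:
        return True
    if policy.max_elapsed_s is not None and state.elapsed_s >= policy.max_elapsed_s:
        return True
    if policy.max_idle_s is not None and state.idle_s >= policy.max_idle_s:
        return True
    if policy.min_detections is not None and any(
        count >= policy.min_detections for count in state.detections.values()
    ):
        return True
    return False
```

For instance, `StopPolicy(min_detections=3)` stops the run once any single feature has been detected three times; the value 3 here is only an example threshold.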
EXAMPLE
[0063] A non-limiting example embodiment of the systems and techniques described above was tested to identify fusion genes related to acute leukemia. In the test, one BCR-ABL1 fusion was detected in a 20% dilution of a Ph+ cell line, after 2.07 h of sequencing.
[0064] Leukemias frequently present with recurrent fusions that impact risk stratification and therapy choice. Clinically defining or otherwise significant fusion genes are present in approximately 30% of acute myeloid leukemias and 30% of acute lymphoblastic leukemias. These fusion genes serve a role driving prognosis as well as treatment, for example the use of ATRA and arsenic in acute promyelocytic leukemia, and the addition of tyrosine kinase inhibitors in Philadelphia chromosome positive leukemia.
[0065] To generate electrical signal records for the ultrarapid and portable assay design described above, CRISPR-Cas9-based enrichment of a small gene panel was performed for sequencing using the Oxford Nanopore Technologies MinION Mk1C as a flow cell 104. A non-limiting example embodiment of the method 600 was implemented using a mobile phone as the analysis computing device 106. Execution of this test demonstrated that real-time data streaming from the Mk1C flow cell 104 to the mobile phone analysis computing device 106 allows simultaneous analyses while sequencing is occurring on the flow cell 104. The tested embodiment demonstrated that diagnoses can be fast, inexpensive, and portable by reducing resources, sample requirements, and time to clinical diagnosis.
[0066] Given the ultrarapid and integrated sample-to-reporting workflow, we envision the ability to assist medical doctors such as oncologists in the management of patients with quick results, allowing for a new paradigm of precision point-of-care diagnostics.
[0067] In the test, DNA from a 20% dilution of KCL22 (a cell line with a BCR-ABL1 fusion) in normal cells (GM12878) was extracted with MagAttract HMW (Qiagen, Germantown, MD, USA) following the standard protocol. crRNA guides were designed to direct Cas9 to cut in the genomic proximity of each of the genes involved in each of the translocations studied. When the target region was large, guides were tiled across the region to maximize coverage. Guides were designed to capture PML-RARA, BCR-ABL1, inv(16), and KMT2A-AF4. 5000 ng of DNA were used as input.
[0068] Briefly, library preparation includes an initial dephosphorylation step that renders the 5’ ends of the DNA inaccessible to adapter ligation and is followed by the addition of directional, target-specific RNA guides complexed with tracrRNA and Cas9 enzyme to generate double strand DNA breaks on both ends of our region of interest. The Cas9 complex remains bound to the 5’ end of the guide, and the resulting new DNA ends contain a phosphorylated 5’ end that is available for dA tailing. This results in preferential adapter ligation to these new ends. The libraries were sequenced on a MinION flow cell version 9.4 (Oxford Nanopore Technologies, Oxford, UK) on an Mk1C nanopore sequencer.
[0069] A real-time experiment was performed where electrical signal records were streamed directly from the sequencing device to the mobile phone for fusion detection, as described in method 600. Mobile processing was carried out using an Ubuntu Linux virtual machine hosted on a Pixel 6 Pro Android phone. Data from the Mk1C during sequencing was transferred to the phone in real time using a local hotspot connection. As discussed in the description of method 600, computational analyses including basecalling, alignment, and variant calling were completed using processors of the phone.
[0070] Data analysis steps (including basecalling, alignment, fusion detection, and visualization of fusion calls) were performed concurrently with the sequencing experiment to confirm whether a patient sample has a fusion. The test demonstrated the feasibility of fast fusion detection using the ultrarapid CRISPR-Cas9 enrichment-based library preparation protocol paired with the mobile-enabled data analysis pipeline described in method 600. The mobile phone pipeline processes reads at an average rate of 83.63 reads/minute, including basecalling, alignment, and fusion detection. In the tested real-time analysis, the phone was able to find and call one BCR-ABL1 fusion after 2.07 hours.
[0071] Reliable fusion detection is achieved when 3 fusion reads are detected. In real-time experiments, the test can detect three fusions in 9 hours from sample receipt to reporting. These turnaround times are significantly faster than those of any current fusion detection assay used in clinical laboratories: the fastest NGS platform currently available takes 48-72 hours from sample to reporting, whereas typical NGS assays take 5-10 days. Even with these extremely fast turnaround times, the technology disclosed herein can be implemented using low-cost hardware in emergency rooms and oncology clinics to deliver fast diagnosis and delivery of personalized oncology treatment.
[0072] The complete disclosure of all patents, patent applications, and publications, and electronically available material including, for instance, nucleotide sequence submissions in, e.g., GenBank and RefSeq, and amino acid sequence submissions in, e.g., SwissProt, PIR, PRF, PDB, and translations from annotated coding regions in GenBank and RefSeq cited herein are incorporated by reference in their entirety. Supplementary materials referenced in publications (such as supplementary tables, supplementary figures, supplementary materials and methods, and/or supplementary experimental data) are likewise incorporated by reference in their entirety. In the event that any inconsistency exists between the disclosure of the present application and the disclosure(s) of any document incorporated herein by reference, the disclosure of the present application shall govern.
[0073] The foregoing detailed description and examples have been given for clarity of understanding only. No unnecessary limitations are to be understood therefrom. The disclosure is not limited to the exact details shown and described, for variations obvious to one skilled in the art will be included within the disclosure defined by the claims.
[0074] The description of embodiments of the disclosure is not intended to be exhaustive or to limit the disclosure to the precise form disclosed. While the specific embodiments of, and examples for, the disclosure are described herein for illustrative purposes, various equivalent modifications are possible within the scope of the disclosure.
[0075] Specific elements of any foregoing embodiments can be combined or substituted for elements in other embodiments. Moreover, the inclusion of specific elements in at least some of these embodiments may be optional, wherein further embodiments may include one or more embodiments that specifically exclude one or more of these specific elements. Furthermore, while advantages associated with certain embodiments of the disclosure have been described in the context of these embodiments, other embodiments may also exhibit such advantages, and not all embodiments need necessarily exhibit such advantages to fall within the scope of the disclosure.
[0076] As used herein and unless otherwise indicated, the terms “a” and “an” are taken to mean “one”, “at least one” or “one or more”. Unless otherwise required by context, singular terms used herein shall include pluralities and plural terms shall include the singular.
[0077] Unless the context clearly requires otherwise, throughout the description and the claims, the words “comprise”, “comprising”, and the like are to be construed in an inclusive sense as opposed to an exclusive or exhaustive sense; that is to say, in the sense of “including, but not limited to”. Words using the singular or plural number also include the plural and singular number, respectively. Additionally, the words “herein,” “above,” and “below” and words of similar import, when used in this application, shall refer to this application as a whole and not to any particular portions of the application.
[0078] Unless otherwise indicated, all numbers expressing quantities of components, molecular weights, and so forth used in the specification and claims are to be understood as being modified in all instances by the term "about." Accordingly, unless otherwise indicated to the contrary, the numerical parameters set forth in the specification and claims are approximations that may vary depending upon the desired properties sought to be obtained by the present disclosure. At the very least, and not as an attempt to limit the doctrine of equivalents to the scope of the claims, each numerical parameter should at least be construed in light of the number of reported significant digits and by applying ordinary rounding techniques.
[0079] Notwithstanding that the numerical ranges and parameters setting forth the broad scope of the disclosure are approximations, the numerical values set forth in the specific examples are reported as precisely as possible. All numerical values, however, inherently contain a range necessarily resulting from the standard deviation found in their respective testing measurements.
[0080] All headings are for the convenience of the reader and should not be used to limit the meaning of the text that follows the heading, unless so specified.
[0081] All of the references cited herein are incorporated by reference. Aspects of the disclosure can be modified, if necessary, to employ the systems, functions, and concepts of the above references and application to provide yet further embodiments of the disclosure. These and other changes can be made to the disclosure in light of the detailed description.
[0082] It will be appreciated that, although specific embodiments of the disclosure have been described herein for purposes of illustration, various modifications may be made without deviating from the spirit and scope of the disclosure. Accordingly, the disclosure is not limited except as by the claims.
[0083] As used herein, “phenotype” refers to an appearance of an organism based on a multifactorial combination of genetic traits and environmental factors; a tissue type (e.g., heart tissue vs. adrenal tissue); an organism type (e.g., a strain of bacteria); or an expressed gene.
[0084] As used herein, “nanopore” refers to a pore of nanometer size used to generate ionic current changes in response to interactions with molecules present therein.
[0085] As used herein, “nucleic acid” refers to a polymer of monomer units or "residues". The monomer subunits, or residues, of the nucleic acids each contain a nitrogenous base (i.e., nucleobase), a five-carbon sugar, and a phosphate group. The identity of each residue is typically indicated herein with reference to the identity of the nucleobase (or nitrogenous base) structure of each residue. Canonical nucleobases include adenine (A), guanine (G), thymine (T), uracil (U) (in RNA instead of thymine (T) residues) and cytosine (C). However, the nucleic acids of the present disclosure can include any modified nucleobase, nucleobase analogs, and/or non-canonical nucleobase, as are well-known in the art. Modifications to the nucleic acid monomers, or residues, encompass any chemical change in the structure of the nucleic acid monomer, or residue, that results in a noncanonical subunit structure. Such chemical changes can result from, for example, epigenetic modifications (such as to genomic DNA or RNA), or damage resulting from radiation, chemical, or other means. Illustrative and nonlimiting examples of noncanonical subunits, which can result from a modification, include uracil (for DNA), 5-methylcytosine, 5-hydroxymethylcytosine, 5-formylcytosine, 5-carboxycytosine, β-glucosyl-5-hydroxymethylcytosine, 8-oxoguanine, 2-amino-adenosine, 2-amino-deoxyadenosine, 2-thiothymidine, pyrrolo-pyrimidine, 2-thiocytidine, or an abasic lesion. An abasic lesion is a location along the deoxyribose backbone that lacks a base. Known analogs of natural nucleotides hybridize to nucleic acids in a manner similar to naturally occurring nucleotides, such as peptide nucleic acids (PNAs) and phosphorothioate DNA. The five-carbon sugar to which the nucleobases are attached can vary depending on the type of nucleic acid. For example, the sugar is deoxyribose in DNA and is ribose in RNA.
In some instances herein, the nucleic acid residues can also be referred to with respect to the nucleoside structure, such as adenosine, guanosine, 5-methyluridine, uridine, and cytidine. Moreover, alternative nomenclature for the nucleoside also includes indicating a "ribo" or "deoxyribo" prefix before the nucleobase to indicate the type of five-carbon sugar. For example, "ribocytosine" as occasionally used herein is equivalent to a cytidine residue because it indicates the presence of a ribose sugar in the RNA molecule at that residue. A nucleic acid polymer can be or comprise a deoxyribonucleotide (DNA) polymer or a ribonucleotide (RNA) polymer. The nucleic acids can also be or comprise a PNA polymer, or a combination of any of the polymer types described herein (e.g., contain residues with different sugars).
[0086] As used herein, "peptide" refers to natural biological or artificially manufactured short chains of amino acid monomers linked by peptide (amide) bonds. As used herein, a peptide has at least 2 amino acid repeating units.
[0087] As used herein, “polypeptide” or “protein” refers to a polymer in which the monomers are amino acid residues that are joined together through amide bonds. When the amino acids are alpha-amino acids, either the L-optical isomer or the D-optical isomer can be used, the L-isomers being preferred. The term polypeptide or protein as used herein encompasses any amino acid sequence and includes modified sequences such as glycoproteins. The term polypeptide is specifically intended to cover naturally occurring proteins, as well as those that are recombinantly or synthetically produced. “Protein” can be any of various naturally occurring substances that consist of amino-acid residues joined by peptide bonds, contain the elements carbon, hydrogen, nitrogen, oxygen, usually sulfur, and occasionally other elements (such as phosphorus or iron), and include many essential biological compounds (such as enzymes, hormones, or antibodies).
[0088] As used herein, “tissue” refers to an aggregate of similar cells and cell products forming a definite kind of structural material with a specific function, in a multicellular organism.
[0089] As used herein, “organ” refers to a group of tissues in a living organism that have been adapted to perform a specific function.
EXAMPLE EMBODIMENTS
[0090] The following numbered paragraphs illustrate non-limiting example embodiments of the disclosed subject matter.
[0091] Example 1. A computer-implemented method of processing nanopore signals to detect one or more predetermined features of a feature library in a sample used to generate the nanopore signals, the method comprising: receiving, by a computing device, an electrical signal record generated by a flow cell; breaking, by the computing device, the electrical signal record into a plurality of chunks; generating, by the computing device, partial base pair information by basecalling electrical signals of a first chunk of the plurality of chunks without basecalling electrical signals of a remainder of the chunks of the plurality of chunks; generating, by the computing device, an alignment for the partial base pair information; in response to determining that the alignment corresponds to at least one of the one or more predetermined features in the feature library, executing, by the computing device, a feature detection pipeline on the electrical signal record; and in response to determining that the alignment does not correspond to at least one of the one or more predetermined features in the feature library, discarding, by the computing device, the electrical signal record.
[0092] Example 2. The computer-implemented method of Example 1, wherein executing the feature detection pipeline on the electrical signal record includes: generating, by the computing device, fast base pair information by basecalling electrical signals of the remainder of the chunks of the plurality of chunks using a speed-optimized basecalling engine; executing, by the computing device, a feature detection pipeline on the fast base pair information to determine one or more candidate features of the predetermined features of the feature library; and in response to determining the one or more candidate features: generating, by the computing device, accurate base pair information by basecalling the electrical signals of the chunks of the plurality of chunks using an accuracy-optimized basecalling engine; and executing, by the computing device, the feature detection pipeline on the accurate base pair information and the one or more candidate features to determine the detected one or more predetermined features in the sample.
[0093] Example 3. The computer-implemented method of Example 2, wherein executing the feature detection pipeline on the fast base pair information to determine the one or more candidate features of the predetermined features of the feature library includes: generating an alignment for the fast base pair information; and comparing the alignment for the fast base pair information to locations associated with the predetermined features of the feature library.
[0094] Example 4. The computer-implemented method of any one of Examples 2-3, wherein executing the feature detection pipeline on the accurate base pair information and the one or more candidate features to determine the detected one or more predetermined features in the sample includes: generating an alignment for the accurate base pair information; and comparing the alignment for the accurate base pair information to locations associated with the one or more candidate features.
[0095] Example 5. The computer-implemented method of any one of Examples 1-4, wherein breaking the electrical signal record into the plurality of chunks includes breaking the electrical signal record into at least one hundred chunks.
[0096] Example 6. The computer-implemented method of any one of Examples 1-5, wherein receiving the electrical signal record generated by the flow cell includes: monitoring a directory for new files that include electrical signal records generated by the flow cell; and in response to detecting a new file in the directory, retrieving the new file from the directory to receive the electrical signal record.
[0097] Example 7. The computer-implemented method of any one of Examples 1-6, wherein the one or more predetermined features include at least one of one or more sequence variants or one or more fusion genes.
[0098] Example 8. The computer-implemented method of any one of Examples 1-7, wherein the computing device is a mobile computing device.
[0099] Example 9. A mobile computing device, comprising: at least one processor; a wireless communication interface; a display; and a non-transitory computer-readable medium having computer-executable instructions stored thereon that, in response to execution by the at least one processor, cause the mobile computing device to perform actions for processing nanopore signals to detect one or more predetermined features of a feature library in a sample used to generate the nanopore signals, the actions comprising: receiving, by the mobile computing device via the wireless communication interface, an electrical signal record generated by a flow cell; breaking, by the mobile computing device, the electrical signal record into a plurality of chunks; generating, by the mobile computing device, partial base pair information by basecalling electrical signals of a first chunk of the plurality of chunks without basecalling electrical signals of a remainder of the chunks of the plurality of chunks; generating, by the mobile computing device, an alignment for the partial base pair information; in response to determining that the alignment corresponds to at least one of the one or more predetermined features in the feature library, executing, by the mobile computing device, a feature detection pipeline on the electrical signal record; and in response to determining that the alignment does not correspond to at least one of the one or more predetermined features in the feature library, discarding, by the mobile computing device, the electrical signal record.
[0100] Example 10. The mobile computing device of Example 9, wherein executing the feature detection pipeline on the electrical signal record includes: generating, by the mobile computing device, fast base pair information by basecalling electrical signals of the remainder of the chunks of the plurality of chunks using a speed-optimized basecalling engine; executing, by the mobile computing device, a feature detection pipeline on the fast base pair information to determine one or more candidate features of the predetermined features of the feature library; and in response to determining the one or more candidate features: generating, by the mobile computing device, accurate base pair information by basecalling the electrical signals of the chunks of the plurality of chunks using an accuracy-optimized basecalling engine; and executing, by the mobile computing device, the feature detection pipeline on the accurate base pair information and the one or more candidate features to determine the detected one or more predetermined features in the sample.
[0101] Example 11. The mobile computing device of Example 10, wherein executing the feature detection pipeline on the fast base pair information to determine the one or more candidate features of the predetermined features of the feature library includes: generating an alignment for the fast base pair information; and comparing the alignment for the fast base pair information to locations associated with the predetermined features of the feature library.
[0102] Example 12. The mobile computing device of any one of Examples 10-11, wherein executing the feature detection pipeline on the accurate base pair information and the one or more candidate features to determine the detected one or more predetermined features in the sample includes: generating an alignment for the accurate base pair information; and comparing the alignment for the accurate base pair information to locations associated with the one or more candidate features.
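As a non-limiting sketch of the two-pass detection recited in Examples 10-12, the following assumes that feature locations and alignments can both be modeled as half-open genomic intervals, and that a feature is matched when an alignment overlaps its location. The interval model, the overlap criterion, and the names `two_stage_detect`, `features_hit`, `fast_basecall`, `accurate_basecall`, and `align` are assumptions made for illustration only.

```python
from typing import Callable, Dict, List, Sequence, Set, Tuple

Interval = Tuple[str, int, int]  # (contig, start, end), half-open; assumed model


def overlaps(a: Interval, b: Interval) -> bool:
    """True when two intervals share a contig and at least one position."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def features_hit(alignments: List[Interval], locations: Dict[str, Interval]) -> Set[str]:
    """Names of features whose location overlaps any alignment."""
    return {
        name
        for name, loc in locations.items()
        if any(overlaps(aln, loc) for aln in alignments)
    }


def two_stage_detect(
    chunks: Sequence[Sequence[float]],
    fast_basecall: Callable[[Sequence[float]], str],
    accurate_basecall: Callable[[Sequence[float]], str],
    align: Callable[[str], List[Interval]],
    library: Dict[str, Interval],
) -> Set[str]:
    # Pass 1: speed-optimized basecalling of the chunks after the first one,
    # compared against every feature location in the library.
    fast_bases = "".join(fast_basecall(c) for c in chunks[1:])
    candidates = features_hit(align(fast_bases), library)
    if not candidates:
        return set()
    # Pass 2: accuracy-optimized basecalling of the chunks, compared only
    # against the locations of the candidate features from pass 1.
    accurate_bases = "".join(accurate_basecall(c) for c in chunks)
    candidate_locations = {name: library[name] for name in candidates}
    return features_hit(align(accurate_bases), candidate_locations)
```

The slower, accuracy-optimized engine runs only when the fast pass yields at least one candidate, which is the point of splitting the work into two passes.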
[0103] Example 13. The mobile computing device of any one of Examples 9-12, wherein breaking the electrical signal record into the plurality of chunks includes breaking the electrical signal record into at least one hundred chunks.
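One possible way to break a record into a fixed number of chunks, such as the at least one hundred chunks of Example 13, is sketched below. The disclosure does not state that chunks are equally sized or how the count is chosen; the near-equal split and the name `break_into_chunks` are assumptions of this sketch.

```python
from typing import List, Sequence


def break_into_chunks(signal: Sequence[float], n_chunks: int = 100) -> List[Sequence[float]]:
    """Split `signal` into exactly `n_chunks` contiguous, near-equal pieces."""
    if n_chunks <= 0:
        raise ValueError("n_chunks must be positive")
    if len(signal) < n_chunks:
        raise ValueError("signal has fewer samples than requested chunks")
    base, extra = divmod(len(signal), n_chunks)
    chunks: List[Sequence[float]] = []
    start = 0
    for i in range(n_chunks):
        # The first `extra` chunks take one additional sample each.
        size = base + (1 if i < extra else 0)
        chunks.append(signal[start:start + size])
        start += size
    return chunks
```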
[0104] Example 14. The mobile computing device of any one of Examples 9-13, wherein receiving the electrical signal record generated by the flow cell includes: monitoring a directory for new files that include electrical signal records generated by the flow cell; and in response to detecting a new file in the directory, retrieving the new file from the directory to receive the electrical signal record.
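The directory monitoring of Example 14 may, for illustration only, be sketched as a simple polling loop built on the Python standard library. Polling is an assumption of this sketch (the disclosure does not specify the monitoring mechanism), as are the names `new_files` and `watch`, the optional `suffix` filter, and the `handle` callback that receives each retrieved file.

```python
import os
import time
from typing import Callable, List, Optional, Set


def new_files(directory: str, seen: Set[str], suffix: str = "") -> List[str]:
    """Return paths of regular files not seen before, recording them in `seen`."""
    found: List[str] = []
    for entry in sorted(os.listdir(directory)):
        path = os.path.join(directory, entry)
        if entry.endswith(suffix) and os.path.isfile(path) and path not in seen:
            seen.add(path)
            found.append(path)
    return found


def watch(
    directory: str,
    handle: Callable[[str, bytes], None],
    poll_seconds: float = 1.0,
    max_polls: Optional[int] = None,
) -> None:
    """Poll `directory` and pass the contents of each new file to `handle`."""
    seen: Set[str] = set()
    polls = 0
    while max_polls is None or polls < max_polls:
        for path in new_files(directory, seen):
            # Retrieve the new file to receive the electrical signal record.
            with open(path, "rb") as fh:
                handle(path, fh.read())
        polls += 1
        time.sleep(poll_seconds)
```

A practical implementation would also need to avoid reading a file that is still being written; this sketch does not address that case.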
[0105] Example 15. The mobile computing device of any one of Examples 9-14, wherein the one or more predetermined features include at least one of one or more sequence variants or one or more fusion genes.
[0106] Example 16. A non-transitory computer-readable medium having computer-executable instructions stored thereon that, in response to execution by one or more processors of a computing device, cause the computing device to perform actions for processing nanopore signals to detect one or more predetermined features of a feature library in a sample used to generate the nanopore signals, the actions comprising: receiving, by the computing device, an electrical signal record generated by a flow cell; breaking, by the computing device, the electrical signal record into a plurality of chunks; generating, by the computing device, partial base pair information by basecalling electrical signals of a first chunk of the plurality of chunks without basecalling electrical signals of a remainder of the chunks of the plurality of chunks; generating, by the computing device, an alignment for the partial base pair information; in response to determining that the alignment corresponds to at least one of the one or more predetermined features in the feature library, executing, by the computing device, a feature detection pipeline on the electrical signal record; and in response to determining that the alignment does not correspond to at least one of the one or more predetermined features in the feature library, discarding, by the computing device, the electrical signal record.
[0107] Example 17. The non-transitory computer-readable medium of Example 16, wherein executing the feature detection pipeline on the electrical signal record includes: generating, by the computing device, fast base pair information by basecalling electrical signals of the remainder of the chunks of the plurality of chunks using a speed-optimized basecalling engine; executing, by the computing device, a feature detection pipeline on the fast base pair information to determine one or more candidate features of the predetermined features of the feature library; and in response to determining the one or more candidate features: generating, by the computing device, accurate base pair information by basecalling the electrical signals of the chunks of the plurality of chunks using an accuracy-optimized basecalling engine; and executing, by the computing device, the feature detection pipeline on the accurate base pair information and the one or more candidate features to determine the detected one or more predetermined features in the sample.
[0108] Example 18. The non-transitory computer-readable medium of Example 17, wherein executing the feature detection pipeline on the fast base pair information to determine the one or more candidate features of the predetermined features of the feature library includes: generating an alignment for the fast base pair information; and comparing the alignment for the fast base pair information to locations associated with the predetermined features of the feature library.
[0109] Example 19. The non-transitory computer-readable medium of any one of Examples 17-18, wherein executing the feature detection pipeline on the accurate base pair information and the one or more candidate features to determine the detected one or more predetermined features in the sample includes: generating an alignment for the accurate base pair information; and comparing the alignment for the accurate base pair information to locations associated with the one or more candidate features.
[0110] Example 20. The non-transitory computer-readable medium of any one of Examples 16-19, wherein breaking the electrical signal record into the plurality of chunks includes breaking the electrical signal record into at least one hundred chunks.

Claims

CLAIMS The embodiments of the invention in which an exclusive property or privilege is claimed are defined as follows:
1. A computer-implemented method of processing nanopore signals to detect one or more predetermined features of a feature library in a sample used to generate the nanopore signals, the method comprising: receiving, by a computing device, an electrical signal record generated by a flow cell; breaking, by the computing device, the electrical signal record into a plurality of chunks; generating, by the computing device, partial base pair information by basecalling electrical signals of a first chunk of the plurality of chunks without basecalling electrical signals of a remainder of the chunks of the plurality of chunks; generating, by the computing device, an alignment for the partial base pair information; in response to determining that the alignment corresponds to at least one of the one or more predetermined features in the feature library, executing, by the computing device, a feature detection pipeline on the electrical signal record; and in response to determining that the alignment does not correspond to at least one of the one or more predetermined features in the feature library, discarding, by the computing device, the electrical signal record.
2. The computer-implemented method of claim 1, wherein executing the feature detection pipeline on the electrical signal record includes: generating, by the computing device, fast base pair information by basecalling electrical signals of the remainder of the chunks of the plurality of chunks using a speed-optimized basecalling engine; executing, by the computing device, a feature detection pipeline on the fast base pair information to determine one or more candidate features of the predetermined features of the feature library; and in response to determining the one or more candidate features: generating, by the computing device, accurate base pair information by basecalling the electrical signals of the chunks of the plurality of chunks using an accuracy-optimized basecalling engine; and executing, by the computing device, the feature detection pipeline on the accurate base pair information and the one or more candidate features to determine the detected one or more predetermined features in the sample.
3. The computer-implemented method of claim 2, wherein executing the feature detection pipeline on the fast base pair information to determine the one or more candidate features of the predetermined features of the feature library includes: generating an alignment for the fast base pair information; and comparing the alignment for the fast base pair information to locations associated with the predetermined features of the feature library.
4. The computer-implemented method of claim 2, wherein executing the feature detection pipeline on the accurate base pair information and the one or more candidate features to determine the detected one or more predetermined features in the sample includes: generating an alignment for the accurate base pair information; and comparing the alignment for the accurate base pair information to locations associated with the one or more candidate features.
5. The computer-implemented method of claim 1, wherein breaking the electrical signal record into the plurality of chunks includes breaking the electrical signal record into at least one hundred chunks.
6. The computer-implemented method of claim 1, wherein receiving the electrical signal record generated by the flow cell includes: monitoring a directory for new files that include electrical signal records generated by the flow cell; and in response to detecting a new file in the directory, retrieving the new file from the directory to receive the electrical signal record.
7. The computer-implemented method of claim 1, wherein the one or more predetermined features include at least one of one or more sequence variants or one or more fusion genes.
8. The computer-implemented method of claim 1, wherein the computing device is a mobile computing device.
9. A mobile computing device, comprising: at least one processor; a wireless communication interface; a display; and a non-transitory computer-readable medium having computer-executable instructions stored thereon that, in response to execution by the at least one processor, cause the mobile computing device to perform actions for processing nanopore signals to detect one or more predetermined features of a feature library in a sample used to generate the nanopore signals, the actions comprising: receiving, by the mobile computing device via the wireless communication interface, an electrical signal record generated by a flow cell; breaking, by the mobile computing device, the electrical signal record into a plurality of chunks; generating, by the mobile computing device, partial base pair information by basecalling electrical signals of a first chunk of the plurality of chunks without basecalling electrical signals of a remainder of the chunks of the plurality of chunks; generating, by the mobile computing device, an alignment for the partial base pair information; in response to determining that the alignment corresponds to at least one of the one or more predetermined features in the feature library, executing, by the mobile computing device, a feature detection pipeline on the electrical signal record; and in response to determining that the alignment does not correspond to at least one of the one or more predetermined features in the feature library, discarding, by the mobile computing device, the electrical signal record.
10. The mobile computing device of claim 9, wherein executing the feature detection pipeline on the electrical signal record includes: generating, by the mobile computing device, fast base pair information by basecalling electrical signals of the remainder of the chunks of the plurality of chunks using a speed-optimized basecalling engine; executing, by the mobile computing device, a feature detection pipeline on the fast base pair information to determine one or more candidate features of the predetermined features of the feature library; and in response to determining the one or more candidate features: generating, by the mobile computing device, accurate base pair information by basecalling the electrical signals of the chunks of the plurality of chunks using an accuracy-optimized basecalling engine; and executing, by the mobile computing device, the feature detection pipeline on the accurate base pair information and the one or more candidate features to determine the detected one or more predetermined features in the sample.
11. The mobile computing device of claim 10, wherein executing the feature detection pipeline on the fast base pair information to determine the one or more candidate features of the predetermined features of the feature library includes: generating an alignment for the fast base pair information; and comparing the alignment for the fast base pair information to locations associated with the predetermined features of the feature library.
12. The mobile computing device of claim 10, wherein executing the feature detection pipeline on the accurate base pair information and the one or more candidate features to determine the detected one or more predetermined features in the sample includes: generating an alignment for the accurate base pair information; and comparing the alignment for the accurate base pair information to locations associated with the one or more candidate features.
13. The mobile computing device of claim 9, wherein breaking the electrical signal record into the plurality of chunks includes breaking the electrical signal record into at least one hundred chunks.
14. The mobile computing device of claim 9, wherein receiving the electrical signal record generated by the flow cell includes: monitoring a directory for new files that include electrical signal records generated by the flow cell; and in response to detecting a new file in the directory, retrieving the new file from the directory to receive the electrical signal record.
15. The mobile computing device of claim 9, wherein the one or more predetermined features include at least one of one or more sequence variants or one or more fusion genes.
16. A non-transitory computer-readable medium having computer-executable instructions stored thereon that, in response to execution by one or more processors of a computing device, cause the computing device to perform actions for processing nanopore signals to detect one or more predetermined features of a feature library in a sample used to generate the nanopore signals, the actions comprising: receiving, by the computing device, an electrical signal record generated by a flow cell; breaking, by the computing device, the electrical signal record into a plurality of chunks; generating, by the computing device, partial base pair information by basecalling electrical signals of a first chunk of the plurality of chunks without basecalling electrical signals of a remainder of the chunks of the plurality of chunks; generating, by the computing device, an alignment for the partial base pair information; in response to determining that the alignment corresponds to at least one of the one or more predetermined features in the feature library, executing, by the computing device, a feature detection pipeline on the electrical signal record; and in response to determining that the alignment does not correspond to at least one of the one or more predetermined features in the feature library, discarding, by the computing device, the electrical signal record.
17. The non-transitory computer-readable medium of claim 16, wherein executing the feature detection pipeline on the electrical signal record includes: generating, by the computing device, fast base pair information by basecalling electrical signals of the remainder of the chunks of the plurality of chunks using a speed-optimized basecalling engine; executing, by the computing device, a feature detection pipeline on the fast base pair information to determine one or more candidate features of the predetermined features of the feature library; and in response to determining the one or more candidate features: generating, by the computing device, accurate base pair information by basecalling the electrical signals of the chunks of the plurality of chunks using an accuracy-optimized basecalling engine; and executing, by the computing device, the feature detection pipeline on the accurate base pair information and the one or more candidate features to determine the detected one or more predetermined features in the sample.
18. The non-transitory computer-readable medium of claim 17, wherein executing the feature detection pipeline on the fast base pair information to determine the one or more candidate features of the predetermined features of the feature library includes: generating an alignment for the fast base pair information; and comparing the alignment for the fast base pair information to locations associated with the predetermined features of the feature library.
19. The non-transitory computer-readable medium of claim 17, wherein executing the feature detection pipeline on the accurate base pair information and the one or more candidate features to determine the detected one or more predetermined features in the sample includes: generating an alignment for the accurate base pair information; and comparing the alignment for the accurate base pair information to locations associated with the one or more candidate features.
20. The non-transitory computer-readable medium of claim 16, wherein breaking the electrical signal record into the plurality of chunks includes breaking the electrical signal record into at least one hundred chunks.
PCT/US2024/039820 2023-07-27 2024-07-26 Systems and methods for mobile-enabled targeted nanopore sequencing with mobile-enabled real-time fusion detection Pending WO2025024791A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US202363529329P 2023-07-27 2023-07-27
US63/529,329 2023-07-27

Publications (1)

Publication Number Publication Date
WO2025024791A1 true WO2025024791A1 (en) 2025-01-30

Family

ID=94375439

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2024/039820 Pending WO2025024791A1 (en) 2023-07-27 2024-07-26 Systems and methods for mobile-enabled targeted nanopore sequencing with mobile-enabled real-time fusion detection

Country Status (1)

Country Link
WO (1) WO2025024791A1 (en)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20170322137A1 (en) * 2016-05-06 2017-11-09 Deutsches Rheuma-Forschungszentrum Berlin Method and system for characterizing particles using a flow cytometer
US20200176082A1 (en) * 2018-11-28 2020-06-04 Oxford Nanopore Technologies Ltd. Analysis of nanopore signal using a machine-learning technique
CN111206080B (en) * 2020-04-16 2020-08-28 元码基因科技(北京)股份有限公司 Method for detecting fragmented nucleic acid mutation and methylation based on nanopore sequencing
US20200377936A1 (en) * 2012-04-19 2020-12-03 University Of Washington Through Its Center For Commercialization Methods and compositions for generating reference maps for nanopore-based polymer analysis
US20210116436A1 (en) * 2019-05-31 2021-04-22 Illumina, Inc. Obtaining information from a biological sample in a flow cell
US20220170087A1 (en) * 2019-03-18 2022-06-02 Steve Tung Method of DNA Base-Calling from a Nanochannel DNA Sequencer
US20230221296A1 (en) * 2020-04-13 2023-07-13 Nanjing University Nanopore single-molecule protein sequencer

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
MARTIN SAMUEL, HEAVENS DARREN, LAN YUXUAN, HORSFIELD SAMUEL, CLARK MATTHEW D., LEGGETT RICHARD M.: "Nanopore adaptive sampling: a tool for enrichment of low abundance species in metagenomic samples", GENOME BIOLOGY, vol. 23, no. 1, 1 December 2022 (2022-12-01), UK, pages 11 - 11-27, XP093269553, ISSN: 1474-760X, DOI: 10.1186/s13059-021-02582-x *
WEN ET AL.: "A guide to signal processing algorithms for nanopore sensors", ACS SENSORS, vol. 6, no. 10, 2021, pages 3536 - 3555, XP055861844, [retrieved on 20240915], DOI: https://pubs.acs.org/doi/pdf/10.1021/acssensors.1c01618 *

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 24846577

Country of ref document: EP

Kind code of ref document: A1