
WO2025238370A1 - Improved polynucleotide sequencing - Google Patents

Improved polynucleotide sequencing

Info

Publication number
WO2025238370A1
Authority
WO
WIPO (PCT)
Prior art keywords
pair
read sequence
polynucleotide
measurement signal
segments
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
PCT/GB2025/051060
Other languages
English (en)
Inventor
Michael Vella
Katherine Ruth LAWRENCE
Samuel George DAVIS
Filipe John Lopes TOSTEVIN
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Oxford Nanopore Technologies PLC
Original Assignee
Oxford Nanopore Technologies PLC
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Oxford Nanopore Technologies PLC

Classifications

    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B30/00 ICT specially adapted for sequence analysis involving nucleotides or amino acids
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B40/00 ICT specially adapted for biostatistics; ICT specially adapted for bioinformatics-related machine learning or data mining, e.g. knowledge discovery or pattern finding
    • G16B40/20 Supervised data analysis
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q1/00 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
    • C12Q1/68 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
    • C12Q1/6869 Methods for sequencing

Definitions

  • the present disclosure relates to sequencing polynucleotides using machine learning models.
  • the disclosure has particular, but not exclusive, relevance to sequencing DNA or RNA.
  • Gene sequencing, or the determination of the order of nucleotides in plant or animal genes, is an indispensable tool for genomic research, forensic biology, virology, clinical diagnostics, and other technical fields involving genomic analysis.
  • estimation of mutated or modified DNA or RNA sequences can help diagnose diseases such as cancer and guide treatment frameworks as well as drug development. Accurate estimation of the nucleotide sequence in the molecules that make up genes is therefore a valuable goal.
  • Genes are made up of polynucleotides such as DNA that has a double-stranded structure with each strand containing nucleobases coupled to complementary bases on the other strand.
  • Nanopore sensors are commercially available, such as the MinION™ device sold by Oxford Nanopore Technologies Ltd, comprising an array of nanopores integrated with an electronic chip.
  • a nanopore sequencing sample preparation may involve fragmenting double-stranded polynucleotides into shorter fragments.
  • one of the two complementary strands of the polynucleotide is translocated through the nanopore to sequence the polynucleotide.
  • the strands become separated, and once the template strand is translocated, the complementary strand is free to diffuse away from the nanopore.
  • the complementary strand may enter the nanopore immediately following passage of the first strand, but it is also possible that it does not enter the pore at all during the sequencing process.
  • the relevant information on the polynucleotide sequence is encoded in both strands, and this redundancy may be exploited, for example by obtaining respective signals for both complementary strands of the polynucleotide, and analysing them together.
  • the two strands may be joined by a hairpin adapter ligated onto both ends of the strands.
  • a method of estimating a nucleotide sequence of a polynucleotide comprising a pair of complementary polynucleotide strands and apparatus for carrying out the method.
  • the method includes providing a composite polynucleotide strand comprising the pair of complementary polynucleotide strands connected endwise to one another by a bridging molecule, translocating the composite polynucleotide strand in single-stranded form through a nanopore, generating a measurement signal indicative of interactions between the nanopore and the composite polynucleotide strand during translocation, processing the measurement signal using a first machine learning model to determine a read sequence, determining, using the read sequence, a segmentation point separating a pair of read sequence segments corresponding to the pair of complementary strands, performing an alignment procedure to obtain aligned segment data, the alignment procedure being performed on the pair of read sequence segments and/or a corresponding pair of segments of the measurement signal determined using the segmentation point, and processing the aligned segment data using a second machine learning model to estimate the nucleotide sequence of the polynucleotide.
  • Suitable bridging molecules or moieties include, but are not limited to a polymeric linker, a chemical linker, a polynucleotide or a polypeptide.
  • the bridging molecule may comprise nucleotides and may have a predetermined nucleotide sequence, and determining the segmentation point may include aligning the predetermined nucleotide sequence with the read sequence to identify a portion of the read sequence corresponding to the bridging nucleotide.
  • the method may include obtaining, based on outputs of the first machine learning model, a mapping that associates locations within the read sequence to corresponding locations within the measurement signal.
  • the method may further include determining, using the mapping, a point within the measurement signal corresponding to the segmentation point separating the pair of read sequence segments, followed by segmenting the measurement signal at the corresponding point to determine the pair of segments of the measurement signal.
  • the method may finally include performing the alignment procedure on the pair of segments of the measurement signal to obtain the aligned measurement signal segments, wherein the aligned segment data comprises the aligned measurement signal segments.
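As a rough illustration of the mapping step described above, the sketch below splits a measurement signal at the samples corresponding to a segmentation region found in the read sequence. The mapping format (`base_to_sample`, holding the first sample index attributed to each base plus a final end marker) and the name `split_signal` are assumptions made for this example; the disclosure does not specify how the mapping is stored.

```python
from typing import Sequence


def split_signal(
    signal: Sequence[float],
    base_to_sample: Sequence[int],
    bridge_start: int,
    bridge_end: int,
) -> tuple[Sequence[float], Sequence[float]]:
    """Split a signal into the two parts flanking the bridging region.

    base_to_sample[i] is assumed to be the index of the first sample
    attributed to base i of the read sequence, with one extra trailing
    entry equal to len(signal). bridge_start and bridge_end are base
    indices such that read[bridge_start:bridge_end] is the bridging region.
    """
    if not 0 <= bridge_start <= bridge_end < len(base_to_sample):
        raise ValueError("bridge indices fall outside the mapping")
    first = signal[: base_to_sample[bridge_start]]
    second = signal[base_to_sample[bridge_end]:]
    return first, second


# Toy data: 10 bases, 2 samples per base, bridging region at bases 4..5.
toy_signal = [float(i) for i in range(20)]
toy_mapping = list(range(0, 21, 2))
first_part, second_part = split_signal(toy_signal, toy_mapping, 4, 6)
```

The two returned parts could then be passed to whichever alignment procedure is used for measurement signal segments.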
  • a computer-implemented method of estimating a nucleotide sequence of a polynucleotide comprising a pair of complementary polynucleotide strands, a data processing system comprising means for carrying out the computer-implemented method, and a computer program product (such as one or more non-transitory storage media) comprising instructions which, when executed by a computer, cause the computer to carry out the computer-implemented method.
  • the computer-implemented method includes obtaining a measurement signal indicative of interactions between a nanopore and a composite polynucleotide strand comprising the pair of complementary polynucleotide strands connected endwise to one another by a bridging molecule, processing the measurement signal using a first machine learning model to determine a read sequence, determining, using the read sequence, a segmentation point separating a pair of read sequence segments corresponding to the pair of complementary strands, performing an alignment procedure to obtain aligned segment data, the alignment procedure being performed on the pair of read sequence segments and/or a corresponding pair of segments of the measurement signal determined using the segmentation point, and processing the aligned segment data using a second machine learning model to estimate the nucleotide sequence of the polynucleotide.
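The data flow of the computer-implemented method can be sketched as a chain of four stages. This is only a skeleton under stated assumptions: the function name `estimate_sequence` and its four callable parameters are hypothetical placeholders, and the toy stand-ins in the usage lines are not machine learning models.

```python
from typing import Any, Callable, Sequence


def estimate_sequence(
    signal: Sequence[float],
    basecall: Callable[[Sequence[float]], str],
    find_bridge: Callable[[str], tuple[int, int]],
    align: Callable[[str, str], Any],
    refine: Callable[[Any], str],
) -> str:
    """Chain the stages named in the method (all callables are placeholders)."""
    # Stage 1: a first model turns the measurement signal into a read sequence.
    read = basecall(signal)
    # Stage 2: locate the bridging region and split the read around it.
    start, end = find_bridge(read)
    first_segment, second_segment = read[:start], read[end:]
    # Stage 3: align the two segments to obtain aligned segment data.
    aligned = align(first_segment, second_segment)
    # Stage 4: a second model estimates the final nucleotide sequence.
    return refine(aligned)


# Toy stand-ins, for illustration only.
result = estimate_sequence(
    [0.0, 1.0],
    basecall=lambda sig: "AACC" + "NN" + "AACC",
    find_bridge=lambda read: (read.index("NN"), read.index("NN") + 2),
    align=lambda a, b: list(zip(a, b)),
    refine=lambda pairs: "".join(x for x, _ in pairs),
)
```

Passing the stages in as callables keeps the skeleton independent of any particular model or alignment library.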
  • Figure 1 shows a schematic of apparatus for estimating a nucleotide sequence of a polynucleotide.
  • Figure 2 shows a schematic illustrating the estimation of a nucleotide sequence of the polynucleotide.
  • Figure 3 shows a schematic illustrating a method of determining a segmentation point within a read sequence.
  • Figure 4 shows a schematic illustrating a method of alignment involving alignment of read sequence segments and/or measurement signal segments.
  • Figure 5 shows a flow diagram representing a method of estimating a nucleotide sequence of the polynucleotide.
  • Embodiments of the present disclosure relate to estimating a nucleotide sequence of, or sequencing, a double-stranded polynucleotide.
  • the embodiments described herein address challenges related to the analysis and alignment of measurement signals when performing duplex sequencing, in which the complementary strands of the polynucleotide are sequenced individually.
  • the disclosed embodiments make use of a composite polynucleotide strand formed of the pair of complementary strands connected endwise to one another by a bridging polymer whose nucleotide sequence is predetermined or otherwise readily identifiable.
  • an estimated read sequence of the composite polynucleotide strand, obtained during a single translocation of the composite polynucleotide strand with respect to a nanopore, can be segmented by aligning the read sequence with the identifiable sequence of the bridging polymer. Segmentation of the read sequence in this manner can be more precise than the corresponding segmentation of the measurement signal because it ensures that the point of segmentation is unique and centrally located. Moreover, the subsequent alignment procedure is more reliable, algorithmically simpler, and less computationally expensive, leading to enhanced accuracy even compared with existing duplex methods.
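One way to picture segmentation by alignment is a semi-global edit-distance search that finds where a known bridging sequence best fits inside the read. The sketch below is an assumption about how such a search could be done, not the alignment procedure of the disclosure; the function name `locate_bridge` and the toy sequences are hypothetical.

```python
def locate_bridge(read: str, bridge: str) -> tuple[int, int, int]:
    """Find the substring of `read` that best matches `bridge`.

    Returns (start, end, edits) so that read[start:end] is the best match
    and `edits` is its edit distance to `bridge`. The match may begin and
    end anywhere in the read (semi-global alignment).
    """
    n, m = len(read), len(bridge)
    # Row 0: an empty bridge prefix matches at any read position for free.
    prev_cost = [0] * (n + 1)
    prev_start = list(range(n + 1))
    for i in range(1, m + 1):
        cur_cost = [i] + [0] * n   # bridge[:i] against nothing: i edits
        cur_start = [0] * (n + 1)
        for j in range(1, n + 1):
            sub = prev_cost[j - 1] + (bridge[i - 1] != read[j - 1])
            skip_bridge = prev_cost[j] + 1    # bridge base absent from read
            extra_read = cur_cost[j - 1] + 1  # extra base present in read
            best = min(sub, skip_bridge, extra_read)
            cur_cost[j] = best
            # Carry forward where the chosen alignment path started.
            if best == sub:
                cur_start[j] = prev_start[j - 1]
            elif best == skip_bridge:
                cur_start[j] = prev_start[j]
            else:
                cur_start[j] = cur_start[j - 1]
        prev_cost, prev_start = cur_cost, cur_start
    end = min(range(n + 1), key=lambda j: prev_cost[j])
    return prev_start[end], end, prev_cost[end]


# Toy example: a made-up bridge sequence between two made-up segments.
toy_bridge = "TTTTGGGGTTTT"
toy_read = "ACGTAC" + toy_bridge + "GTACGT"
start, end, edits = locate_bridge(toy_read, toy_bridge)
first_segment, second_segment = toy_read[:start], toy_read[end:]
```

Because the search tolerates mismatches, insertions and deletions, the bridging region can still be located when the read sequence contains basecalling errors.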
  • Fig. 1 shows a schematic of a system 100 for estimating a nucleotide sequence of a polynucleotide.
  • the system 100 comprises a nanopore 102 situated in a membrane 104.
  • the nanopore 102 is also referred to in the technical field as a transmembrane nanopore.
  • the nanopore 102 may, for example, be a protein pore such as a polypeptide or a collection of polypeptides.
  • the nanopore may be composed of any other such molecules that allow the nanopore 102 to function as an aperture in the membrane 104.
  • the membrane 104 may be flanked by ionic solutions and the membrane material may preferably have a high resistance or resistivity so that the material volume is inconducive to the flow of ions.
  • the membrane 104 may be constructed out of a lipid bilayer.
  • the nanopore 102 may be constructed to allow a polynucleotide 106 to be moved through the pore.
  • the process of moving the polynucleotide 106 through the nanopore 102 is known in the technical field as translocation.
  • the polynucleotide may, for example, be DNA or RNA or any other polymer containing nucleotide units, in which the template (sense) and complement (antisense) strands are joined by a bridging molecule, such as a bridging polymer 108.
  • the template and complement strands may be joined at one end by a hairpin adapter to form a hairpin loop structure as shown in the inset in Fig. 1.
  • Such a bridging polymer 108 may consist of nucleotides, for example 99 nucleotides, as well as optionally spacer units, for example 24 propylene phosphate spacer units.
  • the polynucleotide 106 with hairpin-joined strands may then be separated to form a single strand and passed through the nanopore 102.
  • the nanopore 102 may provide for controlled translocation by selectively permitting ions, such as hydrated ions or analytes, to flow across the membrane 104 under an applied potential difference.
  • the nanopore 102 may provide for translocation under the control of enzymatic activity, such as under the influence of a polynucleotide binding protein.
  • a protein such as helicase may be provided in the nanopore 102 to provide for movement of the polynucleotide 106 through the nanopore 102 in a stepwise fashion.
  • a combination of applied potential difference and enzymatic activity may be used to carry out controlled translocation of the polynucleotide 106.
  • the system 100 may contain electrical circuitry 110 to support performance of some of the functions described thus far.
  • the electrical circuitry 110 may contain electrodes placed on either side of the membrane 104 for establishing and/or maintaining a potential difference across the nanopore 102 during translocation.
  • the system 100 may also contain power supply components for powering various other system components as described below.
  • at least some of the components of the electrical circuitry 110 may be provided by devices external to the system 100.
  • the apparatus 100 allows a sequence of nucleotide units in the polynucleotide 106 to be estimated based on the following principle.
  • as the polynucleotide 106 is translocated through the nanopore 102, its nucleotide units interact with the nanopore 102, thereby altering one or more measurable properties of the nanopore 102 or its surroundings.
  • the nucleotide units in the polynucleotide 106 may be the nitrogen-containing nucleobases: adenine (A), guanine (G), cytosine (C) and thymine (T) ordinarily found in DNA, joined to each other by covalent bonds.
  • each of these bases A, G, C or T may correspondingly alter the one or more measurable properties to varying extents.
  • a signal containing measurements of one of those properties will bear the signature of the sequence of nucleotides translocated through the nanopore 102.
  • one or more nucleotides may interact with the nanopore 102 and therefore the signal may be indicative of interactions between groups of consecutive nucleotides in the polynucleotide 106 during translocation. Nevertheless, the signal may be probed to identify and/or estimate the identities of the individual nucleotides in the polynucleotide 106.
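Since the signal may reflect groups of consecutive nucleotides rather than single bases, it can help to picture the sequence as a series of overlapping windows. The sketch below simply lists those windows; the window length `k` and the function name `kmers` are assumptions for illustration, and the source does not state how many nucleotides interact with the pore at once.

```python
def kmers(sequence: str, k: int) -> list[str]:
    """Return every overlapping window of k consecutive nucleotides."""
    if k <= 0:
        raise ValueError("k must be positive")
    return [sequence[i:i + k] for i in range(len(sequence) - k + 1)]


# Each window shares k - 1 nucleotides with its neighbour, so successive
# signal levels are not independent of one another.
windows = kmers("ACGTA", 3)
```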
  • the nucleotide units may also include nucleobases found in RNA, such as uracil (U).
  • the measurable property may be a measurement of ion flow through the nanopore such as a current measurement.
  • Suitable conditions for measuring ionic currents through transmembrane protein pores are known in the art.
  • the method may be carried out with a voltage applied across the membrane and pore.
  • the voltage used may vary from +2 V to -2 V, or from -400 mV to +400 mV. It is possible to increase discrimination between different nucleotides by a pore by using an increased applied potential.
  • the methods may be carried out in the presence of any charge carriers, such as metal salts, for example alkali metal salt, halide salts, for example chloride salts, such as alkali metal chloride salt.
  • Charge carriers may include ionic liquids or organic salts, for example tetramethyl ammonium chloride, trimethylphenyl ammonium chloride, phenyltrimethyl ammonium chloride, or 1-ethyl-3-methyl imidazolium chloride, potassium chloride, sodium chloride, caesium chloride or rubidium chloride.
  • the salt concentration may be 3 M or lower and is typically from 0.1 to 2.5 M. Hel308, XPD, RecD and TraI helicases surprisingly work under high salt concentrations which advantageously provide a high signal to noise ratio.
  • the methods may be carried out in the presence of a buffer.
  • Any suitable buffer may be used in the method of the invention such as HEPES or Tris-HCl buffer.
  • the pH may vary from 4.0 to 12.0 and is preferably about 7.5.
  • the methods may be carried out at a temperature that supports enzyme function.
  • the membrane may be supported on a support structure separating cis and trans compartments comprising ionic solution.
  • An array of membranes each comprising a nanopore may be provided on a support structure comprising a common cis chamber and multiple trans chambers such as disclosed by WO2014064443.
  • the potential difference may be applied across the nanopore between electrodes provided in the cis and trans compartments.
  • Suitable electrode materials include Pt, Pd and Ag.
  • the electrodes may be reference electrodes such as Ag/AgCl.
  • the ionic solution may comprise a redox couple such as potassium ferri/ferrocyanide.
  • Possible electrical measurements also include: nanopore tunnelling measurements, FET measurements and optical measurements combined with electrical measurements such as disclosed by WO 2005/124888, WO2020183172, Soni GV et al., Rev Sci Instrum. 2010 Jan;81(1):014301, Ivanov AP et al., Nano Lett. 2011 Jan 12;11(1):279-85 and WO2016/009180.
  • Polynucleotide as used herein refers to a polymeric form of nucleotides of any length, either ribonucleotides or deoxyribonucleotides. This term refers only to the primary structure of the molecule. Thus, this term includes double- and single-stranded DNA, and RNA.
  • the polynucleotide is a single-stranded DNA-RNA hybrid. DNA-RNA hybrids can be prepared by ligating single-stranded DNA to RNA or vice versa.
  • the polynucleotide is most typically single-stranded deoxyribonucleic acid (DNA) or single-stranded ribonucleic acid (RNA).
  • the polynucleotide is double-stranded DNA. In some embodiments the polynucleotide is double-stranded RNA. In some embodiments the polynucleotide is a double-stranded DNA-RNA hybrid. Double-stranded DNA-RNA hybrids can be prepared from single-stranded RNA by reverse transcribing the cDNA complement.
  • the polynucleotide can be of any length. For example, the polynucleotide can be at least 10, at least 50, at least 100, at least 150, at least 200, at least 250, at least 300, at least 400 or at least 500 nucleotides or nucleotide pairs in length.
  • the polynucleotide can be 1000 or more nucleotides or nucleotide pairs, 5000 or more nucleotides or nucleotide pairs in length or 100000 or more nucleotides or nucleotide pairs in length.
  • the polynucleotide may be made up of deoxyribonucleotide bases or ribonucleotide bases. Nucleic acids may be manufactured synthetically in vitro or isolated from natural sources.
  • Nucleic acids may further include modified DNA or RNA, for example DNA or RNA that has been methylated, or RNA that has been subject to post-transcriptional modification, for example 5′-capping with 7-methylguanosine, 3′-processing such as cleavage and polyadenylation, and splicing.
  • Nucleic acids may also include synthetic nucleic acids (XNA), such as hexitol nucleic acid (HNA), cyclohexene nucleic acid (CeNA), threose nucleic acid (TNA), glycerol nucleic acid (GNA), locked nucleic acid (LNA) and peptide nucleic acid (PNA).
  • the sizes of nucleic acids, also referred to herein as “polynucleotides”, are typically expressed as the number of base pairs (bp) for double-stranded polynucleotides, or in the case of single-stranded polynucleotides as the number of nucleotides (nt). One thousand bp or nt equal a kilobase (kb). Polynucleotides of less than around 40 nucleotides in length are typically called “oligonucleotides” and may comprise primers for use in manipulation of DNA such as via polymerase chain reaction (PCR).
  • Nucleotides can have any identity, and include, but are not limited to, adenosine monophosphate (AMP), guanosine monophosphate (GMP), thymidine monophosphate (TMP), uridine monophosphate (UMP), 5-methylcytidine monophosphate, 5-hydroxymethylcytidine monophosphate, cytidine monophosphate (CMP), cyclic adenosine monophosphate (cAMP), cyclic guanosine monophosphate (cGMP), deoxyadenosine monophosphate (dAMP), deoxyguanosine monophosphate (dGMP), deoxythymidine monophosphate (dTMP), deoxyuridine monophosphate (dUMP), deoxycytidine monophosphate (dCMP) and deoxymethylcytidine monophosphate.
  • the nucleotides are preferably selected from AMP, TMP, GMP, CMP, UMP, dAMP, dTMP, dGMP, dCMP and dUMP.
  • a nucleotide may be abasic (i.e. lack a nucleobase).
  • a nucleotide may also lack a nucleobase and a sugar (i.e. is a C3 spacer).
  • the polynucleotide may be a concatemer comprising multiple sets of complementary strands each linked by a bridging molecule or moiety. The concatemer may be provided by methods such as disclosed by US9910956.
  • any transmembrane pore may be used in the methods provided herein.
  • the pore may be biological or artificial.
  • Suitable nanopores include, but are not limited to, protein pores, polynucleotide pores, and pores formed in solid state substrates such as silicon nitride.
  • a solid-state pore may comprise a nanochannel.
  • the pore may be a DNA origami pore such as disclosed in WO2013/083983.
  • the protein pore may be a monomer or an oligomer.
  • the pore may be made up of several repeating subunits, such as at least 6, at least 7, at least 8, at least 9, at least 10, at least 11, at least 12, at least 13, at least 14, at least 15, or at least 16 subunits.
  • the pore may be a hexameric, heptameric, octameric or nonameric pore.
  • the pore may be a homo-oligomer or a hetero-oligomer.
  • the transmembrane protein pore may be modified from the wild type.
  • the transmembrane protein pore may be derived from β-barrel pores or α-helix bundle pores such as α-hemolysin, anthrax toxin and leukocidins, and outer membrane proteins/porins of bacteria, such as Mycobacterium smegmatis porin (Msp), for example MspA, MspB, MspC, MspD, CsgG, outer membrane porin F (OmpF), outer membrane porin G (OmpG), outer membrane phospholipase A and Neisseria autotransporter lipoprotein (NalP) and other pores, such as lysenin, inner membrane proteins and α outer membrane proteins, such as WZA and ClyA toxin.
  • the nanopore may be supported in a membrane having cis and trans openings on opposite sides of the membrane.
  • the membrane may preferably be an amphiphilic layer.
  • An amphiphilic layer is a layer formed from amphiphilic molecules, such as phospholipids, which have both hydrophilic and lipophilic properties.
  • the amphiphilic molecules may be synthetic or naturally occurring. Non-naturally occurring amphiphiles and amphiphiles which form a monolayer are known in the art and include, for example, block copolymers such as disclosed in WO2014/064444.
  • the block copolymer may be a diblock (consisting of two monomer sub-units), but may also be constructed from more than two monomer sub-units to form more complex arrangements that behave as amphiphiles.
  • the copolymer may be a triblock, tetrablock or pentablock copolymer.
  • the membrane is preferably a triblock copolymer membrane.
  • the amphiphilic molecules may be chemically-modified or functionalised to facilitate coupling of the polynucleotide.
  • the amphiphilic layer may be a monolayer or a bilayer.
  • the amphiphilic layer may be planar.
  • the amphiphilic layer may be curved.
  • the membrane may be a lipid bilayer.
  • the lipids may comprise a head group, an interfacial moiety and two hydrophobic tail groups which may be the same or different. Suitable lipid bilayers are disclosed in WO 2008/102121, WO 2009/077734 and WO 2006/100484.
  • the helicase may be any of the helicases, modified helicases or helicase constructs disclosed in WO2013/057495, WO 2013/098562, WO2013098561, WO 2014/013259; WO 2014/013262 and WO 2014013260.
  • the helicase may be added to the polynucleotide during sample preparation and stalled by one or more spacers as disclosed in WO2014135838.
  • the polynucleotide may comprise a polymer leader sequence which preferentially threads into the pore.
  • the leader is preferably negatively charged and may be a polynucleotide, such as DNA or RNA, a modified polynucleotide (such as abasic DNA), PNA, LNA, polyethylene glycol (PEG) or a polypeptide.
  • the leader sequence may form part of a Y adapter that may comprise (a) a double-stranded region and (b) a single-stranded region or a region that is not complementary at the other end. Leader sequences and Y adapters suitable for use are disclosed for example in WO2017149316.
  • the rate of translocation of polynucleotide through the pore can be enhanced by coupling it to the membrane.
  • Suitable coupling moieties are disclosed for example in WO2017149316 and WO12164270.
  • the bridging molecule may be a polymer such as DNA, RNA, PNA or LNA and comprise one or more modified nucleotides, for example an abasic or 5mC.
  • the bridging polymer may have a stem-loop hairpin structure. Suitable hairpins can be designed using methods known in the art. In some embodiments a hairpin loop is typically 4 to 100 nucleotides in length, e.g. from 4 to 50 such as from 4 to 20 e.g. from 4 to 8 nucleotides in length. In an embodiment the bridging polymer consists of nucleotides.
  • the bridging polymer may also comprise one or more spacers such as 5SpC3 and iSpC3, commercially available from Integrated DNA Technologies, which provide a characteristic measurement signal.
  • spacers suitable for use in the invention are disclosed in WO2017149316.
  • a current flowing through the nanopore 102 may be measured during translocation of the polynucleotide 106. Similarly, a current flowing in a transverse direction of the nanopore 102 or in a surrounding region of the nanopore 102 may be measured during translocation.
  • the current may either decrease or increase in response to the passage of each nucleotide, and may do so to an extent that corresponds with a particular nucleotide or multiple nucleotides being translocated through the nanopore 102.
  • variations of current may be on the order of picoamperes, nanoamperes, or microamperes.
  • the ionic current may be measured by a sensor 112, and the measurement signal may optionally be amplified using an amplifier 114, such as a low-noise amplifier system.
  • the measurable property may be a voltage difference or a change in resistivity or conductance or other ionic property of the nanopore 102 or its surrounding medium.
  • the measurement signal may be digitised to obtain a digital measurement signal to be provided to a data processing system 116 for analysis.
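Digitisation can be pictured as mapping each analogue current value onto one of a fixed number of integer levels. The sketch below is a generic uniform quantiser; the full-scale range of ±500 pA, the 12-bit depth and the name `digitise` are illustrative assumptions, not values taken from the disclosure or from any particular device.

```python
from typing import Iterable


def digitise(
    current_pa: Iterable[float],
    full_scale_pa: float = 500.0,  # assumed range: -500 pA to +500 pA
    bits: int = 12,                # assumed converter resolution
) -> list[int]:
    """Quantise current values (in pA) to integer codes 0 .. 2**bits - 1."""
    levels = 2 ** bits
    step = 2 * full_scale_pa / levels
    codes = []
    for value in current_pa:
        code = int((value + full_scale_pa) // step)
        # Clip values that fall outside the assumed input range.
        codes.append(min(max(code, 0), levels - 1))
    return codes


codes = digitise([-500.0, 0.0, 500.0])
```

Values outside the assumed range are clipped to the lowest or highest code rather than wrapping around.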
  • the data processing system 116 may contain hardware and software components capable of estimating the sequence of nucleotides based on the measurement signal.
  • the data processing system 116 may be a desktop, a laptop, a tablet, or a server, or any combination of the above.
  • the data processing system may be a dedicated device for sequencing polymers, such as those produced by Oxford Nanopore Technologies (RTM).
  • the data processing system 116 may include a central processing unit (CPU) having one or more processors including integrated circuit microprocessors that may optionally be multithreaded, such as Intel (RTM) Xeon or Intel Core i3/i5/i7 series processors.
  • the data processing system 116 may also optionally include one or more graphic processing units (GPUs) such as Nvidia (RTM) A100 or H100 GPUs or AMD GPUs.
  • the data processing system 116 may contain software that may include source code, object code, firmware, etc. in any suitable language.
  • the source code may be written in Python, C, C++, Rust, Julia, etc and may use specific development frameworks, libraries, or packages, including PyTorch, TensorFlow, Keras, CUDA, etc.
  • the examples provided herein for the hardware and software components do not constitute an exhaustive list; many more similar components may be alternatively or additionally included.
  • the data processing system 116 may also be configured to provide instructions to the electrical circuitry 110, for example to control the translocation of the polynucleotide 106 with respect to the nanopore.
  • Fig. 2 shows a schematic of a method of estimating a nucleotide sequence of a double-stranded polynucleotide according to the present disclosure.
  • the method involves providing a composite polynucleotide strand 218, in which the individual strands of the double-stranded polynucleotide are connected endwise to one another.
  • the individual strands may be connected to one another using a bridging molecule.
  • construction of a composite polynucleotide strand 218 using a bridging molecule may include connecting the bridging molecule to one end of each strand of the double-stranded polynucleotide.
  • the ends of the bridging molecule may be ligated using covalent bonds with an end of each of the complementary strands of the double-stranded polynucleotide.
  • two ends of the bridging molecule may be connected to proximal ends of the complementary strands, i.e. to the ends of the complementary strands at one end of the double-stranded polynucleotide.
  • the pair of complementary strands of the double-stranded polynucleotide, ordinarily coupled by hydrogen bonds, may be decoupled by introducing a suitable molecule such as a polynucleotide binding protein.
  • the outcome of these two stages of construction is a single composite polynucleotide strand 218 formed of the two individual strands of the original double-stranded polynucleotide flanking the bridging molecule.
  • the initial stage during construction may alternatively involve the ends of the bridging molecule being connected to distal ends of the complementary strands, i.e. to ends of the complementary strands at different ends of the polynucleotide. Regardless of which ends of the complementary strands are connected by the bridging polymer, the composite polynucleotide strand 218 is formed of the two individual strands flanking the bridging molecule.
  • Fig. 2 shows the template strand TSTRAND and the complement strand CSTRAND flanking a bridging molecule.
  • the bridging molecule may, for example, be a polymer containing a chain of monomer units linked by covalent bonds labelled BPOLY in Fig. 2.
  • Such a bridging polymer may contain nucleotides that may be identifiable by the sequencing analysis.
  • the bridging polymer may be a polynucleotide containing a sequence of nucleotides, such as DNA or RNA.
  • the bridging polymer may also contain non-nucleotides in other examples.
  • the bridging polymer may be a hairpin typically containing four or more nucleotide units.
  • a hairpin molecule may be connected to proximal ends of the complementary strands, forming a hairpin stem-loop structure, prior to separating the complementary strands of the polynucleotide. Regardless of the bridging molecule used, the resulting composite polynucleotide strand 218 is translocated through the nanopore in order for a nucleotide sequence of the polynucleotide to be estimated.
  • Frame 220 of Fig. 2 shows the composite polynucleotide strand 218 being translocated through the nanopore 202.
  • With a composite polynucleotide strand 218 containing both complementary strands connected by a bridging polymer, both strands of the polynucleotide can be passed through the nanopore 202 in a single translocation.
  • Fig. 2 shows a measurement signal 222 generated during the translocation of the composite polynucleotide strand 218 with respect to the nanopore.
  • the measurement signal 222 is indicative of interactions between the nanopore 202 and the composite polynucleotide strand 218 during translocation.
  • the number of datapoints in the measurement signal 222 may depend on the resolution of the sensor and length of the composite polynucleotide strand 218.
  • the measurement signal 222 may for example contain measurements of ionic current across the nanopore 202 during translocation of the composite polynucleotide strand 218 with respect to the nanopore 202. Consecutive portions of the measurement signal 222 may correspond to nucleotides in consecutive regions of the composite polynucleotide strand 218.
  • the measurement signal 222 may exhibit a degree of symmetry. Measurements indicative of interactions between a group of nucleotides in the template strand may be expected to be complementary with measurements indicative of interactions between the associated group in the complement strand. In examples where the complementary strands are connected from distal ends to the bridging polymer, the order of measurements may be the same in the template and complement strand portions of the measurement signal 222.
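  • Where the complementary strands are instead connected to the bridging polymer at their proximal ends, the order of measurements may be reversed between the template and complement portions of the signal.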
  • the measurement signal 222 may therefore be expected to first contain measurements indicative of interactions with the leading portion of the complement strand CSTRAND, then contain measurements indicative of interactions with the bridging molecule BPOLY, and finally contain measurements indicative of interactions with the tail portion of the template strand TSTRAND.
  • Fig. 2 shows the measurement signal 222 containing respective portions CSIGNAL followed by BSIGNAL and then TSIGNAL. Regardless of the points of connection of the bridging molecule with the complementary strands, the symmetry in the measurement signal is not normally perfect.
  • Measurements in portions of the signal 222 corresponding to the template strand will not normally have an exact match with the measurements in portions of the signal 222 corresponding to the complement strand. This is partly because of measurement noise or other statistical differences associated with measurements of this nature. Furthermore, differences may arise due to the differences in sensitivities of the sensor to various nucleotides or nucleotide groups. For example, the sensor may be more sensitive to adenine as compared with the corresponding thymine nucleobase. As a result, measurements from a portion of the template signal may be more accurate than measurements from a portion of the complement signal. A further source of differences between the signals may be the varying sensitivities of the sensor to measurements of a leading strand and a trailing strand.
  • having translocated a portion of the complement strand may result in altered sensitivities of the sensor towards the nucleotides in the template strand.
  • translocating both template and complement strand as part of the composite polynucleotide strand 218 allows the differences, including concurrent errors in the respective sequence estimates, to be accounted for, resulting in a more accurate overall estimate of the sequence.
  • Fig. 2 shows the measurement signal 222 being provided to a first machine learning (ML) model 224, which processes the measurement signal 222 to determine a read sequence 226.
  • the ML model 224 may be trained to estimate nucleotide sequences using a training dataset containing signals and corresponding nucleotide sequences.
  • the ML model 224 may comprise a recurrent neural network (RNN) such as a long short-term memory (LSTM) network, or a transformer, and may have been trained to predict nucleotide sequences from input signals.
  • the ML model 224 may comprise statistical models that do not rely on neural networks, such as Hidden Markov Models (HMMs).
  • the trained ML model 224 may be associated with a number of trainable parameters such as weights, biases, and other values, whose values may have been determined from training and/or fine-tuning procedures, for example supervised training using measurement signals labelled with corresponding target sequences.
  • the ML model 224 may therefore identify a read sequence 226 of nucleotides associated with the measurement signal 222 corresponding to the composite polynucleotide strand 218. It is noted that the read sequence 226 may be generated by operating on partitions of the measurement signal 222 in batches for efficient management of memory and computing resources. In such cases, the individual results of operating on partitions may be collated at a later stage to obtain the read sequence 226 corresponding to the composite polynucleotide strand 218.
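The batched operation described above can be illustrated with a minimal Python sketch. The function names `partition_signal` and `collate`, the chunk length and the overlap parameter are illustrative assumptions and are not taken from the disclosure. The collation shown simply concatenates per-partition results and assumes non-overlapping partitions; stitching overlapping predictions would need additional logic.

```python
from typing import List, Sequence, Tuple


def partition_signal(
    signal: Sequence[float], chunk_len: int, overlap: int = 0
) -> List[Tuple[int, Sequence[float]]]:
    """Split a measurement signal into (start_index, chunk) partitions."""
    if chunk_len <= 0 or not 0 <= overlap < chunk_len:
        raise ValueError("need chunk_len > 0 and 0 <= overlap < chunk_len")
    step = chunk_len - overlap
    chunks = []
    start = 0
    while start < len(signal):
        chunks.append((start, signal[start:start + chunk_len]))
        if start + chunk_len >= len(signal):
            break  # the final partition already reaches the end of the signal
        start += step
    return chunks


def collate(partial_reads: List[Tuple[int, str]]) -> str:
    """Join per-partition base calls in signal order (non-overlapping chunks only)."""
    return "".join(seq for _, seq in sorted(partial_reads))
```

Each partition keeps its start index so that results computed out of order can still be collated in signal order.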
  • In Fig. 2, the read sequence 226 is shown containing an illustrative basecalled sequence of nucleotides in regions corresponding to the complement, bridging polymer and template portions of the signal.
  • CSEQUENCE may represent an estimated nucleotide sequence corresponding to the CSIGNAL portion of the measurement signal 222.
  • TSEQUENCE may represent an estimated nucleotide sequence corresponding to the TSIGNAL portion of the measurement signal 222.
  • BSEQUENCE may represent an estimated nucleotide sequence corresponding to the BSIGNAL portion of the measurement signal 222.
  • the read sequence 226 may be seen as a representation of the composite polynucleotide strand 218, or equivalently of the measurement signal 222, in a “base space”, having the dimensions of the number of nucleotides in the composite polynucleotide strand 218. The methods discussed below operate on this base space representation to allow more accurate alignment of the template and complement sequences.
  • the method proceeds to determine in frame 228, using the read sequence 226, a segmentation point 230 separating the pair of read sequence segments TSEQUENCE and CSEQUENCE corresponding to the pair of complementary strands of the double-stranded polynucleotide.
  • the method may involve determining a single segmentation point 230, whereas in other examples the method may involve determining two or more segmentation points separating the pair of read sequence segments. Regardless of the number of segmentation points, the location(s) of the segmentation point(s) can be used to identify respective segments of the read sequence 226 corresponding to the template sequence and the complement sequence.
  • the methods described may therefore include determining the template sequence segment TSEQUENCE and the complement sequence segment CSEQUENCE using the base space representation of the measurement signal 222. Determining the template and complement sequences using the base space representation allows the segmentation point to be identified with higher precision in comparison to alternative approaches, for example those involving identification of the segmentation point in signal space.
  • the method proceeds to perform an alignment procedure 232 to obtain aligned segment data.
  • the alignment procedure 232 may be performed on the pair of read sequence segments TSEQUENCE and CSEQUENCE identified as a result of the determination of the segmentation point or points.
  • the alignment procedure may be performed on segments of the measurement signal 222. Such segments of the measurement signal 222 may correspond to the template signal TSIGNAL and the complement signal CSIGNAL and be determined using the segmentation point 230 as described hereinafter.
  • the alignment procedure 232 may be performed on a combination of the read sequence segments TSEQUENCE and CSEQUENCE and measurement signal segments TSIGNAL and CSIGNAL.
  • the aligned segment data contains information corresponding to both the template and complement strands of the double-stranded polynucleotide.
  • the method includes providing the aligned segment data produced by the alignment procedure to a second machine learning (ML) model 234.
  • the second ML model 234 may include components that are configured to process the aligned segment data in order to estimate the nucleotide sequence for the polynucleotide.
  • the second ML model 234 may include similar components as the first ML model.
  • the second ML model 234 may comprise a recurrent neural network (RNN) such as a long short-term memory (LSTM) network, or a transformer, and may have been trained to predict nucleotide sequences from input sequence data.
  • the second machine learning model 234 is able to estimate nucleotide sequences for the polynucleotide with an improved accuracy compared with sequencing based on any one of the complementary strands as described further below.
  • Fig. 3 shows a schematic illustrating a method 300 of obtaining aligned segment data using the segmentation point 330 within the read sequence 326.
  • the method 300 involves using a bridging polymer BPOLY containing a predetermined nucleotide sequence BSEQUENCE.
  • the bridging polymer BPOLY may contain a sequence with a predetermined number of repeating units of a nucleotide such as thymine, as depicted by “TTTTTTT” in Fig. 3. In other examples, the bridging polymer BPOLY may comprise a variety of nucleotides that form a predetermined nucleotide sequence BSEQUENCE.
  • Fig. 3 shows a copy of the predetermined sequence BSEQUENCE being aligned with the read sequence 326.
  • the segmentation point 330 can be determined as, for example, a location in the centre of the bridging polymer sequence BSEQUENCE. In other examples, two segmentation points can be determined, each corresponding to an end of the bridging polymer sequence BSEQUENCE in the read sequence 326. In still other examples, alternative segmentation points may be determined. In any case, the segmentation point 330 may correspond to a locus of the bridging polymer sequence BSEQUENCE in the read sequence 326. Therefore, the segmentation point 330 may even be a region corresponding to the locations of monomers in the bridging polymer sequence BSEQUENCE within the read sequence 326.
  • the alignment of the predetermined bridging sequence BSEQUENCE with the read sequence 326 allows segments TSEQUENCE and CSEQUENCE, respectively corresponding to the template and complement strands of the polynucleotide, to be readily identified.
  • identification of the template and complement sequence segments TSEQUENCE and CSEQUENCE can be done entirely using the base space representation of the measurement signal 222, which greatly simplifies the process and improves the accuracy of identifying the segments as compared with, for example, first identifying template and complement signals in the measurement signal 222 and deriving the sequences from them.
  • the approach described also improves over other approaches, for example, relying on separate translocation of complementary strands which necessitates a complex procedure for aligning the individual template and complement strands.
  • a single read sequence 326 is obtained in which the template and complement strands are separated by a predetermined sequence of nucleotides in base space.
  • the predetermined sequence BSEQUENCE of nucleotides can be aligned with the estimated read sequence 326 in a relatively straightforward computational step.
  • the alignment may involve applying a cross-correlation on the estimated read sequence 326 and the predetermined sequence BSEQUENCE to identify the location of the predetermined sequence BSEQUENCE in the read sequence 326.
  • dynamic programming techniques may be employed on respective computational data structures containing the read sequence 326 and the predetermined sequence BSEQUENCE.
  • Other alignment algorithms may also be adopted; for example, the Needleman-Wunsch algorithm or the Smith-Waterman algorithm.
  • the locations corresponding to the template and complement read sequence segments TSEQUENCE and CSEQUENCE can be identified more efficiently and accurately compared with an alternative approach involving alignment of a signal corresponding to the bridging polymer BPOLY and the measurement signal 222.
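As an illustration of locating the predetermined sequence in base space, the following Python sketch uses a semi-global edit-distance dynamic programme, in which the match may start anywhere in the read. This is only one of the alignment options mentioned above and is not presented as the method actually used; the function name `locate_bridge` and the unit edit costs are assumptions made for the example.

```python
from typing import Tuple


def locate_bridge(read: str, bridge: str) -> Tuple[int, int, int]:
    """Return (start, end, edits) for the best approximate match of `bridge`
    in `read`; `end` is exclusive and `edits` is the edit distance."""
    m, n = len(bridge), len(read)
    # dist[i][j]: fewest edits to match bridge[:i] to a substring of read ending at j
    dist = [[0] * (n + 1) for _ in range(m + 1)]
    # origin[i][j]: read index at which that best-matching substring starts
    origin = [[0] * (n + 1) for _ in range(m + 1)]
    for j in range(n + 1):
        origin[0][j] = j  # an empty bridge prefix may start at any read position
    for i in range(1, m + 1):
        dist[i][0] = i
        for j in range(1, n + 1):
            sub = dist[i - 1][j - 1] + (bridge[i - 1] != read[j - 1])
            missing = dist[i - 1][j] + 1  # bridge base absent from the read
            extra = dist[i][j - 1] + 1    # spurious base present in the read
            best = min(sub, missing, extra)
            dist[i][j] = best
            if best == sub:
                origin[i][j] = origin[i - 1][j - 1]
            elif best == missing:
                origin[i][j] = origin[i - 1][j]
            else:
                origin[i][j] = origin[i][j - 1]
    end = min(range(n + 1), key=lambda j: dist[m][j])
    return origin[m][end], end, dist[m][end]


start, end, edits = locate_bridge("ACGTAC" + "TTTTTTT" + "GTACGT", "TTTTTTT")
segmentation_point = (start + end) // 2  # centre of the located bridging sequence
```

A segmentation point can then be taken at the centre of the located span, or two points at `start` and `end`, matching the options described for Fig. 3.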
  • the read sequence 326 may be segmented to obtain segmented data 336 containing a template read sequence segment 338 and a complement read sequence segment 340.
  • Although Fig. 3 shows the read sequence 326 being split at a single location associated with the centre of the predetermined sequence BSEQUENCE, the segmentation can alternatively be performed at the locations corresponding to the ends of the predetermined sequence BSEQUENCE.
  • two read sequence segments may be obtained as shown in Fig. 3, each containing a portion of the predetermined sequence corresponding to the bridging polymer BPOLY.
  • a third read sequence segment corresponding to the predetermined sequence BSEQUENCE may additionally be obtained.
  • the read sequence segments 338 and 340 corresponding to the template and complement strands TSTRAND and CSTRAND may be obtained.
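The two segmentation options described above can be sketched in Python as follows. The function name `split_read`, the `mode` argument and the assumption that the location of the bridging sequence in the read is already known as a half-open span are all illustrative.

```python
from typing import Tuple


def split_read(read: str, bridge_start: int, bridge_end: int,
               mode: str = "centre") -> Tuple[str, ...]:
    """Segment a read given the half-open span [bridge_start, bridge_end)
    occupied by the bridging sequence."""
    if mode == "centre":
        # One segmentation point: each segment keeps part of the bridging sequence.
        cut = (bridge_start + bridge_end) // 2
        return read[:cut], read[cut:]
    if mode == "ends":
        # Two segmentation points: the bridging sequence becomes a third segment.
        return read[:bridge_start], read[bridge_start:bridge_end], read[bridge_end:]
    raise ValueError("mode must be 'centre' or 'ends'")
```

For a read in which the complement portion precedes the bridging sequence, the first returned segment corresponds to the complement strand and the last to the template strand.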
  • the read sequence segments 338 and 340 can be provided as inputs to the alignment procedure 332.
  • the alignment procedure 332 may align the nucleotide units of the read sequence segments 338 and 340 to obtain aligned read sequence segments. These aligned read sequence segments may serve as a starting point for further sequencing using the second machine learning model 334.
  • Fig. 4 shows a schematic illustrating a method 400 of aligning read sequence segments and/or measurement signal segments.
  • the segmentation point 430 (or segmentation points / segmentation region) identified in the read sequence 426 may be used to obtain a pair of read sequence segments TSEQUENCE and CSEQUENCE respectively corresponding to the template and complement strands of the polynucleotide.
  • the complement strand contains nucleotides that are complementary to respective nucleotides in the template strand.
  • aligning the template and complement sequences TSEQUENCE and CSEQUENCE may involve individually replacing the estimated nucleotides in one of the sequences with their known complement nucleotides, i.e. the template or complement sequences TSEQUENCE or CSEQUENCE may be “complemented” to obtain a pair of analogous sequences.
  • the composite polynucleotide strand 218 may be constructed by joining the proximal ends of the polynucleotide with the ends of a bridging polymer. In such examples, the template and complement strands are translocated through the nanopore in a reverse order.
  • alignment of the template and complement sequence segments TSEQUENCE and CSEQUENCE may include not only complementing but also reversing CSEQUENCE or TSEQUENCE to obtain its “reverse complement”, and thus obtaining a pair of analogous sequences.
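The complementing and reverse complementing operations can be sketched in Python as below. The function name `make_analogous` and the `joined_at` flag are illustrative assumptions; the sketch handles DNA bases only and leaves any other symbol unchanged.

```python
# Watson-Crick pairing for DNA; "N" (unknown base) maps to itself.
_DNA_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def make_analogous(c_sequence: str, joined_at: str = "proximal") -> str:
    """Convert a complement read segment into a sequence comparable with
    the template read segment."""
    complemented = c_sequence.translate(_DNA_COMPLEMENT)
    if joined_at == "proximal":
        # Strands joined at proximal ends are read in opposite order,
        # so the segment is reversed as well as complemented.
        return complemented[::-1]
    if joined_at == "distal":
        # Strands joined at distal ends are read in the same order.
        return complemented
    raise ValueError("joined_at must be 'proximal' or 'distal'")
```

For example, `make_analogous("AACG")` gives `"CGTT"`, whereas `make_analogous("AACG", joined_at="distal")` gives `"TTGC"`.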
  • a reverse complement C’SEQUENCE of CSEQUENCE is determined, which is expected to be analogous to the template sequence TSEQUENCE.
  • the template sequence TSEQUENCE and the reverse complement sequence C’SEQUENCE indicate the same nucleotides at a number of positions. However, several positions also indicate departures or divergences between the two sequences due to the aforementioned differences in the corresponding portions of the measurement signal 222.
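One way to tabulate where two analogous sequences agree and where they diverge is a global pairwise alignment. The Python sketch below uses the Needleman-Wunsch algorithm with simple scores; the choice of algorithm for this step, the scoring values and the function name `needleman_wunsch` are assumptions for illustration and are not stated in the text.

```python
from typing import Tuple


def needleman_wunsch(a: str, b: str, match: int = 1, mismatch: int = -1,
                     gap: int = -1) -> Tuple[str, str]:
    """Globally align two sequences; gaps are written as '-'."""
    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            score[i][j] = max(diag, score[i - 1][j] + gap, score[i][j - 1] + gap)
    # Trace back from the bottom-right corner to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + (
                match if a[i - 1] == b[j - 1] else mismatch):
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b))
```

The two returned strings have equal length, so each column pairs a template base with the corresponding base (or gap) from the converted complement segment.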
  • the alignment procedure can be adapted to the scenario where the composite polynucleotide strand 218 is constructed by connecting the bridging polymer to distal ends of the complementary strands in the double-stranded polynucleotide.
  • the complement sequence CSEQUENCE may be complemented but not reversed to obtain the sequence C’SEQUENCE that is analogous to the template sequence TSEQUENCE.
  • the aligned pair of sequence segments may be provided to a second machine learning model 434 for enhanced sequencing or basecalling. By sequencing the aligned pair of sequence segments using the second machine learning model 434, the method 400 allows a nucleotide sequence to be estimated with higher accuracy than could be estimated using either of the individual sequence segments TSEQUENCE and CSEQUENCE.
  • the methods discussed below also allow for the segmentation and alignment of measurement signal segments corresponding to the template and complement strands of the polynucleotide.
  • By providing aligned segment data in the form of either aligned measurement signal segments or aligned read sequence segments, the methods enable a number of possibilities for higher accuracy basecalling of the double-stranded polynucleotide.
  • Fig. 4 shows a mapping 444 that associates locations within the read sequence 426 with corresponding locations within the measurement signal 422.
  • the mapping 444 may be stored in the form of a table whose elements associate locations within the read sequence 426 with corresponding locations within the measurement signal 422.
  • the mapping 444 may be obtained as an output of the first machine learning model 424 alongside the read sequence 426.
  • the mapping 444 may be obtained separately by using the read sequence 426 and the measurement signal 422 to associate the corresponding locations of each.
  • the mapping 444 may be provided as an output of another machine learning model, separately from the first machine learning model.
  • the method 400 may continue by using the mapping 444 to determine a point 448 or region within the measurement signal 422 that corresponds to the segmentation point 430. Since the segmentation point 430 separates the read sequence 426 into regions or segments corresponding to the complementary strands of the polynucleotide, the corresponding point 448 separates the measurement signal 422 into regions or segments corresponding to the complementary strands of the polynucleotide.
  • the precise form of the point mapped using the mapping 444 may vary: when two or more segmentation points are identified within the read sequence 426 corresponding, for example, to the ends of the bridging polymer in the read sequence 426, the corresponding point 448 in the measurement signal 422 may also in fact be several points corresponding to the ends of the bridging polymer in the measurement signal 422. In any case, the measurement signal 422 may be segmented to obtain a pair of segments TSIGNAL and CSIGNAL corresponding to the complementary strands of the polynucleotide being sequenced.
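The use of the mapping to carry segmentation points from the read sequence into the measurement signal can be sketched in Python as follows. The representation of the mapping as a list `base_to_sample`, in which entry k is the index of the first signal sample attributed to base k, is an assumption made for the example, as is the function name `split_signal`.

```python
from typing import List, Sequence


def split_signal(signal: Sequence[float], base_to_sample: Sequence[int],
                 seg_points: Sequence[int]) -> List[Sequence[float]]:
    """Split the signal at the samples corresponding to one or more
    segmentation points given as base indices in the read sequence."""
    cuts = [base_to_sample[p] for p in sorted(seg_points)]
    bounds = [0, *cuts, len(signal)]
    # Consecutive bounds delimit consecutive signal segments.
    return [signal[lo:hi] for lo, hi in zip(bounds, bounds[1:])]
```

With a single segmentation point the function returns two signal segments; with two points, for example at the ends of the bridging sequence, it returns three, the middle one corresponding to the bridging polymer.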
  • Frame 450 shows an aligned pair of measurement signal segments TSIGNAL and C’SIGNAL.
  • aligning the read sequence segments involves reverse complementing the sequence segment CSEQUENCE, and therefore aligning the measurement signal segments can be achieved by reverse complementing the corresponding measurement signal segment CSIGNAL.
  • the reverse complement C’SIGNAL for the measurement signal CSIGNAL may be obtained from the reverse complemented sequence C’SEQUENCE using an inverse of the mapping 444.
  • the resulting signal segment C’SIGNAL may be expected to correspond to the measurement signal segment TSIGNAL.
  • the alignment procedure may result in an aligned pair of measurement signals, which may be provided to the second machine learning model 434.
  • the aligned measurement signal segments TSIGNAL and C’SIGNAL may be provided to the second machine learning model 434 in combination with the aligned read sequence segments TSEQUENCE and C’SEQUENCE.
  • the measurement signal segments TSIGNAL and C’SIGNAL may be provided to the second machine learning model 434 without additionally providing the aligned sequence segments TSEQUENCE and C’SEQUENCE.
  • the aligned sequence segments TSEQUENCE and C’SEQUENCE may be the only aligned segment data provided to the second machine learning model 434, in which case the steps set out in frames 446 and 450 may be omitted.
  • the second machine learning model 434 may be provided with additional aligned segment data.
  • the first machine learning model 424 may generate confidence estimates in the form of confidence scores indicative of the confidence of the first machine learning model 424 about the identity of a given nucleotide.
  • the confidence score may for example be provided as a probability, a Q score, or any other suitable confidence metric.
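The relationship between a probability and a Q score can be illustrated with the widely used Phred convention, Q = -10 log10(p_error). Whether the confidence scores referred to here follow exactly this convention is an assumption; the function names and the cap `max_q` are illustrative.

```python
import math


def prob_to_q(p_correct: float, max_q: float = 60.0) -> float:
    """Convert the probability that a base call is correct to a Phred-style Q score."""
    if not 0.0 <= p_correct <= 1.0:
        raise ValueError("p_correct must lie in [0, 1]")
    p_error = 1.0 - p_correct
    if p_error <= 0.0:
        return max_q  # avoid log10(0) for a call reported as certain
    return min(max_q, -10.0 * math.log10(p_error))


def q_to_prob(q: float) -> float:
    """Convert a Phred-style Q score back to the probability of a correct call."""
    return 1.0 - 10.0 ** (-q / 10.0)
```

Under this convention a Q score of 20 corresponds to a 1 in 100 chance of error and a Q score of 30 to a 1 in 1000 chance.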
  • confidence estimates may be determined for individual measurements or sets of measurements in the measurement signal 422, for example as an additional output of the sensing apparatus.
  • Such information may be provided to the second machine learning model 434 to further enhance the accuracy of the estimated nucleotide sequence.
  • Since the second machine learning model 434 operates on aligned segment data, it may be capable of estimating nucleotides with higher accuracy, particularly where the read sequence segments corresponding to the complementary strands have concurrent errors in the estimates of coupled nucleotides.
  • nucleobases in DNA or RNA may be modified by a process of methylation, in which methyl groups are attached to one or more nucleobases. Methylation may occur on either or both strands of the polynucleotide; nucleobases methylated on a single strand (termed hemi-methylation) may be particularly challenging to predict using the existing corpus of methods.
  • aligned segment data obtained using the methods may be used to improve estimations of nucleobase modifications, such as in the hemi-methylation of DNA.
  • a third machine learning model may be trained with supervised learning to predict methylation of nucleobases.
  • the aligned segment data containing either aligned read sequence segments 442 or aligned measurement signal segments 450 or a combination thereof, may be provided to the third machine learning model to obtain enhanced predictions of nucleotide methylation.
  • Other modifications of individual nucleotides or nucleotide groups may be predicted in a similar manner using different machine learning models.
  • the second machine learning model 434 may have an additional inference head which is capable of estimating the modifications.
  • Fig. 5 shows a flow diagram of the method 500 of estimating a nucleotide sequence of a polynucleotide formed of a pair of complementary polynucleotide strands.
  • the method proceeds to determine, using the read sequence determined by the first machine learning model, a segmentation point separating a pair of read sequence segments corresponding to the pair of complementary strands.
  • the segmentation point may instead be a segmentation region or a plurality of segmentation points that separate the pair of read sequence segments corresponding to the pair of complementary strands.
  • the method involves performing an alignment procedure to obtain aligned segment data.
  • the alignment procedure may be performed on the pair of read sequence segments and/or a corresponding pair of segments of the measurement signal determined using the segmentation point.
  • the pair of measurement signal segments may be obtained by using a mapping to determine a point in the measurement signal corresponding to the segmentation point in the read sequence.
  • the method involves processing the aligned segment data using a second machine learning model to estimate the nucleotide sequence of the polynucleotide.
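The overall flow of the method can be summarised with the following Python sketch, in which each stage is passed in as a callable. All of the names (`estimate_sequence`, `basecall`, `find_segmentation_point`, `align`, `refine`) are hypothetical placeholders: the two models and the alignment procedure are not implemented here, and the sketch covers only the variant using a single segmentation point and read sequence segments.

```python
from typing import Any, Callable, Sequence


def estimate_sequence(
    signal: Sequence[float],
    basecall: Callable[[Sequence[float]], str],    # stands in for the first ML model
    find_segmentation_point: Callable[[str], int],
    align: Callable[[str, str], Any],              # stands in for the alignment procedure
    refine: Callable[[Any], str],                  # stands in for the second ML model
) -> str:
    """Chain the stages: basecall, segment, align, then refine."""
    read = basecall(signal)
    seg = find_segmentation_point(read)
    # The complement portion is read first, followed by the template portion.
    complement_segment, template_segment = read[:seg], read[seg:]
    aligned = align(complement_segment, template_segment)
    return refine(aligned)
```

Passing the stages as arguments keeps the sketch independent of any particular model or alignment implementation.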
  • the measurement signal may be recorded as data and the steps 552 to 564 may be performed on or using a computing device or a network of computing devices.
  • the above embodiments are to be understood as illustrative examples of the invention. Further embodiments of the invention are envisaged.
  • the methods described above may be extended to include multiple copies of the polynucleotide strand in the composite polynucleotide strand.
  • the composite polynucleotide strand may be a linear concatemer formed of more than one, or a plurality, of pairs of complementary strands. Such a linear concatemer may be constructed from the desired number of naturally occurring pairs of complementary strands of a given polynucleotide.
  • three bridging molecules may be used to connect two pairs of complementary strands of the polynucleotide using covalent bonds.
  • the pairs of complementary strands may be uncoupled to result in a linear concatemer containing two copies of the template strand and two copies of the complement strand.
  • the methods discussed above may be adapted to translocate the linear concatemer and obtain aligned segment data for even higher accuracy sequencing of the polynucleotide.
  • the linear concatemer may contain any number of pairs of complementary strands of the polynucleotide.

Abstract

A method of estimating a nucleotide sequence of a polynucleotide comprising a pair of complementary polynucleotide strands comprises providing a composite polynucleotide strand comprising the pair of complementary polynucleotide strands connected end-to-end with one another by a bridging molecule, translocating the composite polynucleotide strand in single-stranded form through a nanopore, generating a measurement signal during the translocation, processing the measurement signal using a first machine learning model to determine a read sequence, determining, using the read sequence, a segmentation point separating a pair of read sequence segments corresponding to the pair of complementary strands, performing an alignment procedure to obtain aligned segment data, the alignment procedure being performed on the pair of read sequence segments and/or a corresponding pair of segments of the measurement signal determined using the segmentation point, and processing the aligned segment data using a second machine learning model to estimate the nucleotide sequence of the polynucleotide.
PCT/GB2025/051060 2024-05-17 2025-05-16 Enhanced polynucleotide sequencing Pending WO2025238370A1 (fr)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
GB2407055.9 2024-05-17
GBGB2407055.9A GB202407055D0 (en) 2024-05-17 2024-05-17 Enhanced polynucleotide sequencing

Publications (1)

Publication Number Publication Date
WO2025238370A1 (fr) 2025-11-20

Family

ID=92932180

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/GB2025/051060 Pending WO2025238370A1 (fr) 2024-05-17 2025-05-16 Enhanced polynucleotide sequencing

Country Status (2)

Country Link
GB (1) GB202407055D0 (fr)
WO (1) WO2025238370A1 (fr)

Citations (26)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2005124888A1 (fr) 2004-06-08 2005-12-29 President And Fellows Of Harvard College Suspended carbon nanotube field effect transistor
WO2006100484A2 (fr) 2005-03-23 2006-09-28 Isis Innovation Limited Delivery of molecules to a lipid bilayer
WO2008102121A1 (fr) 2007-02-20 2008-08-28 Oxford Nanopore Technologies Limited Formation of lipid bilayers
WO2009077734A2 (fr) 2007-12-19 2009-06-25 Oxford Nanopore Technologies Limited Formation of layers of amphiphilic molecules
WO2012164270A1 (fr) 2011-05-27 2012-12-06 Oxford Nanopore Technologies Limited Coupling method
WO2013057495A2 (fr) 2011-10-21 2013-04-25 Oxford Nanopore Technologies Limited Enzyme method
WO2013083983A1 (fr) 2011-12-06 2013-06-13 Cambridge Enterprise Limited Nanopore functionality control
WO2013098562A2 (fr) 2011-12-29 2013-07-04 Oxford Nanopore Technologies Limited Enzyme method
WO2013098561A1 (fr) 2011-12-29 2013-07-04 Oxford Nanopore Technologies Limited Method for characterising a polynucleotide by using an XPD helicase
WO2014013260A1 (fr) 2012-07-19 2014-01-23 Oxford Nanopore Technologies Limited Modified helicases
WO2014013259A1 (fr) 2012-07-19 2014-01-23 Oxford Nanopore Technologies Limited SSB method
WO2014013262A1 (fr) 2012-07-19 2014-01-23 Oxford Nanopore Technologies Limited Enzyme construct
WO2014064444A1 (fr) 2012-10-26 2014-05-01 Oxford Nanopore Technologies Limited Droplet interfaces
WO2014064443A2 (fr) 2012-10-26 2014-05-01 Oxford Nanopore Technologies Limited Formation of array of membranes and apparatus therefor
WO2014135838A1 (fr) 2013-03-08 2014-09-12 Oxford Nanopore Technologies Limited Enzyme stalling method
WO2015166276A1 (fr) 2014-05-02 2015-11-05 Oxford Nanopore Technologies Limited Method of improving the movement of a target polynucleotide with respect to a transmembrane pore
WO2016009180A1 (fr) 2014-07-14 2016-01-21 Isis Innovation Limited Measurement of analytes with membrane channel molecules, and bilayer arrays
WO2016034591A2 (fr) 2014-09-01 2016-03-10 Vib Vzw Mutant pores
WO2017149316A1 (fr) 2016-03-02 2017-09-08 Oxford Nanopore Technologies Limited Mutant pore
US9910956B2 (en) 2008-03-28 2018-03-06 Pacific Biosciences Of California, Inc. Sequencing using concatemers of copies of sense and antisense strands
WO2018203084A1 (fr) 2017-05-04 2018-11-08 Oxford Nanopore Technologies Limited Machine learning analysis of nanopore measurements
WO2019006214A1 (fr) 2017-06-29 2019-01-03 President And Fellows Of Harvard College Deterministic stepping of polymers through a nanopore
WO2019002893A1 (fr) 2017-06-30 2019-01-03 Vib Vzw Novel protein pores
WO2020016573A1 (fr) 2018-07-16 2020-01-23 Oxford University Innovation Limited Molecular hopper
WO2020183172A1 (fr) 2019-03-12 2020-09-17 Oxford Nanopore Technologies Inc. Nanopore sensing device, components and method of operation
WO2023094806A1 (fr) 2021-11-29 2023-06-01 Oxford Nanopore Technologies Plc Nanopore measurement signal analysis

Patent Citations (28)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2005124888A1 (fr) 2004-06-08 2005-12-29 President And Fellows Of Harvard College Transistor a effet de champ dans un nanotube au carbone suspendu
WO2006100484A2 (fr) 2005-03-23 2006-09-28 Isis Innovation Limited Administration de molecules dans une bicouche lipidique
WO2008102121A1 (fr) 2007-02-20 2008-08-28 Oxford Nanopore Technologies Limited Formation de bicouches lipidiques
WO2009077734A2 (fr) 2007-12-19 2009-06-25 Oxford Nanopore Technologies Limited Formation de couches de molécules amphiphiles
US9910956B2 (en) 2008-03-28 2018-03-06 Pacific Biosciences Of California, Inc. Sequencing using concatemers of copies of sense and antisense strands
WO2012164270A1 (en) 2011-05-27 2012-12-06 Oxford Nanopore Technologies Limited Coupling method
WO2013057495A2 (en) 2011-10-21 2013-04-25 Oxford Nanopore Technologies Limited Enzyme method
WO2013083983A1 (en) 2011-12-06 2013-06-13 Cambridge Enterprise Limited Nanopore functionality control
WO2013098562A2 (en) 2011-12-29 2013-07-04 Oxford Nanopore Technologies Limited Enzyme method
WO2013098561A1 (en) 2011-12-29 2013-07-04 Oxford Nanopore Technologies Limited Method for characterising a polynucleotide using an XPD helicase
WO2014013260A1 (en) 2012-07-19 2014-01-23 Oxford Nanopore Technologies Limited Modified helicases
WO2014013259A1 (en) 2012-07-19 2014-01-23 Oxford Nanopore Technologies Limited SSB method
WO2014013262A1 (en) 2012-07-19 2014-01-23 Oxford Nanopore Technologies Limited Enzyme construct
WO2014064443A2 (en) 2012-10-26 2014-05-01 Oxford Nanopore Technologies Limited Formation of array of membranes and apparatus therefor
WO2014064444A1 (en) 2012-10-26 2014-05-01 Oxford Nanopore Technologies Limited Droplet interfaces
WO2014135838A1 (en) 2013-03-08 2014-09-12 Oxford Nanopore Technologies Limited Enzyme stalling method
WO2015166276A1 (en) 2014-05-02 2015-11-05 Oxford Nanopore Technologies Limited Method of improving the movement of a target polynucleotide with respect to a transmembrane pore
WO2016009180A1 (en) 2014-07-14 2016-01-21 Isis Innovation Limited Measurement of analytes with membrane channel molecules, and bilayer arrays
WO2016034591A2 (en) 2014-09-01 2016-03-10 Vib Vzw Mutant pores
WO2017149317A1 (en) 2016-03-02 2017-09-08 Oxford Nanopore Technologies Limited Mutant pore
WO2017149318A1 (en) 2016-03-02 2017-09-08 Oxford Nanopore Technologies Limited Mutant pores
WO2017149316A1 (en) 2016-03-02 2017-09-08 Oxford Nanopore Technologies Limited Mutant pore
WO2018203084A1 (en) 2017-05-04 2018-11-08 Oxford Nanopore Technologies Limited Machine learning analysis of nanopore measurements
WO2019006214A1 (en) 2017-06-29 2019-01-03 President And Fellows Of Harvard College Deterministic stepping of polymers through a nanopore
WO2019002893A1 (en) 2017-06-30 2019-01-03 Vib Vzw Novel protein pores
WO2020016573A1 (en) 2018-07-16 2020-01-23 Oxford University Innovation Limited Molecular hopper
WO2020183172A1 (en) 2019-03-12 2020-09-17 Oxford Nanopore Technologies Inc. Nanopore sensing device, components and method of operation
WO2023094806A1 (en) 2021-11-29 2023-06-01 Oxford Nanopore Technologies Plc Nanopore measurement signal analysis

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
ALEXANDER S. MIKHEYEV ET AL: "A first look at the Oxford Nanopore MinION sequencer", MOLECULAR ECOLOGY RESOURCES, vol. 14, no. 6, 3 September 2014 (2014-09-03), Hoboken, USA, pages 1097 - 1102, XP055492390, ISSN: 1755-098X, DOI: 10.1111/1755-0998.12324 *
IVANOV AP ET AL., NANO LETT, vol. 11, no. 1, 12 January 2011 (2011-01-12), pages 279 - 85
MATEI DAVID ET AL: "Nanocall: an open source basecaller for Oxford Nanopore sequencing data", BIOINFORMATICS, vol. 33, no. 1, 1 January 2017 (2017-01-01), GB, pages 49 - 55, XP055492402, ISSN: 1367-4803, DOI: 10.1093/bioinformatics/btw569 *
SONI GV ET AL., REV SCI INSTRUM., vol. 81, no. 1, January 2010 (2010-01-01), pages 014301

Also Published As

Publication number Publication date
GB202407055D0 (en) 2024-07-03

Similar Documents

Publication Publication Date Title
JP6226888B2 (en) Analysis of measurements of a polymer
US12486534B2 (en) Analysis of a polynucleotide via a nanopore system
KR102551897B1 (en) Analysis of a polymer
Deamer et al. Three decades of nanopore sequencing
Mohammadi et al. DNA sequencing: an overview of solid-state and biological nanopore-based methods
US9494554B2 (en) Chip set-up and high-accuracy nucleic acid sequencing
CA3223076A1 (en) Nanopore sequencing
US20250006308A1 (en) Nanopore measurement signal analysis
EP3847278A1 (en) Method for determining a polymer sequence
WO2019121845A1 (en) Compositions and methods for unidirectional nucleic acid sequencing
WO2025238370A1 (en) Improved polynucleotide sequencing
Branton et al. The development of nanopore sequencing
EP4612327A1 (en) Biochemical analysis system and method of controlling a biochemical analysis system