
WO2021251834A1 - Methods and systems for identifying nucleic acids - Google Patents


Info

Publication number
WO2021251834A1
WO2021251834A1 PCT/NZ2021/050089 NZ2021050089W WO2021251834A1 WO 2021251834 A1 WO2021251834 A1 WO 2021251834A1 NZ 2021050089 W NZ2021050089 W NZ 2021050089W WO 2021251834 A1 WO2021251834 A1 WO 2021251834A1
Authority
WO
WIPO (PCT)
Prior art keywords
common
nucleic acids
contributor
donor
profiles
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Ceased
Application number
PCT/NZ2021/050089
Other languages
English (en)
Inventor
Maarten KRUIJVER
Duncan Taylor
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Forensic Science Sa
New Zealand Institute for Public Health and Forensic Science Ltd
Original Assignee
Forensic Science Sa
Institute of Environmental Science and Research Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Forensic Science Sa, Institute of Environmental Science and Research Ltd
Publication of WO2021251834A1

Classifications

    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B20/00 ICT specially adapted for functional genomics or proteomics, e.g. genotype-phenotype associations
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B40/00 ICT specially adapted for biostatistics; ICT specially adapted for bioinformatics-related machine learning or data mining, e.g. knowledge discovery or pattern finding
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q1/00 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
    • C12Q1/68 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
    • C12Q1/6869 Methods for sequencing
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q2539/00 Reactions characterised by analysis of gene expression or genome comparison
    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01N INVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
    • G01N27/00 Investigating or analysing materials by the use of electric, electrochemical, or magnetic means
    • G01N27/26 Investigating or analysing materials by the use of electric, electrochemical, or magnetic means by investigating electrochemical variables; by using electrolysis or electrophoresis
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B50/00 ICT programming tools or database systems specially adapted for bioinformatics

Definitions

  • the present disclosure relates to methods and systems for determining the contributor of a nucleic acid in a set of nucleic acids, for example, where there are one or more common contributors to a subset of the set of nucleic acids.
  • results of a deconvolution can be used in a number of tasks relating to profile evaluation, such as: calculating a likelihood ratio that questions the DNA contribution of one or more individuals (Taylor et al, 2013; Taylor, 2014); or intelligence gathering such as: interpreting the genotype of a contributor whose genotype is resolvable using some threshold (Bright et al, 2014); searching a database of individuals for a potential contributor to a mixed DNA profile (Bright et al, 2014); searching a database of individuals for a potential relative of a contributor to a mixed DNA profile (Taylor et al, 2014); and comparing mixtures to determine whether they possess a common contributor (Slooten, 2017; Bright et al, 2019). The comparing of mixtures has been shown to also have use in quality assurance processes (Taylor et al., 2019).
  • Another application, a variation of the previous one, arises when the interest lies with a donor who has not contributed to multiple samples, but one or more other donors to the mixture containing the queried contributor can be assumed to be common with another sample.
  • An example of when this might be useful is in the taking of control swabs, i.e., a stain is sampled from an item that is likely to possess background DNA.
  • a ‘control’ swab could be taken next to the stain, and with the assumption of common contributors to the background in both samples, the donor of interest to the stain may achieve better resolution.
  • the present disclosure encompasses a method of determining a contributor of a nucleic acid in a set of nucleic acids comprising: taking a set of nucleic acids which have been individually analysed using probabilistic genotyping and combining results across the set of nucleic acids; using a likelihood ratio method or other statistical analysis to evaluate whether a contributor is a common donor across the set of nucleic acids; assuming that the contributor or other potential contributors are common between the set of nucleic acids; trialling all possible configurations of the contributor and zero or more other potential contributors as common donors across the set of nucleic acids; and interrogating one or more reference nucleic acids comprising the set of nucleic acids to identify the common donor; thereby determining the contributor of the nucleic acid in the set of nucleic acids.
  • the nucleic acid is contributed by a person of interest.
  • the nucleic acid is contributed by a perpetrator.
  • the nucleic acid is from a forensic sample.
  • the set of nucleic acids is from one or more forensic samples.
  • the likelihood ratio method or other statistical analysis is one of:
  • the set of nucleic acids includes subsets having single source nucleic acid profiles.
  • the set of nucleic acids includes subsets having mixed source nucleic acid profiles.
  • the one or more reference nucleic acids are included in a nucleic acid database.
  • the method does not include the assumption that all the contributors are the same across the set of nucleic acids.
  • the method does not include the assumption that a mixture ratio or other parameters are constant.
  • the present disclosure encompasses a system for determining a contributor of the nucleic acid in the set of nucleic acids, the system including: a processor configured to carry out the method of any one of claims 1 to 10, and a sequencing device configured to sequence one or more nucleic acids and to provide sequence information to the processor.
  • the system includes a nucleic acid database.
  • the sequencing device is configured to carry out next generation/second generation sequencing or massively parallel sequencing.
  • the sequencing device comprises a capillary electrophoresis device.
  • the processor is configured to determine base calls and instructions to be executed by the processor.
  • the processor includes memory configured to store temporary variables or other intermediate information during execution of instructions to be executed by the processor.
  • the processor comprises a personal computer, a microprocessor, or a handheld gadget.
  • the present disclosure encompasses a system for determining a contributor of the nucleic acid in the set of nucleic acids, including: a processor configured to carry out a method comprising: taking a set of nucleic acids which have been individually analysed using probabilistic genotyping and combining results across the set of nucleic acids; using a likelihood ratio method or other statistical analysis to evaluate whether a contributor is a common donor across the set of nucleic acids; assuming that the contributor or other potential contributors are common between the set of nucleic acids; trialling all possible configurations of the contributor and zero or more other potential contributors as common donors across the set of nucleic acids; and interrogating one or more reference nucleic acids comprising the set of nucleic acids to identify the common donor; thereby determining the contributor of the nucleic acid in the set of nucleic acids; and a sequencing device configured to sequence one or more nucleic acids and to provide sequence information to the processor.
  • the system includes a nucleic acid database.
  • the sequencing device is configured to carry out next generation/second generation sequencing and/or massively parallel sequencing.
  • the sequencing device comprises a capillary electrophoresis device.
  • the processor is configured to determine base calls and instructions to be executed by the processor.
  • the processor includes memory configured to store temporary variables or other intermediate information during execution of instructions to be executed by the processor.
  • the processor is included as part of a personal computer, a microprocessor, or a handheld gadget.
  • Fig. 1A Exemplified results for common donor calculation. Shown is an example of a single locus in two DNA profiles produced from two different samples, i.e., one locus of DNA profiles from two different samples, where it is believed a common contributor exists. Below the electropherograms are the tables of genotype sets.
  • Fig. 1B Schematic representation of the six cases encountered in Example 1.2. By construction, there is always a common donor in the queried contributor positions across the profiles, but this is not always the PoI. Bottom and top circles are schematic only and do not correspond to contributor positions 1 and 2.
  • Example 1.3 The queried common contributor is always present in the first position (in the grey dashed box) in this diagram, but could be any combination of contributor positions in the analysis.
  • Fig. 3 Profile setup of Example 3 in a graphical layout. Each box represents a sample or reference and each arrow joins common donors. Columns of data within each box represent ‘contributor position : known donor : mixture ratio’.
  • Fig. 4 Increase in log10 LR (log10 of the likelihood ratio) when comparing known donors to pairs of single source profiles together, compared to when the known donor is compared to each profile separately. The increase is measured against the maximum of the two log10 LRs (in part A, upper) and against the two separate log10 LR values (in part B, lower). The line in A shows where the vertical axis is at 0. Likewise, the plane in B indicates 0 on the vertical axis. In B, line segments are drawn vertically downwards from points with log10 LR ⁇ 4.
  • Fig. 5 Comparison of the LR of a common contributor comparison for a pair of two-person mixtures and the maximum of the two separate LRs for H1 true; i.e., the PoI is the queried common donor and the other contributors are not related.
  • Fig. 6 Comparison of the LR of a common contributor comparison for a pair of two-person mixtures and the maximum of the two separate LRs for H2 true; i.e., there is a single common contributor who is queried and is not the PoI.
  • Fig. 7 Comparison of the LR of a common contributor comparison for a pair of two-person mixtures and the maximum of the two separate LRs for H3 true; i.e., the PoI is the queried common contributor but the other contributor is also in common between the two samples.
  • Fig. 8 Comparison of the LR of a common contributor comparison for a pair of two-person mixtures and the maximum of the two separate LRs for H4 true; i.e., there is a queried common contributor who is not the PoI, but the PoI is also present in one of the two samples.
  • Fig. 9 Common contributor LR compared to the maximum individual LR when compared to the PoI for the scenario shown in H1 of Figure 3.
  • the size of the circles represents the magnitude of the mix-to-mix LR.
  • Fig. 10 Common contributor LR compared to the maximum individual LR when compared to the PoI for the scenario shown in H2 of Figure 3.
  • the size of the circles represents the magnitude of the mix-to-mix LR.
  • Fig. 11 Common contributor LR compared to the maximum individual LR when compared to the PoI for the scenario shown in H3 of Figure 3.
  • the size of the circles represents the magnitude of the mix-to-mix LR.
  • Fig. 12 Common contributor LR compared to the maximum individual LR when compared to the PoI for the scenario shown in H4 (first panel) and H5 (second panel) of Figure 3. The size of the circles represents the magnitude of the mix-to-mix LR.
  • Fig. 13 Common contributor LR compared to the maximum individual LR when compared to the PoI for the scenario shown in H6 of Figure 3. The size of the circles represents the magnitude of the mix-to-mix LR.
  • Fig. 14 Common contributor LR compared to the maximum individual LR when compared to the PoI for the scenario shown in the top left of the graph (constructed in the same form as scenarios in Figure 3). The size of the circles represents the magnitude of the mix-to-mix LR.
  • Fig. 15 Increase in log10 LR when comparing profiles to a known donor and then providing information about one (white circle) or two (black circle) non-queried common donor(s) with another profile.
  • Fig. 16 Results of Example 2.2, considering pairwise combinations of 16 profiles that possess a common donor, K29.
  • the circles below the bars demonstrate the pairing of profiles.
  • the white columns represent the LR when given only the information of the common donorship of K29, and the black columns represent the LR when additional information on non-queried common donors is also provided.
  • the numbers above the columns are the log10 LR when all common contributor information (regarding queried and non-queried donors) is given.
  • Fig. 17. The 9 sets of scenarios tested, with the network of profiles and reference considered in each scenario.
  • the ‘LR (K48)’ column shows the LR for the comparison of K48 to sample 1.
  • the ‘>99%’ column shows the number of alleles in the minor component of sample 1 (i.e., corresponding to K48) that achieve a weight exceeding 0.99.
  • Fig. 18 Fifteen low-level single source samples were compared against a large reference database using all possible common donor configurations. The maximum LR among all configurations is compared to the number of linked samples. When more samples are combined, the LRs for true donors are considerably higher, while LRs for non-contributors decrease.
  • the articles “a” and “an” are used to refer to one or to more than one (i.e., to at least one) of the grammatical object of the article.
  • an element can be taken to mean one element or more than one element.
  • a range of “less than 10” can include any and all sub-ranges between (and including) the minimum value of zero and the maximum value of 10, that is, any and all sub-ranges having a minimum value of equal to or greater than zero and a maximum value of equal to or less than 10, e.g., 1 to 4.
  • the term “amplified”, when applied to a nucleic acid sequence refers to a process whereby one or more copies of a particular nucleic acid sequence is generated from a nucleic acid template sequence, preferably by the method of polymerase chain reaction.
  • DNA refers to a molecule comprised generally of the deoxyribonucleotides adenine (A), guanine (G), thymine (T) and/or cytosine (C).
  • In RNA, thymine (T) is replaced by uracil (U).
  • nucleic acid encompasses DNA or RNA, as well as various oligonucleotides and polynucleotides. Nucleic acids may also include chimeras of deoxyribonucleotides and ribonucleotides.
  • a nucleic acid can be a naturally occurring molecule, or a fragment of a naturally occurring molecule, or can be chemically synthesised, or derived by cloning according to well-known methods. Nucleic acids may be in single-stranded form, or in double-stranded form. Sense and/or anti-sense strands may be utilised.
  • Nucleic acids may contain naturally occurring rare nucleotides, synthetic nucleotides, analogue nucleotides, or modified nucleotides, or any combination of these. Modifications may be chemical, enzymatic, or metabolic in nature. One or more modified bases may be included. Nucleic acids include, but are not limited to, hnRNA, mRNA, non-coding RNA, as well as cDNA, genomic DNA and recombinant DNA, along with any wholly or partially synthesised nucleic acids.
  • DNA samples with a single unknown offender who is believed to be a contributor.
  • PoI: person of interest
  • the methodology does not include the assumption that all the contributors are the same across the evidential profiles, nor is it assumed that parameters such as the mixture ratio are constant.
  • the disclosed methods can be used to compute an LR for a PoI being the common contributor to multiple stains. It is also possible to interrogate a database of reference profiles to search for the queried donor (common or non-common). As demonstrated herein, the disclosed methods can identify a queried contributor in cases where individual comparisons have limited capacity to discriminate between donors and non-donors, when some assumption(s) can be made about multiple contributors being from the same sources (queried or not).
  • the mathematical framework starts with developing a likelihood ratio that evaluates whether or not a PoI is the common donor across multiple profiles.
  • genotype g_S is a multi-locus genotype (DNA profile).
  • the aim is to compare the hypotheses:
  • H1: S is the common donor; H2: S is not related to
  • H1 specifies the contribution of S to the M profiles; using the fact that (with only one common donor) the observed data across profiles are conditionally independent (given the genotype g_S), this can be written as:
  • H2 states that there is a common donor, but not that this donor has the genotype of the PoI.
  • a likelihood ratio can be computed from a probabilistic deconvolution of the common contributor’s genotype.
  • genotype g which shows that the likelihood ratio for a genotype equals one over the random match probability times the posterior probability that the common donor has this genotype. In other words, it is the ratio of the posterior probability that the common donor has this genotype (given the mixture data) to the prior probability.
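The relationship described in words above can be sketched in symbols as follows. The notation is an assumption introduced here, since the original expressions did not survive extraction: g is a candidate genotype of the common donor, O_1, ..., O_M are the observed data of the M profiles, and P(g) is the prior (random match) probability of g.

```latex
\mathrm{LR}(g)
  \;=\; \frac{1}{P(g)} \cdot P(g \mid O_1, \dots, O_M)
  \;=\; \frac{P(g \mid O_1, \dots, O_M)}{P(g)}
```

Read this way, the LR exceeds 1 exactly when the mixture data make the genotype more probable than it was a priori.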
  • Expression (2) for the LR is convenient, because posterior genotype probabilities for a single common donor can be efficiently computed. Again, it is recognised that: , then use Bayes’ rule to obtain
  • This expression writes the posterior probability of the genotype as the prior probability times the likelihood for each mixture separately.
  • These likelihoods can be efficiently computed for all genotypes for each mixture. Specifically, a single pass through all genotype combinations can be made, computing, for each mixture, the term for each genotype at each locus. After multiplication of this term for each genotype across mixtures and taking the product with the priors P(g) , normalisation yields the posterior genotype probabilities for the common contributor.
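The normalisation step described above can be illustrated with a minimal Python sketch for a single locus. It is not the implementation of the disclosed method: the function names, the dictionary layout and the toy numbers are all hypothetical, and the per-mixture term is assumed to be a likelihood of the mixture data given that the common donor has genotype g.

```python
from math import prod


def posterior_common_donor(prior, likelihoods_per_mixture):
    """Posterior genotype probabilities for one common donor at one locus.

    prior: dict mapping genotype -> P(g) (hypothetical layout).
    likelihoods_per_mixture: list of dicts, one per mixture, mapping
        genotype -> assumed likelihood of that mixture's data given g.
    """
    # Prior times the product of the per-mixture terms, for every genotype.
    unnormalised = {
        g: p * prod(lik[g] for lik in likelihoods_per_mixture)
        for g, p in prior.items()
    }
    total = sum(unnormalised.values())
    if total == 0:
        raise ValueError("all genotypes have zero likelihood")
    # Normalisation yields the posterior genotype probabilities.
    return {g: w / total for g, w in unnormalised.items()}


def likelihood_ratio(prior, posterior, genotype):
    """LR for a genotype: posterior probability divided by prior probability."""
    return posterior[genotype] / prior[genotype]


# Toy locus with made-up numbers (illustration only).
prior = {("a", "a"): 0.25, ("a", "b"): 0.5, ("b", "b"): 0.25}
mixture_1 = {("a", "a"): 0.2, ("a", "b"): 0.6, ("b", "b"): 0.2}
mixture_2 = {("a", "a"): 0.1, ("a", "b"): 0.8, ("b", "b"): 0.1}
posterior = posterior_common_donor(prior, [mixture_1, mixture_2])
lr_ab = likelihood_ratio(prior, posterior, ("a", "b"))
```

In this toy case the heterozygote is favoured by both mixtures, so combining them concentrates the posterior on it more strongly than either mixture would alone.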
  • the disclosed method can employ different types of likelihood ratio methods or other statistical analysis.
  • the method can use one of:
  • the common contributor LR is smaller for larger values of the mix-to-mix LR. This can be understood by noting that the mix-to-mix LR is large precisely if the same genotypes are well resolved across the two profiles. In that case there is comparatively little value in combining the two profiles to resolve a common contributor genotype. This may decrease the utility of conducting a mix-to-mix analysis before carrying out a common donor analysis.
  • hypotheses are formulated as:
  • This LR based on posterior genotype probabilities is convenient when comparing many reference profiles in a database search, because the posterior genotype probabilities can be computed up front, after which the comparisons are computationally cheap.
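A minimal sketch of why such a search is cheap, assuming the posterior and prior genotype probabilities have already been computed per locus and that loci are treated as independent, so that the multi-locus LR is a product of per-locus ratios. The function name, data layout and numbers are hypothetical and are not taken from the disclosure.

```python
def search_database(posteriors, priors, database):
    """Rank reference profiles by LR using precomputed genotype probabilities.

    posteriors, priors: lists with one dict per locus, genotype -> probability.
    database: dict mapping reference name -> list of genotypes, one per locus.
    Returns (name, LR) pairs sorted from highest to lowest LR.
    """
    results = []
    for name, profile in database.items():
        lr = 1.0
        for post, pri, genotype in zip(posteriors, priors, profile):
            # Per-locus ratio of posterior to prior; a genotype absent from
            # the posterior is treated as having posterior probability zero.
            lr *= post.get(genotype, 0.0) / pri[genotype]
        results.append((name, lr))
    return sorted(results, key=lambda item: item[1], reverse=True)


# Toy two-locus example with made-up numbers (illustration only).
posteriors = [
    {("a", "a"): 0.02, ("a", "b"): 0.96, ("b", "b"): 0.02},
    {("c", "c"): 0.9, ("c", "d"): 0.1},
]
priors = [
    {("a", "a"): 0.25, ("a", "b"): 0.5, ("b", "b"): 0.25},
    {("c", "c"): 0.5, ("c", "d"): 0.5},
]
database = {
    "ref_1": [("a", "a"), ("c", "d")],
    "ref_2": [("a", "b"), ("c", "c")],
}
ranking = search_database(posteriors, priors, database)
```

Each comparison is only a handful of dictionary lookups and multiplications, which is what keeps a search over a large database inexpensive once the deconvolution has been done.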
  • a wildcard, ‘any’ option is considered for a contributor position in a common donor assumption. That is, a contributor position is nominated in one profile that has interest as a queried donor (although even this is not required, and all profiles could be provided a wildcard option) and then other profiles are provided with the ‘any’ option.
  • assumed common donors may be tied to specific contributor positions but can also be left as ‘any’.
  • the system considers each contributor position (or that there were no common contributors) in the profiles marked with ‘any’ and generates all possible configurations (Q,C) comprising zero or more combinations of assumed common contributors paired with the queried common contributor. Neither the assumed nor the queried contributors have to span all the mixtures. These configurations are iterated over and at each iteration the queried contributor genotype is compared to all profiles on a database (or against a PoI). That is, for person S on the database, with profile g_S, the
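The enumeration of configurations can be illustrated with a simplified Python sketch. It covers only the placement of the queried contributor Q: each profile marked ‘any’ is tried with every contributor position and with no common contributor at all. The handling of assumed common contributors C is omitted, and the names, the input layout and the 1-based position numbering are assumptions made for illustration.

```python
from itertools import product

ANY = "any"  # hypothetical wildcard marker


def enumerate_queried_configurations(profiles):
    """Yield every placement of the queried contributor across profiles.

    profiles: dict mapping profile name -> (n_contributors, position_spec),
        where position_spec is a fixed 1-based position, None (the queried
        contributor is absent from that profile), or ANY.
    Each yielded dict maps profile name -> position or None.
    """
    names = list(profiles)
    options = []
    for name in names:
        n_contributors, spec = profiles[name]
        if spec == ANY:
            # Wildcard: absent, or any one of the contributor positions.
            options.append([None, *range(1, n_contributors + 1)])
        else:
            options.append([spec])
    for combination in product(*options):
        yield dict(zip(names, combination))


# Toy setup: the position is nominated in sample_1, the others are wildcards.
profiles = {
    "sample_1": (2, 1),
    "sample_2": (2, ANY),
    "sample_3": (3, ANY),
}
configurations = list(enumerate_queried_configurations(profiles))
```

With one fixed position, a two-person wildcard profile and a three-person wildcard profile, this gives 1 × 3 × 4 = 12 placements, each of which would then be evaluated against the database or the PoI.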
  • nucleic acid sequencing may be used to detect and characterise one or more nucleic acid samples.
  • Nucleic acid sequencing may be carried out using any sequencing method. Methods of next generation sequencing and massively parallel sequencing are particularly noted.
  • PCR amplification may be used to detect and characterise a nucleic acid.
  • PCR amplification products can be detected by a method selected from microfluidics, electrophoresis, mass spectrometry and the like known to one of skill in the art for detecting amplification products. See, e.g., US 2018/0018422.
  • PCR amplification products may be detected by fluorescent dyes conjugated to the PCR amplification primers, for example as described in PCT patent application WO 2009/059049.
  • PCR amplification products can also be detected by other techniques, including, but not limited to, the staining of amplification products, e.g., silver staining and the like.
  • detecting comprises a device, i.e., using an automated or semi-automated detecting means that can, but need not, comprise a computer algorithm.
  • the device is portable, transportable or comprises a portable component which can be inserted into a less mobile or transportable component, e.g., residing in a laboratory, hospital or other environment in which detection of nucleic acids is conducted.
  • the detecting step is combined with or is part of at least one: amplification step, sequencing step, isolation step, or separating step.
  • microarray devices for example, comprising a data recording device such as a scanner or CCD camera.
  • sequencing devices for example, comprising at least one signal scanner and at least one graphing, recording, or readout component. Included amongst such devices are capillary electrophoresis devices, for example, comprising at least one fluorescent scanner and at least one graphing, recording, or readout component. Also included are sequencing by synthesis devices, for example, employing fluorophore-labelled, reversible-terminator nucleotides. Also included are pyrosequencing devices, for example, employing detection of pyrophosphate (PPi) release following incorporation of a nucleotide by DNA polymerase. Also included are pair-end sequencing devices, polony sequencing devices, single molecule sequencing devices, nanopore sequencing devices, and devices for sequencing by hybridization or by ligation. See, e.g., Lin, B. et al. (2008) Recent Patents on Biomedical Engineering 1(1):60-67, incorporated by reference herein.
  • the detecting step is combined with an amplifying step, for example but not limited to, real-time analysis such as Q-PCR.
  • exemplary devices for performing a detecting step include the ABI PRISM™ Genetic Analyzer instrument series, the ABI PRISM™ DNA Analyzer instrument series, the ABI PRISM™ Sequence Detection Systems instrument series, and the Applied Biosystems Real-Time PCR instrument series (Applied Biosystems); and microarrays and related software such as the Applied Biosystems microarray and Applied Biosystems 1700 Chemiluminescent Microarray Analyzer and other commercially available microarray and analysis systems available from Affymetrix, Agilent, and Amersham Biosciences, among others (see, e.g., Gerry et al.
  • Exemplary software includes GeneMapper™ Software, GeneScan™ Analysis Software, and Genotyper™ software (Applied Biosystems).
  • an amplification product can be detected and quantified based on the mass-to-charge ratio of at least a part of the amplicon (m/z).
  • a primer comprises a mass spectrometry-compatible reporter group, including without limitation, mass tags, charge tags, cleavable portions, or isotopes that are incorporated into an amplification product and can be used for mass spectrometer detection (see, e.g., Haff and Smirnov (1997) Nucl. Acids Res. 25:3749-50; and Sauer et al. (2003) Nucl. Acids Res. 31:e63).
  • An amplification product can be detected by mass spectrometry.
  • a primer comprises a restriction enzyme site, a cleavable portion, or the like, to facilitate release of a part of an amplification product for detection.
  • a multiplicity of amplification products are separated by liquid chromatography or capillary electrophoresis, subjected to ESI or to MALDI, and detected by mass spectrometry. Descriptions of mass spectrometry can be found in, among other places, The Expanding Role of Mass Spectrometry in Biotechnology, Gary Siuzdak, MCC Press, 2003.
  • detecting comprises a manual or visual readout or evaluation, or combinations thereof. In some embodiments, detecting comprises an automated or semi-automated digital or analog readout. In some embodiments, detecting comprises real-time or endpoint analysis. In some embodiments, detecting comprises a microfluidic device, including without limitation, a TaqMan™ Low Density Array (Applied Biosystems). In some embodiments, detecting comprises a real-time detection device. Exemplary real-time devices include the ABI PRISM™ 7000 Sequence Detection System, the ABI PRISM™
  • detecting by sequencing comprises methods selected from Sanger sequencing, Maxam-Gilbert sequencing and variations thereof utilizing capillary or gel electrophoresis.
  • Exemplary capillary electrophoresis devices include the ABI PRISM™ 310 Genetic Analyzer, Applied Biosystems 3130 and 3130xl Genetic Analyzers, the Applied Biosystems 3500/3500xL Genetic Analyzers, the Applied Biosystems 3730/3730xl DNA Analyzers (Applied Biosystems), Beckman CEQ 8000 Genetic Analyzer (Beckman Coulter) and MegaBACE 4000 DNA Sequencer (GE Healthcare) as well as next-generation sequencing technologies.
  • Exemplary sequencing by synthesis devices include the Genome Analyzer System (Solexa/Illumina Inc), the Genome Sequencer 20 System and the Genome Sequencer FLX Systems (454 Life Sciences/Roche Diagnostics) for pyrosequencing; sequencing by ligation using the SOLiD System (Applied Biosystems/Life Technologies); sequencing by hybridization; single molecule DNA sequencing, for example the Personal Genome Machine (Ion Torrent/Life Technologies); nanopore sequencing and polony sequencing and the like known to one of skill in the art for detecting and analysing the sequenced nucleic acid. Further descriptions of next-generation sequencing can be found in Zhang, J. (2011) J. Genet. Genomics 38(3):95-109, Metzker, M. L.
  • a computer system may be used to carry out the disclosed methods.
  • results are provided by the computer system in response to a processor executing one or more sequences of one or more instructions contained in memory.
  • Such instructions may be read into the memory from another computer-readable medium, such as a storage device.
  • Execution of the sequences of instructions contained in memory causes the processor to perform the methods described herein.
  • hard-wired circuitry may be used in place of or in combination with software instructions to implement the disclosed methods.
  • implementations of the disclosed methods are not limited to any specific combination of hardware circuitry and software.
  • An exemplary computer system includes a bus or other communication mechanism for communicating information, and a processor coupled with the bus for processing information.
  • the computer system also includes memory, which can be a random access memory (RAM) or other dynamic storage device, coupled to the bus for determining base calls, and instructions to be executed by the processor. Memory may also be used for storing temporary variables or other intermediate information during execution of instructions by the processor.
  • the computer system can further include a read only memory (ROM) or other static storage device coupled to the bus for storing static information and instructions for the processor.
  • a storage device such as a magnetic disk or optical disk, can be provided and coupled to the bus for storing information and instructions. See, e.g., US 2018/0018422.
  • a computer-readable medium includes any medium that participates in providing instructions to a processor for execution. Such a medium may take many forms, including but not limited to, non-volatile medium, volatile medium, and transmission medium.
  • Non-volatile medium includes, for example, optical or magnetic disks, such as storage device.
  • Volatile medium includes dynamic memory, such as memory.
  • Transmission medium includes coaxial cables, copper wire, and fibre optics, including the wires that comprise the bus.
  • Common forms of computer-readable medium include, for example, a floppy disk, a flexible disk, hard disk, solid-state drive (SSD), magnetic tape, or any other magnetic medium, a CD-ROM, any other optical medium, punch cards, paper tape, any other physical medium with patterns of holes, a RAM, PROM, and EPROM, a FLASH-EPROM, any other memory chip or cartridge, or any other tangible medium from which a computer can read.
  • the computer system may be coupled via the bus to a display, such as a cathode ray tube (CRT) or liquid crystal display (LCD), for displaying information to a computer user.
  • An input device, including alphanumeric and other keys, is coupled to the bus for communicating information and command selections to the processor.
  • A cursor control, such as a mouse, a trackball, or cursor direction keys, can be used for communicating direction information and command selections to the processor and for controlling cursor movement on the display.
  • This input device typically has two degrees of freedom in two axes, a first axis (i.e., x) and a second axis (i.e., y), that allow the device to specify positions in a plane.
  • two or more computer systems that share one or more components of the architecture of the computer system can be used to perform the disclosed methods. These two or more computer systems can be in communication or networked. In various embodiments, these two or more computer systems can include a client/server or cloud computing architecture.
  • the computer system can be a standalone system.
  • the computer system can be connected to laboratory instrumentation (e.g., at least one sequencing device), or the computer system can be the computer system of a laboratory device or portable device (e.g., laboratory sequencing device or portable sequencing device).
  • Various forms of computer readable medium may be involved in carrying one or more sequences of one or more instructions to a processor for execution.
  • the instructions may initially be carried on the magnetic disk of a remote computer.
  • the remote computer can load the instructions into its dynamic memory and send the instructions over a telephone line using a modem.
  • a modem local to the computer system can receive the data on the telephone line and use an infra-red transmitter to convert the data to an infra-red signal.
  • An infra-red detector coupled to the bus can receive the data carried in the infra-red signal and place the data on the bus.
  • the bus carries the data to memory, from which the processor retrieves and executes the instructions.
  • the instructions received by memory may optionally be stored on the storage device either before or after execution by the processor.
  • instructions configured to be executed by a processor to perform a method are stored on a non-transitory and tangible computer-readable medium.
  • the computer-readable medium can be a device that stores digital information.
  • a computer-readable medium includes a compact disc read-only memory (CD-ROM) as is known in the art for storing software.
  • the computer-readable medium is accessed by a processor suitable for executing the stored instructions.
  • An exemplification of a system includes one or more of: at least one device, database, and/or processor.
  • the at least one device can include, for example, a capillary electrophoresis device or other sequencing device, as set out herein.
  • more than one device may be employed by the system. See, e.g., US 2018/0018422.
  • the database can be, e.g., a nucleic acid database.
  • the database can be, but is not limited to, a magnetic disk drive, an electronic memory, a random access memory (RAM), a read only memory (ROM), or an optical disk drive.
  • the database may be provided as a separate device.
  • the database can be an internal memory of the processor, or of the at least one device.
  • the database can be directly connected to the processor.
  • the database can be connected to the processor through a network, or the database can be connected to the at least one device directly or through a network.
  • the processor can be included with, for example, a personal computer, a microprocessor, or a handheld gadget (e.g., tablet or phone), or any other contrivance capable of sending and receiving control signals and data to and from the database and the at least one device (e.g., at least one sequencing device).
  • the processor can be provided separately.
  • the processor can also be provided as an internal processor of the database or the at least one device.
  • the at least one device analyses a nucleic acid sample and produces a first dataset for the nucleic acid sample.
  • One or more further devices can be used to analyse the nucleic acid sample and produce further dataset(s) for the nucleic acid sample.
  • the datasets can indicate the presence of various STRs or SNPs.
  • the datasets can then be compared to a database.
  • the processor can be in communication with the at least one device (e.g., at least one sequencing device) and database.
  • the processor receives the dataset(s) from the device(s), and the disclosed methods are applied in relation to the database.
  • a computer program product includes a non-transitory and tangible computer-readable storage medium whose contents include a program with instructions being executed on a processor so as to perform the disclosed methods.
  • the methods can be performed by a system that includes one or more distinct software modules.
  • the computer program can include at least one of a measurement module, a selection module, a search module, and a calculation module.
  • the measurement module can receive the dataset(s) from the at least one device that analyses the nucleic acid sample. Alternatively, the dataset(s) can be provided to the measurement module, e.g., if the nucleic acids have been analysed previously.
  • the selection module selects a usable value from the dataset.
  • the search module searches a database that provides information for sets of nucleic acids.
  • the calculation module applies the likelihood ratio and interrogates the database to determine common donors across the database, in accordance with the methods disclosed herein.
  • Figure 1A shows an example of a single locus in two DNA profiles produced from two different samples.
  • Table 1 Terms used to compute the posterior genotype probabilities for the assumed common donor of Figure 1A.
  • the LR will be the inverse of the random match probability of [C,D] for a matching person of interest (POI) and 0 otherwise.
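The statement above, that a fully resolved donor genotype gives an LR equal to the inverse of the random match probability for a matching person of interest and 0 otherwise, can be sketched for a single locus as follows. This is a minimal illustration rather than the method of the disclosure: it assumes Hardy-Weinberg proportions with no subpopulation correction, and the function names, allele labels, and allele frequencies are hypothetical.

```python
def genotype_probability(genotype, allele_freqs):
    """Hardy-Weinberg probability of an unordered genotype (a, b)."""
    a, b = genotype
    if a == b:
        return allele_freqs[a] ** 2
    return 2 * allele_freqs[a] * allele_freqs[b]


def resolved_genotype_lr(resolved_genotype, poi_genotype, allele_freqs):
    """LR at one locus when the donor genotype is fully resolved.

    Returns 1 / (random match probability) if the person of interest
    matches the resolved genotype, and 0 otherwise.
    """
    if sorted(resolved_genotype) != sorted(poi_genotype):
        return 0.0
    return 1.0 / genotype_probability(resolved_genotype, allele_freqs)


# Hypothetical allele frequencies, for illustration only.
freqs = {"C": 0.1, "D": 0.2}
lr_match = resolved_genotype_lr(("C", "D"), ("D", "C"), freqs)     # about 25
lr_mismatch = resolved_genotype_lr(("C", "D"), ("C", "C"), freqs)  # 0.0
```

With these made-up frequencies the random match probability of the heterozygote is 2 x 0.1 x 0.2 = 0.04, so a matching person of interest gives an LR of about 25.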
  • Example 1 Improved resolving of a queried common donor
  • Examples 1 and 2 required large numbers of pairs of mixtures with and without common donors. Artificial mixtures were created by adding up the peak heights of real single source GlobalFiler profiles from the PROVEDIt dataset. Profiles originated from a donation of 0.03 ng to 7.5 ng of DNA. The profiles used originated from a total of six different donors. Examples 3 and 4 make use of GlobalFiler profiles from the PROVEDIt dataset.
  • Example 1.1 Single source profiles Methodology for single source profiles
  • H 1a The POI is the source of the DNA
  • H 1b The POI is the common source of DNA to both profiles
  • H 2b An unrelated person is the donor of DNA to both profiles.
  • Figure 4 part A and B respectively.
  • Figure 4 shows that the common donor assumption has the most effect when: at least one of the two profiles has a reasonable amount of information; neither of the two profiles provides all possible information (a complete profile).
  • Example 1.2 Two person mixtures Methodology for two person mixtures
  • the resultant LR is called LR mix-mix k l , for the comparison of mixture 1
  • the common donor LR for a person compared to mixtures 1 and 2 was calculated (as per the equation given in section (2)) by: where j and k are the compared contributor positions of mixture A and B respectively and have been chosen to maximise the separate LRs in the numerator.
  • the common donor LR calculated represents the propositions:
  • Figure 6 shows that the power to exclude non-contributors further increases by combining the two mixtures.
  • the few LRs from individual comparisons giving support for inclusion decreased when the two samples were used jointly.
  • although the fraction of non-zero LRs was already very low for the separate comparisons (2%), it decreased further (to 0.017%) in common donor comparisons.
  • Example 1.3 Three person profiles Methodology for three person profiles
  • Figures 9-13 show a thinned selection of the results of Example 1.3. These are filtered to exclude profiles where the known contributors give exclusionary LRs to their individual mixtures (likely caused by the method of artificial mixture construction).
  • Figure 12 shows the situation where two profiles are being considered as having a common donor to one specific contributor position pair, and that this pair is a common donor, but not the POI. However, in Figure 12 there are more than the one common donor to the profile and the POI is present in one (HA true, Figure 12 first panel) or both (H 5 true, Figure 12 second panel) of the mixtures.
  • See Figure 2 for a schematic representation of the propositions.
  • Figure 12 lower shows that the result of a common donor LR calculation can, in these circumstances, produce a wide variety of outcomes. The result is dependent on the make-up of the mixture and how prominently the POI's genotypes are spread through other contributor positions in one or both mixture deconvolutions.
  • the general trend is that the common donor LR correctly decreases compared to the maximum individual LR, but there are a number of instances where it is increased.
  • Figures 10 and 13 demonstrate the expected behaviour, i.e., that there was mostly support for the exclusion of the POI from the profiles individually, and that this support for exclusion was increased when considering the common donor scenario propositions.
  • LR common i.e., that there is only one common donor, and all other non-common components of all mixtures are unrelated to the POI
  • the two components being assumed to be from a common donor are from a common donor.
  • Figure 14 shows the results of LRs when the assumption of any common donor is false, i.e., the two mixtures do not contain any common donors. The scenario considered was one in which the POI has contributed DNA to one of the two samples, and the figure compares the LR common to the maximum LR of the two individual profiles compared to the POI.
  • the POI is likely being fitted into minor components in other mixtures where dropout is invoked at many (if not all) loci.
  • This result gives some confidence that if a POI is a donor to one sample, and a common donor analysis is carried out assuming them to be a common donor with another sample to which they are not a contributor, then the outcome is not likely to be a vast inflation from the individual LR to the common donor LR. In fact the effect acts in the opposite, more conservative, direction.
  • Example 2.1 Additional resolution of a queried non-common donor when assuming the presence of a common donor
  • H 1a D and one mixture has DNA donated by the POI
  • H 2a and neither mixture has DNA donated by the POI, when it is known that exactly one contributor (the POI) is in common (and has donated DNA to position i in profile 1 and position j in profile 2) and: and one mixture has DNA donated by the POI, and neither mixture has DNA donated by the POI, when it is known that exactly two contributors (one of which being the POI) are in common (one has donated DNA to position i in profile 1 and position j in profile 2 and the other has donated DNA to position k in profile 1 and position l in profile 2) and i ≠ k, j ≠ l.
  • the DNA originates from 3 unknown individuals, i.e., without the knowledge of the existence of the second profile.
  • Figure 15 shows the results of Example 2.1.
  • the results of this experiment show a similar trend to the single source profiles in Example 1.1, i.e., that when the queried contributor is already almost resolved, there is little to no additional information that can be provided by additional profiles. This is seen in Figure 15 by the only modest changes in LR for the points with an original log10 LR > 25. Also in common with the results from Example 1.1, when the donors to contributor positions with very little genotype resolution are provided information about non-queried common donors, there is again only a modest change in LR, and not always in a positive direction.
  • H 1b and one mixture has DNA donated by the POI
  • H 2b and neither mixture has DNA donated by the POI, where i ≠ k, j ≠ l.
  • H 1 propositions i.e., H 1a , H 1b , and
  • Example 2.2 Additional resolution of a queried common donor profile when assuming other non-queried common donor(s)
  • Table 1 shows the details of the DNA profiles chosen, the assigned position based on the LR and the known position.
  • Table 1 Details of the samples that were included in Example 2.2. n/a signifies a position to which a reference was not assigned. This is based on the LR as explained in section (2), above.
  • H 1a K29 is a donor of DNA in each of the positions in Table 1 and all other donors are unrelated and unshared between mixtures, H 2a All donors in all profiles are unrelated and unshared.
  • H 1b K29 is a donor of DNA in each of the positions in Table 1 and all other donors are shared between mixtures as indicated in Table 1,
  • H 2b The shared donor to all mixtures is someone unrelated to K29 and other donors are shared between mixtures as indicated in Table 1.
  • Figure 16 shows the result of Example 2.2.
  • the circles below the bar plot show the combination of profiles, from the 16 individual profiles on the left, being combined pair-wise and eventually becoming one analysis that includes all 16 profiles at the end.
  • the bars show the LR when compared to K29 using either the single source propositions (first 16) or the common donor propositions either without providing information about other common donors in the mixture (represented as the white bars) or additional discrimination power of the LR when the other common donor information was provided (black part of the bar).
  • the numerical value above the bar in Figure 16 shows the log10 LR obtained by comparison to K29 when non-queried common donor information was also provided. When all 16 profiles were used together the queried common donor LR almost reached the log of the inverse of the match probability (dashed line in Figure 16).
  • Figure 16 illustrates the power provided by combining multiple profiles, even if they individually may not provide much support for the inclusion of an individual (or even mild support for their exclusion). However, when the profiles being combined all, or mostly, support exclusion, this will tend to increase the support for exclusion in the common donor LR. An example of this can be seen in the combination of the first two pairs (resulting in common donor log10 LRs of -0.11 and -1.87) to produce the first of the four-profile results (resulting in a common donor log10 LR of -2.02).
  • the black portions of the columns in Figure 16 show the additional discrimination power when information about the other common donors to the profiles is included in the common donor LR calculation.
  • in some instances the additional power that providing the non-queried common contributor information provides is very minor, particularly when the POI is strongly included in one or more of the profiles being combined.
  • in other instances the additional power provided is substantial, particularly when the POI's inclusion to the individual profiles is weak.
  • Example 3 Assessment of a large network of inter-related profiles
  • Sample 1 A swab of a set of digital scales - two-person (4:1)
  • PROVEDIt sample B01_RD14-0003-31_32-l;l-Mla-0.25GF-Q1.2. None of the samples yielded a profile that could be interpreted or uploaded to the DNA database (the greatest number of alleles that can be interpreted with a weight threshold of 0.99 is six, which corresponds to person 50 from sample 2).
  • a graphical layout of the profiles and references is shown in Figure 3, showing links between all common contributors, and references. It was presumed that references for individuals K29 and K32 were available, and that there was interest in the minor contributor to sample 1 (person K48). Individuals K29, K32 and K48 are also shown in the network in Figure 3.
  • the LR comparing K48 to sample 1 and the number of uploadable alleles are determined for various combinations of information in order to establish when the most information is obtained.
  • the scenario outlined in set 1 corresponds to the typical analysis of sample 1 in isolation from any other profile information in the case. It was noted that only 3 alleles achieve weights >0.99, which would not meet most upload criteria. The LR represents a lower bound on what would be expected when more information is provided.
  • the scenario in set 2 corresponds to the situation where the most information possible would be expected. This would occur if the reference of the other contributor to sample 1 (i.e., K49) was available and it was possible to assume them as a contributor to then interpret the minor.
  • Adding information to increase resolution can be done either by adding profiles or by adding common donor information.
  • Example 4 Assessment of a series of low level single source profiles
  • LR was assigned for the hypotheses: where S is replaced with each database profile. It was noted that the empty configuration was omitted, because the LR would be 1 for each database profile.
  • the database comprised one million GlobalFiler profiles sampled according to allele frequencies (Moretti et al, 2016) as well as 50 references from the PROVEDIt dataset.
  • Figure 18 shows that not only the support for the presence of true contributors increases when more samples are considered simultaneously, but also the power to eliminate non-contributors increases. This is an especially encouraging result, since the large number of common donor configurations that were considered led to a huge number of comparisons to non-contributors. Nevertheless, it was noted that there is some possibility for adventitious inclusions. For instance, the maximum LR for a non-contributor is slightly higher when pairs of samples are considered versus when samples are considered individually.
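The simulated database described for Example 4, with profiles sampled according to allele frequencies, could be generated along the following lines. This is only a sketch under assumptions that the text does not state: the two alleles at each locus are drawn independently (Hardy-Weinberg and linkage equilibrium), and the locus names and frequencies below are hypothetical placeholders, not the published Moretti et al. values.

```python
import random


def sample_profile(allele_freqs_by_locus, rng):
    """Draw one profile: two alleles per locus, weighted by frequency."""
    profile = {}
    for locus, freqs in allele_freqs_by_locus.items():
        alleles = list(freqs)
        weights = [freqs[a] for a in alleles]
        pair = rng.choices(alleles, weights=weights, k=2)
        profile[locus] = tuple(sorted(pair))
    return profile


def sample_database(allele_freqs_by_locus, n_profiles, seed=0):
    """Draw n_profiles independent profiles with a reproducible seed."""
    rng = random.Random(seed)
    return [sample_profile(allele_freqs_by_locus, rng) for _ in range(n_profiles)]


# Hypothetical loci and frequencies, for illustration only.
HYPOTHETICAL_FREQS = {
    "LOCUS_A": {"12": 0.3, "13": 0.5, "14": 0.2},
    "LOCUS_B": {"8": 0.6, "9": 0.4},
}
database = sample_database(HYPOTHETICAL_FREQS, n_profiles=1000)
```

A fixed seed keeps the simulated database reproducible, which matters when LR distributions for non-contributors are compared across runs.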
  • Example 1.1 compared the likelihood ratios of separate comparisons to pairs of single source profiles to the likelihood ratio obtained using a common donor analysis. Particularly large increases in evidential value were seen for the pairs of profiles that were at least weakly informative separately. For these single source profiles, the additional resolution comes from a few mechanisms. Most notably, if the two profiles exhibit dropout at different loci, then the joint analysis allows for a better reconstruction of the genotype of the donor. Another effect is that, for low level samples, it becomes easier to distinguish homozygous genotypes from heterozygous genotypes with dropout if repeat observations are available.
  • Examples 1.2 and 1.3 considered DNA mixtures. The results showed, again, that for true contributors the common donor LR is almost always higher than the maximum of the separate LRs. Likewise, a correct common donor assumption allowed for greater exclusionary potential of non-contributors. Example 2 considered the effect of including an assumption about a non-queried common donor. For some cases the increase in likelihood ratio was dramatic.
  • An illustrative example of a wider network of samples was presented in Example 3. A POI was compared to the minor component of a low level mixture. An increase in the LR was obtained by adding more information through assumed common donors with other samples. The amount of evidential value that could be added was limited by the alleles of the POI that were present in the low level mixture (had not dropped out). The addition of other samples led to higher confidence in the deconvolution of the mixture to the extent that it was feasible for a classical database upload of resolved alleles.
  • the present results demonstrate various ways in which a common donor assumption can be utilised. It may be the case that there are multiple samples in a case, which individually have little power to discriminate the true DNA donor from non-donors, but when considered together have a much-improved ability to do so. It may also be that there is a donor who is not believed to be common to multiple samples, but other donors are. Again, providing this type of information to the analysis can provide much greater ability to discriminate contributors from non-contributors.
  • a difficulty that may be faced when conducting a common contributor analysis is deciding which contributors are assumed to be in common.
  • Assessment may be more challenging when there is a group of profiles with potentially multiple different common contributors at play.
  • The results in Example 1.3 show that there are situations where a common contributor LR can be potentially misleading. This typically would occur under the somewhat artificial scenario of the POI being a contributor to the mixtures, but not in the positions that are being posited as the common contributor. Typically, this will have its greatest effect when there is scope for the genotypes of the contributors to be spread across multiple contributor positions. The results are believed to point to the right conclusion, but do not have the discrimination power they could have if the correct contributor positions were chosen. Again, the use of a wildcard may be helpful.
  • The use of common donor analyses in forensic science and conclusions
  • This information may be to provide greater resolution in a common donor to many samples, or it may be to provide resolution in a donor to just one sample, but using the information that other, non-queried, contributors to the profile are in common between related mixtures.
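The gain from a common donor assumption for two single source profiles, where the joint analysis reconstructs the donor genotype better than either profile alone, can be illustrated with a small worked sketch. It is not the equation of the disclosure, which is not reproduced in this text. It follows from Bayes' theorem under two stated assumptions: each profile has been deconvolved separately into posterior genotype weights w_i(g) using the same prior p(g), and the two profiles are independent given the donor genotype. Under those assumptions LR_i = w_i(s) / p(s) for a person of interest with genotype s, and LR_common = [w_1(s) w_2(s) / p(s)^2] / sum over g of [w_1(g) w_2(g) / p(g)]. All genotype labels, weights, and prior values below are hypothetical.

```python
def single_lr(weights, prior, poi):
    """LR for one profile: posterior genotype weight divided by prior."""
    return weights.get(poi, 0.0) / prior[poi]


def common_donor_lr(weights_1, weights_2, prior, poi):
    """LR that the person of interest, rather than an unknown person,
    is the common donor of two separately deconvolved profiles."""
    numerator = (
        weights_1.get(poi, 0.0) * weights_2.get(poi, 0.0) / prior[poi] ** 2
    )
    denominator = sum(
        weights_1[g] * weights_2.get(g, 0.0) / prior[g] for g in weights_1
    )
    if denominator == 0.0:
        raise ValueError("no genotype is possible for both profiles")
    return numerator / denominator


# Hypothetical single locus: profile 1 cannot rule out dropout of D,
# profile 2 cannot rule out dropout of C.
prior = {"CC": 0.01, "CD": 0.04, "DD": 0.04}
weights_1 = {"CC": 0.6, "CD": 0.4}
weights_2 = {"CD": 0.5, "DD": 0.5}

lr_1 = single_lr(weights_1, prior, "CD")                      # about 10
lr_2 = single_lr(weights_2, prior, "CD")                      # about 12.5
lr_common = common_donor_lr(weights_1, weights_2, prior, "CD")  # about 25
```

In this toy case only the genotype CD is compatible with both profiles, so the common donor LR reaches the inverse of the prior genotype probability (1 / 0.04), which is higher than either individual LR.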

Landscapes

  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Medical Informatics (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • General Health & Medical Sciences (AREA)
  • Theoretical Computer Science (AREA)
  • Biophysics (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Biotechnology (AREA)
  • Databases & Information Systems (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Software Systems (AREA)
  • Public Health (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Data Mining & Analysis (AREA)
  • Epidemiology (AREA)
  • Bioethics (AREA)
  • Proteomics, Peptides & Aminoacids (AREA)
  • Analytical Chemistry (AREA)
  • Genetics & Genomics (AREA)
  • Molecular Biology (AREA)
  • Chemical & Material Sciences (AREA)
  • Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)

Abstract

The present invention relates to methods and systems for determining the contributor of a nucleic acid in a set of nucleic acids. The invention specifically relates to methods and systems for determining whether a person of interest has contributed a nucleic acid to a set of nucleic acids, for example, where one or more common contributors are assumed, and where all combinations of common donors and queried donors are tested.
PCT/NZ2021/050089 2020-06-10 2021-06-10 Methods and systems for identifying nucleic acids Ceased WO2021251834A1 (fr)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US202063037475P 2020-06-10 2020-06-10
US63/037,475 2020-06-10

Publications (1)

Publication Number Publication Date
WO2021251834A1 true WO2021251834A1 (fr) 2021-12-16

Family

ID=78846310

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/NZ2021/050089 Ceased WO2021251834A1 (fr) 2020-06-10 2021-06-10 Methods and systems for identifying nucleic acids

Country Status (1)

Country Link
WO (1) WO2021251834A1 (fr)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN119028440A (zh) * 2024-08-22 2024-11-26 Sichuan University A mixed DNA evidence analysis method based on STR genotyping profiles

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20020009725A1 (en) * 1999-12-22 2002-01-24 Peter Gill Identification
WO2003056035A1 (fr) * 2001-12-21 2003-07-10 The Secretary Of State For The Home Department Improvements relating to the interpretation of DNA
US20090132173A1 (en) * 2007-11-19 2009-05-21 Forensic Science Service Ltd Computing likelihood ratios using peak heights
WO2011110853A1 (fr) * 2010-03-10 2011-09-15 Forensic Science Service Limited Improvements relating to the consideration of evidence
US20170206311A1 (en) * 2008-07-23 2017-07-20 The Translational Genomics Research Institute Method of characterizing sequences from genetic material samples
US20190102517A1 (en) * 2017-10-01 2019-04-04 Syracuse University Hierarchical optimized detection of relatives

Similar Documents

Publication Publication Date Title
Rokas et al. Genome-scale approaches to resolving incongruence in molecular phylogenies
US20180018422A1 (en) Systems and methods for nucleic acid-based identification
Neidhart et al. Adaptation in tunably rugged fitness landscapes: the rough Mount Fuji model
Bloom et al. Finding the sources of missing heritability in a yeast cross
Si et al. Model-based clustering for RNA-seq data
Rau et al. Co-expression analysis of high-throughput transcriptome sequencing data with Poisson mixture models
US12065696B2 (en) Systems and methods for genetic identification and analysis
EP1229135A2 (fr) Method and system for DNA mixture analysis
US20140052383A1 (en) Systems and methods for identifying a contributor's str genotype based on a dna sample having multiple contributors
Dueck et al. Assessing characteristics of RNA amplification methods for single cell RNA sequencing
US20190177719A1 (en) Method and System for Generating and Comparing Reduced Genome Data Sets
Collet et al. Mutational pleiotropy and the strength of stabilizing selection within and between functional modules of gene expression
Michaut et al. Multiple genetic interaction experiments provide complementary information useful for gene function prediction
Weber et al. Identification of gene regulation models from single-cell data
Grover et al. Searching microsatellites in DNA sequences: approaches used and tools developed
WO2021251834A1 (fr) Methods and systems for identifying nucleic acids
Samyak et al. Statistical summaries of unlabelled evolutionary trees
Agudo et al. A comparison of likelihood ratios calculated from surface DNA mixtures using MPS and CE Technologies
Graversen Statistical and computational methodology for the analysis of forensic DNA mixtures with artefacts
Chen et al. Initial large-scale exploration of protein-protein interactions in human brain
Piry et al. High throughput amplicon sequencing to assess within-and between-host genetic diversity in plant viruses
Patil et al. CoalQC-Quality control while inferring demographic histories from genomic data: Application to forest tree genomes
Oloomi The impact of multi-mappings in short read mapping
Moutinho et al. The silent impact: codon usage bias and protein evolution in bacteria
Singh Probabilistic Approach to Understand Errors in Sequencing and its based Applications

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21822292

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 21822292

Country of ref document: EP

Kind code of ref document: A1