IL310301B1 - Methods for non-invasive prenatal testing of expansion mutations - Google Patents
- Publication number: IL310301B1
- Authority: Israel (IL)
Classifications
- G16H50/30: ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for calculating health indices; for individual health risk assessment
- C12Q1/6809: Methods for determination or identification of nucleic acids involving differential detection
- C12Q1/6883: Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes for diseases caused by alterations of genetic material
- G16B20/20: Allele or variant detection, e.g. single nucleotide polymorphism [SNP] detection
- G16B20/50: Mutagenesis
- G16B40/00: ICT specially adapted for biostatistics; ICT specially adapted for bioinformatics-related machine learning or data mining, e.g. knowledge discovery or pattern finding
- G16H10/40: ICT specially adapted for the handling or processing of patient-related medical or healthcare data for data related to laboratory analysis, e.g. patient specimen analysis
- G16H50/20: ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for computer-aided diagnosis, e.g. based on medical expert systems
- C12Q2500/00: Analytical methods involving nucleic acids
- C12Q2545/101: Reactions characterised by their quantitative nature the purpose being quantitative analysis with an internal standard/control
Description
METHODS FOR NON-INVASIVE PRENATAL TESTING OF EXPANSION MUTATIONS
TECHNOLOGICAL FIELD
The present disclosure relates to the field of prenatal genetic analysis.
REFERENCES
Dashnow H, Lek M, Phipson B, Halman A, Sadedin S, Lonsdale A, et al. STRetch: detecting and discovering pathogenic short tandem repeat expansions. Genome Biology. 2018;19:121.
Dolzhenko et al. ExpansionHunter: a sequence-graph-based tool to analyze variation in short tandem repeat regions. Bioinformatics. 2019;35(22):4754-4756.
Liautard-Haag C, Durif G, VanGoethem C, Baux D, Louis A, Cayrefourcq L, et al. Noninvasive prenatal diagnosis of genetic diseases induced by triplet repeat expansion by linked read haplotyping and Bayesian approach. Sci Rep. 2022;12:11423.
Rabinowitz T, Polsky A, Golan D, Danilevsky A, Shapira G, Raff C, et al. Bayesian-based noninvasive prenatal diagnosis of single-gene disorders. Genome Res. 2019. https://doi.org/10.1101/gr.235796.118.
Rabinowitz T, Deri-Rozov S, Shomron N. Improved noninvasive fetal variant calling using standardized benchmarking approaches. Computational and Structural Biotechnology Journal. 2021;19:509–17.
US2021/0340601.
Yu SCY, Jiang P, Peng W, Cheng SH, Cheung YTT, Tse OYO, et al. Single-molecule sequencing reveals a large population of long cell-free DNA molecules in maternal plasma. Proceedings of the National Academy of Sciences. 2021;118:e2114937118.
BACKGROUND
In recent years, a growing number of genetic mutations have been clinically tested via Noninvasive Prenatal Testing (NIPT), enabling pregnant women to identify genetic malformations without the risks of invasive procedures. Rabinowitz et al. (2019) describe a genome-wide NIPT approach, termed noninvasive prenatal variant calling. Using Hoobari, the first noninvasive fetal variant caller, they were able to genotype all fetal positions, including biparental loci and indels (US2021/0340601). Hoobari is a tool that employs a Bayesian algorithm to predict the inheritance of monogenic diseases, irrespective of their mode of inheritance or parental origin (Rabinowitz et al. 2019; US2021/0340601; Rabinowitz et al. 2021). The tool estimates the likelihood that each cfDNA fragment is of fetal origin by considering its length. Hoobari can detect single nucleotide polymorphisms (SNPs) and small indels (insertions and deletions of bases in the genome).
Short tandem repeat (STR) mutations arise when an error occurs during DNA replication, leading to an increase in the number of repeats of common trinucleotide sequences such as CAG, GCG, and others. In each of the diseases involving this mechanism, there are two thresholds: one that separates a normal allele from a pre-mutation, and a second that separates a pre-mutation from a full mutation. The second threshold is the number of repeats above which the phenotype is considered a disease. These thresholds differ between diseases. Although there are invasive clinical tests for analyzing STRs (based on chorionic villus sampling or amniotic fluid), no noninvasive clinical test for analyzing STRs has yet been proven successful.
STRs can cause expansion mutations. Detecting expansion mutations is a crucial goal because over 40 diseases, primarily affecting the nervous system, stem from expansions of simple sequence repeats scattered across the human genome, e.g., Fragile X syndrome.
Liautard-Haag et al. (2022) describe an indirect method that used linked reads to sequence and phase (i.e., to attain haplotype information for) the parents’ genomic sequence data, using markers present around expansion mutations. As a next step, the authors employed Hoobari to predict the inherited haplotype.
Yu et al. (2021) describe the sequencing of long cell-free DNA molecules using PacBio long-read sequencing, and the deduction of the fetal haplotype (the inherited allele).
Both Liautard-Haag et al. and Yu et al. deduced the inherited allele, but they did not predict whether an expansion had occurred, or whether it caused a full mutation.
GENERAL DESCRIPTION
In a first of its aspects, the present invention provides a method for genotyping a fetus, comprising:
a. receiving reads of sequencing data of (i) maternal plasma cell-free DNA (cfDNA), and (ii) maternal and optionally paternal genomic DNA (gDNA) from a pair parenting the fetus;
b. identifying potential genomic sites at which the fetus may have a short tandem repeat (STR) variant;
c. for each of the potential genomic sites, determining a probability that the fetus has the STR variant;
d. deducing maternal and optionally paternal STR alleles;
e. processing cfDNA reads in the region of the gene potentially comprising the STR to identify read categories, wherein said read categories comprise spanning reads, partial reads (including indicative and non-indicative reads), and clean reads;
f. calculating the expected ratio between the read categories, thereby obtaining a calculated ratio;
g. determining the most probable allele inherited by the fetus by:
i. measuring the observed ratio between the read categories; and
ii. using the calculated ratio and the observed ratio to distinguish a normal allele from an allele with a premutation or a full mutation, and to distinguish an allele with a premutation from an allele with a full mutation; or
iii. applying a Bayesian algorithm;
thereby genotyping said fetus.
In one embodiment, said reads of sequencing data are short reads.
In one embodiment, said reads of sequencing data are long reads.
In one embodiment, said reads of sequencing data comprise short reads and long reads.
In some embodiments, one or both of the gDNA sequencing data and the cfDNA sequencing data is obtained by a method selected from a group consisting of whole genome sequencing (WGS), whole exome sequencing (WES), next generation sequencing (NGS), targeted sequencing, panel sequencing, gene sequencing, long-read genome sequencing, paired-end sequencing, single-end sequencing, and amplicon sequencing.
In one embodiment, WGS or WES data is obtained by deep sequencing.
In one embodiment, determining said probability is based on at least one Sequence Alignment Map (SAM) parameter.
In one embodiment, determining said probability is based at least on an observed template length.
In one embodiment, determining said probability comprises calculating a total fetal fraction.
In one embodiment, the method further comprises constructing a fetal size distribution and a maternal size distribution, wherein said determining the probability of step c comprises binning said fetal size distribution and calculating a fetal fraction for each fragment size bin, and calculating, for at least one size and at least one fragment at said at least one site, a probability that said fragment is fetal, based on a fetal fraction of a respective fragment size bin to which said fragment belongs.
In one embodiment, determining said probability comprises applying a Bayesian procedure.
In one embodiment, said Bayesian procedure comprises prior probabilities calculated using sequencing data of at least one of said parents.
In one embodiment, the method further comprises recalibrating the output of said Bayesian procedure using machine learning.
In one embodiment, said determining the probability is performed using fetal variant calling.
In one embodiment, said determining the probability is performed using the Hoobari algorithm.
In one embodiment, said determining the probability comprises calculating a fragmentomics-based probability that the sequence read is of fetal origin.
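The size-binned fetal fraction embodiment can be illustrated with a minimal sketch. It is not the Hoobari implementation; the function names, the 10 bp bin width, and the assumption that some fragments have already been labeled as fetal or maternal (for example from informative sites) are all hypothetical choices made for illustration. Each bin's fetal fraction is computed as a two-component mixture weighted by the total fetal fraction.

```python
from collections import Counter


def size_bin(length, bin_width=10):
    """Return the lower edge of the size bin a fragment length falls into."""
    return (length // bin_width) * bin_width


def per_bin_fetal_fraction(fetal_lengths, maternal_lengths, total_ff, bin_width=10):
    """Estimate, for each size bin, the fraction of fragments expected to be fetal.

    fetal_lengths / maternal_lengths: fragment lengths (bp) of reads already
    labeled as fetal or maternal (hypothetical inputs for this sketch).
    total_ff: total fetal fraction, between 0 and 1.
    """
    if not fetal_lengths or not maternal_lengths:
        raise ValueError("both size distributions need at least one fragment")
    fetal = Counter(size_bin(n, bin_width) for n in fetal_lengths)
    maternal = Counter(size_bin(n, bin_width) for n in maternal_lengths)
    fractions = {}
    for b in set(fetal) | set(maternal):
        # Mixture weight = density of the bin in each distribution
        # multiplied by that source's overall share of the cfDNA.
        w_fetal = total_ff * fetal[b] / len(fetal_lengths)
        w_maternal = (1.0 - total_ff) * maternal[b] / len(maternal_lengths)
        fractions[b] = w_fetal / (w_fetal + w_maternal)
    return fractions


def prob_fragment_is_fetal(length, fractions, total_ff, bin_width=10):
    """Look up the bin-level fetal fraction; fall back to total_ff for unseen bins."""
    return fractions.get(size_bin(length, bin_width), total_ff)
```

In this sketch a fragment whose size bin was never observed falls back to the total fetal fraction, which is one possible design choice among several (smoothing the distributions would be another).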
In one embodiment, the method comprises extracting fragmentomic features for each cfDNA read identified as overlapping a potential genomic site where the fetus may have an STR variant.
In some embodiments, said fragmentomic features comprise one or more of read mapping quality, read base qualities, fragment length, short/long read ratio, end motifs, cleavage patterns around methylation sites, read endpoint preferred end, DNA/accessibility/nucleosome, distance to nearest nucleosome, transcription factor binding sites, regional fetal fraction, regional sequence composition, read sequence composition, and number of sequence errors in the read.
In one embodiment, determining the probability comprises multiplying the variant calling-based probability with the fragmentomics-based probability to obtain a calculated joint probability.
In some embodiments, said step of deducing maternal and optionally paternal STR alleles comprises applying a bioinformatic tool selected from a group consisting of STRetch, GangSTR, ExpansionHunter, HipSTR, Expansion Hunter De Novo (EHdn), STRling, RepeatSeq, LobSTR, adVNTR, and TredParse.
In some embodiments, said STR variant is indicative of a genetic disease selected from a group consisting of Fragile X Syndrome, Fragile X-associated Tremor/Ataxia Syndrome (FXTAS), X-linked Intellectual Disability, Huntington's Disease, Myotonic Dystrophy, Spinocerebellar Ataxias (SCAs), Friedreich's Ataxia, Machado-Joseph Disease, Spinocerebellar Ataxia Type 10 (SCA10), Spinocerebellar Ataxia Type 12 (SCA12), Spinocerebellar Ataxia Type 17 (SCA17), Dentatorubral-Pallidoluysian Atrophy (DRPLA), Spinocerebellar Ataxia Type 8 (SCA8), Spinocerebellar Ataxia Type 31 (SCA31), and Fragile XE Syndrome.
In another aspect, the present invention provides a method for diagnosing a genetic disease associated with, or caused by, short tandem repeats (STR) in a fetus, comprising:
a. receiving reads of sequencing data of (i) maternal plasma cell-free DNA (cfDNA), and (ii) maternal and optionally paternal genomic DNA (gDNA) from a pair parenting the fetus;
b. identifying potential genomic sites at which the fetus may have a short tandem repeat (STR) variant;
c. for each of the potential genomic sites, determining a probability that the fetus has the STR variant;
d. deducing maternal and optionally paternal STR alleles;
e. processing cfDNA reads in the region of the gene potentially comprising the STR to identify read categories, wherein said read categories comprise spanning reads, partial reads (including indicative and non-indicative reads), and clean reads;
f. calculating the expected ratio between the read categories, thereby obtaining a calculated ratio;
g. determining the most probable allele inherited by the fetus by:
i. measuring the observed ratio between the read categories; and
ii. using the calculated ratio and the observed ratio to distinguish a normal allele from an allele with a premutation or a full mutation, and to distinguish an allele with a premutation from an allele with a full mutation; or
iii. applying a Bayesian algorithm;
thereby diagnosing said fetus.
In some embodiments, said genetic disease associated with, or caused by, short tandem repeats (STR) is selected from a group consisting of Fragile X Syndrome, Fragile X-associated Tremor/Ataxia Syndrome (FXTAS), X-linked Intellectual Disability, Huntington's Disease, Myotonic Dystrophy, Spinocerebellar Ataxias (SCAs), Friedreich's Ataxia, Machado-Joseph Disease, Spinocerebellar Ataxia Type 10 (SCA10), Spinocerebellar Ataxia Type 12 (SCA12), Spinocerebellar Ataxia Type 17 (SCA17), Dentatorubral-Pallidoluysian Atrophy (DRPLA), Spinocerebellar Ataxia Type 8 (SCA8), Spinocerebellar Ataxia Type 31 (SCA31), and Fragile XE Syndrome.
In another aspect, the present invention provides a method for diagnosing Fragile X syndrome in a fetus, comprising:
a. receiving reads of sequencing data of (i) maternal plasma cell-free DNA (cfDNA), and (ii) maternal genomic DNA (gDNA);
b. deducing maternal FMR1 gene alleles;
c. calculating the fetal fraction within the cfDNA;
d. processing cfDNA reads in the region of the FMR1 gene to identify read categories, wherein said read categories comprise spanning reads, partial reads (including indicative and non-indicative reads), and clean reads;
e. calculating the expected ratio between the read categories, thereby obtaining a calculated ratio;
f. determining the most probable allele inherited by the fetus by:
i. measuring the observed ratio between the read categories; and
ii. using the calculated ratio and the observed ratio to distinguish a normal allele from an allele with a premutation or a full mutation, and to distinguish an allele with a premutation from an allele with a full mutation; or
iii. applying a Bayesian algorithm;
wherein an imbalance towards spanning reads indicates that the fetus has a high likelihood of being unaffected with Fragile X syndrome, an imbalance towards indicative reads indicates that the fetus has a high likelihood of having a premutation or full mutation in the FMR1 gene, and an imbalance towards indicative reads and a ratio between clean reads and indicative reads that is higher than the calculated ratio indicates that the fetus has a high likelihood of having a full mutation in the FMR1 gene leading to Fragile X syndrome.
In one embodiment, said step of calculating the expected ratio between the read categories comprises normalizing the read counts.
In some embodiments, said normalizing is performed by one or more of:
a. in-sample normalization;
b. between-samples normalization;
c. comparison of fetal-enriched and maternal-enriched reads;
d. haplotype comparison; and
e. correction for paired-end NGS technology.
In one embodiment, said between-samples normalization comprises creating a panel-of-normals (PON).
In another aspect, the present invention provides a computer software product, comprising a computer-readable medium in which program instructions are stored, which instructions, when read by a data processor, configure the data processor to (1) receive reads of sequencing data of (i) maternal cell-free DNA (cfDNA), and (ii) maternal and optionally paternal genomic DNA (gDNA) from a pair parenting a fetus, and to (2) execute the method of the invention. In another aspect, the present invention provides a system for genotyping a fetus, comprising: an input utility for receiving reads of sequencing data of (i) maternal cell-free DNA (cfDNA), and (ii) maternal and optionally paternal genomic DNA (gDNA) from a pair parenting a fetus; and a data processor configured for analyzing said data for executing the method of the invention.
BRIEF DESCRIPTION OF THE DRAWINGS
For better understanding the subject matter that is disclosed herein and to exemplify how it may be carried out in practice, embodiments will now be described, by way of non-limiting example only, with reference to the accompanying drawings, in which:
Figure 1 is an outline exemplifying a method suitable for fetal genotyping, according to various exemplary embodiments of the present invention.
Figure 2 is an exemplary STR-containing gene and its aligned read categories.
Figure 3 is an outline exemplifying a method suitable for fetal genotyping, specifically for identification of expansion mutations associated with Fragile-X syndrome.
DETAILED DESCRIPTION OF EMBODIMENTS
The present invention provides methods and systems for predicting the risk of inheritance of an expansion mutation, for distinguishing pre-mutated fetuses from healthy fetuses with high probability, and for estimating the risk of inheriting a full mutation. The methods and systems of the invention apply to all types of expansion mutations. A non-limiting example of the methods and systems of the invention concerns the identification of expansion mutations associated with Fragile-X syndrome. In accordance with the invention, trinucleotide expansion mutations of maternal origin in the fetus are detected via NIPT.
The method of the invention can be performed with long-read and/or short-read NGS technology. It is noted that although linked-read sequencing, or long cfDNA molecules may be used in accordance with the method of the invention, there is no requirement for linked-read sequencing, or for long cfDNA molecules. The method of the invention is direct, i.e., it detects the expansion itself rather than markers around it. The method is optionally combined with phasing of the parental genome to increase the accuracy of the predictions.
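To make the idea of read categories concrete, the following sketch assigns a category to a read that overlaps the repeat region. The definitions used here are assumptions made for illustration and are not taken from the text: a spanning read is assumed to cover both flanks of the repeat, a partial read one flank only (indicative if it carries more repeat units than the known normal allele, non-indicative otherwise), and a clean read is assumed to consist of repeat sequence only. The function and parameter names are hypothetical.

```python
from collections import Counter


def categorize_read(has_left_flank, has_right_flank, repeat_units, normal_repeats):
    """Assign an assumed read category to a read overlapping an STR locus.

    has_left_flank / has_right_flank: whether the read aligns to the unique
    sequence on that side of the repeat.
    repeat_units: number of repeat units observed in the read.
    normal_repeats: repeat count of the known normal parental allele.
    """
    if has_left_flank and has_right_flank:
        return "spanning"
    if has_left_flank or has_right_flank:
        # A one-flank read longer than the normal allele can only come
        # from an expanded allele, so it is treated as indicative.
        if repeat_units > normal_repeats:
            return "partial_indicative"
        return "partial_non_indicative"
    if repeat_units == 0:
        raise ValueError("read has neither flank nor repeat units")
    return "clean"


def count_categories(reads, normal_repeats):
    """Count categories for an iterable of (left, right, repeat_units) tuples."""
    return Counter(
        categorize_read(left, right, units, normal_repeats)
        for left, right, units in reads
    )
```

Category counts produced this way would be the raw input to any ratio calculation; normalization and bias correction are outside the scope of the sketch.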
As used herein the term “premutation” (PM) refers to a range of repeats that undergo expansion but do not cause an underlying disease. For example, in the FMR1 gene (the gene responsible for Fragile X syndrome), 5-44 is the normal repeat range, 45-54 is a gray zone (also termed intermediate allele), and 55-200 is considered a pre-mutation. Between 55 and 200 repeats, there is an expansion in the next generation, which by itself does not cause Fragile X syndrome although it can cause other phenotypes in females. It is therefore termed a premutation. The FMR1 gene in its premutation (PM) state has been linked to a range of clinical and subclinical phenotypes among FMR1 PM carriers, including some subclinical traits associated with autism spectrum disorder (ASD). When there are over 200 repeats, it is considered a full mutation.
As used herein the term “full mutation” refers to an expansion of the repeat region to the extent that it exceeds the threshold defining a disease. For example, in Fragile X syndrome, this corresponds to 200 repeats. Short tandem repeat mutations are often characterized by genotype-phenotype correlation, so within a certain range of repeats, the more repeats there are, typically the more severe the phenotype, or the greater the likelihood of a full mutation.
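The FMR1 repeat ranges given above can be written as a small classifier. This is a sketch only: the function name is hypothetical, the upper bound of 54 for the gray zone is inferred from the premutation range starting at 55, and counts below 5 are simply grouped with the normal range.

```python
def classify_fmr1_repeats(repeat_count):
    """Map an FMR1 repeat count to the allele class described in the text."""
    if repeat_count < 0:
        raise ValueError("repeat count cannot be negative")
    if repeat_count <= 44:
        return "normal"
    if repeat_count <= 54:
        # Gray zone; upper bound assumed from the premutation start at 55.
        return "intermediate"
    if repeat_count <= 200:
        return "premutation"
    # More than 200 repeats is considered a full mutation.
    return "full mutation"
```

Other expansion diseases would need their own thresholds, since the text notes that the thresholds differ between diseases.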
As used herein the term “anticipation” refers to a phenomenon where the disease appears at an earlier age each generation, due to the expansion growing larger.
Accordingly, in an aspect, the present invention provides a method of genotyping a fetus, comprising:
a. receiving reads of sequencing data of (i) maternal plasma cell-free DNA (cfDNA), and (ii) maternal and optionally paternal genomic DNA (gDNA) from a pair parenting the fetus;
b. identifying potential genomic sites at which the fetus may have a short tandem repeat (STR) variant;
c. for each of the potential genomic sites, determining a probability that the fetus has the STR variant;
d. deducing maternal and optionally paternal STR alleles;
e. processing cfDNA reads in the region of the gene potentially comprising the STR to identify read categories, wherein said read categories comprise spanning reads, partial reads, and clean reads;
f. calculating the expected ratio between the read categories, thereby obtaining a calculated ratio;
g. determining the most probable allele inherited by the fetus by:
i. measuring the observed ratio between the read categories; and
ii. using the calculated ratio and the observed ratio to distinguish a normal allele from an allele with a premutation or a full mutation, and to distinguish an allele with a premutation from an allele with a full mutation; or
iii. applying a Bayesian algorithm;
thereby genotyping said fetus.
Figure 1 shows an exemplary outline of an embodiment of the method of the invention for fetal genotyping. First, parental (maternal and optionally paternal) and cfDNA BAM files containing parental and fetal-maternal reads, respectively, and the fetal fraction (FF; e.g., 10% as shown in Figure 1) are obtained (steps (a) - (c) in the method described above). From the parental BAMs the parental alleles are deduced (step (d)), and from the cfDNA BAM the reads that correspond to the inquired gene are processed (step (e)). The expected ratio between spanning reads that originate from a known parental normal allele and indicative reads that originate from a pre-mutation is calculated based on the FF. Expected ratios are corrected for biological, technological, or any other systematic biases and errors (step (f)). Based on the observed ratios, it is determined whether or not the fetus inherited the normal allele; based on the ratio of clean reads to other reads (e.g., indicative reads), the probability of a full mutation is predicted (step (g) (i) and (ii)).
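The comparison of expected and observed ratios can be illustrated with a deliberately simplified model. Everything below is an assumption made for the sketch, not the formula of the invention: the mother carries one normal and one expanded allele, the fetus carries two alleles with a normal paternal allele, spanning reads arise only from normal alleles and indicative reads only from expanded alleles, each allele yields reads in proportion to its share of the cfDNA, and no bias correction is applied. Under these assumptions the expected spanning share depends only on the fetal fraction, and a binomial likelihood gives a posterior for the two inheritance hypotheses.

```python
import math


def expected_spanning_fraction(ff, inherited_normal):
    """Expected share of spanning reads among spanning + indicative reads."""
    maternal_normal = (1.0 - ff) / 2.0   # mother's own normal allele
    fetal_paternal = ff / 2.0            # paternal allele in the fetus (assumed normal)
    fetal_maternal = ff / 2.0            # allele the fetus inherited from the mother
    share = maternal_normal + fetal_paternal
    if inherited_normal:
        share += fetal_maternal
    return share


def log_binomial(k, n, p):
    """Log-probability of k successes in n trials with success probability p."""
    log_choose = math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
    return log_choose + k * math.log(p) + (n - k) * math.log(1.0 - p)


def posterior_inherited_normal(spanning, indicative, ff, prior_normal=0.5):
    """Posterior probability that the fetus inherited the maternal normal allele."""
    n = spanning + indicative
    log_normal = math.log(prior_normal) + log_binomial(
        spanning, n, expected_spanning_fraction(ff, True))
    log_expanded = math.log(1.0 - prior_normal) + log_binomial(
        spanning, n, expected_spanning_fraction(ff, False))
    # Normalize in log space to avoid underflow with large read counts.
    top = max(log_normal, log_expanded)
    w_normal = math.exp(log_normal - top)
    w_expanded = math.exp(log_expanded - top)
    return w_normal / (w_normal + w_expanded)
```

With a fetal fraction of 10%, the two hypotheses differ only by 0.55 versus 0.50 in the expected spanning share, which shows why read depth and bias correction matter so much for this kind of inference.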
Cell-free DNA (cfDNA), also referred to as “circulating free DNA”, refers to DNA fragments existing outside of cells in vivo and circulating in body fluids such as blood plasma. The fragments of cfDNA typically have lengths ranging from about 150 to 200 base pairs (bp), averaging about 170 bp, which presumably relates to the length of a DNA stretch wrapped around a nucleosome. During pregnancy, cell-free fetal DNA can be found circulating in maternal plasma. Thus, the cfDNA in maternal plasma is a mixture of both maternal and fetal DNA; both the total amount of cfDNA and the fraction of fetal DNA within it increase throughout pregnancy. The term cfDNA also refers to fragments of DNA that have been obtained from the in vivo extracellular sources and separated, isolated, or otherwise manipulated in vitro. cfDNA can be obtained by extracting DNA from blood plasma after removal of intact cells. Methods for extracting cfDNA are well known in the art, for example, as shown in the Examples below.
The term “genomic DNA” or “gDNA” herein refers to DNA existing in a cell in vivo and containing a complete genome of the cell or organism. The term also refers to DNA that has been obtained from the in vivo cell and separated, isolated, or otherwise manipulated in vitro. Typically, the cell is isolated prior to being subjected to lysis to produce in vitro cellular DNA. The term gDNA as used herein does not include cfDNA.
The maternal genomic DNA (gDNA) data, maternal cell-free DNA (cfDNA) data, and optionally, the paternal gDNA data are obtained by a sequencing method including, but not limited to, deep whole genome sequencing (WGS), whole exome sequencing (WES), next generation sequencing (NGS), targeted sequencing, panel sequencing, gene sequencing, long-read genome sequencing, paired-end sequencing, single-end sequencing, and amplicon sequencing.
The term “Next Generation Sequencing” (NGS) herein refers to sequencing methods that allow for massively parallel sequencing of clonally amplified molecules and of single nucleic acid molecules. Non-limiting examples of NGS include sequencing-by-synthesis using reversible dye terminators, and sequencing-by-ligation. Deep sequencing refers to sequencing a genomic region multiple times, sometimes hundreds or even thousands of times. Deep sequencing of the genome allows researchers to detect rare genetic variants. As used herein the term “deep whole genome sequencing” refers to deep sequencing of the entire genome. In the context of the present invention, cell-free DNA extracted from maternal blood plasma during pregnancy is subjected to deep whole genome sequencing. The maternal blood plasma samples may be obtained at any stage of the pregnancy, preferably between weeks 7 and 38 of the pregnancy. The sequencing is repeated multiple times, for example, but not limited to, between 10 times (10X) and 1000 times (1000X), e.g., 10 times (10X), 20 times (20X), 30 times (30X), 50 times (50X), 100 times (100X), 150 times (150X), 200 times (200X), 300 times (300X), 500 times (500X), or 1000 times (1000X). Plasma cfDNA can be subjected to varying sequencing depths. In one non-limiting example, the cfDNA in maternal plasma is sequenced 300 times (300X); in other embodiments, the cfDNA in maternal plasma is sequenced 50 times (50X), 100 times (100X), or 150 times (150X). In addition, genomic maternal and, optionally, paternal DNA is also subjected to whole genome sequencing. Such genomic DNA may be obtained from any cell type, for example from blood cells, e.g., leukocytes. In an embodiment, whole genome sequencing of paternal and maternal genomic DNA is performed to a targeted depth of between about 20X and 40X, for example 30X.
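As a rough illustration of what these depth figures mean in terms of data volume, mean depth can be approximated as the number of reads multiplied by the read length and divided by the genome size. The sketch below uses that standard approximation; the default genome size of about 3.1 billion base pairs and the function names are assumptions for illustration, and the calculation ignores unmapped reads, duplicates, and uneven coverage.

```python
import math


def mean_depth(num_reads, read_length_bp, genome_size_bp=3_100_000_000):
    """Approximate mean sequencing depth (e.g. 30.0 means 30X)."""
    return num_reads * read_length_bp / genome_size_bp


def reads_needed(target_depth, read_length_bp, genome_size_bp=3_100_000_000):
    """Approximate number of reads required to reach a target mean depth."""
    return math.ceil(target_depth * genome_size_bp / read_length_bp)
```

For example, under these assumptions `reads_needed(300, 150)` gives the approximate number of 150 bp reads for a 300X plasma cfDNA sample, ten times the figure for a 30X parental gDNA sample.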
Whole genome sequencing may be performed using any method known in the art, for example, the HiSeq X Ten System (Illumina), HiSeq 4000 (Illumina), nanopore WGS sequencing using the MinION device (Oxford Nanopore Technologies), and WGS by Ultima Genomics. The sequencing generates “reads”, which are sequences of DNA fragments of varying lengths. Typically, though not necessarily, a read represents a short sequence of contiguous base pairs in the sample. The read may be represented symbolically by the base pair sequence (in A, T, C, and G). It may be stored in a memory device and processed as appropriate. A read may be obtained directly from a sequencing apparatus or indirectly from stored sequence information. The sequencing input may be of long reads (e.g., from about 1 kilobase pair (kbp) to about 100 kbp, or more) or short reads (e.g., from about 50 base pairs to about 400 base pairs), or a combination of long reads and short reads. After sequencing, the reads are aligned to a human reference genome based on sequence similarities.
The identification of the maternal and paternal variants (i.e., variant sites or mutations) can be performed using a variant calling approach, which is generally based on alignment of the DNA sequencing data and the application of a commercially available variant caller. As used herein, the terms “aligned”, “alignment”, or “aligning” refer to the process of comparing a read to a reference sequence and thereby determining whether the read is contained in the reference sequence. If the reference sequence contains the read, the read may be mapped to a particular location in the reference sequence. In some cases, alignment simply tells whether the read is present or absent in the reference sequence. Sequence alignment techniques that can be used according to some embodiments of the present invention include, without limitation, Burrows-Wheeler Aligner (BWA), ABA, ALE, AMAP, anon, BAli-Phy, Base-By-Base, BHAOS/DIALIGN, Bowtie, Bowtie 2, ClustalW, CodonCode Aligner, Comass, DECIPHER, DIALIGN-TX, DIALIGN-T, DNA Alignment, DNA Baser Sequence Assembler, EDNA, FSA, Geneious, Kalign, MAFFT, MARNA, MAVID, MSA, MSAProbs, MULTALIN, Multi-LAGEN, MUSCLE, Opal, Pecan, Phylo, Praline, PicXAA, POA, Probalign, ProbCons, PROMALS3D, PRRN/PRRP, PSAlign, RevTrans, SAGA, SAM, Se-AI, STAR, STAR-Fusion, StatAlign, Stemloc, T-Coffee, UGENE, VectorFriends, NovoAlign, and GLProbs. Exemplary variant callers suitable for the present embodiments include, without limitation, Genome Analysis Toolkit (GATK) and Freebayes. For example, Freebayes can comprise an alignment based on literal sequences of reads aligned to a particular target, not their precise alignment. GATK can comprise: (i) pre-processing; (ii) variant discovery; and (iii) callset refinement.
Pre-processing can comprise starting from raw sequence data, e.g., in FASTQ or uBAM format, and producing analysis-ready BAM files; processing can include alignment to a reference genome as well as data cleanup operations to correct for technical biases and make the data suitable for analysis; variant discovery can comprise starting from analysis-ready BAM files and producing a callset in VCF format; processing can involve identifying sites where one or more individuals display possible genomic variation, and applying filtering methods appropriate to the experimental design; callset refinement can comprise starting and ending with a VCF callset; processing can involve using metadata to assess and improve genotyping accuracy, attach additional information and evaluate the overall quality of the callset.
Also contemplated are variant callers such as, but not limited to, Platypus, VarScan, Bowtie analysis, MuTect, Google DeepVariant, and/or SAMtools. For example, Bowtie analysis can comprise implementing the Burrows-Wheeler transform for aligning. MuTect can comprise: (i) pre-processing; (ii) statistical analysis; and (iii) post-processing. Pre-processing can comprise an initial alignment of sequencing reads; statistical analysis can comprise using two Bayesian classifiers, one classifier can detect whether a SNP is non-reference at a given site and, for those sites that are found as non-reference, the other classifier can make sure that the normal does not carry the SNP; post-processing can comprise removal of artifacts of sequencing, short read alignments, and hybrid capture. SAMtools can comprise storing, manipulating, and aligning sequencing reads stored as SAM files. In an embodiment, the step of determining a probability that a sequence read in the maternal plasma cfDNA is of fetal origin is performed for example by an algorithm, e.g. the algorithm Hoobari as described in Rabinowitz et al., 2019 and US2021/0340601, to calculate the fetal fraction (FF), i.e., the percent of fetal derived cfDNA within the maternal plasma cfDNA, and to calculate the fragment length distributions. This step may also comprise extracting various fragment-level characteristics.
In various exemplary embodiments of the invention the method comprises the determination of the probability, of each DNA fragment (or read), to be of fetal origin. Using these read-level probabilities, a prediction of the genotype is made, i.e., a prediction whether the fetus carries a normal allele, a pre-mutation, or a full-mutation. In an embodiment, the determination of the probability of the DNA fragment (or read) to be of fetal origin comprises constructing a fetal size distribution and a maternal size distribution, binning said fetal size distribution and calculating a fetal fraction for each fragment size bin, and calculating, for at least one size and at least one fragment at said at least one site, a probability that said fragment is fetal, based on a fetal fraction of a respective fragment size bin to which said fragment belongs. As used herein the term “fetal fraction” or “FF” refers to the portion of fetal cfDNA, within the total amount of cfDNA in the maternal blood. The portion of fetal cfDNA within maternal blood (the fetal fraction) varies throughout the pregnancy, and between individuals, hence this is not regarded as a constant but as a variable. Low levels of fetal cfDNA are referred to as a low fetal fraction.
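The size-binned fetal fraction described above can be illustrated with a minimal Python sketch. It is not the Hoobari implementation; it assumes that two sets of fragment lengths are already available, one from fragments known to be fetal and one from fragments known to be maternal (for example, collected at informative sites), and it applies Bayes' rule per size bin. The function names, the 10 bp bin width, and the fallback to the total fetal fraction for unseen bins are all hypothetical choices made for illustration.

```python
from collections import Counter


def binned_fetal_fraction(fetal_lengths, maternal_lengths, total_ff, bin_width=10):
    """Return {bin_start: P(fragment is fetal | fragment length falls in bin)}.

    Assumption: the two length lists come from fragments whose origin is
    already known; total_ff is the overall fetal fraction (0 < total_ff < 1).
    """
    if not 0.0 < total_ff < 1.0:
        raise ValueError("total_ff must lie strictly between 0 and 1")

    def size_distribution(lengths):
        # Normalized histogram of fragment lengths, keyed by bin start.
        counts = Counter((length // bin_width) * bin_width for length in lengths)
        total = sum(counts.values())
        return {b: n / total for b, n in counts.items()}

    fetal_dist = size_distribution(fetal_lengths)
    maternal_dist = size_distribution(maternal_lengths)

    per_bin_ff = {}
    for b in set(fetal_dist) | set(maternal_dist):
        fetal_mass = total_ff * fetal_dist.get(b, 0.0)
        maternal_mass = (1.0 - total_ff) * maternal_dist.get(b, 0.0)
        per_bin_ff[b] = fetal_mass / (fetal_mass + maternal_mass)
    return per_bin_ff


def fetal_probability(fragment_length, per_bin_ff, total_ff, bin_width=10):
    """Probability that one fragment is fetal, from the fetal fraction of its size bin."""
    b = (fragment_length // bin_width) * bin_width
    # Hypothetical fallback: use the total fetal fraction for bins never observed.
    return per_bin_ff.get(b, total_ff)
```

Because fetal cfDNA fragments tend to have a different size distribution from maternal ones, a fragment falling in a bin enriched for fetal fragments receives a probability above the total fetal fraction, and vice versa.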
In an embodiment, said determining the probabilities comprises applying a Bayesian procedure. Optionally, said Bayesian procedure comprises prior probabilities calculated using sequencing data of at least one of said parents. In an embodiment, this procedure further comprises recalibration of the output of said Bayesian procedure using machine learning. In a specific embodiment the determination of the probability, for each DNA fragment (or read), to be of fetal origin is performed using variant calling, for example using the Hoobari algorithm as described in Rabinowitz et al., 2019 and US2021/0340601.
The variant calling prediction can optionally be combined with analysis of fragmentomic features of the DNA reads. The term “fragmentomic features” refers to molecular characteristics of DNA reads, as well as to genomic, epigenetic and alignment features of the DNA read. Fragmentomic features include, but are not limited to, Read quality mapping, Read base qualities, Fragment length, short/long read ratio, DNA fragment end motifs, Cleavage patterns around methylation sites, Read endpoint preferred end, DNA accessibility/nucleosome positioning inference, Distance to nearest nucleosome, Transcription factor binding sites, Regional fetal fraction, Regional sequence composition, Read sequence composition, and Number of sequence errors in the read. In certain embodiments, the variant calling probabilities and the fragmentomics-based probabilities are multiplied to calculate joint probabilities, for which a maximum-likelihood approach is applied to predict the genotype at each site. The step of deducing maternal and optionally paternal STR alleles refers to the calculation of the number of repeats in each maternal and optionally paternal haplotype.
The deduction of maternal and optionally paternal STR alleles can be performed using any suitable bioinformatic tool, including but not limited to STRetch (Dashnow et al.), GangSTR, ExpansionHunter (Dolzhenko et al.), HipSTR, Expansion Hunter De Novo (EHdn), STRling, RepeatSeq, LobSTR, adVNTR, or TredParse.
The cfDNA reads in the region of the inquired gene (i.e., the gene potentially comprising the STR) are processed and categorized as follows:
(i) “Spanning reads” – contain the normal allele in its full length.
(ii) “Partial reads” – contain part of the repeat region and part of the adjacent genomic region. This group is further divided into:
a. Upstream vs. downstream – this categorization helps reduce noise and bias.
b. Indicative and non-indicative – indicative reads contain a larger number of repeats than the number of repeats that define a normal allele, thus they must originate from a pre-mutated allele. Non-indicative reads contain a repeat number that is smaller than the threshold that defines a normal allele, and hence, can originate either from a normal or a pre-mutated allele. As indicated above, there are three allele types: normal (i.e., having 44 or fewer repeats), pre-mutation (i.e., having 45-200 repeats) and full mutation (i.e., having >200 repeats). Accordingly, indicative reads are partial reads that have 45 or more repeats, and therefore they must derive from a pre-mutation (hence they are indicative). Partial reads with 44 repeats or fewer can be derived either from a normal allele or from a pre-mutation allele, hence they are referred to as non-indicative.
(iii) “Clean reads” – contain only repeats.
These read categories are exemplified in Figure 2, which shows the general structure of relevant reads at the region of an exemplary STR-containing gene. In this example, the normal allele consists of 40 repeats, and the pre-mutated allele consists of 150 repeats. Spanning reads contain the repeats area and flanking regions that are not repeats. Since pre-mutated alleles are often large, spanning reads typically correspond only to the normal allele, unless a long-read sequencing technology is used. Clean reads correspond to either normal or pre-mutated alleles. Indicative partial reads can only derive from the pre-mutated allele, while non-indicative partial reads may derive from either allele.
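The read categories above can be made concrete with a small, simplified Python sketch. It assumes a perfect, uninterrupted repeat and error-free reads, which real data do not guarantee. The function name, the `CGG` default motif, the `min_flank` and `min_repeats` parameters, and the convention that an "upstream" partial read is one carrying upstream flanking sequence are all assumptions introduced for illustration. Only the 44/45-repeat threshold comes from the text.

```python
import re


def classify_read(read, motif="CGG", normal_max=44, min_flank=5, min_repeats=3):
    """Classify a read as spanning, partial (with sub-labels), clean, or other.

    Returns (category, repeat_count). Hypothetical simplifications:
    - the repeat tract is the longest exact run of `motif` in the read;
    - a flank counts only if at least `min_flank` non-repeat bases are present.
    """
    runs = re.finditer(f"(?:{re.escape(motif)})+", read)
    longest = max(runs, key=lambda m: m.end() - m.start(), default=None)
    if longest is None:
        return ("other", 0)

    repeats = (longest.end() - longest.start()) // len(motif)
    if repeats < min_repeats:
        return ("other", repeats)

    has_upstream = longest.start() >= min_flank
    has_downstream = len(read) - longest.end() >= min_flank

    if has_upstream and has_downstream:
        return ("spanning", repeats)      # whole tract plus both flanks
    if not has_upstream and not has_downstream:
        return ("clean", repeats)         # repeats only
    side = "upstream" if has_upstream else "downstream"
    # 45 or more repeats can only come from an expanded allele.
    kind = "indicative" if repeats > normal_max else "non-indicative"
    return (f"partial/{side}/{kind}", repeats)
```

In practice this step would rely on a dedicated STR tool and on alignment coordinates rather than on string matching; the sketch only shows how the definitions partition reads.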
The expected ratio (also referred to herein as the “calculated ratio”) between the different read categories depends on the fetal fraction (FF), the inherited allele, and the expansion length (if a mutation has occurred). In addition, it depends on different bias and error patterns, since reads are not evenly distributed across the inquired region.
Accordingly, the theoretical calculated/expected ratio may be 1 in a very simple scenario, but depending on the read length and STR length this value could be slightly different. Physical and chemical properties, which differ between genomic regions, may also contribute to a skew from that ratio. For that reason, the normalization step is employed. As will be specified below, normalization may be performed vis-à-vis other samples, other regions within the sample, or subsets of reads within the same region (two haplotypes, or fetal-enriched and maternal-enriched reads). Accordingly, to estimate the expected/calculated ratios between the read types, read counts are normalized against different references with various approaches. Examples include, but are not limited to, one or more of the following:
- In-sample normalization - normalize against similar regions in the genome, using GC content, mappability and other unique measurements (that are relevant specifically to the tested STRs, e.g., the count of the same triplet in other genomic regions) to create a matrix of similar regions.
- Between-samples normalization - create a panel-of-normals (PON) or panel of abnormals to learn the typical ratios and how they correlate with expansion length.
- Comparison of fetal-enriched and maternal-enriched reads - separate the cfDNA into fetal-enriched and maternal-enriched reads (using for example the methods described in Rabinowitz et al., 2019, US2021/0340601 and IL307784) and test whether the same phenomenon occurs in both groups, or whether it is more abundant in the fetal group.
- Haplotype comparison - use the parental (namely, maternal, and optionally paternal) haplotype information to phase the cfDNA sequencing data (namely, to separate it into haplotypes), and verify whether the ratio is similar between the two haplotypes or different.
- Correction for paired-end NGS technology - perform a correction for paired-end NGS technology, i.e., model the paired reads to fine-tune the expected ratios.
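As a hedged illustration of the in-sample normalization option listed above, the following Python sketch scales the read count of a target region by the median read count of genomic regions with similar GC content and mappability. The dictionary keys (`gc`, `mappability`, `reads`), the tolerance values, and the use of the median are assumptions made for this example and are not specified in the text.

```python
from statistics import median


def in_sample_normalize(target, candidates, gc_tol=0.05, map_tol=0.05):
    """Return the target read count relative to comparable regions.

    Each region is a dict with hypothetical keys:
      'gc'          - GC content as a fraction (0..1)
      'mappability' - mappability score (0..1)
      'reads'       - read count in the region
    A value near 1.0 means the target is covered like its matched regions.
    """
    matched = [
        region for region in candidates
        if abs(region["gc"] - target["gc"]) <= gc_tol
        and abs(region["mappability"] - target["mappability"]) <= map_tol
    ]
    if not matched:
        raise ValueError("no comparable regions found; widen the tolerances")

    baseline = median(region["reads"] for region in matched)
    if baseline == 0:
        raise ValueError("matched regions have no coverage")
    return target["reads"] / baseline
```

The same normalized values could then be used when forming ratios between read categories, so that region-specific coverage bias is at least partly removed before the observed and expected ratios are compared.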
Determining the most probable allele inherited by the fetus can be performed by measuring the observed ratio between the different categories and by comparing the observed ratio with the expected ratio. Alternatively, or in addition, determining the most probable allele inherited by the fetus can be performed by applying a Bayesian algorithm.
Measuring the observed ratio between the different read categories can be used to distinguish a normal allele from a premutation or a full mutation, for example as follows: If the mother carries a pre-mutation and the father has two normal copies, the ratio between the upstream (or downstream) indicative-partial reads and the spanning reads is expected to be 1:1, when referring to maternal derived cfDNA. An imbalance will occur if the fetus inherits the normal allele. If, on the other hand, only the father is a carrier, then the presence of indicative-partial reads in maternal plasma will indicate the inheritance of a mutation. Measuring the observed ratio between the different read categories can also be used to distinguish a premutation from a full mutation, for example by measuring the number of clean reads. A large increase in this number compared to the expected proportion indicates a full mutation. This calculation is also correlated with the fetal fraction.
Alternatively, a Bayesian algorithm (for example as described in Rabinowitz et al., 2019 and US2021/0340601) can be applied to determine the most probable allele inherited by the fetus, for example as follows:
a. Determining possible fetal genotypes based on the parental genotypes. For example, if the mother has a 20-repeat allele and a 30-repeat allele, and the father has a 25-repeat allele and a 31-repeat allele, then the fetus can be (20,25), (20,31), (30,25), (30,31), and the prior probability for the fetus is exactly 25% per genotype.
b. Calculating the probability that each read supports each allele using a suitable ready-to-use, commercially available software tool, e.g., ExpansionHunter as described in Dolzhenko et al., HipSTR (Haplotype inference and phasing for Short Tandem Repeats), or as described in Liautard-Haag et al., and calculating the probability that each read is derived from the fetus or the mother as described in Rabinowitz et al., 2019, or US2021/0340601.
c. Calculating the likelihood of each read to support each genotype based on the probabilities that each allele comprising the genotype is supported by the read, and the probability that the read is either fetal or maternal.
d. Obtaining a joint probability (which is proportional to the posterior probability of the genotype) by calculating the product of all read likelihoods per genotype and multiplying this value by the prior probability of each genotype. The genotype with the highest joint probability is then called.
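Steps a to d can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not the algorithm of Rabinowitz et al. or of any named tool: the per-read inputs (`p_fetal` and `allele_support`) are assumed to have been computed beforehand, both parents are assumed to carry two distinct alleles as in the example above, and the per-read likelihood is modeled as a simple mixture of a fetal and a maternal component. All names are hypothetical.

```python
from itertools import product
from math import exp, log


def call_fetal_genotype(maternal_alleles, paternal_alleles, reads):
    """Return (most probable fetal genotype, normalized posteriors).

    reads: list of dicts with hypothetical keys
      'p_fetal'        - probability that the read is of fetal origin
      'allele_support' - {allele: P(read | allele)} for every parental allele
    """
    # a. Possible fetal genotypes, each with the same prior probability.
    genotypes = list(product(maternal_alleles, paternal_alleles))
    log_prior = log(1.0 / len(genotypes))

    log_joint = {}
    for genotype in genotypes:
        total = log_prior
        for read in reads:
            support = read["allele_support"]
            # b./c. A maternal read comes from either maternal allele; a fetal
            # read comes from either allele of the candidate genotype.
            maternal_lik = sum(support[a] for a in maternal_alleles) / len(maternal_alleles)
            fetal_lik = sum(support[a] for a in genotype) / len(genotype)
            lik = read["p_fetal"] * fetal_lik + (1.0 - read["p_fetal"]) * maternal_lik
            total += log(max(lik, 1e-300))  # guard against log(0)
        # d. Product of read likelihoods times the prior, kept in log space.
        log_joint[genotype] = total

    best = max(log_joint, key=log_joint.get)
    peak = log_joint[best]
    weights = {g: exp(v - peak) for g, v in log_joint.items()}
    norm = sum(weights.values())
    return best, {g: w / norm for g, w in weights.items()}
```

Working in log space avoids numerical underflow when the product runs over many reads; normalizing the joint probabilities at the end gives a posterior per genotype that can serve as a confidence measure.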
In an embodiment, parental (maternal and optionally paternal) haplotyping information is incorporated into the method. Based on either short or long-read data, the parental sequencing information is phased (see for example PCT/IL2023/051184). Thereby indirect genotyping is added to the method using haplotypes around the 2 alleles. This haplotype information is integrated with the Bayesian algorithm and may improve accuracy of the genotyping.
The method of the invention is suitable for identifying human genetic diseases associated with, or caused by, short tandem repeats (STR). Non-limiting examples of such genetic diseases include Fragile X Syndrome (FMR1 gene), Fragile X-associated Tremor/Ataxia Syndrome (FXTAS) (FMR1 gene), X-linked Intellectual Disability (multiple genes, e.g., ARX, FMR1), Huntington's Disease (HTT gene), Myotonic Dystrophy (DMPK gene), Spinocerebellar Ataxias (SCAs) (multiple genes, e.g., SCA1, SCA2, SCA3), Friedreich's Ataxia (FXN gene), Machado-Joseph Disease (ATXN3 gene), Spinocerebellar Ataxia Type 10 (SCA10) (ATTCT repeat expansion), Spinocerebellar Ataxia Type 12 (SCA12) (CAG repeat expansion), Spinocerebellar Ataxia Type 17 (SCA17) (TBP gene), Dentatorubral-Pallidoluysian Atrophy (DRPLA) (ATN1 gene), Spinocerebellar Ataxia Type 8 (SCA8) (CTG repeat expansion), Spinocerebellar Ataxia Type 31 (SCA31) (BEAN1 gene), and Fragile XE Syndrome (AFF2 gene).
In a non-limiting embodiment, the present invention provides a method for genotyping a fetus, comprising identifying whether a fetus comprises STR repeats in the FMR1 gene that may lead to Fragile X syndrome. Namely, the present invention provides a non-invasive method for diagnosing Fragile X syndrome in a fetus.
The method is suitable for diagnosing Fragile X in both a male fetus and a female fetus.
Fragile X syndrome (FXS) is an X-linked genetic disease, and is the most common known cause of inherited intellectual disability and the leading single mutation that causes autistic spectrum disorder. Figure 3 shows an exemplary outline of an embodiment of the invention for detecting STR repeats associated with Fragile X syndrome. Since the disease is associated only with the X chromosome, the analysis according to this embodiment of the invention focuses only on male fetuses. These fetuses inherit a single X-linked allele. Accordingly, the paternal alleles are not described. The fetus only inherits one allele, which originates from the mother, meaning the complete FF (5% in this example) corresponds to this allele. Thus, an imbalance towards spanning reads can rule out an affected fetus. An imbalance towards indicative reads means that an affected fetus cannot be ruled out; in that case, the ratio between indicative and clean reads assists in estimating the risk of a full mutation. Namely, a ratio between clean reads and indicative reads that is higher than the calculated ratio indicates the presence of a full mutation.
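The imbalance reasoning for a male fetus can be illustrated numerically with a short Python sketch. It is a deliberately idealized model and not the method's actual calculation: it assumes a mother heterozygous for a normal and an expanded allele, a male fetus contributing one X copy per fetal genome equivalent, and read counts that have already been normalized so that reads supporting each allele are sampled in proportion to allele dosage. The function names and the binomial comparison are assumptions made for illustration.

```python
from math import log


def expected_normal_share(fetal_fraction, fetus_inherits_normal):
    """Expected share of X-allele copies in cfDNA that carry the normal allele.

    Maternal cfDNA (1 - FF) contributes one normal and one expanded copy;
    the male fetus (FF) contributes a single copy of the inherited allele.
    """
    maternal = 1.0 - fetal_fraction
    normal = maternal + (fetal_fraction if fetus_inherits_normal else 0.0)
    expanded = maternal + (0.0 if fetus_inherits_normal else fetal_fraction)
    return normal / (normal + expanded)


def more_likely_inheritance(normal_reads, expanded_reads, fetal_fraction):
    """Compare the two inheritance hypotheses by binomial log-likelihood.

    Returns (label, log-likelihood difference in favor of the normal allele).
    """
    def log_lik(share):
        return normal_reads * log(share) + expanded_reads * log(1.0 - share)

    ll_normal = log_lik(expected_normal_share(fetal_fraction, True))
    ll_expanded = log_lik(expected_normal_share(fetal_fraction, False))
    label = "normal" if ll_normal > ll_expanded else "expanded"
    return label, ll_normal - ll_expanded
```

Under these assumptions and an FF of 5%, the expected share of normal-allele copies is about 51.3% if the fetus inherits the normal allele and about 48.7% otherwise, which shows why deep sequencing and careful normalization are needed to resolve such a small imbalance.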
Claims (29)
- 1. A method for genotyping a fetus, comprising: a. receiving reads of sequencing data of (i) maternal plasma cell-free DNA (cfDNA), and (ii) maternal and optionally paternal genomic DNA (gDNA) from a pair parenting the fetus; b. identifying potential genomic sites at which the fetus may have a short tandem repeat (STR) variant; c. for each of the potential genomic sites, determining a probability that the fetus has the STR variant; d. deducing maternal and optionally paternal STR alleles; e. processing cfDNA reads in the region of the gene potentially comprising the STR to identify read categories, wherein said read categories comprise spanning reads, partial reads (including indicative and non-indicative reads), and clean reads; f. calculating the expected ratio between the read categories, thereby obtaining a calculated ratio; g. determining the most probable allele inherited by the fetus by: i. measuring the observed ratio between the read categories; and ii. using the calculated ratio and the observed ratio to distinguish a normal allele from an allele with a premutation or a full mutation, and to distinguish an allele with a premutation from an allele with a full mutation; or iii. applying a Bayesian algorithm; thereby genotyping said fetus.
- 2. The method of claim 1 wherein said reads of sequencing data are short reads.
- 3. The method of claim 1 wherein said reads of sequencing data are long reads.
- 4. The method of claim 1 wherein said reads of sequencing data comprise short reads and long reads.
- 5. The method of any one of claims 1 to 4, wherein one or both of the gDNA sequencing data and the cfDNA sequencing data is obtained by a method selected from a group consisting of whole genome sequencing (WGS), whole exome sequencing (WES), next generation sequencing (NGS), targeted sequencing, panel sequencing, gene sequencing, long-read genome sequencing, paired-end sequencing, single end sequencing, and amplicon sequencing.
- 6. The method of claim 5 wherein said WGS or WES data is obtained by deep sequencing.
- 7. The method of any one of the preceding claims, wherein determining said probability is based on at least one Sequence Alignment Map (SAM) parameter.
- 8. The method of any one of the preceding claims, wherein determining said probability is based at least on an observed template length.
- 9. The method of any one of the preceding claims, wherein determining said probability comprises calculating a total fetal fraction.
- 10. The method of claim 9, further comprising constructing a fetal size distribution and a maternal size distribution, wherein said determining the probability of step c in claim 1 comprises binning said fetal size distribution and calculating a fetal fraction for each fragment size bin, and calculating, for at least one size and at least one fragment at said at least one site, a probability that said fragment is fetal, based on a fetal fraction of a respective fragment size bin to which said fragment belongs.
- 11. The method of any one of the preceding claims wherein determining said probability comprises applying a Bayesian procedure.
- 12. The method of claim 11 wherein said Bayesian procedure comprises prior probabilities calculated using sequencing data of at least one of said parents.
- 13. The method of claim 11 or 12, further comprising recalibration of the output of said Bayesian procedure using machine learning.
- 14. The method of any one of the preceding claims wherein said determining the probability is performed using fetal variant calling.
- 15. The method of any one of the preceding claims wherein said determining the probability is performed using the Hoobari algorithm.
- 16. The method of any one of the preceding claims wherein said determining the probability comprises fragmentomics-based probability that the sequence read is of fetal origin.
- 17. The method of claim 16 wherein the method comprises extracting fragmentomic features for each cfDNA read identified as overlapping a potential genomic site where the fetus may have an STR variant.
- 18. The method of claim 17, wherein said fragmentomic features comprise one or more of read quality mapping, read base qualities, fragment length, short/long read ratio, end motifs, cleavage patterns around methylation sites, read endpoint preferred end, DNA accessibility/nucleosome positioning inference, distance to nearest nucleosome, transcription factor binding sites, regional fetal fraction, regional sequence composition, read sequence composition, and number of sequence errors in the read.
- 19. The method of any one of claims 16 to 18 wherein said determining the probability comprises multiplying the variant calling-based probability with the fragmentomics-based probability to obtain a calculated joint probability.
- 20. The method of any one of the preceding claims, wherein said step of deducing maternal and optionally paternal STR alleles comprises applying a bioinformatic tool selected from a group consisting of STRetch, GangSTR, ExpansionHunter, HipSTR, Expansion Hunter De Novo (EHdn), STRling, RepeatSeq, LobSTR, adVNTR, and TredParse.
- 21. The method of any one of the preceding claims wherein said STR variant is indicative of a genetic disease selected from a group consisting of Fragile X Syndrome, Fragile X-associated Tremor/Ataxia Syndrome (FXTAS), X-linked Intellectual Disability, Huntington's Disease, Myotonic Dystrophy, Spinocerebellar Ataxias (SCAs), Friedreich's Ataxia, Machado-Joseph Disease, Spinocerebellar Ataxia Type 10 (SCA10), Spinocerebellar Ataxia Type 12 (SCA12), Spinocerebellar Ataxia Type 17 (SCA17), Dentatorubral-Pallidoluysian Atrophy (DRPLA), Spinocerebellar Ataxia Type 8 (SCA8), Spinocerebellar Ataxia Type 31 (SCA31), and Fragile XE Syndrome.
- 22. A method for diagnosing a genetic disease associated with, or caused by, short tandem repeats (STR) in a fetus, comprising: a. receiving reads of sequencing data of (i) maternal plasma cell-free DNA (cfDNA), and (ii) maternal and optionally paternal genomic DNA (gDNA) from a pair parenting the fetus; b. identifying potential genomic sites at which the fetus may have a short tandem repeat (STR) variant; c. for each of the potential genomic sites, determining a probability that the fetus has the STR variant; d. deducing maternal and optionally paternal STR alleles; e. processing cfDNA reads in the region of the gene potentially comprising the STR to identify read categories, wherein said read categories comprise spanning reads, partial reads (including indicative and non-indicative reads), and clean reads; f. calculating the expected ratio between the read categories, thereby obtaining a calculated ratio; g. determining the most probable allele inherited by the fetus by: i) measuring the observed ratio between the read categories; and ii) using the calculated ratio and the observed ratio to distinguish a normal allele from an allele with a premutation or a full mutation, and to distinguish an allele with a premutation from an allele with a full mutation; or iii) applying a Bayesian algorithm; thereby diagnosing said fetus.
- 23. The method of claim 22 wherein said genetic disease associated with, or caused by, short tandem repeats (STR) is selected from a group consisting of Fragile X Syndrome, Fragile X-associated Tremor/Ataxia Syndrome (FXTAS), X-linked Intellectual Disability, Huntington's Disease, Myotonic Dystrophy, Spinocerebellar Ataxias (SCAs), Friedreich's Ataxia, Machado-Joseph Disease, Spinocerebellar Ataxia Type 10 (SCA10), Spinocerebellar Ataxia Type 12 (SCA12), Spinocerebellar Ataxia Type 17 (SCA17), Dentatorubral-Pallidoluysian Atrophy (DRPLA), Spinocerebellar Ataxia Type 8 (SCA8), Spinocerebellar Ataxia Type 31 (SCA31), and Fragile XE Syndrome.
- 24. A method for diagnosing Fragile X syndrome in a fetus, comprising: a. receiving reads of sequencing data of (i) maternal plasma cell-free DNA (cfDNA), and (ii) maternal genomic DNA (gDNA); b. deducing maternal FMR1 gene alleles; c. calculating the fetal fraction within the cfDNA; d. processing cfDNA reads in the region of the FMR1 gene to identify read categories, wherein said read categories comprise spanning reads, partial reads (including indicative and non-indicative reads), and clean reads; e. calculating the expected ratio between the read categories, thereby obtaining a calculated ratio; f. determining the most probable allele inherited by the fetus by: i. measuring the observed ratio between the read categories; and ii. using the calculated ratio and the observed ratio to distinguish a normal allele from an allele with a premutation or a full mutation, and to distinguish an allele with a premutation from an allele with a full mutation; or iii. applying a Bayesian algorithm; wherein an imbalance towards spanning reads indicates that the fetus has a high likelihood of being unaffected with Fragile X syndrome, an imbalance towards indicative reads indicates that the fetus has a high likelihood of having a premutation or full mutation in the FMR1 gene, and an imbalance towards indicative reads and a ratio between clean reads and indicative reads that is higher than the calculated ratio indicates that the fetus has a high likelihood of having a full mutation in the FMR1 gene leading to Fragile X syndrome.
- 25. The method of any one of the preceding claims, wherein said step of calculating the expected ratio between the read categories comprises normalizing the read counts.
- 26. The method of claim 25 wherein said normalizing is performed by one or more of: a. In sample normalization, b. Between samples normalization, c. Comparison of fetal-enriched and maternal-enriched reads, d. Haplotype comparison, and e. Correction for paired end NGS technology.
- 27. The method of claim 26 wherein said between samples normalization comprises creating a panel-of-normals (PON).
- 28. A computer software product, comprising a computer-readable medium in which program instructions are stored, which instructions, when read by a data processor, configure the data processor to (1) receive reads of sequencing data of (i) maternal cell-free DNA (cfDNA), and (ii) maternal and optionally paternal genomic DNA (gDNA) from a pair parenting a fetus, and to (2) execute the method according to any one of claims 1-27.
- 29. A system for genotyping a fetus, comprising: an input utility for receiving reads of sequencing data of (i) maternal cell-free DNA (cfDNA), and (ii) maternal and optionally paternal genomic DNA (gDNA) from a pair parenting a fetus; and a data processor configured for analyzing said data for executing the method according to any one of claims 1-27.
For the Applicants, COHN, DE VRIES, STADLER & CO By:
Priority Applications (2)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| IL310301A IL310301B1 (en) | 2024-01-21 | 2024-01-21 | Methods for non-invasive prenatal testing of expansion mutations |
| PCT/IL2025/050073 WO2025154081A1 (en) | 2024-01-21 | 2025-01-21 | Methods for non-invasive prenatal testing of expansion mutations |
Applications Claiming Priority (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| IL310301A IL310301B1 (en) | 2024-01-21 | 2024-01-21 | Methods for non-invasive prenatal testing of expansion mutations |
Publications (2)
| Publication Number | Publication Date |
|---|---|
| IL310301A IL310301A (en) | 2025-08-01 |
| IL310301B1 IL310301B1 (en) | 2025-10-01 |
Family
ID=96471011
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| IL310301A IL310301B1 (en) | 2024-01-21 | 2024-01-21 | Methods for non-invasive prenatal testing of expansion mutations |
Country Status (2)
| Country | Link |
|---|---|
| IL (1) | IL310301B1 (en) |
| WO (1) | WO2025154081A1 (en) |
Family Cites Families (2)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| EP2848704B1 (en) * | 2010-01-19 | 2018-08-29 | Verinata Health, Inc | Sequencing methods for prenatal diagnoses |
| BR112021006402A2 (en) * | 2019-03-07 | 2021-09-21 | Illumina, Inc. | SEQUENCE-GRAPH BASED TOOL TO DETERMINE VARIATION IN SHORT TANDEM REPETITION REGIONS |
- 2024-01-21: IL application IL310301A, patent IL310301B1 (status unknown)
- 2025-01-21: WO application PCT/IL2025/050073, publication WO2025154081A1 (active, pending)
Also Published As
| Publication number | Publication date |
|---|---|
| WO2025154081A1 (en) | 2025-07-24 |
| IL310301A (en) | 2025-08-01 |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| US20250329411A1 (en) | | Methods for non-invasive prenatal ploidy calling |
| US20220042103A1 (en) | | Methods for non-invasive prenatal ploidy calling |
| DK2562268T3 (en) | | Non-invasive diagnosis of fetal aneuploidy by sequencing |
| EP2902500A1 (en) | | Methods for non-invasive prenatal ploidy calling |
| BR112013020220B1 (en) | | METHOD FOR DETERMINING THE PLOIDIA STATUS OF A CHROMOSOME IN A PREGNANT FETUS |
| HUE030510T2 (en) | | Diagnosing fetal chromosomal aneuploidy using genomic sequencing |
| US20190338349A1 (en) | | Methods and systems for high fidelity sequencing |
| US11869630B2 (en) | | Screening system and method for determining a presence and an assessment score of cell-free DNA fragments |
| WO2025154081A1 (en) | | Methods for non-invasive prenatal testing of expansion mutations |
| AU2021200569B2 (en) | | Noninvasive diagnosis of fetal aneuploidy by sequencing |
| Vinh | | A Method to Create NIPT Samples with Turner Disorder to Evaluate NIPT Algorithms |
| WO2025083690A1 (en) | | Noninvasive fetal variant identification using fragmentomics-based classification |
| CN120548370A (en) | | Methods and systems for improving the accuracy of identifying fetal genetic disorders in maternal blood |
| HK40045017A (en) | | Methods for non-invasive prenatal ploidy calling |
| HK40045016A (en) | | Methods for non-invasive prenatal ploidy calling |
| HK40045015A (en) | | Methods for non-invasive prenatal ploidy calling |
| HK1208940B (en) | | Methods for non-invasive pre-natal ploidy calling |
| HK1213600A1 (en) | | Methods for non-invasive prenatal ploidy calling |
| HK1240285A1 (en) | | Methods for non-invasive prenatal ploidy calling |
| HK1193173B (en) | | Methods for non-invasive prenatal ploidy calling |
| HK1193173A (en) | | Methods for non-invasive prenatal ploidy calling |