ATTORNEY REF #: KAUST 2024-024-02 PCT

COMPOSITIONS AND METHODS OF RAPID TARGETED AMPLIFICATION OF GENOMIC REGIONS, SEQUENCING AND ANALYSIS THEREOF

CROSS-REFERENCE TO RELATED APPLICATION

The present application claims priority to U.S. Provisional Application No. 63/588,122, filed October 5, 2023, the contents of which are hereby incorporated by reference in their entirety.

FIELD OF THE INVENTION

The disclosed invention is generally in the field of amplification of genomic regions and specifically in the area of determining genetic structural variants associated with Mendelian disease.

BACKGROUND OF THE INVENTION

More than 350 million people worldwide have a rare disease, and an estimated 80% of rare diseases have a genetic origin 1. Despite advances in routine genetic testing in both clinical and research settings, which have improved diagnostic rates and identified the genetic basis of many rare conditions, approximately half of individuals worldwide suspected of having a Mendelian disorder remain undiagnosed 1-3. For those fortunate enough to receive an accurate diagnosis, diagnosis takes an average of 4.8 years 1, and 44% of rare disease patients are misdiagnosed at least once 4. The low diagnostic yield and lengthy diagnostic odyssey take a heavy toll on the quality of life of patients and their families. Thousands of genes have already been linked to Mendelian disease, even in scenarios where variant identification and interpretation are challenging 5.

In the clinic, routine genetic testing methods such as chromosomal microarray (CMA) have limitations in providing a comprehensive view of pathogenic genetic structural variations (SVs), especially repeat expansions, insertions, deletions, and rearrangements. Even large diagnostic exome sequencing (ES) and whole-genome sequencing (WGS) cohorts typically report diagnostic yields of less than 50% 6,7. Optical Genome Mapping (OGM), a method that does not rely on nucleotide sequencing, offers an orthogonal approach to structural variation detection. OGM relies on ultra-high-molecular-weight DNA from a biological sample, labeled with fluorescent tags at specific recognition sites. The stretched DNA is then imaged to construct a genome map that highlights structural variations. OGM is particularly effective at detecting large structural variations, typically ranging from 500 bp to several megabases, and OGM (Bionano) has efficiently solved challenging variants that were missed by other technologies 5. However, the breakpoints reported by OGM are imprecise, with a high error rate. Moreover, OGM requires larger quantities of ultra-high-molecular-weight DNA, which potentially limits the range of usable samples and makes it inefficient for family carrier testing. Finally, its high cost greatly increases the financial burden on affected families. New technologies that offer a comprehensive, base-resolution analysis of all types of genetic lesions are urgently needed to increase diagnostic yield and shorten time to diagnosis (TTD).

Long-read sequencing (LRS) technology is becoming increasingly popular for detecting novel and complex structural variations. However, whole-genome LRS is still prohibitively expensive for routine genetic diagnosis and is probably unnecessary when used in combination with the standard-of-care tests that, in most cases, have already been performed. Along these lines, targeted LRS (T-LRS) strategies have shown promise in facilitating genetic disease diagnosis in a few case studies 8. While several methods, including long-range PCR (LR-PCR), Targeted Locus Amplification (TLA), Cas9-mediated isolation of targets, and adaptive sampling, have been reported to enrich target genomic regions for LRS, none of them is generally applicable across clinical scenarios owing to their different technical limitations 8-10. For instance, LR-PCR enrichment requires laborious primer design and trial-and-error optimization because of the amplicon size limit (<30 kb) and uncertainty about the genetic lesions. TLA is limited by its requirement for millions of live cells and by time-consuming (~5-7 days), complex experimental processes. Adaptive sampling requires a large quantity of DNA and substantial sequencing resources to achieve adequate sequencing depth, rendering it impractically resource-intensive in its current format. There is still a need for simple, efficient, and less expensive methods for detecting all types of structural variations in expanded genomic regions at base resolution.

BRIEF SUMMARY OF THE INVENTION

Compositions and methods for detecting genetic structural variations (SVs) in a gene of interest involved in disease pathologies commonly found in large Mendelian genomics programs, or for detecting transposable elements (TEs) in a sample, are disclosed. Exemplary conditions and implicated genes (and structural variations) include, but are not limited to, Bardet–Biedl syndrome (BBIP1; chr10:110898918-110903058del; chr10:110903059-110907644inv; and chr10:110907645-110907833del); severe upper and lower limb defects (LMBR1; chr7:156699499-156799477del); retinitis pigmentosa (MERTK, chr2:111980501-111989438del; PHYH, chr10:13282200-13284270del; BBS9, chr7:33255115-33263754del); syndromic microcephaly (VPS13B; chr8:99501255-99689552del); spastic paraplegia (AP4S1; chr14:31071709-31073690del); and atypical hemolytic uremic syndrome (CFHR1 and CFHR3; chr1:196749230-196832818del).

In some forms, the compositions are inverse primers for use in inverse PCR to detect structural variations in genes.

In some forms, the primers can be used to detect a structural variation in the BBIP1 (BBSome Interacting Protein 1) protein-coding gene, and they include a first inverse primer ACAGCCTATGCCCCATTTTGG (SEQ ID NO:13) and a second inverse primer CGAAGGAGATGGAGGTCGTC (SEQ ID NO:14).

In some forms, the primers can be used to detect a structural variation in the LMBR1 (Limb Development Membrane Protein 1) protein-coding gene, and they include a first inverse primer GCTTGGAGCATAAGGATGACACA (SEQ ID NO:19) and a second inverse primer GAACTTAGGGAGATGGCTGGA (SEQ ID NO:20), or variants thereof.

In some forms, the primers can be used to detect a structural variation in the MERTK (Mer Tyrosine Kinase) protein-coding gene, and they include a first inverse primer GAGGGGGATGTGGAGAGACT (SEQ ID NO:23) and a second inverse primer GGGCGATTCCCAGCACAAGT (SEQ ID NO:24), or variants thereof.

In some forms, the primers can be used to detect a structural variation in the PHYH (Phytanoyl-CoA 2-Hydroxylase) protein-coding gene, and they include a first inverse primer GGAGGGTTTTGGGAACCCTTGT (SEQ ID NO:27) and a second inverse primer AAAGGCCCAGAGAAGTGAGGC (SEQ ID NO:28), or variants thereof.

In some forms, the primers can be used to detect a structural variation in the BBS9 (Bardet-Biedl Syndrome 9) protein-coding gene, and they include a first inverse primer GCAGGAATGTGATACCATGGAGC (SEQ ID NO:31) and a second inverse primer ACACCACTATTGAGGAGGTCAAAGG (SEQ ID NO:32), or variants thereof.

In some forms, the primers can be used to detect a structural variation in the VPS13B (Vacuolar Protein Sorting 13 Homolog B) protein-coding gene, and they include a first inverse primer GGCATGTCTGTGGTAATGAGAG (SEQ ID NO:35) and a second inverse primer GCAACCTCAGAAGGAGGCCC (SEQ ID NO:36), or variants thereof.
In some forms, the primers can be used to detect a structural variation in the AP4S1 (Adaptor Related Protein Complex 4 Subunit Sigma 1) protein-coding gene, and they include a first inverse primer TTACAGGCCAGCACGATTCAT (SEQ ID NO:39) and a second inverse primer CCAAGCCCAGAAGCAGGTAG (SEQ ID NO:40), or variants thereof.
In some forms, the primers can be used to detect a structural variation in the CFHR1/3 (Complement Factor H Related 1 and 3) protein-coding genes, and they include a first inverse primer ACAATGAGCCTCAGAAGCTGT (SEQ ID NO:43) and a second inverse primer GTCGGGGTAAAAGTTAGGGTTT (SEQ ID NO:44), or variants thereof.

The methods include: (i) contacting a sample comprising genomic DNA with a DNA endonuclease to cleave the genomic DNA into fragments, wherein the sample has not been in contact with a cross-linking agent and/or the genomic DNA is not crosslinked, and wherein the amount of the DNA endonuclease and the time of contact with the genomic DNA are effective to partially digest the genomic DNA; (ii) subjecting the fragments obtained from step (i) to a DNA ligase to obtain circularized DNA; (iii) subjecting the circularized DNA to inverse PCR amplification with inverse primers, wherein the inverse primers are designed to align with anticipated wild-type sequences adjacent to a suspected mutation locus in a gene; and (iv) sequencing the amplified product and comparing the amplified product sequence to a reference (i.e., wild-type) human genome. The DNA sample comprises between about 100 ng and 900 ng of DNA, preferably between about 200 ng and 800 ng of DNA, and more preferably between about 150 ng and 600 ng of DNA, for example, 150, 200, 300, 400, or 500 ng of DNA, including the intervening values.
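The four method steps above can be illustrated with a toy, string-based model. This is a hypothetical sketch, not the disclosed protocol: the recognition site (GAATTC), the primer sequences, and the helper names `partial_digest`, `inverse_pcr`, and `amplicon_sizes` are illustrative assumptions. Cut positions are simplified to the start of the recognition site, primers are assumed to match perfectly, and a circularized fragment is modelled as a string whose last base is ligated to its first. The model shows why one pair of outward-facing inverse primers amplifies circularized fragments of several different sizes when digestion is partial.

```python
"""Toy model of partial digestion, circularization, and inverse PCR.

Hypothetical sketch: the site, primers, and sequences are illustrative only.
"""
from typing import List, Optional

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def partial_digest(genome: str, site: str, max_missed: int) -> List[str]:
    """Fragments bounded by two cut sites, skipping up to `max_missed` internal sites.

    Cuts are placed at the start of each recognition site (a simplification);
    sequence before the first site and after the last site is ignored.
    """
    cuts = [i for i in range(len(genome) - len(site) + 1) if genome.startswith(site, i)]
    fragments = []
    for a in range(len(cuts)):
        for b in range(a + 1, min(a + 2 + max_missed, len(cuts))):
            fragments.append(genome[cuts[a]:cuts[b]])
    return fragments


def inverse_pcr(circle: str, fwd: str, rev: str) -> Optional[str]:
    """Amplicon read from `fwd` across the ligation junction to the `rev` binding site.

    `circle` is a circularized fragment written as a linear string whose end is
    ligated to its start; `rev` is given 5'->3' and anneals to this strand.
    Returns None if either primer site is missing.
    """
    start = circle.find(fwd)
    if start == -1:
        return None
    rev_site = reverse_complement(rev)
    doubled = circle + circle  # lets the search continue across the ligation junction
    end = doubled.find(rev_site, start + len(fwd))
    if end == -1 or end + len(rev_site) - start > len(circle):
        return None
    return doubled[start:end + len(rev_site)]


def amplicon_sizes(genome: str, site: str, fwd: str, rev: str, max_missed: int) -> List[int]:
    """Sizes of inverse PCR products from every partially digested, circularized fragment."""
    products = (inverse_pcr(f, fwd, rev) for f in partial_digest(genome, site, max_missed))
    return sorted(len(p) for p in products if p is not None)


if __name__ == "__main__":
    SITE = "GAATTC"                    # hypothetical recognition site
    FWD, REV = "ACGGTACC", "CCTGACTG"  # hypothetical outward-facing primers
    known = "TTTT" + reverse_complement(REV) + "CC" + FWD + "TTTT"
    genome = SITE + "G" * 8 + SITE + known + SITE + "A" * 12 + SITE
    print(amplicon_sizes(genome, SITE, FWD, REV, max_missed=1))  # [30, 44, 48]
```

Each product starts in the known region, reads outward across the ligation junction into the flanking sequence, and ends back at the second primer site; in this model, the larger circles produced by missed cuts simply contribute longer products of the same locus.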
The disclosed compositions and methods can be used to detect a structural variation in any gene of interest involved in disease pathologies commonly found in large Mendelian genomics programs, including neurodevelopmental disorders, dysmorphic/congenital malformation syndromes, inborn errors of metabolism, hematological conditions, immunological disorders, ophthalmological diseases, audiological disorders, pulmonary conditions, gastrointestinal disorders, connective tissue disorders, cardiovascular diseases, skeletal abnormalities, reproductive disorders, and renal conditions.

The disclosed compositions and methods can also be used to detect TEs in a sample containing genomic DNA. In this aspect, a pair of primers specifically designed for TEs (e.g., human-specific LINE-1 (L1Hs) insertions) is used in step (iii) above, and the method generates reads that align to the human reference genome and capture extended genomic regions that include the genomic regions neighboring the TEs. Two categories of TE insertions can be detected based on the alignment patterns of their supporting reads: 1) known TE insertions, in which reads mapping to TE regions present in the reference genome are categorized as known insertions; and 2) potential novel TE insertions, in which reads aligning to genomic regions not annotated as TEs in the reference genome are flagged as potential novel insertions. The disclosed method (NanoRanger) can be applied to map all classes of TEs, including but not limited to LINE-1 elements. This approach can achieve rapid, accurate, and sensitive identification of both known and novel TEs, providing valuable insights into their roles in development, aging, and disease.

The disclosed methods uncover precise genetic coordinates for structural variants and, therefore, can be used to identify carriers and to inform physicians of appropriate treatment. Thus, in some forms, the DNA sample being tested is obtained from a subject who has not been diagnosed with the genetic disorder being tested for.

BRIEF DESCRIPTION OF THE DRAWINGS

FIGS. 1A-1D show images of a medical case, diagnosed with neuronal ceroid lipofuscinosis, that was resolved by LR-PCR and Nanopore sequencing. FIG. 1A shows a schematic summary of the gene map, prior test results (only tests with positive results are shown), primer design strategy, and sequencing findings. An exon 4 deletion in NM_152778.2 was initially suspected from molecular karyotyping, while optical genome mapping suggested a 31-kb deletion encompassing exon 4 of NM_152778.2. Long-range PCR and Nanopore sequencing detected an 8.8-kb deletion and an 8-bp inversion near the 5' breakpoint. FIG. 1B shows validation of the breakpoint by Sanger sequencing. FIG. 1C shows carrier detection using genotyping PCR, in which the sibling was identified as a non-carrier. FIG. 1D shows that carrier screening by genotyping PCR among 1,000 healthy Saudi individuals identified five carriers of the mutant MFSD8 gene. “FAM4” denotes the temporary ID of the studied family. “PLATE” followed by numbers denotes the genotyping batch IDs. Only gels with positive bands are displayed, with lanes corresponding to individual case IDs. A DNA molecular-weight marker was included in the rightmost lane.

FIGS. 2A-2C show images of a medical case (19DG0075) resolved by TLA and Nanopore sequencing.
The case presented with a combination of severe hyperinsulinemic hypoglycemia necessitating pancreatectomy and progressive vision loss. FIG. 2A is a schematic summary of the gene map, prior test results (only tests with positive results are shown), primer design strategy, and sequencing findings. The SNP array initially suggested a homozygous deletion of ABCC8 exons 3 to 17 and 19 to 21, while ABCC8 exons 1, 18, and 22 to 39 were suggested to be present. TLA and Nanopore sequencing unveiled a much larger 123-kb deletion. FIG. 2B shows validation of the breakpoint by Sanger sequencing. FIG. 2C shows identification of carrier siblings using genotyping PCR.
FIGS. 3A-3B show overviews of two pipelines. FIG. 3A is an overview of the Nanopore-based Rapid Acquisition of Neighboring Genomic Regions (NanoRanger) pipeline. To initiate the process, inverse PCR primers (blue arrows) are designed to align with anticipated wild-type sequences adjacent to the suspected mutation locus. An appropriate restriction enzyme is chosen to cleave the genomic DNA into substantial fragments, typically ranging from a few kilobases to tens of kilobases in size (restriction sites shown in black). Fragmentation is achieved by a brief (~5-minute) partial digestion with the selected enzyme, which generates restriction fragments of varying sizes. The resulting restriction fragments are then circularized by DNA ligation under conditions that favor intramolecular ligation. The circularized restriction fragments serve as the substrate for inverse PCR amplification of the DNA regions proximal to the suspected breakpoints. Notably, circularized restriction fragments of different sizes, deliberately generated by partial digestion, are amplified using the same primer pair, enhancing coverage of the target locus. In the final phase, the amplicons are sequenced using long-read sequencing technologies, such as Nanopore and PacBio, to comprehensively analyze the target genomic region. Additionally, before inverse PCR amplification, the circularized restriction fragments can be labeled with unique molecular identifiers (UMIs). The UMI labeling and subsequent inverse PCR require three primers: a UMI primer (red, yellow, and green), a universal primer (green), and a reverse primer (blue). The UMI primer contains three parts: a universal primer sequence for amplifying the DNA (green), an optimized UMI structure for labeling individual DNA molecules (yellow), and a gene-specific primer for targeted DNA amplification (red).
A UMI primer should be used to label one end of the known genomic region. After UMI labeling and purification, the universal primer and the reverse primer are used to amplify all labeled DNA. In the downstream bioinformatic analysis, the inverse PCR amplicons can be grouped by the UMI they contain. Reads sharing the same UMI derive from one original DNA molecule and therefore share the same DNA sequence. By comparing the consensus sequence of each group with the reference sequence, genetic variations can be detected in each original DNA molecule.

FIG. 3B is an overview of the LongRanger pipeline. To initiate the process, single guide RNAs (sgRNAs, if Cas9 is chosen as the nuclease) are designed to target anticipated wild-type sequences adjacent to the regions of interest (ROI1 and ROI2). An appropriate restriction enzyme is chosen to cleave the genomic DNA into substantial fragments, typically ranging from a few kilobases to tens of kilobases in size (restriction sites shown in black). Fragmentation is achieved by a brief (~5-minute) partial digestion with the selected enzyme, which generates restriction fragments of varying sizes. The resulting restriction fragments are then circularized by DNA ligation under conditions that favor intramolecular ligation. The circularized restriction fragments are then cleaved by Cas9/sgRNA complexes. Notably, circularized restriction fragments of different sizes, deliberately generated by partial digestion, are linearized into molecules of different lengths, enhancing coverage of the ROIs. The linearized native DNA can be dA-tailed and ligated to sequencing adapters (uncut circular DNA cannot be ligated) and then sequenced using long-read sequencing technologies, such as Nanopore and PacBio, to comprehensively analyze the target genomic region. In the sequencing and data analysis steps, epigenetic information on the native DNA (e.g., methylated and unmethylated DNA, indicated by filled and open circles, respectively) can be analyzed simultaneously with the DNA sequence.

FIGS. 4A-4E demonstrate how NanoRanger enables precise characterization of disease-causing breakpoints in a Bardet-Biedl syndrome case (10DG0002). FIG. 4A illustrates the strategy for applying NanoRanger to the region of interest. FIG. 4B shows gel electrophoresis of inverse PCR products, displaying the mutant band in affected family samples and wild-type bands in the parents and a healthy control. FIG. 4C shows that phasing of long sequencing reads reveals the proband's homozygous mutant haplotype, while each parent carries one mutant and one wild-type allele. De novo assembly using NanoRanger identifies two breakpoint junctions, involving a 4.6-kb inversion and two deletions (189 bp and 4.1 kb). Breakpoint junctions 1 (FIG. 4D) and 2 (FIG. 4E) were verified by Sanger sequencing.

FIGS. 5A-5M show images of carrier screening for the MFSD8 breakpoints in 1,000 healthy Saudi individuals.

FIGS. 6A-6C are images demonstrating how long-range PCR and nanopore sequencing identify complex structural variations that are missed by clinical testing.

FIG. 7 shows the pyNanoRanger analysis pipeline.

FIGS. 8A-8N show images of carrier screening for the BBIP1 breakpoints in 1,000 healthy Saudi individuals.

FIGS. 9A-9B demonstrate how NanoRanger substantially increases diagnostic yield, at base-level resolution, for complex structural variations that are missed by clinical testing.

FIGS. 10A-10D are images demonstrating how adaptive sampling enables the detection of large structural variations that are missed by clinical testing. FIG. 10A is an image showing an overview of clinical testing and of long-range PCR and nanopore sequencing alignments. FIG. 10B is an image showing adaptive sampling reads mapped to the assembly generated from NanoRanger reads.

FIG. 11 is an overview of the pipeline for applying NanoRanger to transposable elements. A pair of primers specific to transposable elements is used while all other experimental steps are kept unchanged. The generated amplicons are sequenced using long-read sequencing technologies to comprehensively analyze the extended genomic regions containing transposable elements (e.g., L1Hs insertions). The reads are categorized into three types based on their alignment patterns to L1Hs.

FIGS. 12A-12E illustrate detected L1Hs insertions. FIGS. 12A-12B show genotyping gel images for two novel L1Hs insertions. FIGS. 12C-12D show Sanger sequencing validation of the two novel L1Hs insertions. FIG. 12E illustrates a detected known L1Hs insertion. This figure demonstrates NanoRanger's ability to specifically detect L1Hs insertions, even in the presence of nearby satellite sequences and/or other repetitive elements. Notably, NanoRanger operates effectively at low sequencing depth, with a single read being sufficient to accurately identify an L1Hs insertion.

FIG. 13 shows the pipeline for NanoRanger-based TE characterization. The pipeline consists of three major stages: filtering, categorizing, and summarizing reads aligned to L1Hs regions. The system facilitates comprehensive detection and reporting of TEs by leveraging data generated through NanoRanger.

FIGS. 14A-14D show that NanoRanger facilitates precise characterization of the breakpoints of rearrangements in a 1.5-Mbp region in case 14DG1602. FIG. 14A is a schematic summary of the gene map, prior test results (only tests with positive results are shown), primer design strategy, and sequencing findings. The two Integrative Genomics Viewer (IGV) coverage and sequence tracks display the read alignments obtained using primer pairs positioned upstream and downstream of the ROI. FIG. 14B is a schematic summary of the rearrangement and the origins of the amplicon sequences: gray area, a 1,099.8-kb deleted genomic region; yellow region, a 384.0-kb inverted genomic region; solid lines with dotted lines in between, illustration of the NanoRanger amplification process. Junctions 1 (FIG. 14C) and 2 (FIG. 14D) were confirmed by Sanger sequencing.

FIGS. 15A-15E show indications of transcription-mediated genome instability in two cases. FIG. 15A is a schematic illustration of the genomic structure, Alu element composition, and potential mechanism involved in case 17DG0332. FIG. 15B shows the alignment of Alu elements with identical orientations near the 5' and 3' ends of the deletion breakpoints in case 17DG0332. FIG. 15C is a schematic illustration of the genomic structure, transcription regulatory element composition, and potential mechanism in case 19DG0075. FIG. 15D is a schematic illustration depicting potential mechanisms implicated in case 10DG0002. FIG. 15E is a schematic illustration of the genomic structure and transcription regulatory element composition in case 10DG0002. Adapted from a University of California, Santa Cruz (UCSC) Genome Browser screenshot. Gray shading, unaffected genomic regions; teal shading, deleted genomic regions; purple chunks, insertions; green shading, inverted genomic regions; brown arrows, homologous sequences (not to scale). a, b, c, and d indicate breakpoints.

FIG. 16 shows the study design and diagnostic routine. The schematic illustrates the diagnostic approach taken for the cases recruited in this study. Cases yielding negative results were subjected to targeted long-read sequencing (T-LRS) to further investigate potential genetic underpinnings. The mapped breakpoints provide the cases with rapid, single-base-pair resolution of their genomic disorders and could be used in broad applications, including carrier screening. Created with BioRender.com.

FIGS. 17A-17C show genotyping PCR for the family networks. The outcomes of genotyping PCR for the family networks identify the status of the parents, siblings, and spouses.

FIGS. 18A-18D show case studies illustrating the utility of NanoRanger, related to Figures 3 and 5. FIG. 18A shows the optical genome mapping report of case 14DG1602 affecting the PRDM5 gene. The report illustrates two SVs encompassing the PRDM5 gene that present conflicting information, complicating primer design for valid genotyping. FIG. 18B is a comprehensive summary of case 18DG0135.
It details the gene map, prior testing outcomes, primer design strategy, and sequencing results achieved with NanoRanger, which successfully identified the relevant breakpoints. FIG. 18C shows a multiple-singleplex NanoRanger strategy for the TUSC3 gene: a schematic of multiple singleplex NanoRanger assays targeting 10 different regions of the TUSC3 gene, which offers an alternative multiplexing strategy with no theoretical upper limit on the number of targets and requires minimal optimization. FIG. 18D presents the gene maps, prior testing outcomes, primer design strategy, and sequencing results achieved with NanoRanger, which successfully identified the relevant breakpoints for each case.

FIGS. 19A-19B show the performance of multiplex NanoRanger in detecting genomic alterations, related to Figure 3 and the STAR Methods. FIG. 19A shows gel electrophoresis comparing the DNA profile of a patient with that of a healthy control, with the expected divergent patterns clearly visible. FIG. 19B shows the alignments to five targeted gene loci. NanoRanger not only accurately identified a deletion in the PHYH gene of the patient but also confirmed the absence of alterations at four additional genomic loci. These findings validate the ability of multiplex NanoRanger to examine multiple regions of interest simultaneously and to pinpoint the specific locus affected.

FIGS. 20A-20E show homologous sequence alignments and genetic rearrangements of regulatory elements, related to Figure 6. FIG. 20A shows the alignment of homologous sequences in identical orientations found near both the 5' and 3' ends of the deletion breakpoints in case 19DG0075, which features a deletion spanning the USH1C and ABCC8 genes. FIG. 20B shows the alignment of homologous sequences with opposing orientations adjacent to the inversion breakpoints in case 10DG0002, illustrating the genetic complexity underlying the structural variation. FIGS. 20C, 20D, and 20E are schematic illustrations of genetic rearrangements of regulatory elements in three cases. Disruptions and rearrangements affecting enhancers and transcription start sites (TSSs) were observed across multiple cases: a rearrangement (FIG. 20C) involving enhancers and TSSs of multiple genes; a case (FIG. 20D) in TUSC3, in which a breakpoint disrupts an enhancer at its edge; and a case (FIG. 20E) in VPS13B, in which a breakpoint disrupts an enhancer at its edge and a deletion involves multiple enhancers with TSSs, suggesting a significant impact on gene regulation.

DETAILED DESCRIPTION OF THE INVENTION

To overcome the limitations of existing methods, NanoRanger was developed, which enables rapid and accurate detection of all types of structural variations in expanded genomic regions at base resolution.
Unlike computational T-LRS (targeted LRS) methods, which require micrograms of DNA, NanoRanger requires three orders of magnitude less DNA and a fraction of the flow-cell capacity while achieving sequence coverage tens of thousands of times higher. First, current T-LRS approaches were evaluated using five clinical cases with unresolved breakpoints, and their performance was compared with that of NanoRanger. NanoRanger was then used to resolve 10 cases of suspected autosomal-recessive Mendelian disorders, all of which had failed to be diagnosed with conventional genetic tests (e.g., CMA, ES, and OGM). Using the validated breakpoints, carriers were screened among 1,000 healthy Saudi individuals. These results demonstrate that NanoRanger is a rapid and cost-effective approach with wide applicability in the clinic.
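The UMI-based analysis described for FIG. 3A, in which inverse PCR reads are grouped by UMI, one consensus is built per original molecule, and that consensus is compared with the reference, can be sketched as follows. This is a simplified illustration under stated assumptions, not the pyNanoRanger implementation: reads are assumed to be already trimmed, oriented, and aligned to equal length; UMIs are assumed to match exactly (real data typically call for error-tolerant UMI clustering); only substitutions are reported; and the function names and the `min_reads` threshold are hypothetical.

```python
"""Hypothetical sketch of UMI grouping and per-molecule consensus calling."""
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, Tuple

Read = Tuple[str, str]          # (UMI, read sequence)
Variant = Tuple[int, str, str]  # (0-based position, reference base, consensus base)


def group_by_umi(reads: Iterable[Read]) -> Dict[str, List[str]]:
    """Collect read sequences that carry the same UMI (one original molecule)."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for umi, seq in reads:
        groups[umi].append(seq)
    return dict(groups)


def consensus(seqs: List[str]) -> str:
    """Per-position majority vote; ties resolve to the base seen first."""
    if len({len(s) for s in seqs}) != 1:
        raise ValueError("sketch assumes a non-empty set of pre-aligned, equal-length reads")
    return "".join(Counter(column).most_common(1)[0][0] for column in zip(*seqs))


def compare_to_reference(seq: str, reference: str) -> List[Variant]:
    """Positions where the consensus differs from the reference (substitutions only)."""
    return [(i, r, c) for i, (r, c) in enumerate(zip(reference, seq)) if r != c]


def call_molecules(reads: Iterable[Read], reference: str,
                   min_reads: int = 2) -> Dict[str, List[Variant]]:
    """Variants per original molecule, skipping UMI groups with too few reads."""
    calls = {}
    for umi, seqs in group_by_umi(reads).items():
        if len(seqs) >= min_reads:
            calls[umi] = compare_to_reference(consensus(seqs), reference)
    return calls


if __name__ == "__main__":
    reference = "ACGTAC"
    reads = [
        ("AAT", "ACGTAC"), ("AAT", "ACGTAC"), ("AAT", "ACCTAC"),  # one read with an error
        ("GGC", "ACGTTC"), ("GGC", "ACGTTC"),                     # A>T at position 4
        ("TTA", "ACGAAC"),                                        # single read, skipped
    ]
    print(call_molecules(reads, reference))  # {'AAT': [], 'GGC': [(4, 'A', 'T')]}
```

The majority vote is what lets a single-read error (the C at position 2 in one "AAT" read) disappear from the molecule-level call, while a change shared by every read of a UMI group is retained.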
The disclosed method and compositions can be understood more readily by reference to the following detailed description of particular embodiments and the Example included therein, and to the Figures and their previous and following description. It is to be understood that the disclosed compounds, compositions, and methods are not limited to specific synthetic methods, specific analytical techniques, or particular reagents unless otherwise specified and, as such, may vary. It is also to be understood that the terminology used herein is for the purpose of describing particular forms and embodiments only and is not intended to be limiting.

Unless defined otherwise, all technical and scientific terms used herein have the same meanings as commonly understood by one of skill in the art to which the disclosed method and compositions belong. Although any methods and materials similar or equivalent to those described herein can be used in the practice or testing of the present method and compositions, the particularly useful methods, devices, and materials are as described. Nothing herein is to be construed as an admission that the present invention is not entitled to antedate such disclosure by virtue of prior invention. No admission is made that any reference constitutes prior art. The discussion of references states what their authors assert, and applicants reserve the right to challenge the accuracy and pertinency of the cited documents.
It will be clearly understood that, although a number of publications are referred to herein, such reference does not constitute an admission that any of these documents forms part of the common general knowledge in the art.

I. DEFINITIONS

As used herein, “subject” includes, but is not limited to, animals, plants, bacteria, viruses, parasites and any other organism or entity. The subject can be a vertebrate, more specifically a mammal (e.g., a human, horse, pig, rabbit, dog, sheep, goat, non-human primate, cow, cat, guinea pig or rodent), a fish, a bird or a reptile or an amphibian. The subject can be an invertebrate, more specifically an arthropod (e.g., insects and crustaceans). The term does not denote a particular age or sex. Thus, adult and newborn subjects, as well as fetuses, whether male or female, are intended to be covered. A patient refers to a subject afflicted with a disease or disorder. The term “patient” includes human and veterinary subjects.
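The structural variants recited in this disclosure are written as chromosome:start-end followed by the variant type (for example, chr10:110907645-110907833del). A minimal sketch for parsing that notation and computing variant sizes follows. It assumes 1-based, inclusive coordinates; that assumption is consistent with the BBIP1 case of FIG. 4C, where the three recited BBIP1 intervals correspond to the 4.1-kb deletion, the 4.6-kb inversion, and the 189-bp deletion. The regular expression, the `parse_sv` function, and the restriction to `del` and `inv` (the only types used here) are illustrative assumptions, and stray whitespace around the coordinates is tolerated.

```python
"""Hypothetical parser for SV notation such as 'chr10:110907645-110907833del'."""
import re
from typing import NamedTuple

SV_PATTERN = re.compile(
    r"^(?P<chrom>chr[0-9XYM]+):\s*(?P<start>\d+)\s*-\s*(?P<end>\d+)\s*(?P<kind>del|inv)$"
)


class StructuralVariant(NamedTuple):
    chrom: str
    start: int  # 1-based, inclusive (assumed)
    end: int    # 1-based, inclusive (assumed)
    kind: str   # "del" or "inv"

    @property
    def size(self) -> int:
        """Number of affected reference bases under inclusive coordinates."""
        return self.end - self.start + 1


def parse_sv(text: str) -> StructuralVariant:
    """Parse 'chrN:start-end{del|inv}', tolerating stray whitespace."""
    match = SV_PATTERN.match(text.strip())
    if match is None:
        raise ValueError(f"unrecognized SV notation: {text!r}")
    start, end = int(match["start"]), int(match["end"])
    if end < start:
        raise ValueError("end coordinate must not precede start coordinate")
    return StructuralVariant(match["chrom"], start, end, match["kind"])


if __name__ == "__main__":
    bbip1 = [
        "chr10:110898918-110903058del",
        "chr10:110903059-110907644inv",
        "chr10:110907645-110907833del",
    ]
    for notation in bbip1:
        sv = parse_sv(notation)
        print(sv.chrom, sv.kind, sv.size)  # sizes 4141, 4586, 189
```

Under the inclusive convention the three BBIP1 segments are also contiguous (each starts one base after the previous one ends), which matches their description as one complex rearrangement rather than three separate events.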
ATTORNEY REF #: KAUST 2024-024-02 PCT A structural variation (SV) as used herein in connection with DNA is generally defined as a region of DNA approximately 1 kb and larger in size and can include inversions and balanced translocations or genomic imbalances (insertions and deletions), commonly referred to as copy number variants (CNVs). Ranges may be expressed herein as from "about" one particular value, and/or to "about" another particular value. When such a range is expressed, also specifically contemplated and considered disclosed is the rangefrom the one particular value and/or to the other particular value unless the context specifically indicates otherwise. Similarly, when values are expressed as approximations, by use of the antecedent “about,” it will be understood that the particular value forms another, specifically contemplated embodiment that should be considered disclosed unless the context specifically indicates otherwise. It will be further understood that the endpoints of each of the ranges are significant both in relation to the other endpoint, and independently of the other endpoint unless the context specifically indicates otherwise. It should be understood that all of the individual values and sub-ranges of values contained within an explicitly disclosed range are also specifically contemplated and should be considered disclosed unless the context specifically indicates otherwise. Finally, it should be understood that all ranges refer both to the recited range as a range and as a collection of individual numbers from and including the first endpoint to and including the second endpoint. In the latter case, it should be understood that any of the individual numbers can be selected as one form of the quantity, value, or feature to which the range refers. In this way, a range describes a set of numbers or values from and including the first endpoint to and including the second endpoint from which a single member of the set (i.e. 
a single number) can be selected as the quantity, value, or feature to which the range refers. The foregoing applies regardless of whether in particular cases some or all of these embodiments are explicitly disclosed. II. COMPOSITIONS Compositions for detecting a structural variation in a gene of interest involved in disease pathologies commonly found in large Mendelian genomics programs, are disclosed. Exemplary conditions and implicated genes (and structural variation) include, but are not limited to Bardet– Biedl syndrome (BBIP1; chr10:110898918-110903058del; chr10:110903059-110907644inv; and chr10:110907645-110907833del); Severe upper and lower limb defects (LMBR1; chr7:156699499- 156799477del); Retinitis pigmentosa (MERTK-chr2: 111980501-111989438del; PHYH- chr10: 13282200-13284270del; BBS9- chr7: 33255115-33263754del); Syndromic microcephaly (VPS13B; 12 45686617
chr8:99501255-99689552del); Spastic paraplegia (AP4S1; chr14:31071709-31073690del); and atypical hemolytic uremic syndrome (CFHR1 and CFHR3; chr1:196749230-196832818del). The compositions are inverse primers used in inverse PCR to detect structural variations in genes. Exemplary inverse primers for the disclosed methods can be found in Tables 2 and 13, below, identified therein as “NanoRanger inverse primer 1 or 2”. In some forms the primers can be used to detect a structural variation in the BBIP1 (BBSome Interacting Protein 1) protein coding gene, and they include a first inverse primer ACAGCCTATGCCCCATTTTGG (SEQ ID NO:13) and a second inverse primer CGAAGGAGATGGAGGTCGTC (SEQ ID NO:14). In some forms the primers can be used to detect a structural variation in the LMBR1 (Limb Development Membrane Protein 1) protein coding gene, and they include a first inverse primer GCTTGGAGCATAAGGATGACACA (SEQ ID NO:19) and a second inverse primer GAACTTAGGGAGATGGCTGGA (SEQ ID NO:20) or variants thereof. In some forms the primers can be used to detect a structural variation in the MERTK (Mer Tyrosine Kinase) protein coding gene, and they include a first inverse primer GAGGGGGATGTGGAGAGACT (SEQ ID NO:23) and a second inverse primer GGGCGATTCCCAGCACAAGT (SEQ ID NO:24) or variants thereof. In some forms the primers can be used to detect a structural variation in the PHYH (Phytanoyl-CoA 2-Hydroxylase) protein coding gene, and they include a first inverse primer GGAGGGTTTTGGGAACCCTTGT (SEQ ID NO:27) and a second inverse primer AAAGGCCCAGAGAAGTGAGGC (SEQ ID NO:28) or variants thereof. In some forms the primers can be used to detect a structural variation in the BBS9 (Bardet-Biedl Syndrome 9) protein coding gene, and they include a first inverse primer GCAGGAATGTGATACCATGGAGC (SEQ ID NO:31) and a second inverse primer ACACCACTATTGAGGAGGTCAAAGG (SEQ ID NO:32) or variants thereof.
In some forms the primers can be used to detect a structural variation in the VPS13B (Vacuolar Protein Sorting 13 Homolog B) protein coding gene, and they include a first inverse primer GGCATGTCTGTGGTAATGAGAG (SEQ ID NO:35) and a second inverse primer GCAACCTCAGAAGGAGGCCC (SEQ ID NO:36) or variants thereof. In some forms the primers can be used to detect a structural variation in the AP4S1 (Adaptor Related Protein Complex 4 Subunit Sigma 1) protein coding gene, and they include a first inverse
primer TTACAGGCCAGCACGATTCAT (SEQ ID NO:39) and a second inverse primer CCAAGCCCAGAAGCAGGTAG (SEQ ID NO:40) or variants thereof. In some forms the primers can be used to detect a structural variation in the CFHR1/CFHR3 (Complement Factor H Related 1 and 3) protein coding genes, and they include a first inverse primer ACAATGAGCCTCAGAAGCTGT (SEQ ID NO:43) and a second inverse primer GTCGGGGTAAAAGTTAGGGTTT (SEQ ID NO:44) or variants thereof. In some forms, the primers can be used to detect transposable elements (TEs) in a sample, such as human-specific LINE-1s (L1Hs), Alus, SVAs, and HERVs, by annealing to conserved regions specific to the TE subtype of interest. For example, the primers can be used to detect L1Hs in a sample, and they include 5’-ATGCTAGATGACACATTAGTGGG-3’ (SEQ ID NO:92) (targeting the 3’ end of L1Hs) and 5’-GCTCTGCGTTTTAGAGTTTCCA-3’ (SEQ ID NO:93) (targeting the 5’ end of L1Hs). TEs are mobile genetic elements capable of moving from one location to another within the genome. TEs are classified into two main groups: Class I elements (retrotransposons) and Class II elements (DNA transposons). Retrotransposons move through a copy-and-paste mechanism via an RNA intermediate and are further divided into long terminal repeat (LTR) and non-LTR elements. The non-LTR group includes long interspersed nuclear elements (LINEs) and short interspersed nuclear elements (SINEs). Recent research has provided insights into the activity of TEs in somatic tissues over an individual’s lifetime, with growing evidence suggesting that TE mobilization plays a significant role in aging and disease. In some embodiments, primers provided herein include oligonucleotides with 70% or greater sequence identity with SEQ ID NOs: 13, 14, 19, 20, 23, 24, 27, 28, 35, 36, 39, 40, 43 and 44 (e.g., an oligonucleotide with about 70% . . . 75% . . . 80% . . . 90% . . . 95% . . . 98% . . . 99% sequence identity), portions thereof, and sequences complementary thereto.
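The 70% sequence-identity threshold above can be screened in silico. The text does not specify how identity is computed, so the following is only a minimal sketch under stated assumptions: it compares a candidate with a reference primer of the same length position by position, without gaps, and also scores the candidate's reverse complement so that complementary sequences are covered. Candidates carrying insertions or deletions would need an alignment-based measure (for example, BLAST-style local alignment) instead. The function names and the three-substitution variant are hypothetical illustrations.

```python
def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence (A, C, G, T, N)."""
    return seq.upper().translate(str.maketrans("ACGTN", "TGCAN"))[::-1]


def ungapped_identity(a: str, b: str) -> float:
    """Percent identity of two equal-length sequences compared position by position.

    No alignment is performed, so this only applies to substitution variants
    of the same length as the reference primer (an assumption of this sketch).
    """
    if len(a) != len(b):
        raise ValueError("ungapped identity needs sequences of equal length")
    matches = sum(x == y for x, y in zip(a.upper(), b.upper()))
    return 100.0 * matches / len(a)


def meets_threshold(candidate: str, reference: str, threshold: float = 70.0) -> bool:
    """True if the candidate, or its reverse complement, reaches the identity threshold."""
    best = max(
        ungapped_identity(candidate, reference),
        ungapped_identity(reverse_complement(candidate), reference),
    )
    return best >= threshold


REF = "ACAGCCTATGCCCCATTTTGG"      # BBIP1 inverse primer 1 (SEQ ID NO:13)
VARIANT = "ACAGCGTATGCCCCATATTGC"  # hypothetical variant with 3 substitutions
print(f"{ungapped_identity(VARIANT, REF):.1f}% identity")  # 85.7% identity
print(meets_threshold(VARIANT, REF))                        # True
```

Scoring the reverse complement is a design choice made here so that a primer written on the opposite strand is not rejected; a stricter reading of the claim language might score only the candidate as written.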
In some embodiments, primers are provided that function substantially similarly to primers provided herein. Each of the first and second inverse primers is specifically designed according to the region associated with a mutation in a gene of interest; its function is to anneal at a specific position in the genome for subsequent inverse PCR enrichment of the target region. In some forms, the disclosed primers are lyophilized. Methods for preparing lyophilized reagents for a PCR reaction are known. For example, a lyophilized product can be prepared by freeze-drying a conventional aqueous reaction mixture, which includes the disclosed primers, using a
stabilizer, alone or in combination with MgCl2, dNTPs and a DNA polymerase. Various stabilizers and stabilization methods have been developed for successful lyophilization. Frank et al. disclose the use of carbohydrates as cryoprotectants to improve the stability and shelf life of materials (U.S. Pat. No. 5,098,893). De Luca et al. disclose a lyophilized PCR composition comprising cellobiose as a stabilizer (U.S. Patent Application Publication No. 2012/0064536). Other stabilizers include glycerol, DMSO, sucrose, glycine, and a polysorbate.
III. METHODS
The disclosed compositions can be readily made using techniques described herein and/or generally known. Disclosed is a strategy that rapidly captures large neighboring genomic regions of interest without crosslinking. When coupled with nanopore sequencing, this strategy is called nanopore-based rapid acquisition of neighboring genomic regions (NanoRanger) (FIG. 3). NanoRanger offers targeted sequencing of extensive genomic regions, ranging from tens to hundreds of kilobases, in candidate loci suggested either by prior genomics knowledge or by conventional clinical genetic testing methods. This is achieved through a combination of partial restriction digestion, ligation-mediated inverse PCR, and long-read sequencing. Generally, the disclosed methods include sample preparation, i.e., the extraction and isolation of cell-free polynucleotide sequences such as genomic DNA from a bodily fluid. In some embodiments, the genomic DNA is extracted from a sample selected from the group consisting of blood, plasma, serum, urine, saliva, mucosal excretions, sputum, stool, and tears. The sample can be obtained from a subject suspected of having a genetic disorder or, in embodiments where subjects are being tested to determine whether they are carriers, from a subject not presenting with any symptoms that would lead to a suspicion of a genetic disorder.
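The combination of partial digestion, self-circularization, and inverse PCR described above can be illustrated with a toy simulation. This is a minimal sketch, not the NanoRanger protocol or pyNanoRanger code. It assumes PstI, which cuts CTGCA^G on the top strand, and treats partial digestion as each site being cut independently with a fixed probability, a stand-in for the shortened digestion time. It also places two hypothetical outward-facing primers back to back at a single reference position (`split`), so that each amplicon is simply the circularized fragment opened at the primers. The toy genome, `split`, and all function names are illustrative assumptions.

```python
import random
import re

PSTI_SITE = "CTGCAG"  # PstI recognition site
CUT_OFFSET = 5        # PstI cuts the top strand CTGCA^G, i.e. 5 bases into the site


def find_sites(seq: str, site: str = PSTI_SITE) -> list[int]:
    """Return 0-based start positions of every occurrence of the recognition site."""
    return [m.start() for m in re.finditer(f"(?={site})", seq)]


def partial_digest(sites: list[int], cut_prob: float, rng: random.Random) -> list[tuple[int, int]]:
    """Cut each site independently with probability cut_prob.

    Returns (start, end) intervals of fragments bounded by a cut on both sides;
    pieces running off either end of the toy sequence are ignored.
    """
    cuts = [s + CUT_OFFSET for s in sites if rng.random() < cut_prob]
    return list(zip(cuts, cuts[1:]))


def inverse_pcr_products(seq: str, fragments, split: int) -> list[str]:
    """Model self-circularization followed by inverse PCR.

    Only a fragment spanning `split` carries both primer sites; its product is the
    circle opened at `split`, so the ligation junction (the regenerated restriction
    site) lies inside the amplicon.
    """
    products = []
    for start, end in fragments:
        if start < split < end:
            frag = seq[start:end]
            k = split - start
            products.append(frag[k:] + frag[:k])
    return products


# Toy genome: random 2-kb chunks separated by PstI sites (chance sites are also found).
rng = random.Random(7)
genome = PSTI_SITE.join("".join(rng.choice("ACGT") for _ in range(2000)) for _ in range(10))
sites = find_sites(genome)
split = 9000  # hypothetical anchor position between the inverse primers

# Each iteration stands for one genome copy; copies are cut at different subsets of
# sites, so the same primer pair yields amplicons of different sizes.
for copy in range(5):
    fragments = partial_digest(sites, cut_prob=0.5, rng=rng)
    sizes = [len(p) for p in inverse_pcr_products(genome, fragments, split)]
    print(f"copy {copy}: amplicon sizes {sizes}")
```

Because each copy of the locus is cut at a different subset of sites, one primer pair yields amplicons of several lengths, and the regenerated CTGCAG inside each amplicon marks where the circle was closed.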
The methods include: (i) contacting a sample comprising genomic DNA with a DNA endonuclease for an effective amount of time to cleave the genomic DNA into fragments, wherein the sample has not been in contact with a cross-linking agent and/or the genomic DNA is not crosslinked, and wherein the amount of the DNA endonuclease and the time of contact with the genomic DNA are effective to partially digest the genomic DNA; (ii) subjecting the fragments obtained from step (i) to a DNA ligase to obtain circularized DNA; (iii) subjecting the circularized DNA to inverse PCR amplification using inverse primers, wherein the inverse primers are designed to align with anticipated wild-type sequences adjacent to a suspected mutation locus in a gene; and (iv) sequencing the amplified product and comparing the amplified product sequence to a
reference human genome, i.e., wild type (for example, GRCh38.p14: Human genome assembly. GCF_000001405.40. Available from: https://www.ncbi.nlm.nih.gov/assembly/GCF_000001405.40.). The sequences of the long reads obtained with the disclosed methods can be readily compared with a reference genome, for example, by using the restriction site of the DNA endonuclease used in step (i) as a guide. The disclosed methods use partial (as opposed to complete) digestion of the DNA in the collected sample. A partial digest is a restriction digest that has not been allowed to go to completion and thus contains pieces of DNA with some restriction endonuclease sites that have not yet been cleaved. This can be accomplished by performing the digestion reaction for a shorter time than required for a complete digestion and/or with a lower restriction enzyme concentration than required for a complete digestion. The disclosed methods include the following steps.
1. Primer Design: To initiate the process, inverse PCR primers are designed within expected wild-type sequences (the “anchor” sequences) in the sample, known through a priori knowledge or testing, that are adjacent to the suspected mutation locus. Genomic DNA is extracted from patient samples using routine methods without crosslinking.
Inverse PCR (Polymerase Chain Reaction): The standard polymerase chain reaction (PCR) is used to amplify a segment of DNA that lies between two inward-pointing primers. In contrast, inverse PCR (also known as inverted or inside-out PCR) is used to amplify DNA sequences that flank one end of a known DNA sequence and for which no primers are available. Inverse PCR involves digestion by a restriction enzyme of a preparation of DNA containing the known sequence and its flanking region. The individual restriction fragments (many thousands in the case of total mammalian genomic DNA) are converted into circles by intramolecular ligation, and the circularized DNA is then used as a template in PCR. The unknown sequence is amplified by two primers that bind specifically to the known sequence and point in opposite directions. The product of the amplification reaction is a linear DNA fragment containing a single site for the restriction enzyme originally used to digest the DNA. This site marks the junction between the previously cloned sequence and the flanking sequences. The size of the amplified fragment depends on the distribution of restriction sites within the known and flanking DNA sequences. In some embodiments, preparation of the sample may include combining the
sample with reagents for amplification and for reporting whether or not amplification occurred. Reagents for amplification may include any combination of primers for the targets, dNTPs and/or NTPs, at least one enzyme (e.g., a polymerase, a ligase, a reverse transcriptase, a restriction enzyme, or a combination thereof, each of which may or may not be heat-stable), and/or the like.
2. DNA Fragmentation: An appropriate restriction enzyme is chosen to cleave the genomic DNA into large fragments, typically ranging from a few kilobases to tens of kilobases in size. Fragmentation is achieved through a brief (~5-minute) partial digestion with the selected enzyme, which generates restriction fragments of varying sizes. In some forms the restriction endonuclease is a site-specific restriction enzyme such as PstI or its isoschizomer SalPI. Other exemplary restriction enzymes include, but are not limited to, EcoRI, BamHI, HindIII, HaeIII, PvuII, BglI, and SalI.
3. Fragment Ligation: The resulting restriction fragments are then self-circularized using DNA ligase under conditions favoring intramolecular ligation. Exemplary DNA ligases include, but are not limited to, T4 DNA Ligase, E. coli DNA Ligase, DNA Ligase I, DNA Ligase III, DNA Ligase IV, Lambda Ligase, Taq DNA Ligase, Pfu DNA Ligase, and CircLigase.
4. Amplification: The circularized restriction fragments serve as the substrate for inverse PCR amplification, focusing on DNA regions proximal to the suspected breakpoints. Notably, circularized restriction fragments of differing sizes, deliberately generated through partial digestion, are amplified using the same pair of primers, enhancing coverage of the target locus.
This strategy eliminates the tedious trial-and-error primer testing of long-range PCR (LR-PCR), because any change to the wild-type restriction sites, whether elimination due to deletion or creation due to rearrangement, is captured by the partial digestion, self-circularization, and inverse PCR steps. Additionally, before inverse PCR amplification, the circularized restriction fragments can be labeled with unique molecular identifiers (UMIs). The UMI labeling and the subsequent inverse PCR require three primers: a UMI primer, a universal primer, and a reverse primer. The UMI primer contains three parts: a universal primer sequence for amplifying the DNA, an optimized UMI structure for labeling individual DNA molecules, and a gene-specific primer sequence for targeted DNA amplification. The universal sequence is CATCTTACGATTACGCCAACCAC (SEQ ID NO:47); it is designed to avoid forming secondary structures and nonspecific amplification of the human genome. The UMI
sequence is NNNNTGNNNN (SEQ ID NO:48); it is designed to avoid homopolymers. The gene-specific primers can be any sequence that amplifies a region of interest. To label individual DNA molecules, the UMI primer is used to label one end of the known genomic region using PCR parameters that depend on the amplicon length and the polymerase. After UMI labeling and purification, the universal primer and the reverse primer are used to amplify all labeled DNA following the manufacturer’s instructions with the longest recommended extension time. In the downstream bioinformatic analysis, the inverse PCR amplicons can be grouped by the UMI they contain. Reads sharing the same UMI derive from one original DNA molecule and therefore share its DNA sequence. UMIs thus allow errors introduced by PCR amplification and/or sequencing to be eliminated, so that the sequence of each original DNA molecule can be accurately deduced. By comparing each consensus sequence with the reference sequence, genetic variations can be detected in each original DNA molecule. This method can achieve unbiased, highly accurate, and sensitive DNA sequencing at the single-allele level. Multiplex inverse PCR with or without UMI labeling can be used to analyze multiple genomic loci at once.
5. Sequencing: In the final phase, the generated amplicons are sequenced using long-read sequencing technologies, such as Nanopore and PacBio sequencing, to comprehensively analyze the target genomic region. Long-read sequencing is a form of next-generation sequencing (NGS) that has technical advantages over short-read sequencing for the detection of specific types of genetic variation. It can sequence long strands of DNA or RNA in a single read, without breaking them up into smaller fragments. PacBio sequencing is known in the art and is described, for example, in Rhoads et al., Genomics, Proteomics & Bioinformatics 13(5):278-289 (2015).
Nanopore sequencing is known in the art and is described, for example, in Wang et al., Nature Biotechnology 39:1348–1365 (2021). The efficacy of NanoRanger is demonstrated herein using a case of Bardet–Biedl syndrome and the conditions listed in Tables A and 1.
PCR-Free NanoRanger
An alternative design of NanoRanger allows PCR-free capture of extended genomic regions for long-read sequencing. Instead of inverse PCR amplification of self-ligated restriction DNA fragments (the third step in the NanoRanger workflow, Fig. 3A), a site-specific nuclease is used to cleave an expected wild-type sequence in the sample (known through a priori knowledge or testing) that is near the suspected mutation locus or a region of interest (ROI) (Fig. 3B). Any nuclease that can generate a DNA
double-strand break (DSB) in a sequence-specific manner, and only in the chosen target sequence, is suited for this purpose. Examples of such nucleases include, but are not limited to, CRISPR/Cas9, CRISPR/Cas12a (Cpf1), other CRISPR/Cas variants, transcription activator-like effector nucleases (TALENs), meganucleases, and zinc-finger nucleases. The DSB generated by the nuclease linearizes only those self-ligated DNA circles that contain the target sequence. These linearized DNA molecules can be of various sizes due to partial digestion and cover the extended sequences flanking the ROI in either direction (Fig. 3B). The linearized DNA fragments are end-processed and ligated with sequencing adapters in subsequent library preparation steps, followed by long-read sequencing. The uncut DNA circles, which by definition do not contain the nuclease target sequence near the ROI and are not from the ROI neighborhood, have no free DNA ends for sequencing adapter ligation and are not sequenced. Thus, this strategy enriches sequences flanking the ROI in both directions at the sequencing step (Fig. 3B). PCR-free NanoRanger is not limited by the maximal DNA length that can be amplified by PCR. It can therefore take advantage of the ultra-long reads of nanopore sequencing to efficiently analyze even larger regions around the suspected mutation locus. Because of this longer range, PCR-free NanoRanger is referred to herein as LongRanger. LongRanger has the advantage of preserving the native DNA and the epigenetic information on the DNA. For instance, DNA methylation (a form of epigenetic modification of DNA) patterns are relevant to cancers (e.g., MLH1 epimutation in colon cancer), X-chromosome inactivation, imprinting disorders (e.g., Prader-Willi syndrome, Angelman syndrome, Beckwith-Wiedemann syndrome, etc.) and aging.
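The LongRanger enrichment logic, in which only circles carrying the nuclease target are opened and can therefore accept adapters, can be sketched as follows. This sketch assumes an SpCas9-like nuclease with a 20-nt protospacer, an NGG PAM, and a blunt cut 3 bp upstream of the PAM on either strand. The protospacer and toy circles are hypothetical, off-target cleavage is ignored, and the functions illustrate the principle rather than form part of the disclosed method.

```python
import re

COMP = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of an uppercase A/C/G/T sequence."""
    return seq.translate(COMP)[::-1]


def cas9_cut_sites(circle: str, protospacer: str) -> list[int]:
    """Blunt cut positions (0-based) of an SpCas9-like nuclease on a circular molecule.

    Assumes an NGG PAM directly 3' of the protospacer and a cut 3 bp upstream
    of the PAM, on either strand.
    """
    n = len(circle)
    site_len = len(protospacer) + 3
    text = circle + circle[: site_len - 1]  # lets a target span the ligation junction
    fwd = re.compile(f"(?={protospacer}[ACGT]GG)")
    rev = re.compile(f"(?=CC[ACGT]{revcomp(protospacer)})")
    cuts = {(m.start() + len(protospacer) - 3) % n for m in fwd.finditer(text) if m.start() < n}
    cuts |= {(m.start() + 6) % n for m in rev.finditer(text) if m.start() < n}
    return sorted(cuts)


def linearize(circles: list[str], protospacers: list[str]) -> list[str]:
    """Open every circle that carries at least one target; drop uncut circles.

    Uncut circles have no free ends for adapter ligation, so they would not be
    sequenced. Passing several protospacers models a pooled-sgRNA format.
    """
    molecules = []
    for c in circles:
        cuts = sorted(set().union(*(cas9_cut_sites(c, p) for p in protospacers)))
        if not cuts:
            continue
        rotated = c[cuts[0]:] + c[:cuts[0]]
        rel = [k - cuts[0] for k in cuts] + [len(c)]
        molecules.extend(rotated[a:b] for a, b in zip(rel, rel[1:]))
    return molecules


proto = "GACGTTACCGGATTCAGCTA"  # hypothetical 20-nt spacer
target_circle = "A" * 10 + proto + "TGG" + "C" * 10
unrelated_circle = "T" * 40
print(cas9_cut_sites(target_circle, proto))
print(linearize([target_circle, unrelated_circle], [proto]))
```

Only `target_circle` is returned as a linear molecule; `unrelated_circle` stays closed and is dropped, which is the in silico counterpart of the enrichment at the adapter-ligation step.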
LongRanger coupled with long-read sequencing technologies capable of detecting DNA methylation can analyze changes in DNA sequence and in DNA methylation pattern simultaneously, making it uniquely suited for analyzing imprinting disorders and cancers in which mutations and abnormal DNA methylation (also known as epimutation) can both be the cause (Fig. 3B). Multiplexing of LongRanger can be achieved by using multiple nucleases, each targeting one wild-type DNA sequence near an ROI, to cut the self-ligated DNA circles. These cleavages proceed independently of each other and simultaneously enrich multiple ROIs for sequencing. For example, Cas9 complexed with a pool of single-guide RNAs designed for different target sequences can be used for this purpose (Fig. 3B). The disclosed compositions and methods can be used to detect a structural variation in any gene of interest involved in disease pathologies commonly found in large Mendelian genomics
programs, including neurodevelopmental disorders, dysmorphic/congenital malformation syndromes, inborn errors of metabolism, hematological conditions, immunological disorders, ophthalmological diseases, audiological disorders, pulmonary conditions, gastrointestinal issues, connective tissue-related disorders, cardiovascular diseases, skeletal abnormalities, reproductive disorders, and renal conditions. Exemplary conditions and implicated genes (and structural variations) include, but are not limited to, Bardet–Biedl syndrome (BBIP1; chr10:110898918-110903058del; chr10:110903059-110907644inv; and chr10:110907645-110907833del); Severe upper and lower limb defects (LMBR1; chr7:156699499-156799477del); Retinitis pigmentosa (MERTK, chr2:111980501-111989438del; PHYH, chr10:13282200-13284270del; BBS9, chr7:33255115-33263754del); Syndromic microcephaly (VPS13B; chr8:99501255-99689552del); Spastic paraplegia (AP4S1; chr14:31071709-31073690del); and Atypical hemolytic uremic syndrome (CFHR1 and CFHR3; chr1:196749230-196832818del). In some forms, the disclosed method detects chr10:110898918-110903058del in the BBSome Interacting Protein 1 coding gene, BBIP1, and the method employs inverse primer 1 and inverse primer 2 as disclosed herein.
Examples
MATERIALS AND METHODS
Human subjects
All human subject samples were collected from King Faisal Specialist Hospital & Research Center (KFSHRC). The research complies with all relevant ethical regulations at KFSHRC. The collection of human samples was approved by the Institutional Review Board (IRB) of KFSHRC and the KAUST Institutional Biosafety and Bioethics Committee (IBEC). Patients with a suspected Mendelian disease were included in the Mendelian genomics cohort. Informed consent was obtained prior to enrollment. Patients in the cohort were identified by year of collection and serial number.
The consent covers the use of human fetal material and the generation and use of patient-derived cell lines (LCLs and fibroblasts) when needed. The authors affirm that human research participants provided written informed consent for the publication of identifiable data.
Prior testing strategies
Cases recruited prior to 2012 were analyzed using next-generation sequencing-based multi-gene panels relevant to their clinical phenotype. Negative cases and cases recruited after 2012 were submitted for ES. Selected cases that remained negative after ES were analyzed using optical genome mapping. Multiplex ligation-dependent probe amplification (MLPA) was
requested clinically as appropriate by the ordering physicians from various CAP-accredited laboratories, and the results were recorded.
DNA isolation
DNA used for Nanopore sequencing was isolated from fibroblast cells using standard methods. Control DNA was isolated from HEK 293T cells (ATCC, Cat#CRL-1573) using standard methods. Fixed lymphocyte cell lines were prepared and preserved for TLA-based Nanopore sequencing.
Long-range PCR
Each PCR reaction (50 μl) consisted of ~50 ng genomic DNA, 10 μl 5X PrimeSTAR GXL Buffer, 4 μl dNTP Mixture, 1 μl of each primer (10 pmol each) (see Table 13), 1 μl PrimeSTAR® GXL DNA Polymerase, and sterile purified water to 50 μl. The PCR parameters were set following the manufacturer’s instructions. PCR products were checked by gel electrophoresis and purified with AMPure XP beads (Beckman Coulter, cat. no. A63882).
TLA
Targeted locus amplification (TLA) was performed as previously described 9. Briefly, patient-derived cells were crosslinked with formaldehyde, followed by digestion with the restriction enzyme NlaIII. The digested DNA was then ligated with T4 DNA ligase and reverse-crosslinked for DNA purification. A second digestion with the restriction enzyme NspI was conducted to trim the large chimeric DNA into small fragments of approximately 2 kb, which were intramolecularly ligated with T4 DNA ligase. After ligation, the DNA was purified and used as a template to amplify the gene locus of interest with anchor primers.
Partial digestion and ligation
For the samples analyzed by NanoRanger, the digestion reaction was performed with 200 ng DNA in a 12.5 μl reaction following the manufacturer’s instructions, except that the digestion time was reduced to 5 minutes to achieve stochastic partial digestion yielding longer restriction DNA fragments. The reaction was then inactivated by incubation for 20 min at 80 °C and placed on ice for 1 min. Ligation of the digested DNA was performed using T4 DNA ligase following the manufacturer’s instructions; increasing the reaction volume 10-fold is recommended to avoid random intermolecular ligation events. The DNA was purified with 0.8X AMPure XP beads.
NanoRanger PCR
For the samples analyzed by NanoRanger, the PCR reaction was composed of 10 μl 5X PrimeSTAR GXL Buffer, 4 μl dNTP Mixture, 1 μl of each inverse primer (10 pmol each) (see Table 2), 1 μl PrimeSTAR® GXL DNA Polymerase, 10 ng purified DNA, and sterile
purified water to 50 μl. The PCR parameters were set as 95 °C for 2 min; 30 cycles of 98 °C for 10 s and 68 °C for 10 min; 68 °C for 5 min; and a 4 °C hold (following the manufacturer’s instructions with the longest recommended extension time).
Table 2: Primers used in this study
Medical case | T-LRS technology | Primer | Sequence (SEQ ID NO)
17DG0332 | LR-PCR | LR-PCR forward primer | AGCATTATAAGAGCCGATGGAG (SEQ ID NO:1)
17DG0332 | LR-PCR | LR-PCR reverse primer | CACGAGCAACCAGCATGTAG (SEQ ID NO:2)
17DG0332 | Sanger sequencing | Genotyping forward primer | CGCCTCCAACTCTCAAAGCAA (SEQ ID NO:3)
17DG0332 | Sanger sequencing | Genotyping reverse primer | TGCCTCTAACCTCAACTTATT (SEQ ID NO:4)
09DG01213 | LR-PCR | LR-PCR forward primer | CTGCCTTCCACAGGAGAATGT (SEQ ID NO:5)
09DG01213 | LR-PCR | LR-PCR reverse primer | CCGAGTACTCCAATTAGGCGG (SEQ ID NO:6)
09DG01213 | Sanger sequencing | Genotyping forward primer | GAAGAACCGCTGGGTATGGA (SEQ ID NO:7)
09DG01213 | Sanger sequencing | Genotyping reverse primer | GGGCTACCACATTCCCAAGG (SEQ ID NO:8)
19DG0075 | TLA | TLA inverse primer 1 | CTGGCTGAAATTCTCCCCGCCTT (SEQ ID NO:9)
19DG0075 | TLA | TLA inverse primer 2 | CCTTCGTGAGGAAGACCAGCATCT (SEQ ID NO:10)
19DG0075 | Sanger sequencing | Genotyping forward primer | ACCAGCCTGAAGCTCAAAGAGGGC (SEQ ID NO:11)
19DG0075 | Sanger sequencing | Genotyping reverse primer | AGGGTGGATGCTCACGGCTCCT (SEQ ID NO:12)
10DG0002/10DG0003/10DG0092 | NanoRanger | NanoRanger inverse primer 1 | ACAGCCTATGCCCCATTTTGG (SEQ ID NO:13)
10DG0002/10DG0003/10DG0092 | NanoRanger | NanoRanger inverse primer 2 | CGAAGGAGATGGAGGTCGTC (SEQ ID NO:14)
10DG0002/10DG0003/10DG0092 | Sanger sequencing | Genotyping forward primer 1 | CATGATGGTTTATCCACAGGTCCA (SEQ ID NO:15)
10DG0002/10DG0003/10DG0092 | Sanger sequencing | Genotyping reverse primer 1 | GCTGCTGTTGAGATACTGTGC (SEQ ID NO:16)
10DG0002/10DG0003/10DG0092 | Sanger sequencing | Genotyping forward primer 2 | AGTTCAGAGTAGCTGGAGTTGC (SEQ ID NO:17)
10DG0002/10DG0003/10DG0092 | Sanger sequencing | Genotyping reverse primer 2 | TCCTCCCTTTCAGTGGCATATCA (SEQ ID NO:18)
14DG0861 | NanoRanger | NanoRanger inverse primer 1 | GCTTGGAGCATAAGGATGACACA (SEQ ID NO:19)
14DG0861 | NanoRanger | NanoRanger inverse primer 2 | GAACTTAGGGAGATGGCTGGA (SEQ ID NO:20)
14DG0861 | Sanger sequencing | Genotyping forward primer | GTCACCTCTTGAATGCTTTGCT (SEQ ID NO:21)
14DG0861 | Sanger sequencing | Genotyping reverse primer | GACCACAGGCAGAATGGGCTTA (SEQ ID NO:22)
07-00796/07-00462 | NanoRanger | NanoRanger inverse primer 1 | GAGGGGGATGTGGAGAGACT (SEQ ID NO:23)
07-00796/07-00462 | NanoRanger | NanoRanger inverse primer 2 | GGGCGATTCCCAGCACAAGT (SEQ ID NO:24)
07-00796/07-00462 | Sanger sequencing | Genotyping forward primer | GTCCTGCTCCAGGAATTAAACGT (SEQ ID NO:25)
07-00796/07-00462 | Sanger sequencing | Genotyping reverse primer | AGCCGTTGCCATCATTCTGA (SEQ ID NO:26)
10DG1265 | NanoRanger | NanoRanger inverse primer 1 | GGAGGGTTTTGGGAACCCTTGT (SEQ ID NO:27)
10DG1265 | NanoRanger | NanoRanger inverse primer 2 | AAAGGCCCAGAGAAGTGAGGC (SEQ ID NO:28)
10DG1265 | Sanger sequencing | Genotyping forward primer | CTGTAACAGAGCCCAAGGCA (SEQ ID NO:29)
10DG1265 | Sanger sequencing | Genotyping reverse primer | AATGGGCTGCTTCCCTTACC (SEQ ID NO:30)
09DG00509 | NanoRanger | NanoRanger inverse primer 1 | GCAGGAATGTGATACCATGGAGC (SEQ ID NO:31)
09DG00509 | NanoRanger | NanoRanger inverse primer 2 | ACACCACTATTGAGGAGGTCAAAGG (SEQ ID NO:32)
09DG00509 | Sanger sequencing | Genotyping forward primer | CCTGTTGGCGATTTGTATGTCTTAT (SEQ ID NO:33)
09DG00509 | Sanger sequencing | Genotyping reverse primer | GCAAAGCAGTCAGATGGAGTAG (SEQ ID NO:34)
15DG1177/15DG1178 | NanoRanger | NanoRanger inverse primer 1 | GGCATGTCTGTGGTAATGAGAG (SEQ ID NO:35)
15DG1177/15DG1178 | NanoRanger | NanoRanger inverse primer 2 | GCAACCTCAGAAGGAGGCCC (SEQ ID NO:36)
15DG1177/15DG1178 | Sanger sequencing | Genotyping forward primer | ACGGACAGCAAAGTTTGGGA (SEQ ID NO:37)
15DG1177/15DG1178 | Sanger sequencing | Genotyping reverse primer | TCTCAGGCACCATTGTGGTC (SEQ ID NO:38)
12DG0797 | NanoRanger | NanoRanger inverse primer 1 | TTACAGGCCAGCACGATTCAT (SEQ ID NO:39)
12DG0797 | NanoRanger | NanoRanger inverse primer 2 | CCAAGCCCAGAAGCAGGTAG (SEQ ID NO:40)
12DG0797 | Sanger sequencing | Genotyping forward primer | GCCAATTTGTGGTAAAGTCAGGAA (SEQ ID NO:41)
12DG0797 | Sanger sequencing | Genotyping reverse primer | GCATTTCCACATCACAGACCA (SEQ ID NO:42)
20DG1339 | NanoRanger | NanoRanger inverse primer 1 | ACAATGAGCCTCAGAAGCTGT (SEQ ID NO:43)
20DG1339 | NanoRanger | NanoRanger inverse primer 2 | GTCGGGGTAAAAGTTAGGGTTT (SEQ ID NO:44)
09DG01002 | Sanger sequencing | Genotyping forward primer | GTCTCCTGCCTTGGGTACAA (SEQ ID NO:45)
09DG01002 | Sanger sequencing | Genotyping reverse primer | ACCCTTCACAGCTACATTCACAT (SEQ ID NO:46)
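The primer quantities given for the long-range and NanoRanger PCRs (1 μl of each primer supplying 10 pmol, in a 50 μl reaction) imply the stock and in-reaction concentrations below. This worked check assumes that the full 10 pmol is delivered by the 1 μl addition; the stock concentration itself is not stated in the text.

```latex
c_{\text{stock}} = \frac{10\ \text{pmol}}{1\ \mu\text{l}} = 10\ \mu\text{M},
\qquad
c_{\text{final}} = \frac{10\ \text{pmol}}{50\ \mu\text{l}} = 0.2\ \mu\text{M}.
```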
Selection of adaptive sampling target regions
Sixteen Bardet–Biedl syndrome gene loci were selected as regions of interest. DNA size distribution was checked using the Agilent Femto Pulse System. DNA sample purity was checked using a NanoDrop™ 8000 spectrophotometer. For each experiment, the buffer size on either side of the regions of interest was set to the read N10 of the library. The target BED file was uploaded to the MinKNOW software ahead of adaptive sequencing.
Library preparation and Nanopore sequencing
The native DNA and the products thereafter were quantified using Qubit. Sample products were barcoded and library-prepared using the ONT Ligation Sequencing Kit (SQK-LSK109) and Native Barcoding Expansion 1-12 Kit (EXP-NBD104) following the manufacturer’s instructions (available from https://community.nanoporetech.com/protocols), except that all elutions were done for 10 min at 37 °C. For each library, approximately 50 fmol was loaded onto an R9.4.1 flow cell for sequencing on an ONT MinION sequencer. For the samples analyzed by adaptive sampling, the shearing step was omitted because the fragment size was already appropriate. Approximately 1.2 μg of genomic DNA was used to make sequencing libraries. Sequencing experiments were run for up to 72 h. DNA libraries were reloaded onto the same flow cell after washing approximately every 24 h to increase output.
Sequence analysis
FASTQ files were generated using Guppy 6.3.4. Sequencing results generated with the NanoRanger approach were analyzed with pyNanoRanger. pyNanoRanger requires input parameters including the restriction enzyme sites and the inverse primer sequences. pyNanoRanger can operate either in real time or post-sequencing. It continuously monitors the Nanopore sequencing output folder and processes newly generated FASTQ files, sorting them into new folders for each sample in multiplex experiments.
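pyNanoRanger's read selection, which keeps reads containing the anchor sequences and sorts them by the number of restriction sites they contain, can be approximated as below. This is a hypothetical sketch, not the pyNanoRanger source. It uses exact string matching for the inverse primers on either strand, whereas real nanopore reads contain errors and would need approximate matching. It counts sites with `str.count` and runs on a small in-memory FASTQ example built from the BBIP1 inverse primers (SEQ ID NOs: 13 and 14) and the PstI site; the function names and toy reads are illustrative.

```python
from collections import defaultdict

COMP = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq: str) -> str:
    """Reverse complement of an uppercase DNA sequence."""
    return seq.translate(COMP)[::-1]


def read_fastq(lines):
    """Yield (read_id, sequence) pairs from an iterable of 4-line FASTQ records."""
    it = iter(lines)
    for header in it:
        if not header.strip():
            continue  # tolerate blank lines between records
        seq = next(it).strip().upper()
        next(it)  # '+' separator line
        next(it)  # quality line
        yield header.strip()[1:].split()[0], seq


def contains_anchors(seq: str, primers: list[str]) -> bool:
    """True if every anchor primer occurs in the read in either orientation (exact match)."""
    return all(p in seq or revcomp(p) in seq for p in primers)


def bin_reads(records, primers: list[str], site: str) -> dict[int, list[str]]:
    """Keep anchor-containing reads and group their IDs by restriction-site count."""
    bins = defaultdict(list)
    for read_id, seq in records:
        if contains_anchors(seq, primers):
            # str.count counts non-overlapping hits; a palindromic site such as
            # PstI's CTGCAG reads the same on both strands, so one count suffices.
            bins[seq.count(site)].append(read_id)
    return dict(bins)


P1 = "ACAGCCTATGCCCCATTTTGG"  # NanoRanger inverse primer 1 (SEQ ID NO:13)
P2 = "CGAAGGAGATGGAGGTCGTC"   # NanoRanger inverse primer 2 (SEQ ID NO:14)
r1 = P1 + "AAAA" + "CTGCAG" + "TTTT" + revcomp(P2)  # toy amplicon with one PstI junction
lines = []
for rid, s in [("r1", r1), ("r2", "ACGT" * 10), ("r3", revcomp(r1))]:
    lines += [f"@{rid} demo", s, "+", "I" * len(s)]

print(bin_reads(read_fastq(lines), [P1, P2], "CTGCAG"))
```

Reads with one site correspond to circles closed at a single junction, and reads with more sites come from partially digested, larger circles; binning by count keeps these classes apart for downstream alignment.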
If the Nanopore demultiplexing tool Guppy is used, pyNanoRanger performs additional demultiplexing to ensure accurate read classification. pyNanoRanger selects reads that contain the anchor sequences for further analysis. Filtered reads are sorted based on the number of restriction sites they contain and processed accordingly. As sequencing progresses, pyNanoRanger updates sequencing statistics by merging new and existing results. Filtered high-quality long reads were aligned to GRCh38 using minimap2 (v.2.17) with default parameters, enabling users to generate consensus sequences and compare them with reference sequences. Complex structural variations were identified by searching the variant files generated by Sniffles (v.1.0.12) for
structural variations that occurred near the suspected mutant locus and by visual inspection in IGV (v.2.16.0).
Genotyping
Uncovered breakpoints were validated by PCR and Sanger sequencing. Primer sequence information is shown in Table 2 (or Table 13). PCR reactions were performed following the manufacturer’s instructions for the 2× Platinum SuperFi PCR Master Mix (Invitrogen, cat. no. 12358010).
Large-scale singleplex and multiplex genotyping
The large-scale genotyping assays were performed on 1,000 Saudi individuals sourced from the Human Knockout Project. The PCR reaction was composed of 0.2 μl HotStarTaq polymerase (Qiagen, Cat#203205), 2.5 μl of 10X Buffer, 2 μl 2.5 mM dNTP Mixture, 1 μl of each primer (10 pmol each) (Table 13), 30 ng purified DNA, and sterile purified water to 25 μl. The PCR parameters were set as 95 °C for 10 min; 30 cycles of 95 °C for 30 s, 62 °C for 30 s, and 72 °C for 1 min; a final extension at 72 °C for 10 min; and a 4 °C hold.
Data availability and Code availability
Raw sequencing data are available in the SRA database (BioProject accession PRJNA1023960), accessible at https://www.ncbi.nlm.nih.gov/bioproject/PRJNA1023960. pyNanoRanger is available at https://github.com/YingziZhanggithub/pyNanoRanger.
RESULTS
To systematically explore the utility of T-LRS in genetic diagnosis, a cohort of Saudi patients (Table 1) was assembled who had been diagnosed with diverse recessive genetic disorders but all lacked molecular diagnoses. The study encompassed a wide spectrum of disease pathologies commonly found in large Mendelian genomics programs, including neurodevelopmental disorders, dysmorphic/congenital malformation syndromes, inborn errors of metabolism, hematological conditions, immunological disorders, ophthalmological diseases, audiological disorders, pulmonary conditions, gastrointestinal issues, connective tissue-related disorders, cardiovascular diseases, skeletal abnormalities, reproductive disorders, and renal conditions.
Table 1. Case summary.
Medical case ID | Sex (ascribed at birth) | Age (years) | Clinical features | Disease name | Affected gene(s) | Prior clinical tests | Suspected variant(s) suggested by prior clinical testing | T-LRS technology | Molecular diagnosis by T-LRS (hg38)
17DG0332 | Male | 9 | Delayed speech and language development; Intellectual disability; Delayed fine motor development; Stereotypy; Autism; Poor eye contact; Visual fixation instability; Dysphagia; Cerebellar atrophy; Seizures; Generalized hypotonia | Neuronal ceroid lipofuscinosis | MFSD8 | SRS-based multi-gene panels; exome sequencing; genotyping array; optical genome mapping | All prior tests failed to resolve breakpoints. Genotyping array suggested homozygous deletion of exon 4 in NM_152778.2 (MFSD8); Optical genome mapping suggested chr4:127924322-127955862 del. | LR-PCR (1/1) | chr4:127945278-127954112 del; 8-bp insertion
19DG0075 | Male | 16 | Sensorineural hearing impairment; Rod-cone dystrophy; Hyperinsulinemia; Hypoglycemia; Gait disturbance | Usher syndrome and hyperinsulinism | USH1C and ABCC8 | SRS-based multi-gene panels; exome sequencing; genotyping array | All prior tests failed to resolve breakpoints. Genotyping array suggested a homozygous deletion in USH1C, with all exons missing except for exon 1; in ABCC8, exons 3-17 and 19-21 were suggested missing. | TLA (1/1) | chr11:17409987-17532675 del
09DG01002 | Male | 18 | Obesity; Retinitis pigmentosa; Polydactyly; Typical facies; Hypogenitalism | Bardet-Biedl syndrome | ARL6/BBS3 | SRS-based multi-gene panels; exome sequencing; genotyping array | All prior tests failed to resolve breakpoints. Genotyping array suggested a homozygous deletion encompassing NM_032146.3 (ARL6/BBS3): c.(480-1700_535+2392del), r.(480_535del). | Adaptive sampling (1/2) | chr3:97790092-97794213 del
10DG0002 | Male | 23 | Obesity; Developmental delay; Retinitis pigmentosa; Polydactyly; Atopy; Hypogenitalism | Bardet–Biedl syndrome | BBIP1/BBS18 | SRS-based multi-gene panels; exome sequencing; genotyping array | All prior tests failed to resolve breakpoints. Genotyping array suggested a homozygous deletion of exons 3 and 4 in NM_001195305.1 (BBIP1/BBS18). | NanoRanger (1/13)/Adaptive sampling (2/2) | chr10:110898918-110903058 del; chr10:110903063-110907644 inv; chr10:110907645-110907830 del
14DG1602 | Male | 10 | Congenital glaucoma; Hypotonia; Buphthalmos; Decreased corneal thickness; Jaundice; Poor head control; Relative macrocephaly; Decreased body weight; Abnormal delivery; Decreased corneal thickness; Joint hypermobility; Corneal opacity | possible de Barsy Syndrome | PRDM5 | SRS-based multi-gene panels; exome sequencing; genotyping array; optical genome mapping | All prior tests failed to resolve breakpoints. Optical genome mapping suggested chr4:120197632-121306374 del and chr4:121302563-121677374 inv. | NanoRanger (2/13) | chr4:120200167-121299936 del; chr4:121299937-121683937 inv
18DG0135 | Male | 4 | Visual fixation instability; Infantile muscular hypotonia | Intellectual developmental disorder, autosomal recessive | TUSC3 | SRS-based multi-gene panels; exome sequencing | All prior tests failed to resolve breakpoints. Exome sequencing suggested homozygous deletion of exons 3-6 in NM_006765 (TUSC3). | NanoRanger (3/13) | chr8:15644700-15692773 del
14DG0861 | Female | 11 | Abnormality of the heart valves; Abnormality of limbs | Severe upper and lower limb defects | LMBR1 | SRS-based multi-gene panels; exome sequencing; genotyping array; optical genome mapping | All prior tests failed to resolve breakpoints. Genotyping array suggested homozygous deletion in chr7q36.3 (chr7:156490241-156592035); Optical genome mapping suggested chr7:156694673-156800351 del. | NanoRanger (4/13) | chr7:156699499-156799477 del
07-00796 | Male | 41 | Retinitis pigmentosa | Rod-cone dystrophy | MERTK | SRS-based multi-gene panels; exome sequencing; genotyping array; optical genome mapping | All prior tests failed to resolve breakpoints. Genotyping array suggested homozygous deletion of exon 8 in NM_006343.2 (MERTK); Optical genome mapping suggested chr2:111974170-111993285 del. | NanoRanger (5/13) | chr2:111980501-111989438 del
07-00462 | Male | 25 | Retinitis pigmentosa | Rod-cone dystrophy | MERTK | SRS-based multi-gene panels; exome sequencing; genotyping array | All prior tests failed to resolve breakpoints. Genotyping array suggested a homozygous deletion of exon 8 in NM_006343.2 (MERTK). | NanoRanger (6/13) | chr2:111980501-111989438 del
10DG1265 | Male | 29 | Retinitis pigmentosa | Rod-cone dystrophy | PHYH | SRS-based multi-gene panels; exome sequencing; optical genome mapping | All prior tests failed to resolve breakpoints. Exome sequencing suggested a homozygous deletion of exon 6 in PHYH. Optical genome mapping suggested chr10:13281451-13297324 del. | NanoRanger (7/13) | chr10:13282200-13284270 del
09DG00509 | Female | 46 | Retinitis pigmentosa and other typical features of Bardet-Biedl Syndrome | Bardet-Biedl Syndrome | BBS9 | SRS-based multi-gene panels; exome sequencing; optical genome mapping | All prior tests failed to resolve breakpoints. Exome sequencing suggested NM_198428.2 (BBS9): c.(443-1675_443-1116)_(618-986_618-508)del; r.442+3_704del (incl. ex.6-7); Optical genome mapping suggested chr7:33253077-33264805 del. | NanoRanger (8/13) | chr7:33255115-33263754 del
15DG1177 | Male | 13 | Microcephaly; abnormal facial shape | Syndromic microcephaly | VPS13B | SRS-based multi-gene panels; exome sequencing; genotyping array; optical genome mapping | All prior tests failed to resolve breakpoints. Genotyping array suggested homozygous deletion of exons 25-35 in NM_152564.4 (VPS13B); Optical genome mapping suggested chr8:99500374-99693820 del. | NanoRanger (9/13) | chr8:99501256-99689552 del; 3-bp insertion
15DG1178 | Male | 18 | Microcephaly; abnormal facial shape | Syndromic microcephaly | VPS13B | SRS-based multi-gene panels; exome sequencing; genotyping array | All prior tests failed to resolve breakpoints. Genotyping array suggested a homozygous deletion of exons 25-35 in NM_152564.4 (VPS13B). | NanoRanger (10/13) | chr8:99501256-99689552 del; 3-bp insertion
12DG0797 | Male | 19 | Mitochondrial abnormalities; Neurodegenerative disease with severe MR; Spastic quadriparesis; Seizures which resolved with microcephaly | Spastic paraplegia | AP4S1 | SRS-based multi-gene panels; exome sequencing; optical genome mapping | All prior tests failed to resolve breakpoints. Exome sequencing suggested homozygous deletion of exon 4 in AP4S1; Optical genome mapping suggested chr14:31071319-31079171 del. | NanoRanger (11/13) | chr14:31071671-31073796 del
17DG0967 | Female | 20 | Cholestasis; Stage 5 chronic kidney disease; Elevated gamma-glutamyltransferase activity; Recurrent urinary tract infections | Cholestasis with high gamma-glutamyl transpeptidase and end-stage renal disease | DCDC2 | SRS-based multi-gene panels; exome sequencing; optical genome mapping | All prior tests failed to resolve breakpoints. Exome sequencing suggested NM_001195610.1 (DCDC2): c.223_293del; Optical genome mapping suggested chr6:24355708-24363595 del. | NanoRanger (12/13) | chr6:24357210-24357528 del
20DG1339 | Male | 33 | Stage 5 chronic kidney disease; Hypertension; Anemia; Secondary hyperparathyroidism; Hyperkalemia | Atypical hemolytic uremic syndrome | CFHR1 and CFHR3 | SRS-based multi-gene panels; exome sequencing; MLPA | MLPA suggested combined homozygous deletion of both CFHR1 and CFHR3; Optical genome mapping suggested chr1:196748824-196853801 del. | NanoRanger (13/13) | chr1:196749230-196832818 del
LR-PCR and Nanopore sequencing identify large structural variations that were missed by conventional clinical testing
In a case of neuronal ceroid lipofuscinosis (17DG0332, Table 1), the initial diagnosis by SNP array indicated a homozygous deletion of exon 4 in the MFSD8 gene, but the breakpoints were not identified. To find the breakpoints, a long-range primer pair was designed for PCR amplification of the locus in question (see Table 2 for details). The amplicon was pooled and sequenced on an Oxford Nanopore MinION flow cell. The 84,929 reads aligned to the MFSD8 gene in the hg38 reference genome
revealed an 8.8-kb deletion (chr4:127945278-127954112del) encompassing exon 4 (FIG. 1A), which matched the initial clinical findings. De novo assembly generated a 5,006-bp consensus sequence that further revealed an 8-bp inversion at the deletion breakpoint, the exact sequence of which was verified by Sanger sequencing (FIGS. 1B-1C). The structural variation identified by T-LRS in MFSD8 has not been previously reported and is not currently listed in the ClinVar database. OGM using Bionano technology was conducted in parallel. Bionano analysis suggested a 31.5-kb deletion (chr4:128845477-128877017 in hg19, lifted over to chr4:127924322-127955862 in hg38) but could not provide the breakpoint sequence. Moreover, primers designed from the Bionano breakpoints failed to produce any PCR product. After T-LRS resolved the structural variation, it became clear that Bionano analysis had misplaced the lower-coordinate breakpoint by 21.0 kb and the higher-coordinate breakpoint by 1.8 kb, which misled the primer design (FIG. 1A). Carrier screening for the MFSD8 structural variation was performed in 1,000 healthy Saudi individuals (genotyping PCR primer sequences listed in Table 2). The screening identified four carriers of the structural variation, suggesting that it may be a founder mutation in the Saudi population.
In a separate case of congenital glaucoma (09DG01213, Table 1), molecular karyotyping revealed a homozygous 20-bp deletion in the CYP1B1 gene (NM_000104.3:c.1517_1536del:p.(Ser506Thrfs*5)). Bionano OGM failed to identify any variant. Conventional PCR followed by nanopore sequencing confirmed the 20-bp deletion (chr2:38071221-38071240del) (FIG. 5). No other variants were present in the 3,588-bp region covered by the amplicon.
TLA and Nanopore sequencing identify large structural variations that were missed by clinical testing
Despite these two successes, the LR-PCR strategy failed to resolve most cases in the cohort used in this study.
The failures lay in the difficulty of designing effective PCR primers to amplify the genomic region containing the mutation. Typically, several primer pairs are designed to flank the suspected lesion at increasing distances, but the range covered by this approach is limited by the upper limit of LR-PCR (~30 kb, beyond which a positive control cannot be obtained, making it impossible to interpret negative PCR results). This strategy is therefore ineffective for large structural variations, as the trial-and-error approach would consume excessive labor, time, and samples. TLA, on the other hand, can capture a locus of up to a few hundred kilobases in one go, thus offering a more efficient strategy to resolve large structural variations.
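To make the size of the OGM misplacement in case 17DG0332 concrete, the sketch below compares the two hg38 deletion calls reported above breakpoint by breakpoint. It assumes both calls are simple deletions on the same chromosome in 1-based inclusive coordinates; `Interval` and `breakpoint_offsets` are illustrative names, not part of pyNanoRanger or the Bionano software.

```python
from typing import NamedTuple, Tuple


class Interval(NamedTuple):
    """A deletion call on one chromosome (hg38, 1-based inclusive coordinates)."""
    chrom: str
    start: int
    end: int


def breakpoint_offsets(called: Interval, resolved: Interval) -> Tuple[int, int]:
    """Distance in bp between the called and resolved start and end breakpoints."""
    if called.chrom != resolved.chrom:
        raise ValueError("calls are on different chromosomes")
    return abs(called.start - resolved.start), abs(called.end - resolved.end)


# Case 17DG0332 (MFSD8): OGM call lifted over to hg38 vs. the T-LRS call
ogm = Interval("chr4", 127924322, 127955862)
tlrs = Interval("chr4", 127945278, 127954112)

start_off, end_off = breakpoint_offsets(ogm, tlrs)
print(f"start breakpoint off by {start_off:,} bp, end breakpoint off by {end_off:,} bp")
```

The lower-coordinate (start) breakpoint differs by 20,956 bp (about 21.0 kb) and the higher-coordinate (end) breakpoint by 1,750 bp (about 1.8 kb); the OGM interval encloses the true deletion on both sides, which is consistent with the observation that primer design based on the OGM breakpoints was misled.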
One case in the cohort used in this study carries a founder deletion in the Arab population that affects portions of the USH1C and ABCC8 genes. Patients with this homozygous deletion exhibited a complex clinical picture, combining severe hyperinsulinemic hypoglycemia, which necessitated pancreatectomy, with progressive vision loss. However, the absence of precise coordinates for this deletion hindered the development of a screening assay for unaffected carriers, perpetuating the risk of having children afflicted by this debilitating combination of symptoms. SNP array analysis of one case (19DG0075, Table 1) initially suggested a homozygous deletion of ABCC8 exons 3 to 17 and 19 to 21, whereas ABCC8 exons 1, 18, and 22 to 39 appeared to be present. Several LR-PCR attempts failed to amplify specific products from patient DNA. Fortunately, a lymphoblastoid cell line (LCL) was available for this case, which allowed TLA using primers anchored flanking the exon-21 region, positioned away from the suggested deletion, followed by Nanopore sequencing. Analysis of the nanopore reads captured by TLA showed deep coverage spanning 318 kb around the ABCC8 locus and an absence of reads in a 123-kb region encompassing parts of the ABCC8 and USH1C genes, suggesting a large deletion (FIG. 2A). A short PCR genotyping assay (oligonucleotide sequences listed in Table 2) was designed to verify the breakpoints and to screen for carriers of the deletion in 1,000 healthy Saudi individuals (FIGS. 2A-2C). This case study demonstrated the advantage of TLA over LR-PCR in efficiently capturing and amplifying large genomic regions, extending up to hundreds of kilobases, thereby facilitating the detection of large deletions. However, TLA as a genetic testing strategy has several intrinsic limitations.
Firstly, the capture frequency (i.e., sequencing coverage) of any DNA fragment by the anchor depends on its physical proximity to the anchor, fixed by crosslinking, which means copy number variations and duplications will be difficult to interpret. Secondly, TLA reads are derived from ligation-mediated PCR of crosslinked restriction-digestion fragments and thus require split-read mapping, which confounds analysis of many types of structural variations such as inversions, translocations, and complex rearrangements. Lastly, TLA requires a large number (5-10 × 10^6) of live cells and a long, complex workflow that typically lasts 5-7 days from TLA to nanopore sequencing; both are unrealistic demands in clinical settings.
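The coverage-gap reasoning used for case 19DG0075, where a homozygous deletion appears as a run of zero-depth positions flanked by deep TLA coverage, can be sketched as follows. This is a minimal illustration assuming a per-base depth array is already available; `longest_zero_run` is a hypothetical helper, not part of pyNanoRanger, and the depth profile below is toy data, not data from this study.

```python
from typing import List, Optional, Tuple


def longest_zero_run(depth: List[int], offset: int = 0) -> Optional[Tuple[int, int]]:
    """Return (start, end), inclusive, of the longest run of zero-depth positions.

    depth[i] is the read depth at genomic position offset + i. Returns None when
    every position is covered. Ties are resolved in favour of the leftmost run.
    """
    best: Optional[Tuple[int, int]] = None
    run_start: Optional[int] = None
    for i, d in enumerate(depth + [1]):  # a non-zero sentinel closes a trailing run
        if d == 0:
            if run_start is None:
                run_start = i
        elif run_start is not None:
            run_len = i - run_start
            if best is None or run_len > best[1] - best[0] + 1:
                best = (run_start, i - 1)
            run_start = None
    if best is None:
        return None
    return best[0] + offset, best[1] + offset


# Toy profile: deep coverage, a 4-position gap, then a single-position dip
profile = [12, 15, 9, 11, 10, 0, 0, 0, 0, 8, 7, 9, 0, 6, 5]
print(longest_zero_run(profile, offset=1000))
```

This prints `(1005, 1008)`. In practice a minimum gap length and a tolerance for a few stray or mis-mapped reads would be needed before calling a deletion; both are omitted here for brevity.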
NanoRanger accurately and rapidly identifies complex SVs
It was reasoned that an ideal T-LRS strategy should retain the efficiency of TLA but avoid its pitfalls. Since most issues of TLA discussed in the preceding paragraph originate from the crosslinking of chromatinized DNA, a strategy that rapidly captures large neighboring genomic regions without crosslinking was invented. This strategy, when coupled with nanopore sequencing, is called nanopore-based rapid acquisition of neighboring genomic regions (NanoRanger). NanoRanger offers targeted sequencing of extensive genomic regions, ranging from tens to hundreds of kilobases, in the candidate locus suggested by conventional clinical genetic testing methods. This is achieved through a combination of partial restriction digestion, ligation-mediated inverse PCR, and long-read sequencing techniques (FIG. 3A). First, inverse PCR primers are designed within expected wild-type sequences (the "anchor" sequences) in the sample (through a priori knowledge or testing) that are near the suspected mutation locus. Genomic DNA is extracted from patient samples using routine methods without crosslinking. An appropriate restriction enzyme is then selected to cut the genomic DNA into large fragments, typically ranging from a few kilobases to tens of kilobases. This fragmentation is accomplished through a brief (~5 min) partial digestion with the selected enzyme, which produces restriction fragments of diverse sizes. The resulting restriction fragments are self-circularized using DNA ligase under conditions favoring intramolecular ligation. These circularized restriction fragments are then used as templates for inverse PCR amplification of DNA regions adjacent to the suspected breakpoints. Circularized restriction fragments of different sizes (resulting from deliberate partial digestion) are amplified by the same primer pair to improve coverage of the locus.
This strategy eliminates the tedious trial-and-error primer testing of LR-PCR, because any change to the wild-type restriction sites, be it elimination due to deletion or creation due to rearrangement, is captured by the partial digestion, self-circularization, and inverse PCR steps. Finally, the amplicons are sequenced using long-read sequencing technologies, such as Nanopore and PacBio (FIG. 3A). A bioinformatics pipeline called pyNanoRanger is tailored to NanoRanger sequencing data analysis (see Materials and Methods) (FIG. 7).
The efficacy of NanoRanger was tested using a case of Bardet–Biedl syndrome (10DG0002, Table 1). Previous testing by SNP array suggested homozygous deletions of exons 3 and 4 in NM_001195305.1 without definitive breakpoint coordinates. Inverse PCR primers tailored to the suspected mutation locus were designed (Table 2, FIG. 4A). PstI was selected as the restriction endonuclease for partial digestion. Genomic DNA samples (200 ng each) of the proband and his parents (collectively termed the "trio" in this context) were processed using the NanoRanger protocol (see FIG. 3A). For each sample, digestion was performed with 200 ng DNA in a 12.5 µL reaction following the manufacturer's instructions, except that the digestion time was reduced to 5 minutes to achieve stochastic partial digestion that yields longer restriction DNA fragments. The enzyme was then heat-inactivated by incubation at 80 °C for 20 min, and the reaction was placed on ice for 1 min. The digested DNA was ligated using T4 DNA ligase following the manufacturer's instructions and purified with 0.8X AMPure XP beads. For the case in this proof-of-concept study, the inverse PCR primers were 5’-ACAGCCTATGCCCCATTTTGG-3’ (SEQ ID NO:13) and 5’-CGAAGGAGATGGAGGTCGTC-3’ (SEQ ID NO:14). The PCR reaction was composed of 10 µL 5X PrimeSTAR GXL Buffer, 4 µL dNTP Mixture, 1 µL of each primer (10 pmol of each primer per reaction), 1 µL PrimeSTAR® GXL DNA Polymerase, 10 ng purified DNA, and sterile purified water to 50 µL. The PCR parameters were: 95 °C for 2 min; 30 cycles of 98 °C for 10 s and 68 °C for 10 min; 68 °C for 5 min; 4 °C hold. To label individual DNA molecules, a UMI primer should be used to label one end of the known genomic region with the following PCR parameters: 98 °C 1 min, 70 °C 5 s, 69 °C 5 s, 68 °C 5 s, 67 °C 5 s, 66 °C 5 s, 65 °C 5 s, 72 °C 5 min, 4 °C hold. The labeled DNA samples are then purified with 1X AMPure XP beads to remove the primers. A universal primer and a reverse primer are then used to amplify all labeled DNA following the manufacturer's instructions with the longest recommended extension time. Gel electrophoresis of the inverse PCR products revealed a different pattern of bands in the trio samples compared with a healthy control (FIG. 4B). The inverse PCR products from the three family members were purified, then barcoded and library-prepared using the ONT Ligation Sequencing Kit (SQK-LSK109) and Native Barcoding Expansion 1-12 Kit (EXP-NBD104) following the manufacturer's instructions (available from https://community.nanoporetech.com/protocols). The final library was loaded onto an R9.4.1 flow cell for sequencing on an ONT MinION sequencer. The sequencing data were processed and analyzed using pyNanoRanger. The long reads facilitated sequence alignment and phasing, which revealed that the proband is homozygous for the mutant haplotype, whereas each parent carries one mutant allele and one wild-type allele (FIG. 4C). De novo assembly using NanoRanger generated a 12-kb consensus, which identified two previously unidentified breakpoint junctions: a 4.6-kb inversion and two deletions, of 189 bp and 4.1 kb, respectively (FIG. 4C). These junctions were further verified by Sanger sequencing (FIG. 4D). NanoRanger provided high sequencing coverage that pinpointed the breakpoints at base-level resolution. This proof-of-concept experiment demonstrated that NanoRanger can efficiently target DNA regions suspected of containing unmapped inherited disease variants, facilitating the resolution of large SVs in patients with genetic diseases. A genotyping assay was further designed based on the NanoRanger breakpoints and used to screen 1,000 healthy individuals for carriers of the mutation. Only the sibling and the parents were identified as carriers (see FIG. 8).
NanoRanger is widely applicable to diverse clinical cases
Next, it was examined whether the success of NanoRanger generalizes to different gene loci and to SVs of various sizes. To this end, efforts were focused on the remaining cases from the Saudi genetic disorder cohort (Table 1). These individuals had been diagnosed with a range of recessive genetic disorders but all lacked precise molecular diagnoses because the coordinates of their rearrangements could not be determined by other technologies. Consider case 09DG00509: the patient, presenting with symptoms indicative of retinitis pigmentosa, Bardet-Biedl Syndrome, or Sjogren-Larsson Syndrome, had previously undergone a series of diagnostic tests, including NGS multi-gene panels, ES, and OGM. Unfortunately, none of these prior tests could pinpoint the breakpoints. While ES suggested a deletion encompassing exons 6 and 7, represented as NM_198428.2:c.(443-1675_443-1116)_(618-986_618-508)del; r.442+3_704del, OGM indicated a deletion at chr7:33253077-33264805 (approximately 11 kb in length) (FIGS. 9A-9B).
With NanoRanger, an 8,640-bp deletion (chr7:33255115-33263754) involving the BBS9 gene was identified (FIGS. 9A-9B). Remarkably, only 40 minutes of nanopore sequencing were needed to uncover the breakpoint, at a read depth of 10,973, showcasing the efficiency of NanoRanger in resolving complex cases.
In case 14DG0861, characterized by severe defects in both upper and lower limbs, a series of diagnostic tests including NGS multi-gene panels, ES, molecular karyotyping, and OGM failed to identify the breakpoints. Molecular karyotyping of the affected individual and their parents indicated a homozygous deletion in chr7q36.3, approximately 100 kb in length (chr7:156490241-156592035), encompassing LMBR1 (NM_022458.3), while OGM suggested a deletion at chr7:156694673-156800351. Genotyping PCR based on these suggested breakpoints failed to produce any product. NanoRanger identified a 99,979-bp deletion at chr7:156699499-156799477.
Similarly, NanoRanger identified base-resolution breakpoints in seven more cases that had undergone various conventional genetic tests (e.g., NGS multi-gene panels, ES, molecular karyotyping, OGM, and Multiplex Ligation-dependent Probe Amplification (MLPA)) but remained unresolved. These included two cases (07-00462 and 07-00796) of retinitis pigmentosa with an 8,938-bp deletion involving the MERTK gene, one case (10DG1265) of retinitis pigmentosa with a 2,071-bp deletion involving the PHYH gene, two cases (15DG1177 and 15DG1178) of syndromic microcephaly with a 188,298-bp deletion involving the VPS13B gene, one case (12DG0797) of spastic paraplegia with a 1,982-bp deletion involving the AP4S1 gene, and one case (20DG1339) of atypical hemolytic uremic syndrome with an 83,589-bp deletion involving the CFHR1 and CFHR3 genes (Table 1). Together, these cases underscore the broad applicability of NanoRanger in elucidating complex genetic anomalies that eluded conventional genetic testing.
A summary of the clinical successes accomplished using the disclosed methods is provided in Table A, below.
Table A: Summary of clinical successes of the disclosed methods.
Medical case ID | Disease name | Affected gene(s) | Prior clinical tests | Suspected variant(s) suggested by prior clinical testing | Molecular diagnosis by targeted long-read sequencing (LRS) (hg38)
10DG0002 | Bardet–Biedl syndrome | BBIP1/BBS18 | SRS-based multi-gene panels; exome sequencing; genotyping array | All prior tests failed to resolve breakpoints. Genotyping array suggested a homozygous deletion of exons 3 and 4 in NM_001195305.1 (BBIP1/BBS18). | chr10:110898918-110903058 del; chr10:110903063-110907644 inv; chr10:110907645-110907830 del
14DG1602 | possible de Barsy Syndrome | PRDM5 | SRS-based multi-gene panels; exome sequencing; genotyping array; optical genome mapping | All prior tests failed to resolve breakpoints. Optical genome mapping suggested chr4:120197632-121306374 del and chr4:121302563-121677374 inv. | chr4:120200167-121299936 del; chr4:121299937-121683937 inv
18DG0135 | Intellectual developmental disorder, autosomal recessive | TUSC3 | SRS-based multi-gene panels; exome sequencing | All prior tests failed to resolve breakpoints. Exome sequencing suggested homozygous deletion of exons 3-6 in NM_006765 (TUSC3). | chr8:15644700-15692773 del
14DG0861 | Severe upper and lower limb defects | LMBR1 | SRS-based multi-gene panels; exome sequencing; genotyping array; optical genome mapping | All prior tests failed to resolve breakpoints. Genotyping array suggested homozygous deletion in chr7q36.3 (chr7:156490241-156592035); Optical genome mapping suggested chr7:156694673-156800351 del. | chr7:156699499-156799477 del
07-00796 | Retinitis pigmentosa | MERTK | SRS-based multi-gene panels; exome sequencing; genotyping array; optical genome mapping | All prior tests failed to resolve breakpoints. Genotyping array suggested homozygous deletion of exon 8 in NM_006343.2 (MERTK); Optical genome mapping suggested chr2:111974170-111993285 del. | chr2:111980501-111989438 del
07-00462 | Retinitis pigmentosa | MERTK | SRS-based multi-gene panels; exome sequencing; genotyping array | All prior tests failed to resolve breakpoints. Genotyping array suggested a homozygous deletion of exon 8 in NM_006343.2 (MERTK). | chr2:111980501-111989438 del
10DG1265 | Retinitis pigmentosa | PHYH | SRS-based multi-gene panels; exome sequencing; optical genome mapping | All prior tests failed to resolve breakpoints. Exome sequencing suggested a homozygous deletion of exon 6 in PHYH. Optical genome mapping suggested chr10:13281451-13297324 del. | chr10:13282200-13284270 del
09DG00509 | Bardet-Biedl Syndrome | BBS9 | SRS-based multi-gene panels; exome sequencing; optical genome mapping | All prior tests failed to resolve breakpoints. Exome sequencing suggested NM_198428.2 (BBS9): c.(443-1675_443-1116)_(618-986_618-508)del; r.442+3_704del (incl. ex.6-7); Optical genome mapping suggested chr7:33253077-33264805 del. | chr7:33255115-33263754 del
15DG1177 | Syndromic microcephaly | VPS13B | SRS-based multi-gene panels; exome sequencing; genotyping array; optical genome mapping | All prior tests failed to resolve breakpoints. Genotyping array suggested homozygous deletion of exons 25-35 in NM_152564.4 (VPS13B); Optical genome mapping suggested chr8:99500374-99693820 del. | chr8:99501256-99689552 del; 3-bp insertion
15DG1178 | Syndromic microcephaly | VPS13B | SRS-based multi-gene panels; exome sequencing; genotyping array | All prior tests failed to resolve breakpoints. Genotyping array suggested a homozygous deletion of exons 25-35 in NM_152564.4 (VPS13B). | chr8:99501256-99689552 del; 3-bp insertion
12DG0797 | Spastic paraplegia | AP4S1 | SRS-based multi-gene panels; exome sequencing; optical genome mapping | All prior tests failed to resolve breakpoints. Exome sequencing suggested homozygous deletion of exon 4 in AP4S1; Optical genome mapping suggested chr14:31071319-31079171 del. | chr14:31071671-31073796 del
17DG0967 | Cholestasis with high gamma-glutamyl transpeptidase and end-stage renal disease | DCDC2 | SRS-based multi-gene panels; exome sequencing; optical genome mapping | All prior tests failed to resolve breakpoints. Exome sequencing suggested NM_001195610.1 (DCDC2): c.223_293del; Optical genome mapping suggested chr6:24355708-24363595 del. | chr6:24357210-24357528 del
20DG1339 | Atypical hemolytic uremic syndrome | CFHR1 and CFHR3 | SRS-based multi-gene panels; exome sequencing; MLPA | MLPA suggested combined homozygous deletion of both CFHR1 and CFHR3; Optical genome mapping suggested chr1:196748824-196853801 del. | chr1:196749230-196832818 del
Adaptive sampling vs. multiplex NanoRanger
Oxford Nanopore sequencing offers a unique method called adaptive sampling that takes advantage of the independent voltage control of individual sequencing channels and real-time software analysis of sequencing results to achieve target enrichment in native DNA
8,11. Because the targets to enrich can be added in the software at virtually no additional cost, adaptive sampling has the potential to be a simple and efficient T-LRS solution. To compare adaptive sampling with NanoRanger in a realistic clinical scenario, adaptive sampling was applied to the same Bardet–Biedl syndrome case (10DG0002) that was resolved using NanoRanger. To simulate de novo diagnosis, 16 Bardet-Biedl gene loci were chosen as enrichment targets, and a preliminary experiment was conducted using a fresh 293T control sample (Table 3).
Table 3. Adaptive sampling data for the 293T cell control sample, per sequencing target: coverage and average read length.
Target gene | Gene region of interest | Size of gene region of interest (bp) | Read counts in gene region of interest | Average coverage of gene region of interest | Adaptive sampling depth (x) | Average read length in gene region of interest (bp)
BBIP1 | chr10:110898730-110919201 | 20471 | 37 | 17.31 | 8.83 | 9,577.80
BBS1 | chr11:66510606-66533613 | 23007 | 58 | 19.15 | 9.77 | 7,595.30
ARL6/BBS3 | chr3:97764521-97801229 | 36708 | 92 | 21.03 | 10.73 | 8,392.10
BBS2 | chr16:56465640-56582667 | 117027 | 187 | 11.35 | 5.79 | 7,103.20
BBS4 | chr15:72686207-72738473 | 52266 | 119 | 20.30 | 10.35 | 8,917.10
BBS5 | chr2:169479480-169506655 | 27175 | 60 | 19.24 | 9.81 | 8,712.20
BBS7 | chr4:121824329-121870474 | 46145 | 66 | 12.38 | 6.31 | 8,654.20
TTC8/BBS8 | chr14:88824153-88881079 | 56926 | 89 | 10.59 | 5.40 | 6,774.50
BBS10 | chr12:76344474-76348415 | 3941 | 13 | 44.91 | 22.90 | 13,613.60
TRIM32/BBS11 | chr9:116687305-116701300 | 13995 | 35 | 22.64 | 11.55 | 9,052.30
CCDC28B | chr1:32196011-32205453 | 9442 | 38 | 36.81 | 18.77 | 9,146.20
CEP290 | chr12:88049016-88142088 | 93072 | 141 | 11.32 | 5.77 | 7,473.40
TMEM67 | chr8:93754844-93832653 | 77809 | 147 | 13.64 | 6.95 | 7,217.30
MKS1 | chr17:58205441-58219605 | 14164 | 25 | 18.98 | 9.68 | 10,753.60
MKKS | chr20:10401009-10434222 | 33213 | 59 | 16.98 | 8.66 | 9,559.30
BBS12 | chr4:122700442-122744942 | 44500 | 75 | 10.00 | 5.10 | 5,933.00
In the preliminary experiment, sequencing depth of the 16 target regions ranged from 5.10x to 22.90x, and the average read length in the target gene regions ranged from 5,933.00 bp to 13,613.60 bp (Table 3). Adaptive sampling was then applied to the 10DG0002 sample. This sample had been stored frozen for a long period, and its DNA yielded shorter reads. Sequencing depth of the 16 target regions ranged from 2.36x to 8.76x, and the average read length in the target gene regions ranged from 1,854.00 bp to 2,913.00 bp (Table 4).
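The "Average coverage" values in Table 3 are consistent with a simple estimate: read count × average read length ÷ region size, where region size is the end minus the start of the listed interval (for BBIP1, 37 × 9,577.80 / 20,471 ≈ 17.31). This interpretation is inferred from the table values rather than stated in the source, and the sketch below (hypothetical `TargetRegion` class) only reproduces that arithmetic. It counts each read's full length even when the read extends beyond the region, so it approximates rather than measures on-target depth, and it does not explain the separate adaptive sampling depth column.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TargetRegion:
    gene: str
    chrom: str
    start: int
    end: int
    read_count: int        # reads assigned to the region
    mean_read_len: float   # average length of those reads (bp)

    @property
    def size(self) -> int:
        # Table 3 sizes match end - start, e.g. 110919201 - 110898730 = 20471
        return self.end - self.start

    def mean_coverage(self) -> float:
        """Approximate fold coverage: total sequenced bases / region size."""
        return self.read_count * self.mean_read_len / self.size


# Three rows of Table 3 (293T control sample)
rows = [
    TargetRegion("BBIP1", "chr10", 110898730, 110919201, 37, 9577.80),
    TargetRegion("BBS10", "chr12", 76344474, 76348415, 13, 13613.60),
    TargetRegion("BBS12", "chr4", 122700442, 122744942, 75, 5933.00),
]
for r in rows:
    print(f"{r.gene}\t{r.size}\t{r.mean_coverage():.2f}")
```

The printed coverages, 17.31, 44.91 and 10.00, match the corresponding "Average coverage" entries in Table 3.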
Table 4. Adaptive sampling data for 10DG0002: coverage and average read length per sequencing target.
Target region | Gene of interest | Size of gene region (bp) | Read count in gene region of interest | Average coverage of reads in gene region of interest | Adaptive sampling depth of gene region of interest (x) | Average read length in gene region of interest (bp)
chr10:110898730-110919201 | BBIP1 | 20471 | 19 | 2.25 | 2.36 | 2426.40
chr11:66510606-66533613 | BBS1 | 23007 | 43 | 3.79 | 3.98 | 2028.90
chr3:97764521-97801229 | ARL6/BBS3 | 36708 | 65 | 3.65 | 3.83 | 2063.00
chr16:56465640-56582667 | BBS2 | 117027 | 202 | 4.21 | 4.42 | 2440.60
chr15:72686207-72738473 | BBS4 | 52266 | 99 | 5.46 | 5.72 | 2881.80
chr2:169479480-169506655 | BBS5 | 27175 | 58 | 6.22 | 6.52 | 2913.00
chr4:121824329-121870474 | BBS7 | 46145 | 73 | 4.57 | 4.79 | 2887.50
chr14:88824153-88881079 | TTC8/BBS8 | 56926 | 107 | 4.53 | 4.75 | 2409.20
chr12:76344474-76348415 | BBS10 | 3941 | 8 | 3.76 | 3.95 | 1854.00
chr9:116687305-116701300 | TRIM32/BBS11 | 13995 | 41 | 8.36 | 8.76 | 2853.90
chr1:32196011-32205453 | CCDC28B | 9442 | 24 | 7.38 | 7.74 | 2904.10
chr12:88049016-88142088 | CEP290 | 93072 | 170 | 3.97 | 4.16 | 2174.50
chr8:93754844-93832653 | TMEM67 | 77809 | 152 | 4.40 | 4.61 | 2251.60
chr17:58205441-58219605 | MKS1 | 14164 | 23 | 4.43 | 4.64 | 2725.30
chr20:10401009-10434222 | MKKS | 33213 | 61 | 4.72 | 4.94 | 2567.30
chr4:122700442-122744942 | BBS12 | 44500 | 88 | 4.55 | 4.77 | 2300.20

Despite the low sequencing depth, the adaptive sampling reads showed the same breakpoints identified by NanoRanger (Table 4). However, the two junctions were covered by only three and four reads, respectively. This made it difficult to determine the true sequence of the SV confidently from adaptive sampling alone. Another Bardet–Biedl syndrome case (09DG0002, Table 1) was tested in the same way. The sequencing depth of the 16 target regions ranged from 3.26x to 4.97x, and the average read length in the gene regions of interest ranged from 1,449.60 bp to 2,035.20 bp (Table 5).

Table 5. Adaptive sampling data for 09DG0002: coverage and average read length per sequencing target.
Target region | Gene of interest | Size of gene region (bp) | Read count in gene region of interest | Average coverage of reads in gene region of interest | Adaptive sampling depth of gene region of interest (x) | Average read length in gene region of interest (bp)
chr10:110898730-110919201 | BBIP1 | 20471 | 106 | 7.93 | 3.57 | 1531.80
chr11:66510606-66533613 | BBS1 | 23007 | 132 | 10.45 | 4.70 | 1821.00
chr3:97764521-97801229 | ARL6/BBS3 | 36708 | 185 | 8.20 | 3.69 | 1626.50
chr16:56465640-56582667 | BBS2 | 117027 | 595 | 7.37 | 3.32 | 1449.60
chr15:72686207-72738473 | BBS4 | 52266 | 266 | 7.54 | 3.39 | 1482.10
chr2:169479480-169506655 | BBS5 | 27175 | 140 | 7.57 | 3.40 | 1468.60
chr4:121824329-121870474 | BBS7 | 46145 | 218 | 7.27 | 3.27 | 1539.40
chr14:88824153-88881079 | TTC8/BBS8 | 56926 | 292 | 8.22 | 3.70 | 1602.90
chr12:76344474-76348415 | BBS10 | 3941 | 27 | 10.11 | 4.55 | 1475.10
chr9:116687305-116701300 | TRIM32/BBS11 | 13995 | 72 | 8.52 | 3.83 | 1656.40
chr1:32196011-32205453 | CCDC28B | 9442 | 60 | 11.05 | 4.97 | 1738.30
chr12:88049016-88142088 | CEP290 | 93072 | 451 | 7.25 | 3.26 | 1495.90
chr8:93754844-93832653 | TMEM67 | 77809 | 403 | 7.86 | 3.54 | 1518.50
chr17:58205441-58219605 | MKS1 | 14164 | 72 | 10.35 | 4.65 | 2035.20
chr20:10401009-10434222 | MKKS | 33213 | 193 | 8.68 | 3.91 | 1494.10
chr4:122700442-122744942 | BBS12 | 44500 | 251 | 9.39 | 4.22 | 1664.30

Adaptive sampling revealed a large deletion in ARL6 covered by four reads, which was later validated by Sanger sequencing (Table 1). The adaptive sampling process was, however, resource-intensive. It consumed ~1 μg of DNA (vs. 200 ng for NanoRanger) and up to two MinION flow cells (vs. 10% of the capacity of one MinION flow cell for NanoRanger). It also required up to 48 hours of sequencing (vs. 40 min for NanoRanger). These demands made it impractical in its current format. To extend NanoRanger from a single locus to multiple regions of interest (ROIs), multiplexed NanoRanger was tested in a single reaction. The inverse primers were chosen for the ABCC8 and BBIP1
ATTORNEY REF #: KAUST 2024-024-02 PCT loci, known for their efficacy in TLA and NanoRanger, respectively (Table 2). DNA samples were utilized extracted from wild-type 293T cells and tested four primer concentrations (0.2 uM, 0.3 uM, 0.4 uM, and 0.5 uM). The other steps were processed by following the standard NanoRanger protocol. The gel electrophoresis of the resulting amplicon products revealed consistent patterns (FIG. 7). The PCR products were then purified, barcoded, and sequenced on a MinION flow cell. Total read counts for the four samples were: 236056, 231739, 323003, and 305581, respectively. While total read count generally increased with primer concentration, the effective sequencing depth peaked at the 0.2 uM primer concentration (7448 reads in BBIP1 region and 3379 reads in ABCC8), outperforming the other three conditions (the 0.3 uM condition yielded 1,741 reads in the BBIP1 region and 1,480 reads in the ABCC8 region, the 0.4 uM condition produced 2,215 reads in the BBIP1 region and 1,548 reads in the ABCC8 region, and the 0.5 uM condition resulted in 1,483 reads in the BBIP1 region and 1,227 reads in the ABCC8 region). Under this condition, NanoRanger covered a span of 10kb in BBIP1 region and 15kb in ABCC8 region, while the lowest depth was as deep as 141 at one end and 819 at the other for BBIP1, and 15 at one end and 17 at the other for ABCC8. The results demonstrate the effectiveness of multiplex NanoRanger for at least two gene loci, highlighting NanoRanger’s potential as a first-line test for genetic disorders with candidate genes and as a cost-effective premarital screening test for common genetic disorders. A role of transcription-mediated genome instability in breakpoints In our comprehensive analysis of breakpoints across the spectrum of diseases, we observed a predominant association with mechanisms involving repetitive elements and microhomology mediated end joining (MMEJ) (Table 11). Table 11. Breakpoints and the possible mechanisms. 
This table provides a detailed summary of the sequences adjacent to the breakpoints, along with the possible mechanisms underlying each case. Each entry gives the case ID, the breakpoint sequences with microhomology underlined, and the possible mechanisms.
17DG0332: 5’--TTATATATCC | TTACTAGTTA---CCCTTGTTTT | CCAGTATCAA--3’ (SEQ ID NO:99). Possible mechanism: Alu-mediated non-allelic homologous recombination.
19DG0075: 5’--TCTTTCCTCA | AAAGGAAAGG---CTCCTAGTGA | ACTCGGGTCT--3’ (SEQ ID NO:100). Possible mechanism: non-allelic homologous recombination between repetitive elements.
09DG01002: 5’--TGTGTGTGTG | CATGTTGTGT---TGTGTGTGTG | TGTGTTTTTG--3’ (SEQ ID NO:101). Possible mechanism: homologous end joining; (TG)*18 repeat elements.
10DG0002: 5’--TTTGAAAATA | TACATCACAG---AAAGGGGTCA | CATTAGAATT--3’ (SEQ ID NO:102); 5’--GTCCCGAAGA | GTTTAAATTA---GAAGTTTATT | TCTTTACAC--3’ (SEQ ID NO:103). Possible mechanisms: non-homologous end joining; homologous recombination.
14DG1602: 5’--ACAACAGGAA | ACATTTAAAG---TAGACTACTG | ACTTTATACT--3’ (SEQ ID NO:104); 5’--CATTGTTATG | ACTCCTGACC---TTATCATTTG | TTTTCTGGTT--3’ (SEQ ID NO:105). Possible mechanisms: non-homologous end joining; homologous recombination.
18DG0135: 5’--TTTTTTTTTT | CTAATGAATC---TGGTCAATTT | ACTAGTTTTC--3’ (SEQ ID NO:106). Possible mechanism: microhomology-mediated end joining.
14DG0861: 5’--GTGTTAAACG | TCGCGGGGGT---GTACTTAAGG | AACGGGCACG--3’ (SEQ ID NO:107). Possible mechanism: repeat-mediated deletion.
07-00796: 5’--TCACTGCAAG | CTCTGCCTCC---TCACTGCAAG | TTCCGCCTTC--3’ (SEQ ID NO:108). Possible mechanism: microhomology-mediated end joining.
07-00462: 5’--TCACTGCAAG | CTCTGCCTCC---TCACTGCAAG | TTCCGCCTTC--3’ (SEQ ID NO:109). Possible mechanism: microhomology-mediated end joining.
10DG1265: 5’--CACACTAGTA | CCGAGTGACC---CACGTTAGTA | TCGAGTGACG--3’ (SEQ ID NO:110). Possible mechanism: microhomology-mediated end joining.
09DG00509: 5’--TTTTTTTGTT | GATTGTTTTC---ATTTATTGTT | TAATCCTTTT--3’ (SEQ ID NO:111). Possible mechanism: microhomology-mediated end joining.
15DG1177: 5’--GACTTTGTGGA | TTGTAGCTTC---CACAGTGGTG | CACACCTGTA--3’ (SEQ ID NO:112). Possible mechanism: repeat-mediated deletion.
15DG1178: 5’--GACTTTGTGGA | TTGTAGCTTC---CACAGTGGTG | CACACCTGTA--3’ (SEQ ID NO:113). Possible mechanism: repeat-mediated deletion.
12DG0797: 5’--ACCTCCCTCA | GCCTCCCAAAGTGCTGGGATTACAGGCGTGAGCCACCG---GCCCGCCTCG | GCCTCCCAAAGTGCTGGGATTACAGGCGTGAGCCACCG--3’ (SEQ ID NO:114). Possible mechanism: microhomology-mediated end joining.
17DG0967: 5’--GGTGGCTTAG | GCCTTCGATC---TTGTCCTTAG | AATTTCTTTT--3’ (SEQ ID NO:115). Possible mechanism: microhomology-mediated end joining.
20DG1339: 5’--TACAAAGGAG | TCAATTCACA---TTTAGAGATA | GTCGGGGTAA--3’ (SEQ ID NO:116). Possible mechanism: repeat-mediated deletion.

Case 17DG0332 involves a rearrangement in the MFSD8 gene. In this case, two homologous Alu element sequences with the same orientation were identified near the 5’ and 3’ breakpoints (Figures 15A and 15B). This observation suggests that the deletion was mediated by non-allelic homologous recombination between the two Alu elements, which is reminiscent of a previous report (Figure 15A).
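Table 11 writes each junction as 5’--A | B---C | D--3’. Suppose this denotes a simple deletion. A would then be the retained sequence upstream of the 5’ breakpoint, B and C the first and last bases of the deleted segment, and D the retained sequence downstream of the 3’ breakpoint. Under that assumption, the microhomology length is the number of bases by which the junction could slide without changing the joined product. That equals the shared suffix of A and C plus the shared prefix of B and D. The hypothetical function below illustrates this calculation only; it is not the analysis used in the study. Its results are capped by the window lengths shown in the table.

```python
import re

# Matches the Table 11 notation 5’--A | B---C | D--3’ (straight or curly prime).
JUNCTION = re.compile(
    r"5['’]?--\s*([ACGTN]+)\s*\|\s*([ACGTN]+)\s*---\s*([ACGTN]+)\s*\|\s*([ACGTN]+)\s*--3['’]?"
)


def common_suffix_len(x: str, y: str) -> int:
    n = 0
    while n < min(len(x), len(y)) and x[-1 - n] == y[-1 - n]:
        n += 1
    return n


def common_prefix_len(x: str, y: str) -> int:
    n = 0
    while n < min(len(x), len(y)) and x[n] == y[n]:
        n += 1
    return n


def microhomology_length(notation: str) -> int:
    """Bases by which the junction can slide without changing the joined sequence.

    Assumes a simple deletion: A and D are retained flanks, B and C are the
    first and last bases of the deleted segment.
    """
    match = JUNCTION.search(notation)
    if match is None:
        raise ValueError(f"unrecognized junction notation: {notation!r}")
    a, b, c, d = match.groups()
    # Slide left: the end of A equals the end of the deleted segment (C).
    # Slide right: the start of the deleted segment (B) equals the start of D.
    return common_suffix_len(a, c) + common_prefix_len(b, d)


examples = {
    "18DG0135": "5’--TTTTTTTTTT | CTAATGAATC---TGGTCAATTT | ACTAGTTTTC--3’",
    "10DG1265": "5’--CACACTAGTA | CCGAGTGACC---CACGTTAGTA | TCGAGTGACG--3’",
    "17DG0967": "5’--GGTGGCTTAG | GCCTTCGATC---TTGTCCTTAG | AATTTCTTTT--3’",
}
for case, notation in examples.items():
    print(case, microhomology_length(notation))  # 3, 5 and 5 respectively
```

Inversions and complex rearrangements such as 10DG0002 and 14DG1602 do not fit this simple-deletion model. They would need orientation-aware handling.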
Two cases suggested that gene regulatory mechanisms could be responsible for the novel breakpoints underlying recessive disease.

First, case 19DG0075 has a deletion in the USH1C and ABCC8 genes. Two homologous sequences with the same orientation (Figure 20A) were identified near the 5’ and 3’ breakpoints. Notably, the deleted region encompassed the transcription start site (TSS) of ABCC8 and seven enhancers identified and classified by ENCODE (Z-Lab), VISTA, and dbSUPER (Figure 15C). In addition, multiple long-distance TSS-enhancer interactions were identified by Hi-C, expression quantitative trait loci (eQTLs), and distance-based associations. These frequent long-distance TSS-enhancer interactions are reminiscent of active transcription hubs that regulate gene expression. Two distal enhancers that interact with the TSS coincide with the 5’ and 3’ breakpoints, respectively. This suggests that the long-distance interactions might bring the 5’ and 3’ breakpoints into close proximity in 3D space, facilitating homologous recombination between the direct repeats and the ensuing deletion (Figure 15C).

Second, case 10DG0002 has a rearrangement in the BBIP1 gene. Two homologous sequences in opposite orientation (Figure 20B) were found at the 5’ and 3’ inversion breakpoints, respectively (Figure 15D). These inverted repeats could mediate the inversion between breakpoints c and d (Figure 15D) via homologous recombination. Both repeats overlap with enhancers classified by ENCODE. Additionally, breakpoint b is close to a cryptic TSS annotated in the FANTOM5 CAGE dataset (Figure 15E). The TSS and the enhancer near breakpoint b also overlap with ENCODE H3K27Ac peaks, suggesting active transcription and enhancer activity. Breakpoint a, on the other hand, lies at the edge of a block of human chained self-alignments, a form of repeat (Figure 15E).
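The cases above distinguish two patterns. Homologous sequences in the same orientation (direct repeats) are consistent with deletion by recombination. Sequences in opposite orientation (inverted repeats) are consistent with inversion. This can be checked by comparing one repeat copy with the other copy and with its reverse complement.

The minimal sketch below assumes the two copies have already been extracted as equal-length, gap-free sequences. Real repeat copies would normally need a proper pairwise alignment rather than the position-by-position identity used here. The function names and the 0.8 threshold are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def identity(x: str, y: str) -> float:
    """Fraction of matching positions between two equal-length sequences."""
    if len(x) != len(y) or not x:
        raise ValueError("sequences must be non-empty and of equal length")
    return sum(a == b for a, b in zip(x.upper(), y.upper())) / len(x)


def repeat_orientation(repeat1: str, repeat2: str, min_identity: float = 0.8) -> str:
    """'direct' (same orientation), 'inverted' (opposite), or 'unrelated'."""
    direct = identity(repeat1, repeat2)
    inverted = identity(repeat1, reverse_complement(repeat2))
    if max(direct, inverted) < min_identity:
        return "unrelated"
    return "direct" if direct >= inverted else "inverted"


print(repeat_orientation("ACGTTGCAAT", "ACGTTGCAAT"))  # direct
print(repeat_orientation("ACGTTGCAAT", "ATTGCAACGT"))  # inverted
```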
Disruptions and rearrangements affecting enhancers and TSSs were observed in three additional disease-associated genes in this study. This suggests a non-random distribution of breakpoints relative to genomic features critical for the regulation of gene expression (Figures 20C–20E). Analyzing the novel breakpoints in the context of cis-regulatory and repetitive elements suggests that both cis DNA elements and transcriptional activity can contribute to genome instability in human recessive diseases. Additionally, four cases of retinitis pigmentosa were examined (07-00462, 07-00796, 09DG00509, and 10DG1265). Although their breakpoints are localized in different genomic regions, microhomology is consistently identified at the exact edges of all breakpoints. This observation suggests that the
breakpoints were processed by the MMEJ pathway, highlighting the role of microhomology in shaping SVs in retinitis pigmentosa. T-LRS strategies, particularly NanoRanger, have proven highly effective in detecting and characterizing elusive breakpoints. They thereby help elucidate the molecular processes that contribute to breakpoint formation.

Clinical feasibility of NanoRanger and main T-LRS approaches

Four distinct approaches within the framework of T-LRS strategies were delineated (Table 1):
LR-PCR: This method demands detailed prior locus information and approximately 50 ng of DNA. It can achieve high sequence coverage, but it covers only a limited range, relies on trial and error, and has a low success rate.
TLA: TLA needs only basic locus information but requires 5-10 million live cells. It offers high success rates and coverage. However, its crosslinking strategy confounds the analysis of many types of SVs, such as inversions, translocations, and complex rearrangements. The need for a large number of live cells and a long, complex workflow makes it impractical for routine clinical testing.
NanoRanger: Ideal for expansive genomic regions, NanoRanger uses about 200 ng of DNA. It provides high coverage and is also cost-effective and fast.
Adaptive sampling: This approach requires neither prior locus information nor PCR amplification. It covers a broad genomic range but needs over 1 μg of DNA. Its drawbacks include notably low sequence coverage and substantial demands on time and resources.
Each of these strategies addresses different sequencing requirements. Each strikes a different balance between success rate, coverage, and resource demands, which secures its place in genomics research.

Table 6. Comparison among T-LRS methods
Feature | NanoRanger | LR-PCR | TLA | Adaptive sampling
Knowledge from prior testing | optional | required | required | optional
Sample requirement | ~100 ng DNA | ~50 ng DNA | 5-10 million live cells | ≥1 μg DNA
PCR | yes | yes | yes | no
Restriction digestion | partial (~5 min) | no | complete (overnight) | no
Success rate | high | low | intermediate | high
Range | hundreds of kb | <30 kb | hundreds of kb | hundreds of kb
Sequencing depth | high | high | high | very low
Time | ~6 hours | hours to days | 5-7 days | 2-3 days
Cost | low | low | intermediate | very high

In some forms, NanoRanger is employed without prior molecular diagnostic knowledge. When no prior diagnostic result is available as a basis for designing primers, NanoRanger uses multiple primers to target and amplify several genes of interest. These primers are designed on the basis of existing genetic knowledge. The sequencing data are then analyzed using pyNanoRanger, which aligns the reads to the human reference genome and identifies SVs such as insertions, deletions, or inversions. In this way, NanoRanger captures and analyzes all relevant genomic regions through targeted amplification of multiple genes of interest, even when no prior testing data are available.

NanoRanger for the characterization of transposable elements

NanoRanger also allows rapid and sensitive identification of transposable elements (TEs) throughout the genomes of humans and other species (Fig. 11). A pair of primers is designed to be specific to a TE, for example human-specific LINE-1 (L1Hs) insertions, while all other experimental steps are kept unchanged. NanoRanger then generates reads that align to the human reference genome and capture extended genomic regions containing L1Hs insertions. Two categories of L1Hs insertions can be detected based on how their supporting reads align to L1Hs:
1) Known L1Hs insertions: Reads mapping to L1Hs regions present in the reference genome are categorized as known insertions.
2) Potential novel L1Hs insertions: Reads aligning to genomic regions not annotated as containing L1Hs in the reference genome are flagged as potential novel L1Hs insertions.
This approach can specifically detect both known and novel L1Hs, even in the presence of nearby repetitive elements, and it requires only low sequencing depth. In summary, NanoRanger with specially designed primers for TEs enables rapid, accurate, and sensitive identification of TEs. This provides valuable insights into their roles in development, aging, and disease.
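The two categories above can be expressed as an interval-overlap test. For a read carrying L1Hs sequence, the test compares the read's aligned reference span against the L1Hs annotation of the reference genome. The minimal sketch below assumes that alignments and annotations have already been reduced to (chromosome, start, end) tuples in half-open coordinates. The type and function names are hypothetical and are not taken from the NanoRanger software.

```python
from typing import NamedTuple


class Interval(NamedTuple):
    chrom: str
    start: int  # 0-based, inclusive
    end: int    # exclusive


def overlaps(a: Interval, b: Interval) -> bool:
    return a.chrom == b.chrom and a.start < b.end and b.start < a.end


def classify_insertion(aligned_span: Interval, l1hs_annotation: list[Interval]) -> str:
    """'known' if the read's reference span overlaps an annotated L1Hs, else 'potential novel'."""
    if any(overlaps(aligned_span, ann) for ann in l1hs_annotation):
        return "known"
    return "potential novel"


# Hypothetical coordinates for illustration only.
annotation = [Interval("chr1", 1_000, 7_000)]
print(classify_insertion(Interval("chr1", 6_500, 9_000), annotation))  # known
print(classify_insertion(Interval("chr2", 1_000, 7_000), annotation))  # potential novel
```

The linear scan keeps the sketch short. Genome-wide annotation sets would normally be queried with an interval tree or a sorted-index search instead.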
Existing Technologies

TE detection has been a focal point of genomics research for decades, given the rising recognition of the role of TEs in genome evolution, structural variation, and disease. Various methodologies have been developed to detect and analyze TEs, such as LINEs (Long Interspersed Nuclear Elements), SINEs (Short Interspersed Nuclear Elements), and other transposons. Key methodologies in this field include the following. Each has shortcomings that are absent from the disclosed methods:

Conventional PCR-based methods. Early assays for detecting TEs were primarily conventional PCR-based assays. They relied on gel-based amplicon separation to determine the presence or absence of specific elements. Examples include amplification typing of L1 active subfamilies (ATLAS) [Badge et al., Am J Hum Genet 2003, 72:823-838, doi:10.1086/373939], L1 display [Sheen et al., Genome Res 2000, 10:1496-1508], and L1 insertion dimorphisms identification by PCR (LIDSIP) [Pornthanakasem et al., Biotechniques 2004, 37:750-752]. These approaches targeted sequences from younger L1 families and provided valuable initial insights into the high level of L1 polymorphism within human genomes. Despite their utility, these methods were not well suited to comprehensive TE mapping in large sample cohorts, which limited their broader applicability.

SRS-based methods. A series of SRS-based methods, particularly computational approaches, have been established to identify TEs across the genome [1,2]. SRS offers high throughput and scalability, and numerous software tools have been developed, yet significant challenges remain. These challenges stem primarily from the inherent limitations of SRS platforms such as Illumina, whose short reads are difficult to map to repetitive elements. Moreover, detecting TEs involved in structural variations is particularly challenging without reads long enough to span these repetitive elements.

LRS-based methods. LRS platforms, such as Nanopore and PacBio, generate reads long enough to cover full TE insertions. This allows active TEs to be identified in their full lengths and provides insights into diverse DNA intermediates. However, LRS still faces challenges, including high costs and limited sensitivity. For instance, Zhou et al. reported that whole-genome PacBio sequencing at
50x coverage, combined with the PALMER tool, detected only 283 L1Hs, a moderate number [3]. Similarly, Ramirez et al. applied Nanopore sequencing to 18 aged human brain samples and identified approximately 150 L1Hs. Most of these were truncated, which limited their utility for in-depth research [4]. Importantly, because of their low coverage and sensitivity, such whole-genome LRS approaches can detect only a subset of germline TEs. They cannot detect the somatic TE transposition events reported in the brain [5] and in aging tissues [6].

Proof-of-principle study

In the proof-of-principle study, NanoRanger was applied to investigate the role of TEs in Wiskott–Aldrich syndrome (WAS), a rare immunodeficiency disorder. The study focused on detecting potential L1Hs mobilization events in the genomic DNA of four individuals from a WAS-affected family: the proband (index), his sister, and both parents. TEs are structurally complex and variable. NanoRanger's ability to generate ultra-long reads is therefore invaluable for precisely identifying both known and novel L1Hs insertions across the genomes of the family members.

A pilot experiment was conducted using wild-type 293T cells to validate the efficiency of primers designed to anneal specifically to L1Hs. SalI, which does not cut within L1Hs, was chosen as the restriction enzyme for partial digestion. The samples were processed according to the NanoRanger protocol, using primers specific to the human LINE-1 (L1.3) element sequence:
5’-ATGCTAGATGACACATTAGTGGG-3’ (SEQ ID NO:92), targeting the 3’ end of L1Hs.
5’-GCTCTGCGTTTTAGAGTTTCCA-3’ (SEQ ID NO:93), targeting the 5’ end of L1Hs.
Sample preparation and sequencing yielded 17,856 reads, of which the 84.92% containing L1Hs were retrieved for downstream analysis. The preliminary experiment identified 102 known L1Hs annotated in the human telomere-to-telomere reference genome (T2T-CHM13), including 84 full-length and 18 truncated L1Hs. Additionally, 3 novel L1Hs elements that were unannotated in T2T-CHM13 were detected, supported by 1,216 reads, 3 reads, and 1 read, respectively (Figures 12A-12D). NanoRanger detected L1Hs insertions specifically, even in the presence of nearby satellite sequences and/or other repetitive elements (Fig. 12E).
Notably, NanoRanger operates effectively at low sequencing depth: a single read can be sufficient to identify an L1Hs insertion accurately. This pilot validation study confirmed the effectiveness of the primers and
NanoRanger protocol for detecting both full-length and truncated L1Hs, as well as potential previously unidentified insertions.

TE characterization of L1Hs across a WAS family

Building on the success of the pilot experiment, the workflow for characterizing L1Hs elements in the WAS case study included the following steps:
1) Sample preparation and sequencing: Genomic DNA was extracted from peripheral blood mononuclear cells (PBMCs) of the proband, his sister, and both parents. The NanoRanger protocol was applied using the same primer pair as in the pilot experiment. Library preparation was performed using the ONT Native Barcoding Kit (SQK-NBD112.24). This was followed by sequencing on an ONT MinION sequencer with one FLO-MIN112 flow cell, generating long-read sequences for each individual.
2) Data analysis and L1Hs characterization: The sequencing reads were analyzed through a pipeline that integrates NanoRanger and the T2T-CHM13v2.0 reference genome for TE annotation (Fig. 13).

The data analysis pipeline has three main phases: filtering, categorizing, and summarizing. Together these ensure accurate identification of both known and potential novel TEs across genomic samples. First, a pre-filter based on the primer sequences and on length criteria captures the reads relevant to TE detection. These filtered reads are then aligned to the T2T reference genome (or an equivalent reference) to obtain their genomic coordinates. After alignment, reads are filtered based on their identity in TE target regions, so that only high-confidence alignments are included in further analysis. Reads that align entirely to known repetitive elements are excluded to reduce noise and improve specificity. The aligned genomic regions of the filtered reads are then quantified and annotated for downstream classification.
Summaries from multiple samples are consolidated into a unified dataset, enabling comprehensive analysis of TE presence and variation across all samples. TE insertions are classified into two categories: known and potential novel TE insertions. Known insertions are supported by reads that fully map to TEs and their flanking regions annotated in the reference genome, while potential novel insertions are identified by reads aligning to genomic regions that are not annotated as containing TEs in the reference genome. These novel candidates represent potential new TE insertions that are not present in the standard reference. NanoRanger’s capacity to detect both known and potential novel L1Hs insertions enabled a detailed comparison of L1Hs insertions across the family members, providing insights into the dynamics of L1Hs mobilization in the context of WAS.
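The filtering phase described above has three parts: a primer-based pre-filter, length criteria, and an identity threshold applied after alignment. The sketch below illustrates them under stated assumptions. The primers are SEQ ID NO:92 and NO:93 from the pilot experiment. The following are all hypothetical choices, since the text does not specify them: exact primer matching within 100 bp of either read end, the length and identity thresholds, and the plain-dictionary read records. Exact matching is also crude for nanopore reads, which contain sequencing errors. The step that removes reads aligning entirely to known repeats is not shown.

```python
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


# L1Hs primer pair from the pilot experiment (SEQ ID NO:92 and NO:93).
PRIMERS = ("ATGCTAGATGACACATTAGTGGG", "GCTCTGCGTTTTAGAGTTTCCA")
PRIMER_FORMS = tuple(form for p in PRIMERS for form in (p, reverse_complement(p)))


def primer_near_end(seq: str, window: int = 100) -> bool:
    """True if either primer (either strand) occurs exactly within `window` bp of a read end."""
    ends = (seq[:window], seq[-window:])
    return any(form in end for form in PRIMER_FORMS for end in ends)


def prefilter(reads: dict[str, str], min_len: int = 1_000, max_len: int = 100_000) -> dict[str, str]:
    """Before alignment: keep reads within the length limits that carry a primer near an end."""
    return {
        name: seq
        for name, seq in reads.items()
        if min_len <= len(seq) <= max_len and primer_near_end(seq)
    }


def identity_filter(identities: dict[str, float], min_identity: float = 0.9) -> set[str]:
    """After alignment: keep reads whose identity in the TE target region passes the threshold."""
    return {name for name, value in identities.items() if value >= min_identity}


reads = {
    "r1": "A" * 50 + PRIMERS[0] + "C" * 2_000,          # primer near the start
    "r2": "G" * 2_000,                                   # no primer
    "r3": PRIMERS[0] + "A" * 10,                         # too short
    "r4": "T" * 2_000 + reverse_complement(PRIMERS[1]),  # reverse-complemented primer at the end
}
print(sorted(prefilter(reads)))                         # ['r1', 'r4']
print(identity_filter({"r1": 0.95, "r4": 0.70}))        # {'r1'}
```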
The NanoRanger analysis revealed distinct L1Hs insertion profiles across the family members, as summarized in Table 7. NanoRanger showed higher sensitivity, detecting more L1Hs than other published studies, even in regions with high sequence diversity or overlapping repetitive elements.

Table 7. Summary of L1Hs detection across family members
Category | Index | Sister | Mother | Father
Total detected L1Hs insertions | 695 | 714 | 699 | 752
Detected known L1Hs insertions | 423 | 412 | 407 | 451
Size distribution of detected known L1Hs insertions:
  Size <5000 bp | 34 | 23 | 21 | 25
  Size 5000-6000 bp | 7 | 5 | 6 | 9
  Size 6000-6500 bp | 379 | 382 | 378 | 415
  Size >6500 bp | 3 | 2 | 2 | 2
Known L1Hs insertions detected by both ends or one end:
  Detected by both ends | 181 | 206 | 208 | 221
  Detected by one end | 242 | 206 | 199 | 230
Detected novel L1Hs insertions | 272 | 302 | 292 | 301
Previously unidentified L1Hs insertions detected by both ends or one end:
  Detected by both ends | 32 | 42 | 45 | 46
  Detected by one end | 240 | 260 | 247 | 255

Conventional whole-genome LRS approaches mostly detect fragmented TEs. NanoRanger, by contrast, is designed to identify full-length active TEs, offering more meaningful insights into their biological impact.
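The rows of Table 7 are related by simple sums for each family member. Known plus novel insertions give the total. The four size classes and the two end-support classes each add up to the known count. The novel end-support classes add up to the novel count. The short check below transcribes the table and confirms these relations. It is offered only as a worked illustration of how the table is structured.

```python
# Table 7 counts in the order Index, Sister, Mother, Father.
MEMBERS = ("Index", "Sister", "Mother", "Father")
TOTAL = (695, 714, 699, 752)
KNOWN = (423, 412, 407, 451)
NOVEL = (272, 302, 292, 301)
KNOWN_BY_SIZE = ((34, 7, 379, 3), (23, 5, 382, 2), (21, 6, 378, 2), (25, 9, 415, 2))
KNOWN_BY_ENDS = ((181, 242), (206, 206), (208, 199), (221, 230))  # (both ends, one end)
NOVEL_BY_ENDS = ((32, 240), (42, 260), (45, 247), (46, 255))      # (both ends, one end)


def inconsistencies() -> list[str]:
    """Return a description of every row relation that does not add up."""
    problems = []
    for i, member in enumerate(MEMBERS):
        if KNOWN[i] + NOVEL[i] != TOTAL[i]:
            problems.append(f"{member}: known + novel != total")
        if sum(KNOWN_BY_SIZE[i]) != KNOWN[i]:
            problems.append(f"{member}: size classes != known")
        if sum(KNOWN_BY_ENDS[i]) != KNOWN[i]:
            problems.append(f"{member}: known end-support classes != known")
        if sum(NOVEL_BY_ENDS[i]) != NOVEL[i]:
            problems.append(f"{member}: novel end-support classes != novel")
    return problems


print(inconsistencies())  # []
```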
Unique L1Hs insertions were found in the proband, and the insertion patterns differed among family members. Notably, a previously unidentified L1Hs insertion was identified near the CABS1 promoter in the proband but was absent in the other family members. This indicates a potential de novo L1Hs mobilization event in the context of WAS. Finally, NanoRanger is engineered to reduce the need for ultra-high sequencing coverage and to minimize off-target sequencing, making it a more economical solution than current LRS technologies.

This application of NanoRanger showcases its effectiveness in characterizing TEs in settings such as genetic disease and wellness assessment. The ultra-long reads generated by NanoRanger facilitate accurate detection of both known and previously unidentified L1Hs insertions. They offer valuable insights into TE mobilization and its potential implications for disease pathogenesis. These findings highlight the importance of further exploring TEs in genetic disease, as their mobilization may play a critical role in disease mechanisms. NanoRanger’s ability to map TEs at base-pair resolution positions it as a valuable tool for studying TE dynamics.

Exemplary applications include, but are not limited to:
1) Mapping breakpoints and quantifying rare genetic variants in complex genomic disorders that failed genetic diagnosis by conventional clinical testing methods.
2) Carrier screening for clinically challenging genetic diseases (e.g., Spinal Muscular Atrophy, Duchenne muscular dystrophy, Saudi founder genetic diseases).
3) Premarital carrier screening and prenatal care.
4) Preimplantation genetic testing.
5) Studying the dynamics of transposable elements and their impact on disease progression and aging.
6) Analyzing transposable element mobilization as a biomarker of disease and longevity.
Overall, the present studies aim to address the limitations and challenges of conventional approaches to genetic disease diagnosis. To address these challenges, a pioneering approach termed NanoRanger was introduced and implemented. Used alongside traditional approaches, NanoRanger played a pivotal role in uncovering disease-causing breakpoints that had been missed by conventional clinical testing. This work showed the strength of targeted sequence enrichment combined with long-read sequencing in the diagnosis of genetic diseases. Unlike commercial next-generation sequencing solutions, the current strategy has no bias in gene coverage and no compromise on DNA length or data throughput. It works with routine clinical samples (e.g., peripheral blood) and provides long DNA reads that can resolve large or complex genomic alterations at base-pair resolution. Proof-of-concept clinical studies have been successfully conducted, showing that NanoRanger provides fast and cost-effective diagnosis across a cohort of various genetic disorders. The breakpoints uncovered in the medical cases of this study are rare in the human population. Nevertheless, identifying them clinically is significant for diagnosing unaffected carriers within the population and contributes to ongoing research on the genes involved. Notably, this study included DNA samples collected over several years, some of them up to 16 years old. Despite prolonged storage and limited quantities, NanoRanger performed robustly in detecting breakpoints and provided deep and comprehensive coverage. This underscores the competitive edge of NanoRanger over optical genome mapping, which demands ultra-high-molecular-weight (UHMW) DNA, and over adaptive sampling, which requires large DNA quantities (≥1 μg DNA).
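Long reads resolve SVs at base-pair resolution largely through split alignments. When one read aligns as two segments, the relative position and strand of the segments indicate the type of rearrangement. The sketch below is a simplified, hypothetical classifier for two segments of one read. It assumes the segments are given in read order with reference coordinates already computed. It is not the pyNanoRanger implementation. Real SV callers also handle inserted sequence, mapping quality, and reads with more than two segments.

```python
from typing import NamedTuple, Optional


class Segment(NamedTuple):
    chrom: str
    start: int   # reference start, 0-based
    end: int     # reference end, exclusive
    strand: str  # "+" or "-"


def classify_split(first: Segment, second: Segment, min_gap: int = 50) -> tuple[str, Optional[int]]:
    """Classify two alignment segments of one read, given in read order.

    Simplified: ignores inserted sequence, mapping quality, and
    multi-segment (complex) alignments.
    """
    if first.chrom != second.chrom:
        return "translocation", None
    if first.strand != second.strand:
        return "inversion junction", None
    # On the minus strand, read order runs opposite to reference order.
    upstream, downstream = (first, second) if first.strand == "+" else (second, first)
    gap = downstream.start - upstream.end
    if gap >= min_gap:
        return "deletion", gap
    if gap <= -min_gap:
        return "duplication or complex", -gap
    return "no large SV", None


print(classify_split(Segment("chr10", 1_000, 5_000, "+"), Segment("chr10", 8_000, 12_000, "+")))
# ('deletion', 3000)
```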
This study showed that NanoRanger assays for different gene loci can be multiplexed in one reaction to reduce diagnostic cost. In further development, higher levels of multiplexing will be tested to build gene panels (e.g., genetic disease genes, cancer genes, and pathogen panels). Multiplex NanoRanger could serve as a first-line diagnostic tool for genetic disorders with well-defined candidate genes. It could also serve as a cost-effective premarital screening test for numerous common genetic disorders in countries including Saudi Arabia.
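The percentages in Table 9 are each locus's read count divided by the total read count at that primer concentration. The short sketch below uses the read counts reported for the BBIP1/ABCC8 multiplex test. It reproduces those percentages and selects the concentration with the highest combined on-target fraction. The data layout and the selection rule are illustrative assumptions, not part of the described protocol.

```python
# Read counts from the two-locus multiplex test; keys are primer concentrations in μM.
TOTAL_READS = {0.2: 236_056, 0.3: 231_739, 0.4: 323_003, 0.5: 305_581}
ON_TARGET = {
    "BBIP1": {0.2: 7_448, 0.3: 1_741, 0.4: 2_215, 0.5: 1_483},
    "ABCC8": {0.2: 3_379, 0.3: 1_480, 0.4: 1_548, 0.5: 1_227},
}


def on_target_percent(locus: str, conc: float) -> float:
    """Share of all reads at one primer concentration that fall in the given locus."""
    return 100 * ON_TARGET[locus][conc] / TOTAL_READS[conc]


def best_concentration() -> float:
    """Concentration with the highest combined on-target fraction across loci."""
    combined = {
        conc: sum(ON_TARGET[locus][conc] for locus in ON_TARGET) / total
        for conc, total in TOTAL_READS.items()
    }
    return max(combined, key=combined.get)


for conc in sorted(TOTAL_READS):
    print(conc, {locus: round(on_target_percent(locus, conc), 2) for locus in ON_TARGET})
print("best:", best_concentration())  # best: 0.2
```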
ATTORNEY DOCKET NO. KAUST 2024-024-02 PCT Table 8. Multiplex genotyping summary in affected families. This table provides an overview of the genotyping outcomes, detailing the identification of carrier status within the families studied. Family members Sex (ascribed at birth) Relationship Status 17DG0332 M Index Affected 17DG0333 F Mother Carrier 17DG0334 M Father Carrier 17DG0335 M Brother Noncarrier 19DG0075 M Index Affected 19DG0076 F Sister Carrier 19DG0077 F Mother Carrier 19DG0078 M Brother Carrier 10DG0002 M Index Affected 10DG0003 F Mother Carrier 10DG0091 M Father Carrier 14DG0861 F Index Affected 14DG0862 F Mother Carrier 14DG0863 M Father Carrier 07-00796 M Index Affected 07-00462 M Brother Affected 07-00458 F Sister Carrier 07-00459 M Brother Carrier 07-00460 M Brother Carrier 07-00461 F Sister Carrier 07-00463 M Brother Carrier 22DG0163 F Wife Noncarrier 10DG1265 M Index Affected 10DG1262 M Brother Carrier 10DG1263 M Brother Carrier 10DG1264 M Brother Carrier 11DG0600 M Brother Carrier 11DG0601 F Sister Noncarrier 11DG0602 M Brother Noncarrier 11DG0603 F Sister Noncarrier 11DG0604 M Father Carrier 11DG0605 F Mother Carrier 09DG00509 F Index Affected 09DG00510 F Sister Carrier 60 45686617
ATTORNEY DOCKET NO. KAUST 2024-024-02 PCT 09DG00511 F Sister Carrier 09DG00512 F Sister Carrier 09DG00513 M Father Carrier 09DG00514 F Mother Carrier 22DG0778 F Sister Carrier 15DG1177 M Index Affected 15DG1178 M Brother Affected 15DG1179 F Mother Carrier 15DG1180 M Father Carrier 12DG0797 M Index Affected 12DG0798 F Sister Carrier 12DG0799 F Mother Carrier 12DG0800 M Father Carrier 09DG01002 M Index Affected 09DG01003 F Mother Carrier 09DG01004 M Father Carrier 13DG1655 M Brother Noncarrier Table 9. Primer concentration efficacy in multiplex NanoRanger. This table summarizes the reads number obtained from the multiplex NanoRanger, specifically focusing on the outcomes relative to different primer concentrations used during the test. Primer 0.2 μM 0.3 μM 0.4 μM 0.5 μM concentration Total reads 236,056 231,739 323,003 305,581 Locus 1 (BBIP1) 7,448 (3.16%) 1,741 (0.75%) 2,215 (0.69%) 1,483 (0.49%) Locus 2 (ABCC8) 3,379 (1.43%) 1,480 (0.64%) 1,548 (0.48%) 1,227 (0.40%) Table 10. The cost evaluation in a 96-sample sequencing run of NanoRanger. This table provides a comprehensive cost breakdown for running a batch of 96 samples through the NanoRanger sequencing process. Materials Estimated cost per reaction (USD) Note Restriction endonuclease 12.5 Based on $65.3 per 500 reactions, for 96 reactions Thermo Scientific T4 DNA Ligase 6.6 Based on $68.8 per 1000 reactions, for 96 reactions Flow Cell (R10.4.1) 475.0 Priced per unit in a 96-flowcell pack 61 45686617
ATTORNEY DOCKET NO. KAUST 2024-024-02 PCT 266.3 Based on $799 per 288 barcoding Native Barcoding Kit 96 V14 reactions, for 96 reactions 83.3 Based on $4000 per 48 ligation reactions Ligation Sequencing Kit XL V14 Total cost (for 96 samples) 843.8 - Total cost (per sample) 8.8 Calculated from the sum for 96 samples in the run Table 12. Random ligation occurrence in different ligation volumes. This table displays the incidence of random ligation events across various ligation volumes. Ligation volume 1x (50 μl) 6x (300 μl) 10x (500 μl) % Random ligation occurrence 7.97% 2.25% 0.88% Table 13. Oligonucleotides used in the study. This table shows all the primer sequences designed and used throughout the study. Oligo Sequence 17DG0332 LR-PCR forward primer AGCATTATAAGAGCCGATGGAG (SEQ ID NO:1) 17DG0332 LR-PCR reverse primer CACGAGCAACCAGCATGTAG (SEQ ID NO:2) 19DG0075 TLA inverse primer 1 AGCTGGAAAGAGCCGCGACC (SEQ ID NO:49) 19DG0075 TLA inverse primer 2 GTGCAACGACGCAGCTGGACCT (SEQ ID NO:50)
19DG0075 TLA inverse primer 3 GTGGTTCTCGCTGCCGCAGA (SEQ ID NO:51) 19DG0075 TLA inverse primer 4 TGCCGCACGTCTTCCTACTCT (SEQ ID NO:52) 19DG0075 TLA inverse primer 5 CCCGCTTCAGGACGATCACCA (SEQ ID NO:53) 19DG0075 TLA inverse primer 6 AGAAGCTGTGTGAGCAAAAGCCT (SEQ ID NO:54) 19DG0075 TLA inverse primer 7 ACGATCACCAGGTCTGCACTCA (SEQ ID NO:55) 19DG0075 TLA inverse primer 8 ATCCCACATTCGGACCCTGC (SEQ ID NO:56) 19DG0075 TLA inverse primer 9 TCGATGGCCACCAGGACCATG (SEQ ID NO:57) 19DG0075 TLA inverse primer 10 CAAGTGGACCGACAGCGCCCTGA (SEQ ID NO:58) 19DG0075 TLA inverse primer 11 TCCAGAGCTGAGAAGGGGTCATC (SEQ ID NO:59) 19DG0075 TLA inverse primer 12 TAGTGACCCACAAGCTACAGTACC (SEQ ID NO:60) 19DG0075 TLA inverse primer 13 AGCGGCTCAGGCACTCCAG (SEQ ID NO:61) 19DG0075 TLA inverse primer 14 TACTTCCGGGTGGCGTCCAG (SEQ ID NO:98) 19DG0075 TLA inverse primer 15 CTGGCTGAAATTCTCCCCGCCTT (SEQ ID NO:9) 62 45686617
19DG0075 TLA inverse primer 16 | CCTTCGTGAGGAAGACCAGCATCT (SEQ ID NO:10)
19DG0075 TLA inverse primer 17 | ATCCAAGTCGGTCGCTGTCTC (SEQ ID NO:63)
19DG0075 TLA inverse primer 18 | ATCAGGTACTGCGTCCTGG (SEQ ID NO:64)
10DG0002 NanoRanger inverse primer 1 | ACAGCCTATGCCCCATTTTGG (SEQ ID NO:13)
10DG0002 NanoRanger inverse primer 2 | CGAAGGAGATGGAGGTCGTC (SEQ ID NO:14)
14DG0861 NanoRanger inverse primer 1 | GCTTGGAGCATAAGGATGACACA (SEQ ID NO:23)
14DG0861 NanoRanger inverse primer 2 | GAACTTAGGGAGATGGCTGGA (SEQ ID NO:24)
07-00796/07-00462 NanoRanger inverse primer 1 | GCACTGCCTTTGCTGTTCATT (SEQ ID NO:25)
07-00796/07-00462 NanoRanger inverse primer 2 | GGTCACTCACTCTCAAGCCAG (SEQ ID NO:26)
10DG1265 NanoRanger inverse primer 1 | ATTTACACTTGTGCCCCCGT (SEQ ID NO:27)
10DG1265 NanoRanger inverse primer 2 | GGCTCAGCGATGTCCCTAAA (SEQ ID NO:28)
09DG00509 NanoRanger inverse primer 1 | GCAGGAATGTGATACCATGGAGC (SEQ ID NO:31)
09DG00509 NanoRanger inverse primer 2 | ACACCACTATTGAGGAGGTCAAAGG (SEQ ID NO:32)
15DG1177 NanoRanger inverse primer 1 | GGCATGTCTGTGGTAATGAGAG (SEQ ID NO:35)
15DG1177 NanoRanger inverse primer 2 | GCAACCTCAGAAGGAGGCCC (SEQ ID NO:36)
12DG0797 NanoRanger 1 inverse primer 1 | TTACAGGCCAGCACGATTCAT (SEQ ID NO:33)
12DG0797 NanoRanger 1 inverse primer 2 | CCAAGCCCAGAAGCAGGTAG (SEQ ID NO:34)
12DG0797 NanoRanger 2 inverse primer 1 | GCCACTCCCAAATCAATAGCA (SEQ ID NO:94)
12DG0797 NanoRanger 2 inverse primer 2 | GGTGATGTAACTGCCATACAAT (SEQ ID NO:117)
14DG1602 NanoRanger 1 inverse primer 1 | TTGGGGGAAAGTCCTCAGGT (SEQ ID NO:118)
14DG1602 NanoRanger 1 inverse primer 2 | GACATCACCATCTCAGGCCC (SEQ ID NO:119)
14DG1602 NanoRanger 2 inverse primer 1 | GAGCCTGTCTTAGCCTGTGG (SEQ ID NO:97)
14DG1602 NanoRanger 2 inverse primer 2 | CCAGGTGAACCCCAAAAGGA (SEQ ID NO:120)
18DG0135 NanoRanger inverse primer 1 | GGATTGAAGTATCTTGAGGCAGTG (SEQ ID NO:56)
18DG0135 NanoRanger inverse primer 2 | GAGGAAAGCGTTAACTCACATCT (SEQ ID NO:57)
18DG0135 NanoRanger inverse primer 3 | TATCACGTCAGGTCGTTGGC (SEQ ID NO:95)
18DG0135 NanoRanger inverse primer 4 | AAGGCAGTGGTACAGGATGC (SEQ ID NO:96)
18DG0135 NanoRanger inverse primer 5 | CTTTGAGGGGTTTGCAACAGG (SEQ ID NO:121)
18DG0135 NanoRanger inverse primer 6 | GGTGTGCTGCAAAAGCGATG (SEQ ID NO:47)
18DG0135 NanoRanger inverse primer 7 | TTCTAGGCTCACCCCTAGCC (SEQ ID NO:48)
18DG0135 NanoRanger inverse primer 8 | GTCACGGGAGGAACCTGATG (SEQ ID NO:49)
18DG0135 NanoRanger inverse primer 9 | GCTGCGAGGCTTGAATTCTG (SEQ ID NO:50)
18DG0135 NanoRanger inverse primer 10 | GGAGCCAGAGATCCAAATCATT (SEQ ID NO:51)
18DG0135 NanoRanger inverse primer 11 | TAGCCCACAAACCAGTGTCC (SEQ ID NO:52)
18DG0135 NanoRanger inverse primer 12 | GGTCAAGTCTGAAGAGATAGCAGT (SEQ ID NO:53)
18DG0135 NanoRanger inverse primer 13 | CCATGACAAGCAACTATTCTACCCA (SEQ ID NO:54)
18DG0135 NanoRanger inverse primer 14 | TACGAAAGGGTTCCTGGCAT (SEQ ID NO:55)
18DG0135 NanoRanger inverse primer 15 | GGATTGAAGTATCTTGAGGCAGTG (SEQ ID NO:56)
18DG0135 NanoRanger inverse primer 16 | GAGGAAAGCGTTAACTCACATCT (SEQ ID NO:57)
20DG1339 NanoRanger inverse primer 1 | ACAATGAGCCTCAGAAGCTGT (SEQ ID NO:58)
20DG1339 NanoRanger inverse primer 2 | GTCGGGGTAAAAGTTAGGGTTT (SEQ ID NO:59)
17DG0967 NanoRanger inverse primer 1 | CTGGGAATAAGGACCCCTGC (SEQ ID NO:60)
17DG0967 NanoRanger inverse primer 2 | TGAGGACAATTCGTGGGCTTTCA (SEQ ID NO:61)
17DG0332 Genotyping forward primer | CGCCTCCAACTCTCAAAGCAA (SEQ ID NO:3)
17DG0332 Genotyping reverse primer | TGCCTCTAACCTCAACTTATT (SEQ ID NO:4)
19DG0075 Genotyping forward primer | ACCAGCCTGAAGCTCAAAGAGGGC (SEQ ID NO:64)
19DG0075 Genotyping reverse primer | AGGGTGGATGCTCACGGCTCCT (SEQ ID NO:65)
10DG0002 Genotyping 1 forward primer | CATGATGGTTTATCCACAGGTCCA (SEQ ID NO:66)
10DG0002 Genotyping 1 reverse primer | GCTGCTGTTGAGATACTGTGC (SEQ ID NO:67)
10DG0002 Genotyping 2 forward primer | AGTTCAGAGTAGCTGGAGTTGC (SEQ ID NO:68)
10DG0002 Genotyping 2 reverse primer | TCCTCCCTTTCAGTGGCATATCA (SEQ ID NO:69)
14DG0861 Genotyping forward primer | GTCACCTCTTGAATGCTTTGCT (SEQ ID NO:70)
14DG0861 Genotyping reverse primer | GACCACAGGCAGAATGGGCTTA (SEQ ID NO:71)
07-00796/07-00462 Genotyping forward primer | GTCCTGCTCCAGGAATTAAACGT (SEQ ID NO:72)
07-00796/07-00462 Genotyping reverse primer | AGCCGTTGCCATCATTCTGA (SEQ ID NO:73)
10DG1265 Genotyping forward primer | CTGTAACAGAGCCCAAGGCA (SEQ ID NO:29)
10DG1265 Genotyping reverse primer | AATGGGCTGCTTCCCTTACC (SEQ ID NO:75)
09DG00509 Genotyping forward primer | CCTGTTGGCGATTTGTATGTCTTAT (SEQ ID NO:76)
09DG00509 Genotyping reverse primer | GCAAAGCAGTCAGATGGAGTAG (SEQ ID NO:77)
15DG1177/15DG1178 Genotyping forward primer | ACGGACAGCAAAGTTTGGGA (SEQ ID NO:78)
15DG1177/15DG1178 Genotyping reverse primer | TCTCAGGCACCATTGTGGTC (SEQ ID NO:38)
12DG0797 Genotyping forward primer | CCATGAACCAAACTTTGAAGTGGT (SEQ ID NO:80)
12DG0797 Genotyping reverse primer | TTTTCGGAAAGGAGCTCACA (SEQ ID NO:81)
09DG01002 Genotyping forward primer | GTCTCCTGCCTTGGGTACAA (SEQ ID NO:122)
09DG01002 Genotyping reverse primer | ACCCTTCACAGCTACATTCACAT (SEQ ID NO:46)
14DG1602 Genotyping 1 forward primer | TCTGGACCACGTAGGGAAAA (SEQ ID NO:84)
14DG1602 Genotyping 1 reverse primer | AGGAAACAAGTGCAAAGCAAA (SEQ ID NO:85)
14DG1602 Genotyping 2 forward primer | CCTTTCACTCATCCTTTTTACCTTC (SEQ ID NO:86)
14DG1602 Genotyping 2 reverse primer | GGTTTCAGAACACCACGTAGATT (SEQ ID NO:87)
18DG0135 Genotyping forward primer | GGGATATGGAGGGCGCATAA (SEQ ID NO:88)
18DG0135 Genotyping reverse primer | CCACTCATGGGAGTGTTGTGA (SEQ ID NO:89)
17DG0967 Genotyping forward primer | GGGAGCAGGGTCACAAAAGT (SEQ ID NO:90)
17DG0967 Genotyping reverse primer | TCTCGTTCGTTTCAGGAGGC (SEQ ID NO:91)
L1Hs NanoRanger inverse primer 1 | ATGCTAGATGACACATTAGTGGG (SEQ ID NO:92)
L1Hs NanoRanger inverse primer 2 | GCTCTGCGTTTTAGAGTTTCCA (SEQ ID NO:93)

Those skilled in the art will recognize, or be able to ascertain using no more than routine experimentation, many equivalents to the specific embodiments of the method and compositions described herein. Such equivalents are intended to be encompassed by the following claims.
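The values in Table 9 and Table 10 are simple ratios and sums, and re-deriving them makes the basis of each figure explicit. The following is a minimal Python sketch for illustration only and is not part of the described method; the data layout and the function names `run_cost` and `locus_fraction` are hypothetical. It assumes, as the notes in Table 10 suggest, that the restriction endonuclease, T4 DNA ligase and barcoding reagents are consumed once per sample, while a single flow cell and a single Ligation Sequencing Kit XL reaction are consumed per pooled 96-sample run.

```python
"""Illustrative re-derivation of the Table 9 and Table 10 arithmetic (hypothetical helper names)."""

RUN_SIZE = 96  # samples pooled on one flow cell

# Table 10 inputs: (pack price in USD, reactions per pack, reactions used per 96-sample run).
# Assumption: enzymes and barcodes are used once per sample; one flow cell and one
# Ligation Sequencing Kit XL reaction are used per pooled run.
COMPONENTS = {
    "Restriction endonuclease": (65.3, 500, RUN_SIZE),
    "Thermo Scientific T4 DNA Ligase": (68.8, 1000, RUN_SIZE),
    "Flow Cell (R10.4.1)": (475.0, 1, 1),
    "Native Barcoding Kit 96 V14": (799.0, 288, RUN_SIZE),
    "Ligation Sequencing Kit XL V14": (4000.0, 48, 1),
}


def run_cost(components, run_size=RUN_SIZE):
    """Return (cost per component, total cost per run, cost per sample), all in USD."""
    per_item = {
        name: price / pack_size * used
        for name, (price, pack_size, used) in components.items()
    }
    total = sum(per_item.values())
    return per_item, total, total / run_size


# Table 9 inputs: read counts keyed by primer concentration in μM.
TOTAL_READS = {0.2: 236_056, 0.3: 231_739, 0.4: 323_003, 0.5: 305_581}
LOCUS_READS = {
    "BBIP1": {0.2: 7_448, 0.3: 1_741, 0.4: 2_215, 0.5: 1_483},
    "ABCC8": {0.2: 3_379, 0.3: 1_480, 0.4: 1_548, 0.5: 1_227},
}


def locus_fraction(locus, conc):
    """Percentage of all reads at a primer concentration that belong to one locus."""
    return 100.0 * LOCUS_READS[locus][conc] / TOTAL_READS[conc]


if __name__ == "__main__":
    items, total, per_sample = run_cost(COMPONENTS)
    for name, cost in items.items():
        print(f"{name}: {cost:.1f}")
    print(f"Total per run: {total:.1f}; per sample: {per_sample:.1f}")
    for locus in LOCUS_READS:
        shares = [f"{locus_fraction(locus, c):.2f}%" for c in sorted(TOTAL_READS)]
        print(locus, shares)
```

With these inputs, the rounded outputs match the tabulated values: 843.8 USD per run, 8.8 USD per sample, and per-locus read shares of 3.16%, 0.75%, 0.69% and 0.49% for BBIP1 and 1.43%, 0.64%, 0.48% and 0.40% for ABCC8.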
References
1 RARE Facts - Global Genes [Internet]. Available from: https://globalgenes.org/rare-facts/
2 Graessner, et al. European Journal of Human Genetics 29, 1319-1320 (2021). https://doi.org/10.1038/s41431-021-00924-8
3 Shashi, et al. Genetics in Medicine 16, 176-182 (2014). https://doi.org/10.1038/gim.2013.99
4 Molster, et al. Orphanet J Rare Dis 11, 30 (2016). https://doi.org/10.1186/s13023-016-0409-z
5 AlAbdi, L. et al. Diagnostic implications of pitfalls in causal variant identification based on 4577 molecularly characterized families. Nature Communications 14, 5269 (2023). https://doi.org/10.1038/s41467-023-40909-3
6 Shickh, et al. Hum Genet 140, 1403-1416 (2021). https://doi.org/10.1007/s00439-021-02331-x
7 100,000 Genomes Pilot on Rare-Disease Diagnosis in Health Care — Preliminary Report. New England Journal of Medicine 385, 1868-1880 (2021). https://doi.org/10.1056/NEJMoa2035790
8 Miller, et al. Am J Hum Genet 108, 1436-1449 (2021). https://doi.org/10.1016/j.ajhg.2021.06.006
9 Hottentot, et al. Methods Mol Biol 1492, 185-196 (2017). https://doi.org/10.1007/978-1-4939-6442-0_13
10 Payne, A et al. Nature Biotechnology 39, 442-450 (2021). https://doi.org/10.1038/s41587-020-00746-x
11 Martin, et al. Genome Biol 23, 11 (2022). https://doi.org/10.1186/s13059-021-02582-x
12 AlAbdi, et al. Nature Communications 14, 5269 (2023). https://doi.org/10.1038/s41467-023-40909-3
13 Shickh, et al. Hum Genet 140, 1403-1416 (2021). https://doi.org/10.1007/s00439-021-02331-x
14 Vieler, L. M. et al. Optical Genome Mapping Reveals and Characterizes Recurrent Aberrations and New Fusion Genes in Adult ALL. Genes (Basel) 14 (2023). https://doi.org/10.3390/genes14030686
15 Zhang, et al. Med (2024). https://doi.org/10.1016/j.medj.2024.07.003
16 Kim, et al. Methods Mol Biol 2250, 115-121 (2021). https://doi.org/10.1007/978-1-0716-1134-0_11
17 Nakagome, et al. BMC Bioinformatics 15, 71 (2014). https://doi.org/10.1186/1471-2105-15-71
18 Baduel, P., Quadrana, L. & Colot, V. in Plant Transposable Elements: Methods and Protocols (ed. Jungnam Cho) 157-169 (Springer US, 2021).
19 Zhou, et al. Nucleic Acids Res 48, 1146-1163 (2020). https://doi.org/10.1093/nar/gkz1173
20 Ramirez, et al. bioRxiv 2024.02.01.578450 (2024). https://doi.org/10.1101/2024.02.01.578450
21 Toda, et al. Cell Reports 43, 113774 (2024). https://doi.org/10.1016/j.celrep.2024.113774
22 Liu, et al. Cell (2023). https://doi.org/10.1016/j.cell.2022.12.017

Additional References
1. Nakagome, et al. (2014). Transposon Insertion Finder (TIF): a novel program for detection of de novo transpositions of transposable elements. BMC Bioinformatics 15, 71. https://doi.org/10.1186/1471-2105-15-71
2. Baduel, P., Quadrana, L., and Colot, V. (2021). Efficient Detection of Transposable Element Insertion Polymorphisms Between Genomes Using Short-Read Sequencing Data. In Plant Transposable Elements: Methods and Protocols, J. Cho, ed. (Springer US), pp. 157-169. https://doi.org/10.1007/978-1-0716-1134-0_15
3. Zhou, W et al. (2020). Nucleic Acids Res 48, 1146-1163. https://doi.org/10.1093/nar/gkz1173
4. Ramirez, et al. (2024). bioRxiv 2024.02.01.578450. https://doi.org/10.1101/2024.02.01.578450
5. Toda, T et al. (2024). Cell Reports 43, 113774. https://doi.org/10.1016/j.celrep.2024.113774
6. Liu, et al. (2023). Resurrection of endogenous retroviruses during aging reinforces senescence. Cell. https://doi.org/10.1016/j.cell.2022.12.017