US20130116930A1 - Method and System for Assessment of Regulatory Variants in a Genome - Google Patents
- Publication number
- US20130116930A1 (application US13/592,291)
- Authority
- US
- United States
- Prior art keywords
- computer
- genome
- impact
- genetic variants
- disease
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Abandoned
Classifications
- G06F19/24
- G16B: Bioinformatics, i.e. information and communication technology [ICT] specially adapted for genetic or protein-related data processing in computational molecular biology
- G16B20/00: ICT specially adapted for functional genomics or proteomics, e.g. genotype-phenotype associations
- G16B20/20: Allele or variant detection, e.g. single nucleotide polymorphism [SNP] detection
- G16B20/30: Detection of binding sites or motifs
- G16B20/40: Population genetics; Linkage disequilibrium
- G16B40/00: ICT specially adapted for biostatistics; ICT specially adapted for bioinformatics-related machine learning or data mining, e.g. knowledge discovery or pattern finding
- G16B40/20: Supervised data analysis
- G16B99/00: Subject matter not provided for in other groups of this subclass
- Y02A90/10: Information and communication technologies [ICT] supporting adaptation to climate change, e.g. for weather forecasting or climate simulation
Definitions
- In an embodiment, a computer server that implements certain of the methods of the invention is remotely situated from a user.
- Computer server 122 is communicatively coupled so as to receive information from a user; likewise, computer server 122 is communicatively coupled so as to send information to a user.
- The user uses user computing device 124 to access computer server 122 via network 126.
- Network 126 can be the internet, a local network, a private network, a public network, or any other network as may be appropriate to implement the invention as described herein.
- User computing device 124 can be implemented in various forms such as desktop computer 128, laptop computer 130, smart phone 132, or tablet device 134. Other devices that may be developed and are capable of the computing actions described herein are also appropriate for use in conjunction with the present invention.
- Computing and other activities will be described as being conducted on either computer server 122 or user computing device 124. It should be understood, however, that many if not all of such activities may be reassigned from one device to the other while keeping within the present teachings. For example, computations for certain steps that are described as being performed on computer server 122 may, in a different embodiment, be performed on user computing device 124.
- In an embodiment, computer server 122 is implemented as a web server running Apache HTTP Server software.
- Computer server 122 can also be implemented in other manners, such as an Oracle web server (known as Oracle iPlanet Web Server).
- In an embodiment, computer server 122 is a UNIX-based machine but can also be implemented in other forms, such as a Windows-based machine. Configured as a web server, computer server 122 serves web pages over network 126 such as the internet.
- User computing device 124 is configured to run web browser software.
- Examples of web browser software include Internet Explorer, Firefox, and Chrome.
- Other browser software is available for different applications of user computing device 124.
- Still other software is expected to be developed in the future that is able to execute certain steps of the present invention.
- Through the use of appropriate software, user computing device 124 queries computer server 122. In response to such a query, computer server 122 provides information so as to display certain graphics and text on user computing device 124.
- The information provided by computer server 122 is in the form of HTML that can be interpreted by and properly displayed on user computing device 124.
- Computer server 122 may provide other information that can be interpreted on user computing device 124.
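Because the server returns its results as HTML, one way to picture the server-side step is a function that turns per-variant results into an HTML table. The following is a minimal Python sketch; the function name `render_variant_table` and the three-column row layout (variant, transcription factor, disease) are hypothetical and are not specified in the text. User-supplied strings are escaped with the standard-library `html.escape` so that they cannot inject markup.

```python
from html import escape
from typing import Iterable, Tuple


def render_variant_table(rows: Iterable[Tuple[str, str, str]]) -> str:
    """Render (variant ID, transcription factor, disease) rows as an HTML table.

    Hypothetical output format; every cell value is HTML-escaped.
    """
    lines = [
        "<table>",
        "<tr><th>Variant</th><th>Transcription factor</th><th>Disease</th></tr>",
    ]
    for variant_id, tf, disease in rows:
        lines.append(
            f"<tr><td>{escape(variant_id)}</td><td>{escape(tf)}</td>"
            f"<td>{escape(disease)}</td></tr>"
        )
    lines.append("</table>")
    return "\n".join(lines)
```

A web framework or template engine would normally handle this step. The function is kept framework-free here so the sketch stays self-contained.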
- Transcription factor binding is significant in the analysis of regulatory variation. For example, as shown in FIG. 4, individuals differ very little in sequence, at approximately 1 in 1,300 bp, and at approximately 1 in 1,000 bp in promoter regions, whereas transcription factor binding is much more variable. As shown in FIG. 4, promoter sequences 406, coding region nucleotides 404, and amino acid sequences 408 vary very little. Note, however, that transcription factor binding 402 has been found to vary at approximately 7.5% of NFkB binding sites. For purposes of personal genome sequencing, it has been found that 7.5% of SNPs are in transcription factor binding sites and 21% of SNPs are in DNase I hypersensitive (HS) sites. Embodiments of the present invention make use of these observations for improved functional and clinical analysis. For example, a single SNP is associated with NFkB binding, as shown in FIG. 5.
- As shown in FIG. 5, Stat1 motif 502 affects Stat1 504 binding. It is also important to note that Stat1 504 binding in turn affects NFkB 508 binding, despite the conservation of NFkB motif 510. More particularly, it has been found that the EBF1 motif affects NFkB binding, as shown in FIG. 6.
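To illustrate how a SNP inside a motif (such as the Stat1 or EBF1 motifs of FIGS. 5 and 6) might change predicted binding, the following minimal Python sketch scores a sequence against a position weight matrix (PWM) with the reference and the alternate allele. The text does not say how motif disruption is scored; the 4-bp log-odds matrix `TOY_PWM`, the example sequence, and the function names are hypothetical, and only the forward strand is scanned.

```python
from typing import Dict, List

# Hypothetical 4-bp log-odds PWM: one dict of base -> score per motif position.
PWM = List[Dict[str, float]]

TOY_PWM: PWM = [
    {"A": -1.0, "C": -1.0, "G": 1.5, "T": -1.0},
    {"A": 1.2, "C": -0.8, "G": -0.8, "T": -0.5},
    {"A": -0.9, "C": 1.4, "G": -0.9, "T": -0.9},
    {"A": -0.7, "C": -0.7, "G": -0.7, "T": 1.1},
]


def best_motif_score(seq: str, pwm: PWM) -> float:
    """Highest PWM score over all windows of seq (forward strand only)."""
    width = len(pwm)
    if len(seq) < width:
        raise ValueError("sequence shorter than motif")
    return max(
        sum(pwm[i][seq[start + i]] for i in range(width))
        for start in range(len(seq) - width + 1)
    )


def allele_score_delta(seq: str, pos: int, ref: str, alt: str, pwm: PWM) -> float:
    """Change in best motif score (alt minus ref) for a SNP at 0-based position pos."""
    if seq[pos] != ref:
        raise ValueError("reference allele does not match sequence")
    alt_seq = seq[:pos] + alt + seq[pos + 1:]
    return best_motif_score(alt_seq, pwm) - best_motif_score(seq, pwm)
```

For example, `allele_score_delta("TTGACTTT", 4, "C", "T", TOY_PWM)` replaces the motif's core C with T. The best window score drops from about 5.2 to about 2.9, so the function returns about -2.3. A negative value suggests the alternate allele weakens the motif. As FIG. 5 notes, binding can also change through a neighboring factor even when the motif itself is conserved, and a motif score alone would not capture that.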
- GWASs: genome-wide association studies
- Shown in FIG. 2 is a block diagram of a method according to an embodiment of the present invention.
- At step 202, information is received regarding genome-wide quantitative and genetic profiles of transcription factor binding measured across a multitude of individual human genomes.
- Information is also received regarding genetic variants associated with disease conditions.
- Information regarding DNA motifs associated with transcription factor binding is received at step 206.
- Molecular profiles of disease pathologies are also received.
- The information of steps 202 through 208 is contained in databases.
- Certain of the information of steps 202 through 208 is automatically generated or manually curated as further disclosed in copending application Ser. No.
- In one embodiment, the databases are maintained locally on a computer system on which the method of the present invention is processed. In another embodiment of the invention, the databases are maintained remotely from a computer on which certain steps of the present invention are performed.
- At step 210 of FIG. 2, the regulatory impact of specific genetic variants is assessed using the received information. Examples of this analysis are provided below; however, those of ordinary skill in the art will understand that many other types of analyses are possible without deviating from the teachings of the present invention. Further processing can be performed, as shown in step 212, where the impact of genetic variants on biological function is assessed. An embodiment of the present invention also performs step 214, where the impact of genetic variants on disease pathology is assessed. One of ordinary skill in the art will understand, however, that many other variations of the present invention are possible.
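The flow of FIG. 2 can be illustrated with a small Python sketch. It annotates each variant in an individual genome with the transcription factor binding regions it falls in, which loosely corresponds to the binding profiles of step 202, and with any known disease associations. The result is a simple version of the assessment in steps 210 and 214. The record layouts (`Variant`, `BindingRegion`, `VariantReport`), the 1-based inclusive coordinates, and the rule that flags a "regulatory candidate" are all assumptions made for illustration. The motif information and molecular disease profiles received in FIG. 2 are omitted.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass(frozen=True)
class Variant:
    rsid: str
    chrom: str
    pos: int  # 1-based genomic position


@dataclass(frozen=True)
class BindingRegion:
    chrom: str
    start: int  # 1-based, inclusive
    end: int    # 1-based, inclusive
    tf: str     # transcription factor bound in this region


@dataclass
class VariantReport:
    rsid: str
    bound_tfs: Set[str] = field(default_factory=set)
    diseases: Set[str] = field(default_factory=set)

    @property
    def is_regulatory_candidate(self) -> bool:
        # Hypothetical rule: the variant lies in a binding region AND has a disease association.
        return bool(self.bound_tfs) and bool(self.diseases)


def assess_regulatory_impact(
    variants: List[Variant],
    binding_regions: List[BindingRegion],
    disease_associations: Dict[str, Set[str]],
) -> List[VariantReport]:
    """Annotate each variant with overlapping TF binding regions and known disease associations."""
    reports = []
    for v in variants:
        report = VariantReport(rsid=v.rsid)
        for region in binding_regions:  # linear scan; an interval index scales better
            if region.chrom == v.chrom and region.start <= v.pos <= region.end:
                report.bound_tfs.add(region.tf)
        report.diseases |= disease_associations.get(v.rsid, set())
        reports.append(report)
    return reports
```

In practice, the inputs would come from the databases described above rather than from in-memory lists. The linear scan keeps the sketch short; with genome-scale inputs, a sorted interval index is the usual choice.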
- NFκB Binding Regions are Enriched for Disease-Associated SNPs
- A compendium of disease SNPs [see Ashley, E. A. et al. Clinical assessment incorporating a personal genome. Lancet 375, 1525-1535 (2010); Chen, R., Davydov, E. V., Sirota, M. & Butte, A. J. Non-synonymous and synonymous coding SNPs show similar likelihood and effect size of human disease association. PLoS ONE 5, e13574 (2010)] was intersected with a set of 15,522 NFκB binding regions found in lymphoblastoid cell lines from ten individuals [Kasowski, M. et al. Variation in transcription factor binding among humans. Science 328, 232-235 (2010)].
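The intersection of SNP positions with binding regions can be sketched in Python with the standard-library `bisect` module. This sketch assumes BED-style 0-based, half-open coordinates and regions that do not overlap one another, for example peaks merged beforehand. Under that assumption, the only candidate region for a SNP is the last one that starts at or before its position. The input tuple layouts and function names are hypothetical. The text does not name the tool used for the intersection.

```python
import bisect
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# chrom -> (sorted region starts, matching region ends)
RegionIndex = Dict[str, Tuple[List[int], List[int]]]


def build_index(regions: Iterable[Tuple[str, int, int]]) -> RegionIndex:
    """Index non-overlapping (chrom, start, end) regions, 0-based half-open, per chromosome."""
    by_chrom: Dict[str, List[Tuple[int, int]]] = defaultdict(list)
    for chrom, start, end in regions:
        by_chrom[chrom].append((start, end))
    index: RegionIndex = {}
    for chrom, intervals in by_chrom.items():
        intervals.sort()
        index[chrom] = ([s for s, _ in intervals], [e for _, e in intervals])
    return index


def snps_in_regions(snps: Iterable[Tuple[str, str, int]], index: RegionIndex) -> List[str]:
    """Return IDs of (snp_id, chrom, 0-based pos) SNPs that fall inside an indexed region."""
    hits = []
    for snp_id, chrom, pos in snps:
        if chrom not in index:
            continue
        starts, ends = index[chrom]
        i = bisect.bisect_right(starts, pos) - 1  # last region starting at or before pos
        if i >= 0 and pos < ends[i]:
            hits.append(snp_id)
    return hits
```

A SNP at position 200 is not inside the region (100, 200) because BED ends are exclusive. If binding regions can overlap, they should be merged first, or an interval tree should be used instead.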
- SNPs associated with inflammatory and autoimmune diseases, including rheumatoid arthritis, asthma, and systemic lupus erythematosus, were highly overrepresented in NFκB binding regions (FIG. 9). These SNPs were enriched relative both to all SNPs and to the subset of all disease-associated SNPs.
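The text does not state which statistical test supports the overrepresentation claim. A common choice for this kind of 2×2 overlap question is a one-sided Fisher's exact test, and the Python sketch below assumes that test. It computes the fold enrichment together with the hypergeometric upper-tail probability of seeing at least `k` of the `n` disease SNPs in binding regions by chance. The parameter names are hypothetical, and no counts from the study are reproduced here.

```python
from math import comb
from typing import Tuple


def enrichment_test(k: int, n: int, K: int, N: int) -> Tuple[float, float]:
    """Fold enrichment and one-sided Fisher's exact (hypergeometric upper-tail) p-value.

    N: all SNPs in the background set
    K: background SNPs that fall in NF-kB binding regions
    n: disease-associated SNPs in the category being tested
    k: of those n, how many fall in NF-kB binding regions
    """
    if not (0 < n <= N and 0 < K <= N and 0 <= k <= min(n, K) and n - k <= N - K):
        raise ValueError("inconsistent counts")
    fold = (k / n) / (K / N)
    # P(X >= k) for X ~ Hypergeometric(population N, K successes, n draws).
    upper = sum(comb(K, x) * comb(N - K, n - x) for x in range(k, min(n, K) + 1))
    return fold, upper / comb(N, n)
```

To compare against the "subset of disease-associated SNPs" rather than all SNPs, pass the disease-SNP compendium as the background (`N`, `K`). Exact integer arithmetic with `math.comb` avoids underflow in the tail sum.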
- As shown in FIG. 7, it was found that on average 1.5 diseases are associated per SNP, and that the average disease-associated SNP in NFkB binding regions was associated with 1.33 diseases.
- In FIG. 7, areas 702 correspond to NFkB SNPs and areas 704 correspond to all disease SNPs.
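The per-SNP disease counts compared in FIG. 7 are simple averages. The following Python sketch computes the mean number of distinct diseases associated with each SNP in a chosen subset, for example all disease SNPs versus those in NFkB binding regions. The input mapping from SNP ID to a set of disease names, and the toy data in the usage line, are hypothetical.

```python
from statistics import mean
from typing import Dict, Iterable, Set


def mean_diseases_per_snp(assoc: Dict[str, Set[str]], snp_ids: Iterable[str]) -> float:
    """Average number of distinct diseases associated with each annotated SNP in snp_ids."""
    counts = [len(assoc[s]) for s in snp_ids if s in assoc]
    if not counts:
        raise ValueError("none of the given SNPs has disease annotations")
    return mean(counts)
```

With `assoc = {"a": {"RA", "SLE"}, "b": {"asthma"}, "c": {"RA", "asthma", "SLE"}}`, the mean over all three SNPs is 2. The mean over the subset `["a", "c"]` is 2.5.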
- NFκB binding regions that harbor disease-associated SNPs are more strongly bound by NFκB, as determined by ChIP-Seq binding intensity, than the background of all NFκB binding regions. Additionally, these binding regions are less variable across individuals, indicating potential evolutionary constraint on these regions.
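Binding strength and variability can both be summarized from a table of per-individual ChIP-Seq signal for each region. The text does not define how variability was measured. The following Python sketch assumes the coefficient of variation (population standard deviation divided by mean) across individuals. It reports, for a set of regions, the median mean signal and the median coefficient of variation. Those values could then be compared between regions that harbor disease SNPs and all regions. The table layout and function names are hypothetical.

```python
from statistics import mean, median, pstdev
from typing import Dict, Iterable, List, Tuple


def coefficient_of_variation(signals: List[float]) -> float:
    """Population standard deviation / mean of one region's signal across individuals."""
    m = mean(signals)
    if m == 0:
        raise ValueError("mean signal is zero")
    return pstdev(signals) / m


def summarize_regions(
    signal_table: Dict[str, List[float]], region_ids: Iterable[str]
) -> Tuple[float, float]:
    """Median of per-region mean signal and median of per-region CV for the given regions."""
    ids = list(region_ids)
    mean_signal = median(mean(signal_table[r]) for r in ids)
    median_cv = median(coefficient_of_variation(signal_table[r]) for r in ids)
    return mean_signal, median_cv
```

A formal comparison between the two groups would add a rank-based test such as the Mann-Whitney U test. The sketch stops at the summary statistics.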
- A pipeline as shown in FIG. 3 was developed to discover putative SNPs that may affect NFκB binding.
- A compendium of 24K disease-associated SNPs was received from a database.
- 15,522 genome-wide transcription factor binding profiles were received.
- The NFκB binding profiles for eight individuals were considered.
- Genome-wide expression profiling was then performed (e.g., RNA-Seq).
- The human genomic disease expression information was collected in a database.
- SNP rs6135095, previously reported to be associated with atherosclerosis, shows a significant association between genotype and NFκB binding in the 8 cell lines queried.
- Aortic tissues from 10 individuals were genotyped, and certain of them were found to carry the rs6135095 CT and TT genotypes. This SNP was found to be associated with NFκB binding (by ChIP-qPCR) as well as with expression of nearby genes (e.g., SIRPG).
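The text does not specify how the genotype-binding association was tested. One simple approach codes each sample's genotype as alternate-allele dosage (0, 1, or 2) and regresses binding signal on dosage. The following Python sketch returns the ordinary least-squares slope and the Pearson correlation for one SNP across samples. The function name and the example values in the test are hypothetical, and a significance test (for example, a t-test on the slope) is left out.

```python
from statistics import mean
from typing import List, Tuple


def dosage_association(dosages: List[int], signals: List[float]) -> Tuple[float, float]:
    """OLS slope and Pearson r of binding (or expression) signal on alt-allele dosage."""
    if len(dosages) != len(signals) or len(dosages) < 3:
        raise ValueError("need matched data for at least 3 samples")
    mx, my = mean(dosages), mean(signals)
    sxx = sum((x - mx) ** 2 for x in dosages)
    syy = sum((y - my) ** 2 for y in signals)
    sxy = sum((x - mx) * (y - my) for x, y in zip(dosages, signals))
    if sxx == 0 or syy == 0:
        raise ValueError("no variation in genotype or signal")
    slope = sxy / sxx           # change in signal per additional alt allele
    r = sxy / (sxx * syy) ** 0.5
    return slope, r
```

The same function applies to expression of nearby genes, with signal replaced by each gene's expression level. With only 8 to 10 samples, a per-SNP association like this has limited power, which is one motivation for the ChIP-qPCR follow-up described above.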
- ChIP-Seq reads were mapped to the hg19 assembly of the human genome using BWA. PCR duplicates were filtered using Picard tools. Variant call files were downloaded from the 1000 Genomes Project and converted to hg19 coordinates with VCFtools. Allele-specific binding (ASB) was determined on a per-heterozygote, per-individual basis for the ten individuals. Reads were filtered to those with a mapping quality (MAPQ) above 30. For each individual, a binomial probability of success was determined based on the probability that a read carrying the reference allele maps to the genome compared with one carrying the non-reference allele. Allele-specific expression (ASE) was similarly determined using reads from the transcriptome of each individual.
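The allele-specific test described above can be sketched for a single heterozygous site in Python. The sketch keeps reads with mapping quality above 30 and counts reads carrying the reference and the alternate allele. It then applies an exact two-sided binomial test against an expected reference fraction `p_ref`, which stands in for the per-individual "probability of success" (0.5 when no reference mapping bias is assumed). The input format (one `(base, MAPQ)` pair per read covering the site) and the function names are hypothetical. In practice, the base and MAPQ would be read from the aligned BAM records.

```python
from math import exp, lgamma, log
from typing import List, Tuple


def binomial_two_sided(k: int, n: int, p: float) -> float:
    """Exact two-sided binomial p-value: total probability of outcomes no more likely than k."""
    if not 0 < p < 1 or not 0 <= k <= n:
        raise ValueError("need 0 < p < 1 and 0 <= k <= n")

    def pmf(i: int) -> float:
        # Log space, so that deep coverage does not overflow the binomial coefficient.
        return exp(lgamma(n + 1) - lgamma(i + 1) - lgamma(n - i + 1)
                   + i * log(p) + (n - i) * log(1 - p))

    observed = pmf(k)
    # Small relative tolerance so that outcomes tied with k (e.g. symmetric tails) are counted.
    total = sum(q for q in map(pmf, range(n + 1)) if q <= observed * (1 + 1e-7))
    return min(1.0, total)


def allele_specific_test(
    reads: List[Tuple[str, int]],
    ref: str,
    alt: str,
    p_ref: float = 0.5,
    mapq_threshold: int = 30,
) -> Tuple[int, int, float]:
    """Test one heterozygous site for allelic imbalance in ChIP-Seq (ASB) or RNA-Seq (ASE) reads.

    reads: (base observed at the site, mapping quality) for each read covering it.
    p_ref: expected fraction of reference reads; 0.5 if there is no reference mapping bias.
    """
    kept = [base for base, mapq in reads if mapq > mapq_threshold]  # MAPQ strictly above 30
    n_ref = kept.count(ref)
    n_alt = kept.count(alt)
    if n_ref + n_alt == 0:
        raise ValueError("no informative reads at this site")
    return n_ref, n_alt, binomial_two_sided(n_ref, n_ref + n_alt, p_ref)
```

For example, suppose 8 reference and 2 alternate reads pass the filter at `p_ref = 0.5`. The function then returns a p-value of 112/1024 (about 0.109), so an imbalance that looks striking at this depth is not significant. Testing many sites this way requires multiple-testing correction.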
- ASB: allele-specific binding
- ASE: allele-specific expression
- Embodiments according to the present invention can be used to discover new transcription factor interactions.
- In this way, disease-associated variants can be connected to molecular pathophysiology, helping to explain the function of non-coding SNPs.
Landscapes
- Engineering & Computer Science (AREA)
- Life Sciences & Earth Sciences (AREA)
- Health & Medical Sciences (AREA)
- Physics & Mathematics (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Medical Informatics (AREA)
- Biotechnology (AREA)
- Theoretical Computer Science (AREA)
- Bioinformatics & Computational Biology (AREA)
- Evolutionary Biology (AREA)
- General Health & Medical Sciences (AREA)
- Spectroscopy & Molecular Physics (AREA)
- Biophysics (AREA)
- Genetics & Genomics (AREA)
- Analytical Chemistry (AREA)
- Chemical & Material Sciences (AREA)
- Molecular Biology (AREA)
- Proteomics, Peptides & Aminoacids (AREA)
- Data Mining & Analysis (AREA)
- Bioethics (AREA)
- Artificial Intelligence (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Databases & Information Systems (AREA)
- Epidemiology (AREA)
- Evolutionary Computation (AREA)
- Public Health (AREA)
- Software Systems (AREA)
- Ecology (AREA)
- Physiology (AREA)
- Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)
Abstract
Description
- This application claims priority to U.S. Provisional Application No. 61/526,242 filed Aug. 22, 2012, which is hereby incorporated by reference in its entirety for all purposes.
- This application claims priority to U.S. Provisional Application No. 61/526,095 filed Aug. 22, 2012, which is hereby incorporated by reference in its entirety for all purposes.
- This invention was made with Government support under contract HG000237 awarded by the National Institutes of Health. The Government has certain rights in this invention.
- Genome-wide association studies have discovered many genetic loci associated with disease, but the molecular basis of these associations is often unresolved. Genome-wide regulatory and gene expression profiles measured across individuals and diseases reflect downstream effects of genetic variation, and may allow for functional annotation of disease-associated loci.
- Complete genome sequences of individual patients will soon become integrated as part of routine clinical care. There exists a need for interpreting the clinical significance of novel genetic variants presenting in a patient's personal genome, which are known to be associated with diverse clinical disorders. Current tools and approaches for clinical assessment of genetic variation do not explicitly consider gene regulatory information and are typically focused on specific gene coding regions.
- Embodiments of the present invention enable genome-wide systematic evaluation of potentially clinically relevant genetic variation in a personal genome. Among other things, the present invention provides methods embodied in a system that can be applied to genetic information comprising an individual genome to assess the regulatory impact of specific genetic variants and their possible impact on biological function or disease pathology.
- An embodiment of the invention comprises databases and algorithms embodied in software, where the databases contain information providing genome-wide quantitative and genetic profiles of transcription factor binding measured across a multitude of individual human genomes, information on genetic variants associated with disease conditions, information on DNA motifs associated with transcription factor binding, and molecular profiles of disease pathology.
- Embodiments of the present invention use genome-wide quantitative gene regulatory information to assess total genetic variation presenting in an individual genome. The present invention provides the ability to infer transcription factor binding events from genotypes. Also, the present invention provides the ability to associate individual variation in gene regulatory regions with biological function and disease pathology.
- Applications of the present invention include clinical assessment of personal genomes, clinical assessment of cancer genomes, interpretation of genetic disease associations, and discovery of regulatory DNA biomarkers. In other embodiments, additional types of gene regulatory information can be added.
- These and other embodiments can be more fully appreciated upon an understanding of the detailed description of the invention as disclosed below in conjunction with the attached figures.
- The following drawings will be used to more fully describe embodiments of the present invention.
- FIGS. 1A and 1B are block diagrams of computer systems on which embodiments of the present invention can be practiced.
- FIG. 2 is a block diagram of a method according to an embodiment of the invention.
- FIG. 3 is a block diagram of a method according to an embodiment of the invention.
- FIG. 4 is a graph illustrating the variability of transcription factor binding in the analysis of regulatory variation.
- FIG. 5 is an illustration of the manner in which a single SNP is associated with NFkB binding.
- FIG. 6 is an illustration of the manner in which the EBF1 motif affects NFkB binding.
- FIG. 7 is a graph demonstrating the manner in which disease-associated SNPs in NFkB binding regions are more pleiotropic.
- FIG. 8 is a graph illustrating the effects of transcription factor binding and disease SNPs.
- FIG. 9 is a graph illustrating the manner in which SNPs associated with inflammatory and autoimmune diseases were overrepresented in NFκB binding regions.
- Among other things, the present invention relates to methods, techniques, and algorithms that are intended to be implemented in digital computer system 100 such as generally shown in FIG. 1A. Such a digital computer or embedded device is well-known in the art and may include the following.
Computer system 100 may include at least onecentral processing unit 102 but may include many processors or processing cores.Computer system 100 may further includememory 104 in different forms such as RAM, ROM, hard disk, optical drives, and removable drives that may further include drive controllers and other hardware.Auxiliary storage 112 may also be include that can be similar tomemory 104 but may be more remotely incorporated such as in a distributed computer system with distributed memory capabilities. -
Computer system 100 may further include at least oneoutput device 108 such as a display unit, video hardware, or other peripherals (e.g., printer). At least oneinput device 106 may also be included incomputer system 100 that may include a pointing device (e.g., mouse), a text input device (e.g., keyboard), or touch screen. -
Communications interfaces 114 also form an important aspect of computer system 100, especially where computer system 100 is deployed as a distributed computer system. Communications interfaces 114 may include LAN network adapters, WAN network adapters, wireless interfaces, Bluetooth interfaces, modems, and other networking interfaces as currently available and as may be developed in the future. -
Computer system 100 may further include other components 116, which may be generally available components as well as components specially developed for implementation of the present invention. Importantly, computer system 100 incorporates various data buses 118 that allow communication among the various components of computer system 100. Data buses 118 include, for example, input/output buses and bus controllers.
- Indeed, the present invention is not limited to computer system 100 as known at the time of the invention. Instead, the present invention is intended to be deployed in future computer systems with more advanced technology that can make use of all aspects of the present invention. It is expected that computer technology will continue to advance, and one of ordinary skill in the art will be able to take the present disclosure and implement the described teachings on more advanced computers or other digital devices such as mobile telephones or “smart” televisions as they become available.
- Moreover, the present invention may be implemented on one or more distributed computers. Still further, the present invention may be implemented in various software languages including C, C++, and others. Also, one of ordinary skill in the art is familiar with compiling software source code into executable software that may be stored in various forms and in various media (e.g., magnetic, optical, solid state, etc.). One of ordinary skill in the art is familiar with the use of computers and software languages and, with an understanding of the present disclosure, will be able to implement the present teachings for use on a wide variety of computers.
- The present disclosure explains the present invention in enough detail to allow one of ordinary skill in the art to implement it as a computerized method. Certain details are omitted so as not to detract from the teachings presented herein, but it is understood that one of ordinary skill in the art would be familiar with such details.
- In an embodiment of the invention as shown in FIG. 1B, a computer server that implements certain of the methods of the invention is situated remotely from a user. Computer server 122 is communicatively coupled so as to receive information from a user; likewise, computer server 122 is communicatively coupled so as to send information to a user. In an embodiment of the invention, the user uses user computing device 124 to access computer server 122 via network 126. Network 126 can be the internet, a local network, a private network, a public network, or any other network appropriate to implement the invention as described herein. -
User computing device 124 can be implemented in various forms such as desktop computer 128, laptop computer 130, smart phone 132, or tablet device 134. Other devices that may be developed and are capable of the computing actions described herein are also appropriate for use in conjunction with the present invention.
- In the present disclosure, computing and other activities will be described as being conducted on either computer server 122 or user computing device 124. It should be understood, however, that many if not all of such activities may be reassigned from one device to the other while keeping within the present teachings. For example, computations described for certain steps as being performed on computer server 122 may, in a different embodiment, be performed on user computing device 124.
- In an embodiment of the invention, computer server 122 is implemented as a web server on which Apache HTTP web server software is run. Computer server 122 can also be implemented in other manners, such as an Oracle web server (known as Oracle iPlanet Web Server). In an embodiment, computer server 122 is a UNIX-based machine but can also be implemented in other forms such as a Windows-based machine. Configured as a web server, computer server 122 serves web pages over network 126 such as the internet.
- In an embodiment, user computing device 124 is configured to run web browser software. For example, where user computing device 124 is implemented as desktop computer 128 or laptop computer 130, currently available web browser software includes Internet Explorer, Firefox, and Chrome. Other browser software is available for other implementations of user computing device 124. Still other software is expected to be developed in the future that is able to execute certain steps of the present invention.
- In an embodiment, user computing device 124, through the use of appropriate software, queries computer server 122. Responsive to such a query, computer server 122 provides information so as to display certain graphics and text on user computing device 124. In an embodiment, the information provided by computer server 122 is in the form of HTML that can be interpreted by and properly displayed on user computing device 124. Computer server 122 may provide other information that can be interpreted on user computing device 124.
- It has been found that transcription factor binding is significant in the analysis of regulatory variation. For example, as shown in FIG. 4, whereas individuals vary very little (at approximately 1 in 1,300 bp, and at approximately 1 in 1,000 bp in promoter regions), transcription factor binding is much more variable. As shown in FIG. 4, promoter sequences 406, coding region nucleotides 404, and amino acid sequences 408 vary very little. Note, however, that transcription factor binding 402 has been found to vary in approximately 7.5% of NFκB binding sites. For purposes of personal genome sequencing, it has been found that 7.5% of SNPs are in transcription factor binding sites and 21% of SNPs are in DNase I hypersensitive (HS) sites. Embodiments of the present invention make use of these facts for improved functional and clinical analysis. For example, a single SNP is associated with NFκB binding as shown in FIG. 5.
- It is also important to consider how the binding of one factor can affect the binding of another. For example, as shown in FIG. 5, Stat1 motif 502 affects Stat1 504 binding. It is also important to note, however, that Stat1 504 binding affects NFκB 508 binding, for example, despite the conservation of NFκB motif 510. More particularly, it has been found that the EBF1 motif affects NFκB binding as shown in FIG. 6.
- A systematic approach is presented herein that combines disease association, transcription factor binding, and gene expression data to assess the functional consequences of variants associated with hundreds of human diseases. In an analysis of genome-wide binding profiles of NFκB, it was found that disease-associated SNPs are enriched in NFκB binding regions overall, and specifically for inflammation-mediated diseases such as asthma, rheumatoid arthritis, and coronary artery disease. Using genome-wide binding variation information for eight fully sequenced individuals, it was found that regions of NFκB binding correlated with disease-associated variants in an allele-specific manner (see the pipeline method of FIG. 3, discussed below). Furthermore, it was found that this binding variation is often correlated with expression of nearby genes, which also show altered expression in independent profiling of the variant-associated disease condition. This systematic approach thereby closes the loop on biological context-free association studies and assigns putative function to many disease-associated SNPs. In other embodiments of the invention, these predictions can be validated for atherosclerosis, asthma, and/or rheumatoid arthritis. It should, therefore, be noted that although certain particular embodiments will be discussed herein, such descriptions are illustrative and do not limit the scope of the present invention.
- The association between genotype and phenotype is a fundamental problem in biology and translational medicine. Genome-wide association studies (GWASs) have identified many genetic variants associated with diseases [see Hindorff, L. A. et al. Potential etiologic and functional implications of genome-wide association loci for human diseases and traits. Proc Natl Acad Sci USA 106, 9362-9367 (2009); note that these and other references cited herein are incorporated by reference for all purposes], but such approaches rely on “tag” single nucleotide polymorphisms (SNPs) found on DNA microarrays. While these SNPs may lie in or near gene regions, their specific influences on the biology of disease are not necessarily determined in typical GWASs [Green, E. D., Guyer, M. S. & National Human Genome Research Institute. Charting a course for genomic medicine from base pairs to bedside. Nature 470, 204-213 (2011)]. Furthermore, disease-associated SNPs found outside of genic regions are often not investigated further because they are of unknown function.
- Systems biology can provide an approach to bridge the gap between genotype and phenotype. For example, human variation in transcription factor (TF) binding has been correlated with polymorphisms in motifs for NFκB and PolII in ten individuals [Kasowski, M. et al. Variation in transcription factor binding among humans. Science 328, 232-235 (2010); Karczewski, K. J. et al. Discovering Cooperative Transcription Factor Associations using Binding Variation Information and the ALPHABIT Pipeline. 1-25 (2011)], and regulatory features across dozens of cell lines have been mapped extensively by the ENCODE project [Birney, E. et al. Identification and analysis of functional elements in 1% of the human genome by the ENCODE pilot project. Nature 447, 799-816 (2007)].
- It is, therefore, expected that polymorphisms affecting transcription factor binding can significantly influence disease, because differences in TF binding, and the downstream differences in expression they cause, may be the true underlying cause of a SNP's disease association. These functional biology-rich sources of data can, therefore, be leveraged to suggest putative function for previously unannotated disease-associated SNPs.
- In the present disclosure, the role of transcription factor binding sites in disease is described. As a non-limiting case study, genome-wide enrichments are explored for disease SNPs in NFκB (p65) binding regions to predict genotype-specific binding events associated with disease.
- Shown in FIG. 2 is a block diagram of a method according to an embodiment of the present invention. As shown in step 202, information is received regarding genome-wide quantitative and genetic profiles of transcription factor binding measured across a multitude of individual human genomes. At step 204, information is received regarding genetic variants associated with disease conditions. Information regarding DNA motifs associated with transcription factor binding is received at step 206. At step 208, molecular profiles of disease pathologies are received. In an embodiment of the present invention, the information of steps 202 through 208 is contained in databases. In an embodiment, certain of the information of steps 202 through 208 is automatically generated or manually curated as further disclosed in copending application Ser. No. ______, entitled “Method and System for the Use of Biomarkers for Regulatory Dysfunction in Disease,” which is herein incorporated by reference for all purposes. In an embodiment of the invention, the databases are maintained locally on a computer system on which the method of the present invention is processed. In another embodiment of the invention, the databases are maintained remotely from a computer on which certain steps of the present invention are performed.
- As shown in step 210 of FIG. 2, the regulatory impact of specific genetic variants is assessed using the received information. Examples of this analysis are provided below; however, those of ordinary skill in the art will understand that many other types of analyses are possible without deviating from the teachings of the present invention. Further processing can be performed, such as shown in step 212, where the impact of genetic variants on biological function is assessed. Also, an embodiment of the present invention further performs step 214, where the impact of genetic variants on disease pathology is assessed. One of ordinary skill in the art will understand, however, that many other variations of the present invention are possible.
- NFκB Binding Regions are Enriched for Disease Associated SNPs
- In a particular embodiment of the present invention, a compendium of disease SNPs [see Ashley, E. A. et al. Clinical assessment incorporating a personal genome. Lancet 375, 1525-1535 (2010); Chen, R., Davydov, E. V., Sirota, M. & Butte, A. J. Non-synonymous and synonymous coding SNPs show similar likelihood and effect size of human disease association. PLoS ONE 5, e13574 (2010)] was intersected with a set of 15,522 NFκB binding regions found in lymphoblastoid cell lines from ten individuals [Kasowski, M. et al. Variation in transcription factor binding among humans. Science 328, 232-235 (2010)]. It was found that established disease-associated SNPs were overabundant in regions bound by NFκB (χ2=292.9; p=1.1e-65; Fisher's OR=2.95). These associations are not biased by the platforms used for disease association discovery, as NFκB regions are underrepresented on Affymetrix 6.0 and 500K arrays (Fisher's OR=0.8 and 0.82, respectively) and only slightly overrepresented on Illumina 550K and 1M (Fisher's OR= and 1.42, respectively), which represented a smaller portion of this analysis. Additionally, binding sites of a known interacting factor, Stat1, were also highly enriched for disease SNPs; this enrichment was not present in promoter regions, as defined by PolII binding, as shown in FIG. 8.
- As shown in FIG. 9, SNPs associated with inflammatory and autoimmune diseases, including rheumatoid arthritis, asthma, and systemic lupus erythematosus, were highly overrepresented in NFκB binding regions. These SNPs were enriched both relative to all SNPs and relative to the subset of disease-associated SNPs.
- Disease-associated SNPs in NFκB binding regions are more pleiotropic (i.e., typically associated with more diseases) than the collection of known disease-associated SNPs (1.33 vs. 1.15 diseases per SNP; t-test p-value=8.7e-4; Mann-Whitney U-test p-value=2.7e-7). For example, as shown in FIG. 7, the average disease-associated SNP in NFκB binding regions is associated with 1.33 diseases, compared with 1.15 diseases per SNP across all disease-associated SNPs. In FIG. 7, areas 702 correspond to NFκB SNPs and areas 704 correspond to all disease SNPs.
- Disease Associated SNPs are Found in More Biologically Relevant Binding Regions
- NFκB binding regions that harbor disease-associated SNPs are more strongly bound by NFκB, as determined by ChIP-Seq binding intensity, compared to the background of all NFκB binding regions. Additionally, these binding regions are less variable, indicating the potential for evolutionary constraint on these regions.
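The comparison above first requires intersecting SNP positions with binding regions and then comparing binding intensity between regions that do and do not harbor a disease-associated SNP. A minimal sketch follows, assuming binding regions are given as hypothetical `(chrom, start, end, signal)` tuples in half-open coordinates sharing one convention with the SNP positions, and assuming regions without a disease-associated SNP serve as the background. The text does not name the test used for this comparison; the sketch computes a Mann-Whitney U statistic and the corresponding probability that a SNP-harboring region is more strongly bound than a background region, as one reasonable choice.

```python
from bisect import bisect_left


def regions_with_snps(regions, snps):
    """Return indices of regions containing at least one SNP.

    regions: list of (chrom, start, end, signal), half-open [start, end).
    snps: list of (chrom, pos) in the same coordinate convention (assumed).
    """
    positions_by_chrom = {}
    for chrom, pos in snps:
        positions_by_chrom.setdefault(chrom, []).append(pos)
    for positions in positions_by_chrom.values():
        positions.sort()

    hits = set()
    for index, (chrom, start, end, _signal) in enumerate(regions):
        positions = positions_by_chrom.get(chrom, [])
        # First SNP at or after the region start; it lies inside iff it is < end.
        j = bisect_left(positions, start)
        if j < len(positions) and positions[j] < end:
            hits.add(index)
    return hits


def mann_whitney_u(sample_a, sample_b):
    """U statistic for sample_a vs sample_b; ties count 0.5 (simple O(n*m) loop)."""
    u = 0.0
    for x in sample_a:
        for y in sample_b:
            if x > y:
                u += 1.0
            elif x == y:
                u += 0.5
    return u


def compare_binding_strength(regions, snps):
    """Return (U, P(SNP region signal > background signal)), or None if a group is empty."""
    hits = regions_with_snps(regions, snps)
    with_snp = [r[3] for i, r in enumerate(regions) if i in hits]
    background = [r[3] for i, r in enumerate(regions) if i not in hits]
    if not with_snp or not background:
        return None
    u = mann_whitney_u(with_snp, background)
    return u, u / (len(with_snp) * len(background))
```

A value near 1.0 for the second returned number would indicate that SNP-harboring regions are usually more strongly bound; for genome-scale inputs, a library routine such as `scipy.stats.mannwhitneyu` would also supply a p-value.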
- SNPs in NFκB Binding Regions Suggest a Mechanism for the Biology of Disease
- In a systematic effort to assign functional annotation to disease-associated SNPs, a pipeline as shown in FIG. 3 was developed to discover putative SNPs that may affect NFκB binding. As shown in FIG. 3, at step 302 a compendium of 24K disease-associated SNPs was received from a database. At step 304, genome-wide transcription factor binding profiles comprising 15,522 binding regions were received. At step 306, the NFκB binding profiles for eight individuals were considered. Genome-wide expression profiling was then performed (e.g., RNA-Seq). Finally, at step 310, the human genomic disease expression information was collected in a database.
- Using genotype and NFκB binding information from eight individuals [Kasowski, M. et al. Variation in transcription factor binding among humans. Science 328, 232-235 (2010)], a preliminary, lower-power analysis was performed to identify candidate SNPs. SNPs in NFκB binding regions that were in linkage disequilibrium (R2>0.5) with a disease-associated SNP were assessed, and SNPs associated with NFκB binding were identified by ANOVA. For instance, rs6135095, a SNP previously reported to be associated with atherosclerosis, shows a significant association between genotype and NFκB binding in the eight cell lines queried.
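The candidate search above can be sketched as follows: keep SNPs whose LD with a disease-associated SNP exceeds R2=0.5, group individuals by genotype at each such SNP, and compute a one-way ANOVA F statistic of NFκB binding signal across the genotype groups. The input layout (`ld_pairs`, `genotypes`, `binding`) and the function names are hypothetical, and genotypes are assumed to be coded 0/1/2. Only the F statistic and its degrees of freedom are computed here with the standard library; a p-value would come from the F distribution with those degrees of freedom (for example via `scipy.stats.f.sf`).

```python
from statistics import mean


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA over numeric groups."""
    groups = [list(g) for g in groups if len(g) > 0]
    k = len(groups)
    n = sum(len(g) for g in groups)
    if k < 2 or n <= k:
        raise ValueError("need at least two non-empty groups and more values than groups")
    grand_mean = mean(x for g in groups for x in g)
    group_means = [mean(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, group_means) for x in g)
    df_between, df_within = k - 1, n - k
    if ss_within == 0:
        # No within-group spread: F is infinite if group means differ, undefined otherwise.
        f_stat = float("inf") if ss_between > 0 else float("nan")
        return f_stat, df_between, df_within
    return (ss_between / df_between) / (ss_within / df_within), df_between, df_within


def candidate_snps(ld_pairs, genotypes, binding, r2_min=0.5):
    """ANOVA of binding signal by genotype for SNPs in LD with a disease SNP.

    ld_pairs: iterable of (snp_id, disease_snp_id, r2) for SNPs in binding regions.
    genotypes: {snp_id: {individual: 0, 1, or 2}} (hypothetical layout).
    binding: {snp_id: {individual: signal of the binding region containing the SNP}}.
    """
    results = {}
    for snp, _disease_snp, r2 in ld_pairs:
        if r2 <= r2_min or snp in results:
            continue
        by_genotype = {}
        for individual, genotype in genotypes.get(snp, {}).items():
            if individual in binding.get(snp, {}):
                by_genotype.setdefault(genotype, []).append(binding[snp][individual])
        try:
            results[snp] = one_way_anova_f(by_genotype.values())
        except ValueError:
            pass  # e.g. only one genotype observed among the individuals
    return results
```

With eight individuals, some genotype classes will often be empty or contain a single sample, which is one reason the text describes this as a preliminary, lower-power analysis.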
- These variants associated with NFκB binding were linked with downstream expression effects of nearby genes. Considering all genes within 200 kb to be potential targets, disease-associated SNPs were found to be associated with changes in NFκB binding which were correlated with expression of nearby genes.
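The linking step above can be sketched as: collect the genes within 200 kb of a SNP, then correlate, across individuals, the NFκB binding signal of the SNP's binding region with each gene's expression level. Measuring distance to a transcription start site (TSS), using Pearson correlation, the minimum number of shared individuals, and the input layout are all assumptions made for illustration; the text does not specify them.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (nan if either is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return float("nan")
    return sxy / math.sqrt(sxx * syy)


def nearby_genes(chrom, pos, genes, window=200_000):
    """Names of genes whose TSS lies within `window` bp of (chrom, pos).

    genes: list of (name, chrom, tss) tuples (hypothetical layout).
    """
    return [name for name, g_chrom, tss in genes if g_chrom == chrom and abs(tss - pos) <= window]


def binding_expression_correlations(chrom, pos, binding, expression, genes,
                                    window=200_000, min_individuals=3):
    """Correlate per-individual binding at a SNP with expression of each nearby gene.

    binding: {individual: binding signal}; expression: {gene: {individual: level}}.
    """
    correlations = {}
    for gene in nearby_genes(chrom, pos, genes, window):
        shared = sorted(set(binding) & set(expression.get(gene, {})))
        if len(shared) < min_individuals:
            continue  # too few individuals with both measurements
        correlations[gene] = pearson([binding[i] for i in shared],
                                     [expression[gene][i] for i in shared])
    return correlations
```

A rank-based correlation could be substituted for `pearson` if binding or expression values are strongly skewed.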
- In an independent validation experiment, aortic tissues from 10 individuals were genotyped, and certain of them were found to carry the rs6135095 CT and TT genotypes. This SNP was found to be associated with binding of NFκB (by ChIP-qPCR) as well as with expression of nearby genes (SIRPG, etc.).
- ENCODE/HS Sites
- PolII binding regions were not overrepresented for disease SNPs. Therefore, enrichment for disease-associated SNPs was computed for various factors in several cell lines, and these SNPs were found to be overrepresented in the binding regions of certain of those factors.
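Each enrichment of this kind, like the overall NFκB enrichment reported earlier, reduces to a 2x2 table that cross-classifies SNPs by whether they fall in a factor's binding regions and whether they are disease-associated; the Statistical Analysis section names chi-squared and Fisher's exact tests for this. The standard-library sketch below computes both tests and the sample odds ratio. The table layout and function name are hypothetical. The chi-squared statistic is computed without Yates' continuity correction (R's `chisq.test` applies one to 2x2 tables by default), and R's `fisher.test` reports a conditional maximum-likelihood odds ratio rather than the sample cross-product ratio computed here, so outputs need not match R exactly.

```python
from math import comb, erfc, inf, sqrt


def enrichment_2x2(a, b, c, d):
    """Enrichment tests on the table [[a, b], [c, d]].

    a: disease SNPs in binding regions       b: other SNPs in binding regions
    c: disease SNPs outside binding regions  d: other SNPs outside binding regions
    Returns (odds_ratio, chi2, chi2_p, fisher_p).
    """
    n = a + b + c + d
    rows, cols = (a + b, c + d), (a + c, b + d)
    if 0 in rows or 0 in cols:
        raise ValueError("every row and column total must be positive")

    # Sample (cross-product) odds ratio; infinite when b * c == 0.
    odds_ratio = (a * d) / (b * c) if b * c else inf

    # Pearson chi-squared with 1 degree of freedom, no continuity correction.
    observed = ((a, b), (c, d))
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = rows[i] * cols[j] / n
            chi2 += (observed[i][j] - expected) ** 2 / expected
    chi2_p = erfc(sqrt(chi2 / 2))  # upper tail of chi-squared with 1 df

    # Fisher's exact test (two-sided): sum hypergeometric probabilities of all
    # tables with the same margins that are no more likely than the observed one.
    def table_prob(x):
        return comb(rows[0], x) * comb(rows[1], cols[0] - x) / comb(n, cols[0])

    p_observed = table_prob(a)
    lo, hi = max(0, cols[0] - rows[1]), min(rows[0], cols[0])
    fisher_p = sum(
        table_prob(x) for x in range(lo, hi + 1) if table_prob(x) <= p_observed * (1 + 1e-7)
    )
    return odds_ratio, chi2, chi2_p, min(1.0, fisher_p)
```

The exact loop is fine for small tables; for genome-scale SNP counts, `scipy.stats.fisher_exact` or R's `fisher.test` is more practical.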
- Data Sources
- Data on disease-SNP associations (p<0.01) were used as in [Ashley, E. A. et al. Clinical assessment incorporating a personal genome. Lancet 375, 1525-1535 (2010); Chen, R., Davydov, E. V., Sirota, M. & Butte, A. J. Non-synonymous and synonymous coding SNPs show similar likelihood and effect size of human disease association. PLoS ONE 5, e13574 (2010)]. ChIP-Seq data on eight cell lines with individual genome sequences were obtained from [Kasowski, M. et al. Variation in transcription factor binding among humans. Science 328, 232-235 (2010)]. All analyses were performed using dbSNP release 132 and hg19 coordinates.
- ASB/ASE
- ChIP-Seq reads were mapped to the hg19 assembly of the human genome using BWA. PCR duplicates were filtered using Picard tools. Variant calling files were downloaded from 1000 Genomes and converted to hg19 coordinates with VCFtools. Allele-specific binding (ASB) was determined on a per-heterozygote, per-individual basis for the ten individuals. Reads were filtered to be above MAQ 30 mapping quality. For each individual, a binomial probability of success was determined based on the probability that a reference allele maps to the genome compared to a non-reference allele. Allele-specific expression (ASE) was similarly determined using reads from the transcriptome of each individual.
- Statistical Analysis
- Overall associations between NFκB binding regions and disease-associated SNPs were ascertained by chi-squared and Fisher's exact tests. Associations between individual SNPs and binding strengths were tested by two-sample t-tests (with two genotypes grouped) or by ANOVA across all three genotypes. All statistical analyses were performed using R statistical software (version 2.12.1).
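Complementing the tests listed above, the per-heterozygote allele-specific binding call described under ASB/ASE can be sketched as an exact binomial test: at each heterozygous site, the number of reference-allele reads is compared with an expected reference fraction `ref_prob`, which is 0.5 absent mapping bias and, per the ASB/ASE description, may instead reflect how readily the reference allele maps relative to the non-reference allele. The function names, the input layout, and the significance threshold `alpha` are hypothetical, and no multiple-testing correction is applied in this sketch.

```python
from math import comb


def binom_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)


def binom_test_two_sided(k, n, p=0.5):
    """Exact two-sided binomial test: total probability of outcomes no more likely than k."""
    if not 0 <= k <= n:
        raise ValueError("k must lie in [0, n]")
    p_observed = binom_pmf(k, n, p)
    total = sum(
        binom_pmf(i, n, p) for i in range(n + 1) if binom_pmf(i, n, p) <= p_observed * (1 + 1e-7)
    )
    return min(1.0, total)


def allele_specific_sites(sites, ref_prob=0.5, alpha=0.01):
    """Return IDs of heterozygous sites with significant allelic imbalance.

    sites: iterable of (site_id, ref_reads, alt_reads) for one individual,
    counted from reads that already passed the mapping-quality filter.
    """
    calls = []
    for site_id, ref_reads, alt_reads in sites:
        depth = ref_reads + alt_reads
        if depth == 0:
            continue  # no informative reads at this site
        if binom_test_two_sided(ref_reads, depth, ref_prob) < alpha:
            calls.append(site_id)
    return calls
```

Allele-specific expression could be called the same way from RNA-Seq read counts at heterozygous sites.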
- Embodiments of the present invention make it possible to determine the previously unknown functional significance of regulatory variants. Indeed, embodiments of the present invention can be used to discover new transcription factor interactions. For example, using the present invention, disease-associated variants can be connected to molecular pathophysiology, and the function of non-coding SNPs can be explained.
- It should be appreciated by those skilled in the art that the specific embodiments disclosed above may be readily utilized as a basis for modifying or designing other techniques for carrying out the same purposes of the present invention. It should also be appreciated by those skilled in the art that such modifications do not depart from the scope of the invention as set forth in the appended claims.
Claims (25)
Priority Applications (2)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| US13/592,291 US20130116930A1 (en) | 2011-08-22 | 2012-08-22 | Method and System for Assessment of Regulatory Variants in a Genome |
| US16/201,913 US20190172556A1 (en) | 2011-08-22 | 2018-11-27 | Method and System for Assessment of Regulatory Variants in a Genome |
Applications Claiming Priority (3)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| US201161526095P | 2011-08-22 | 2011-08-22 | |
| US201161526242P | 2011-08-22 | 2011-08-22 | |
| US13/592,291 US20130116930A1 (en) | 2011-08-22 | 2012-08-22 | Method and System for Assessment of Regulatory Variants in a Genome |
Related Child Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| US16/201,913 Continuation US20190172556A1 (en) | 2011-08-22 | 2018-11-27 | Method and System for Assessment of Regulatory Variants in a Genome |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| US20130116930A1 true US20130116930A1 (en) | 2013-05-09 |
Family
ID=48224278
Family Applications (4)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| US13/592,291 Abandoned US20130116930A1 (en) | 2011-08-22 | 2012-08-22 | Method and System for Assessment of Regulatory Variants in a Genome |
| US13/592,292 Active US9946835B2 (en) | 2011-08-22 | 2012-08-22 | Method and system for the use of biomarkers for regulatory dysfunction in disease |
| US15/954,354 Abandoned US20180373838A1 (en) | 2011-08-22 | 2018-04-16 | Method and System for the Use of Biomarkers for Regulatory Dysfunction in Disease |
| US16/201,913 Abandoned US20190172556A1 (en) | 2011-08-22 | 2018-11-27 | Method and System for Assessment of Regulatory Variants in a Genome |
Family Applications After (3)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| US13/592,292 Active US9946835B2 (en) | 2011-08-22 | 2012-08-22 | Method and system for the use of biomarkers for regulatory dysfunction in disease |
| US15/954,354 Abandoned US20180373838A1 (en) | 2011-08-22 | 2018-04-16 | Method and System for the Use of Biomarkers for Regulatory Dysfunction in Disease |
| US16/201,913 Abandoned US20190172556A1 (en) | 2011-08-22 | 2018-11-27 | Method and System for Assessment of Regulatory Variants in a Genome |
Country Status (1)
| Country | Link |
|---|---|
| US (4) | US20130116930A1 (en) |
Cited By (6)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US20130116931A1 (en) * | 2011-08-22 | 2013-05-09 | The Board Of Trustees Of The Leland Stanford Junior University | Method and System for the Use of Biomarkers for Regulatory Dysfunction in Disease |
| CN103971031A (en) * | 2014-05-04 | 2014-08-06 | 南京师范大学 | Read positioning method oriented to large-scale gene data |
| US10127346B2 (en) | 2011-04-13 | 2018-11-13 | The Board Of Trustees Of The Leland Stanford Junior University | Systems and methods for interpreting a human genome using a synthetic reference sequence |
| WO2020095035A1 (en) * | 2018-11-05 | 2020-05-14 | Earlham Institute | Genomic analysis |
| CN112740239A (en) * | 2018-10-08 | 2021-04-30 | 福瑞诺姆控股公司 | Transcription factor analysis |
| US20240339177A1 (en) * | 2022-11-01 | 2024-10-10 | Invitae Corporation | Population frequency modeling for quantitative variant pathogenicity estimation |
Families Citing this family (3)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| EP3343416B1 (en) * | 2016-12-27 | 2024-03-06 | Tata Consultancy Services Limited | System and method for improved estimation of functional potential of genomes and metagenomes |
| CN110070908B (en) * | 2019-03-11 | 2021-08-13 | 西安电子科技大学 | A motif search method, device, device and storage medium for binomial tree model |
| US20220375609A1 (en) * | 2019-10-16 | 2022-11-24 | NemaMetrix, Inc. | Clinical variant classifier models, machine learning systems and methods of use |
Family Cites Families (6)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US20040161779A1 (en) * | 2002-11-12 | 2004-08-19 | Affymetrix, Inc. | Methods, compositions and computer software products for interrogating sequence variations in functional genomic regions |
| US8163896B1 (en) * | 2002-11-14 | 2012-04-24 | Rosetta Genomics Ltd. | Bioinformatically detectable group of novel regulatory genes and uses thereof |
| US8008013B2 (en) * | 2007-11-16 | 2011-08-30 | Oklahoma Medical Research Foundation | Predicting and diagnosing patients with autoimmune disease |
| EP2601609B1 (en) * | 2010-08-02 | 2017-05-17 | Population Bio, Inc. | Compositions and methods for discovery of causative mutations in genetic disorders |
| US10127346B2 (en) | 2011-04-13 | 2018-11-13 | The Board Of Trustees Of The Leland Stanford Junior University | Systems and methods for interpreting a human genome using a synthetic reference sequence |
| US20130116930A1 (en) * | 2011-08-22 | 2013-05-09 | The Board Of Trustees Of The Leland Stanford Junior University | Method and System for Assessment of Regulatory Variants in a Genome |
2012
- 2012-08-22 US US13/592,291 patent/US20130116930A1/en not_active Abandoned
- 2012-08-22 US US13/592,292 patent/US9946835B2/en active Active
2018
- 2018-04-16 US US15/954,354 patent/US20180373838A1/en not_active Abandoned
- 2018-11-27 US US16/201,913 patent/US20190172556A1/en not_active Abandoned
Non-Patent Citations (2)
| Title |
|---|
| Annotation of functional variation in personal genomes using RegulomeDB, Boyle et al., Genome Research 22:1790-1797, 2012 * |
| Variation in transcription factor binding among humans, Kasowski et al., Science vol. 328, April 9, 2010 * |
Cited By (8)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US10127346B2 (en) | 2011-04-13 | 2018-11-13 | The Board Of Trustees Of The Leland Stanford Junior University | Systems and methods for interpreting a human genome using a synthetic reference sequence |
| US20130116931A1 (en) * | 2011-08-22 | 2013-05-09 | The Board Of Trustees Of The Leland Stanford Junior University | Method and System for the Use of Biomarkers for Regulatory Dysfunction in Disease |
| US9946835B2 (en) * | 2011-08-22 | 2018-04-17 | The Board Of Trustees Of The Leland Stanford Junior University | Method and system for the use of biomarkers for regulatory dysfunction in disease |
| CN103971031A (en) * | 2014-05-04 | 2014-08-06 | 南京师范大学 | Read positioning method oriented to large-scale gene data |
| CN112740239A (en) * | 2018-10-08 | 2021-04-30 | 福瑞诺姆控股公司 | Transcription factor analysis |
| WO2020095035A1 (en) * | 2018-11-05 | 2020-05-14 | Earlham Institute | Genomic analysis |
| US20240339177A1 (en) * | 2022-11-01 | 2024-10-10 | Invitae Corporation | Population frequency modeling for quantitative variant pathogenicity estimation |
| US12191001B2 (en) * | 2022-11-01 | 2025-01-07 | Laboratory Corporation Of America Holdings | Population frequency modeling for quantitative variant pathogenicity estimation |
Also Published As
| Publication number | Publication date |
|---|---|
| US20130116931A1 (en) | 2013-05-09 |
| US20190172556A1 (en) | 2019-06-06 |
| US9946835B2 (en) | 2018-04-17 |
| US20180373838A1 (en) | 2018-12-27 |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | AS | Assignment | Owner name: THE BOARD OF TRUSTEES OF THE LELAND STANFORD JUNIO; Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:KARCZEWSKI, KONRAD;DUDLEY, JOEL T.;SNYDER, MICHAEL;AND OTHERS;SIGNING DATES FROM 20121123 TO 20121210;REEL/FRAME:035419/0281 |
| | AS | Assignment | Owner name: NATIONAL INSTITUTES OF HEALTH (NIH), U.S. DEPT. OF; Free format text: CONFIRMATORY LICENSE;ASSIGNOR:STANFORD UNIVERSITY;REEL/FRAME:036092/0566; Effective date: 20150707 |
| | STPP | Information on status: patent application and granting procedure in general | Free format text: ADVISORY ACTION MAILED |
| | STCB | Information on status: application discontinuation | Free format text: ABANDONED -- FAILURE TO RESPOND TO AN OFFICE ACTION |