WO2002031188A2 - Method and device for predicting haplotypes - Google Patents
- Publication number
- WO2002031188A2 (application PCT/EP2001/011726)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- genotype
- haplotypes
- haplotype
- compatible
- pairs
- Prior art date
Classifications
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B20/00—ICT specially adapted for functional genomics or proteomics, e.g. genotype-phenotype associations
- G16B20/20—Allele or variant detection, e.g. single nucleotide polymorphism [SNP] detection
- G16B30/00—ICT specially adapted for sequence analysis involving nucleotides or amino acids
- G16B40/00—ICT specially adapted for biostatistics; ICT specially adapted for bioinformatics-related machine learning or data mining, e.g. knowledge discovery or pattern finding
Definitions
- The invention relates to a method for processing gene sequence data, in particular for predicting or assigning at least one haplotype pair to each genotype of a given set.
- The method can be applied to any set of genotypes determined from diploid organisms.
- The invention also relates to devices for carrying out the method and to applications of the method.
- the sequencing of the human genome has essentially been completed.
- the entire genetic information is also available in duplicate in the human genome.
- Each individual has two copies of each gene, one from the mother and one from the father. The copies do not have to be identical. They can differ in some positions in the gene sequences.
- the different forms of genes are usually referred to as alleles, and the term genotype is used for the individual pair of alleles of a gene in question.
- the single allele - or the entirety of alleles of several genes that an individual has inherited from a particular parent - is called the haplotype.
- Knowledge of the haplotype pair of each individual is of particular interest for research into the genetic causes of complex diseases. If a sample of genotypes is drawn from a population by means of a selection principle, the associated haplotype pairs cannot be estimated on the basis of the Hardy-Weinberg equilibrium (HWE), because the conditions for the validity of the HWE are not met. The number of candidate haplotypes can be very large and lead to unacceptable computing times.
- The problem of haplotype prediction or assignment for the genotypes of a selected set of organisms is not confined to human genetics. Haplotype estimation is of general importance in all areas of biology, medicine and agriculture in which diploid organisms are considered.
- The object of the invention is to provide methods for processing gene sequences of diploid organisms with which haplotype pairs can be determined, within practical computing times, for the individual genotypes in a set of genotypes selected from a population.
- the object of the invention is also to provide devices for implementing the methods and new applications.
- the basic idea of the invention is to determine possible haplotype pairs and their probability for each genotype of a sample, which was obtained from a population according to an arbitrary selection principle, using methods from combinatorics and probability theory.
- the genotypes are expediently coded in sequences of numbers, the elements of which are characteristic of the presence or absence of a mutation at the position of the gene sequence under consideration.
- The elements of the number sequences can comprise, for example, the numbers -1, 0, 1 and 2 for the genotypes and the numbers 0 and 1 for the haplotypes.
- the compatible haplotypes that could contribute to the genotype under consideration are selected from a large number of theoretically possible haplotypes.
- an estimation method similar to the maximum likelihood principle is used to determine and save or output a most likely haplotype pair and possibly further haplotype pairs.
- the method according to the invention has the advantage of considering the haplotypes in question as a priori unknown parameters and making their estimates on the basis of a statistical concept.
- Another important advantage is that the computational effort is considerably reduced and data processing is also possible for large data sets within practical computing times.
- The estimation method introduced by the invention, for estimating the haplotype frequencies and hence the most probable haplotype pairs for the individual genotypes, provides a well-founded estimate for the specific sample. When the HWE holds and the sample is sufficiently large, it agrees with the estimates obtained by the usual methods for a population that satisfies the HWE.
- the method according to the invention for processing gene sequences of diploid organisms for haplotype prediction for individual genotypes from a sample of subjects comprises the steps explained below.
- Step 1 data provision
- A sample of test subjects within a population is considered.
- The data to be processed are provided by converting the gene sequences of the genotypes under consideration, previously determined from the test subjects, into number sequences, and by calculating the genotype frequencies f(g).
- Genotypes and haplotypes are represented as sequences of numbers. It has proven advantageous to use the numbers -1, 0, 1 and 2 for the genotypes. At the respective positions of the corresponding gene sequences they mean: -1, variant not specified; 0, homozygous, not mutated (both genes identical to a "standard"); 1, heterozygous, one gene mutated; 2, homozygous, both genes mutated.
- the standard used for comparison usually corresponds to the first sequence of the gene published in a public database.
- the haplotypes are represented as sequences which consist of the numbers 1 and 0 and have the same length as the genotypes under consideration.
- the number sequences of the haplotypes show where there is a mutation (1) or where there is no mutation (0).
- The data of all theoretically possible haplotypes are provided as sequences of 1s and 0s with the length of the genotypes under consideration. The number sequences of genotypes and haplotypes are related position by position as follows: genotype 0 means both haplotypes are 0; 1 means one is 0 and the other is 1 (order not determined); 2 means both are 1; -1 allows any of these combinations.
- The haplotype pair (x, y) corresponding to a genotype g fulfills the equation g = x + y (1).
- The addition is carried out position by position.
- The equality refers only to the non-negative positions of g.
- At the "-1" positions of g (if any), the result of the addition is arbitrary, i.e. it always counts as equality.
- Equation (1) always has at least one solution for a given genotype g.
- Haplotypes (x) and (y), for which equation (1) can be fulfilled at all, are called compatible haplotypes.
- The compatible haplotypes thus form a set of theoretically possible haplotypes that can contribute to the genotype g.
- Haplotype pairs (x, y) that satisfy equation (1) are said to be haplotype pairs compatible with g.
- Two properties of compatible haplotypes are important for the implementation of the estimation principle explained below. If one haplotype x is compatible with g, the second haplotype y is uniquely determined according to equation (1). Two haplotype pairs compatible with g either have no haplotype in common or are identical.
- The number of different genotypes found in the sample of subjects is designated by G.
- Each of the genotypes g can occur several times in the sample; its relative frequency is designated by f(g).
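The relative frequencies f(g) can be obtained by counting how often each coded genotype occurs in the sample. The following is a minimal sketch; the function name `genotype_frequencies` and the toy sample are illustrative assumptions, not part of the source.

```python
from collections import Counter

def genotype_frequencies(sample):
    """Map each distinct genotype to its relative frequency f(g) in the sample."""
    counts = Counter(tuple(g) for g in sample)
    total = sum(counts.values())
    return {g: n / total for g, n in counts.items()}

# Toy sample (hypothetical data): four subjects, three distinct genotypes.
sample = [(1, 0, 2), (1, 0, 2), (0, 0, 1), (1, -1, 2)]
f = genotype_frequencies(sample)
G = len(f)  # number of distinct genotypes in the sample
```

Genotypes are stored as tuples so that they can serve as dictionary keys in later steps.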
- Step 2 Determine the compatible haplotypes
- The haplotypes are tested to determine whether they are compatible with one or more of the genotypes of the group of subjects under consideration. Any haplotype that is compatible with at least one of the genotypes is saved. The subset of haplotypes formed in this way contains no duplicates by construction, i.e. no haplotype is contained more than once. This selection reduces the amount of data to be processed.
- For each genotype g ∈ G, the set H(g) of the haplotypes x compatible with g is formed.
- The set H(g) thus includes all haplotypes x for which there is at least one second haplotype y such that equation (1) is satisfied.
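As an illustration of how H(g) might be formed, the sketch below builds the set directly from the genotype code at each position instead of testing every theoretically possible haplotype; both routes yield the same set, because a position coded 0 forces x to 0, a position coded 2 forces x to 1, and positions coded 1 or -1 leave x free. The names `ALLOWED`, `compatible_haplotypes` and `compatible_union` are assumptions made for this example.

```python
from itertools import product

# Values one haplotype may take at a position, given the genotype code there.
ALLOWED = {0: (0,), 2: (1,), 1: (0, 1), -1: (0, 1)}

def compatible_haplotypes(g):
    """Return H(g): all 0/1 haplotypes x for which some y satisfies equation (1)."""
    return set(product(*(ALLOWED[code] for code in g)))

def compatible_union(genotypes):
    """Collect every haplotype compatible with at least one genotype, without duplicates."""
    pool = set()
    for g in genotypes:
        pool |= compatible_haplotypes(g)
    return pool

genotypes = [(1, 0, 2), (0, -1, 2)]
H = {g: compatible_haplotypes(g) for g in genotypes}
N = {g: len(H[g]) for g in genotypes}  # N(g), the number of compatible haplotypes
pool = compatible_union(genotypes)
```

Using a set for the pool mirrors the statement that no haplotype is stored more than once.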
- The number of these haplotypes is denoted by N(g). To estimate the haplotypes, an estimation principle that is an alternative to the HWE is introduced (see below).
- the a priori probability h (g, x) that x is a haplotype of g (x is compatible with g) is calculated according to a certain distribution principle:
- The uniform distribution is used as the distribution principle.
- other distributions could be used, for example those that assign different weights to the genotypes based on a priori information.
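The formula for h(g, x) is not reproduced in this text. One plausible reading of the uniform distribution, stated here as an assumption, is h(g, x) = 1/N(g) for every x in H(g). A minimal sketch under that assumption, with a hypothetical function name and toy input:

```python
def uniform_prior(compatible):
    """Assumed uniform prior: h[(g, x)] = 1 / N(g) for every x in H(g).

    `compatible` maps each genotype g to its set H(g) of compatible haplotypes.
    """
    return {(g, x): 1.0 / len(hs) for g, hs in compatible.items() for x in hs}

# Toy input (hypothetical): H(g) for two genotypes of length 2.
compatible = {
    (1, 0): {(0, 0), (1, 0)},
    (2, 0): {(1, 0)},
}
h = uniform_prior(compatible)
```

A non-uniform variant would replace the constant 1/N(g) by weights that still sum to 1 over H(g).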
- Step 3 Determine the compatible haplotype pairs
- For each genotype g, the haplotype pairs that are compatible with g are selected and saved.
- The probability of finding the haplotype x in a randomly selected test subject from the sample is calculated using the following sample averaging:
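The averaging formula itself is missing from this text. A natural candidate, assumed here rather than taken from the source, is the frequency-weighted sum of h(g, x) over all genotypes g, with weights f(g). The sketch uses hypothetical names and toy values:

```python
from collections import defaultdict

def haplotype_probabilities(f, h):
    """Assumed sample average: p[x] = sum over g of f[g] * h[(g, x)]."""
    p = defaultdict(float)
    for (g, x), weight in h.items():
        p[x] += f[g] * weight
    return dict(p)

# Toy values (hypothetical): two equally frequent genotypes and a uniform prior.
f = {(1, 0): 0.5, (2, 0): 0.5}
h = {
    ((1, 0), (0, 0)): 0.5,
    ((1, 0), (1, 0)): 0.5,
    ((2, 0), (1, 0)): 1.0,
}
p = haplotype_probabilities(f, h)
```

Because f sums to 1 over the genotypes and h sums to 1 over H(g) for each g, the resulting values also sum to 1.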
- the probabilities p (x, y) for a test subject with a genotype g having the haplotypes x and y are calculated.
- the significance of this haplotype pair (x, y) for the genotype g is evaluated with the probability p (x, y).
- the probabilities p (x, y) can be calculated using various model assumptions. For example, the calculation of p according to
- Haplotype pairs (x, y) whose probability exceeds a specified minimum probability are output, displayed or saved.
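The selection step can be sketched as follows. Since the formula for p(x, y) is cut off in this text, the sketch substitutes a simple product model, p(x, y) proportional to p(x) · p(y), purely as a stand-in assumption. The scores are scaled to the best pair of the genotype, and pairs below a threshold are dropped. The function name, the parameters `min_relative` and `top`, and the numbers are all hypothetical.

```python
def rank_pairs(pairs, p, min_relative=0.0, top=3):
    """Rank compatible pairs by an assumed score p[x] * p[y], relative to the best pair."""
    scored = [((x, y), p[x] * p[y]) for x, y in pairs]
    if not scored:
        return []
    best = max(score for _, score in scored)
    if best == 0:
        return []
    ranked = sorted(scored, key=lambda item: item[1], reverse=True)
    kept = [(pair, score / best) for pair, score in ranked
            if score / best >= min_relative]
    return kept[:top]

# Toy haplotype probabilities and the two pairs compatible with genotype (1, 1).
p = {(0, 0): 0.25, (1, 0): 0.75, (0, 1): 0.1, (1, 1): 0.2}
pairs = [((0, 0), (1, 1)), ((0, 1), (1, 0))]
ranked = rank_pairs(pairs, p)
```

Scaling by the best score gives values between 0 and 1, with the most probable pair at exactly 1.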
- Depending on the application, the method according to the invention for processing gene sequences of diploid organisms can be implemented in a wide variety of forms, e.g. with the help of a computer program or a device for processing gene sequences.
- The device for processing gene sequences comprises in particular a data conversion device for providing the above-mentioned genotype and haplotype number sequences for a group of genotypes or haplotypes, a selection device for determining the compatible haplotypes and the compatible haplotype pairs, a calculation device for carrying out the estimation method, and an output device for storing, outputting and/or displaying the estimated haplotype pairs.
- the device can be formed by a computer or a circuit and memory arrangement specially equipped with said devices.
- Genotype 1 1 0 0 2 2 2 0 1 0 1 1 2 1 2
- Genotype 2 1 1 0 2 2 2 0 1 0 1 0 2 1 1
- Genotype 3 1 1 -1 2 2 2 0 0 0 0 0 0 0 1 1
- Genotype 4 0 2 0 2 2 2 2 0 0 0 0 0 0 0 1 0 1 1 1 0 0 0 0 0 0 0 0 0 0 1 0 1 1 1 0 0 0 0 0 0 0 1 1.00
- Genotype 5 2 0 1 2 2 2 0 1 0 1 0 1 1 1
- Genotype 6 1 1 0 2 2 2 0 0 0 0 -1 0 0 1
- Genotype 7 0 2 0 1 2 2 0 1 1 0 0 0 0 0 1 1
- Genotype 8 2 0 1 2 -1 2 0 1 0 1 0 1 1 1
- Genotype 9 1 1 1 2 2 2 2 0 0 0 0 0 0 1 2
- Genotype 10 1 1 1 1 1 1 2 0 1 0 1 0 1 1 1 2
- Genotype 11 1 1 0 2 2 2 0 -1 0 1 0 1 1 2
- Genotype 12 1 1 0 -1 2 2 0 1 0 1 0 1 1 1
- Genotype 13 2 0 1 2 2 2 0 1 0 1 0 1 1 -1
- Genotype 14 0 2 0 0 1 0 2 2 2 0 0 0 0 0 0 0 0
- Genotype 15 1 1 0 1 2 1 1 2 1 1 0 2 0 0
- Genotype 16 1 1 0 1 2 1 1 2 1 0 0 2 1 1
- Genotype 17 1 1 0 1 -1 -1 1 2 1 0 0 2 1 1
- Genotype 18 1 1 0 2 2 2 0 1 0 0 1 0 0 1 0 0 1
- the compatible haplotype pairs were initially ordered according to the size of the probabilities p (x, y). However, only a maximum of the 3 most likely of them are given here. The last column shows the values of these probabilities relative to the respective maximum value.
Description
Method and device for haplotype prediction
The invention relates to a method for processing gene sequence data, in particular for predicting or assigning at least one haplotype pair to each genotype of a given set. The method can be applied to any set of genotypes determined from diploid organisms. The invention also relates to devices for carrying out the method and to applications of the method.
The sequencing of the human genome (determining the order of the molecular building blocks (nucleotides) of which DNA, the carrier of genetic information, is composed) is essentially complete. In the human genome, too, the entire genetic information is present in duplicate. Each individual has two copies of each gene, one from the mother and one from the father. The copies need not be identical; they can differ at some positions in the gene sequences. The different forms of a gene are usually called alleles, and the individual pair of alleles of a given gene is called the genotype. A single allele, or the entirety of the alleles of several genes that an individual has inherited from one particular parent, is called a haplotype.
There is interest in determining, for the genotype of a test subject (e.g. a diseased or a healthy organism), the haplotype pair of which that genotype is composed. In practice, the aim is to predict all haplotype pairs from which the genotype could be composed with at least a specified minimum probability. Well-known biotechnological methods make it possible to determine the haplotype pair belonging to a given genotype. However, this individual analysis is disadvantageous because of the high labor and cost involved.
Until now, population genetics has been concerned with estimating haplotype frequencies in a population. Various methods have become known (see: Terwilliger, J., Ott, J.: Handbook for Human Genetic Linkage. Johns Hopkins University Press, Baltimore, 1994; Hawley, M.E., Kidd, K.K.: J. Hered. 86: 409-411, 1995; Excoffier, L., Slatkin, M.: Mol. Biol. Evol. 12: 921-927, 1995) for computing the haplotype frequencies from a representative sample of genotypes drawn from a population. The only method known so far for predicting individual haplotypes from genotypes was developed by A. G. Clark (Mol. Biol. Evol. 7: 111-122, 1990). However, this method has two major disadvantages: it is purely empirical and not based on a well-founded statistical concept, and it is applicable only under certain restrictive conditions. The most widely used estimation methods are all based on the assumption of Hardy-Weinberg equilibrium (HWE). The HWE states that, under certain conditions (in particular random union of alleles within the population, absence of mutation, and an infinite, closed population), the frequencies of the genotypes present in the population are already determined by the haplotype frequencies and remain constant after one generation.
For research into the genetic causes of complex diseases, knowledge of the haplotype pair of each individual is of particular interest. If a sample of genotypes is drawn from a population by means of a selection principle, the associated haplotype pairs cannot be estimated on the basis of the HWE, because the conditions for the validity of the HWE are not met. The number of candidate haplotypes can be very large and lead to unacceptable computing times.
The problem described, haplotype prediction or assignment for the genotypes of a selected set of organisms, is not confined to human genetics. Haplotype estimation is of general importance in all areas of biology, medicine and agriculture in which diploid organisms are considered.
The object of the invention is to provide methods for processing gene sequences of diploid organisms with which haplotype pairs can be determined, within practical computing times, for the individual genotypes in a set of genotypes selected from a population. A further object of the invention is to provide devices for implementing the methods, and new applications.
This object is achieved by methods, computer program products and devices having the features of claims 1, 3 and 4, respectively. Advantageous embodiments and applications of the invention follow from the dependent claims.
The basic idea of the invention is to determine, by methods from combinatorics and probability theory, the possible haplotype pairs and their probabilities for each genotype of a sample drawn from a population according to an arbitrary selection principle. The genotypes are expediently coded as sequences of numbers whose elements indicate the presence or absence of a mutation at the respective position of the gene sequence. The elements of the number sequences can comprise, for example, the numbers -1, 0, 1 and 2 for the genotypes and the numbers 0 and 1 for the haplotypes. From a large number of theoretically possible haplotypes, the compatible haplotypes, i.e. those that could contribute to the genotype under consideration, are selected first.
Then, from the compatible haplotype pairs, a most probable haplotype pair and, where applicable, further haplotype pairs are determined by an estimation method similar to the maximum likelihood principle, and are stored or output.
The method according to the invention has the advantage that the candidate haplotypes are treated as a priori unknown parameters and are estimated on the basis of a statistical concept.
A further important advantage is that the computational effort is considerably reduced, so that even large data sets can be processed within practical computing times.
The estimation method introduced by the invention, for estimating the haplotype frequencies and hence the most probable haplotype pairs for the individual genotypes, provides a well-founded estimate for the specific sample. When the HWE holds and the sample is sufficiently large, it agrees with the estimates obtained by the usual methods for a population that satisfies the HWE.
Further advantages and details of the invention are explained below with reference to the individual steps of the method according to the invention (in particular the mathematical description of the estimation method), a device according to the invention, and an example.
Haplotype prediction method
The method according to the invention for processing gene sequences of diploid organisms, for haplotype prediction for individual genotypes from a sample of test subjects, comprises the steps explained below.
Step 1: Data provision
A sample of test subjects within a population is considered. First, the data to be processed are provided by converting the gene sequences of the genotypes under consideration, previously determined from the test subjects, into number sequences, and by calculating the genotype frequencies f(g).
1.a) Representation of the genotypes and haplotypes as sequences of numbers
The genotypes are represented as sequences of symbols. It has proven advantageous to use the numbers -1, 0, 1 and 2. At the respective positions of the corresponding gene sequences they have the following meaning:
-1 ↔ variant not specified
0 ↔ homozygous, not mutated (both genes identical to a "standard")
1 ↔ heterozygous, one gene mutated
2 ↔ homozygous, both genes mutated
The standard used for comparison is usually the first sequence of the gene published in a public database.
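The coding can be illustrated with a short sketch that counts, position by position, how many of the two observed alleles differ from the standard sequence. The input format (a pair of bases per position, or `None` where the variant is not specified) and the function names are assumptions made for this example; the sketch also assumes that any non-standard base counts as a mutation.

```python
def encode_position(alleles, reference):
    """Return -1 if the variant is unspecified, else the number of mutated alleles (0, 1 or 2)."""
    if alleles is None:
        return -1
    return sum(1 for allele in alleles if allele != reference)

def encode_genotype(observed, reference_seq):
    """Code a genotype as a tuple of -1, 0, 1, 2 relative to the standard sequence."""
    return tuple(encode_position(a, r) for a, r in zip(observed, reference_seq))

# Hypothetical data: four positions, the third one not determined.
reference_seq = "ACGT"
observed = [("A", "A"), ("C", "T"), None, ("A", "A")]
codes = encode_genotype(observed, reference_seq)
```

Here the first position matches the standard on both copies, the second is heterozygous, the third is unspecified, and the fourth differs from the standard on both copies.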
According to a further practical aspect of the invention, the haplotypes are represented as sequences that consist of the numbers 1 and 0 and have the same length as the genotypes under consideration. The number sequences of the haplotypes indicate where a mutation is present (1) and where no mutation is present (0).
The data of all theoretically possible haplotypes are provided as sequences of 1s and 0s with the length of the genotypes under consideration. The number sequences of genotypes and haplotypes are related as follows:
| Genotype | Haplotypes |
|---|---|
| -1 | both = 0; both = 1; or one = 0 and the other = 1 (order not fixed) |
| 0 | both = 0 |
| 1 | one = 0, the other = 1 (order not fixed) |
| 2 | both = 1 |
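The relationship between genotype symbols and haplotype values can be captured in a small lookup structure. The following is a minimal Python sketch and not part of the patent text; the names `ALLOWED` and `genotype_of` are illustrative assumptions.

```python
# Allowed unordered (x_i, y_i) haplotype values for each genotype symbol g_i,
# following the relationship between genotypes and haplotypes given above.
ALLOWED = {
    -1: {(0, 0), (0, 1), (1, 1)},  # variant not specified: any combination
    0: {(0, 0)},                   # homozygous, not mutated
    1: {(0, 1)},                   # heterozygous, order not fixed
    2: {(1, 1)},                   # homozygous, both mutated
}


def genotype_of(x, y):
    """Position-wise genotype produced by two 0/1 haplotypes (hypothetical helper)."""
    return [xi + yi for xi, yi in zip(x, y)]


x = [0, 1, 0, 1]
y = [1, 1, 0, 0]
g = genotype_of(x, y)
print(g)  # [1, 2, 0, 1]

# Every position of the derived genotype is consistent with the lookup table.
for gi, xi, yi in zip(g, x, y):
    assert tuple(sorted((xi, yi))) in ALLOWED[gi]
```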
For the number sequences of the genotypes and haplotypes, the following arithmetic is introduced, using the designations x, y for the haplotypes and g for the genotype. The haplotype pair (x, y) corresponding to a genotype g satisfies the equation:
g = x + y    (1)
The addition is carried out position by position, subject to the following convention regarding the non-negative positions in g. Equality refers only to the non-negative positions of g. At the "-1" positions in g (if any), the result of the addition is arbitrary, i.e. it is always counted as equality.
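Equation (1) with its convention for "-1" positions can be checked mechanically. The sketch below is an illustration under the assumption that genotypes and haplotypes are stored as Python lists of integers; the function name `satisfies_eq1` is hypothetical.

```python
def satisfies_eq1(g, x, y):
    """Return True if g = x + y holds at every non-negative position of g.

    Positions where g is -1 impose no constraint and always count as equal.
    """
    if not (len(g) == len(x) == len(y)):
        raise ValueError("sequences must have the same length")
    return all(gi == -1 or gi == xi + yi for gi, xi, yi in zip(g, x, y))


g = [1, 1, -1, 2]
print(satisfies_eq1(g, [0, 1, 0, 1], [1, 0, 1, 1]))  # True
print(satisfies_eq1(g, [0, 1, 0, 1], [1, 1, 0, 1]))  # False: second position gives 2, not 1
```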
Equation (1) always has at least one solution for a given genotype g. Haplotypes x and y for which equation (1) can be satisfied at all are called compatible haplotypes. The compatible haplotypes thus form a set of theoretically possible haplotypes that can contribute to the genotype g. Haplotype pairs (x, y) that satisfy equation (1) are called haplotype pairs compatible with g.
If g contains no 1 at a given position, the solution of equation (1) for a given x is even uniquely determined. The heterozygous (mutated) positions and the positions that are not uniquely determined in the genotype therefore decide how complicated the estimation problem is.
The following property of the haplotypes is important for implementing the estimation principle explained below. If a haplotype x is compatible with g, the second haplotype y is uniquely determined by equation (1). Two haplotype pairs compatible with g therefore either share no haplotype or are identical.
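The uniqueness of the second haplotype can be illustrated by solving equation (1) for y. This is a sketch only: it assumes a genotype without "-1" positions (where equation (1) imposes no constraint), and the helper name `partner` is an assumption.

```python
def partner(g, x):
    """Return the haplotype y with g = x + y, or None if x is not compatible.

    Assumes g contains only the symbols 0, 1 and 2 (no -1 positions)
    and that x is a 0/1 sequence of the same length.
    """
    y = [gi - xi for gi, xi in zip(g, x)]
    # y is a valid haplotype only if every entry is 0 or 1.
    return y if all(yi in (0, 1) for yi in y) else None


g = [1, 0, 2, 1]
print(partner(g, [0, 0, 1, 1]))  # [1, 0, 1, 0]
print(partner(g, [0, 1, 1, 1]))  # None: x has a mutation where g is homozygous 0
```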
1.b) Calculation of genotype frequencies f(g)
The set of distinct genotypes found in the subject sample is denoted G. Each genotype g can occur several times in the subject group; its relative frequency is denoted f(g).
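The relative frequencies f(g) can be obtained by simple counting. The following sketch assumes the sample is a list of genotype sequences; `genotype_frequencies` is a hypothetical name and the three-position sample is made up for illustration.

```python
from collections import Counter


def genotype_frequencies(sample):
    """Map each distinct genotype (as a tuple) to its relative frequency f(g)."""
    counts = Counter(tuple(g) for g in sample)
    n = len(sample)
    return {g: c / n for g, c in counts.items()}


# Made-up sample of four subjects with three positions each.
sample = [[1, 0, 2], [1, 0, 2], [0, 2, 1], [1, 1, -1]]
f = genotype_frequencies(sample)
print(len(f))        # 3 distinct genotypes, i.e. the size of G
print(f[(1, 0, 2)])  # 0.5
```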
Step 2: Determination of the compatible haplotypes
All of the previously provided theoretical haplotypes are tested to determine whether they are compatible with one or more of the genotypes of the subject group under consideration. Every haplotype that is compatible with at least one of the genotypes is stored. Because of the way it is constructed, the resulting subset of haplotypes is irreducible, i.e. no haplotype is contained more than once. This selection reduces the amount of data to be processed. For each genotype g ∈ G, the set H(g) of haplotypes x compatible with g is formed. The set H(g) thus comprises all haplotypes x for which there is at least one second haplotype y such that equation (1) is satisfied. The number of these haplotypes is denoted N(g). To estimate the haplotypes, an estimation principle that is an alternative to HWE is introduced (see below).
For a given genotype g and a haplotype x, the a priori probability h(g,x) that x is a haplotype of g (i.e. that x is compatible with g) is calculated according to a chosen distribution principle:
h(g,x) = 1/N(g) for all x in H(g), and h(g,x) = 0 in all other cases.
In the application example, the uniform distribution is thus used as the distribution principle. Alternatively, other distributions could be used, for example ones that assign different weights to the genotypes on the basis of a priori information.
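The set H(g), its size N(g) and the uniform a priori probability h(g,x) can be sketched as follows. Instead of testing all theoretically possible haplotypes, this illustration builds H(g) directly from the genotype symbols, which yields the same set for short sequences; the function names are assumptions.

```python
from itertools import product


def compatible_haplotypes(g):
    """H(g): all 0/1 sequences x for which some y satisfies g = x + y."""
    # g_i = 0 forces x_i = 0, g_i = 2 forces x_i = 1; g_i = 1 or -1 leaves x_i free.
    choices = [(0,) if gi == 0 else (1,) if gi == 2 else (0, 1) for gi in g]
    return list(product(*choices))


def prior(g, x):
    """Uniform a priori probability h(g, x) = 1/N(g) on H(g), else 0."""
    H = compatible_haplotypes(g)
    return 1 / len(H) if tuple(x) in H else 0.0


g = [1, 0, 2, -1]
print(len(compatible_haplotypes(g)))  # N(g) = 4
print(prior(g, [0, 0, 1, 1]))         # 0.25
print(prior(g, [0, 1, 1, 1]))         # 0.0
```

Note that N(g) doubles with every heterozygous or unspecified position, so the direct construction is only practical for short sequences.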
Step 3: Determination of the compatible haplotype pairs
For each genotype g, the haplotype pairs that are compatible with g are selected and stored.
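Step 3 can be sketched by enumerating, position by position, the haplotype values that equation (1) permits. This is an illustrative assumption about how the selection might be coded, not the patent's implementation; `PAIR_CHOICES` and `compatible_pairs` are hypothetical names.

```python
from itertools import product

# Ordered (x_i, y_i) choices permitted by each genotype symbol.
PAIR_CHOICES = {
    -1: [(0, 0), (0, 1), (1, 0), (1, 1)],
    0: [(0, 0)],
    1: [(0, 1), (1, 0)],
    2: [(1, 1)],
}


def compatible_pairs(g):
    """All unordered haplotype pairs (x, y) that satisfy equation (1) for g."""
    pairs = set()
    for combo in product(*(PAIR_CHOICES[gi] for gi in g)):
        x = tuple(c[0] for c in combo)
        y = tuple(c[1] for c in combo)
        pairs.add(tuple(sorted((x, y))))  # (x, y) and (y, x) are the same pair
    return sorted(pairs)


for x, y in compatible_pairs([1, 2, 1]):
    print(x, y)
# (0, 1, 0) (1, 1, 1)
# (0, 1, 1) (1, 1, 0)
```

A genotype with k heterozygous positions and no "-1" positions has 2^(k-1) compatible pairs for k ≥ 1, which is why a genotype with a single heterozygous position has exactly one pair.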
Step 4: Haplotype prediction
4.a) Estimation procedure
For each compatible pair (x, y) of haplotypes, the following probability values P(x), P(y) and p(x,y) are calculated.
The probability of finding haplotype x in a subject selected at random from the sample (without knowledge of the subject's genotype) is calculated by the following averaging over the sample:
P(x) = Σ_{g ∈ G} f(g) · h(g,x)
P(y) is calculated correspondingly with h(g,y).
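The averaging over the sample can be sketched as below, combining the relative frequencies f(g) with the uniform a priori probability h(g,x). The two-position sample is made up for illustration and the function names are assumptions.

```python
from collections import Counter
from itertools import product


def compatible_haplotypes(g):
    """H(g): haplotypes that can contribute to genotype g."""
    choices = [(0,) if gi == 0 else (1,) if gi == 2 else (0, 1) for gi in g]
    return set(product(*choices))


def haplotype_probabilities(sample):
    """P(x) = sum over g in G of f(g) * h(g, x), with uniform h(g, x) = 1/N(g)."""
    counts = Counter(tuple(g) for g in sample)
    n = len(sample)
    P = Counter()
    for g, c in counts.items():
        H = compatible_haplotypes(g)
        for x in H:
            P[x] += (c / n) / len(H)  # f(g) * h(g, x)
    return dict(P)


# Made-up sample: four subjects, two positions.
sample = [[0, 1], [0, 1], [0, 2], [1, 1]]
P = haplotype_probabilities(sample)
print(P[(0, 1)])        # 0.5625
print(sum(P.values()))  # 1.0
```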
Next, the probabilities p(x,y) that a subject with genotype g has the haplotypes x and y are calculated. The probability p(x,y) rates the significance of the haplotype pair (x,y) for the genotype g. The probabilities p(x,y) can be calculated under various model assumptions. For example, p can be calculated according to
p(x,y) = P(x) · P(y).
Normalization is not strictly required when calculating p(x,y) in order to apply the method according to the invention; the sum of the p values for a given genotype can therefore differ from 1.
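The product rule for p(x,y) can be illustrated with a few lines of Python. The probabilities in `P` below are hypothetical values for a two-position example, and `pair_scores` is an assumed helper name.

```python
def pair_scores(pairs, P):
    """Score each compatible pair by p(x, y) = P(x) * P(y) (not normalized)."""
    return {(x, y): P.get(x, 0.0) * P.get(y, 0.0) for x, y in pairs}


# Hypothetical haplotype probabilities P(x) for two positions.
P = {(0, 0): 0.3125, (0, 1): 0.5625, (1, 0): 0.0625, (1, 1): 0.0625}
# The two compatible pairs of the genotype (1, 1).
pairs = [((0, 0), (1, 1)), ((0, 1), (1, 0))]

scores = pair_scores(pairs, P)
print(scores[((0, 0), (1, 1))])  # 0.01953125
print(scores[((0, 1), (1, 0))])  # 0.03515625
```

The two scores do not sum to 1, which illustrates that the p values for a genotype are used for ranking without normalization.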
4.b) Prediction
For each genotype, the haplotype pairs (x,y) that exceed a specified minimum probability are reported, displayed or stored.
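The prediction step amounts to a threshold filter over the pair scores of one genotype. In this sketch the scores and the threshold value are made up, and `predict` is a hypothetical name.

```python
def predict(scores, threshold):
    """Keep the haplotype pairs whose score exceeds the given minimum."""
    return {pair: p for pair, p in scores.items() if p > threshold}


# Hypothetical scores p(x, y) for the compatible pairs of one genotype.
scores = {
    ((0, 0), (1, 1)): 0.01953125,
    ((0, 1), (1, 0)): 0.03515625,
}
print(predict(scores, 0.02))  # {((0, 1), (1, 0)): 0.03515625}
```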
Device for haplotype prediction
The method according to the invention for processing gene sequences of diploid organisms can be implemented in a wide variety of forms depending on the application, e.g. by means of a computer program or a device for processing gene sequences. The device for processing gene sequences comprises in particular a data conversion unit for providing the above-mentioned genotype and haplotype data sequences for a group of genotypes or haplotypes, a selection unit for determining the compatible haplotypes and the compatible haplotype pairs, a computing unit for carrying out the estimation procedure, and an output unit for storing, outputting and/or displaying the estimated haplotype pairs. The device can be formed by a computer or by a circuit and memory arrangement specially equipped with the units mentioned.
Example
The example is based on a sample of 24 subjects (P1 to P24) containing 18 different genotypes. The corresponding data set, which contains the genotypes of the subjects, is shown below:
P1  1 0 0 2 2 2 0 1 0 1 1 2 1 2  Genotype 1
P2  1 1 0 2 2 2 0 1 0 1 0 2 1 1  Genotype 2
P3  1 1 -1 2 2 2 0 0 0 0 0 0 0 1  Genotype 3
P4  0 2 0 2 2 2 0 0 0 0 0 0 0 1  Genotype 4
P5  2 0 1 2 2 2 0 1 0 1 0 1 1 1  Genotype 5
P6  1 1 0 2 2 2 0 0 0 0 -1 0 0 1  Genotype 6
P7  0 2 0 1 2 2 0 1 1 0 0 0 0 1  Genotype 7
P8  1 1 0 2 2 2 0 1 0 1 0 2 1 1  Genotype 2
P9  2 0 1 2 -1 2 0 1 0 1 0 1 1 1  Genotype 8
P10  1 1 1 2 2 2 0 0 0 0 0 0 1 2  Genotype 9
P11  1 1 1 1 1 2 0 1 0 1 0 1 1 2  Genotype 10
P12  0 2 0 1 2 2 0 1 1 0 0 0 0 1  Genotype 7
P13  1 1 0 2 2 2 0 -1 0 1 0 1 1 2  Genotype 11
P14  0 2 0 1 2 2 0 1 1 0 0 0 0 1  Genotype 7
P15  0 2 0 2 2 2 0 0 0 0 0 0 0 1  Genotype 4
P16  1 1 0 -1 2 2 0 1 0 1 0 1 1 1  Genotype 12
P17  2 0 1 2 2 2 0 1 0 1 0 1 1 -1  Genotype 13
P18  0 2 0 0 1 0 2 2 2 0 0 0 0 0  Genotype 14
P19  1 1 0 2 2 2 0 1 0 1 0 2 1 1  Genotype 2
P20  0 2 0 2 2 2 0 0 0 0 0 0 0 1  Genotype 4
P21  1 1 0 1 2 1 1 2 1 1 0 2 0 0  Genotype 15
P22  1 1 0 1 2 1 1 2 1 0 0 2 1 1  Genotype 16
P23  1 1 0 1 -1 -1 1 2 1 0 0 2 1 1  Genotype 17
P24  1 1 0 2 2 2 0 1 0 0 1 0 0 1  Genotype 18
The result of the calculation, which is based on the method described above, is as follows:
Genotype 1: 1 0 0 2 2 2 0 1 0 1 1 2 1 2
0 0 0 1 1 1 0 0 0 1 1 1 1 1   1 0 0 1 1 1 0 1 0 0 0 1 0 1   1.00
0 0 0 1 1 1 0 0 0 1 1 1 0 1   1 0 0 1 1 1 0 1 0 0 0 1 1 1   1.00
0 0 0 1 1 1 0 0 0 0 1 1 0 1   1 0 0 1 1 1 0 1 0 1 0 1 1 1   0.96
Genotype 2: 1 1 0 2 2 2 0 1 0 1 0 2 1 1
0 1 0 1 1 1 0 1 0 0 0 1 0 0   1 0 0 1 1 1 0 0 0 1 0 1 1 1   1.00
0 1 0 1 1 1 0 1 0 1 0 1 0 0   1 0 0 1 1 1 0 0 0 0 0 1 1 1   0.91
0 1 0 1 1 1 0 1 0 0 0 1 1 0   1 0 0 1 1 1 0 0 0 1 0 1 0 1   0.88
Genotype 3: 1 1 -1 2 2 2 0 0 0 0 0 0 0 1
0 1 0 1 1 1 0 0 0 0 0 0 0 0   1 0 0 1 1 1 0 0 0 0 0 0 0 1   1.00
0 1 0 1 1 1 0 0 0 0 0 0 0 1   1 0 0 1 1 1 0 0 0 0 0 0 0 0   0.74
0 1 0 1 1 1 0 0 0 0 0 0 0 0   1 0 1 1 1 1 0 0 0 0 0 0 0 1   0.59
Genotype 4: 0 2 0 2 2 2 0 0 0 0 0 0 0 1
0 1 0 1 1 1 0 0 0 0 0 0 0 0   0 1 0 1 1 1 0 0 0 0 0 0 0 1   1.00
Genotype 5: 2 0 1 2 2 2 0 1 0 1 0 1 1 1
1 0 0 1 1 1 0 1 0 1 0 1 1 0   1 0 1 1 1 1 0 0 0 0 0 0 0 1   1.00
1 0 0 1 1 1 0 1 0 1 0 1 1 1   1 0 1 1 1 1 0 0 0 0 0 0 0 0   0.94
1 0 0 1 1 1 0 0 0 0 0 0 0 1   1 0 1 1 1 1 0 1 0 1 0 1 1 0   0.73
Genotype 6: 1 1 0 2 2 2 0 0 0 0 -1 0 0 1
0 1 0 1 1 1 0 0 0 0 0 0 0 0   1 0 0 1 1 1 0 0 0 0 0 0 0 1   1.00
0 1 0 1 1 1 0 0 0 0 0 0 0 1   1 0 0 1 1 1 0 0 0 0 0 0 0 0   0.74
0 1 0 1 1 1 0 0 0 0 0 0 0 1   1 0 0 1 1 1 0 0 0 0 1 0 0 0   0.35
Genotype 7: 0 2 0 1 2 2 0 1 1 0 0 0 0 1
0 1 0 0 1 1 0 1 1 0 0 0 0 0   0 1 0 1 1 1 0 0 0 0 0 0 0 1   1.00
0 1 0 0 1 1 0 1 1 0 0 0 0 1   0 1 0 1 1 1 0 0 0 0 0 0 0 0   0.96
0 1 0 0 1 1 0 0 1 0 0 0 0 0   0 1 0 1 1 1 0 1 0 0 0 0 0 1   0.12
Genotype 8: 2 0 1 2 -1 2 0 1 0 1 0 1 1 1
1 0 0 1 1 1 0 1 0 1 0 1 1 0   1 0 1 1 1 1 0 0 0 0 0 0 0 1   1.00
1 0 0 1 1 1 0 1 0 1 0 1 1 1   1 0 1 1 1 1 0 0 0 0 0 0 0 0   0.94
1 0 0 1 1 1 0 0 0 0 0 0 0 1   1 0 1 1 1 1 0 1 0 1 0 1 1 0   0.73
Genotype 9: 1 1 1 2 2 2 0 0 0 0 0 0 1 2
0 1 0 1 1 1 0 0 0 0 0 0 0 1   1 0 1 1 1 1 0 0 0 0 0 0 1 1   1.00
0 1 1 1 1 1 0 0 0 0 0 0 1 1   1 0 0 1 1 1 0 0 0 0 0 0 0 1   0.09
0 1 1 1 1 1 0 0 0 0 0 0 0 1   1 0 0 1 1 1 0 0 0 0 0 0 1 1   0.08
Genotype 10: 1 1 1 1 1 2 0 1 0 1 0 1 1 2
0 1 0 1 1 1 0 0 0 0 0 0 0 1   1 0 1 0 0 1 0 1 0 1 0 1 1 1   1.00
0 1 0 0 1 1 0 0 0 0 0 0 0 1   1 0 1 1 0 1 0 1 0 1 0 1 1 1   0.50
0 1 0 0 1 1 0 1 0 0 0 0 0 1   1 0 1 1 0 1 0 0 0 1 0 1 1 1   0.50
Genotype 11: 1 1 0 2 2 2 0 -1 0 1 0 1 1 2
0 1 0 1 1 1 0 0 0 0 0 0 0 1   1 0 0 1 1 1 0 1 0 1 0 1 1 1   1.00
0 1 0 1 1 1 0 0 0 0 0 0 0 1   1 0 0 1 1 1 0 0 0 1 0 1 1 1   1.00
0 1 0 1 1 1 0 1 0 0 0 0 0 1   1 0 0 1 1 1 0 0 0 1 0 1 1 1   0.12
Genotype 12: 1 1 0 -1 2 2 0 1 0 1 0 1 1 1
0 1 0 1 1 1 0 0 0 0 0 0 0 0   1 0 0 1 1 1 0 1 0 1 0 1 1 1   1.00
0 1 0 1 1 1 0 0 0 0 0 0 0 1   1 0 0 1 1 1 0 1 0 1 0 1 1 0   0.68
0 1 0 1 1 1 0 1 0 0 0 0 0 0   1 0 0 1 1 1 0 0 0 1 0 1 1 1   0.12
Genotype 13: 2 0 1 2 2 2 0 1 0 1 0 1 1 -1
1 0 0 1 1 1 0 1 0 1 0 1 1 1   1 0 1 1 1 1 0 0 0 0 0 0 0 1   1.00
1 0 0 1 1 1 0 1 0 1 0 1 1 0   1 0 1 1 1 1 0 0 0 0 0 0 0 1   0.65
1 0 0 1 1 1 0 1 0 1 0 1 0 1   1 0 1 1 1 1 0 0 0 0 0 0 1 1   0.62
Genotype 14: 0 2 0 0 1 0 2 2 2 0 0 0 0 0
0 1 0 0 0 0 1 1 1 0 0 0 0 0   0 1 0 0 1 0 1 1 1 0 0 0 0 0   1.00
Genotype 15: 1 1 0 1 2 1 1 2 1 1 0 2 0 0
0 1 0 0 1 0 1 1 1 0 0 1 0 0   1 0 0 1 1 1 0 1 0 1 0 1 0 0   1.00
0 1 0 0 1 0 1 1 1 1 0 1 0 0   1 0 0 1 1 1 0 1 0 0 0 1 0 0   0.61
0 0 0 0 1 0 1 1 1 0 0 1 0 0   1 1 0 1 1 1 0 1 0 1 0 1 0 0   0.60
Genotype 16: 1 1 0 1 2 1 1 2 1 0 0 2 1 1
0 1 0 0 1 0 1 1 1 0 0 1 0 0   1 0 0 1 1 1 0 1 0 0 0 1 1 1   1.00
0 0 0 1 1 1 0 1 0 0 0 1 1 1   1 1 0 0 1 0 1 1 1 0 0 1 0 0   0.73
0 0 0 0 1 0 1 1 1 0 0 1 0 0   1 1 0 1 1 1 0 1 0 0 0 1 1 1   0.51
Genotype 17: 1 1 0 1 -1 -1 1 2 1 0 0 2 1 1
0 1 0 0 1 1 1 1 1 0 0 1 0 0   1 0 0 1 1 1 0 1 0 0 0 1 1 1   1.00
0 1 0 0 1 0 1 1 1 0 0 1 0 0   1 0 0 1 1 1 0 1 0 0 0 1 1 1   1.00
0 0 0 1 1 1 0 1 0 0 0 1 1 1   1 1 0 0 1 0 1 1 1 0 0 1 0 0   0.73
Genotype 18: 1 1 0 2 2 2 0 1 0 0 1 0 0 1
0 1 0 1 1 1 0 0 0 0 0 0 0 1   1 0 0 1 1 1 0 1 0 0 1 0 0 0   1.00
0 1 0 1 1 1 0 0 0 0 0 0 0 0   1 0 0 1 1 1 0 1 0 0 1 0 0 1   0.96
0 1 0 1 1 1 0 1 0 0 0 0 0 1   1 0 0 1 1 1 0 0 0 0 1 0 0 0   0.37
The compatible haplotype pairs were first sorted by the size of the probabilities p(x,y). At most the 3 most probable pairs per genotype are listed here. The last column shows these probabilities relative to the respective maximum value.
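The ranking used for the result listing (sort by p(x,y), keep at most three pairs, express each value relative to the maximum) can be sketched as follows. The scores are made-up numbers and `top_pairs` is a hypothetical name; rounding to two decimals is an assumption based on the precision of the last column.

```python
def top_pairs(scores, k=3):
    """Return up to k best pairs with their scores relative to the maximum."""
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    if not ranked or ranked[0][1] == 0:
        return []
    best = ranked[0][1]
    return [(pair, round(p / best, 2)) for pair, p in ranked[:k]]


# Made-up scores for the two compatible pairs of one genotype.
scores = {
    ((0, 0), (1, 1)): 0.02,
    ((0, 1), (1, 0)): 0.04,
}
for pair, rel in top_pairs(scores):
    print(pair, rel)
# ((0, 1), (1, 0)) 1.0
# ((0, 0), (1, 1)) 0.5
```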
Claims
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| AU2002218225A AU2002218225A1 (en) | 2000-10-11 | 2001-10-10 | Method and device for predicting haplotypes |
Applications Claiming Priority (2)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| DE10050361A DE10050361A1 (en) | 2000-10-11 | 2000-10-11 | Statistical processing of gene sequences to determine possible haplotypes and their probability, useful e.g. for identifying genetic origins of complex diseases |
| DE10050361.6 | 2000-10-11 |
Publications (2)
| Publication Number | Publication Date |
|---|---|
| WO2002031188A2 (en) | 2002-04-18 |
| WO2002031188A3 (en) | 2003-10-09 |
Family
ID=7659419
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| PCT/EP2001/011726 WO2002031188A2 (en) | 2000-10-11 | 2001-10-10 | Method and device for predicting haplotypes |
Country Status (3)
| Country | Link |
|---|---|
| AU (1) | AU2002218225A1 (en) |
| DE (1) | DE10050361A1 (en) |
| WO (1) | WO2002031188A2 (en) |
Cited By (1)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US7107155B2 (en) | 2001-12-03 | 2006-09-12 | Dnaprint Genomics, Inc. | Methods for the identification of genetic features for complex genetics classifiers |
Family Cites Families (3)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| AU3438699A (en) * | 1998-04-21 | 1999-11-08 | Genset | Biallelic markers for use in constructing a high density disequilibrium map of the human genome |
| US20020077775A1 (en) * | 2000-05-25 | 2002-06-20 | Schork Nicholas J. | Methods of DNA marker-based genetic analysis using estimated haplotype frequencies and uses thereof |
| GB0021667D0 (en) * | 2000-09-04 | 2000-10-18 | Glaxo Group Ltd | Genetic study |
- 2000
  - 2000-10-11 DE DE10050361A patent/DE10050361A1/en not_active Withdrawn
- 2001
  - 2001-10-10 AU AU2002218225A patent/AU2002218225A1/en not_active Abandoned
  - 2001-10-10 WO PCT/EP2001/011726 patent/WO2002031188A2/en active Application Filing
Also Published As
| Publication number | Publication date |
|---|---|
| AU2002218225A1 (en) | 2002-04-22 |
| DE10050361A1 (en) | 2002-04-18 |
| WO2002031188A3 (en) | 2003-10-09 |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| AK | Designated states | Kind code of ref document: A2 Designated state(s): AE AG AL AM AT AU AZ BA BB BG BR BY BZ CA CH CN CO CR CU CZ DK DM DZ EC EE ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MA MD MG MK MN MW MX MZ NO NZ PH PL PT RO RU SD SE SG SI SK SL TJ TM TR TT TZ UA UG US UZ VN YU ZA ZW | |
| AL | Designated countries for regional patents | Kind code of ref document: A2 Designated state(s): GH GM KE LS MW MZ SD SL SZ TZ UG ZW AM AZ BY KG KZ MD RU TJ TM AT BE CH CY DE DK ES FI FR GB GR IE IT LU MC NL PT SE TR BF BJ CF CG CI CM GA GN GQ GW ML MR NE SN TD TG | |
| 121 | Ep: the epo has been informed by wipo that ep was designated in this application | | |
| 32PN | Ep: public notification in the ep bulletin as address of the adressee cannot be established | Free format text: COMMUNICATION PURSUANT TO RULE 69 EPC (EPO FORM 1205A OF 110803) | |
| 122 | Ep: pct application non-entry in european phase | | |
| NENP | Non-entry into the national phase | Ref country code: JP | |