WO2002031188A2

WO2002031188A2 - Method and device for predicting haplotypes

Info

Publication number: WO2002031188A2
Application number: PCT/EP2001/011726
Authority: WO
Inventors: Willi Schmidt
Original assignee: Genprofile Ag
Priority date: 2000-10-11
Filing date: 2001-10-10
Publication date: 2002-04-18
Also published as: AU2002218225A1; DE10050361A1; WO2002031188A3

Abstract

The invention relates to a method for processing gene sequences of diploid organisms. For each genotype g observed in a predetermined group of subjects, the set of possible haplotype pairs (x, y) and the probability of their occurrence are predicted by the following steps: determining the haplotypes x, from the totality of haplotypes, that are compatible with the genotype g; determining the pairs (x, y) of g-compatible haplotypes from which the genotype g can be composed; determining the probabilities h(g, x) of the haplotype pairs by a statistical estimation procedure; and outputting, storing and/or displaying at least one haplotype pair that has a predetermined minimum probability.

Description

Method and device for haplotype prediction

The invention relates to a method for processing gene sequence data, in particular for predicting or assigning at least one haplotype pair to each genotype of a given set. The method is applicable to any set of genotypes determined from diploid organisms. The invention also relates to devices for carrying out the method and to applications of the method.

The sequencing of the human genome, that is, the determination of the order of the molecular building blocks (nucleotides) that make up DNA as the carrier of genetic information, is essentially complete. In the human genome, as in others, the entire genetic information is present in duplicate. Each individual has two copies of each gene, one inherited from the mother and one from the father. The copies need not be identical; they can differ at some positions in the gene sequence. The different forms of a gene are usually called alleles, and the individual pair of alleles of a given gene is called the genotype. A single allele, or the entirety of the alleles of several genes that an individual has inherited from one particular parent, is called a haplotype.

It is of interest to determine, for the genotype of a subject (e.g. a diseased or a healthy organism), the haplotype pair of which that genotype is composed. In practice, the aim is to predict all haplotype pairs of which the genotype could be composed with at least a specifiable minimum probability. Generally known biotechnological methods make it possible to determine the haplotype pair belonging to a given genotype. This individual analysis is disadvantageous, however, because of the high labor and cost involved.

Until now, population genetics has been concerned with estimating haplotype frequencies in a population. Various methods have become known (see: Terwilliger, J., Ott, J.: Handbook for Human Genetic Linkage. Johns Hopkins University Press, Baltimore, 1994; Hawley, M.E., Kidd, K.K.: J. Hered. 86: 409-411, 1995; Excoffier, L., Slatkin, M.: Mol. Biol. Evol. 12: 921-927, 1995) for computing haplotype frequencies from a representative sample of genotypes from a population. The only method known so far for predicting individual haplotypes from genotypes was developed by A. G. Clark (Mol. Biol. Evol. 7: 111-122, 1990). This method has two major disadvantages, however: it is purely empirical and not based on a well-founded statistical concept, and it is applicable only under certain restrictive conditions. The most widely used estimation methods are all based on the assumption of Hardy-Weinberg equilibrium (HWE). The HWE states that, under certain conditions (in particular random union of alleles within the population, absence of mutations, and an infinite, closed population), the frequencies of the genotypes present in the population are already determined by the frequencies of the haplotypes and remain constant after one generation.

Knowledge of the haplotype pair of each individual is of particular interest for research into the genetic causes of complex diseases. If a sample of genotypes is obtained from a population by means of a selection principle, the associated haplotype pairs cannot be estimated on the basis of the HWE, because the conditions for the validity of the HWE are not met. Moreover, the number of candidate haplotypes can be very large and lead to unacceptable computing times.

The problem described, that of predicting or assigning haplotypes to the genotypes of a selected set of organisms, is not confined to human genetics. Haplotype estimation is of general importance in all areas of biology, medicine and agriculture in which diploid organisms are considered.

The object of the invention is to provide methods for processing gene sequences of diploid organisms with which haplotype pairs can be determined, within practicable computing times, for the individual genotypes in a set of genotypes selected from a population. It is also an object of the invention to provide devices for implementing the methods, and new applications.

This object is achieved by methods, computer program products and devices having the features of claims 1, 3 and 4, respectively. Advantageous embodiments and applications of the invention follow from the dependent claims.

The basic idea of the invention is to determine the possible haplotype pairs and their probabilities for each genotype of a sample, obtained from a population by any selection principle, using methods from combinatorics and probability theory. The genotypes are expediently encoded as sequences of numbers whose elements each indicate the presence or absence of a mutation at the corresponding position of the gene sequence. The elements of the sequences can be, for example, the numbers -1, 0, 1 and 2 for the genotypes and the numbers 0 and 1 for the haplotypes. From the large number of theoretically possible haplotypes, the compatible haplotypes, i.e. those that could contribute to the genotype under consideration, are selected first.

Subsequently, an estimation method similar to the maximum likelihood principle is used to determine, from the compatible haplotype pairs, a most probable haplotype pair and, where applicable, further haplotype pairs, which are stored or output.

The method according to the invention has the advantage that the candidate haplotypes are regarded as a priori unknown parameters and that their estimates are made on the basis of a statistical concept.

A further important advantage is that the computational effort is considerably reduced, so that even large data sets can be processed within practicable computing times.

The estimation method introduced by the invention for estimating the haplotype frequencies, and hence the most probable haplotype pairs for the individual genotypes, provides a well-founded estimate for the specific sample. If the HWE holds and the sample is sufficiently large, these estimates agree with those obtained by the usual methods for a population that satisfies the HWE.

Further advantages and details of the invention are explained below with reference to the individual steps of the method according to the invention (in particular the mathematical description of the estimation method), a device according to the invention, and an example.

Haplotype prediction method

The method according to the invention for processing gene sequences of diploid organisms, for haplotype prediction for individual genotypes from a sample of subjects, comprises the steps explained below.

Step 1: Data provision

A sample of subjects within a population is considered. First, the data to be processed are provided by converting the gene sequences of the genotypes under consideration, previously determined from the subjects, into sequences of numbers, and by calculating the genotype frequencies f(g).

1.a) Representation of the genotypes and haplotypes as sequences of numbers

The genotypes are represented as sequences of symbols. It has proven advantageous to use the numbers -1, 0, 1 and 2. At the respective positions of the corresponding gene sequences, these have the following meaning:

-1 ↔ variant not specified

0 ↔ homozygous, not mutated (both genes identical to a "standard")

1 ↔ heterozygous, one gene mutated

2 ↔ homozygous, both genes mutated

The standard used for comparison is generally the first sequence of the gene published in a public database.

According to a further practical aspect of the invention, the haplotypes are represented as sequences that consist of the numbers 1 and 0 and have the same length as the genotypes under consideration. The haplotype sequences indicate where a mutation is present (1) and where no mutation is present (0).

The data of all theoretically possible haplotypes are provided as 1/0 sequences of the length of the genotypes under consideration. The following relationship holds between the number sequences of the genotypes and the haplotypes:

Genotype   Haplotypes
-1         both = 0; both = 1; or one = 0, the other = 1 (order not fixed)
0          both = 0
1          one = 0, the other = 1 (order not fixed)
2          both = 1

For the number sequences of the genotypes and haplotypes, the following arithmetic is introduced, with x and y denoting the haplotypes and g the genotype. The haplotype pair (x, y) corresponding to a genotype g satisfies the equation:

g = x + y (1)

Here the addition is carried out position by position, and equality refers only to the non-negative positions of g. At the "-1" positions of g (if any), the result of the addition is arbitrary, i.e. it is always counted as equality.

For a given genotype g, equation (1) always has at least one solution. Haplotypes x or y for which equation (1) can be satisfied at all are called compatible haplotypes. The compatible haplotypes thus form a set of theoretically possible haplotypes that can contribute to the genotype g. Haplotype pairs (x, y) that satisfy equation (1) are called haplotype pairs compatible with g.

If g contains no 1 at a particular position, the solution of equation (1) for a given x is even uniquely determined. The heterozygous (mutated) positions and the not uniquely determined positions in the genotype therefore determine the complexity of the estimation problem.

The following property of the haplotypes is important for implementing the estimation principle explained below. If a haplotype x is compatible with g, the second haplotype y is uniquely determined by equation (1). Two haplotype pairs compatible with g therefore either have no haplotype in common or are identical.
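The position-wise arithmetic of equation (1) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `pair_is_compatible` and the constant `UNSPECIFIED` are hypothetical and do not appear in the source. The data are genotype 1 of the example and the first haplotype pair listed for it.

```python
from typing import Sequence

UNSPECIFIED = -1  # genotype code for "variant not specified" (hypothetical name)


def pair_is_compatible(g: Sequence[int], x: Sequence[int], y: Sequence[int]) -> bool:
    """Return True if the haplotype pair (x, y) satisfies g = x + y, equation (1)."""
    if not (len(g) == len(x) == len(y)):
        return False
    for gi, xi, yi in zip(g, x, y):
        if xi not in (0, 1) or yi not in (0, 1):
            return False  # haplotypes consist of 0 and 1 only
        if gi == UNSPECIFIED:
            continue  # at a -1 position any sum counts as equality
        if xi + yi != gi:
            return False
    return True


# Genotype 1 of the example and the first haplotype pair listed for it.
g1 = [1, 0, 0, 2, 2, 2, 0, 1, 0, 1, 1, 2, 1, 2]
x1 = [0, 0, 0, 1, 1, 1, 0, 0, 0, 1, 1, 1, 1, 1]
y1 = [1, 0, 0, 1, 1, 1, 0, 1, 0, 0, 0, 1, 0, 1]
print(pair_is_compatible(g1, x1, y1))  # True
```

The check is symmetric in x and y, which matches the statement that the order of the two haplotypes is not fixed.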

1.b) Calculation of the genotype frequencies f(g)

The set of distinct genotypes found in the sample of subjects is denoted by G. Each genotype g can occur several times in the group of subjects; its relative frequency is denoted by f(g).
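As a minimal sketch, the relative frequencies f(g) can be obtained by counting identical genotype sequences. The function name `genotype_frequencies` and the toy sample are assumptions made for illustration; exact fractions are used only to keep the arithmetic transparent.

```python
from collections import Counter
from fractions import Fraction


def genotype_frequencies(sample):
    """Map each distinct genotype (as a tuple) to its relative frequency f(g)."""
    counts = Counter(tuple(g) for g in sample)
    n = len(sample)
    return {g: Fraction(c, n) for g, c in counts.items()}


# Toy sample of four subjects with genotypes of length 3 (illustrative data).
toy_sample = [[0, 1, 2], [0, 1, 2], [1, 1, 0], [2, -1, 0]]
f = genotype_frequencies(toy_sample)
print(f[(0, 1, 2)])  # 1/2
```

In the example data set of this document, genotype 2 is found in P2, P8 and P19, so the same counting gives f(g) = 3/24 = 0.125 for it.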

Step 2: Determination of the compatible haplotypes

All of the previously provided theoretical haplotypes are tested as to whether they are compatible with one or more of the genotypes of the group of subjects under consideration. Every haplotype that is compatible with at least one of the genotypes is stored. By construction, the resulting subset of haplotypes is irreducible, i.e. it contains no haplotype more than once. This selection reduces the amount of data to be processed. For each genotype g ∈ G, the set H(g) of haplotypes x compatible with g is formed. H(g) thus comprises all haplotypes x for which there is at least one second haplotype y such that equation (1) is satisfied. The number of these haplotypes is denoted by N(g). To estimate the haplotypes, an estimation principle alternative to the HWE is introduced (see below).
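One possible way to form H(g) is sketched below. Note that it is a shortcut and not the procedure described in the text: instead of testing every theoretical haplotype, it builds H(g) directly from the values allowed at each position (0 where g is 0, 1 where g is 2, either value where g is 1 or -1). The names `ALLOWED` and `compatible_haplotypes` are hypothetical.

```python
from itertools import product

# Values a compatible haplotype may take at a position, per genotype code.
ALLOWED = {-1: (0, 1), 0: (0,), 1: (0, 1), 2: (1,)}


def compatible_haplotypes(g):
    """Return H(g), the list of haplotypes compatible with genotype g."""
    return [tuple(h) for h in product(*(ALLOWED[gi] for gi in g))]


# Genotype 1 of the example has five heterozygous positions and no -1.
g1 = [1, 0, 0, 2, 2, 2, 0, 1, 0, 1, 1, 2, 1, 2]
H1 = compatible_haplotypes(g1)
print(len(H1))  # N(g) = 32
```

Under this construction N(g) doubles with every heterozygous or unspecified position, which illustrates why those positions drive the computational effort.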

For a given genotype g and a haplotype x, the a priori probability h(g, x) that x is a haplotype of g (i.e. that x is compatible with g) is calculated according to a chosen distribution principle:

h(g, x) = 1/N(g) for all x in H(g), and h(g, x) = 0 in all other cases.

In the application example, the uniform distribution is thus used as the distribution principle. Alternatively, other distributions could be used, for example ones that assign different weights to the genotypes on the basis of a priori information.
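The uniform a priori probability h(g, x) can be sketched as follows. The function name `prior_h` is hypothetical, and the sketch assumes that x has the same length as g and contains only 0 and 1. It uses the observation that, with the encoding above, N(g) equals 2 raised to the number of positions of g that are 1 or -1.

```python
from fractions import Fraction


def prior_h(g, x):
    """Uniform a priori probability h(g, x): 1/N(g) if x is in H(g), else 0."""
    free_positions = 0
    for gi, xi in zip(g, x):
        if gi in (1, -1):
            free_positions += 1  # x may be 0 or 1 here
        elif 2 * xi != gi:
            return Fraction(0)  # g = 0 requires x = 0, g = 2 requires x = 1
    return Fraction(1, 2 ** free_positions)


print(prior_h([1, 0, 2], [0, 0, 1]))  # 1/2
print(prior_h([1, 0, 2], [0, 1, 1]))  # 0
```

A non-uniform distribution, as mentioned in the text, would replace only the final return value by a weight that depends on x.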

Step 3: Determination of the compatible haplotype pairs

For each genotype g, those haplotype pairs that are compatible with g are selected and stored.

Step 4: Haplotype prediction

4.a) Estimation method

For each compatible pair (x, y) of haplotypes, the probability values P(x), P(y) and p(x, y) are calculated as follows.

The probability of finding the haplotype x in a subject selected at random from the sample (without knowledge of the subject's genotype) is calculated by the following averaging over the sample:

$$P(x) = \sum_{g \in G} f(g) \cdot h(g, x)$$

P(y) is calculated analogously with h(g, y).
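The averaging over the sample can be sketched as below, assuming the uniform h(g, x) = 1/N(g) of the application example. The function name `haplotype_probabilities` and the toy sample are hypothetical. Because h(g, ·) sums to 1 over H(g) and f sums to 1 over G, the resulting values P(x) sum to 1.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Values a compatible haplotype may take at a position, per genotype code.
ALLOWED = {-1: (0, 1), 0: (0,), 1: (0, 1), 2: (1,)}


def haplotype_probabilities(sample):
    """Return {x: P(x)} with P(x) = sum over g in G of f(g) * h(g, x)."""
    counts = Counter(tuple(g) for g in sample)
    n = len(sample)
    P = {}
    for g, c in counts.items():
        H = list(product(*(ALLOWED[gi] for gi in g)))  # H(g)
        weight = Fraction(c, n) / len(H)  # f(g) * 1/N(g)
        for x in H:
            P[x] = P.get(x, Fraction(0)) + weight
    return P


# Toy sample of four subjects with genotypes of length 2 (illustrative data).
toy_sample = [(0, 1), (0, 1), (2, 1), (0, 0)]
P = haplotype_probabilities(toy_sample)
print(P[(0, 0)])  # 1/2
```

Haplotypes that are compatible with no genotype of the sample never enter the dictionary, which corresponds to P(x) = 0.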

Next, the probabilities p(x, y) that a subject with genotype g has the haplotypes x and y are calculated. The probability p(x, y) rates the significance of this haplotype pair (x, y) for the genotype g. The probabilities p(x, y) can be calculated under various model assumptions. For example, p can be calculated according to

p(x, y) = P(x) · P(y).

For the application of the method according to the invention, normalization in the calculation of p(x, y) is not strictly necessary; the sum of the p values for a given genotype can therefore deviate from 1.

4.b) Prediction

For each genotype, the haplotype pairs (x, y) that exceed a specified minimum probability are output, displayed or stored.
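The prediction step can be sketched as follows, using the product model p(x, y) = P(x) · P(y) without normalization. All names (`PAIR_CHOICES`, `predict_pairs`, `min_prob`) and the toy values of P are assumptions made for illustration. Treating (x, y) and (y, x) as the same pair and using `>=` for the threshold are choices of this sketch, not details confirmed by the source.

```python
from itertools import product

# Allowed (x_i, y_i) combinations at a position, per genotype code.
PAIR_CHOICES = {
    0: [(0, 0)],
    1: [(0, 1), (1, 0)],
    2: [(1, 1)],
    -1: [(0, 0), (0, 1), (1, 0), (1, 1)],
}


def predict_pairs(g, P, min_prob):
    """Return the unordered pairs (x, y) compatible with g whose
    p(x, y) = P(x) * P(y) is at least min_prob, most probable first."""
    seen = set()
    results = []
    for combo in product(*(PAIR_CHOICES[gi] for gi in g)):
        x = tuple(c[0] for c in combo)
        y = tuple(c[1] for c in combo)
        key = tuple(sorted((x, y)))  # (x, y) and (y, x) are the same pair
        if key in seen:
            continue
        seen.add(key)
        p = P.get(x, 0) * P.get(y, 0)  # haplotypes absent from P count as 0
        if p >= min_prob:
            results.append((key[0], key[1], p))
    results.sort(key=lambda r: r[2], reverse=True)
    return results


# Toy haplotype probabilities (illustrative values, not from the source).
P = {(0, 0): 0.5, (0, 1): 0.25, (1, 0): 0.125, (1, 1): 0.125}
print(predict_pairs((1, 1), P, 0.05))
```

With these toy values the doubly heterozygous genotype (1, 1) has two candidate pairs, and only the pair ((0, 0), (1, 1)) with p = 0.0625 passes the threshold of 0.05.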

Device for haplotype prediction

Depending on the application, the method according to the invention for processing gene sequences of diploid organisms can be implemented in a wide variety of forms, e.g. by means of a computer program or a device for processing gene sequences. The device for processing gene sequences comprises, in particular, a data conversion unit for providing the above-mentioned genotype and haplotype data sequences for a group of genotypes and haplotypes, a selection unit for determining the compatible haplotypes and the compatible haplotype pairs, a computing unit for carrying out the estimation method, and an output unit for storing, outputting and/or displaying the estimated haplotype pairs. The device can be formed by a computer or by a circuit and memory arrangement specially equipped with the units mentioned.

Example

The example is based on a sample of 24 subjects (P1 to P24) containing 18 different genotypes. The corresponding data set with the genotypes of the subjects is shown below:

P1 1 0 0 2 2 2 0 1 0 1 1 2 1 2 genotype 1

P2 1 1 0 2 2 2 0 1 0 1 0 2 1 1 genotype 2

P3 1 1 -1 2 2 2 0 0 0 0 0 0 0 1 genotype 3

P4 0 2 0 2 2 2 0 0 0 0 0 0 0 1 genotype 4

P5 2 0 1 2 2 2 0 1 0 1 0 1 1 1 genotype 5

P6 1 1 0 2 2 2 0 0 0 0 -1 0 0 1 genotype 6

P7 0 2 0 1 2 2 0 1 1 0 0 0 0 1 genotype 7

P8 1 1 0 2 2 2 0 1 0 1 0 2 1 1 genotype 2

P9 2 0 1 2 -1 2 0 1 0 1 0 1 1 1 genotype 8

P10 1 1 1 2 2 2 0 0 0 0 0 0 1 2 genotype 9

P11 1 1 1 1 1 2 0 1 0 1 0 1 1 2 genotype 10

P12 0 2 0 1 2 2 0 1 1 0 0 0 0 1 genotype 7

P13 1 1 0 2 2 2 0 -1 0 1 0 1 1 2 genotype 11

P14 0 2 0 1 2 2 0 1 1 0 0 0 0 1 genotype 7

P15 0 2 0 2 2 2 0 0 0 0 0 0 0 1 genotype 4

P16 1 1 0 -1 2 2 0 1 0 1 0 1 1 1 genotype 12

P17 2 0 1 2 2 2 0 1 0 1 0 1 1 -1 genotype 13

P18 0 2 0 0 1 0 2 2 2 0 0 0 0 0 genotype 14

P19 1 1 0 2 2 2 0 1 0 1 0 2 1 1 genotype 2

P20 0 2 0 2 2 2 0 0 0 0 0 0 0 1 genotype 4

P21 1 1 0 1 2 1 1 2 1 1 0 2 0 0 genotype 15

P22 1 1 0 1 2 1 1 2 1 0 0 2 1 1 genotype 16

P23 1 1 0 1 -1 -1 1 2 1 0 0 2 1 1 genotype 17

P24 1 1 0 2 2 2 0 1 0 0 1 0 0 1 genotype 18
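The relative frequency f(g) of each genotype in this sample can be read off by counting how often each genotype number occurs among the 24 subjects. The following is a minimal sketch of that count; the list `labels` simply copies the genotype numbers of P1 to P24 from the data set above, and the variable names are illustrative rather than part of the described method.

```python
from collections import Counter

# Genotype number of each subject P1..P24, copied from the data set above.
labels = [1, 2, 3, 4, 5, 6, 7, 2, 8, 9, 10, 7,
          11, 7, 4, 12, 13, 14, 2, 4, 15, 16, 17, 18]

counts = Counter(labels)        # absolute frequency of each genotype
n_subjects = len(labels)        # 24 subjects in total

# Relative frequency f(g) of each genotype g in the sample.
freq = {g: c / n_subjects for g, c in counts.items()}

for g in sorted(freq):
    print(f"genotype {g}: {counts[g]} subject(s), f(g) = {freq[g]:.3f}")
```

Genotypes 2, 4 and 7 each occur three times; all other genotypes occur once, which gives the 18 distinct genotypes stated above.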

The result of the calculation, which is based on the method described above, is as follows:

Genotype 1: 1 0 0 2 2 2 0 1 0 1 1 2 1 2

0 0 0 1 1 1 0 0 0 1 1 1 1 1   1 0 0 1 1 1 0 1 0 0 0 1 0 1   1.00

0 0 0 1 1 1 0 0 0 1 1 1 0 1   1 0 0 1 1 1 0 1 0 0 0 1 1 1   1.00

0 0 0 1 1 1 0 0 0 0 1 1 0 1   1 0 0 1 1 1 0 1 0 1 0 1 1 1   0.96

Genotype 2: 1 1 0 2 2 2 0 1 0 1 0 2 1 1

0 1 0 1 1 1 0 1 0 0 0 1 0 0   1 0 0 1 1 1 0 0 0 1 0 1 1 1   1.00

0 1 0 1 1 1 0 1 0 1 0 1 0 0   1 0 0 1 1 1 0 0 0 0 0 1 1 1   0.91

0 1 0 1 1 1 0 1 0 0 0 1 1 0   1 0 0 1 1 1 0 0 0 1 0 1 0 1   0.88

Genotype 3: 1 1 -1 2 2 2 0 0 0 0 0 0 0 1

0 1 0 1 1 1 0 0 0 0 0 0 0 0   1 0 0 1 1 1 0 0 0 0 0 0 0 1   1.00

0 1 0 1 1 1 0 0 0 0 0 0 0 1   1 0 0 1 1 1 0 0 0 0 0 0 0 0   0.74

0 1 0 1 1 1 0 0 0 0 0 0 0 0   1 0 1 1 1 1 0 0 0 0 0 0 0 1   0.59

Genotype 4: 0 2 0 2 2 2 0 0 0 0 0 0 0 1

0 1 0 1 1 1 0 0 0 0 0 0 0 0   0 1 0 1 1 1 0 0 0 0 0 0 0 1   1.00

Genotype 5: 2 0 1 2 2 2 0 1 0 1 0 1 1 1

1 0 0 1 1 1 0 1 0 1 0 1 1 0   1 0 1 1 1 1 0 0 0 0 0 0 0 1   1.00

1 0 0 1 1 1 0 1 0 1 0 1 1 1   1 0 1 1 1 1 0 0 0 0 0 0 0 0   0.94

1 0 0 1 1 1 0 0 0 0 0 0 0 1   1 0 1 1 1 1 0 1 0 1 0 1 1 0   0.73

Genotype 6: 1 1 0 2 2 2 0 0 0 0 -1 0 0 1

0 1 0 1 1 1 0 0 0 0 0 0 0 1   1 0 0 1 1 1 0 0 0 0 0 0 0 0   0.74

0 1 0 1 1 1 0 0 0 0 0 0 0 1   1 0 0 1 1 1 0 0 0 0 1 0 0 0   0.35

Genotype 7: 0 2 0 1 2 2 0 1 1 0 0 0 0 1

0 1 0 0 1 1 0 1 1 0 0 0 0 0   0 1 0 1 1 1 0 0 0 0 0 0 0 1   1.00

0 1 0 0 1 1 0 1 1 0 0 0 0 1   0 1 0 1 1 1 0 0 0 0 0 0 0 0   0.96

0 1 0 0 1 1 0 0 1 0 0 0 0 0   0 1 0 1 1 1 0 1 0 0 0 0 0 1   0.12

Genotype 8: 2 0 1 2 -1 2 0 1 0 1 0 1 1 1

1 0 0 1 1 1 0 1 0 1 0 1 1 0   1 0 1 1 1 1 0 0 0 0 0 0 0 1   1.00

1 0 0 1 1 1 0 1 0 1 0 1 1 1   1 0 1 1 1 1 0 0 0 0 0 0 0 0   0.94

1 0 0 1 1 1 0 0 0 0 0 0 0 1   1 0 1 1 1 1 0 1 0 1 0 1 1 0   0.73

Genotype 9: 1 1 1 2 2 2 0 0 0 0 0 0 1 2

0 1 0 1 1 1 0 0 0 0 0 0 0 1   1 0 1 1 1 1 0 0 0 0 0 0 1 1   1.00

0 1 1 1 1 1 0 0 0 0 0 0 1 1   1 0 0 1 1 1 0 0 0 0 0 0 0 1   0.09

0 1 1 1 1 1 0 0 0 0 0 0 0 1   1 0 0 1 1 1 0 0 0 0 0 0 1 1   0.08

Genotype 10: 1 1 1 1 1 2 0 1 0 1 0 1 1 2

0 1 0 1 1 1 0 0 0 0 0 0 0 1   1 0 1 0 0 1 0 1 0 1 0 1 1 1   1.00

0 1 0 0 1 1 0 0 0 0 0 0 0 1   1 0 1 1 0 1 0 1 0 1 0 1 1 1   0.50

0 1 0 0 1 1 0 1 0 0 0 0 0 1   1 0 1 1 0 1 0 0 0 1 0 1 1 1   0.50

Genotype 11: 1 1 0 2 2 2 0 -1 0 1 0 1 1 2

0 1 0 1 1 1 0 0 0 0 0 0 0 1   1 0 0 1 1 1 0 1 0 1 0 1 1 1   1.00

0 1 0 1 1 1 0 0 0 0 0 0 0 1   1 0 0 1 1 1 0 0 0 1 0 1 1 1   1.00

0 1 0 1 1 1 0 1 0 0 0 0 0 1   1 0 0 1 1 1 0 0 0 1 0 1 1 1   0.12

Genotype 12: 1 1 0 -1 2 2 0 1 0 1 0 1 1 1

0 1 0 1 1 1 0 0 0 0 0 0 0 0   1 0 0 1 1 1 0 1 0 1 0 1 1 1   1.00

0 1 0 1 1 1 0 0 0 0 0 0 0 1   1 0 0 1 1 1 0 1 0 1 0 1 1 0   0.68

0 1 0 1 1 1 0 1 0 0 0 0 0 0   1 0 0 1 1 1 0 0 0 1 0 1 1 1   0.12

Genotype 13: 2 0 1 2 2 2 0 1 0 1 0 1 1 -1

1 0 0 1 1 1 0 1 0 1 0 1 1 1   1 0 1 1 1 1 0 0 0 0 0 0 0 1   1.00

1 0 0 1 1 1 0 1 0 1 0 1 1 0   1 0 1 1 1 1 0 0 0 0 0 0 0 1   0.65

1 0 0 1 1 1 0 1 0 1 0 1 0 1   1 0 1 1 1 1 0 0 0 0 0 0 1 1   0.62

Genotype 14: 0 2 0 0 1 0 2 2 2 0 0 0 0 0

0 1 0 0 0 0 1 1 1 0 0 0 0 0   0 1 0 0 1 0 1 1 1 0 0 0 0 0   1.00

Genotype 15: 1 1 0 1 2 1 1 2 1 1 0 2 0 0

0 1 0 0 1 0 1 1 1 0 0 1 0 0   1 0 0 1 1 1 0 1 0 1 0 1 0 0   1.00

0 1 0 0 1 0 1 1 1 1 0 1 0 0   1 0 0 1 1 1 0 1 0 0 0 1 0 0   0.61

0 0 0 0 1 0 1 1 1 0 0 1 0 0   1 1 0 1 1 1 0 1 0 1 0 1 0 0   0.60

Genotype 16: 1 1 0 1 2 1 1 2 1 0 0 2 1 1

0 1 0 0 1 0 1 1 1 0 0 1 0 0   1 0 0 1 1 1 0 1 0 0 0 1 1 1   1.00

0 0 0 1 1 1 0 1 0 0 0 1 1 1   1 1 0 0 1 0 1 1 1 0 0 1 0 0   0.73

0 0 0 0 1 0 1 1 1 0 0 1 0 0   1 1 0 1 1 1 0 1 0 0 0 1 1 1   0.51

Genotype 17: 1 1 0 1 -1 -1 1 2 1 0 0 2 1 1

0 1 0 0 1 1 1 1 1 0 0 1 0 0   1 0 0 1 1 1 0 1 0 0 0 1 1 1   1.00

Genotype 18: 1 1 0 2 2 2 0 1 0 0 1 0 0 1

0 1 0 1 1 1 0 0 0 0 0 0 0 1   1 0 0 1 1 1 0 1 0 0 1 0 0 0   1.00

0 1 0 1 1 1 0 0 0 0 0 0 0 0   1 0 0 1 1 1 0 1 0 0 1 0 0 1   0.96

0 1 0 1 1 1 0 1 0 0 0 0 0 1   1 0 0 1 1 1 0 0 0 0 1 0 0 0   0.37

The compatible haplotype pairs were first sorted by the size of their probabilities p(x, y). For each genotype, however, at most the three most probable pairs are listed here. The last column gives each of these probabilities relative to the maximum value for the respective genotype.
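The listing convention above (sort by p(x, y), keep at most three pairs, report values relative to the maximum) can be illustrated with a small sketch. It rests on several assumptions that are not stated in this section: the genotype coding is taken to be 0 and 2 for the two homozygous states, 1 for a heterozygous position and -1 for an unobserved position, which is consistent with the pairs listed above; the haplotype probabilities use the uniform rule h(g, x) = 1/N(g) from claim 2 in a single pass, with N(g) read as the number of haplotypes in H(g); and the three-position sample is invented. The function names are hypothetical, and the sketch is not expected to reproduce the values listed above.

```python
from collections import Counter
from itertools import product

# Assumed coding: allele pairs allowed at one position for each genotype value.
LOCUS_OPTIONS = {
    0: [(0, 0)],                             # homozygous for allele 0
    2: [(1, 1)],                             # homozygous for allele 1
    1: [(0, 1), (1, 0)],                     # heterozygous
    -1: [(0, 0), (0, 1), (1, 0), (1, 1)],    # position not observed
}


def compatible_pairs(genotype):
    """Return all unordered haplotype pairs (x, y) compatible with genotype."""
    pairs = set()
    for combo in product(*(LOCUS_OPTIONS[value] for value in genotype)):
        x = tuple(a for a, _ in combo)
        y = tuple(b for _, b in combo)
        pairs.add(tuple(sorted((x, y))))
    return pairs


def haplotype_probabilities(sample):
    """P(x) = sum over g of f(g) * h(g, x), with h(g, x) = 1/N(g) on H(g)."""
    counts = Counter(sample)
    prob = Counter()
    for g, count in counts.items():
        f_g = count / len(sample)                                       # f(g)
        h_set = {hap for pair in compatible_pairs(g) for hap in pair}   # H(g)
        for x in h_set:
            prob[x] += f_g / len(h_set)
    return prob


def ranked_pairs(genotype, prob, top=3):
    """Sort pairs by p(x, y) = P(x) * P(y); report values relative to the maximum."""
    scored = sorted(((prob[x] * prob[y], x, y)
                     for x, y in compatible_pairs(genotype)), reverse=True)
    best = scored[0][0]
    return [(x, y, round(p / best, 2)) for p, x, y in scored[:top]]


# Invented three-position sample, for illustration only.
sample = [(0, 2, 1), (0, 2, 1), (1, 2, 1), (0, 2, 0)]
prob = haplotype_probabilities(sample)
for x, y, rel in ranked_pairs((1, 2, 1), prob):
    print(x, y, rel)
```

The genotype passed to `ranked_pairs` should occur in the sample; otherwise all of its pairs may have probability zero and the relative values are undefined.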

Claims

1. A method for processing gene sequences of diploid organisms, in which, for each genotype g from a predetermined group of subjects, the set of possible haplotype pairs (x, y) and the probability of their occurrence are predicted, with the steps:

- determination of the haplotypes x, from the totality of haplotypes, that are compatible with the genotype g,

- determination of pairs (x, y) of g-compatible haplotypes from which the genotype g can be composed,

- determination of the probabilities h(g, x) of the haplotype pairs by a statistical estimation method, and

- output, storage and/or display of at least one haplotype pair that has a predetermined minimum probability.

2. The method according to claim 1, in which h(g, x) is calculated according to the following distribution principle: $h(g, x) = 1/N(g)$ for all x from H(g) and $h(g, x) = 0$ in all other cases, and in which, to determine the most probable haplotype pair, the product $p(x, y) = P(x) \cdot P(y)$ with $P(x) = \sum_{g \in G} f(g) \cdot h(g, x)$ is calculated for all compatible haplotype pairs, where h(g, x) is an a priori probability that x is a haplotype compatible with the considered genotype g, and f(g) is the relative frequency of g estimated from the sample.

3. A computer program product which is set up for processing gene sequences by a method according to one of the preceding claims.

4. A device for processing gene sequences by a method according to one of the preceding claims, comprising:

- a data conversion device for providing a large number of the genotype and haplotype cell sequences,

- a selection device for determining the compatible haplotypes and the compatible haplotype pairs,

- a computing device for carrying out the estimation method, and

- an output device for storing, outputting and/or displaying the estimated haplotype pairs.