WO2014153757A1 - Method, system, and computer-readable medium for determining base information of a predetermined region in a fetal genome
- Publication number
- WO2014153757A1 (application PCT/CN2013/073375)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- embryo
- sequencing
- genome
- embryonic
- predetermined region
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Ceased
Classifications
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12Q—MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
- C12Q1/00—Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
- C12Q1/68—Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
- C12Q1/6876—Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B20/00—ICT specially adapted for functional genomics or proteomics, e.g. genotype-phenotype associations
- G16B20/20—Allele or variant detection, e.g. single nucleotide polymorphism [SNP] detection
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B20/00—ICT specially adapted for functional genomics or proteomics, e.g. genotype-phenotype associations
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12Q—MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
- C12Q2600/00—Oligonucleotides characterized by their use
- C12Q2600/156—Polymorphic or mutational markers
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B30/00—ICT specially adapted for sequence analysis involving nucleotides or amino acids
Definitions
- The present invention relates to methods, systems, and computer-readable media for determining base information of a predetermined region in an embryonic genome.
Background technique
- Hereditary diseases are diseases caused by changes in genetic material and are characteristically congenital, familial, lifelong, and heritable. They fall into three broad categories: monogenic diseases, polygenic diseases, and chromosomal abnormalities. Monogenic diseases result from abnormal gene function caused by dominant or recessive inheritance of a single disease-causing gene. Polygenic diseases result from changes in multiple genes and are also influenced to some extent by environmental factors. Chromosomal abnormalities include numerical and structural abnormalities; the most common is Down syndrome, caused by trisomy of chromosome 21, in which affected children show congenital features, including limb shape abnormalities.
- The inventors have found that the number of cells that can be biopsied for pre-implantation testing is limited. For example, an embryo cultured to the third day contains only 4-8 blastomeres, of which only 1-2 can be taken for biopsy. Even for an embryo cultured to the fifth day, only 3-10 outer trophoblast cells are taken for biopsy. Because a single cell or a small number of embryonic cells contains very little DNA, comprehensive genetic variation testing is difficult to perform directly, and whole genome amplification (WGA) is required to obtain enough DNA. Whole genome amplification is often biased, however, producing heterozygous deletion or allele dropout (ADO), which poses a risk when detecting disease-associated genetic variation in embryonic cells.
- the present invention aims to solve at least one of the technical problems existing in the prior art.
- Estimating the haplotype of the embryo by the method of the invention allows sites where allele dropout has occurred to be located and corrected.
- the embryo haplotype analysis method can be based on genetic information of family members such as parents and probands, or based on genetic information from other embryonic cells of the same parent.
- the invention proposes a method of determining base information for a predetermined region in an embryonic genome.
- The method combines pedigree inference with a statistical algorithm to determine the haplotype of the embryo: the transmission of chromosomal segments is traced through the genotypes of the relevant individuals in the family, and the statistical algorithm is then used to infer the haplotype.
- The method comprises the steps of: obtaining a sequencing result of an embryonic cell genomic DNA sample and a sequencing result of a genomic sample of an embryo genetically related individual; constructing a genetic sketch of the embryo based on the sequencing result of the embryonic cell genomic DNA sample, to determine an initial genotype of the embryo; determining the haplotypes of the embryo's parents based on the sequencing result of the genomic sample of the embryo genetically related individual; and determining, based on a hidden Markov model with the initial genotype of the embryo as the observation sequence and the haplotypes of the embryo's parents, the base information of a predetermined region in the embryonic genome.
- The genome of the offspring is formed by what is equivalent to a random recombination of the parental genomes (i.e., haplotype recombination through linkage and crossing-over, and random combination of gametes).
- The haplotype of the embryo is taken as the hidden state and the whole-genome sequencing data of a single embryonic cell as the observations. The state transition probabilities are derived from prior data using a Bayesian algorithm, the observation symbol probabilities and the initial state distribution are constructed, and the Viterbi algorithm is then used to infer the most likely combination of embryonic haplotypes.
- In this way, the nucleic acid sequence of a specific region in the embryo genome can be determined, so that pre-implantation detection of the genetic information of the embryo genome can be performed effectively.
- Sites or DNA sequences affected by low coverage, heterozygous deletion, or allele dropout in the sequencing results of whole-genome-amplified embryonic cell DNA can be accurately inferred by this method. The position or sequence of a heterozygous deletion or allele dropout can therefore be corrected, making the detection result more accurate and reliable.
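As a rough illustration of this correction step (a sketch under stated assumptions, not the patented implementation): suppose the decoded hidden state at each site indicates which chromosome of each parent the embryo inherited, with 0 marking the chromosome also passed to the proband and 1 the other one. The embryo's alleles can then be read from the phased parental haplotypes and compared with the genotypes observed after amplification. The function names, the string encoding of haplotypes, and the toy data are all hypothetical.

```python
def inherited_alleles(path, hap_to_proband, hap_other):
    """Read the allele inherited at each site from a phased parent.

    path[i] == 0 selects the chromosome also passed to the proband,
    path[i] == 1 selects the parent's other chromosome (assumed encoding).
    """
    return "".join(
        hap_to_proband[i] if state == 0 else hap_other[i]
        for i, state in enumerate(path)
    )


def correct_genotypes(observed, paternal_alleles, maternal_alleles):
    """Return (inferred_genotype, was_corrected) for every site.

    A site is flagged when the observed call is missing (None, e.g. low
    coverage) or disagrees with the inferred genotype (e.g. allele dropout).
    """
    corrected = []
    for obs, p, m in zip(observed, paternal_alleles, maternal_alleles):
        inferred = "".join(sorted(p + m))
        corrected.append((inferred, obs != inferred))
    return corrected


# Toy data: four sites, hypothetical phased haplotypes and decoded paths.
paternal = inherited_alleles([0, 0, 0, 1], "ACGT", "ATGA")
maternal = inherited_alleles([1, 1, 1, 1], "GCGT", "ACTA")
observed = ["AA", "CC", "GG", None]  # site 3: dropout; site 4: no coverage
result = correct_genotypes(observed, paternal, maternal)
```

In this toy case the third site is observed as homozygous `GG` although the inferred genotype is `GT`, which is the pattern an allele dropout would produce, and the fourth site has no call at all; both are filled in from the inferred haplotypes.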
- the invention provides a system for determining base information for a predetermined region in an embryonic genome.
- The system comprises: a library construction device adapted to separately construct sequencing libraries for an embryonic genomic DNA sample and a genomic DNA sample of an embryo genetically related individual; a sequencing device coupled to the library construction device and adapted to sequence the sequencing libraries to obtain sequencing results for the embryo and the embryo genetically related individual; and an analysis device coupled to the sequencing device and adapted to: construct a genetic sketch of the embryo based on the sequencing result of the embryonic genomic DNA sample, to determine an initial genotype of the embryo; determine the haplotypes of the embryo's parents based on the sequencing result of the genomic sample of the embryo genetically related individual; and determine, based on a hidden Markov model with the initial genotype of the embryo as the observation sequence, the base information of a predetermined region in the embryo genome from the haplotypes of the embryo's parents.
- an embryo biopsy device may be further included, the embryo biopsy device being adapted to perform biopsy on an in vitro cultured embryo to sample embryonic cells.
- With this system, the method for determining base information of a predetermined region in the embryo genome can be implemented effectively: using the hidden Markov model, for example with the Viterbi algorithm, and referring to the genetic information of the embryo genetically related individuals, the nucleic acid sequence of a specific region of the embryo genome can be determined, so that pre-implantation detection of the genetic information of the embryo genome can be performed effectively.
- the invention also provides a computer readable medium.
- A genetic sketch of the embryo is constructed based on the sequencing result of the embryonic cell genomic DNA sample to determine an initial genotype of the embryo; the haplotypes of the embryo's parents are determined based on the sequencing result of the genomic sample of the embryo genetically related individual; and, based on the hidden Markov model with the initial genotype of the embryo as the observation sequence and the haplotypes of the embryo's parents, the base information of a predetermined region in the embryo genome is determined.
- The stored instructions can be executed efficiently by the processor so that, using the hidden Markov model (for example, with the Viterbi algorithm) and based on the sequencing results of the embryonic cells and the genetic information of the embryo genetically related individuals, the nucleic acid sequence of a specific region in the embryo genome can be determined, thereby enabling effective pre-implantation detection of the genetic information of the embryo genome.
- FIG. 1 is a flow chart showing analysis using a hidden Markov model according to an embodiment of the present invention
- FIG. 2 is a schematic structural diagram of a system for determining a predetermined region nucleic acid sequence in an embryo genome according to an embodiment of the present invention.
Detailed description of the invention
- The terms "first" and "second" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or as implicitly indicating the number of technical features referred to. Thus, a feature defined by "first" or "second" may explicitly or implicitly include one or more of that feature. Further, in the description of the present invention, "multiple" means two or more unless otherwise stated.
- the invention proposes a method of determining base information for a predetermined region in an embryonic genome. According to an embodiment of the invention, the method comprises the following steps:
- The term "embryo genetically related individual" refers to an individual that is genetically related to the embryo. Examples that can be used according to an embodiment of the present invention include relatives of the embryo such as its parents, paternal and maternal grandparents, and other children of the embryo's parents.
- The other children of the embryo's parents should be understood in a broad sense. They may be children who have been born, children who are not yet born (fertilized eggs or embryos), dead embryos, or embryos or fertilized eggs cultured in vitro, as long as they share the same parents as the embryo to be tested.
- the source of the embryonic cell genomic DNA sample is not particularly limited.
- Biopsy sampling may be performed on the embryo to obtain embryonic cells, and whole genome amplification (WGA) is performed on the embryonic cell sample to obtain embryonic cell genomic DNA.
- The term "embryo biopsy" refers to the technique of separating some cells from an embryo, or separating part of a cell from a fertilized egg or gamete.
- the embryonic cells may be derived from any one of a blastomere, a trophoblast, a fertilized egg, and a gamete, and may be a single cell or 2 to 10 cells.
- Whole genome amplification can be performed on any sample from a pregnant woman that contains embryonic nucleic acids, to obtain an embryonic genomic DNA sample.
- the embryo's genome can be effectively monitored without affecting the normal development of the embryo.
- Whole genome amplification (WGA) mainly includes multiple displacement amplification (MDA) and PCR-based WGA techniques. A self-developed WGA procedure can be used, or a commercial kit such as Qiagen's REPLI-g series, Sigma-Aldrich's GenomePlex WGA kit (WGA4), Rubicon Genomics' PicoPlex WGA kit, or GE Healthcare's illustra GenomiPhi WGA kit.
- After the DNA sample is amplified, sequencing libraries are constructed separately for the amplified DNA sample of the embryonic cells and the genomic DNA sample of the embryo genetically related individual.
- a method and apparatus for extracting a nucleic acid sample from a biological sample are also not particularly limited, and can be carried out using a commercially available nucleic acid extraction kit.
- The sequencing library is loaded onto a sequencing instrument and sequenced to obtain the corresponding sequencing result, which consists of a plurality of sequencing data.
- The methods and apparatus that can be used for sequencing according to embodiments of the present invention are not particularly limited and include, but are not limited to, the dideoxy chain termination method. High-throughput sequencing methods are preferred, because the high throughput and deep sequencing characteristics of these sequencing devices further improve sequencing efficiency. This improves the subsequent analysis of the sequencing data, in particular the precision and accuracy of the statistical test analysis.
- the high throughput sequencing methods include, but are not limited to, second generation sequencing techniques or single molecule sequencing techniques.
- Second-generation sequencing platforms (Metzker ML. Sequencing technologies - the next generation. Nat Rev Genet. 2010 Jan; 11(1): 31-46) include, but are not limited to, the Illumina-Solexa (GA™, HiSeq2000™, MiSeq, etc.), ABI-SOLiD, Life Technologies Ion Torrent/Proton, and Roche-454 (pyrosequencing) platforms. Single-molecule sequencing platforms (technologies) include, but are not limited to, Helicos's True Single Molecule DNA sequencing, Pacific Biosciences' single molecule real-time sequencing (SMRT™), and the nanopore sequencing technology of Oxford Nanopore Technologies (Rusk, Nicole (2009-04-01). Cheap Third-Generation Sequencing. Nature Methods 6 (4): 244-245).
- The whole genome sequencing library can be sequenced using at least one selected from the group consisting of Illumina-Solexa, ABI-SOLiD, Life Technologies Ion Torrent/Proton, Roche-454, and a single molecule sequencing device.
- a genetic sketch of the embryo can be constructed based on the result of the sequencing of the embryonic cell genomic DNA sample to determine the initial genotype of the embryo. Based on the sequencing results of the embryonic genetically related individual genome samples, the haplotypes of the embryo parents are determined.
- A genetic sketch of the embryo is constructed by aligning the sequencing result of the embryonic cell genomic DNA sample with a reference sequence. The genotype of the embryo genetically related individual is determined by aligning the sequencing result of that individual's genomic sample with the reference sequence, and the haplotypes of the embryo's parents are determined based on the genotype of the embryo genetically related individual.
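The text does not specify how an initial genotype is called from the aligned data, so the following is only a minimal sketch of one naive possibility: counting the bases that aligned reads show at a site and calling the site heterozygous when the second most frequent base is well supported. The function name, the depth threshold, and the minor-allele fraction are hypothetical choices, not values from the source.

```python
from collections import Counter


def call_initial_genotype(bases, min_depth=4, min_minor_fraction=0.2):
    """Call a rough diploid genotype from the bases observed at one site.

    bases: iterable of single-letter base calls from reads covering the site.
    Returns a two-letter genotype such as "AA" or "AG", or None when the
    depth is too low to make a call (thresholds are illustrative only).
    """
    counts = Counter(bases)
    depth = sum(counts.values())
    if depth < min_depth:
        return None  # low coverage: leave the site uncalled
    ranked = counts.most_common(2)
    major = ranked[0][0]
    if len(ranked) == 2 and ranked[1][1] / depth >= min_minor_fraction:
        return "".join(sorted((major, ranked[1][0])))  # heterozygous call
    return major * 2  # homozygous call


hom = call_initial_genotype("AAAAAA")
het = call_initial_genotype("AAGGAG")
low = call_initial_genotype("AG")
```

A call made this way after whole genome amplification can be wrong at sites with allele dropout, which is why the text treats it only as an initial genotype to be refined.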
- a known human reference genome can be used as a reference sequence.
- the human reference genome used is NCBI 36.3, HG18.
- The alignment method is not particularly limited.
- For example, SOAP can be used for the alignment.
- the base information of the predetermined region in the embryo genome is determined according to the hidden Markov model.
- The term "predetermined region" as used herein shall be understood broadly to mean any region of a nucleic acid molecule that contains a site at which a predetermined event may occur.
- For SNP analysis, it can refer to the region containing the SNP site.
- the predetermined region refers to the full length or portion of the chromosome to be analyzed, i.e., all sequencing data from that chromosome is selected.
- The method of selecting the sequencing data that come from the corresponding region is not particularly limited. According to an embodiment of the present invention, sequencing data from a predetermined region can be obtained by aligning all of the obtained sequencing data with a known nucleic acid reference sequence.
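Once reads have been aligned, selecting the data for a predetermined region amounts to an interval-overlap filter. The sketch below shows that idea on a hypothetical in-memory record type; `AlignedRead`, its fields, and the 1-based inclusive coordinates are assumptions made for illustration and do not correspond to the output format of any particular aligner.

```python
from typing import List, NamedTuple


class AlignedRead(NamedTuple):
    chrom: str     # reference sequence name
    start: int     # 1-based leftmost aligned position
    end: int       # 1-based inclusive rightmost aligned position
    sequence: str  # read bases


def reads_in_region(reads: List[AlignedRead], chrom: str,
                    region_start: int, region_end: int) -> List[AlignedRead]:
    """Keep reads that overlap [region_start, region_end] on chrom."""
    return [
        r for r in reads
        if r.chrom == chrom and r.start <= region_end and r.end >= region_start
    ]


reads = [
    AlignedRead("chr21", 100, 149, "A" * 50),
    AlignedRead("chr21", 500, 549, "C" * 50),
    AlignedRead("chr1", 120, 169, "G" * 50),
]
selected = reads_in_region(reads, "chr21", 140, 510)
```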
- The predetermined region may also be a plurality of discrete, discontinuous sites in the genome.
- the type of reference sequence that can be used is not particularly limited and may be any known sequence containing a region of interest.
- base information of the predetermined region can be determined based on a hidden Markov model based on a result of sequencing of the embryonic cells in combination with genetic information of an embryo genetically related individual.
- base information of a specific region in the embryo genome can be determined by means of a hidden Markov model using a Viterbi algorithm. Thereby, the pre-implantation detection of the genetic information of the embryo genome can be effectively performed.
- An embryo genetically related individual is an individual that is genetically related to the embryo; examples that can be used according to an embodiment of the present invention include relatives of the embryo such as its parents, paternal and maternal grandparents, and other children of the embryo's parents.
- The other children of the embryo's parents mentioned here should be understood in a broad sense. They may be children who have been born, unborn children (embryos or fertilized eggs), dead embryos, or embryos or fertilized eggs cultured in vitro, as long as they share the same parents as the embryo to be tested.
- The genome of the offspring is formed by what is equivalent to a random recombination of the parental genomes (i.e., haplotype recombination through linkage and crossing-over, and random combination of gametes).
- The haplotype of the embryo is taken as the hidden state and the sequencing data obtained after single-cell whole genome amplification of the embryo as the observations. The transition probabilities are derived from prior data, the observation symbol probabilities and the initial state probability distribution are constructed, and the most probable combination of embryonic haplotypes can then be inferred by the Viterbi algorithm.
- By means of a hidden Markov model, for example using the Viterbi algorithm, and referring to the genetic information of an embryo genetically related individual, the nucleic acid sequence of a specific region in the embryo genome can be determined, so that pre-implantation detection of the genetic information of the embryo genome can be performed effectively.
- the number of sites that need to be detected is N.
- The hidden state set is $\{0, 1\}$, which defines which of a parent's chromosomes is passed on to the offspring: 0 represents the chromosome that was inherited by the proband, and 1 represents the chromosome that was not inherited by the proband.
- The observation state set is $\{0, 1\}$: 1 means that the chromosome inherited by the proband is consistent with the genotype in the embryo's genetic sketch, and 0 means that it is inconsistent.
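The observation encoding just described can be sketched as a per-site comparison. This is an illustrative reading of the text, not the source's exact rule: "consistent" is taken here to mean that the allele on the proband-transmitted parental chromosome appears in the embryo's initial genotype. The function name and string encodings are hypothetical, and sites without an initial genotype are assumed to have been filtered out beforehand.

```python
def observation_sequence(proband_haplotype, embryo_genotypes):
    """Encode each site as 1 (consistent) or 0 (inconsistent).

    proband_haplotype: alleles, one per site, on the parental chromosome
        that was transmitted to the proband.
    embryo_genotypes: two-letter initial genotypes of the embryo per site.
    """
    return [
        1 if allele in genotype else 0
        for allele, genotype in zip(proband_haplotype, embryo_genotypes)
    ]


obs = observation_sequence("ACGT", ["AA", "CT", "AA", "TT"])
```

Here the third site yields 0 because the allele `G` is absent from the embryo genotype `AA`.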
- The first step is to construct the initial state probability distribution vector, the haplotype recombination transition matrix, and the observation sequence probability matrix.
- The initial state probability distribution is recorded as $\pi = (0.5, 0.5)$.
- Nr and Np respectively represent the expected number of recombinations and the number of single nucleotide polymorphism sites. Based on prior data, Nr can be taken as a natural number between 20 and 40 and used to calculate the transition probability of the hidden state, that is, the probability that the haplotype recombines between adjacent sites.
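The exact transition formula is not recoverable from the text, so the sketch below makes an explicit assumption: recombination events are spread uniformly over the sites, giving a per-step switch probability of Nr/Np between the two hidden states. Both that formula and the function name are illustrative assumptions rather than the source's definition.

```python
def transition_matrix(n_recomb, n_sites):
    """Two-state transition matrix under the assumption r = Nr / Np.

    Row p, column s holds the probability of moving from hidden state p
    at one site to hidden state s at the next site.
    """
    if n_sites <= 0 or not 0 <= n_recomb <= n_sites:
        raise ValueError("need n_sites > 0 and 0 <= n_recomb <= n_sites")
    r = n_recomb / n_sites  # assumed probability of a haplotype switch
    return [[1 - r, r], [r, 1 - r]]


# Example with Nr = 30 (inside the 20-40 range) and a made-up Np.
A = transition_matrix(30, 300_000)
```

With these made-up numbers the switch probability is 1 in 10,000 per site, so the decoded haplotype is strongly encouraged to stay on the same parental chromosome across neighbouring sites.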
- determining the base information of the predetermined region in the embryo genome according to the hidden Markov model further includes:
- The initial state probability distribution vector, the hidden state transition probability matrix, and the observation sequence probability matrix are constructed; the final state is determined by the Viterbi algorithm, and the optimal path is traced back to determine the base information of the predetermined region in the embryo genome.
- the hidden Markov model uses the following parameters:
- #sites (L>0, Must-hom.) is the number of loci at which the offspring must be homozygous, and #sites (L>0, Must-hom. or Must-het.) is the sum of the number of loci at which the offspring must be homozygous and the number of loci at which the offspring must be heterozygous; the local probability is $\delta_t(i)$, $t \in \{1, \dots, N\}$, and the back pointer is $\psi_t(i)$, $t \in \{1, \dots, N\}$.
- The final state is obtained recursively by the Viterbi algorithm.
- The terms "local probability" $\delta_t(i)$ and "back pointer" $\psi_t(i)$ are used as in the classic definition of the Viterbi algorithm.
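For readers unfamiliar with the decoding step, here is a minimal log-space sketch of the classic Viterbi recursion for a two-state model, with a local probability array (`delta`) and back pointers (`back`). The emission values in the example are made-up illustrative numbers, not the observation probabilities the text derives from the #sites counts, and the function and variable names are hypothetical.

```python
import math


def viterbi(observations, initial, transition, emission):
    """Return the most probable hidden state path.

    initial[s]: probability of starting in state s.
    transition[p][s]: probability of moving from state p to state s.
    emission[s][o]: probability of observing symbol o in state s.
    All probabilities are assumed to be strictly positive.
    """
    if not observations:
        return []
    n_states = len(initial)
    # Local (log) probability of the best path ending in each state.
    delta = [
        math.log(initial[s]) + math.log(emission[s][observations[0]])
        for s in range(n_states)
    ]
    back = []  # back[t][s]: best predecessor of state s at step t + 1
    for obs in observations[1:]:
        new_delta, pointers = [], []
        for s in range(n_states):
            best_prev = max(
                range(n_states),
                key=lambda p: delta[p] + math.log(transition[p][s]),
            )
            pointers.append(best_prev)
            new_delta.append(
                delta[best_prev]
                + math.log(transition[best_prev][s])
                + math.log(emission[s][obs])
            )
        delta = new_delta
        back.append(pointers)
    # Pick the best final state, then trace the back pointers.
    state = max(range(n_states), key=lambda s: delta[s])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path


initial = [0.5, 0.5]
transition = [[0.99, 0.01], [0.01, 0.99]]
emission = [[0.1, 0.9], [0.9, 0.1]]  # illustrative values only
observations = [1, 1, 0, 1, 1, 0, 0, 0, 0, 0]
path = viterbi(observations, initial, transition, emission)
# path == [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
```

In this toy run the isolated inconsistent observation at the third site does not trigger a state change, because one unlikely emission costs less than two switches, whereas the sustained run of zeros at the end is explained by a single recombination. This is the behaviour that lets an isolated error such as an allele dropout be absorbed rather than mistaken for a crossover.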
- This method has the following technical advantages, mainly in terms of accuracy and the amount of genetic information obtained:
- Both paternally and maternally inherited sites of the embryo can be detected well, with a detection accuracy of 95% or higher; multiple types of variation can be detected, which broadens the range of diseases that can be tested.
- genetic disease mapping can be performed.
- Information at a site can be inferred directly from the information at other sites, and a large amount of information is obtained in a single test, which makes the method more informative for clinical testing.
- The method for determining base information of a predetermined region in an embryo genome is not limited to particular genetic polymorphism sites such as SNPs or STRs; it is applicable to all genetic polymorphism sites, and multiple sites can be used simultaneously to verify each other.
System for determining predetermined regional information in the embryonic genome
- the invention provides a system for determining a predetermined region of a nucleic acid sequence in an embryonic genome.
- the system 1000 can include: a library construction device 100, a sequencing device 200, and an analysis device 400.
- library construction device 100 is adapted to construct a sequencing library separately for embryonic genomic DNA samples and genomic DNA samples of embryo genetically related individuals.
- the sequencing device 200 is coupled to the library construction device 100 and is adapted to sequence the sequencing library to obtain sequencing results for embryos and embryo genetically related individuals.
- a DNA sample separation device and a DNA amplification device may be further included, and the DNA sample separation device is adapted to perform biopsy sampling on the embryo to obtain embryonic cells.
- The embryonic cells may be derived from any one of a blastomere, a trophoblast, a fertilized egg, and a gamete, and may be a single cell or several cells, such as 2-10 cells; any sample containing embryonic nucleic acids may also be used.
- the DNA amplification device is adapted for whole genome amplification of embryonic cells obtained by biopsy sampling to obtain a sufficient amount of DNA for subsequent detection analysis. Thus, the embryo's genome can be effectively monitored without affecting the normal development of the embryo.
- Methods for whole genome amplification include multiple displacement amplification (MDA) and PCR-based WGA techniques; a self-developed WGA procedure or commercial reagents can be used.
- the method and apparatus for extracting a nucleic acid sample from a biological sample are also not particularly limited, and can be carried out using a commercially available nucleic acid extraction kit.
- The methods and apparatus that can be used for sequencing according to embodiments of the present invention are not particularly limited and include, but are not limited to, the dideoxy chain termination method. High-throughput sequencing methods are preferred, because the high throughput and deep sequencing characteristics of these sequencing devices further improve sequencing efficiency. This improves the subsequent analysis of the sequencing data, in particular the precision and accuracy of the statistical test analysis.
- the high throughput sequencing methods include, but are not limited to, second generation sequencing techniques or single molecule sequencing techniques.
- the second generation sequencing platform (Metzker ML.
- the whole genome sequencing library can be sequenced using at least one selected from the group consisting of Illumina-Solexa, ABI-SOLiD, Life Technologies-Ion Torrent/Proton, Roche-454, and single molecule sequencing devices.
- An alignment device 300 may also be included in accordance with an embodiment of the present invention.
- The alignment device 300 is coupled to the sequencing device 200 and is adapted to align the resulting sequencing results with a reference sequence, in order to: construct a genetic sketch of the embryo by aligning the sequencing result of the embryonic cell genomic DNA sample with the reference sequence; determine the genotype of the embryo genetically related individual by aligning the sequencing result of that individual's genomic sample with the reference sequence; and determine the haplotypes of the embryo's parents based on the genotype of the embryo genetically related individual.
- the type of reference sequence that can be used is not particularly limited and may be any known sequence containing a region of interest.
- a known human reference genome can be used as a reference sequence.
- the human reference genome used is NCBI 36.3, HG18.
- The alignment method is not particularly limited.
- For example, SOAP can be used for the alignment.
- The analysis device 400 is adapted to: construct a genetic sketch of the embryo based on the sequencing result of the embryonic cell genomic DNA sample, to determine an initial genotype of the embryo; determine the haplotypes of the embryo's parents based on the sequencing result of the genomic sample of the embryo genetically related individual; and determine, based on the hidden Markov model with the initial genotype of the embryo as the observation sequence and the haplotypes of the embryo's parents, the base information of the predetermined region in the embryo genome.
- Determining the base information of a predetermined region in the embryo genome according to the hidden Markov model further includes:
- The initial state probability distribution vector, the hidden state transition probability matrix, and the observation sequence probability matrix are constructed; the final state is determined by the Viterbi algorithm, and the optimal path is traced back to determine the base information of the predetermined region in the embryo genome.
- the hidden Markov model uses the following parameters:
- Nr and Np respectively represent the expected number of recombinations and the number of single nucleotide polymorphism sites;
- #sites (L>0, Must-hom.) is the number of loci at which the offspring must be homozygous, and #sites (L>0, Must-hom. or Must-het.) is the sum of the number of loci at which the offspring must be homozygous and the number of loci at which the offspring must be heterozygous;
- the local probability is $\delta_t(i)$, $t \in \{1, \dots, N\}$;
- the final state is obtained recursively, the optimal path is traced back, and the most likely base information of the predetermined region of the embryo is determined.
- The data analysis has been described in detail above and applies equally to the system for determining a predetermined region nucleic acid sequence in the embryonic genome; it is not repeated here.
- With this system, the method for determining a predetermined region nucleic acid sequence in the embryo genome can be implemented effectively, and the base information of a specific region of the embryonic genome can be determined by means of a hidden Markov model, for example by Viterbi decoding.
- the predetermined region is a site where the genetic polymorphism is known to exist, and the genetic polymorphism is at least one selected from the group consisting of a single nucleotide polymorphism and an STR.
- the term "connected” as used herein shall be understood broadly and may be either directly connected or indirectly connected as long as the above functional connections are achieved.
- the invention provides a computer readable medium.
- instructions are stored on the computer readable medium, the instructions being adapted to be executed by a processor to:
- construct a genetic sketch of the embryo based on the sequencing result of the embryonic cell genomic DNA sample, to determine the initial genotype of the embryo;
- determine the haplotypes of the embryo's parents based on the sequencing result of the genomic sample of the embryo genetically related individual; and
- determine, according to the hidden Markov model with the initial genotype of the embryo as the observation sequence and based on the haplotypes of the embryo's parents, the base information of a predetermined region in the embryo genome.
- the method described above can be effectively implemented, so that base information of a specific region in the embryo genome can be determined by, for example, a Viterbi algorithm using a hidden Markov model.
- the pre-implantation detection of the genetic information of the embryonic genome can be effectively performed.
- determining the base information of the predetermined region in the embryo genome according to the hidden Markov model further includes:
- the initial state probability distribution vector, the hidden state transition probability matrix, and the observed sequence probability matrix are constructed; the final state is determined by the Viterbi algorithm and the optimal path is traced back to determine the base information of the predetermined region in the embryo genome.
- the hidden Markov model uses the following parameters:
- Nr and Np respectively represent the expected number of recombination events and the number of single nucleotide polymorphism sites; Nr may be a natural number between 20 and 40. The observed sequence probability matrix uses the following quantities:
- #sites(L>0, Must-hom.) is the number of loci at which the offspring must be homozygous; #sites(L>0, Must-hom. or Must-het.) is the sum of the number of loci at which the offspring must be homozygous and the number of loci at which the offspring must be heterozygous.
- the local probability and the reverse pointer are defined at each site t ∈ {1...N}.
- the final state is determined, the optimal path is traced back from it, and the most likely base information of the predetermined region of the embryo is determined.
- the terms local probability and reverse pointer are used as in the classic definition of the Viterbi algorithm. For a detailed description of the definition of these parameters, see Lawrence R. Rabiner, PROCEEDINGS OF THE IEEE, Vol. 77, No. 2, February 1989, incorporated herein by reference in its entirety.
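The decoding described above can be sketched in a few lines. This is a minimal illustration, not the patent's implementation: the function name `viterbi_haplotype`, the per-site switch probability Nr/Np, the uniform initial distribution, and the flat genotyping `error` rate are all assumptions made for the example. The two hidden states stand for the two haplotypes of one parent, and the observation sequence is the embryo's initial allele call at each site.

```python
import math

def viterbi_haplotype(observed, hap0, hap1, n_recomb=30, error=0.01):
    """Return, per site, which parental haplotype (0 or 1) best explains `observed`.

    observed : embryo allele at each site (None where no call was made)
    hap0/hap1: the two phased haplotypes of one parent at the same sites
    n_recomb : expected number of recombinations (Nr); must be > 0
    error    : assumed per-site genotyping error rate (hypothetical value)
    """
    n = len(observed)
    if n == 0:
        return []
    # Assumed transition model: switch haplotype with probability Nr / Np.
    switch = min(n_recomb / n, 0.5)
    log_trans = [[math.log(1 - switch), math.log(switch)],
                 [math.log(switch), math.log(1 - switch)]]
    haps = (hap0, hap1)

    def log_emit(state, t):
        if observed[t] is None:          # missing call: uninformative
            return 0.0
        match = observed[t] == haps[state][t]
        return math.log(1 - error) if match else math.log(error)

    # Local probabilities (delta) and reverse pointers (psi), in log space.
    delta = [[math.log(0.5) + log_emit(s, 0) for s in (0, 1)]]
    psi = [[0, 0]]
    for t in range(1, n):
        row, back = [], []
        for s in (0, 1):
            prev = max((0, 1), key=lambda p: delta[t - 1][p] + log_trans[p][s])
            row.append(delta[t - 1][prev] + log_trans[prev][s] + log_emit(s, t))
            back.append(prev)
        delta.append(row)
        psi.append(back)

    # Pick the final state, then trace the optimal path backwards.
    state = max((0, 1), key=lambda s: delta[-1][s])
    path = [state]
    for t in range(n - 1, 0, -1):
        state = psi[t][state]
        path.append(state)
    path.reverse()
    return path
```

Working in log space avoids numerical underflow when the number of sites is large, which is the usual reason to prefer sums of logarithms over products of probabilities here.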
- the data analysis section has been described in detail above and is of course applicable to a system for determining a predetermined region nucleic acid sequence in the embryonic genome; it is not repeated here.
- the method for determining a predetermined region nucleic acid sequence in the embryo genome can be effectively implemented, so that base information of a specific region in the embryo genome can be determined by, for example, a Viterbi algorithm using a hidden Markov model, thereby enabling pre-implantation detection of the genetic information of the embryonic genome.
- the predetermined region is a site where the genetic polymorphism is known to exist, and the genetic polymorphism is at least one selected from the group consisting of a single nucleotide polymorphism and an STR.
- a "computer-readable medium" can be any apparatus that can contain, store, communicate, propagate, or transport the program for use by the instruction execution system, apparatus, or device, or in conjunction with the instruction execution system, apparatus, or device. More specific examples (non-exhaustive list) of computer readable media include the following: electrical connections (electronic devices) having one or more wires, portable computer disk cartridges (magnetic devices), random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), fiber optic devices, and portable compact disk read-only memory (CDROM).
- the computer readable medium may even be paper or another suitable medium on which the program can be printed, since the program can be obtained electronically, for example by optically scanning the paper or other medium and then editing, interpreting, or, if necessary, processing it in another suitable manner, and then stored in computer memory.
- portions of the invention may be implemented in hardware, software, firmware or a combination thereof.
- multiple steps or methods may be implemented in software or firmware stored in a memory and executed by a suitable instruction execution system.
- for example, if implemented in hardware, as in another embodiment, it can be implemented with any one or a combination of the following techniques well known in the art: discrete logic circuits having logic gates for implementing logic functions on data signals, application-specific integrated circuits with suitable combinational logic gates, programmable gate arrays (PGAs), field-programmable gate arrays (FPGAs), etc.
- each functional unit in each embodiment of the present invention may be integrated into one processing module, or each unit may exist physically separately, or two or more units may be integrated into one module.
- the above integrated modules can be implemented in the form of hardware or in the form of software functional modules.
- the integrated modules, if implemented in the form of software functional modules and sold or used as separate products, may also be stored in a computer readable storage medium.
- the solution of the present invention will be explained below in conjunction with the embodiments. Those skilled in the art will appreciate that the following examples are merely illustrative of the invention and are not to be considered as limiting the scope of the invention. Where specific techniques or conditions are not indicated in the examples, they follow the techniques or conditions described in the literature in the field (for example, refer to J.
- the main steps of the embodiment of the present invention include:
- DNA sequencing data is filtered and compared to the human genome reference sequence.
- the embryo haplotype was inferred from the parental haplotype results.
- the hidden Markov model used in the information analysis part of the embodiment of the present invention and the detailed analysis steps are as follows:
- the number of sites to be detected is N.
- the hidden state set is {0, 1}, which defines which of the parent's chromosomes is passed on to the offspring; 0 means that the parent's first chromosome is inherited.
- the first step is to construct the initial state probability distribution vector, the haploid recombination transfer matrix and the observed sequence probability matrix.
- Nr, Np represent the expected number of recombination and single nucleotide polymorphism sites, respectively.
- the second step is to construct a local probability matrix, and a reverse pointer.
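For reference, the two steps above correspond to the standard Viterbi recursion as given in the Rabiner tutorial cited in this document. The notation below is an assumption chosen for this sketch rather than the patent's own: π is the initial state probability distribution vector, a_{ij} the haplotype recombination transition matrix, b_j(o_t) the observed sequence probability, δ the local probability, ψ the reverse pointer, S the hidden state set, and N the number of sites.

```latex
\begin{aligned}
\delta_1(i) &= \pi_i \, b_i(o_1), \qquad \psi_1(i) = 0, && i \in S \\
\delta_t(j) &= \max_{i \in S} \bigl[ \delta_{t-1}(i) \, a_{ij} \bigr] \, b_j(o_t), && 2 \le t \le N \\
\psi_t(j)   &= \operatorname*{arg\,max}_{i \in S} \bigl[ \delta_{t-1}(i) \, a_{ij} \bigr], && 2 \le t \le N \\
q_N^{*}     &= \operatorname*{arg\,max}_{i \in S} \, \delta_N(i) \\
q_t^{*}     &= \psi_{t+1}\bigl(q_{t+1}^{*}\bigr), && t = N-1, \dots, 1
\end{aligned}
```

The last two lines are the termination and backtracking steps: the final state is chosen first, and the optimal path is then read off the reverse pointers from site N back to site 1.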
- during the IVF-PGD treatment cycle, a single-cell biopsy of the blastomere-stage embryo was taken on the third day; the embryo was cultured further after the biopsy until the fifth day of embryonic development and was then vitrified for cryopreservation.
- after completing IVF-PGD treatment, the correct embryo is selected and implanted into the uterus. After a successful pregnancy, the pregnant woman's amniotic fluid (PED7-2-OAF) can be taken after 14 weeks, or embryonic cord blood can be collected, and the DNA extracted and sequenced to verify the typing results of the embryonic cells. DNA from the embryo single-cell whole-genome amplification products was fragmented to 200 bp using a Covaris™ shearing instrument, and the library was constructed according to the requirements of the illumina® HiSeq2000™ sequencer.
- PED7-2-OAF amniotic fluid
- sequencing was performed on the HiSeq2000™ sequencer; the sequencing strategy was Pair End 90 index (i.e., bidirectional 90 bp index sequencing), with whole-genome sequencing at a depth of 30X.
- the instrument's parameter settings and operating methods are in accordance with the illumina® operating manual (available at http://www.illumina.com/support/documentation.ilmn).
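As a rough check on what a 30X depth with paired 90 bp reads implies, the number of read pairs can be estimated from depth × target size / bases per pair. The sketch below is only an arithmetic illustration; the function name `read_pairs_needed` and the round genome size of 3.0 × 10^9 bp are assumptions, not values taken from the patent.

```python
import math

def read_pairs_needed(depth, target_bp, read_len=90, paired=True):
    """Estimate how many reads (or read pairs) give `depth` coverage of `target_bp` bases."""
    bases_per_fragment = read_len * (2 if paired else 1)
    return math.ceil(depth * target_bp / bases_per_fragment)

# Assumed round figure for the human genome size (hypothetical input).
HUMAN_GENOME_BP = 3_000_000_000
pairs_for_30x = read_pairs_needed(30, HUMAN_GENOME_BP)
```

The estimate ignores duplicates, unmapped reads, and uneven coverage, so the real requirement would be somewhat higher.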
- the library was captured by All in One chip.
- the All in One chip is a self-designed target region capture chip with a target region including a full exon region, a 1M SNP site region, and an MHC gene region.
- the chip-captured library was quality-controlled by the Agilent® Bioanalyzer 2100 and Q-PCR methods.
- the qualified library was further sequenced by the illumina® HiSeq2000TM sequencer.
- the sequencing strategy was Pair End 90 index, and the sequencing depth was 30-50X in the target area.
- the instrument's parameter settings and operating methods are in accordance with the illumina® operating manual (available at http://www.illumina.com/support/documentation.ilmn).
- Second, sequencing data analysis:
- the haplotypes are inferred from the nuclear family; for details, see B. L. Browning and S. R. Browning, Am J Hum Genet 84: 210-223 (2009).
- 4. Inferring the haplotype of embryonic cells based on parental haplotype:
- ketch hybrid means a site that is heterozygous before and after recombination;
- Correct heterozygous means a site that is homozygous before recombination and heterozygous after recombination.
- the correct embryos, i.e., β-thalassemia-negative embryos whose HLA type matches that of the proband, are selected for cryopreservation and implantation into the uterus, thereby preventing pregnancy with an affected child.
- Example 2
- One couple were carriers of beta-thalassaemia and received IVF-PGD treatment. After an IVF-PGD treatment cycle, 8 blastomere-stage embryos were obtained on the third day, and single-cell biopsy samples of these embryos were taken. After biopsy, the embryos were cultured until the fifth day of embryonic development and then vitrified for cryopreservation. The 8 single-cell blastomere samples obtained by biopsy were subjected to whole-genome amplification using Qiagen's REPLI-g MDA amplification kit, strictly following the kit instructions; the amplification products were used for next-generation sequencing and other analyses.
- the peripheral blood of the father and mother in this family was collected and genomic DNA was extracted.
- the correct embryo is implanted into the uterus.
- the pregnant woman's amniotic fluid can be taken after 14 weeks, or the embryonic cord blood can be collected, and the DNA extracted and sequenced to verify the typing results of the embryonic cells.
- the All in One chip is a self-designed target region capture chip with a target region including a full exon region, a 1M SNP site region, and an MHC gene region.
- the chip-captured library was quality-controlled by Agilent® Bioanalyzer 2100 and Q-PCR methods.
- the qualified library was further sequenced by the illumina® HiSeq2000TM sequencer.
- the sequencing strategy was Pair End 90 index (ie, bidirectional 90bp index sequencing), and the sequencing depth was 30-50X of the target area.
- the instrument's parameter settings and operating methods are in accordance with the illumina® operating manual (available at http://www.illumina.com/support/documentation.ilmn).
- BWA Burrows-Wheeler Aligner
- the genotyping accuracy of the embryos to be tested is reported with the following columns: Type (mother × father); Total number of sites; SNP call rate; Accuracy (heterozygous); Accuracy (homozygous).
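The metrics named in these columns can be computed by comparing the embryo's inferred genotypes with a verification sample such as the sequenced amniotic fluid. The following is a minimal sketch under stated assumptions: the function name `genotyping_accuracy`, the dictionary input format, and the choice to classify a site as heterozygous or homozygous by the verification genotype are all hypothetical and are not specified in the source.

```python
def genotyping_accuracy(embryo_calls, reference_calls):
    """Compare embryo genotype calls with verification genotypes.

    Both arguments map a site identifier to a genotype tuple such as ("A", "G");
    an embryo call of None (or a missing site) counts as "not called".
    """
    total = len(reference_calls)
    called = 0
    # Per class: [number correct, number compared]
    stats = {"het": [0, 0], "hom": [0, 0]}
    for site, truth in reference_calls.items():
        call = embryo_calls.get(site)
        if call is None:
            continue
        called += 1
        kind = "het" if truth[0] != truth[1] else "hom"
        stats[kind][1] += 1
        if sorted(call) == sorted(truth):   # allele order does not matter
            stats[kind][0] += 1

    def ratio(correct, compared):
        return correct / compared if compared else None

    return {
        "total_sites": total,
        "call_rate": ratio(called, total),
        "accuracy_het": ratio(*stats["het"]),
        "accuracy_hom": ratio(*stats["hom"]),
    }
```

Reporting heterozygous and homozygous accuracy separately is useful because allele dropout in single-cell amplification tends to affect heterozygous sites more than homozygous ones.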
- the correct embryos, i.e., β-thalassemia-negative embryos whose HLA type matches that of the proband, are selected for cryopreservation and implantation into the uterus, thereby preventing pregnancy with an affected child.
- the method for determining base information of a predetermined region in an embryo genome, the system for determining base information of a predetermined region in an embryo genome, and the computer readable medium of the present invention can be effectively applied to analyze a nucleic acid sequence of a predetermined region in an embryo genome.
Landscapes
- Life Sciences & Earth Sciences (AREA)
- Health & Medical Sciences (AREA)
- Chemical & Material Sciences (AREA)
- Engineering & Computer Science (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Physics & Mathematics (AREA)
- Proteomics, Peptides & Aminoacids (AREA)
- Analytical Chemistry (AREA)
- Biotechnology (AREA)
- General Health & Medical Sciences (AREA)
- Genetics & Genomics (AREA)
- Molecular Biology (AREA)
- Biophysics (AREA)
- Theoretical Computer Science (AREA)
- Evolutionary Biology (AREA)
- Medical Informatics (AREA)
- Spectroscopy & Molecular Physics (AREA)
- Bioinformatics & Computational Biology (AREA)
- Organic Chemistry (AREA)
- Wood Science & Technology (AREA)
- Zoology (AREA)
- Immunology (AREA)
- Microbiology (AREA)
- Biochemistry (AREA)
- General Engineering & Computer Science (AREA)
- Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)
Abstract
Priority Applications (3)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN201380074395.3A CN105051208B (zh) | 2013-03-28 | 2013-03-28 | 确定胚胎基因组中预定区域碱基信息的方法、系统和计算机可读介质 |
| PCT/CN2013/073375 WO2014153757A1 (fr) | 2013-03-28 | 2013-03-28 | Procédé, système, et support lisible par ordinateur pour déterminer des informations de base d'une région prédéterminée dans un génome fœtal |
| HK16101707.4A HK1213945B (en) | 2013-03-28 | Method, system, and computer readable medium for determining base information of predetermined area in fetal genome |
Applications Claiming Priority (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| PCT/CN2013/073375 WO2014153757A1 (fr) | 2013-03-28 | 2013-03-28 | Procédé, système, et support lisible par ordinateur pour déterminer des informations de base d'une région prédéterminée dans un génome fœtal |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| WO2014153757A1 (fr) | 2014-10-02 |
Family
ID=51622393
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| PCT/CN2013/073375 Ceased WO2014153757A1 (fr) | 2013-03-28 | 2013-03-28 | Procédé, système, et support lisible par ordinateur pour déterminer des informations de base d'une région prédéterminée dans un génome fœtal |
Country Status (2)
| Country | Link |
|---|---|
| CN (1) | CN105051208B (fr) |
| WO (1) | WO2014153757A1 (fr) |
Cited By (4)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN110349631A (zh) * | 2019-07-30 | 2019-10-18 | 苏州亿康医学检验有限公司 | 确定子代对象的单体型的分析方法和装置 |
| CN110846310A (zh) * | 2018-08-21 | 2020-02-28 | 深圳华大法医科技有限公司 | Snp位点集及胚胎核酸样本进行亲缘鉴定的方法和用途 |
| CN111739584A (zh) * | 2020-07-01 | 2020-10-02 | 苏州贝康医疗器械有限公司 | 一种用于pgt-m检测的基因分型评估模型的构建方法及装置 |
| WO2021032060A1 (fr) * | 2019-08-16 | 2021-02-25 | The Chinese University Of Hong Kong | Détermination de modifications de bases d'acides nucléiques |
Families Citing this family (5)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| WO2020047694A1 (fr) * | 2018-09-03 | 2020-03-12 | 深圳华大智造科技有限公司 | Procédé et dispositif de détermination du statut génétique d'une nouvelle mutation dans un embryon |
| CN109522378A (zh) * | 2018-10-10 | 2019-03-26 | 深圳韦格纳医学检验实验室 | 遗传出生地概率分布的显示方法及显示设备 |
| WO2021067417A1 (fr) * | 2019-09-30 | 2021-04-08 | Myome, Inc. | Score de risque polygénique pour la fécondation in vitro |
| CN115052994B (zh) * | 2020-05-22 | 2025-09-19 | 深圳华大智造科技股份有限公司 | 确定胚胎细胞染色体中预定位点碱基类型的方法及其应用 |
| CN115064210B (zh) * | 2022-07-27 | 2022-11-18 | 北京大学第三医院(北京大学第三临床医学院) | 一种鉴定二倍体胚胎细胞中染色体交叉互换位置的方法及应用 |
Citations (2)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN102127818A (zh) * | 2010-12-15 | 2011-07-20 | 张康 | 利用孕妇外周血建立胎儿dna文库的方法 |
| CN102317473A (zh) * | 2008-12-11 | 2012-01-11 | 加利福尼亚太平洋生物科学股份有限公司 | 核酸模板的分类 |
-
2013
- 2013-03-28 CN CN201380074395.3A patent/CN105051208B/zh active Active
- 2013-03-28 WO PCT/CN2013/073375 patent/WO2014153757A1/fr not_active Ceased
Patent Citations (2)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN102317473A (zh) * | 2008-12-11 | 2012-01-11 | 加利福尼亚太平洋生物科学股份有限公司 | 核酸模板的分类 |
| CN102127818A (zh) * | 2010-12-15 | 2011-07-20 | 张康 | 利用孕妇外周血建立胎儿dna文库的方法 |
Non-Patent Citations (1)
| Title |
|---|
| YU , WEIDONG ET AL.: "Construction of cDNA Library from Rabbit Single Preimplantation Cloned Embryos.", CHINESE JOURNAL OF BIOCHEMISTRY AND MOLECULAR BIOLOGY, vol. 20, no. 2, April 2004 (2004-04-01), pages 162 - 165 * |
Cited By (15)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN110846310A (zh) * | 2018-08-21 | 2020-02-28 | 深圳华大法医科技有限公司 | Snp位点集及胚胎核酸样本进行亲缘鉴定的方法和用途 |
| CN110349631A (zh) * | 2019-07-30 | 2019-10-18 | 苏州亿康医学检验有限公司 | 确定子代对象的单体型的分析方法和装置 |
| GB2590854A (en) * | 2019-08-16 | 2021-07-07 | Univ Hong Kong Chinese | Determination of base modifications of nucleic acids |
| WO2021032060A1 (fr) * | 2019-08-16 | 2021-02-25 | The Chinese University Of Hong Kong | Détermination de modifications de bases d'acides nucléiques |
| CN112752853A (zh) * | 2019-08-16 | 2021-05-04 | 香港中文大学 | 测定核酸的碱基修饰 |
| GB2590032A (en) * | 2019-08-16 | 2021-06-16 | Univ Hong Kong Chinese | Determination of base modifications of nucleic acids |
| US11091794B2 (en) | 2019-08-16 | 2021-08-17 | The Chinese University Of Hong Kong | Determination of base modifications of nucleic acids |
| GB2590032B (en) * | 2019-08-16 | 2021-12-08 | Univ Hong Kong Chinese | Determination of base modifications of nucleic acids |
| AU2020323958B2 (en) * | 2019-08-16 | 2022-02-03 | The Chinese University Of Hong Kong | Determination of base modifications of nucleic acids |
| GB2590854B (en) * | 2019-08-16 | 2022-03-30 | Univ Hong Kong Chinese | Determination of base modifications of nucleic acids |
| AU2022202791B2 (en) * | 2019-08-16 | 2022-06-16 | The Chinese University Of Hong Kong | Determination of base modifications of nucleic acids |
| US11466308B2 (en) | 2019-08-16 | 2022-10-11 | The Chinese University Of Hong Kong | Determination of base modifications of nucleic acids |
| AU2022202791C1 (en) * | 2019-08-16 | 2022-11-03 | The Chinese University Of Hong Kong | Determination of base modifications of nucleic acids |
| CN111739584A (zh) * | 2020-07-01 | 2020-10-02 | 苏州贝康医疗器械有限公司 | 一种用于pgt-m检测的基因分型评估模型的构建方法及装置 |
| CN111739584B (zh) * | 2020-07-01 | 2024-02-09 | 苏州贝康医疗器械有限公司 | 一种用于pgt-m检测的基因分型评估模型的构建方法及装置 |
Also Published As
| Publication number | Publication date |
|---|---|
| HK1213945A1 (zh) | 2016-07-15 |
| CN105051208B (zh) | 2017-04-19 |
| CN105051208A (zh) | 2015-11-11 |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| WO2014153757A1 (fr) | Procédé, système, et support lisible par ordinateur pour déterminer des informations de base d'une région prédéterminée dans un génome fœtal | |
| US20240271218A1 (en) | Maternal plasma transcriptome analysis by massively parallel rna sequencing | |
| US20180320235A1 (en) | Method, system and computer readable medium for determining base information in predetermined area of fetus genome | |
| TWI640636B (zh) | A method for simultaneously performing gene locus, chromosome and linkage analysis | |
| TWI874374B (zh) | 確定循環核酸之線性及環狀形式 | |
| CN112840404B (zh) | 清除噪音遗传数据、单体型定相、重构子代基因组的方法、系统和其用途 | |
| CN105648045B (zh) | 确定胎儿目标区域单体型的方法和装置 | |
| BR112013020220B1 (pt) | Método para determinar o estado de ploidia de um cromossomo em um feto em gestação | |
| CN105555970B (zh) | 同时进行单体型分析和染色体非整倍性检测的方法和系统 | |
| TWI675918B (zh) | 基於單倍型之通用非侵入性單基因疾病產前檢測 | |
| CN110541025A (zh) | 杜氏肌营养不良基因缺陷的检测方法、引物组合物及试剂盒 | |
| US11869630B2 (en) | Screening system and method for determining a presence and an assessment score of cell-free DNA fragments | |
| CN105648044B (zh) | 确定胎儿目标区域单体型的方法和装置 | |
| KR20150132216A (ko) | 다태 임신에 대한 태아 게놈의 결정 | |
| HK1213945B (en) | Method, system, and computer readable medium for determining base information of predetermined area in fetal genome | |
| US20150105264A1 (en) | Method and system for identifying types of twins | |
| HK1196401B (en) | Method, system and computer readable medium for determining base information in predetermined area of fetus genome | |
| HK1195104B (zh) | 非侵入性产前亲子鉴定方法 | |
| CN107988343A (zh) | 非侵入性产前倍性识别方法 | |
| BR122020014914B1 (pt) | Métodos para identificar uma mutação de novo no genoma de um feto não nascido de uma fêmea grávida, para determinar uma concentração fracionária de dna fetal em uma amostra biológica tirada de uma fêmea grávida e para determinar uma proporção de um genoma fetal que foi sequenciado a partir de uma amostra biológica tirada de uma fêmea grávida, e, meio legível por computador não transitório |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | WWE | Wipo information: entry into national phase | Ref document number: 201380074395.3; Country of ref document: CN |
| | 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 13879982; Country of ref document: EP; Kind code of ref document: A1 |
| | NENP | Non-entry into the national phase | Ref country code: DE |
| | 32PN | Ep: public notification in the ep bulletin as address of the adressee cannot be established | Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 10/02/2016) |
| | 122 | Ep: pct application non-entry in european phase | Ref document number: 13879982; Country of ref document: EP; Kind code of ref document: A1 |