Detailed Description
Example 1
Detection of the NLRP5 gene mutation in a family with primary female infertility.
The experimental method comprises the following steps:
1. Collection of the family's clinical resources and establishment of a genetic resource library:
The proband of the family (IV-2) is a woman who was diagnosed with primary infertility two years ago at a maternal and child health care hospital in Nanjing. Her sisters and brother have normal fertility; her parents were married, and her mother has died (the family pedigree is shown in FIG. 1). The clinical data mainly include personal medical history, family history, and reproductive history. Clinical data were collected from the proband (IV-2), and blood samples were collected from her father and her two fertile sisters. Blood genomic DNA of the proband (IV-2), the father (III-1), and the sisters (IV-4, IV-6) was extracted using a blood genomic DNA extraction kit (Qiagen, Hilden, Germany).
2. Pathogenic mutations in this family were explored by high-throughput next-generation sequencing:
2.1 Design and ordering of the capture chip:
2.1.1 NLRP5 gene and transcript sequence information:
The gene capture chip is a whole-exome capture chip that can detect all currently known genes, including all pathogenic genes known to date to be associated with early embryonic arrest. The NLRP5 gene referred to herein has the Ensembl gene ID ENSG00000171487 (note: the ID is from the Ensembl database, www.ensembl.org, where detailed gene and sequence information can be retrieved by entering the gene ID).
2.1.2 Selection of transcripts:
Each gene has multiple transcripts, so a specific transcript is selected for each gene. Protein-coding transcripts are considered first. If a gene has several protein-coding transcripts, the transcript encoding the protein with the most amino acids is preferred; if several transcripts encode proteins with the same number of amino acids, the transcript with the most bases is selected. Based on this principle, the NLRP5 transcript selected by this method is ENST00000390649.3 (note: the ID is from the Ensembl database, www.ensembl.org, where the detailed information and sequence of the transcript can be retrieved by entering the transcript ID).
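The selection rule above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Transcript` record, its field names, and the candidate IDs `TX_A` to `TX_D` are hypothetical and are not taken from Ensembl or from the specification.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Transcript:
    # Hypothetical record; field names are illustrative only.
    transcript_id: str
    is_protein_coding: bool
    protein_length: int      # number of amino acids (0 if non-coding)
    transcript_length: int   # number of bases


def select_transcript(transcripts: List[Transcript]) -> Optional[Transcript]:
    """Apply the three-step rule: coding first, then longest protein,
    then most bases as the tie-breaker."""
    coding = [t for t in transcripts if t.is_protein_coding]
    if not coding:
        return None
    # Tuple comparison ranks by protein length, then by transcript length.
    return max(coding, key=lambda t: (t.protein_length, t.transcript_length))


# Made-up candidates for one gene.
candidates = [
    Transcript("TX_A", True, 900, 3500),
    Transcript("TX_B", True, 1200, 3900),
    Transcript("TX_C", True, 1200, 4100),
    Transcript("TX_D", False, 0, 5000),
]
print(select_transcript(candidates).transcript_id)  # TX_C
```

Here `TX_B` and `TX_C` tie on protein length, so the longer transcript `TX_C` is chosen, while the non-coding `TX_D` is never considered despite having the most bases.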
2.1.3 Design of hybridization probes:
The design criteria for the hybridization probes are as follows: (1) the probes cover the target regions of all candidate genes, namely the exon regions and the exon-intron splice junctions (100 bp upstream and 100 bp downstream of each exon); (2) repeat sequences are removed, namely high-frequency repeats in the genome and low-frequency repeated fragments (occurring 2-5 times in the human genome), to avoid capturing other homologous genes, which would increase false positives and reduce detection efficiency. The applicants aligned the target regions of all candidate genes to the human genomic DNA sequence and removed 2.5% of the sequences as repeats in total; (3) during probe design, adjacent exons are merged where appropriate: when the combined target region of adjacent exons (from 100 bp upstream of the former exon to 100 bp downstream of the latter exon) is less than 600 bp, the adjacent exons are merged into one probe, so that one pair of probes captures several exon regions; (4) when a designed probe sequence is shorter than 250 bp, in addition to the 100 bp of intron already included at each end, equal numbers of intronic bases are added at both ends until the probe reaches 250 bp. According to these design principles, the probe sequences designed for the NLRP5 gene are shown as SEQ ID NO. 1 to SEQ ID NO. 31.
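Criteria (1), (3), and (4) are interval arithmetic and can be sketched as follows. This is a minimal sketch, not the applicants' design software: the half-open coordinate convention, the greedy left-to-right chaining of merges, and the example coordinates are assumptions made for illustration. Repeat removal under criterion (2) is not modeled.

```python
from typing import List, Tuple

FLANK = 100        # intronic bases added on each side of an exon
MERGE_LIMIT = 600  # merge adjacent exons if the combined region is shorter
MIN_SIZE = 250     # minimum probe target size

Interval = Tuple[int, int]  # half-open [start, end), an assumed convention


def design_targets(exons: List[Interval]) -> List[Interval]:
    """Return padded, merged target regions for a list of exons."""
    targets: List[Interval] = []
    for start, end in sorted(exons):
        region = (start - FLANK, end + FLANK)
        # Combined span: upstream flank of the former exon to the
        # downstream flank of the latter exon.
        if targets and region[1] - targets[-1][0] < MERGE_LIMIT:
            targets[-1] = (targets[-1][0], max(targets[-1][1], region[1]))
        else:
            targets.append(region)

    padded: List[Interval] = []
    for start, end in targets:
        shortfall = MIN_SIZE - (end - start)
        if shortfall > 0:
            # Extend both ends by (nearly) equal amounts up to MIN_SIZE.
            left = shortfall // 2
            start, end = start - left, end + (shortfall - left)
        padded.append((start, end))
    return padded


# Made-up exon coordinates on one chromosome.
print(design_targets([(1000, 1030), (1150, 1250), (5000, 5400)]))
# [(900, 1350), (4900, 5500)]
print(design_targets([(2000, 2020)]))
# [(1885, 2135)]
```

In the first call the two short exons share one 450 bp target, while the third exon stays separate; in the second call a 20 bp exon yields a 220 bp region that is extended by 15 bp on each side to reach 250 bp.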
2.2 Target region capture and deep sequencing:
First, genomic DNA is fragmented, an A overhang is added to the ends of the DNA fragments, the fragments are ligated to the Illumina PE adapter oligonucleotide mix, and the ligation products are enriched by PCR to obtain a DNA library. The DNA library is then hybridized to the capture chip for known pathogenic genes, eluted, and purified to obtain the coding sequences. Finally, paired-end reads are generated and the target sequences are sequenced on the HiSeq 2000 platform (Illumina, Inc., San Diego, CA, USA).
2.3 Bioinformatics analysis of the sequencing data and screening of candidate pathogenic genes:
2.3.1 Mosaik software (http://bioinformatics.bc.edu/marthlab/Mosaik) was used to process the raw Illumina sequencing data (paired-end data), producing a .bam file. The .bam file was input into GATK, which was used to detect single nucleotide variants and small insertions or deletions (indels) and to perform quality assessment at the same time, facilitating downstream bioinformatics analysis and finally producing a .vcf file.
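As an illustration of what the resulting .vcf file contains, the sketch below reads the fixed VCF columns and labels each record as a single nucleotide variant, insertion, or deletion by comparing REF and ALT lengths. It uses only the Python standard library and does not call Mosaik or GATK; the example records are made up, and symbolic ALT alleles are not handled.

```python
from typing import Iterable, Iterator, Tuple


def classify(ref: str, alt: str) -> str:
    """Label a variant from the lengths of its REF and ALT alleles."""
    if len(ref) == 1 and len(alt) == 1:
        return "SNV"
    if len(ref) > len(alt):
        return "deletion"
    if len(ref) < len(alt):
        return "insertion"
    return "other"  # e.g. a multi-nucleotide substitution


def parse_vcf(lines: Iterable[str]) -> Iterator[Tuple[str, int, str, str, str]]:
    """Yield (chrom, pos, ref, alt, kind) for every ALT allele."""
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header lines and blank lines
        chrom, pos, _id, ref, alts = line.rstrip("\n").split("\t")[:5]
        for alt in alts.split(","):
            yield chrom, int(pos), ref, alt, classify(ref, alt)


# Made-up records using the eight fixed VCF columns.
example = [
    "##fileformat=VCFv4.1",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "chr19\t100\t.\tA\tG\t50\tPASS\t.",
    "chr19\t200\t.\tCT\tC\t60\tPASS\t.",
    "chr19\t300\t.\tG\tGAA\t55\tPASS\t.",
]
for record in parse_vcf(example):
    print(record)
```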
2.3.2 The patient's sequencing results were screened against five single nucleotide polymorphism (SNP) databases, namely dbSNP144 (http://hgdownload.cse.ucsc.edu/goldenPath/hg19/database/snp144.txt.gz), the HapMap Project (ftp://ftp.ncbi.nlm.nih.gov/hapmap), the 1000 Genomes Project (ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp), the YanHuang database (http://yh.genomics.org.cn/), and the Exome Variant Server (http://evs.gs.washington.edu/EVS/), and all known SNP sites were filtered out;
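The filtering step amounts to removing every patient variant that appears in any of the reference databases. A minimal sketch follows, assuming each database has already been loaded into a set of (chromosome, position, REF, ALT) keys; the loading step, the key format, and the example variants are assumptions for illustration, not part of the described pipeline.

```python
from typing import Iterable, List, Set, Tuple

Variant = Tuple[str, int, str, str]  # (chrom, pos, ref, alt), assumed key


def filter_novel(variants: Iterable[Variant],
                 databases: Iterable[Set[Variant]]) -> List[Variant]:
    """Keep only variants absent from every known-SNP database."""
    known: Set[Variant] = set().union(*databases)
    return [v for v in variants if v not in known]


# Made-up data standing in for two of the five databases.
db_a = {("chr19", 100, "A", "G")}
db_b = {("chr19", 200, "C", "T")}
patient = [
    ("chr19", 100, "A", "G"),
    ("chr19", 200, "C", "T"),
    ("chr19", 300, "CT", "C"),
]
print(filter_novel(patient, [db_a, db_b]))
# [('chr19', 300, 'CT', 'C')]
```

Matching on the full key rather than on position alone keeps a novel allele at a known polymorphic position from being discarded.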
2.3.3 The gene sequences corresponding to the patient's sequencing results were compared and analyzed, with priority given to insertion/deletion, nonsense, and missense mutations. The results fall into three classes: known mutations, new mutations in known genes, and mutations in new genes.
2.4 Identification of the pathogenic gene by Sanger sequencing verification:
The screened mutation sites and their adjacent DNA sequences were amplified by PCR in the corresponding family members, using primers designed with the Primer3 primer design software (http://frodo.wi.mit.edu/). The PCR reaction system (20 μL system) was: buffer 5 μL, 25 mM MgCl2 2 μL, DNA 1 μL, forward primer F 1 μL, reverse primer R 1 μL, 10 mM dNTP 0.4 μL, Taq enzyme 0.1 μL, ddH2O 10.5 μL. The PCR program was: 98 °C for 5 min; 35 cycles of 98 °C for 10 s, 60 °C for 15 s, and 72 °C for 1 min; 72 °C for 7 min; 4 °C for 5 min. The products were examined by 2% agarose gel electrophoresis, and the PCR product bands were excised under an ultraviolet gel cutter and purified. All PCR products were sequenced with the forward and reverse primers separately, and the sequencing results were further analyzed with the NCBI online alignment tool BLAST (http://blast.ncbi.nlm.nih.gov/) to exclude false positives and to screen for mutation sites that co-segregate in the family. The primer sequence for detecting the nucleotide site chr19:56552373-56552493 is SEQ ID NO. 31.
The experimental results are as follows:
1. Family clinical data:
After a detailed and comprehensive clinical examination of the proband (IV-2) by a clinical specialist in reproductive medicine, the patient was clinically diagnosed with primary infertility and IVF treatment was recommended. The oocyte retrieval and embryo culture results are shown in Table 1 and FIG. 2.
TABLE 1 Endocrine examination results and oocyte retrieval of the patient with early embryonic developmental arrest (patient IV-2)
2. Genetic testing results of the family:
After whole-exome sequencing and bioinformatics analysis of the family members, we found that the patient carries the suspected homozygous deletion mutation NLRP5 c.2945delT p.L982fs. The sequencing results are shown in FIG. 3. The homozygous deletion mutation NLRP5 c.2945delT p.L982fs carried by the patient is located on chromosome 19, and its physical position is the deletion of one base T at chr19:56552446. At the RNA level, base 2945 of the NLRP5 coding RNA is deleted. At the protein level, the protein encoded by the NLRP5 gene has a frameshift mutation at amino acid 982 and a premature stop codon. This deletion mutation of the gene has never been reported in patients with primary infertility.
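The correspondence between the coding position c.2945 and amino acid 982 follows from dividing the coding sequence into codons of three bases. The short calculation below is a worked check of that arithmetic only; the function name is illustrative.

```python
from typing import Tuple


def codon_of(cds_position: int) -> Tuple[int, int]:
    """Map a 1-based coding-sequence position to
    (codon number, position within the codon), both 1-based."""
    if cds_position < 1:
        raise ValueError("coding positions are 1-based")
    codon_number = (cds_position - 1) // 3 + 1
    position_in_codon = (cds_position - 1) % 3 + 1
    return codon_number, position_in_codon


print(codon_of(2945))  # (982, 2)
```

Base 2945 is therefore the second base of codon 982. All six leucine codons (TTA, TTG, CTT, CTC, CTA, CTG) carry T at the second position, which is consistent with a deleted T at a leucine codon, and removing a single base shifts the reading frame from that codon onward.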
Using the screening procedure, gene chip, and deep sequencing technology designed by us, we successfully demonstrated that the detected NLRP5 c.2945delT p.L982fs is a new pathogenic site for this disease.
Example 2:
Biological analyses were carried out on the pathogenic mutation NLRP5 c.2945delT p.L982fs detected in Example 1.
The experimental results are as follows:
1. Conservation analysis:
The conservation of the site affected by the screened NLRP5 p.L982fs mutation was evaluated across several species (human, mouse, orangutan, sheep, dog, and horse) using the NCBI HomoloGene database (http://www.ncbi.nlm.nih.gov/HomoloGene). The site was found to be conserved during evolution, which indirectly suggests that a mutation at this site may cause a more severe pathological phenotype. The result is shown in FIG. 3.
2. Study of changes in protein crystal structure:
The structures of the NLRP5 wild-type protein and of the mutant protein carrying c.2945delT p.Leu982fs were predicted separately using the SWISS-MODEL prediction software (http://swissmodel.expasy.org/), and the change in protein structure caused by the mutation was evaluated. The prediction shows that the c.2945delT p.L982fs mutation exposes a hydrophobic region of the LRR unit in which residue 982 is located, which may affect protein folding or cause aggregation. Because the LRR functions in ligand recognition, the premature termination caused by the Leu982 mutation may disrupt ligand recognition. The result is shown in FIG. 4.
3. Mouse experimental results:
We compared embryos of mice in the NLRP5WT and NLRP5MU mRNA injection groups in the following respects:
1) Mouse embryo development rate: mouse zygotes were injected with NLRP5WT or NLRP5MU mRNA and cultured in vitro, and the embryo development rates were compared. Embryos of the NLRP5WT and NLRP5MU mRNA injection groups showed no obvious difference in development up to the 2-cell stage, but at the subsequent 4-cell, morula, and blastocyst stages the development rate of the NLRP5MU mRNA injection group was significantly reduced, and many embryos arrested at the 2-cell stage without developing further. This is similar to the embryo development observed in the patient, as shown in FIG. 5.
2) Alteration of NLRP5 protein localization in mouse embryos: we compared the localization of the NLRP5 protein in mouse 2-cell embryos after injection of NLRP5WT or NLRP5MU mRNA. After NLRP5MU mRNA injection, the mutant NLRP5 protein was no longer specifically localized to the subcortical region but was distributed throughout the cytoplasm, a result that is also consistent with the structural prediction described above; see FIG. 6.
4. Cell experiment results:
Effect of the mutation on the binding of NLRP5 to other SCMC members:
We transfected 293T cells with MYC-NLRP5WT or MYC-NLRP5MU together with V5-KHDC3L, FLAG-TLE6, and HA-OOEP overexpression plasmids. Co-immunoprecipitation experiments showed that the mutation weakens the interaction of NLRP5 with the SCMC components TLE6, OOEP, and KHDC3L (FIG. 8). This result suggests that the NLRP5 mutation may lead to failure of SCMC formation.
In conclusion, through genetic, animal, and cell experiments, we confirmed from multiple aspects the specific damage that the NLRP5 c.2945delT p.L982fs mutation causes to embryonic development and clarified its association with primary infertility. The NLRP5 mutation site or the NLRP5 protein can be applied to the preparation of detection reagents or detection equipment for primary infertility.
Sequence listing
<110> Nanjing Medical University
<120> a pathogenic mutation of hereditary primary infertility and detection reagent thereof
<160> 39
<170> SIPOSequenceListing 1.0
<210> 1
<211> 120
<212> DNA
<213> Artificial Sequence (Artificial Sequence)
<400> 1
gttcagttac tttccatggc gatgttatca tgaaggttgc aggaggactt gaacttggag 60
ctgctgctct gctctcagca tcaccacgtg cgtcgacagc ctcttagagt tacctatgag 120
<210> 2
<211> 120
<212> DNA
<213> Artificial Sequence (Artificial Sequence)
<400> 2
tgatgttagt tgtatctact tgagaatttg ctgcaagatc ctcttttaag tcttgtcact 60
ctttccacag gtcctacttg ctctatatta ccaaagaatc cacttttccc ccaaaacctg 120
<210> 3
<211> 120
<212> DNA
<213> Artificial Sequence (Artificial Sequence)
<400> 3
agctctcagc cttgtatcaa gatggaagga gacaaatcgc tcaccttttc cagctacggg 60
ctgcaatggt gtctctatga gctagacaag gaagaatttc agacattcaa ggaattacta 120
<210> 4
<211> 120
<212> DNA
<213> Artificial Sequence (Artificial Sequence)
<400> 4
aagaagaaat cttcagaatc gaccacatgc tctattccac agtttgaaat cgagaatgcc 60
aacgtggaat gtctggcact cctcttgcat gagtattatg gagcatcgct ggcctgggct 120
<210> 5
<211> 120
<212> DNA
<213> Artificial Sequence (Artificial Sequence)
<400> 5
acgtccatta gcatctttga aaacatgaac ctgcgaaccc tctcggagaa ggcacgggat 60
gacatgaaaa gtaagcgaga cttgggacaa gtctagggca gggaggggag gtgatttcag 120
<210> 6
<211> 120
<212> DNA
<213> Artificial Sequence (Artificial Sequence)
<400> 6
cagtctccct ttttctttgt cttccaggac attcaccaga agatcctgaa gcaacgatga 60
ctgaccaagg accaagcaag gaaaaagtgc caggttagag gggtggagtt gggggaaata 120
<210> 7
<211> 120
<212> DNA
<213> Artificial Sequence (Artificial Sequence)
<400> 7
ttgtgtttat tcttctccct tctcttttgc aggaatttca caagctgtgc aacaagatag 60
tgccacagct gcagagacaa aagaacaagg tgaatgaaat agatctattc atttgttgcc 120
<210> 8
<211> 120
<212> DNA
<213> Artificial Sequence (Artificial Sequence)
<400> 8
caagtatgtt ggaattcatt cttcttttgc agaaatttca caagctatgg aacaagaagg 60
tgccacagca gcagagacag aagaacaagg tgaggaaaat agatgtattc cttggttgcc 120
<210> 9
<211> 120
<212> DNA
<213> Artificial Sequence (Artificial Sequence)
<400> 9
acacaagtat gttggaattc attcttttgc agaaatttca caagctatgg aacaagaagg 60
tgccacagca gcagagacag aagaacaagg tgaggaaaat agatgtattc cttggttgcc 120
<210> 10
<211> 120
<212> DNA
<213> Artificial Sequence (Artificial Sequence)
<400> 10
tggtctgttt cctgattttc attctaccct ctctgactcc aggacatgga ggtgacacat 60
gggactacaa gagtcacgtg atgaccaaat tcgctgagga ggaggatgta cgtcgtagtt 120
<210> 11
<211> 120
<212> DNA
<213> Artificial Sequence (Artificial Sequence)
<400> 11
acagccactg tgaggagtac tacaccttct tccacctcag tctccaggac ttctgtgccg 60
ccttgtacta cgtgttagag ggcctggaaa tcgagccagc tctctgccct ctgtacgttg 120
<210> 12
<211> 120
<212> DNA
<213> Artificial Sequence (Artificial Sequence)
<400> 12
agaagacaaa gaggtccatg gagcttaaac aggcaggctt ccatatccac tcgctttgga 60
tgaagcgttt cttgtttggc ctcgtgagcg aagacgtaag gaggccactg gaggtcctgc 120
<210> 13
<211> 120
<212> DNA
<213> Artificial Sequence (Artificial Sequence)
<400> 13
tgggctgtcc cgttcccctg ggggtgaagc agaagcttct gcactgggtc tctctgttgg 60
gtcagcagcc taatgccacc accccaggag acaccctgga cgccttccac tgtcttttcg 120
<210> 14
<211> 120
<212> DNA
<213> Artificial Sequence (Artificial Sequence)
<400> 14
agactcaaga caaagagttt gttcgcttgg cattaaacag cttccaagaa gtgtggcttc 60
cgattaacca gaacctggac ttgatagcat cttccttctg cctccagcac tgtccgtatt 120
<210> 15
<211> 120
<212> DNA
<213> Artificial Sequence (Artificial Sequence)
<400> 15
tgcggaaaat tcgggtggat gtcaaaggga tcttcccaag agatgagtcc gctgaggcat 60
gtcctgtggt ccctctatgg tgagtacccc aggcagtttt atcctatgcc gtgtgctgag 120
<210> 16
<211> 120
<212> DNA
<213> Artificial Sequence (Artificial Sequence)
<400> 16
ttgaaaacac tgctgctgac tggccggaaa tgcaaacgtt ggctggtgct tttgattcag 60
accggtgggg cttccggcct cgcacggtgg ttctgcacgg aaagtcagga attgggaaat 120
<210> 17
<211> 120
<212> DNA
<213> Artificial Sequence (Artificial Sequence)
<400> 17
cggctctagc cagaaggatc gtgctgtgct gggcgcaagg tggactctac cagggaatgt 60
tctcctacgt cttcttcctc cccgttagag agatgcagcg gaagaaggag agcagtgtca 120
<210> 18
<211> 120
<212> DNA
<213> Artificial Sequence (Artificial Sequence)
<400> 18
cagagttcat ctccagggag tggccagact cccaggctcc ggtgacggag atcatgtccc 60
gaccagaaag gctgttgttc atcattgacg gtttcgatga cctgggctct gtcctcaaca 120
<210> 19
<211> 120
<212> DNA
<213> Artificial Sequence (Artificial Sequence)
<400> 19
atgacacaaa gctctgcaaa gactgggctg agaagcagcc tccgttcacc ctcatacgca 60
gtctgctgag gaaggtcctg ctccctgagt ccttcctgat cgtcaccgtc agagacgtgg 120
<210> 20
<211> 120
<212> DNA
<213> Artificial Sequence (Artificial Sequence)
<400> 20
gcacagagaa gctcaagtca gaggtcgtgt ctccccgtta cctgttagtt agaggaatct 60
ccggggaaca aagaatccac ttgctccttg agcgcgggat tggtgagcat cagaagacac 120
<210> 21
<211> 120
<212> DNA
<213> Artificial Sequence (Artificial Sequence)
<400> 21
aagggttgcg tgcgatcatg aacaaccgtg agctgctcga ccagtgccag gtgcccgccg 60
tgggctctct catctgcgtg gccctgcagc tgcaggacgt ggtgggggag agcgtcgccc 120
<210> 22
<211> 120
<212> DNA
<213> Artificial Sequence (Artificial Sequence)
<400> 22
ccttcaacca aacgctcaca ggcctgcacg ccgcttttgt gtttcatcag ctcacccctc 60
gaggcgtggt ccggcgctgt ctcaatctgg aggaaagagt tgtcctgaag cgcttctgcc 120
<210> 23
<211> 120
<212> DNA
<213> Artificial Sequence (Artificial Sequence)
<400> 23
gtatggctgt ggagggagtg tggaatagga agtcagtgtt tgacggtgac gacctcatgg 60
ttcaaggact cggggagtct gagctccgtg ctctgtttca catgaacatc cttctcccag 120
<210> 24
<211> 120
<212> DNA
<213> Artificial Sequence (Artificial Sequence)
<400> 24
ccatgagccc atgtttctat cccccctgac ataggatgcg ggataagacc ctcattgagg 60
agcagtggga agatttctgc tccatgcttg gcacccaccc acacctgcgg cagctggacc 120
<210> 25
<211> 120
<212> DNA
<213> Artificial Sequence (Artificial Sequence)
<400> 25
tgggcagcag catcctgaca gagcgggcca tgaagaccct gtgtgccaag ctgaggcatc 60
ccacctgcaa gatacagacc ctgatgtaag gctgcccgcc ccctacgaga gaatcccttc 120
<210> 26
<211> 120
<212> DNA
<213> Artificial Sequence (Artificial Sequence)
<400> 26
accagtgtcg aatgtgtctc cccttcccca ttgcaggttt agaaatgcac agattacccc 60
tggtgtgcag cacctctgga gaatcgtcat ggccaaccgt aacctaagat ccctcaactt 120
<210> 27
<211> 120
<212> DNA
<213> Artificial Sequence (Artificial Sequence)
<400> 27
gggaggcacc cacctgaagg aagaggatgt aaggatggcg tgtgaagcct taaaacaccc 60
aaaatgtttg ttggagtctt tgaggtacgt ctctggtaga gcttttgcct tgtttttctt 120
<210> 28
<211> 120
<212> DNA
<213> Artificial Sequence (Artificial Sequence)
<400> 28
gagctgggat tgcttcatgc tgcctgtctc tgcaggctgg attgctgtgg attgacccat 60
gcctgttacc tgaagatctc ccaaatcctt acgacctccc ccagcctgaa atctctgagc 120
<210> 29
<211> 120
<212> DNA
<213> Artificial Sequence (Artificial Sequence)
<400> 29
ctggcaggaa acaaggtgac agaccaggga gtaatgcctc tcagtgatgc cttgagagtc 60
tcccagtgcg ccctgcagaa gctgatgtga gtgccacttc ctttccacca ggattatcgt 120
<210> 30
<211> 120
<212> DNA
<213> Artificial Sequence (Artificial Sequence)
<400> 30
tcatcatgtc ctctctgggg ctctcttctt gcagactgga ggactgtggc atcacagcca 60
cgggttgcca gagtctggcc tcagccctcg tcagcaaccg gagcttgaca cacctgtgcc 120
<210> 31
<211> 120
<212> DNA
<213> Artificial Sequence (Artificial Sequence)
<400> 31
tatccaacaa cagcctgggg aacgaaggtg taaatctact gtgtcgatcc atgaggcttc 60
cccactgtag tctgcagagg ctgatgtgag tctggcttgc tcccctgcaa ggacttccta 120
<210> 32
<211> 120
<212> DNA
<213> Artificial Sequence (Artificial Sequence)
<400> 32
tcacgatctt ctttcccatg tgtgtgcccc acaggctgaa tcagtgccac ctggacacgg 60
ctggctgtgg ttttcttgca cttgcgctta tgggtaactc atggctgacg cacctgagcc 120
<210> 33
<211> 120
<212> DNA
<213> Artificial Sequence (Artificial Sequence)
<400> 33
ttagcatgaa ccctgtggaa gacaatggcg tgaagcttct gtgcgaggtc atgagagaac 60
catcttgtca tctccaggac ctggagtgag tttcccatgg gcgttgggtc aactctatca 120
<210> 34
<211> 120
<212> DNA
<213> Artificial Sequence (Artificial Sequence)
<400> 34
gaggcagact ctctctattc cccgcctctt gcaggttggt aaagtgtcat ctcaccgccg 60
cgtgctgtga gagtctgtcc tgtgtgatct cgaggagcag acacctgaag agcctggatc 120
<210> 35
<211> 120
<212> DNA
<213> Artificial Sequence (Artificial Sequence)
<400> 35
tcacggacaa tgccctgggt gacggtgggg ttgctgcgct gtgcgaggga ctgaagcaaa 60
agaacagtgt tctggcgaga ctcgggtaac ttcctggggc gcctctttgc gggccgggct 120
<210> 36
<211> 120
<212> DNA
<213> Artificial Sequence (Artificial Sequence)
<400> 36
ttggggttat tttctgggtg tcctgaccct gcaggttgaa ggcatgtgga ctgacttctg 60
attgctgtga ggcactctcc ttggcccttt cctgcaaccg gcatctgacc agtctaaacc 120
<210> 37
<211> 120
<212> DNA
<213> Artificial Sequence (Artificial Sequence)
<400> 37
tggtgcagaa taacttcagt cccaaaggaa tgatgaagct gtgttcggcc tttgcctgtc 60
ccacgtctaa cttacagata attgggtaag tcgccagcaa ttgtcttctg agatacagac 120
<210> 38
<211> 120
<212> DNA
<213> Artificial Sequence (Artificial Sequence)
<400> 38
ggagaagagt cgaaagcaac atctcagtaa cgagtcctct ctctgcctcc ccaggctgtg 60
gaaatggcag taccctgtgc aaataaggaa gctgctggag gaagtgcagc tactcaagcc 120
<210> 39
<211> 120
<212> DNA
<213> Artificial Sequence (Artificial Sequence)
<400> 39
ccgagtcgta attgacggta gttggcattc ttttgatgaa gatgaccggt actggtggaa 60
aaactgaaga tacggaaacc tgccccactc acacccatct gatggaggaa ctttaaacgc 120