CN116356065A - Molecular marker for breeding and identifying peanut protein and fat content and application thereof - Google Patents
- Publication number
- CN116356065A (application CN202310183234.3A)
- Authority
- CN
- China
- Prior art keywords
- peanut
- protein
- molecular marker
- arahy
- snp
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12Q—MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
- C12Q1/00—Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
- C12Q1/68—Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
- C12Q1/6876—Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes
- C12Q1/6888—Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes for detection or identification of organisms
- C12Q1/6895—Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes for detection or identification of organisms for plants, fungi or algae
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12Q—MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
- C12Q1/00—Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
- C12Q1/68—Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
- C12Q1/6844—Nucleic acid amplification reactions
- C12Q1/6858—Allele-specific amplification
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12Q—MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
- C12Q2600/00—Oligonucleotides characterized by their use
- C12Q2600/13—Plant traits
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12Q—MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
- C12Q2600/00—Oligonucleotides characterized by their use
- C12Q2600/156—Polymorphic or mutational markers
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02A—TECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
- Y02A50/00—TECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE in human health protection, e.g. against extreme weather
- Y02A50/30—Against vector-borne diseases, e.g. mosquito-borne, fly-borne, tick-borne or waterborne diseases whose impact is exacerbated by climate change
Abstract
The invention relates to a molecular marker for breeding and identifying peanut protein and fat content, and to its application. A genome-wide association analysis was used to find an important SNP locus related to quality traits. The molecular marker is SNP locus Arahy.08_49538603, which is located on peanut chromosome 8 and whose sequence is shown in SEQ ID NO. 1. The depth and breadth of the genotype data used in the association analysis exceed those of earlier studies, and the number of SNPs called is the largest to date; these abundant, high-quality SNPs safeguard the accuracy of the associated loci. The SNP locus Arahy.08_49538603 can be used directly to identify peanut progeny materials: a material with genotype AA is high-protein and low-fat, and a material with genotype CC is low-protein and high-fat.
Description
Technical Field
The invention relates to a molecular marker for breeding and identifying peanut protein and fat content and application thereof, belonging to the field of plant genetic breeding.
Background
Peanut (Arachis hypogaea L.) is an important oil crop and cash crop in China; in recent years its average total yield has exceeded 18 million tons and continues to grow. As living standards and consumption levels rise, high-quality vegetable oil is increasingly favored by consumers. To further improve the international competitiveness of Chinese peanuts and meet growing consumer demand, breeding high-quality peanut varieties has become a main goal of peanut quality breeding.
Genome-wide association analysis (GWAS) detects genetic loci and their allelic variation in natural populations on the basis of linkage disequilibrium (LD), and estimates their genetic effects by associating allelic variation with target traits; it was first applied to plant genetic research in 2001. GWAS can effectively identify loci significantly associated with peanut quality, uncover key quality-trait loci, and provide technical support for breeding new peanut varieties with different protein and fat contents.
Protein content and fat content determine the quality of peanut varieties and are quantitative traits. Peanut is an allotetraploid crop (AABB), and research on quality traits has focused mainly on chromosomes A05, A07, A08, A09, B01, B04 and B09. Sarvamangal et al. found 7 QTLs related to protein content using an RIL population of 146 families, explaining 1.5%-10.70% of the phenotypic variation, and Pandey et al. detected 78 fat QTLs using 2 RIL populations. Sun et al., using an RIL population of 318 lines, found that qA05.1 had a significant effect on fat and protein. Because mapping results differ between populations, it is important to map peanut quality-related traits widely across different populations, to discover more major-effect loci controlling quality traits, and to develop molecular markers for quality-related SNPs. This enriches the theory of high-yield, high-quality peanut breeding and provides theoretical and technical support for the efficient breeding of new high-quality peanut varieties.
Disclosure of Invention
To address the shortcomings of the prior art, the invention provides a molecular marker for breeding and identifying peanut protein and fat content, and its application. Using genome-wide association analysis, the invention identifies an important SNP locus, Arahy.08_49538603, that controls protein content and fat content, and develops a molecular marker from it. The marker can be used directly for molecular identification of peanut quality breeding materials and improves breeding efficiency.
In order to achieve the above purpose, the technical scheme adopted by the invention is as follows:
1. Plant a natural peanut population (more than 100 accessions) over multiple years and environments, survey protein content and fat content, remove erroneous values and outliers, correct for soil fertility differences using a control variety, and calculate BLUP (best linear unbiased prediction) values for protein content and fat content with a mixed linear model.
2. Sequence and assemble the peanut cultivar Kaixuan 016 de novo, resequence each material in the population with second-generation sequencing (10× depth), and detect polymorphic variant sites using Kaixuan 016 as the reference genome. The quality control criteria are: SNP missing rate in the samples Miss <= 0.2 and minor allele frequency Maf >= 0.05.
3. Perform genome-wide association analysis by combining the phenotype data and the genotype data, and identify loci significantly associated with peanut protein content and fat content.
4. Perform summary statistics, analysis of phenotypic variation explained, and linkage block analysis on the significant SNPs, narrowing the result to SNP locus Arahy.08_49538603.
5. Extract the genotypes of all materials in the population at this locus and analyze the significant locus with box plots: high-protein, low-fat materials have genotype AA, and low-protein, high-fat materials have genotype CC.
6. Validate the genotyping with the KASP (competitive allele-specific PCR) technique and develop a molecular marker for protein content and fat content.
7. Use the molecular marker to identify breeding materials, so that high-protein low-fat and high-fat low-protein peanut materials can be screened rapidly and the efficiency of high-quality peanut breeding is improved.
The invention provides a molecular marker for breeding and identifying peanut protein and fat content, which is SNP locus Arahy.08_49538603 and is positioned on peanut chromosome 8; the 200bp sequences before and after the SNP locus Arahy.08_49538603 are shown as SEQ ID NO. 1.
The KASP primer set for the molecular marker is as follows:
primer_X: GAAGGTGACCAAGTTCATGCTTCTTCTCTGATTCCTCATTGAAAATGTT;
primer_Y: GAAGGTCGGAGTCAACGGATTCTTCTCTGATTCCTCATTGAAAATGTG;
primer_C: CCCTAATAGATAAAATCAGCTAAATATTTAAGTATTC.
A detection reagent or kit containing the KASP primer set.
The method for identifying peanut protein and fat content using the KASP primer set of the molecular marker comprises the following steps:
(1) Extracting DNA of peanut materials to be identified, and carrying out PCR identification by using a KASP primer group of molecular markers;
(2) If the genotype of the molecular marker Arahy.08_49538603 site is AA, the peanut material to be identified is a high-protein and low-fat material; if the genotype of the molecular marker Arahy.08_49538603 site is CC, the peanut material to be identified is a low-protein and high-fat material.
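The interpretation rule in step (2) can be sketched as a small lookup. This is only an illustration: the function name `classify_material` and the labels it returns are hypothetical, and the handling of heterozygous or failed calls is an assumption, since the text assigns a class only to the AA and CC genotypes.

```python
from typing import Optional

# Mapping taken from the text: AA -> high protein / low fat,
# CC -> low protein / high fat.
CALLS = {
    "AA": "high-protein, low-fat",
    "CC": "low-protein, high-fat",
}


def classify_material(genotype: Optional[str]) -> str:
    """Return the quality class for a KASP call at Arahy.08_49538603."""
    if genotype is None:
        return "no call"
    call = genotype.strip().upper()
    # Assumption: anything other than the two homozygotes (for example a
    # heterozygous AC call) is reported as undetermined.
    return CALLS.get(call, "undetermined")


if __name__ == "__main__":
    for g in ("AA", "CC", "AC", None):
        print(g, "->", classify_material(g))
```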
The PCR reaction program is as follows: a) 94 ℃ for 15 min; b) 10 touchdown cycles of 94 ℃ for 20 s and 61 ℃ for 60 s, with the annealing temperature decreasing by 0.6 ℃ per cycle; c) 26 cycles of 94 ℃ for 20 s and 55 ℃ for 60 s; d) 3 cycles of 94 ℃ for 20 s and 57 ℃ for 60 s.
The molecular marker is applied to the identification of peanut protein and fat content breeding.
The invention has the beneficial effects that:
1. The invention identifies an important SNP locus related to quality traits by genome-wide association analysis. The depth and breadth of the genotype data used in the association analysis exceed those of earlier studies, and the number of SNPs called is the largest, reaching 631,988; these abundant, high-quality SNPs safeguard the accuracy of the associated loci.
2. The 199 materials are all derived from the peanut cultivar Kaixuan 016. Kaixuan 016 was sequenced and assembled de novo, and the association analysis used it as the reference genome, which makes the trait loci easier to detect.
3. The invention validates the significant SNP loci with high significance (P value) and a PVE (phenotypic variation explained) above 8%, and identifies marker loci that are unique to the materials studied. On the one hand, the genotypes of materials with extreme phenotypes were used to examine the genotype distribution (box plots); on the other hand, one set of KASP (competitive allele-specific PCR) primers was designed for the significant locus with clear genotype separation and used to genotype the 199 materials in this study.
4. The SNP locus Arahy.08_49538603 can be used directly to identify peanut progeny materials: a material with genotype AA is high-protein and low-fat, and a material with genotype CC is low-protein and high-fat.
Drawings
FIG. 1 shows a normal distribution diagram of protein content and fat content.
The abscissa shows PC (protein content) and OC (fat content); the ordinate is the frequency of the phenotypic values. E1 is the Kaifeng test site in 2019; E2 is the Xinyang test site in 2019; E3 is the Kaifeng test site in 2020; E4 is the Kaifeng test site in 2021.
FIG. 2 Density distribution of SNPs on peanut chromosomes.
The figure shows the distribution of SNP sites in 1 Mb windows; chr1-20 are the 20 peanut chromosomes; the color scale on the right indicates the SNP density on the chromosomes.
FIG. 3 Manhattan and QQ plot of protein content in 4 environments.
The Manhattan plots are on the left, where PC is protein content, Chromosome is chromosomes 1-20, and the horizontal dashed line is the significance threshold. E1 is the Kaifeng test site in 2019; E2 is the Xinyang test site in 2019; E3 is the Kaifeng test site in 2020; E4 is the Kaifeng test site in 2021. The QQ plots are on the right; the abscissa is the theoretical P value and the ordinate is the observed P value.
FIG. 4 Manhattan and QQ plot of fat content in 4 environments.
The Manhattan plots are on the left, where OC is fat content, Chromosome is chromosomes 1-20, and the horizontal dashed line is the significance threshold. E1 is the Kaifeng test site in 2019; E2 is the Xinyang test site in 2019; E3 is the Kaifeng test site in 2020; E4 is the Kaifeng test site in 2021. The QQ plots are on the right; the abscissa is the theoretical P value and the ordinate is the observed P value.
FIG. 5 Block linkage diagram of Arahy.08_49538603.
The blue horizontal bar is the 218.56 kb region of chromosome 8, the green bars above it are SNPs, and the mauve dot is the Arahy.08_49538603 locus, which forms one larger block with 17 other SNPs; the SNPs within the block are inherited in close linkage.
FIG. 6 Phenotypic differences between the two genotypes at Arahy.08_49538603.
PC is protein content; OC is fat content; the abscissa CC/AA shows the two genotypes at Arahy.08_49538603; the ordinate is the observed phenotypic value. E1 is the Kaifeng test site in 2019; E2 is the Xinyang test site in 2019; E3 is the Kaifeng test site in 2020; E4 is the Kaifeng test site in 2021.
FIG. 7 KASP verification of SNP typing at Arahy.08_49538603.
The circular signals at the upper left are the CC genotype; the circular signals at the lower right are the AA genotype; the remaining points are blank controls and samples with no detected signal.
Detailed Description
The following describes the embodiments of the present invention in further detail with reference to examples.
Information on the 199 peanut materials used in this experiment is shown in Table 1. The peanut varieties or lines from Kaifeng were bred by the Kaifeng Academy of Agriculture and Forestry; the Jihua series varieties from Shijiazhuang were provided by the Hebei Academy of Agriculture and Forestry Sciences; the Zhonghua series varieties from Wuhan were provided by the Oil Crops Research Institute of the Chinese Academy of Agricultural Sciences; and K198 (AT 1-1) was introduced from Georgia, USA.
Table 1 Information on the 199 peanut materials
Example 1 phenotypic data processing
1. Design of field test
The 199 materials were planted in test fields at Kaifeng in 2019 (E1), Xinyang in 2019 (E2), Kaifeng in 2020 (E3) and Kaifeng in 2021 (E4). All 4 test environments used a randomized block design with 3 replicates; each plot was 13.34 m² (6.67 m × 2 m), with a hole spacing of 20 cm and a row spacing of 40 cm. The test fields were of suitable fertility, with convenient drainage and irrigation, flat terrain, and sandy loam soil. Field management and harvesting were carried out on schedule during peanut growth.
2. Agronomic trait investigation and quality determination
After harvesting and sun-drying, quality was measured with a Perten DA7250 near-infrared analyzer, and the protein content (PC) and fat content (oil content, OC) of the 199 materials were recorded.
3. Phenotype data processing
The phenotype data (PC, OC) were sorted and processed in Microsoft Excel 2010; erroneous values and outliers were deleted so that the data fit a normal distribution (FIG. 1). BLUP (best linear unbiased prediction) values for each trait were calculated from the 3 replicates per environment using a mixed linear model in Genstat 18th Edition.
Example 2 genotype data processing
Genomic DNA was extracted from young leaves at the seedling stage using a plant genomic DNA kit. DNA integrity and quality were assessed by agarose gel electrophoresis and NanoDrop to ensure that the requirements for genome sequencing and library construction were met.
1. Sequencing and assembly of the reference genome Kaixuan 016
The sequencing and assembly method is as follows:
1. Third-generation data: third-generation sequencing was performed on the PacBio Sequel II platform, with a required sequencing depth of at least 100×.
2. Second-generation Illumina data: second-generation sequencing was performed on the Illumina NovaSeq PE150 platform, with a required sequencing depth of at least 100×, Q20 of at least 90%, and Q30 of at least 85%.
3. Hi-C data: depending on the species, a four-base or six-base cutter enzyme was selected to construct the Hi-C library, with a required sequencing depth of at least 100×, Q20 of at least 85%, and Q30 of at least 80%.
Sequencing assembly results were as follows:
1. Third-generation sequencing of Kaixuan 016 yielded 297.92 Gb at a depth of 109.77×; together with the second-generation sequencing, a total of 549.80 Gb of sequencing data was obtained.
2. A genome survey by k-mer analysis (k = 17) estimated a genome size of 2,703.87 Mbp (2,686.33 Mbp after correction), a heterozygosity rate of 0.13%, and a repeat-sequence content of 84.15%.
3. De novo assembly of the peanut genome gave the following results: the total contig length is 2.53 Gbp with a contig N50 of 11.48 Mbp; the total scaffold length is 2.53 Gbp with a scaffold N50 of 11.48 Mbp.
4. The sequences were anchored to chromosomes using the Hi-C data to obtain a chromosome-level genome.
5. Assembly quality was assessed in terms of consistency, sequence integrity, EST sequences, RNA sequences, CEGMA and BUSCO.
About 99.65% of all short-fragment reads mapped to the genome, with a coverage of about 99.80%, showing good consistency between the reads and the assembled genome. The assembly recovered 99.2% of the 1614 orthologous single-copy genes as complete, indicating a complete assembly. Of 248 CEGs (Core Eukaryotic Genes), 241 (97.18%) were assembled, again indicating a complete assembly.
2. Resequencing of the 199 peanut materials
The 199 materials were resequenced at 10× depth on an Illumina second-generation sequencing platform, and the sequencing data were quality-controlled to retain high-quality SNPs. The quality control criteria were: SNP missing rate in the samples Miss <= 0.2 and minor allele frequency Maf >= 0.05. SNPs were called using the peanut cultivar Kaixuan 016 as the reference genome, yielding 631,988 SNPs in total. This is currently the largest number of high-quality SNP loci used in a peanut association analysis, which is closely related to the genetic diversity among the 199 materials.
As can be seen from FIG. 2, chromosome 3 carries the most SNPs (48,821), followed by chromosome 11 (43,292); chromosome 8 carries the fewest (13,143), followed by chromosome 10 (13,848). The average SNP density across the chromosomes is 251.71 per Mb.
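The two quality control criteria (Miss <= 0.2, Maf >= 0.05) can be illustrated with a minimal sketch. The genotype coding (0, 1 or 2 copies of the alternate allele, `None` for a missing call) and all function names are assumptions made for this example; the text does not state which software applied the filter.

```python
from typing import Optional, Sequence

# Assumed coding: 0, 1, 2 = copies of the alternate allele; None = missing.
Genotype = Optional[int]


def missing_rate(calls: Sequence[Genotype]) -> float:
    """Fraction of samples with no genotype call at this SNP."""
    if not calls:
        raise ValueError("no samples")
    return sum(c is None for c in calls) / len(calls)


def minor_allele_frequency(calls: Sequence[Genotype]) -> float:
    """Frequency of the rarer allele among the called samples."""
    observed = [c for c in calls if c is not None]
    if not observed:
        return 0.0
    alt_freq = sum(observed) / (2 * len(observed))
    return min(alt_freq, 1.0 - alt_freq)


def passes_qc(calls: Sequence[Genotype],
              max_missing: float = 0.2,
              min_maf: float = 0.05) -> bool:
    """Apply the thresholds quoted in the text to one SNP."""
    return (missing_rate(calls) <= max_missing
            and minor_allele_frequency(calls) >= min_maf)


if __name__ == "__main__":
    print(passes_qc([0, 0, 2, 2, None]))   # balanced alleles, 20% missing
    print(passes_qc([0] * 99 + [2]))       # rare alternate allele
```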
Example 3 Genome-wide association analysis
1. Detection of significant loci
Genome-wide association analysis was performed with GEMMA (Genome-wide Efficient Mixed Model Association) version 0.94.1, using the model y = Xα + Sβ + Kμ + e, where y is the phenotype (phenotype data obtained in Example 1), X is the genotype (genotype data obtained in Example 2), S is the fixed-factor intercept in the model, and K is the kinship matrix calculated from the SNPs. Xα and Sβ represent the fixed effects, and Kμ and e represent the random effects. Using the Bonferroni correction, the genome-wide significance threshold was set to -lg(0.05/631988) = 7.10, and Manhattan and QQ plots were produced for PC and OC (FIGS. 3-4). As the figures show, in all 4 environments both protein content and fat content produced a distinct signal on chromosome 8, indicating that the locus (gene) controlling these quality traits is located on chromosome 8.
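The Bonferroni threshold quoted above can be reproduced with a short calculation. The function names are hypothetical; only the significance level 0.05 and the SNP count 631,988 come from the text.

```python
import math


def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Return -log10(alpha / n_tests), the Bonferroni-corrected cutoff."""
    return -math.log10(alpha / n_tests)


def is_significant(p_value: float, alpha: float = 0.05,
                   n_tests: int = 631_988) -> bool:
    """True if -log10(p) reaches the corrected threshold."""
    return -math.log10(p_value) >= bonferroni_threshold(alpha, n_tests)


if __name__ == "__main__":
    threshold = bonferroni_threshold(0.05, 631_988)
    print(f"threshold = {threshold:.2f}")
```

A SNP is then treated as significant when its -lg(P) lies on or above the horizontal dashed line in the Manhattan plots.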
2. Summary statistics of SNP loci and analysis of phenotypic variation explained
The significant SNP loci in the association results were summarized (Table 2). For the 2 quality traits, 44, 63, 38 and 24 SNP loci were detected in the 4 environments, respectively, mainly on chromosome 8; in total, 154 non-redundant associated loci and 13 repeatedly detected loci were identified.
Table 2 Number of significant SNPs for the quality traits in the four environments
The loci of each trait were analyzed statistically (Table 3): 10 SNP loci were associated with both protein content and fat content, and the quality-trait loci were concentrated mainly on chromosome 8. The phenotypic variation explained by these loci was analyzed in R; Arahy.08_49538603 had the highest value across environments, 14.06%.
Table 3 SNPs associated with both quality traits
| Chromosome | Position | Reference allele | Alternate allele | Trait (number of environments in which a signal was detected) |
| 8 | 38378278 | C | T | PC(1),OC(1) |
| 8 | 44879304 | C | T | PC(1),OC(1) |
| 8 | 48994448 | T | C | PC(1),OC(1) |
| 8 | 49296302 | A | G | PC(1),OC(1) |
| 8 | 49338631 | A | G | PC(1),OC(1) |
| 8 | 49385915 | T | C | PC(1),OC(1) |
| 8 | 49538603 | A | C | PC(2),OC(1) |
| 8 | 49587942 | T | C | PC(1),OC(1) |
| 16 | 104322766 | G | T | PC(1),OC(1) |
| 16 | 107768318 | G | T | PC(1),OC(1) |
3. Block analysis of the SNP locus
An LD haplotype block analysis was performed with LDBlockShow 1.40 on the region 115 kb upstream and 115 kb downstream of Arahy.08_49538603 (the LD decay half-distance of the population is 115 kb) to look for blocks. The results show one larger and one smaller block (black triangles) in the 218.56 kb region of chromosome 8. Arahy.08_49538603 (purple dot) lies in the larger block on the left, which contains 18 SNPs in total; Arahy.08_49538603 and the 17 nearby SNPs are in strong linkage disequilibrium and form a haplotype (FIG. 5). This rules out a false positive at the significant locus and indicates high reliability.
The 200 bp sequences before and after SNP locus Arahy.08_49538603 are shown as SEQ ID NO. 1.
CTGATTGAAACCTGTTTCTTACTCAATCAAGTCATCAAATTAGAATTCATGTAGACACACTAACCACAAGTGAATCGTTTGTCATCAAGATCAGATGTCCAATCATAATCTGAAAAGGCAAACAAGTGAAAATCAGTGCTAGAATGAAAAATTAACCCCCTAATAGATAAAATCAGCTAAATATTTAAGTATTCTCTTCAACATTTTCAATGAGGAATCAGAGAAGAGTGCATAAATTAGTTGACTTTATTAACAGATTATGCAATTTCAGGTCTCATAAAAGTGACATATTGTAAAGATCCTATAATTGATCTATAAGAGTTCTTAAACAACTCAACCTTTTGACTTGTCAACTTAGTGGTGGAGACCATTGGTGTTGAAACACATTTCGAATGAGCCA (SEQ ID NO.1, underlined is the position of this site).
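How the three KASP primers listed in this document relate to SEQ ID NO. 1 can be checked with a short sketch. The `EXCERPT` string is a window copied from SEQ ID NO. 1 around the marker; the 21-nt 5' tails are assumed to be the standard KASP FAM and HEX reporter tails, and the function names are hypothetical. The check suggests that primer_C lies on the strand shown, while primer_X and primer_Y bind the opposite strand, with 3' terminal bases T and G that pair with A and C, matching the AA and CC genotypes.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

# Window copied from SEQ ID NO. 1 around the marker (excerpt only).
EXCERPT = ("CCCTAATAGATAAAATCAGCTAAATATTTAAGTATTC"
           "TCTTCAACATTTTCAATGAGGAATCAGAGAAGAGTGCATAAA")

# Assumption: standard KASP 5' reporter tails (21 nt each).
FAM_TAIL = "GAAGGTGACCAAGTTCATGCT"
HEX_TAIL = "GAAGGTCGGAGTCAACGGATT"

PRIMER_X = "GAAGGTGACCAAGTTCATGCTTCTTCTCTGATTCCTCATTGAAAATGTT"
PRIMER_Y = "GAAGGTCGGAGTCAACGGATTCTTCTCTGATTCCTCATTGAAAATGTG"
PRIMER_C = "CCCTAATAGATAAAATCAGCTAAATATTTAAGTATTC"


def reverse_complement(seq: str) -> str:
    """Reverse complement of an upper-case DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def target_specific_part(primer: str, tail: str) -> str:
    """Strip the reporter tail and return the locus-specific part."""
    if not primer.startswith(tail):
        raise ValueError("primer does not start with the expected tail")
    return primer[len(tail):]


def detected_allele(primer: str) -> str:
    """Allele on the displayed strand paired with the primer's 3' base."""
    return reverse_complement(primer[-1])


if __name__ == "__main__":
    x_part = target_specific_part(PRIMER_X, FAM_TAIL)
    print("primer_C on displayed strand:", PRIMER_C in EXCERPT)
    print("primer_X on opposite strand:",
          reverse_complement(x_part) in EXCERPT)
    print("primer_X detects:", detected_allele(PRIMER_X))
    print("primer_Y detects:", detected_allele(PRIMER_Y))
```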
Example 4 Validation of the associated locus
1. Box plot verification
Significant loci with high significance (P value) and a PVE above 8% were verified with box plots drawn in R. The box plots used 40 materials with extreme phenotypes (protein content ≥ 25% or ≤ 23.6%; oil content ≥ 50% or ≤ 49%). Different protein and fat content levels (phenotypic values on the ordinate) correspond to different genotypes at locus Arahy.08_49538603: the high-protein genotype is AA and the low-protein genotype is CC, while the high-fat genotype is CC and the low-fat genotype is AA. Protein content and fat content are therefore inversely related (FIG. 6), consistent with previous studies. Thus, when the genotype at Arahy.08_49538603 in a peanut material is AA, the material is high-protein and low-fat; when the genotype is CC, the material is low-protein and high-fat.
2. Genotype verification and molecular marker development
We extracted the sequence spanning 100 bp upstream and downstream of the Arahy.08_49538603 locus, designed KASP (competitive allele-specific PCR) markers, and used them for amplification, sequencing and genotyping in the population of 199 materials. The results show that the AA genotype materials cluster together and the CC genotype materials cluster together, so Arahy.08_49538603 gives clear genotype calls (Figure 7). This further verifies that the locus is an SNP locus controlling peanut quality traits and that the designed molecular marker can be used directly for quality identification of peanut materials. The DNA of a peanut sample is amplified and genotyped with the following KASP primers: if the genotype at the Arahy.08_49538603 locus is AA, the sample is a high-protein, low-fat material; if the genotype is CC, the sample is a low-protein, high-fat material.
The KASP primer sequences for molecular marker Arahy.08_49538603 are:
primer_X:GAAGGTGACCAAGTTCATGCTTCTTCTCTGATTCCTCATTGAAAATGTT(SEQ ID NO.2);
primer_Y:GAAGGTCGGAGTCAACGGATTCTTCTCTGATTCCTCATTGAAAATGTG(SEQ ID NO.3);
primer_C:CCCTAATAGATAAAATCAGCTAAATATTTAAGTATTC(SEQ ID NO.4).
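As a consistency check on the three primers, the following Python sketch splits each allele-specific primer into its 5' tail and its target-specific part, and locates the binding sites in an excerpt of SEQ ID NO.1. It rests on two assumptions not stated in the text: that the 21-nt tails are the commonly used KASP FAM and HEX reporter tails, and that the two allele-specific primers anneal to the strand complementary to SEQ ID NO.1 as printed. The helper names are hypothetical.

```python
# Assumption: these are the commonly used KASP reporter tails (FAM / HEX).
FAM_TAIL = "GAAGGTGACCAAGTTCATGCT"
HEX_TAIL = "GAAGGTCGGAGTCAACGGATT"

PRIMER_X = "GAAGGTGACCAAGTTCATGCTTCTTCTCTGATTCCTCATTGAAAATGTT"  # SEQ ID NO.2
PRIMER_Y = "GAAGGTCGGAGTCAACGGATTCTTCTCTGATTCCTCATTGAAAATGTG"   # SEQ ID NO.3
PRIMER_C = "CCCTAATAGATAAAATCAGCTAAATATTTAAGTATTC"              # SEQ ID NO.4

# Excerpt of SEQ ID NO.1 around the SNP (carries the A allele as printed).
FLANK = ("CCCTAATAGATAAAATCAGCTAAATATTTAAGTATTCTCTTC"
         "AACATTTTCAATGAGGAATCAGAGAAGAGTGC")

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def describe(primer: str) -> dict:
    """Split off the reporter tail and report the allele read on SEQ ID NO.1."""
    for dye, tail in (("FAM", FAM_TAIL), ("HEX", HEX_TAIL)):
        if primer.startswith(tail):
            site = revcomp(primer[len(tail):])  # binding site on SEQ ID NO.1
            return {"dye": dye, "allele": site[0], "site": site}
    raise ValueError("primer does not start with a known KASP tail")


def amplicon_length(template: str, common: str, site: str) -> int:
    """Length from the start of the common primer to the end of the site."""
    start, hit = template.find(common), template.find(site)
    if start < 0 or hit < 0:
        raise ValueError("primer site not found in template")
    return hit + len(site) - start


if __name__ == "__main__":
    for name, primer in (("primer_X", PRIMER_X), ("primer_Y", PRIMER_Y)):
        info = describe(primer)
        print(name, info["dye"], "detects allele", info["allele"])
```

Under these assumptions, primer_X reports the A allele and primer_Y the C allele, which matches the AA and CC genotypes discussed in the text.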
The PCR program was: a) 94 ℃ for 15 min; b) 10 touchdown cycles of 94 ℃ for 20 s and 61 ℃ for 60 s, with the annealing temperature decreasing by 0.6 ℃ per cycle; c) 26 cycles of 94 ℃ for 20 s and 55 ℃ for 60 s; d) 3 cycles of 94 ℃ for 20 s and 57 ℃ for 60 s.
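The touchdown step is easier to follow when the program is expanded cycle by cycle. The Python sketch below does this under one assumption: the first touchdown cycle anneals at 61 ℃ and the 0.6 ℃ decrement is applied from the second cycle onward. The function name is hypothetical, and ramp times between steps are ignored.

```python
def kasp_cycles():
    """Expand steps b) to d) into (denature_C, denature_s, anneal_C, anneal_s)."""
    cycles = []
    # b) 10 touchdown cycles, annealing temperature falling 0.6 C per cycle
    for i in range(10):
        cycles.append((94.0, 20, round(61.0 - 0.6 * i, 1), 60))
    # c) 26 cycles at a fixed 55 C annealing temperature
    cycles.extend([(94.0, 20, 55.0, 60)] * 26)
    # d) 3 additional cycles at 57 C
    cycles.extend([(94.0, 20, 57.0, 60)] * 3)
    return cycles


def total_hold_seconds(cycles, initial_denaturation_s=15 * 60):
    """Sum of programmed hold times, including step a) (94 C for 15 min)."""
    return initial_denaturation_s + sum(d + a for _, d, _, a in cycles)


if __name__ == "__main__":
    program = kasp_cycles()
    for number, (_, _, anneal, _) in enumerate(program[:10], start=1):
        print(f"touchdown cycle {number}: anneal at {anneal} C")
    print("cycles in total:", len(program))
    print("programmed hold time (min):", total_hold_seconds(program) / 60)
```

With this reading, the last touchdown cycle anneals at 55.6 ℃, so step c) continues from roughly where the touchdown ends.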
The reaction system: DNA (20-80 ng/μL) 5 μL; 2× KASP Master Mix 5 μL; KASP primer mix (50 μmol/L) 0.14 μL; ddH2O 3 μL.
The 2× KASP Master Mix is a universal kit from LGC (Laboratory of the Government Chemist) that is suitable for all KASP assays and was used according to the product instructions.
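When many samples are genotyped at once, the DNA-free components are usually pooled and dispensed before the template is added. The following Python sketch scales the per-reaction volumes given above to a plate. Pooling the components, the 10% pipetting overage and the function name are assumptions of this sketch and are not taken from the text.

```python
# Per-reaction volumes in microlitres, as listed in the reaction system.
DNA_UL = 5.0
POOLED_COMPONENTS_UL = {
    "2x KASP Master Mix": 5.0,
    "KASP primer mix": 0.14,
    "ddH2O": 3.0,
}


def pooled_mix(n_reactions: int, overage: float = 0.10) -> dict:
    """Volumes (uL) of the DNA-free mix for n reactions plus an overage."""
    if n_reactions <= 0:
        raise ValueError("n_reactions must be positive")
    factor = n_reactions * (1.0 + overage)
    return {name: round(vol * factor, 2)
            for name, vol in POOLED_COMPONENTS_UL.items()}


if __name__ == "__main__":
    per_reaction = round(DNA_UL + sum(POOLED_COMPONENTS_UL.values()), 2)
    print("total volume per reaction (uL):", per_reaction)
    for name, volume in pooled_mix(96).items():
        print(f"{name}: {volume} uL for 96 reactions")
```

Each well would then receive 8.14 μL of the pooled mix and 5 μL of DNA, giving the same per-reaction composition as listed.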
Claims (6)
1. A molecular marker for breeding and identifying peanut protein and fat content, wherein the molecular marker is the SNP locus Arahy.08_49538603, located on peanut chromosome 8; the sequence spanning 200 bp upstream and downstream of the SNP locus Arahy.08_49538603 is shown as SEQ ID NO. 1.
2. A set of KASP primers for use in detecting the molecular marker of claim 1, wherein said set of primers is:
primer_X:GAAGGTGACCAAGTTCATGCTTCTTCTCTGATTCCTCATTGAAAATGTT;
primer_Y:GAAGGTCGGAGTCAACGGATTCTTCTCTGATTCCTCATTGAAAATGTG;
primer_C:CCCTAATAGATAAAATCAGCTAAATATTTAAGTATTC.
3. A detection reagent or kit comprising the KASP primer set of claim 2.
4. A method for identifying peanut protein and fat content using the KASP primer set of claim 2 for the molecular marker, comprising the steps of:
(1) Extracting DNA from the peanut material to be identified, and carrying out PCR identification using the KASP primer set for the molecular marker;
(2) If the genotype of the molecular marker Arahy.08_49538603 site is AA, the peanut material to be identified is a high-protein and low-fat material; if the genotype of the molecular marker Arahy.08_49538603 site is CC, the peanut material to be identified is a low-protein and high-fat material.
5. The method of claim 4, wherein the PCR reaction procedure is:
a) 94 ℃ for 15 min; b) 10 touchdown cycles of 94 ℃ for 20 s and 61 ℃ for 60 s, with the annealing temperature decreasing by 0.6 ℃ per cycle; c) 26 cycles of 94 ℃ for 20 s and 55 ℃ for 60 s; d) 3 cycles of 94 ℃ for 20 s and 57 ℃ for 60 s.
6. Use of the molecular marker of claim 1 in breeding for identifying peanut protein and fat content.
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202310183234.3A CN116356065B (en) | 2023-03-01 | 2023-03-01 | Molecular marker for breeding and identifying peanut protein and fat content and application thereof |
Publications (2)
| Publication Number | Publication Date |
|---|---|
| CN116356065A (en) | 2023-06-30 |
| CN116356065B (en) | 2023-10-13 |
Family
ID=86940607
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| CN202310183234.3A Active CN116356065B (en) | 2023-03-01 | 2023-03-01 | Molecular marker for breeding and identifying peanut protein and fat content and application thereof |
Country Status (1)
| Country | Link |
|---|---|
| CN (1) | CN116356065B (en) |
Cited By (1)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN118773362A (en) * | 2024-07-01 | 2024-10-15 | 辽宁省沙地治理与利用研究所 | A molecular marker related to peanut protein content and related applications |
Citations (4)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US20090075829A1 (en) * | 2000-03-29 | 2009-03-19 | Bush David F | Plant polymorphic markers and uses thereof |
| CN112094937A (en) * | 2020-09-27 | 2020-12-18 | 中国农业科学院油料作物研究所 | SNP Molecular Markers Associated with Pod and Seed Size on Peanut A06 Chromosome and Its Application |
| CN112626260A (en) * | 2021-01-15 | 2021-04-09 | 中国农业科学院油料作物研究所 | Molecular marker linked with peanut kernel weight major QTL (quantitative trait locus) and application thereof |
| CN113897450A (en) * | 2021-10-08 | 2022-01-07 | 山东省花生研究所 | Linkage molecular marker, primer composition, identification method and application of peanut multi-kernel pod number major effect site |
Non-Patent Citations (1)
| Title |
|---|
| Yan Mei et al.: "Association analysis of quality traits of major peanut cultivars in China", Journal of Plant Genetic Resources, vol. 14, no. 6, pages 1064 - 1071 * |
Also Published As
| Publication number | Publication date |
|---|---|
| CN116356065B (en) | 2023-10-13 |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| CN109735652B (en) | Wheat stripe rust resistant gene QYr.nwafu-6BL.2 linked KASP molecular marker, primer and application | |
| CN118755873B (en) | A molecular marker related to resistance to corn gray leaf spot and its application | |
| De Souza et al. | Linkage disequilibrium and population structure in wild and cultivated populations of rubber tree (Hevea brasiliensis) | |
| CN117887885B (en) | Soybean oil content-related major single nucleotide polymorphism site and application thereof | |
| CN116004898A (en) | Peanut 40K liquid-phase SNP chip PeannitGBTS 40K and application thereof | |
| CN119842971B (en) | KASP molecular markers, primer pairs, kits and their applications for identifying dwarf traits in melon | |
| CN116479164A (en) | SNP sites, molecular markers, amplification primers and their applications related to soybean hundred-grain weight and size | |
| CN116287393B (en) | SNP (Single nucleotide polymorphism) marker related to peanut yield traits and application thereof | |
| CN116356065B (en) | Molecular marker for breeding and identifying peanut protein and fat content and application thereof | |
| CN118600072A (en) | A KASP molecular marker primer for identifying rice grain length and its application | |
| CN117106965A (en) | Wheat spike length related molecular marker and application thereof | |
| CN116240307B (en) | Molecular marker for high-yield breeding identification of peanuts and application thereof | |
| CN116254364B (en) | SNP (Single nucleotide polymorphism) marker related to peanut fat content traits and application thereof | |
| CN104789648B (en) | Identify molecular labeling and its application of the section haplotypes of rice CMS restoring genes Rf 1 | |
| CN112760399B (en) | A major QTL locus controlling grain length in wheat and its tightly linked KASP primers and applications | |
| CN118360292A (en) | Gene and KASP (kaSP) mark related to melon seedling stem color and application thereof | |
| CN115948591B (en) | Identification of corn seedling drought tolerance related monomer ZmC10.HapDR and application thereof | |
| CN107447022B (en) | SNP molecular marker for predicting corn heterosis and application thereof | |
| CN109593871A (en) | Corn KASP molecular labeling for distinguishing Heterosis of Maize Hybrid group combines and its development approach and application | |
| CN118957132B (en) | Application of Zm00001d012005 gene in regulating starch content in corn kernels | |
| Takele et al. | Genetic Diversity and Population Structure of Sorghum mutant genotypes revealed through genetic Characterization | |
| CN111100946A (en) | Molecular marker primer of rape grain weight character major gene locus and application | |
| CN116622874B (en) | A molecular marker for low-temperature male sterility in Brassica napus and its primer pair and application | |
| CN117248061B (en) | InDel locus related to soybean seed oil content, molecular marker, primer and application thereof | |
| CN119753223B (en) | SNP molecular markers linked to the major QTL for wax gourd flesh thickness and their application |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| PB01 | Publication | ||
| SE01 | Entry into force of request for substantive examination | ||
| GR01 | Patent grant | ||