CN116356065A

CN116356065A - Molecular marker for breeding and identifying peanut protein and fat content and application thereof

Info

Publication number: CN116356065A
Application number: CN202310183234.3A
Authority: CN
Inventors: 邓丽; 任丽; 苗建利; 王培云; 李阳; 殷君华; 郭敏杰; 芦振华; 李绍伟; 胡俊平; 谷建中; 姚潜; 申卫国; 蔡君玲; 李传强
Original assignee: Kaifeng Academy Of Agriculture And Forestry
Current assignee: Kaifeng Academy Of Agriculture And Forestry
Priority date: 2023-03-01
Filing date: 2023-03-01
Publication date: 2023-06-30
Anticipated expiration: 2043-03-01
Also published as: CN116356065B

Abstract

The invention relates to a molecular marker for breeding and identifying peanut protein and fat content, and to its application. Genome-wide association analysis is used to find an important SNP locus related to quality traits. The molecular marker is SNP locus Arahy.08_49538603, which is located on peanut chromosome 8, and the sequence of the SNP locus is shown as SEQ ID NO. 1. The depth and breadth of the genotype data used in the association analysis exceed those of previous studies and the number of SNPs called is the largest to date; these abundant, high-quality SNPs underpin the accuracy of the associated loci. The SNP locus Arahy.08_49538603 can be used directly to identify peanut progeny materials: materials with genotype AA are high-protein, low-fat materials, and materials with genotype CC are low-protein, high-fat materials.

Description

Molecular marker for breeding and identifying peanut protein and fat content and application thereof

Technical Field

The invention relates to a molecular marker for breeding and identifying peanut protein and fat content and application thereof, belonging to the field of plant genetic breeding.

Background

Peanut (Arachis hypogaea L.) is an important oil crop and cash crop in China. In recent years its average total annual output has exceeded 18 million tons and has continued to grow. As living standards rise and consumption upgrades, high-quality vegetable oil is increasingly favored by consumers. To further improve the international competitiveness of Chinese peanuts and meet growing consumer demand, breeding high-quality peanut varieties has become a main goal of peanut quality breeding.

Genome-wide association analysis is a method, based on linkage disequilibrium (LD), for detecting genetic loci and their allelic variation in natural populations and for analyzing their genetic effects by associating the allelic variation with target traits. It was first applied to plant genetic research in 2001. Genome-wide association analysis can effectively identify significant loci controlling peanut quality and uncover key quality-trait loci, providing technical support for breeding new peanut varieties with different protein contents and different fat contents.

Protein content and fat content determine the quality of peanut varieties and are quantitative traits. Peanut is an allotetraploid crop (AABB), and research on quality traits has mainly focused on chromosomes A05, A07, A08, A09, B01, B04 and B09. Sarvamangala et al. found 7 QTLs related to protein content using a RIL population of 146 families, explaining 1.5%-10.70% of the phenotypic variation. Pandey et al. detected 78 fat-content QTLs using 2 RIL populations. Sun et al., using a population of 318 RILs, found that qA05.1 had a significant effect on fat and protein. Because mapping results differ between populations, it is important to map peanut quality-related traits broadly in different populations, to discover more major-effect loci controlling quality traits, and to develop molecular markers for quality-related SNPs. This enriches the theory of high-yield, high-quality peanut breeding and provides theoretical and technical support for efficient breeding of new high-quality peanut varieties.

Disclosure of Invention

In view of the shortcomings of the prior art, the invention aims to provide a molecular marker for breeding and identifying peanut protein and fat content and its application. The invention uses genome-wide association analysis to obtain an important SNP locus, Arahy.08_49538603, that controls protein content and fat content, and develops a molecular marker that can be used directly for molecular identification of peanut quality breeding materials, improving breeding efficiency.

In order to achieve the above purpose, the technical scheme adopted by the invention is as follows:

1. Plant a natural peanut population (more than 100 accessions) over multiple years, measure the protein content and fat content traits, remove erroneous values and outliers, correct for field fertility differences using a control variety, and adjust the protein content and fat content traits by calculating BLUP values (best linear unbiased predictions) with a mixed linear model.

2. Perform de novo sequencing and assembly of the peanut cultivar Kaixuan 016, resequence each material in the population by second-generation sequencing (10× depth), and detect polymorphic variant sites using Kaixuan 016 as the reference genome. The quality control criteria are: missing rate of the SNP locus across samples (Miss) <= 0.2, and minor allele frequency (Maf) >= 0.05.

3. Carry out genome-wide association analysis combining the phenotype data and the genotype data to find loci significantly associated with peanut protein content and fat content.

4. Perform summary statistics, phenotypic variation analysis and linkage block analysis on the significant SNPs, narrowing the candidates down to SNP locus Arahy.08_49538603.

5. Extract the genotypes of all materials in the population at this locus and analyze the significant locus with box plots; the genotype of the high-protein, low-fat materials is AA, and the genotype of the low-protein, high-fat materials is CC.

6. Verify the genotyping using the KASP (competitive allele-specific PCR) technique and develop a molecular marker controlling protein content and fat content.

7. Use the molecular marker to identify breeding materials, so that high-protein, low-fat and high-fat, low-protein peanut materials are screened rapidly and the efficiency of high-quality peanut breeding is improved.

The invention provides a molecular marker for breeding and identifying peanut protein and fat content, which is SNP locus Arahy.08_49538603 and is positioned on peanut chromosome 8; the 200bp sequences before and after the SNP locus Arahy.08_49538603 are shown as SEQ ID NO. 1.

The invention also provides a KASP primer set for detecting the molecular marker. The primer set is as follows:

primer_X: GAAGGTGACCAAGTTCATGCTTCTTCTCTGATTCCTCATTGAAAATGTT;

primer_Y: GAAGGTCGGAGTCAACGGATTCTTCTCTGATTCCTCATTGAAAATGTG;

primer_C: CCCTAATAGATAAAATCAGCTAAATATTTAAGTATTC.

The invention also provides a detection reagent or kit comprising the KASP primer set.

The method for identifying peanut protein and fat content using the KASP primer set for the molecular marker comprises the following steps:

(1) Extract DNA from the peanut material to be identified, and carry out PCR identification using the KASP primer set for the molecular marker;

(2) If the genotype at the molecular marker Arahy.08_49538603 locus is AA, the peanut material to be identified is a high-protein, low-fat material; if the genotype at the molecular marker Arahy.08_49538603 locus is CC, the peanut material to be identified is a low-protein, high-fat material.
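The interpretation rule in step (2) can be sketched as a small lookup. This is a minimal illustration only: the function name `classify_material` and the "undetermined" fallback are assumptions introduced here, since the text describes only the AA and CC genotypes and says nothing about heterozygous or failed calls.

```python
# Hypothetical helper (not part of the patent): maps a genotype call at
# Arahy.08_49538603 to the material class described in step (2).
CALL_TO_CLASS = {
    "AA": "high-protein, low-fat",
    "CC": "low-protein, high-fat",
}


def classify_material(genotype: str) -> str:
    """Return the material class for a genotype call at the marker locus."""
    key = genotype.strip().upper()
    # Assumption: any call other than AA or CC (heterozygous, no signal)
    # is reported as "undetermined" because the source does not cover it.
    return CALL_TO_CLASS.get(key, "undetermined")


if __name__ == "__main__":
    for call in ["AA", "CC", "AC", "NA"]:
        print(call, "->", classify_material(call))
```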

The PCR reaction program is as follows: a) 94 °C for 15 min; b) 94 °C for 20 s, then 61 °C for 60 s, with the annealing temperature decreasing by 0.6 °C per cycle, for 10 cycles; c) 94 °C for 20 s, then 55 °C for 60 s, for 26 cycles; d) 94 °C for 20 s, then 57 °C for 60 s, for 3 cycles.

The invention also provides the use of the molecular marker in breeding for identifying peanut protein and fat content.

The invention has the beneficial effects that:

1. The invention finds an important SNP locus related to quality traits by genome-wide association analysis. The depth and breadth of the genotype data used in the association analysis exceed those of previous studies, and the number of SNPs called is the largest to date, at 631,988; these abundant, high-quality SNPs underpin the accuracy of the associated loci.

2. The 199 materials are all derived from the peanut cultivar Kaixuan 016. Kaixuan 016 was sequenced and assembled de novo and used as the reference genome for the association analysis, which makes the quality-trait loci easier to identify.

3. The invention verifies the significant SNP loci with highly significant P values and a PVE (phenotypic variation explained) above 8%, and identifies the superior marker locus unique to the study materials. On the one hand, genotypes of materials with extreme phenotypes were used to examine the genotype distribution (box plots); on the other hand, one set of KASP (competitive allele-specific PCR) primers was designed for the significant locus with clear genotype separation and used to genotype the 199 materials in this study.

4. The SNP locus Arahy.08_49538603 can be used directly to identify peanut progeny materials: materials with genotype AA are high-protein, low-fat materials, and materials with genotype CC are low-protein, high-fat materials.

Drawings

FIG. 1. Normal distribution of protein content and fat content.

On the abscissa, PC is protein content and OC is fat content; the ordinate is the frequency of the phenotype values. E1 is the Kaifeng test site in 2019; E2 is the Xinyang test site in 2019; E3 is the Kaifeng test site in 2020; E4 is the Kaifeng test site in 2021.

FIG. 2. Density distribution of SNPs on the peanut chromosomes.

The figure shows the distribution of SNP sites in 1 Mb windows; chr1-20 are the 20 peanut chromosomes; the color scale on the right indicates the SNP density on the chromosomes.

FIG. 3. Manhattan plots and QQ plots of protein content in the 4 environments.

The left side shows the Manhattan plots, where PC is protein content, Chromosome denotes chromosomes 1-20, and the horizontal dashed line is the significance threshold; E1 is the Kaifeng test site in 2019; E2 is the Xinyang test site in 2019; E3 is the Kaifeng test site in 2020; E4 is the Kaifeng test site in 2021. The right side shows the QQ plots, where the abscissa is the theoretical P value and the ordinate is the observed P value.

FIG. 4. Manhattan plots and QQ plots of fat content in the 4 environments.

The left side shows the Manhattan plots, where OC is fat content, Chromosome denotes chromosomes 1-20, and the horizontal dashed line is the significance threshold; E1 is the Kaifeng test site in 2019; E2 is the Xinyang test site in 2019; E3 is the Kaifeng test site in 2020; E4 is the Kaifeng test site in 2021. The right side shows the QQ plots, where the abscissa is the theoretical P value and the ordinate is the observed P value.

FIG. 5. Linkage block diagram of Arahy.08_49538603.

The blue horizontal bar is the 218.56 kb region of chromosome 8, the green bars above it are SNPs, and the purple dot is the Arahy.08_49538603 locus; it forms one larger block together with 17 other SNPs, and the SNPs within the block are inherited in tight linkage.

FIG. 6. Phenotypic differences between the two genotypes at Arahy.08_49538603.

PC is protein content; OC is fat content; the abscissa CC/AA shows the different genotypes at Arahy.08_49538603; the ordinate is the observed phenotype value; E1 is the Kaifeng test site in 2019; E2 is the Xinyang test site in 2019; E3 is the Kaifeng test site in 2020; E4 is the Kaifeng test site in 2021.

FIG. 7. KASP verification of SNP genotyping at Arahy.08_49538603.

The circular signals at the upper left are the CC genotype; the circular signals at the lower right are the AA genotype; the remaining points are blanks and samples in which no signal was detected.

Detailed Description

The following describes the embodiments of the present invention in further detail with reference to examples.

Information on the 199 peanut materials used in this experiment is shown in Table 1 below. The peanut varieties or lines from Kaifeng were bred by the Kaifeng Academy of Agriculture and Forestry; the Jihua series peanut varieties from Shijiazhuang were provided by the Hebei Academy of Agriculture and Forestry Sciences; the Zhonghua series peanut varieties from Wuhan were provided by the Oil Crops Research Institute of the Chinese Academy of Agricultural Sciences; and K198 (AT 1-1) was introduced from Georgia, USA.

Table 1. Information on the 199 peanut materials

Example 1 phenotypic data processing

1. Design of field test

The 199 materials were planted in test fields at Kaifeng in 2019 (E1), Xinyang in 2019 (E2), Kaifeng in 2020 (E3) and Kaifeng in 2021 (E4). All 4 test environments used a randomized block design; each plot had an area of 13.34 m² (6.67 m × 2 m), with a hole spacing of 20 cm, a row spacing of 40 cm, and 3 replicates. The test fields were level, with convenient drainage and irrigation, and consisted of sandy loam of suitable fertility. Field management and harvesting were carried out in a timely manner during peanut growth.

2. Agronomic trait investigation and quality determination

After harvesting and sun-drying, quality was measured with a Perten DA7250 near-infrared analyzer (Germany), and the protein content (PC) and fat content (oil content, OC) of the 199 materials were determined.

3. Phenotype data processing

The phenotype data (PC, OC) were organized and calculated in Microsoft Excel 2010, and erroneous values and outliers were deleted so that the phenotype data fit a normal distribution (FIG. 1). The BLUP (best linear unbiased prediction) value of each trait was calculated from the 3 replicates in each environment using a mixed linear model in Genstat 18th Edition.
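The idea behind the BLUP step can be illustrated with a deliberately simplified sketch. It assumes a balanced one-way random-effects model (material as the only random factor, equal replicate numbers) with variance components estimated from ANOVA mean squares; this is not the Genstat mixed model used in the experiment, which is not specified in detail. The function name `blup_one_way` and the toy values are hypothetical and are not data from this study.

```python
from statistics import mean


def blup_one_way(data):
    """Shrink each material mean toward the grand mean.

    data maps a material name to its replicate values. Assumes a balanced
    design: at least 2 materials, each with the same number (>= 2) of replicates.
    """
    names = list(data)
    n = len(names)
    r = len(data[names[0]])
    if n < 2 or r < 2 or any(len(v) != r for v in data.values()):
        raise ValueError("need >= 2 materials with equal replicate counts >= 2")

    material_means = {k: mean(v) for k, v in data.items()}
    grand = mean(material_means.values())

    # ANOVA mean squares between and within materials
    ms_between = r * sum((m - grand) ** 2 for m in material_means.values()) / (n - 1)
    ms_within = sum(
        (y - material_means[k]) ** 2 for k, v in data.items() for y in v
    ) / (n * (r - 1))

    # Method-of-moments variance components (genetic variance floored at 0)
    var_e = ms_within
    var_g = max(0.0, (ms_between - ms_within) / r)

    # Shrinkage factor: genetic share of the variance of a material mean
    denom = var_g + var_e / r
    shrink = var_g / denom if denom > 0 else 0.0

    return {k: grand + shrink * (m - grand) for k, m in material_means.items()}


if __name__ == "__main__":
    # Toy protein-content values (%), invented purely for illustration
    toy = {
        "material_1": [24.0, 25.0, 26.0],
        "material_2": [20.0, 21.0, 22.0],
        "material_3": [28.0, 29.0, 30.0],
    }
    for name, value in blup_one_way(toy).items():
        print(name, round(value, 3))
```

The predicted values are pulled toward the grand mean in proportion to how much of the observed variation is replicate noise, which is why BLUP values are less sensitive to noisy plots than raw means.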

Example 2 genotype data processing

Genomic DNA was extracted from young leaves at the seedling stage using a plant genomic DNA kit. The integrity and quality of the DNA were assessed by agarose gel electrophoresis and NanoDrop to ensure that they met the requirements for genome sequencing and library construction.

1. Sequencing and assembly of the reference genome Kaixuan 016

The sequencing and assembly method is as follows:

1. Third-generation data: third-generation sequencing was performed on the PacBio Sequel II platform, with a required sequencing depth of no less than 100×.

2. Second-generation Illumina data: second-generation sequencing was performed on the Illumina NovaSeq PE150 platform, with a required sequencing depth of no less than 100×, Q20 of no less than 90%, and Q30 of no less than 85%.

3. Hi-C data: a four-base or six-base cutter enzyme was selected according to the species information to construct the Hi-C library, with a required sequencing depth of no less than 100×, Q20 of no less than 85%, and Q30 of no less than 80%.

Sequencing assembly results were as follows:

1. Third-generation sequencing of Kaixuan 016 produced 297.92 Gb of data at a depth of 109.77×; together with the second-generation sequencing, 549.80 Gb of sequencing data were obtained in total.

2. A genome survey was performed by k-mer analysis (k = 17): the estimated genome size was 2,703.87 Mbp (2,686.33 Mbp after correction), the heterozygosity rate was 0.13%, and the proportion of repeat sequences was 84.15%.

3. De novo assembly of the peanut genome gave the following results: the total contig length was 2.53 Gbp with a contig N50 of 11.48 Mbp; the total scaffold length was 2.53 Gbp with a scaffold N50 of 11.48 Mbp.

4. The Hi-C data were used to anchor the assembly to chromosomes, giving a chromosome-level genome.

5. Assembly quality was assessed in terms of consistency, sequence integrity, EST sequences, RNA sequences, CEGMA and BUSCO.

About 99.65% of all short-insert reads mapped to the genome, with a coverage of about 99.80%, showing good consistency between the reads and the assembled genome. Of the 1,614 single-copy orthologous genes, 99.2% were assembled as complete genes, indicating a complete assembly. Of the 248 CEGs (Core Eukaryotic Genes), 241 were assembled, a proportion of 97.18%, likewise indicating a complete assembly.

2. Sequencing of the 199 peanut materials

The 199 materials were resequenced at 10× depth on an Illumina second-generation sequencing platform. The sequencing data were quality-controlled and only high-quality SNPs were retained, with the following criteria: missing rate of the SNP locus across samples (Miss) <= 0.2, and minor allele frequency (Maf) >= 0.05. SNP calling was performed with the peanut cultivar Kaixuan 016 as the reference genome, giving 631,988 SNPs in total. This is the largest number of high-quality SNP loci used in a peanut association analysis to date, which is closely related to the genetic diversity among the 199 materials.
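The two quality-control criteria can be sketched as a per-SNP filter. This is a minimal illustration under stated assumptions: genotypes are encoded as counts of the alternate allele (0, 1 or 2) with `None` for a missing call, and the function names are hypothetical. The actual filtering tool used in the study is not named in the text.

```python
from typing import Optional, Sequence


def missing_rate(calls: Sequence[Optional[int]]) -> float:
    """Fraction of samples with no genotype call at this SNP."""
    return sum(c is None for c in calls) / len(calls)


def minor_allele_frequency(calls: Sequence[Optional[int]]) -> float:
    """Frequency of the rarer allele among the called (diploid-coded) genotypes."""
    observed = [c for c in calls if c is not None]
    if not observed:
        return 0.0
    alt_freq = sum(observed) / (2 * len(observed))
    return min(alt_freq, 1.0 - alt_freq)


def passes_qc(
    calls: Sequence[Optional[int]],
    max_missing: float = 0.2,  # Miss <= 0.2, as stated in the text
    min_maf: float = 0.05,     # Maf >= 0.05, as stated in the text
) -> bool:
    """Keep a SNP only if it satisfies both thresholds."""
    return (
        missing_rate(calls) <= max_missing
        and minor_allele_frequency(calls) >= min_maf
    )


if __name__ == "__main__":
    snp_ok = [0, 0, 2, 2, None, 0, 2, 0, 0, 2]   # 10% missing, common minor allele
    snp_rare = [0] * 39 + [2]                    # minor allele too rare
    snp_sparse = [None, None, None, 0, 2, 0, 2, 0, 2, 0]  # 30% missing
    for name, snp in [("ok", snp_ok), ("rare", snp_rare), ("sparse", snp_sparse)]:
        print(name, passes_qc(snp))
```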

As can be seen from FIG. 2, chromosome 3 has the most SNP sites (48,821), followed by chromosome 11 (43,292 SNPs); chromosome 8 has the fewest (13,143), followed by chromosome 10 (13,848 SNPs). The average SNP density on the chromosomes is 251.71 per Mb.

Example 3 Genome-wide association analysis

1. Detection of significant loci

Genome-wide association analysis was performed with the GEMMA (Genome-wide Efficient Mixed Model Association) software package, version 0.94.1, using the model y = Xα + Sβ + Kμ + e, where y is the phenotype (the phenotype data obtained in Example 1), X is the genotype (the genotype data obtained in Example 2), S is the fixed-factor intercept in the model, and K is the kinship matrix calculated from the SNPs. Xα and Sβ represent the fixed effects, and Kμ and e represent the random effects. Using the Bonferroni correction, the threshold for the genome-wide association analysis was set to -log10(0.05/631,988) = 7.10, and Manhattan plots and QQ plots were obtained for PC and OC (FIGS. 3-4). As the figures show, in all 4 environments both the protein content and the fat content traits gave a distinct signal on chromosome 8, indicating that the loci (genes) controlling the quality traits are located on chromosome 8.
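The significance threshold can be reproduced with a few lines of arithmetic. The sketch below only recomputes the Bonferroni threshold from the values given in the text (0.05 and 631,988 SNPs); the function names are hypothetical.

```python
import math

ALPHA = 0.05        # family-wise error rate stated in the text
N_SNPS = 631_988    # number of SNPs tested, as stated in the text


def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Return the Bonferroni-corrected threshold on the -log10(P) scale."""
    return -math.log10(alpha / n_tests)


def is_significant(p_value: float, alpha: float = ALPHA, n_tests: int = N_SNPS) -> bool:
    """True if a SNP's P value reaches the genome-wide threshold."""
    return -math.log10(p_value) >= bonferroni_threshold(alpha, n_tests)


if __name__ == "__main__":
    print(f"{bonferroni_threshold(ALPHA, N_SNPS):.2f}")  # prints 7.10
```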

2. Summary statistics of SNP loci and analysis of phenotypic variation explained

The significant SNP loci in the association analysis results were summarized (Table 2). For the 2 quality traits, 44, 63, 38 and 24 SNP loci were detected in the 4 environments respectively, mainly on chromosome 8; in total, 154 non-redundant associated loci and 13 repeatedly detected loci were identified.

Table 2. Number of significant SNPs for the quality traits in the four environments

Statistical analysis of the loci for each trait (Table 3) showed that 10 SNP loci were associated with both traits, protein and fat, and that the quality-trait loci were concentrated mainly on chromosome 8. The phenotypic variation explained by these loci was analyzed in R; across the environments, Arahy.08_49538603 explained the most phenotypic variation, up to 14.06%.

Table 3. SNPs detected for more than one quality trait

Chromosome	Position	Reference allele	Alternate allele	Trait (number of environments in which a signal was detected)
8	38378278	C	T	PC(1), OC(1)
8	44879304	C	T	PC(1), OC(1)
8	48994448	T	C	PC(1), OC(1)
8	49296302	A	G	PC(1), OC(1)
8	49338631	A	G	PC(1), OC(1)
8	49385915	T	C	PC(1), OC(1)
8	49538603	A	C	PC(2), OC(1)
8	49587942	T	C	PC(1), OC(1)
16	104322766	G	T	PC(1), OC(1)
16	107768318	G	T	PC(1), OC(1)

3. Block analysis of the SNP locus

LD haplotype block analysis was performed with LDBlockShow 1.40 on the regions 115 kb upstream and 115 kb downstream of Arahy.08_49538603 (the LD half-decay distance of the population is 115 kb) to look for blocks. The results show one larger and one smaller block (black triangles) in the 218.56 kb region of chromosome 8. Arahy.08_49538603 (purple dot) lies in the larger block on the left, which contains 18 SNPs in total; Arahy.08_49538603 and the 17 nearby SNPs are in strong linkage disequilibrium and form a haplotype (FIG. 5), which rules out a false positive at the significant locus and indicates high reliability.

The 200 bp sequences before and after the SNP locus Arahy.08_49538603 are shown as SEQ ID NO. 1.

CTGATTGAAACCTGTTTCTTACTCAATCAAGTCATCAAATTAGAATTCATGTAGACACACTAACCACAAGTGAATCGTTTGTCATCAAGATCAGATGTCCAATCATAATCTGAAAAGGCAAACAAGTGAAAATCAGTGCTAGAATGAAAAATTAACCCCCTAATAGATAAAATCAGCTAAATATTTAAGTATTCTCTTCA

ACATTTTCAATGAGGAATCAGAGAAGAGTGCATAAATTAGTTGACTTTATTAACAGATTATGCAATTTCAGGTCTCATAAAAGTGACATATTGTAAAGATCCTATAATTGATCTATAAGAGTTCTTAAACAACTCAACCTTTTGACTTGTCAACTTAGTGGTGGAGACCATTGGTGTTGAAACACATTTCGAATGAGCCA (SEQ ID NO.1, underlined is the position of this site).
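Because the underline marking the SNP position does not survive in plain text, the following sketch shows one way to locate it by aligning the KASP primers disclosed in this document to SEQ ID NO. 1 as printed above. It assumes that the first 21 nucleotides of primer_X and primer_Y are the 5' reporter tails of the KASP chemistry (an assumption, not stated in the text) and that the rest of each primer anneals to the template with its 3' terminal base on the SNP. The position it reports is therefore an inference from the primers, not a statement from the source.

```python
# SEQ ID NO. 1 as printed in this document (two blocks of 200 nt).
UPSTREAM = (
    "CTGATTGAAACCTGTTTCTTACTCAATCAAGTCATCAAATTAGAATTCATGTAGACACAC"
    "TAACCACAAGTGAATCGTTTGTCATCAAGATCAGATGTCCAATCATAATCTGAAAAGGCA"
    "AACAAGTGAAAATCAGTGCTAGAATGAAAAATTAACCCCCTAATAGATAAAATCAGCTAA"
    "ATATTTAAGTATTCTCTTCA"
)
DOWNSTREAM = (
    "ACATTTTCAATGAGGAATCAGAGAAGAGTGCATAAATTAGTTGACTTTATTAACAGATTA"
    "TGCAATTTCAGGTCTCATAAAAGTGACATATTGTAAAGATCCTATAATTGATCTATAAGA"
    "GTTCTTAAACAACTCAACCTTTTGACTTGTCAACTTAGTGGTGGAGACCATTGGTGTTGA"
    "AACACATTTCGAATGAGCCA"
)
SEQ = UPSTREAM + DOWNSTREAM

PRIMER_X = "GAAGGTGACCAAGTTCATGCTTCTTCTCTGATTCCTCATTGAAAATGTT"
PRIMER_Y = "GAAGGTCGGAGTCAACGGATTCTTCTCTGATTCCTCATTGAAAATGTG"
PRIMER_C = "CCCTAATAGATAAAATCAGCTAAATATTTAAGTATTC"

TAIL_LEN = 21  # assumption: length of the 5' reporter tail on primer_X / primer_Y

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def allele_site(primer: str):
    """Return (0-based index of the SNP in SEQ, printed-strand allele targeted).

    The annealing part of the primer is reverse-complemented; its first base
    then corresponds to the SNP and the remainder to the invariant flank.
    Returns (-1, allele) if the flank is not found in SEQ.
    """
    core = revcomp(primer[TAIL_LEN:])
    flank_start = SEQ.find(core[1:])
    if flank_start < 1:
        return -1, core[0]
    return flank_start - 1, core[0]


if __name__ == "__main__":
    print("length of SEQ:", len(SEQ))
    print("primer_X:", allele_site(PRIMER_X))
    print("primer_Y:", allele_site(PRIMER_Y))
    print("base printed at that position:", SEQ[allele_site(PRIMER_X)[0]])
    print("primer_C starts at index:", SEQ.find(PRIMER_C))
```

Under these assumptions both allele-specific primers end on the same base of the printed sequence, with primer_X matching the A allele and primer_Y the C allele, and primer_C anneals in the forward direction upstream of that base.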

Example 4 association site validation

1. Box plot verification

Box plot verification was carried out with the boxplot package in R on the significant loci with highly significant P values and a PVE above 8%. Box plots were drawn for 40 materials with extreme phenotypes (protein content ≥ 25% or ≤ 23.6%; oil content ≥ 50% or ≤ 49%). Different protein content and fat content values (the phenotype values on the ordinate) correspond to different genotypes. At locus Arahy.08_49538603, the high-protein genotype is AA and the low-protein genotype is CC; the high-fat genotype is CC and the low-fat genotype is AA. The two traits, protein content and fat content, are negatively correlated (FIG. 6), consistent with previous studies. Therefore, when the genotype at the Arahy.08_49538603 locus in a peanut material is AA, the material is a high-protein, low-fat material; when the genotype is CC, the material is a low-protein, high-fat material.

2. Genotype verification and molecular marker development

We extracted the 100 bp sequences before and after the Arahy.08_49538603 locus, designed a KASP marker using the KASP (competitive allele-specific PCR) technique, and carried out amplification, sequencing and detection in the population of 199 materials. The results show that the AA-genotype materials cluster together and the CC-genotype materials cluster together, and that Arahy.08_49538603 gives clearly separated genotype calls (FIG. 7). This further verifies that the locus is a SNP locus controlling peanut quality traits and that the designed molecular marker can be used directly for quality identification of peanut materials. When the DNA of a peanut sample is amplified and sequenced with the following KASP primers, a genotype of AA at the Arahy.08_49538603 locus indicates a high-protein, low-fat material, and a genotype of CC indicates a low-protein, high-fat material.

The KASP primer sequences for the molecular marker Arahy.08_49538603 are:

primer_X: GAAGGTGACCAAGTTCATGCTTCTTCTCTGATTCCTCATTGAAAATGTT (SEQ ID NO.2);

primer_Y: GAAGGTCGGAGTCAACGGATTCTTCTCTGATTCCTCATTGAAAATGTG (SEQ ID NO.3);

primer_C: CCCTAATAGATAAAATCAGCTAAATATTTAAGTATTC (SEQ ID NO.4).

The PCR reaction program was: a) 94 °C for 15 min; b) 94 °C for 20 s, then 61 °C for 60 s, with the annealing temperature decreasing by 0.6 °C per cycle, for 10 cycles; c) 94 °C for 20 s, then 55 °C for 60 s, for 26 cycles; d) 94 °C for 20 s, then 57 °C for 60 s, for 3 cycles.
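The touchdown stage of this program can be written out cycle by cycle. The sketch below assumes that the first cycle of stage b) anneals at 61 °C and that each subsequent cycle is 0.6 °C lower, so the tenth cycle anneals at 55.6 °C before stage c) continues at 55 °C. The function name is hypothetical.

```python
def annealing_schedule():
    """Return the annealing temperature (°C) of every PCR cycle, in order."""
    temps = []
    # Stage b: 10 touchdown cycles, starting at 61 °C and dropping 0.6 °C per cycle
    # (assumption: the first cycle runs at the starting temperature).
    temps += [round(61.0 - 0.6 * i, 1) for i in range(10)]
    # Stage c: 26 cycles at 55 °C
    temps += [55.0] * 26
    # Stage d: 3 additional cycles at 57 °C
    temps += [57.0] * 3
    return temps


if __name__ == "__main__":
    schedule = annealing_schedule()
    print("total cycles:", len(schedule))
    print("touchdown stage:", schedule[:10])
```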

The reaction system: DNA (20-80 ng/μL) 5 μL; 2× KASP Master Mix 5 μL; KASP primer mix (50 μmol/L) 0.14 μL; ddH₂O 3 μL.

The 2× KASP Master Mix is a universal kit from LGC (Laboratory of the Government Chemist), suitable for all KASP assays, and is used according to the product instructions.

Claims

1. A molecular marker for breeding and identifying peanut protein and fat content, wherein the molecular marker is SNP locus Arahy.08_49538603, and is positioned on peanut chromosome 8; the 200bp sequences before and after the SNP locus Arahy.08_49538603 are shown as SEQ ID NO. 1.

2. A KASP primer set for detecting the molecular marker of claim 1, wherein the primer set is:

primer_X: GAAGGTGACCAAGTTCATGCTTCTTCTCTGATTCCTCATTGAAAATGTT;

primer_Y: GAAGGTCGGAGTCAACGGATTCTTCTCTGATTCCTCATTGAAAATGTG;

primer_C: CCCTAATAGATAAAATCAGCTAAATATTTAAGTATTC.

3. A detection reagent or kit comprising the KASP primer set of claim 2.

4. A method for identifying peanut protein and fat content using the molecularly labeled KASP primer set of claim 2, comprising the steps of:

5. The method of claim 4, wherein the PCR reaction procedure is:

a) 94 °C for 15 min; b) 94 °C for 20 s, then 61 °C for 60 s, with the annealing temperature decreasing by 0.6 °C per cycle, for 10 cycles; c) 94 °C for 20 s, then 55 °C for 60 s, for 26 cycles; d) 94 °C for 20 s, then 57 °C for 60 s, for 3 cycles.

6. Use of the molecular marker of claim 1 in breeding for identifying peanut protein and fat content.