
WO2025217495A1 - Glycinin variants for improving the nutritional value of soybeans - Google Patents

Glycinin variants for improving the nutritional value of soybeans

Info

Publication number
WO2025217495A1
Authority
WO
WIPO (PCT)
Prior art keywords
seed
methionine
polypeptide
seq
plant
Legal status
Pending
Application number
PCT/US2025/024242
Other languages
French (fr)
Inventor
Kiheon BAEK
Zhenglin Hou
Abhiman Saraswathi
Zhan-Bin Liu
Keith R. Roesler
Laura WAYNE
Current Assignee
Pioneer Hi Bred International Inc
Original Assignee
Pioneer Hi Bred International Inc
Application filed by Pioneer Hi Bred International Inc
Publication of WO2025217495A1
Legal status: Pending


Classifications

    • C CHEMISTRY; METALLURGY
    • C07 ORGANIC CHEMISTRY
    • C07K PEPTIDES
    • C07K14/00 Peptides having more than 20 amino acids; Gastrins; Somatostatins; Melanotropins; Derivatives thereof
    • C07K14/415 Peptides having more than 20 amino acids; Gastrins; Somatostatins; Melanotropins; Derivatives thereof from plants
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N15/00 Mutation or genetic engineering; DNA or RNA concerning genetic engineering, vectors, e.g. plasmids, or their isolation, preparation or purification; Use of hosts therefor
    • C12N15/09 Recombinant DNA-technology
    • C12N15/63 Introduction of foreign genetic material using vectors; Vectors; Use of hosts therefor; Regulation of expression
    • C12N15/79 Vectors or expression systems specially adapted for eukaryotic hosts
    • C12N15/82 Vectors or expression systems specially adapted for eukaryotic hosts for plant cells, e.g. plant artificial chromosomes (PACs)
    • C12N15/8241 Phenotypically and genetically modified plants via recombinant DNA technology
    • C12N15/8242 Phenotypically and genetically modified plants via recombinant DNA technology with non-agronomic quality (output) traits, e.g. for industrial processing; Value added, non-agronomic traits
    • C12N15/8243 Phenotypically and genetically modified plants via recombinant DNA technology with non-agronomic quality (output) traits, e.g. for industrial processing; Value added, non-agronomic traits involving biosynthetic or metabolic pathways, i.e. metabolic engineering, e.g. nicotine, caffeine
    • C12N15/8251 Amino acid content, e.g. synthetic storage proteins, altering amino acid biosynthesis

Definitions

  • The sequence listing is submitted electronically via Patent Center as an XML-formatted sequence listing. The file is named 210933_SequenceListing, was created on March 19, 2024, has a size of 554,789 bytes, and is filed concurrently with the specification.
  • The sequence listing contained in this XML-formatted document is part of the specification and is incorporated herein by reference in its entirety.
  • Livestock feed rations commonly consist of a mixture of soybean meal and maize. Such feed can contain suboptimal amounts of the two sulfur-containing amino acids, methionine (Met) and cysteine. It can also be low in lysine, tryptophan, and threonine.
  • For poultry feed, the most limiting of these amino acids are the sulfur amino acids. Consequently, synthetic methionine made from petroleum is commonly added to poultry feed, which significantly increases its cost.
  • Creating proteins that are enriched in these limiting amino acids, and that are capable of accumulating to abundant levels in soybean seeds, may improve the nutritional value of soybeans for feed.
  • Plants and seeds comprising a modified polynucleotide encoding a modified glycinin polypeptide, the modified glycinin polypeptide comprising an amino acid sequence having at least 50%, 55%, 60%, 65%, 70%, 75%, 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% sequence identity to SEQ ID NO: 4 and comprising 10 or more modifications, the 10 or more modifications comprising a substitution of a methionine at a position corresponding to SEQ ID NO: 4 selected from the group consisting of position 11, 20, 21, 23, 24, 26, 28, 33, 34, 35, 40, 43, 48, 49, 51, 52, 53, 59, 61, 64, 66, 70, 71, 75, 80, 81, 84, 85, 95, 97, 98, 99, 100
  • … 373, 376, 382, 383, 385, 391, 393, 396, 399, 401, 402, 403, 405, 406, 409, 410, 411, 412, 414, 416, 420, 421, 423, 425, 430, 434, 436, 442, 443, 454, 456, 461, 462, 463, 464, and 468.
  • A plant or seed comprising a polynucleotide encoding a modified glycinin polypeptide, the modified glycinin polypeptide comprising at least 30 methionine residues and an AlphaFold2 predicted structure having a TM-score of at least 0.50, 0.55, 0.60, 0.65, 0.70, 0.75, 0.80, 0.85, 0.90, 0.95, or 1 as compared to the AlphaFold2 predicted structure of SEQ ID NO: 4.
  • A method of producing a plant producing seed having increased methionine content, comprising introducing into a regenerable plant cell a polynucleotide encoding a modified glycinin polypeptide, the modified glycinin polypeptide comprising an amino acid sequence having at least 50%, 55%, 60%, 65%, 70%, 75%, 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% sequence identity to SEQ ID NO: 4 and comprising 10 or more modifications, the 10 or more modifications comprising a substitution of a methionine at a position corresponding to SEQ ID NO: 4 selected from the group consisting of position 11, 20, 21, 23, 24, 26, 28, 33, 34, 35, 40, 43, 48, 49, 51, 52, 53, 59, 61, 64, 66, 70, 71, 75, 80,
  • A method for generating high-methionine seed storage protein variants, comprising generating an in silico population of high-methionine seed storage protein variants by inputting the 3D structural coordinates and/or amino acid sequence of a candidate seed storage polypeptide into an artificial intelligence model (AI model), the AI model being trained to calculate the per-residue probability of an amino acid by using encoded geometrical information of the candidate seed storage polypeptide 3D structure and/or sequential information; calculating a predicted solubility score, a predicted stability score, a predicted aggregation propensity score, or any combination thereof for members of the in silico population; and selecting from the in silico population one or more candidate high-methionine seed storage polypeptide variants, the one or more selected candidate high-methionine seed storage polypeptide variants having (i) a predicted solubility score that is at least 80% of a predicted solubility score for the candidate seed storage protein, (ii) a predicted stability score
  • A polynucleotide encoding a modified glycinin polypeptide, the modified glycinin polypeptide comprising an amino acid sequence having at least 50%, 55%, 60%, 65%, 70%, 75%, 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% sequence identity to SEQ ID NO: 4 and comprising 10 or more modifications, the 10 or more modifications comprising a substitution of a methionine at a position corresponding to SEQ ID NO: 4 selected from the group consisting of position 11, 20, 21, 23, 24, 26, 28, 33, 34, 35, 40, 43, 48, 49, 51, 52, 53, 59, 61, 64, 66, 70, 71, 75, 80, 81, 84, 85, 95, 97, 98, 99, 100, 103, 111, 112,
  • A modified glycinin polypeptide comprising an amino acid sequence having at least 50%, 55%, 60%, 65%, 70%, 75%, 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% sequence identity to SEQ ID NO: 4 and comprising 10 or more modifications, the 10 or more modifications comprising a substitution of a methionine at a position corresponding to SEQ ID NO: 4 selected from the group consisting of position 11, 20, 21, 23, 24, 26, 28, 33, 34, 35, 40, 43, 48, 49, 51, 52, 53, 59, 61, 64, 66, 70, 71, 75, 80, 81, 84, 85, 95, 97, 98, 99, 100, 103, 111, 112, 113, 115, 117, 121, 129, 131, 135, 143,
  • … 373, 376, 382, 383, 385, 391, 393, 396, 399, 401, 402, 403, 405, 406, 409, 410, 411, 412, 414, 416, 420, 421, 423, 425, 430, 434, 436, 442, 443, 454, 456, 461, 462, 463, 464, and 468.
  • Protein compositions produced from the seeds described herein or from seeds of the plants described herein, and methods of feeding an animal comprising administering a feed comprising the protein composition to the animal in a feeding regimen.
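The selection step of the in silico screening method summarized above can be illustrated roughly as follows. This sketch applies only criterion (i): it keeps variants whose predicted solubility score is at least 80% of the parent protein's score. It then ranks the survivors by methionine count. The remaining criteria are truncated in this text and are therefore omitted. The `Variant` record, its field names, and the ranking by methionine count are hypothetical illustration choices. The scores are assumed to come from some external predictor; this is not the disclosed AI model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Variant:
    """Hypothetical record for one in silico variant."""
    name: str
    sequence: str
    solubility: float  # predicted solubility score from an external tool (assumed)


def select_candidates(parent_solubility: float,
                      variants: list[Variant],
                      min_fraction: float = 0.80) -> list[Variant]:
    """Keep variants meeting criterion (i), ranked by methionine count (descending)."""
    if parent_solubility <= 0:
        raise ValueError("parent solubility score must be positive")
    threshold = min_fraction * parent_solubility
    kept = [v for v in variants if v.solubility >= threshold]
    # Rank the survivors so the most methionine-rich candidates come first.
    return sorted(kept, key=lambda v: v.sequence.count("M"), reverse=True)


if __name__ == "__main__":
    pool = [
        Variant("a", "MMMA", 8.5),
        Variant("b", "MA", 7.9),   # below 80% of the parent score, so it is dropped
        Variant("c", "MMAA", 9.0),
    ]
    print([v.name for v in select_candidates(10.0, pool)])
```

Stability and aggregation filters could be chained the same way, once their exact thresholds are known.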
  • Figs. 1A-1E provide pictures of experimental data showing the expression of six-histidine tagged wild-type proglycinin and Set 1 high-methionine proglycinin variants in E. coli.
  • Fig. 1A provides the expression of wild-type proglycinin (SEQ ID NO: 237).
  • Fig. 1B provides the expression of proglycinin variant GY1_ALT4 (SEQ ID NO: 21).
  • Fig. 1C provides the expression of proglycinin variant GY1_ALT3 (SEQ ID NO: 20).
  • Fig. 1D provides the expression of proglycinin variant GY1_ALT2 (SEQ ID NO: 19).
  • Fig. 1E provides the expression of proglycinin variant GY1_ALT1 (SEQ ID NO: 18).
  • The prominent bands near the 50 kDa marker that are present in the induced lanes, but not in the uninduced lanes, are the proglycinin polypeptides.
  • For each sequence, the full-length preproglycinin sequence is provided. It includes the signal peptide corresponding to positions 1-19 but lacks the six-histidine tag.
  • Figs. 2A-2L provide pictures of experimental data showing the solubility of six-histidine tagged high-methionine proglycinin variants expressed in E. coli.
  • Fig. 2A provides data for wild-type proglycinin (SEQ ID NO: 237) along with the proglycinin variants G1_ALT4 (SEQ ID NO: 21), G1_ALT3 (SEQ ID NO: 20), G1_ALT2 (SEQ ID NO: 19), and G1_ALT1 (SEQ ID NO: 18).
  • Fig. 2B provides data for the proglycinin variants ALT4_1 (SEQ ID NO: 22), ALT4_2 (SEQ ID NO: 23), ALT4_3 (SEQ ID NO: 24), and ALT4_4 (SEQ ID NO: 25).
  • Fig. 2C provides data for the proglycinin variants ALT4_5 (SEQ ID NO: 26), ALT4_6 (SEQ ID NO: 27), ALT4_7 (SEQ ID NO: 28), ALT4_8 (SEQ ID NO: 29), ALT4_9 (SEQ ID NO: 30), ALT4_10 (SEQ ID NO: 31), ALT4_11 (SEQ ID NO: 32), ALT4_12 (SEQ ID NO: 33), and ALT4_13 (SEQ ID NO: 34).
  • Fig. 2D provides data for the proglycinin variants ALT4_14 (SEQ ID NO: 35), ALT4_15 (SEQ ID NO: 36), ALT4_16 (SEQ ID NO: 37), ALT4_17 (SEQ ID NO: 38), ALT4_18 (SEQ ID NO: 39), ALT4_19 (SEQ ID NO: 40), ALT4_20 (SEQ ID NO: 41), ALT4_21 (SEQ ID NO: 42), and ALT4_22 (SEQ ID NO: 43).
  • Fig. 2E provides data for the proglycinin variants ALT4_23 (SEQ ID NO: 44), ALT4_24 (SEQ ID NO: 45), ALT4_25 (SEQ ID NO: 46), ALT4_26 (SEQ ID NO: 47), and ALT4_27 (SEQ ID NO: 48).
  • Fig. 2F provides data for the proglycinin variants ALT4_28 (SEQ ID NO: 49), ALT4_29 (SEQ ID NO: 50), ALT4_30 (SEQ ID NO: 51), and ALT4_31 (SEQ ID NO: 52).
  • Fig. 2G provides data for the proglycinin variants ALT4_32 (SEQ ID NO: 53), ALT4_33 (SEQ ID NO: 54), ALT4_34 (SEQ ID NO: 55), ALT4_35 (SEQ ID NO: 56), ALT4_36 (SEQ ID NO: 57), ALT4_37 (SEQ ID NO: 58), ALT4_38 (SEQ ID NO: 59), ALT4_39 (SEQ ID NO: 60), and ALT4_40 (SEQ ID NO: 61).
  • Fig. 2I provides data for the proglycinin variants G1_AI_1 (SEQ ID NO: 69), G1_AI_2 (SEQ ID NO: 70), and G1_AI_3 (SEQ ID NO: 71).
  • Fig. 2J provides data for the proglycinin variants G1_AI_4 (SEQ ID NO: 72), G1_AI_5 (SEQ ID NO: 73), G1_AI_6 (SEQ ID NO: 74), G1_AI_7 (SEQ ID NO: 75), and G1_AI_8 (SEQ ID NO: 76).
  • Fig. 2K provides data for the proglycinin variants G1_AI_9 (SEQ ID NO: 77), G1_AI_10 (SEQ ID NO: 78), G1_AI_11 (SEQ ID NO: 79), G1_AI_12 (SEQ ID NO: 80), and G1_AI_13 (SEQ ID NO: 81).
  • Fig. 2L provides data for the proglycinin variants G1_AI_14 (SEQ ID NO: 82), G1_AI_15 (SEQ ID NO: 83), G1_AI_16 (SEQ ID NO: 84), G1_AI_17 (SEQ ID NO: 85), and G1_AI_18 (SEQ ID NO: 86).
  • For each sequence, the full-length preproglycinin sequence is provided. It includes the signal peptide corresponding to positions 1-19 but lacks the six-histidine tag.
  • Figs. 3A-3C provide graphs of experimental data depicting the stability of six-histidine tagged high-methionine proglycinin variants against unfolding by guanidine hydrochloride.
  • Fig. 3A provides data for wild-type proglycinin WT (SEQ ID NO: 237), G1_ALT4 (SEQ ID NO: 21), G1_ALT4_5 (SEQ ID NO: 26), G1_ALT4_29 (SEQ ID NO: 50), G1_ALT4_39 (SEQ ID NO: 60), G1_ALT4_40 (SEQ ID NO: 61), and G1_ALT4_47 (SEQ ID NO: 68).
  • Fig. 3B provides data for WT (SEQ ID NO: 4), G1_AI_4 (SEQ ID NO: 72), G1_AI_5 (SEQ ID NO: 73), G1_AI_7 (SEQ ID NO: 75), and G1_AI_8 (SEQ ID NO: 76).
  • Fig. 3C provides data for wild-type proglycinin WT (SEQ ID NO: 237), G1_AI_9 (SEQ ID NO: 77), G1_AI_11 (SEQ ID NO: 79), G1_AI_12 (SEQ ID NO: 80), G1_AI_14 (SEQ ID NO: 82), and G1_AI_17 (SEQ ID NO: 85).
  • Figs. 4A-4P provide experimental data showing the stability of six-histidine tagged high-methionine proglycinin variants against digestion by trypsin. Digests were performed at 25°C for the indicated times with a 1:500 (wt:wt) ratio of trypsin to proglycinin variant. The 0-minute control lanes contained the proglycinin variant without trypsin.
  • Fig. 4A provides data for the wild-type proglycinin (SEQ ID NO: 237).
  • Fig. 4B provides data for the high-methionine proglycinin variant GY1_ALT4 (SEQ ID NO: 21).
  • Fig. 4C provides data for the high-methionine proglycinin variant GY1_ALT4_5 (SEQ ID NO: 26).
  • Fig. 4D provides data for the high-methionine proglycinin variant GY1_ALT4_29 (SEQ ID NO: 50).
  • Fig. 4E provides data for the high-methionine proglycinin variant GY1_ALT4_39 (SEQ ID NO: 60).
  • Fig. 4F provides data for the high-methionine proglycinin variant GY1_ALT4_40 (SEQ ID NO: 61).
  • Fig. 4G provides data for the high-methionine proglycinin variant GY1_ALT4_47 (SEQ ID NO: 68).
  • Fig. 4H provides data for the high-methionine proglycinin variant GY1_AI_4 (SEQ ID NO: 72).
  • Fig. 4I provides data for the high-methionine proglycinin variant GY1_AI_5 (SEQ ID NO: 73).
  • Fig. 4J provides data for the high-methionine proglycinin variant GY1_AI_7 (SEQ ID NO: 75).
  • Fig. 4K provides data for the high-methionine proglycinin variant GY1_AI_8 (SEQ ID NO: 76).
  • Fig. 4L provides data for the high-methionine proglycinin variant GY1_AI_9 (SEQ ID NO: 77).
  • Fig. 4M provides data for the high-methionine proglycinin variant GY1_AI_11 (SEQ ID NO: 79).
  • Fig. 4N provides data for the high-methionine proglycinin variant GY1_AI_12 (SEQ ID NO: 80).
  • Fig. 4O provides data for the high-methionine proglycinin variant GY1_AI_14 (SEQ ID NO: 82).
  • Fig. 4P provides data for the high-methionine proglycinin variant GY1_AI_17 (SEQ ID NO: 85).
  • For each sequence, the full-length preproglycinin sequence is provided. It includes the signal peptide corresponding to positions 1-19 but lacks the six-histidine tag.
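The 1:500 (wt:wt) trypsin-to-substrate ratio used for the digests in Figs. 4A-4P reduces to a simple mass calculation. The helper below is a hypothetical convenience function, not part of the described protocol. The 100 µg substrate amount in the usage line is an arbitrary example.

```python
def trypsin_mass(substrate_mass: float, ratio: float = 500.0) -> float:
    """Return the trypsin mass for a 1:ratio (wt:wt) trypsin:substrate digest.

    The result is in the same units as substrate_mass (e.g., micrograms).
    """
    if substrate_mass <= 0 or ratio <= 0:
        raise ValueError("masses and ratios must be positive")
    return substrate_mass / ratio


if __name__ == "__main__":
    # 100 ug of a proglycinin variant at a 1:500 ratio needs 0.2 ug of trypsin.
    print(trypsin_mass(100.0))
```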
  • Figs. 5A-5B provide a sequence alignment of the soybean glycinin family members G1 (SEQ ID NO: 237), G2 (SEQ ID NO: 8), G3 (SEQ ID NO: 157), G4 (SEQ ID NO: 158), G5 (SEQ ID NO: 159), and G7 (SEQ ID NO: 160). Full-length preproglycinin sequences, including signal peptides, are shown in the alignment.
  • Fig. 6 provides the predicted structure of the glycinin trimer.
  • The wild-type proglycinin sequence of SEQ ID NO: 4 was used to generate each monomer.
  • The structures were predicted using AlphaFold2.
  • Fig. 7A provides the predicted structure of the wild-type proglycinin monomer (SEQ ID NO: 4).
  • Fig. 7B provides the predicted structure of the proglycinin variant GY1_ALT4 monomer without the signal peptide (positions 20-495 of SEQ ID NO: 21).
  • Fig. 7C provides the predicted structure of the proglycinin variant GY1_AI_14 monomer without the signal peptide (positions 20-495 of SEQ ID NO: 82).
  • The structures were predicted using AlphaFold2, and the methionine residues in each structure are represented as sticks.
  • Figs. 8A-8D provide an analysis of GY1_ALT4 protein in seed from transgenic events and gene edited seed by non-reducing SDS-PAGE and anti-glycinin immunoblots.
  • Fig. 8A depicts a non-reducing SDS-PAGE of GY1_ALT4 (SEQ ID NO: 21) in transgenic seed.
  • Fig. 8B depicts a non-reducing SDS-PAGE of GY1_ALT4 (SEQ ID NO: 21) in gene-edited seed.
  • Fig. 8C depicts anti-glycinin immunoblots of GY1_ALT4 (SEQ ID NO: 21) in transgenic seed.
  • Fig. 8D depicts anti-glycinin immunoblots of GY1_ALT4 (SEQ ID NO: 21) in gene-edited seed.
  • The arrows denote: 1, wild-type glycinin family with acidic and basic chains disulfide-bonded together; 2, Gy1_ALT4 acidic and basic chains disulfide-bonded together; 3, wild-type glycinins, acidic chain only; 4, Gy1_ALT4 acidic chain only; 5, presumed proteolytic fragment of the glycinin acidic chain.
  • The letter “e” followed by a number is the event code listed in Table 17A.
  • Figs. 9A-9F provide an analysis of protein in transgenic T2 seed expressing the high-Met Gy1 variants GY1_ALT4_47 (SEQ ID NO: 68), GY1_AI_4 (SEQ ID NO: 72), GY1_AI_14 (SEQ ID NO: 82), GY1_ALT4_40 (SEQ ID NO: 61), GY1_ALT4_39 (SEQ ID NO: 60), GY1_AI_5 (SEQ ID NO: 73), GY1_AI_7 (SEQ ID NO: 75), and GY1_AI_9 (SEQ ID NO: 77).
  • Fig. 9A depicts a stained gel of T2 seed expressing the high-Met Gy1 variants GY1_ALT4_47 (SEQ ID NO: 68), GY1_AI_4 (SEQ ID NO: 72), and GY1_AI_14 (SEQ ID NO: 82) in a modified CGS background.
  • Fig. 9B depicts an anti-glycinin immunoblot of T2 seed expressing the high-Met Gy1 variants GY1_ALT4_47 (SEQ ID NO: 68), GY1_AI_4 (SEQ ID NO: 72), and GY1_AI_14 (SEQ ID NO: 82) in a modified CGS background.
  • Fig. 9C depicts a stained gel of T2 seed expressing the high-Met Gy1 variants GY1_ALT4_47 (SEQ ID NO: 68), GY1_ALT4_40 (SEQ ID NO: 61), and GY1_ALT4_39 (SEQ ID NO: 60) in a wild-type background.
  • Fig. 9D depicts an anti-glycinin immunoblot of T2 seed expressing the high-Met Gy1 variants GY1_ALT4_47 (SEQ ID NO: 68), GY1_ALT4_40 (SEQ ID NO: 61), and GY1_ALT4_39 (SEQ ID NO: 60) in a wild-type background.
  • Fig. 9E depicts a stained gel of T2 seed expressing the high-Met Gy1 variants GY1_AI_4 (SEQ ID NO: 72), GY1_AI_5 (SEQ ID NO: 73), GY1_AI_7 (SEQ ID NO: 75), GY1_AI_14 (SEQ ID NO: 82), and GY1_AI_9 (SEQ ID NO: 77) in a wild-type background.
  • Figs. 10A and 10B provide an analysis of GY1_AI_4 (SEQ ID NO: 72) gene replacement at the Gy1 locus in a wild-type background (Gy1_AI_4) and in a CGS background (Gy1_AI_4 + CGS).
  • Fig. 10A depicts a non-reducing SDS-PAGE.
  • Fig. 10B depicts anti-glycinin immunoblots.
  • Fig. 11 provides a non-reducing SDS-PAGE analysis of GY1_ALT4_47 (SEQ ID NO:

Glycinin Polynucleotides and Polypeptides

  • Provided are glycinin polynucleotides encoding modified glycinin polypeptides having at least, or at least about, 50%, 55%, 60%, 65%, 70%, 75%, 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% identity with SEQ ID NOs: 4, 8, 18-86, 157-160, and 237 and comprising a modification described herein.
  • Also provided are glycinin polynucleotides having at least, or at least about, 50%, 55%, 60%, 65%, 70%, 75%, 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% identity with SEQ ID NOs: 3, 7, 87-156, 161-164, and 236 and encoding a modification described herein.
  • Soybean glycinins are a family of abundant seed storage proteins. Most of the total glycinin protein is encoded by five genes named Gy1, Gy2, Gy3, Gy4, and Gy5 (Nielsen et al., 1989, Plant Cell 1:313-328). The corresponding proteins were named GY1, GY2, GY3, GY4, and GY5, also referred to herein as G1, G2, G3, G4, and G5, respectively.
  • Glycinin polypeptides are initially translated on cytosolic ribosomes as the precursor polypeptide preproglycinin (e.g., SEQ ID NO: 237 for wild-type glycinin 1). Upon entry into the endoplasmic reticulum, the signal peptide (e.g., positions 1-19 of SEQ ID NO: 237 for wild-type glycinin 1) is removed. This yields proglycinin polypeptides (e.g., SEQ ID NO: 4 for proglycinin 1) that form trimers.
  • Proglycinin trimers then move to the protein storage vacuole. There a specific protease, the vacuolar processing enzyme, cleaves each proglycinin polypeptide into acidic and basic polypeptides. This cleavage facilitates the interaction of two proglycinin trimers to form glycinin hexamers.
  • Proglycinin trimers and glycinin hexamers can exist either as homo-oligomers, or as hetero-oligomers that include multiple glycinin family members.
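The signal peptide occupies positions 1-19 of preproglycinin. Positions in the full-length sequence and in the mature proglycinin therefore differ by a fixed offset of 19. This is consistent with the figure descriptions that treat a monomer without its signal peptide as positions 20-495. The sketch below assumes this offset holds for every sequence compared, with no insertions or deletions relative to SEQ ID NO: 237. The function names are hypothetical.

```python
SIGNAL_PEPTIDE_LENGTH = 19  # positions 1-19 of preproglycinin (e.g., SEQ ID NO: 237)


def strip_signal_peptide(preproglycinin: str) -> str:
    """Return the proglycinin portion (position 20 onward) of a preproglycinin sequence."""
    if len(preproglycinin) <= SIGNAL_PEPTIDE_LENGTH:
        raise ValueError("sequence is too short to contain a mature domain")
    return preproglycinin[SIGNAL_PEPTIDE_LENGTH:]


def prepro_to_pro(position: int) -> int:
    """Map a 1-based preproglycinin position to the 1-based proglycinin position."""
    if position <= SIGNAL_PEPTIDE_LENGTH:
        raise ValueError("position lies inside the signal peptide")
    return position - SIGNAL_PEPTIDE_LENGTH


def pro_to_prepro(position: int) -> int:
    """Map a 1-based proglycinin position back to preproglycinin numbering."""
    if position < 1:
        raise ValueError("positions are 1-based")
    return position + SIGNAL_PEPTIDE_LENGTH


if __name__ == "__main__":
    # Position 51 of the mature protein sits at position 70 of the precursor.
    print(pro_to_prepro(51), prepro_to_pro(70))
```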
  • “GY1” or “G1” is included in the names of preproglycinin 1, proglycinin 1, or glycinin 1 polypeptides or of the corresponding CDS that encode them. “Gy1” or “G1” is included in the names of the corresponding glycinin 1 genomic DNA sequences, including introns.
  • One aspect of the disclosure provides a polynucleotide encoding a modified glycinin polypeptide, the modified glycinin polypeptide comprising an amino acid sequence having at least 50%, 55%, 60%, 65%, 70%, 75%, 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% sequence identity to SEQ ID NO: 4 and comprising 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 20, 25, 30, 35, 40, 45, 50, 55, or 60 or more and fewer than 100, 90, 80, 70, 60, 55, 50, 45, 44, 43, 42, 41, 40, 39, 38, 37, 36, 35, 34, 33, 32, 31, or 30 modifications selected from the group consisting of a methionine at the position corresponding to position 11 of SEQ ID NO: 4, a methionine at the
  • The modified glycinin polypeptide comprises 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 20, 25, 30, 35, 40, 45, 50, 55, or 60 or more and fewer than 100, 90, 80, 70, 60, 55, 50, 45, 44, 43, 42, 41, 40, 39, 38, 37, 36, 35, 34, 33, 32, 31, or 30 introduced methionine residues as compared to the glycinin polypeptide of SEQ ID NO: 4.
  • At least one (e.g., 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17) of the 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 20, 25, 30, 35, 40, 45, 50, 55, or 60 or more modifications comprises a substitution of a methionine at a position corresponding to SEQ ID NO: 4 selected from the group consisting of position 51, 53, 61, 66, 70, 71, 80, 145, 162, 164, 172, 209, 313, 335, 341, 343, 351, 354, 356, 361, 412, 414, 416, 436, 442, 462, and 468.
  • A polynucleotide encoding a modified glycinin polypeptide, the modified glycinin polypeptide comprising an amino acid sequence having at least 50%, 55%, 60%, 65%, 70%, 75%, 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% sequence identity to SEQ ID NO: 4 and comprising at least one (e.g., 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17) modification selected from the group consisting of a methionine at the position corresponding to position 51 of SEQ ID NO: 4, a methionine at the position corresponding to position 61 of SEQ ID NO: 4, a methionine at the position corresponding to position 66 of SEQ ID NO: 4, a methionine at the position corresponding to position 70 of SEQ ID NO
  • The modified glycinin polypeptide further comprises at least one (e.g., 1, 2, 3, 4, 5, 6, 7, 8, 9, 10) additional modification comprising a substitution of a methionine at a position corresponding to SEQ ID NO: 4 selected from the group consisting of position 53, 71, 80, 209, 335, 343, 351, 436, 442, and 468.
  • A polynucleotide encoding a modified glycinin polypeptide, the modified glycinin polypeptide comprising an amino acid sequence having at least 50%, 55%, 60%, 65%, 70%, 75%, 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% sequence identity to SEQ ID NO: 4 and comprising at least one modification selected from the group consisting of a methionine at the position corresponding to position 53 of SEQ ID NO: 4, a methionine at the position corresponding to position 71 of SEQ ID NO: 4, a methionine at the position corresponding to position 80 of SEQ ID NO: 4, a methionine at the position corresponding to position 209 of SEQ ID NO: 4, a methionine at the position corresponding to position 335 of SEQ ID NO: 4, a
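To check a candidate against the methionine modification sets described above, one can compare it residue by residue with the reference proglycinin. The sketch below assumes the variant is an ungapped, equal-length substitution variant of SEQ ID NO: 4, so that "corresponding position" reduces to the same index. Comparing more divergent sequences would require an alignment first. The position set is copied from the 27 positions listed above. The function and constant names are hypothetical.

```python
# Positions (1-based, SEQ ID NO: 4 numbering) listed in the description above.
LISTED_POSITIONS = frozenset({
    51, 53, 61, 66, 70, 71, 80, 145, 162, 164, 172, 209, 313, 335,
    341, 343, 351, 354, 356, 361, 412, 414, 416, 436, 442, 462, 468,
})


def introduced_methionines(reference: str, variant: str) -> list[int]:
    """Return 1-based positions where the variant has Met and the reference does not."""
    if len(reference) != len(variant):
        raise ValueError("expects an ungapped, equal-length substitution variant")
    return [
        pos
        for pos, (ref_aa, var_aa) in enumerate(zip(reference, variant), start=1)
        if var_aa == "M" and ref_aa != "M"
    ]


def listed_substitutions(reference: str, variant: str) -> list[int]:
    """Subset of the introduced Met positions that fall in LISTED_POSITIONS."""
    return [p for p in introduced_methionines(reference, variant) if p in LISTED_POSITIONS]


if __name__ == "__main__":
    # Toy sequences only: a poly-Ala "reference" with Met introduced at positions 51 and 55.
    ref = "A" * 60
    var = ref[:50] + "M" + ref[51:54] + "M" + ref[55:]
    print(introduced_methionines(ref, var), listed_substitutions(ref, var))
```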
  • Percent (%) sequence identity with respect to a reference sequence (subject) is the percentage of amino acid residues or nucleotides in a candidate sequence (query) that are identical to the corresponding residues or nucleotides in the reference sequence. It is determined after aligning the sequences and introducing gaps, if necessary, to achieve the maximum percent sequence identity. Conservative amino acid substitutions are not counted as part of the sequence identity. Alignment for purposes of determining percent sequence identity can be achieved in various ways within the skill in the art, for instance using publicly available computer software such as BLAST, BLAST-2.
  • Sequence identity/similarity values refer to the value obtained using the BLAST 2.0 suite of programs with default parameters (Altschul et al. (1997) Nucleic Acids Res. 25:3389-402).
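The identity definition above can be illustrated on an alignment that has already been computed. The sketch below counts exact matches only, so conservative substitutions do not count. It divides by the number of non-gap residues in the query, following the wording of the definition. BLAST reports identity over the aligned region, so its numbers can differ. The function name and the use of "-" as the gap character are assumptions.

```python
def percent_identity(query_aln: str, subject_aln: str, gap: str = "-") -> float:
    """Percent of query residues identical to the aligned subject residue.

    Both strings must come from the same pairwise alignment (equal length,
    gaps written as `gap`). Only exact matches count as identical.
    """
    if len(query_aln) != len(subject_aln):
        raise ValueError("aligned sequences must have equal length")
    query_residues = sum(1 for q in query_aln if q != gap)
    if query_residues == 0:
        raise ValueError("query contains no residues")
    identical = sum(
        1 for q, s in zip(query_aln, subject_aln) if q == s and q != gap
    )
    return 100.0 * identical / query_residues


if __name__ == "__main__":
    # 4 identical positions (M, K, V, L) out of 5 query residues.
    print(percent_identity("MKV-LA", "MKVGLS"))
```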
  • The modified glycinin polypeptides described herein comprise an amino acid sequence that is at least 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% identical to any one of SEQ ID NOs: 18-86.
  • The modified glycinin polypeptides described herein comprise an amino acid sequence that is at least 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% identical to amino acid positions 20 to 495 of any one of SEQ ID NOs: 18-86.
  • The modified glycinin polypeptides described herein comprise an amino acid sequence that is at least 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% identical to any one of SEQ ID NOs: 18-86 or to positions 20 to 495 of any one of SEQ ID NOs: 18-86, and comprise a methionine at 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 20, 25, 30, 35, 40, 45, 50, 55, or 60 or more and fewer than 100, 90, 80, 70, 60, 55, 50, 45, 44, 43, 42, 41, 40, 39, 38, 37, 36, 35, 34, 33, 32, 31, or 30 positions corresponding to SEQ ID NO: 4 described herein.
  • The modified glycinin polypeptides described herein comprise an amino acid sequence that is at least 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% identical to any one of SEQ ID NOs: 18-86 or to positions 20 to 495 of any one of SEQ ID NOs: 18-86, and comprise a methionine at 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15 or more positions corresponding to amino acid positions 51, 61, 66, 70, 145, 162, 164, 172, 313, 341, 354, 356, 361, 412, 414, 416, and 462 of SEQ ID NO: 4.
  • The modified glycinin polypeptides described herein comprise an amino acid sequence that is at least 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% identical to any one of SEQ ID NOs: 18-86 or to positions 20 to 495 of any one of SEQ ID NOs: 18-86, and comprise a methionine at the positions corresponding to amino acid positions 51, 61, 66, 70, 145, 162, 164, 172, 313, 341, 354, 356, 361, 412, 414, 416, and 462 of SEQ ID NO: 4.
  • The modified glycinin polypeptides described herein comprise a modification described herein, comprise an amino acid sequence that is at least, or at least about, 10%, 20%, 25%, 30%, 35%, 40%, 45%, 50%, 55%, 60%, 65%, 70%, 75%, 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% identical to SEQ ID NO: 4, and comprise an AlphaFold2 predicted structure having a TM-score of at least 0.40, 0.45, 0.50, 0.55, 0.60, 0.65, 0.70, 0.75, 0.80, 0.85, 0.90, 0.95, or 1 as compared to the AlphaFold2 predicted structure of SEQ ID NO: 4 or of amino acid positions 20 to 495 of any one of SEQ ID NOs: 18-86.
  • AlphaFold is a computational method that can predict protein structures with atomic accuracy even in cases in which no similar structure is known.
  • The AlphaFold network directly predicts the 3D coordinates of all heavy atoms for a given protein, using the primary amino acid sequence and aligned sequences of homologues as inputs.
  • The AlphaFold methods scale to very long proteins with accurate domains and domain packing. The model also provides precise per-residue estimates of its reliability, which enable confident use of its structure predictions (Jumper, J., Evans, R., Pritzel, A. et al., Nature 596, 583-589 (2021)).
  • The TM-score (structural similarity score) is a measure of structural similarity between two protein tertiary structures. TM-scores range from 0 (no structural identity) to 1 (perfect structural identity). They can be computed using several publicly available approaches, including TM-align (zhanggroup.org/TM-align/) or Foldseek (search.foldseek.com/search), accessible using the prefix “www” on the internet. Those skilled in the art can determine appropriate parameters for optimally aligning structures.
  • TM-scores refer to the value obtained using the TM-align program with default parameters. The compared protein structures are predicted using AlphaFold2 version 2.3.1 in monomer mode, with relaxed model prediction and the “uniref90” reference database.
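For reference, the TM-score of Zhang and Skolnick is TM = (1/L) Σ_i 1/(1 + (d_i/d0)^2). Here L is the length of the reference structure, d_i is the distance between the i-th pair of aligned residues after superposition, and d0 = 1.24·(L − 15)^(1/3) − 1.8 Å. TM-align also searches for the alignment and superposition that maximize this value. The sketch below only evaluates the score for residue pairs (for example, Cα atoms) that are already aligned and superposed. The 0.5 Å floor on d0 for short chains is an assumption modeled on common implementations. The function names are hypothetical.

```python
import math
from typing import Sequence, Tuple

Coord = Tuple[float, float, float]


def tm_d0(length: int) -> float:
    """Distance scale d0 (angstroms) for a reference structure of `length` residues."""
    if length > 15:
        d0 = 1.24 * (length - 15) ** (1.0 / 3.0) - 1.8
    else:
        d0 = 0.5  # the formula is undefined for very short chains
    return max(d0, 0.5)  # assumed floor for short chains


def tm_score(model: Sequence[Coord], reference: Sequence[Coord], ref_length: int) -> float:
    """TM-score for residue pairs that are already aligned and superposed.

    model[i] and reference[i] are the coordinates of the i-th aligned pair.
    Unaligned reference residues contribute 0 but still count in ref_length.
    """
    if len(model) != len(reference):
        raise ValueError("model and reference must list the same aligned pairs")
    if ref_length <= 0 or len(model) > ref_length:
        raise ValueError("ref_length must be positive and >= the number of aligned pairs")
    d0 = tm_d0(ref_length)
    total = sum(1.0 / (1.0 + (math.dist(a, b) / d0) ** 2)
                for a, b in zip(model, reference))
    return total / ref_length


if __name__ == "__main__":
    ca = [(3.8 * i, 0.0, 0.0) for i in range(100)]  # toy C-alpha trace
    shifted = [(x, y + 2.0, z) for x, y, z in ca]   # uniform 2 A displacement
    print(tm_score(ca, ca, 100))                    # identical structures give 1.0
    print(tm_score(shifted, ca, 100))
```

Normalizing by the reference length L means that a partial match covering half of the reference can score at most 0.5.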
  • the modified glycinin polypeptides described herein comprise a signal peptide operably linked to the modified glycinin polypeptide.
  • the signal peptide is operably linked at the N-terminus of the modified glycinin polypeptide.
  • the signal peptide is operably linked at the C-terminus of the modified glycinin polypeptide.
  • the modified glycinin polypeptide comprises 2 or more (e.g., 2, 3, 4, 5, 6, 7, 8, 9, or 10 or more) signal peptides. The 2 or more signal peptides may be operably linked to the N-terminus, the C-terminus, or a combination thereof.
  • the signal peptide comprises an amino acid sequence that is at least 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% identical to amino acid positions 1-19 of SEQ ID NO: 237.
  • the modified glycinin polypeptide comprises a linker sequence between the modified glycinin polypeptide and the signal peptide. The sequence and length of the linker are not particularly limited so long as the signal peptide can direct the modified glycinin polypeptide to the desired location.
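The bullets above refer to sequence regions by 1-based, inclusive amino acid positions (for example, positions 1-19 of SEQ ID NO: 237 for the signal peptide, or positions 20 to 495 of SEQ ID NOs: 18-86). A small hypothetical helper like the one below converts such ranges to Python slices; the toy sequence is a placeholder, not a sequence from this disclosure.

```python
def residues(seq: str, start: int, end: int) -> str:
    """Return positions start..end of `seq`, numbered from 1 and inclusive."""
    if not 1 <= start <= end <= len(seq):
        raise ValueError(f"range {start}-{end} is outside a sequence of length {len(seq)}")
    # 1-based inclusive [start, end] maps to the 0-based half-open slice [start-1, end).
    return seq[start - 1:end]


def split_signal_peptide(precursor: str, signal_end: int = 19) -> tuple:
    """Split a precursor into (signal peptide, remainder).

    signal_end=19 mirrors the "positions 1-19" wording above; it is an
    illustrative default, not a predicted cleavage site.
    """
    return residues(precursor, 1, signal_end), precursor[signal_end:]


toy = "M" + "A" * 18 + "KTLLAV"  # hypothetical 25-residue precursor
signal, mature = split_signal_peptide(toy)
print(len(signal), mature)  # 19 KTLLAV
```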
  • polynucleotides encoding modified glycinin polypeptides comprising 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 25, 30, 35, 40, 45, 50, 55, or 60 or more methionine residues and comprising an amino acid sequence that is at least, or at least about, 10%, 20%, 25%, 30%, 35%, 40%, 45%, 50%, 55%, 60%, 65%, 70%, 75%, 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% identical to SEQ ID NO: 4 and comprising an AlphaFold2 predicted structure having a TM-score of at least 0.40, 0.45, 0.50, 0.55, 0.60, 0.65, 0.70, 0.75, 0.80, 0.85, 0.90, 0.95, or 1 as compared to the AlphaFold2 predicted structure of SEQ ID NO: 4 or
  • the position of the methionine residues is not particularly limited. In certain embodiments, at least 5 (e.g., at least 5, 6, 7, 8, 9, or 10) of the 10 or more methionine residues are introduced within a beta barrel motif of the AlphaFold2 predicted structure of SEQ ID NO: 4.
  • At least five methionine residues are introduced at a position corresponding to position 11, 20, 21, 23, 24, 26, 28, 33, 34, 35, 40, 43, 48, 49, 51, 52, 53, 59, 61, 64, 66, 70, 71, 75, 80, 81, 84, 85, 95, 97, 98, 99, 100, 103, 111, 112, 113, 115, 117, 121, 129, 131, 135, 143, 145, 147, 149, 150, 151, 153, 154,
  • At least five methionine residues are introduced at a position corresponding to position 24, 26, 28, 33, 35, 49, 51, 53, 55, 59, 61, 64, 66, 70, 72, 74, 81, 83, 115, 117, 121, 123, 125, 131, 133, 135, 141, 143,
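Introducing methionine residues at positions "corresponding to" positions of SEQ ID NO: 4 normally requires first aligning the variant to the reference. Assuming the sequence is already numbered like the reference, the substitution and counting steps can be sketched as follows; the helper names and the toy sequence are hypothetical.

```python
from typing import Iterable


def introduce_methionines(sequence: str, positions: Iterable[int]) -> str:
    """Return a copy of `sequence` with Met (M) at each 1-based position.

    Assumes `sequence` is numbered the same way as the reference, so no
    alignment step is needed to find the "corresponding" positions.
    """
    residues = list(sequence)
    for pos in positions:
        if not 1 <= pos <= len(residues):
            raise IndexError(f"position {pos} is outside a sequence of length {len(residues)}")
        residues[pos - 1] = "M"
    return "".join(residues)


def count_methionines(sequence: str) -> int:
    """Number of Met residues in a one-letter amino acid sequence."""
    return sequence.upper().count("M")


toy = "MAKLVSTGQR"  # hypothetical 10-residue sequence
variant = introduce_methionines(toy, [3, 5, 7])
print(variant, count_methionines(variant) - count_methionines(toy))  # MAMLMSMGQR 3
```

Taking the difference in Met counts reports net added methionines, so a position that was already Met is not double-counted.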
  • encoding means comprising the information for translation into the specified protein.
  • a nucleic acid encoding a protein may comprise non-translated sequences (e.g., introns) within translated regions of the nucleic acid, or may lack such intervening non-translated sequences (e.g., as in cDNA).
  • the information by which a protein is encoded is specified by the use of codons.
  • the amino acid sequence is encoded by the nucleic acid using the “universal” genetic code.
  • variants of the universal code, such as those present in some plant, animal, and fungal mitochondria, the bacterium Mycoplasma capricolum (Yamao, et al., (1985) Proc. Natl. Acad. Sci. USA 82:2306-9), or the ciliate macronucleus, may be used when the nucleic acid is expressed using these organisms.
  • because nucleic acid sequences of the present invention may be expressed in both monocotyledonous and dicotyledonous plant species, sequences can be modified to account for the specific codon preferences and GC content preferences of monocotyledonous or dicotyledonous plants, as these preferences have been shown to differ (Murray, et al., (1989) Nucleic Acids Res. 17:477-98, herein incorporated by reference).
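To illustrate how one coding sequence can be read under the universal code and under a variant code, the sketch below builds the standard codon table and a variant in which TGA (UGA in RNA) encodes tryptophan instead of a stop, as in Mycoplasma (NCBI translation table 4). The function name and the use of `X` for unrecognized codons are assumptions of this example.

```python
BASES = "TCAG"
# Standard code (NCBI table 1); codons ordered TTT, TTC, TTA, TTG, TCT, ...
STANDARD_AA = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = [a + b + c for a in BASES for b in BASES for c in BASES]
STANDARD_CODE = dict(zip(CODONS, STANDARD_AA))
# Variant used by Mycoplasma/Spiroplasma (NCBI table 4): TGA encodes Trp.
MYCOPLASMA_CODE = {**STANDARD_CODE, "TGA": "W"}


def translate(dna: str, code: dict = STANDARD_CODE, stop_at_stop: bool = True) -> str:
    """Translate a coding sequence codon by codon; 'X' marks unknown codons."""
    seq = dna.upper().replace("U", "T")
    protein = []
    # Step through complete codons only; a trailing partial codon is ignored.
    for i in range(0, len(seq) - len(seq) % 3, 3):
        aa = code.get(seq[i:i + 3], "X")
        if aa == "*" and stop_at_stop:
            break
        protein.append(aa)
    return "".join(protein)


cds = "ATGTGGTGAAAA"
print(translate(cds))                   # MW   (TGA read as stop)
print(translate(cds, MYCOPLASMA_CODE))  # MWWK (TGA read as Trp)
```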
  • polynucleotide includes reference to a deoxyribopolynucleotide, ribopolynucleotide or analogs thereof that have the essential nature of a natural ribonucleotide in that they hybridize, under stringent hybridization conditions, to substantially the same nucleotide sequence as naturally occurring nucleotides and/or allow translation into the same amino acid(s) as the naturally occurring nucleotide(s).
  • a polynucleotide can be full-length or a subsequence of a structural or regulatory gene. Unless otherwise indicated, the term includes reference to the specified sequence as well as the complementary sequence thereof.
  • DNAs or RNAs with backbones modified for stability or for other reasons are “polynucleotides” as that term is intended herein.
  • DNAs or RNAs comprising unusual bases, such as inosine, or modified bases, such as tritylated bases, to name just two examples are polynucleotides as the term is used herein. It will be appreciated that a great variety of modifications have been made to DNA and RNA that serve many useful purposes known to those of skill in the art.
  • polynucleotide as it is employed herein embraces such chemically, enzymatically or metabolically modified forms of polynucleotides, as well as the chemical forms of DNA and RNA characteristic of viruses and cells, including inter alia, simple and complex cells.
  • the terms “polypeptide,” “peptide,” and “protein” are used interchangeably herein to refer to a polymer of amino acid residues.
  • the terms apply to amino acid polymers in which one or more amino acid residues are artificial chemical analogues of corresponding naturally occurring amino acids, as well as to naturally occurring amino acid polymers.
  • the polynucleotide encoding the modified glycinin polypeptide is operably linked to at least one (e.g., at least 1, 2, 3, 4, 5, 6, 7 or more) regulatory element.
  • the regulatory element is a promoter.
  • the regulatory element is a heterologous regulatory element (e.g., heterologous promoter).
  • the heterologous regulatory element is heterologous to the polynucleotide sequence encoding the polypeptide.
  • in embodiments in which the polynucleotide operably linked to the heterologous regulatory element is introduced in a cell, the regulatory element is heterologous to the cell.
  • operably linked is intended to mean a functional linkage between two or more elements.
  • an operable linkage between a polynucleotide of interest and a regulatory sequence is a functional link that allows for expression of the polynucleotide of interest.
  • Operably linked elements may be contiguous or non-contiguous. When used to refer to the joining of two protein coding regions, operably linked is intended that the coding regions are in the same reading frame.
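When a signal peptide coding region, an optional linker, and a modified glycinin coding region are joined, operable linkage of the coding regions requires a shared reading frame. A minimal check, assuming each upstream segment is a DNA coding sequence whose length must be a multiple of 3 and that only the final codon may be a stop, might look like this; the function name and the segments in the example are hypothetical placeholders.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def in_frame_fusion(*coding_regions: str) -> str:
    """Join coding regions 5'->3' and verify the fused ORF stays in frame.

    Assumptions: every region except the last must have a length divisible
    by 3, and a stop codon is only allowed as the final codon.
    """
    for idx, region in enumerate(coding_regions[:-1]):
        if len(region) % 3 != 0:
            raise ValueError(
                f"region {idx} has length {len(region)}, not a multiple of 3; "
                "downstream regions would be out of frame"
            )
    fused = "".join(region.upper() for region in coding_regions)
    codons = [fused[i:i + 3] for i in range(0, len(fused) - 2, 3)]
    if any(codon in STOP_CODONS for codon in codons[:-1]):
        raise ValueError("in-frame stop codon before the end of the fusion")
    return fused


# Placeholder segments: signal CDS, linker, glycinin CDS (ends with a stop).
print(in_frame_fusion("ATGGCT", "GGT", "AAATAA"))  # ATGGCTGGTAAATAA
```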
  • regulatory element generally refers to a transcriptional regulatory element involved in regulating the transcription of a nucleic acid molecule such as a gene or a target gene.
  • the regulatory element is a nucleic acid and may include a promoter, an enhancer, an intron, expression modulating elements (EMEs), a 5'-untranslated region (5'-UTR, also known as a leader sequence), or a 3'-UTR, or a combination thereof.
  • a regulatory element may act in “cis” or “trans”, and generally it acts in “cis”, i.e., it activates expression of genes located on the same nucleic acid molecule, e.g., a chromosome, where the regulatory element is located.
  • An “enhancer” element is any nucleic acid molecule that increases transcription of a nucleic acid molecule when functionally linked to a promoter regardless of its relative position.
  • Various enhancers are known in the art, including, for example, introns with gene expression enhancing properties in plants, the ubiquitin intron (i.e., the maize ubiquitin intron 1 (see, for example, NCBI sequence S94464)), the omega enhancer or the omega prime enhancer (Gallie, et al., (1989) Molecular Biology of RNA ed.
  • a “repressor” (also sometimes called herein silencer) is defined as any nucleic acid molecule which inhibits the transcription when functionally linked to a promoter regardless of relative position.
  • the term “cis-element” generally refers to a transcriptional regulatory element that affects or modulates expression of an operably linked transcribable polynucleotide, where the transcribable polynucleotide is present in the same DNA sequence.
  • a cis-element may function to bind transcription factors, which are trans-acting polypeptides that regulate transcription.
  • an “intron” is an intervening sequence in a gene that is transcribed into RNA but is then excised in the process of generating the mature mRNA. The term is also used for the excised RNA sequences.
  • An “exon” is a portion of the sequence of a gene that is transcribed and is found in the mature mRNA derived from the gene but is not necessarily a part of the sequence that encodes the final gene product.
  • the 5' untranslated region (5' UTR), also known as a translational leader sequence or leader RNA, is involved in the regulation of translation of a transcript by differing mechanisms in viruses, prokaryotes, and eukaryotes.
  • the “3' non-coding sequences” refer to DNA sequences located downstream of a coding sequence and include polyadenylation recognition sequences and other sequences encoding regulatory signals capable of affecting mRNA processing or gene expression.
  • the polyadenylation signal is usually characterized by affecting the addition of polyadenylic acid tracts to the 3' end of the mRNA precursor.
  • promoter refers to a region of DNA upstream from the start of transcription and involved in recognition and binding of RNA polymerase and other proteins to initiate transcription.
  • a “plant promoter” is a promoter capable of initiating transcription in plant cells.
  • the polynucleotides described herein are operably linked to a promoter that drives expression in a plant cell. Any promoter known in the art can be used in the methods of the present disclosure including, but not limited to, constitutive promoters, pathogen-inducible promoters, wound-inducible promoters, tissue-preferred promoters, and chemical-regulated promoters.
  • the choice of promoter may depend on the desired timing and location of expression in the transformed plant as well as other factors, which are known to those of skill in the art.
  • constitutive promoters include, for example, the core promoter of the Rsyn7 promoter and other constitutive promoters disclosed in WO 99/43838 and U.S. Patent No.
  • a wound-inducible promoter can be used in the constructions of the disclosure.
  • wound-inducible promoters include potato proteinase inhibitor (pin II) gene, wun1 and wun2, win1 and win2, systemin, WIP1, MPI gene, and the like.
  • Chemical-regulated promoters can be used to modulate the expression of a gene in a plant through the application of an exogenous chemical regulator.
  • the promoter can be a chemical-inducible promoter, where application of the chemical induces gene expression, or a chemical-repressible promoter, where application of the chemical represses gene expression.
  • Chemical-inducible promoters are known in the art and include, but are not limited to, the maize In2-2 promoter, which is activated by benzenesulfonamide herbicide safeners, the maize GST promoter, which is activated by hydrophobic electrophilic compounds that are used as pre-emergent herbicides, and the tobacco PR-1a promoter, which is activated by salicylic acid.
  • other chemical-regulated promoters include steroid-responsive promoters (e.g., the glucocorticoid-inducible promoter) and tetracycline-inducible and tetracycline-repressible promoters.
  • Tissue-preferred promoters can be utilized to target enhanced expression of the target genes or proteins within a particular plant tissue.
  • tissue-preferred promoters include, but are not limited to, leaf-preferred promoters, root-preferred promoters, seed-preferred promoters, and stem-preferred promoters.
  • Tissue-preferred promoters include those described in Yamamoto et al. (1997) Plant J. 12(2):255-265; Kawamata et al. (1997) Plant Cell Physiol. 38(7):792-803; Hansen et al. (1997) Mol. Gen. Genet. 254(3):337-343; Russell et al. (1997) Transgenic Res. 6(2):157-168; Rinehart et al. (1996) Plant Physiol. 112(3):1331-1341; Van Camp et al. (1996) Plant Physiol.
  • Leaf-specific promoters are known in the art. See, for example, Yamamoto et al. (1997) Plant J. 12(2):255-265; Kwon et al. (1994) Plant Physiol. 105:357-67; Yamamoto et al. (1994) Plant Cell Physiol. 35(5):773-778; Gotor et al. (1993) Plant J. 3:509-18; Orozco et al. (1993) Plant Mol. Biol. 23(6):1129-1138; and Matsuoka et al. (1993) Proc. Natl. Acad. Sci. USA 90(20):9586-9590.
  • “seed-preferred” promoters include both “seed-specific” promoters (those promoters active during seed development, such as promoters of seed storage proteins) as well as “seed-germinating” promoters (those promoters active during seed germination).
  • seed-preferred promoters include, but are not limited to, Cim1 (cytokinin-induced message), cZ19B1 (maize 19 kDa zein), milps (myo-inositol-1-phosphate synthase), and celA (cellulose synthase) (see WO 00/11177, herein incorporated by reference).
  • Gamma-zein is a preferred endosperm-specific promoter.
  • Glob-1 is a preferred embryo-specific promoter.
  • seed-specific promoters include, but are not limited to, bean β-phaseolin, napin, β-conglycinin, soybean lectin, cruciferin, and the like.
  • seed-specific promoters include, but are not limited to, maize 15 kDa zein, 22 kDa zein, 27 kDa zein, γ-zein, waxy, shrunken 1, shrunken 2, globulin 1, etc. See also WO 00/12733, where seed-preferred promoters from end1 and end2 genes are disclosed; herein incorporated by reference.
  • the polynucleotides of the present disclosure can involve the use of the intact, native glycinin genes, wherein the expression is driven by a cognate 5' upstream promoter sequence(s).
  • the glycinin polynucleotides encoding the modified glycinin polypeptides described above are inserted into a recombinant DNA construct.
  • the recombinant DNA construct further comprises at least one regulatory element.
  • the at least one regulatory element of the recombinant DNA construct comprises a promoter, preferably a heterologous promoter.
  • the recombinant DNA construct described herein is expressed in a plant or seed.
  • the plant or seed is a soybean plant or soybean seed.
  • a “recombinant DNA construct” comprises two or more operably linked DNA segments which are not found operably linked in nature.
  • Non-limiting examples of recombinant DNA constructs include a polynucleotide of interest operably linked to heterologous sequences, also referred to as “regulatory elements,” which aid in the expression, autologous replication, and/or genomic insertion of the sequence of interest.
  • Such regulatory elements include, for example, promoters, termination sequences, enhancers, etc., or any component of an expression cassette; a plasmid, cosmid, virus, autonomously replicating sequence, phage, or linear or circular single-stranded or double-stranded DNA or RNA nucleotide sequence; and/or sequences that encode heterologous polypeptides.
  • the modified glycinin described herein can be provided for expression in a plant of interest or an organism of interest.
  • the cassette can include 5' and 3' regulatory sequences operably linked to a modified glycinin polynucleotide.
  • the promoter of the recombinant DNA constructs of the invention can be any type or class of promoter known in the art, such that any one of a number of promoters can be used to express the various modified glycinin sequences disclosed herein, including the native promoter of the polynucleotide sequence of interest.
  • the promoters for use in the recombinant DNA constructs of the invention can be selected based on the desired outcome.
  • the polynucleotides encoding the modified glycinin polypeptides described herein are provided in expression cassettes (e.g., a plasmid, cosmid, virus, autonomously replicating sequence, phage, or linear or circular single-stranded or double-stranded DNA or RNA nucleotide sequence) for expression in a plant of interest or any organism of interest.
  • the cassette can include 5' and 3' regulatory sequences operably linked to a polynucleotide encoding the modified glycinin polypeptide.
  • the cassette may additionally contain at least one additional gene to be cotransformed into the organism. Alternatively, the additional gene(s) can be provided on multiple expression cassettes.
  • Such an expression cassette is provided with a plurality of restriction sites and/or recombination sites for insertion of the polynucleotide encoding the modified glycinin polypeptide to be under the transcriptional regulation of the regulatory regions.
  • the expression cassette may additionally contain selectable marker genes.
  • the expression cassette can include, in the 5'-3' direction of transcription, a transcriptional and translational initiation region (e.g., a promoter), a polynucleotide encoding the modified glycinin polypeptide, and a transcriptional and translational termination region (e.g., termination region) functional in plants.
  • the regulatory regions (e.g., promoters, transcriptional regulatory regions, and translational termination regions) and/or the polynucleotide encoding the modified glycinin polypeptide may be native/analogous to the host cell or to each other.
  • the regulatory regions and/or the polynucleotide encoding the modified glycinin polypeptide may be heterologous to the host cell or to each other.
  • the termination region may be native with the transcriptional initiation region, with the plant host, or may be derived from another source (i.e., foreign or heterologous) than the promoter, the polynucleotide encoding the modified glycinin polypeptide, the plant host, or any combination thereof.
  • the expression cassette may additionally contain 5' leader sequences.
  • leader sequences can act to enhance translation.
  • Translation leaders are known in the art and include viral translational leader sequences.
  • the expression cassette can comprise a selectable marker gene for the selection of transformed cells.
  • Selectable marker genes are utilized for the selection of transformed cells or tissues.
  • Marker genes include genes encoding antibiotic resistance, such as those encoding neomycin phosphotransferase II (NEO) and hygromycin phosphotransferase (HPT), as well as genes conferring resistance to herbicidal compounds, such as glyphosate, glufosinate ammonium, bromoxynil, imidazolinones, and 2,4-dichlorophenoxyacetate (2,4-D).
  • the various DNA fragments may be manipulated, to provide for the DNA sequences in the proper orientation and, as appropriate, in the proper reading frame.
  • adapters or linkers may be employed to join the DNA fragments or other manipulations may be involved to provide for convenient restriction sites, removal of superfluous DNA, removal of restriction sites, or the like.
  • in vitro mutagenesis, primer repair, restriction, annealing, resubstitutions, e.g., transitions and transversions may be involved.
  • the nucleic acid construct or expression cassette described herein is expressed in a plant or seed.
  • the plant or seed is a soybean plant or soybean seed.
  • the nucleic acid constructs or expression cassettes disclosed herein may be used for transformation of any plant species.
  • the present disclosure further provides plants, plant parts, plant cells and seeds expressing any of the modified glycinin polypeptides described herein.
  • the term “plant” includes plant protoplasts, plant cell tissue cultures from which plants can be regenerated, plant calli, plant clumps, and plant cells that are intact in plants or parts of plants such as embryos, pollen, ovules, seeds, leaves, flowers, branches, fruit, kernels, ears, cobs, husks, stalks, roots, root tips, anthers, and the like. Grain is intended to mean the mature seed produced by commercial growers for purposes other than growing or reproducing the species. Progeny, variants, and mutants of the regenerated plants are also included within the scope of the disclosure, provided that these parts comprise the polynucleotide encoding the modified glycinin polypeptide.
  • the plant species of the compositions and methods of the present disclosure can be any plant species for which improvement of the nutritional value (e.g., increased methionine content or increased essential amino acid content) is desired, including, but not limited to, monocots and dicots.
  • Examples of plant species of interest include, but are not limited to, maize (Zea mays), Brassica sp. (e.g., B. napus, B. rapa, B. juncea), alfalfa (Medicago sativa), rice (Oryza sativa), rye (Secale cereale), sorghum (Sorghum bicolor, Sorghum vulgare), millet (e.g., pearl millet (Pennisetum glaucum), proso millet (Panicum miliaceum), foxtail millet (Setaria italica), finger millet (Eleusine coracana)), sunflower (Helianthus annuus), safflower (Carthamus tinctorius), wheat (Triticum aestivum), soybean (Glycine max), tobacco (Nicotiana tabacum), peanuts (Arachis hypogaea), coconut (Cocos nucifera), olive (Olea europaea),
  • plants of the compositions and methods described herein are plants used to produce protein compositions for animal feed or human food including, but not limited to, legume crop species (including, but not limited to, alfalfa, clover or trefoil, pea, including pigeon pea, cowpea and Lathyrus spp., bean (Fabaceae or Leguminosae), lentil, lupin, mesquite, carob, soybean, peanut, or tamarind), safflower, sunflower, Brassica, maize, palm, and coconut.
  • the plants of the compositions and methods described herein are legume crop species, including, but not limited to, alfalfa (Medicago sativa), clover or trefoil (Trifolium spp.), pea (Pisum sativum), pigeon pea (Cajanus cajan), cowpea (Vigna unguiculata), and Lathyrus spp.
  • the plants of the compositions and methods described herein are elite plant lines (e.g., elite soybean line or elite pea line).
  • elite line refers to any line that has resulted from breeding and selection for superior agronomic performance that allows a producer to harvest a product of commercial significance. Numerous elite lines are available and known to those of skill in the art of plant breeding (e.g., soybean breeding).
  • the seeds or seeds of the plants of the compositions and methods described herein comprise at least or at least about a 2%, 3%, 4%, 5%, 6%, 7%, 8%, 9%, 10%, 12%, 14%, 16%, 18%, 20%, 22%, 24%, 26%, 28%, 30%, 35%, 40%, 45%, 50%, 55%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, 100%, 110%, 120%, 130%, 140%, 150%, 200%, 250%, 300%, or 350% and less than about a 500%, 450%, 400%, 350%, 300%, 250%, 200%, 150%, 140%, 130%, 120%, 110%, 100%, 90%, 80%, 70%, 60%, or 50% increase in total methionine on a dry weight of seed basis, as compared to a control seed (e.g., seed not comprising the modified glycinin polypeptide).
  • the seed or seeds of the plants of the compositions and methods described herein comprises at least or at least about 0.7%, 0.8%, 0.9%, 1%, 1.1%, 1.2%, 1.3%, 1.4%, 1.5%, 1.6%, 1.7%, 1.8%, 1.9%, 2%, 2.1%, 2.2%, 2.3%, 2.4%, 2.5%, 2.6%, 2.7%, 2.8%, 2.9%, or 3% and less than or less than about 4%, 3.9%, 3.8%, 3.7%, 3.6%, 3.5%, 3.4%, 3.3%, 3.2%, 3.1%, 3%, 2.9%, 2.8%, 2.7%, 2.6%, 2.5%, 2.4%, 2.3%, 2.2%, 2.1%, or 2% total methionine on a dry weight of seed basis.
  • total methionine refers to the overall quantity of methionine both as free methionine and proteogenic methionine (methionine as part of a protein or polypeptide), such that total methionine equals free methionine plus proteogenic methionine.
  • a total amino acid such as cysteine, threonine, lysine or tryptophan refers to the overall quantity of each amino acid, both as free amino acid and proteogenic amino acid (amino acid as part of a protein or polypeptide).
  • Total methionine or other amino acid in the seed can be expressed on a dry weight basis of the seed.
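Because total methionine is the sum of free and proteogenic methionine expressed on a seed dry-weight basis, both the content and the relative increase over a control seed reduce to simple arithmetic. The function names are hypothetical, and the values below are illustrative placeholders, not measurements from this disclosure.

```python
def total_methionine_pct_dw(free_met_mg: float, protein_met_mg: float, dry_weight_mg: float) -> float:
    """Total methionine (free + proteogenic) as % of seed dry weight."""
    if dry_weight_mg <= 0:
        raise ValueError("dry weight must be positive")
    return 100.0 * (free_met_mg + protein_met_mg) / dry_weight_mg


def percent_increase(test_value: float, control_value: float) -> float:
    """Relative increase of a test seed over its control seed, in percent."""
    if control_value == 0:
        raise ValueError("control value must be non-zero")
    return 100.0 * (test_value - control_value) / control_value


# Illustrative placeholder values only (mg of methionine per 1000 mg dry seed).
control = total_methionine_pct_dw(0.1, 5.9, 1000.0)   # about 0.6% DW
modified = total_methionine_pct_dw(0.2, 8.8, 1000.0)  # about 0.9% DW
print(round(percent_increase(modified, control), 1))  # 50.0
```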
  • the seeds or seeds of the plants of the compositions and methods described herein comprise at least or at least about a 1, 1.5, 2, 2.5, 3, 3.5, 4, 5, 10, or 15 and less than 30, 25, 20, 15, 10, 9, 8, 7, 6, or 5 percentage point increase in total protein, measured on a dry weight of seed basis, as compared to a corresponding control seed (e.g., seed or seed of a plant not comprising the modified glycinin polypeptide).
  • the term “percentage point” (pp) refers to the arithmetic difference between two percentages. For example, a modified seed may contain 20% by weight of a component and the corresponding unmodified control seed may contain 15% by weight of that component.
  • the difference in the component between the control and modified seed would be expressed as 5 percentage points.
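The distinction matters because the same difference reads very differently as a relative change. Using the 20% versus 15% example above, a hypothetical sketch:

```python
def percentage_point_difference(test_pct: float, control_pct: float) -> float:
    """Absolute difference between two percentages, in percentage points."""
    return test_pct - control_pct


def relative_change_pct(test_pct: float, control_pct: float) -> float:
    """Relative change of the test value versus the control, in percent."""
    return 100.0 * (test_pct - control_pct) / control_pct


# The 20% vs 15% example from the text.
print(percentage_point_difference(20.0, 15.0))    # 5.0 (percentage points)
print(round(relative_change_pct(20.0, 15.0), 1))  # 33.3 (% relative increase)
```

A 5 pp gain on a 15% baseline is therefore a one-third relative increase, which is why the text states protein gains in percentage points.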
  • the seeds or seeds of the plants of the composition and methods described herein comprise a modified glycinin content that is greater than or within 0.5%, 1%, 1.5%, 2%, 2.5%, 3%, 3.5%, 4%, 4.5%, 5%, 10%, 15%, or 20% as compared to the glycinin content of corresponding control seed (e.g., wild-type glycinin from a seed or seed of a plant not comprising the modified glycinin polypeptide) on a dry weight of seed basis.
  • At least or at least about 1%, 2%, 3%, 4%, 5%, 6%, 7%, 8%, 9%, 10%, 12%, 14%, 16%, 18%, 20%, and less than about 30%, 28%, 26%, 24%, 22% or 20% of the total protein in the seed or seed of the plants of the composition and methods described herein comprises a modified glycinin polypeptide described herein and the seed or seed of the plant comprises a total protein content that is greater than or within 0.5%, 1%, 1.5%, 2%, 2.5%, 3%, 3.5%, 4%, 4.5%, 5%, 10%, 15%, or 20% as compared to a corresponding control seed (e.g., seed or seed of a plant not comprising the modified glycinin polypeptide) on a dry weight of seed basis.
  • the plants or plants generated from the plant parts, plant cells and seeds of the compositions and method described herein have a yield that is greater than or within 0.5%, 1%, 1.5%, 2%, 2.5%, 3%, 3.5%, 4%, 4.5%, 5%, 10%, 15%, or 20% as compared to the corresponding control plant, for example, one which has a similar genetic background but lacks the introduced mutations.
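One reading of "greater than or within X%" in the bullets above is that the test value may exceed the control by any amount but may fall below it by no more than X% of the control value. The hypothetical helper below encodes that interpretation, which is an assumption rather than a definition given here, and it assumes a positive control value.

```python
def greater_than_or_within(test_value: float, control_value: float, tolerance_pct: float) -> bool:
    """Assumed reading: the test value may exceed the control by any amount,
    but may fall below it by at most `tolerance_pct` percent of the control.
    Assumes control_value > 0."""
    lower_bound = control_value * (1.0 - tolerance_pct / 100.0)
    return test_value >= lower_bound


print(greater_than_or_within(97.0, 100.0, 5.0))   # True: 3% below, within 5%
print(greater_than_or_within(94.0, 100.0, 5.0))   # False: 6% below
print(greater_than_or_within(110.0, 100.0, 0.5))  # True: greater than control
```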
  • yield refers to the amount of agricultural production harvested per unit of land and may include reference to bushels per acre or kilograms per hectare of a crop at harvest, as adjusted for grain moisture. Grain moisture is measured in the grain at harvest. The adjusted test weight of grain is determined to be the weight in pounds per bushel or kilogram, adjusted for grain moisture level at harvest.
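Moisture adjustment of yield is commonly done by rescaling harvested mass to a standard moisture: adjusted mass = harvested mass * (100 - harvest moisture %) / (100 - standard moisture %). The sketch assumes a 13% standard moisture and a 60 lb bushel, both common conventions for soybean in the United States but not specified above; the function names and the plot values are hypothetical.

```python
LB_PER_KG = 2.20462262
HA_PER_ACRE = 0.40468564
# Assumptions: common US soybean conventions, not stated in the text above.
SOY_LB_PER_BUSHEL = 60.0
SOY_STANDARD_MOISTURE_PCT = 13.0


def moisture_adjusted_mass(mass_kg: float, harvest_moisture_pct: float,
                           standard_moisture_pct: float = SOY_STANDARD_MOISTURE_PCT) -> float:
    """Rescale harvested grain mass to its equivalent at standard moisture."""
    return mass_kg * (100.0 - harvest_moisture_pct) / (100.0 - standard_moisture_pct)


def yield_kg_per_ha(mass_kg: float, area_ha: float, harvest_moisture_pct: float,
                    standard_moisture_pct: float = SOY_STANDARD_MOISTURE_PCT) -> float:
    """Moisture-adjusted yield per hectare."""
    if area_ha <= 0:
        raise ValueError("area must be positive")
    return moisture_adjusted_mass(mass_kg, harvest_moisture_pct, standard_moisture_pct) / area_ha


def kg_per_ha_to_bu_per_acre(kg_per_ha: float, lb_per_bushel: float = SOY_LB_PER_BUSHEL) -> float:
    """Convert kg/ha to bushels/acre using a fixed bushel weight."""
    lb_per_acre = kg_per_ha * LB_PER_KG * HA_PER_ACRE
    return lb_per_acre / lb_per_bushel


# Hypothetical plot: 350 kg harvested from 0.1 ha at 16% grain moisture.
y = yield_kg_per_ha(350.0, 0.1, 16.0)
print(round(y), round(kg_per_ha_to_bu_per_acre(y), 1))
```

Grain harvested wetter than the standard is adjusted downward, so plots harvested at different moistures can be compared on the same basis.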
  • a polynucleotide encoding a modified glycinin polypeptide described herein is introduced in an endogenous glycinin gene (e.g., Gy1, Gy2, Gy3, Gy4, Gy5, Gy6, Gy7, and Gy8) locus.
  • the modified glycinin polypeptide is introduced in an endogenous glycinin gene locus by modifying the endogenous glycinin gene (e.g., Gy1, Gy2, Gy3, Gy4, Gy5, Gy6, Gy7, and Gy8) sequence to encode a modified glycinin protein described herein.
  • the modified glycinin polypeptide is introduced in an endogenous glycinin gene (e.g., Gy1, Gy2, Gy3, Gy4, Gy5, Gy6, Gy7, and Gy8) locus by replacing the endogenous glycinin gene with a polynucleotide encoding a modified glycinin polypeptide described herein.
  • the modified glycinin polypeptide is introduced in an endogenous glycinin gene (e.g., Gy1, Gy2, Gy3, Gy4, Gy5, Gy6, Gy7, and Gy8) locus by replacing a polynucleotide encoding an endogenous glycinin protein with a polynucleotide encoding a modified glycinin polypeptide described herein, such that the polynucleotide encoding a modified glycinin polypeptide is operably linked to the endogenous glycinin gene promoter.
  • Gy1, Gy2, and Gy4 are the highest expressing glycinins in seed at each of 25, 28, 35, and 42 days after flowering.
  • the polynucleotide encoding a modified glycinin polypeptide described herein is introduced at a Gy1, Gy2, or Gy4 gene locus.
  • a polynucleotide encoding a modified glycinin polypeptide is introduced in two or more endogenous glycinin gene loci (e.g., two or more of Gy1, Gy2, Gy3, Gy4, Gy5, Gy6, Gy7, and Gy8).
  • the modified glycinin polypeptides encoded at the two or more endogenous glycinin gene loci comprise the same amino acid sequence.
  • the modified glycinin polypeptides encoded at the two or more endogenous glycinin gene loci comprise different amino acid sequences.
  • the modified glycinin polypeptides may comprise the same sequence, different sequences, or a combination thereof.
  • in certain embodiments, the two or more endogenous glycinin gene loci comprise one or more of the Gy1, Gy2, or Gy4 gene loci.
  • multiple copies of a polynucleotide encoding a modified glycinin polypeptide can be introduced at a glycinin locus, such that, for example 2, 3, 4, 5, 10, 20, 30 or more copies are introduced at a locus or at multiple glycinin loci.
  • the polynucleotide encoding the modified glycinin polypeptide is introduced into a glycinin gene locus using a genome modification enzyme selected from the group consisting of a polynucleotide-guided endonuclease, CRISPR-Cas endonucleases, base editing deaminases, a zinc finger nuclease, a transcription activator-like effector nuclease (TALEN), and engineered site-specific meganucleases.
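As an illustration of how a polynucleotide-guided endonuclease is directed to a locus, the sketch below lists candidate target sites for Streptococcus pyogenes Cas9, assuming a 20-nt spacer immediately followed by an NGG PAM on either strand. The function names are hypothetical, the sketch ignores off-target scoring and other guide design criteria, and the sequence in the example is a placeholder, not a glycinin gene sequence.

```python
import re

COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def find_spcas9_sites(seq: str, spacer_len: int = 20) -> list:
    """Return (strand, start, spacer, pam) for each spacer followed by NGG.

    Assumes SpCas9-style targeting. `start` is 0-based on the strand that
    was scanned (reverse-complement coordinates for the "-" strand).
    """
    seq = seq.upper()
    # Zero-width lookahead so overlapping candidate sites are all reported.
    pattern = re.compile(r"(?=([ACGT]{%d})([ACGT]GG))" % spacer_len)
    hits = []
    for strand, strand_seq in (("+", seq), ("-", reverse_complement(seq))):
        for match in pattern.finditer(strand_seq):
            hits.append((strand, match.start(), match.group(1), match.group(2)))
    return hits


placeholder = "A" * 20 + "TGG"  # hypothetical target: 20-nt spacer + TGG PAM
for site in find_spcas9_sites(placeholder):
    print(site)
```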
  • a polynucleotide encoding a modified glycinin polypeptide described herein is introduced at a non-native locus (e.g., a locus other than Gy1, Gy2, Gy3, Gy4, Gy5, Gy6, Gy7, or Gy8) in the plant.
  • the non-native locus is not particularly limited, such that any locus that leads to the expression of the modified glycinin polypeptide is acceptable.
  • the polynucleotide encoding a modified glycinin polypeptide is introduced into one or more beta-conglycinin loci described herein, such that, in certain embodiments, at least one (e.g., 1, 2, 3, 4, 5, 6, or 7) endogenous beta-conglycinin locus is replaced with a polynucleotide encoding a modified glycinin polypeptide.
  • At least one (e.g., 1, 2, or 3) beta-conglycinin locus on chromosome 10, such as, for example, Glyma.10g246300, Glyma.10g246500, or Glyma.10g246400, is replaced with a polynucleotide encoding a modified glycinin polypeptide.
  • each of Glyma.10g246300, Glyma.10g246500, and Glyma.10g246400 is replaced with a polynucleotide encoding a modified glycinin polypeptide.
  • At least one (e.g., 1, 2, 3, or 4) beta-conglycinin locus on chromosome 20, such as, for example, Glyma.20g148200, Glyma.20g148300, Glyma.20g148400, or Glyma.20g146200, is replaced with a polynucleotide encoding a modified glycinin polypeptide.
  • at least one locus on chromosome 20 and at least one locus on chromosome 10 is replaced with a polynucleotide encoding a modified glycinin polypeptide.
  • multiple copies of a polynucleotide encoding a modified glycinin polypeptide can be introduced at the at least one (e.g., 1, 2, 3, 4, 5, 6, or 7) endogenous beta-conglycinin locus.
  • the multiple copies can encode the same modified glycinin, different modified glycinins, or a combination thereof.
  • multiple copies of a polynucleotide encoding a modified glycinin polypeptide can be introduced at a non-native locus, such that, for example, 2, 3, 4, 5, 10, 20, 30 or more copies are introduced at a locus or at multiple glycinin loci.
  • plants may comprise a polynucleotide encoding a modified glycinin polypeptide at multiple non-native loci, at a non-native locus and a glycinin locus, at multiple non-native loci and a glycinin locus, or at multiple non-native loci and multiple glycinin loci.
  • the polynucleotide encoding the modified glycinin polypeptide is introduced into a non-native locus using a genome modification enzyme selected from the group consisting of a polynucleotide-guided endonuclease, CRISPR-Cas endonucleases, base editing deaminases, a zinc finger nuclease, a transcription activator-like effector nuclease (TALEN), and engineered site-specific meganucleases.
  • the polynucleotides encoding the modified glycinin polypeptides are introduced into the plant, plant cell, plant part, or seed using a nucleic acid construct or an expression cassette described herein.
  • the polynucleotides are stably expressed in the plant, plant cell, plant part, or seed using a stable transformation technique.
  • the plant cells or plant parts are grown into plants. The method for generating the plants from the plant cells or plant parts is not particularly limited. These plants can then be grown and pollinated with either the same strain or a different strain, and the resulting progeny having constitutive expression of the desired phenotypic characteristic can be identified.
  • Two or more generations can be grown to ensure that expression of the desired phenotypic characteristic is stably maintained and inherited, and seeds can then be harvested to confirm that expression of the desired phenotypic characteristic has been achieved.
  • the transformed seed, genome-modified seed, or transgenic seed has a polynucleotide encoding the modified glycinin, a nucleotide construct, or an expression cassette stably incorporated into its genome.
  • the plant, plant part, plant cell or seed described herein further comprises at least one additional modification, the at least one additional modification associated with increased protein content, increased seed glycinin content, increased methionine, or any combination thereof.
  • the at least one additional modification is selected from the group consisting of a modification decreasing the expression of beta-conglycinin, a modification increasing the expression or activity of at least one cystathionine-gamma-synthase (CGS) polypeptide (e.g., GM-CGS1 and/or GM-CGS2), a modification decreasing the expression and/or activity of methionine gamma-lyase (MGL), a modification decreasing the expression and/or activity of lysine-ketoglutarate reductase/saccharopine dehydrogenase (LKR/SDH), a modification increasing the activity of dihydrodipicolinate synthase (DHPS), a modification decreasing expression, activity, and/or stability of an endogenous MFT polypeptide, a modification decreasing expression, activity, and/or stability of an endogenous CCT polypeptide, a modification decreasing expression, activity, and/or stability of an endogenous Big Seed 1 (BS1) polypeptide, a modification decreasing expression, activity, and/or stability of an endogenous Big Seed 2 (BS2) polypeptide, a modification decreasing expression, activity, and/or stability of an endogenous Sugars Will Eventually be Exported Transporter (SWT) polypeptide, a modification increasing expression or activity of
  • the plant, plant part, plant cell or seed comprises at least one of a modification decreasing the expression of beta-conglycinin, a modification increasing the expression or activity of at least one CGS polypeptide, a modification decreasing expression, activity, and/or stability of an endogenous SWT polypeptide, a modification decreasing expression, activity, and/or stability of an endogenous MFT polypeptide, or any combination thereof.
  • the plant, plant part, plant cell or seed comprises a modification decreasing the expression of beta-conglycinin, a modification increasing the expression or activity of CGS1, CGS2, or both, a modification decreasing expression, activity, and/or stability of an endogenous SWT polypeptide, a modification decreasing expression, activity, and/or stability of an endogenous MFT polypeptide, and a polynucleotide encoding a modified glycinin polypeptide described herein, such as, for example, a polynucleotide encoding a modified glycinin polypeptide having at least, or at least about, 50%, 55%, 60%, 65%, 70%, 75%, 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% identity with any one of SEQ ID NOs: 18-86.
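The percent-identity thresholds used throughout this section presuppose an alignment and a choice of denominator, neither of which is specified here. A minimal sketch, assuming the two sequences are already aligned (gaps written as `-`) and that identity is counted over the full length of the reference sequence; both conventions are illustrative assumptions, and the toy sequences are hypothetical rather than any SEQ ID NO of the disclosure.

```python
def percent_identity(aligned_ref: str, aligned_query: str) -> float:
    """Percent identity of a pre-aligned query relative to the reference.

    Identical, non-gap columns are divided by the number of reference
    residues (gaps excluded), i.e. identity over the full reference length.
    """
    if len(aligned_ref) != len(aligned_query):
        raise ValueError("aligned sequences must have equal length")
    ref_residues = sum(1 for r in aligned_ref if r != "-")
    if ref_residues == 0:
        raise ValueError("reference contains no residues")
    matches = sum(
        1 for r, q in zip(aligned_ref, aligned_query)
        if r != "-" and r == q
    )
    return 100.0 * matches / ref_residues


# Hypothetical toy alignment: one substitution and one deletion over 10 reference residues.
ref = "MKTLLVAGSE"
query = "MKTMLV-GSE"
print(f"{percent_identity(ref, query):.1f}% identical")  # 8 of 10 reference residues match
```

Other conventions (for example, dividing by the alignment length or the shorter sequence) give different values for the same pair, so the denominator should be fixed before comparing against a threshold.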
  • β-conglycinin (also referred to herein as conglycinin), the abundant 7S globulin storage protein, and glycinin constitute about 21% and 33% of total protein content, respectively (Utsumi et al., Food Science and Technology, 257-292 (1997)).
  • Total soybean protein content did not change after the α and α′ subunits of β-conglycinin were silenced by RNAi (Kinney et al., The Plant Cell, 13, 623-629 (2001)).
  • the resulting engineered seeds accumulated more glycinin, which accounted for more than 50% of total seed protein and compensated for the missing β-conglycinin.
  • β-conglycinin consists of 3 isoforms, α, α′ and β. Among them, only α and α′ contain Met and Trp residues in the mature protein. Glycinin has 5 isoforms, all of which have higher Met and Trp content than those of β-conglycinin (Utsumi et al., Food Science and Technology, 257-292 (1997)).
  • the modification decreasing the expression of beta-conglycinin comprises a knockdown or knockout of one or more (e.g., 2 or more, 3 or more, or 4 or more) isoforms of a beta-conglycinin gene.
  • the one or more isoforms of the beta-conglycinin gene encodes a beta-conglycinin isoform comprising an amino acid sequence that is at least 90% identical to any one of SEQ ID NOs: 185-191.
  • the modification decreasing the expression of beta-conglycinin comprises a knockdown or knockout of the beta-conglycinin genes encoding the beta-conglycinin isoforms comprising SEQ ID NOs: 185-191.
  • the modification decreasing the expression of beta-conglycinin comprises the introduction of an inverted repeat sequence in the conglycinin gene cluster on chromosome 10, chromosome 20, or both, such as, for example, an inverted repeat sequence described in WO2025049884.
  • the knockdown or knockout uses RNAi such as, for example, a dominant hairpin RNAi.
  • the ratio of glycinin to conglycinin in the soybean plant or soybean seed is at least 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30, 32, 34, 36, 38, 40, 42, 44, 46, 48, 50, 52, 54, 56, 58, 60, 65, 70, 75, 80, 85, 90, 95, 100, 500 or 1000 to 1.
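As a rough point of reference, the abundance figures cited above (about 33% glycinin and 21% conglycinin of total protein) imply a wild-type glycinin-to-conglycinin ratio of roughly 1.6 to 1, below the smallest claimed ratio of 2 to 1. The sketch below only performs that arithmetic; the helper names and the engineered-seed input values are hypothetical illustrations, not measured data from the disclosure.

```python
def glycinin_to_conglycinin_ratio(glycinin_pct: float, conglycinin_pct: float) -> float:
    """Return the glycinin:conglycinin ratio from their shares of total protein (%)."""
    if conglycinin_pct <= 0:
        # A complete beta-conglycinin knockout makes the ratio unbounded.
        return float("inf")
    return glycinin_pct / conglycinin_pct


def meets_minimum_ratio(ratio: float, minimum: float) -> bool:
    """True if the ratio is at least `minimum` to 1."""
    return ratio >= minimum


# Wild-type shares cited above (Utsumi et al., 1997): ~33% glycinin, ~21% conglycinin.
wild_type = glycinin_to_conglycinin_ratio(33.0, 21.0)
# Hypothetical engineered seed: glycinin above 50% of protein, conglycinin largely silenced.
engineered = glycinin_to_conglycinin_ratio(55.0, 2.0)

print(f"wild type:  {wild_type:.2f} : 1, meets 2:1? {meets_minimum_ratio(wild_type, 2)}")
print(f"engineered: {engineered:.2f} : 1, meets 2:1? {meets_minimum_ratio(engineered, 2)}")
```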
  • Cystathionine-gamma-synthase catalyzes the formation of cystathionine that is subsequently converted to homocysteine and finally to methionine (Kreft et al., Plant Physiology, 131, 1843-1854 (2003)).
  • Methionine, the end product of the pathway in which CGS acts, functions not only as a component of storage proteins but also as a metabolite in plant cells. In Arabidopsis, CGS expression is regulated at the level of mRNA stability through feedback from Met or its metabolites.
  • Exon 1 of CGS acts as a cis regulatory element to down-regulate its own mRNA stability in response to excess accumulation of Met (Chiba et al., Science, 286, 1371-1374 (1999)).
  • GM-CGS1 and GM-CGS2 correspond to Glyma.09g235400 and Glyma.18g261600, respectively.
  • the method for increasing the expression or activity of a CGS polypeptide comprises a targeted genetic modification that removes a self-regulatory domain of a CGS gene.
  • the CGS self-regulatory domain encodes a polypeptide comprising an amino acid sequence that is at least 80% identical to SEQ ID NO: 192.
  • the modified CGS gene encodes a CGS protein comprising an amino acid sequence that is at least 70% identical to any one of SEQ ID NOs: 195-197.
  • Lysine is one of the essential amino acids that are present in limiting amounts in crop seeds.
  • the lysine biosynthetic pathway is feedback inhibited by lysine at a rate limiting step, catalyzed by dihydrodipicolinate synthase (DHPS).
  • Seed-specific expression of a feedback-insensitive bacterial DHPS enzyme in various plants resulted in significant overproduction of seed lysine (Falco et al., Bio/Technology, 13, 577-582 (1995); Mazur et al., Science, 285, 372-375 (1999)).
  • the enhanced lysine production may be associated with increased activity of the lysine catabolic enzyme, such as the bifunctional enzyme lysine-ketoglutarate reductase/saccharopine dehydrogenase (LKR/SDH), and enhanced levels of lysine catabolic products (Falco et al., Bio/Technology, 13, 577-582 (1995); Mazur et al., Science, 285, 372-375 (1999)).
  • the modification decreasing the expression and/or activity of LKR/SDH may be any modification known in the art such as, for example, a modification described in
  • the modification increasing the activity of DHPS comprises a targeted genetic modification of the DHPS gene to remove a feedback inhibition domain of a DHPS gene.
  • the modification decreasing the expression, activity, and/or stability of an endogenous MGL polypeptide comprises a knockdown or knockout of an endogenous MGL gene, the endogenous MGL gene encoding an MGL polypeptide comprising an amino acid sequence that is at least 70%, 75%, 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% identical to any one of SEQ ID NOs: 182-184.
  • the endogenous MGL gene comprises a nucleic acid sequence that is at least, or at least about, 50%, 55%, 60%, 65%, 70%, 75%, 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% identical to SEQ ID NO: 216-218.
  • MFT is a phosphatidylethanolamine-binding protein.
  • the modification decreasing expression, activity, and/or stability of an endogenous MFT polypeptide comprises a modification introducing a polymorphism in an endogenous MFT gene to encode a modified MFT polypeptide, the modified MFT polypeptide comprising an amino acid sequence that is at least 70%, 75%, 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% identical to SEQ ID NO: 167 or 168 and comprising a modification at a position other than the amino acid corresponding to L106 in SEQ ID NO: 167.
  • the modified MFT polypeptide comprises a non-leucine at the amino acid residue corresponding to position L140 of SEQ ID NO: 167. In certain embodiments, the modified MFT polypeptide comprises a non-threonine at the amino acid residue corresponding to position T82 of SEQ ID NO: 167. In certain embodiments, the modified MFT polypeptide comprises both a non-leucine at the amino acid residue corresponding to position L140 of SEQ ID NO: 167 and a non-threonine at the amino acid residue corresponding to position T82 of SEQ ID NO: 167.
  • the modified MFT polypeptide comprises an amino acid sequence that is at least, or at least about, 70%, 75%, 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% identical to SEQ ID NO: 168.
  • the modification decreasing expression, activity, and/or stability of an endogenous MFT polypeptide comprises a knockout of an endogenous MFT gene, the endogenous MFT gene encoding a MFT polypeptide comprising an amino acid sequence that is at least, or at least about, 70%, 75%, 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% identical to SEQ ID NO: 167 or 168.
  • the modification decreasing expression, activity, and/or stability of an endogenous MFT polypeptide comprises a knockdown of an endogenous MFT gene, the endogenous MFT gene encoding a MFT polypeptide comprising an amino acid sequence that is at least, or at least about, 70%, 75%, 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% identical to SEQ ID NO: 167.
  • In these embodiments, the plant cell may be, e.g., a legume plant cell, soybean cell, or pea cell; the seed may be, e.g., a legume seed, soybean seed, or pea seed; and the plant may be, e.g., a legume plant, soybean plant, or pea plant.
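Deciding whether a variant carries "a non-leucine at the amino acid residue corresponding to position L140" or "a non-threonine at ... T82" of SEQ ID NO: 167 requires mapping reference positions through an alignment. A minimal sketch, assuming a pre-computed pairwise alignment with `-` for gaps; the function names and toy sequences are hypothetical, toy positions 2 and 4 merely stand in for T82 and L140, and treating a deletion as "not a substitution" is an assumption of this sketch.

```python
from typing import Optional


def residue_at_reference_position(aligned_ref: str, aligned_query: str, ref_pos: int) -> Optional[str]:
    """Return the query residue aligned to 1-based reference position `ref_pos`.

    Returns None if the query has a gap (deletion) at that position.
    """
    if len(aligned_ref) != len(aligned_query):
        raise ValueError("aligned sequences must have equal length")
    seen = 0
    for r, q in zip(aligned_ref, aligned_query):
        if r == "-":
            continue  # insertion in the query; no reference position consumed
        seen += 1
        if seen == ref_pos:
            return None if q == "-" else q
    raise IndexError(f"reference has fewer than {ref_pos} residues")


def has_substitution(aligned_ref: str, aligned_query: str, ref_pos: int, wild_type: str) -> bool:
    """True if the reference has `wild_type` at ref_pos and the query has a different residue there."""
    ref_residue = residue_at_reference_position(aligned_ref, aligned_ref, ref_pos)
    if ref_residue != wild_type:
        raise ValueError(f"reference has {ref_residue}, not {wild_type}, at position {ref_pos}")
    query_residue = residue_at_reference_position(aligned_ref, aligned_query, ref_pos)
    # A deletion (None) is not counted as a substitution in this sketch.
    return query_residue is not None and query_residue != wild_type


# Hypothetical toy alignment; the query carries an insertion (Q) after reference position 2.
ref = "MT-GLK"
query = "MAQGFK"
print(has_substitution(ref, query, 2, "T"))  # stand-in for a non-threonine at T82
print(has_substitution(ref, query, 4, "L"))  # stand-in for a non-leucine at L140
```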
  • the modification decreasing expression, activity, and/or stability of an endogenous CCT polypeptide comprises a knockout of an endogenous CCT gene, the endogenous CCT gene encoding a CCT protein comprising an amino acid sequence that is at least, or at least about, 70%, 75%, 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% identical to SEQ ID NO: 169 or 170.
  • the modification decreasing expression, activity, and/or stability of an endogenous CCT polypeptide comprises a knockdown of an endogenous CCT gene, the endogenous CCT gene encoding a CCT protein comprising an amino acid sequence that is at least, or at least about, 70%, 75%, 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% identical to SEQ ID NO: 169 or 170.
  • the modification decreasing expression, activity, and/or stability of an endogenous CCT polypeptide is introduced by introgressing a high protein CCT QTL, such as, for example, the high protein CCT QTL from PI678444 or PI678444 (e.g., SEQ ID NO: 204 (DNA) and 170 (protein)).
  • Glyma.10g244400 was named GmBS1 and Glyma.20g150000 was named GmBS2.
  • GmBS1 and GmBS2 show 70.9% and 71.4% identity to Medicago BS1, respectively.
  • the modification decreasing expression, activity, and/or stability of an endogenous BS1 polypeptide comprises a knockout of an endogenous BS1 gene, the BS1 gene encoding a BS1 protein comprising an amino acid sequence that is at least 70%, 75%, 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% identical to SEQ ID NO: 165.
  • the knockout of the endogenous BS1 gene is generated by introducing a frame-shift mutation in the endogenous BS1 gene.
  • the knockout of the endogenous BS1 gene is generated by introducing a modification removing the endogenous BS1 gene.
  • the modification decreasing expression, activity, and/or stability of an endogenous BS1 polypeptide comprises a knockdown of an endogenous BS1 gene, the BS1 gene encoding a BS1 protein comprising an amino acid sequence that is at least 70%, 75%, 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% identical to SEQ ID NO: 165.
  • the knockdown of the endogenous BS1 gene is generated by introducing an in-frame deletion in an endogenous BS1 gene. In certain embodiments, the knockdown of the endogenous BS1 gene is generated by introducing a modification in the endogenous promoter and/or replacing the endogenous promoter with a promoter that results in decreased gene expression as compared to the endogenous promoter. In certain embodiments, the knockdown of the endogenous BS1 gene is generated by seed-specific silencing of the gene such as, for example, editing an endogenous seed-specific miRNA to target the endogenous BS1 gene (WO2021150469).
  • the modification decreasing expression, activity, and/or stability of an endogenous BS1 polypeptide comprises a modification introducing a polymorphism in an endogenous BS1 gene to encode a modified BS1 polypeptide, the modified BS1 polypeptide comprising an amino acid sequence that is at least, or at least about, 70%, 75%, 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% identical to SEQ ID NO: 165 and comprises at least one amino acid substitution as compared to SEQ ID NO: 165.
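The BS1 and BS2 embodiments distinguish a frame-shift mutation (used for knockouts) from an in-frame deletion (used for knockdowns). At the level of the coding sequence, that distinction comes down to whether the number of deleted bases is a multiple of three. A minimal sketch of that rule only, using a hypothetical helper name; it ignores where the deletion falls, whether a stop codon is created at the junction, and splicing effects, any of which can change the actual outcome.

```python
def classify_deletion(deleted_bp: int) -> str:
    """Classify a deletion within a coding sequence by its effect on the reading frame."""
    if deleted_bp <= 0:
        raise ValueError("deletion length must be positive")
    if deleted_bp % 3 == 0:
        # Removes whole codons; downstream codons stay in frame.
        return "in-frame"
    # Shifts every downstream codon, typically introducing premature stop codons.
    return "frame-shift"


for size in (1, 2, 3, 6, 10):
    print(size, "bp:", classify_deletion(size))
```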
  • the modification decreasing expression, activity, and/or stability of an endogenous BS2 polypeptide comprises a knockout of an endogenous BS2 gene, the BS2 gene encoding a BS2 protein comprising an amino acid sequence that is at least, or at least about, 70%, 75%, 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% identical to SEQ ID NO: 166.
  • the knockout of the endogenous BS2 gene is generated by introducing a frame-shift mutation in the endogenous BS2 gene. In certain embodiments, the knockout of the endogenous BS2 gene is generated by introducing a mutation removing the endogenous BS2 gene.
  • the modification decreasing expression, activity, and/or stability of an endogenous BS2 polypeptide comprises a knockdown of an endogenous BS2 gene, the BS2 gene encoding a BS2 protein comprising an amino acid sequence that is at least, or at least about, 70%, 75%, 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% identical to SEQ ID NO: 166.
  • the knockdown of the endogenous BS2 gene is generated by introducing an in-frame deletion in an endogenous BS2 gene.
  • the knockdown of the endogenous BS2 gene is generated by introducing a modification in the endogenous promoter and/or replacing the endogenous promoter with a promoter that results in decreased gene expression as compared to the endogenous promoter.
  • the knockdown of the endogenous BS2 gene is generated by seed-specific silencing of the gene such as, for example, editing an endogenous seed-specific miRNA to target the endogenous BS2 gene (WO2021150469).
  • the modification decreasing expression, activity, and/or stability of an endogenous BS2 polypeptide comprises a modification introducing a polymorphism in an endogenous BS2 gene to encode a modified BS2 polypeptide, the modified BS2 polypeptide comprising an amino acid sequence that is at least, or at least about, 70%, 75%, 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% identical to SEQ ID NO: 166 and comprises at least one amino acid substitution as compared to SEQ ID NO: 166.
  • SWT refers to Sugars Will Eventually be Exported Transporters.
  • Carbohydrates are translocated from source tissues to seeds through the seed coat.
  • SWTs expressed in the seed coat facilitate sucrose efflux from the seed coat to developing seed cotyledons, affecting seed development.
  • the SWT gene is selected from the group consisting of SWT4, SWT5, SWT16, SWT17, SWT24, and SWT39.
  • the modification decreasing expression, activity, and/or stability of an endogenous SWT polypeptide comprises a knockout of an endogenous SWT gene, the endogenous SWT gene encoding a SWT4 polypeptide, SWT5 polypeptide, SWT16 polypeptide, SWT17 polypeptide, SWT24 polypeptide, SWT39 polypeptide, or any combination thereof.
  • the endogenous SWT gene encodes a polypeptide comprising an amino acid sequence that is at least, or at least about, 50%, 55%, 60%, 65%, 70%, 75%, 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% identical to any one of SEQ ID NOs: 172-177.
  • the modification decreasing expression, activity, and/or stability of an endogenous SWT polypeptide comprises a knockdown of an endogenous SWT gene, the endogenous SWT gene encoding a SWT polypeptide comprising an amino acid sequence that is at least, or at least about, 50%, 55%, 60%, 65%, 70%, 75%, 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% identical to any one of SEQ ID NOs: 172-177.
  • the modification decreasing expression, activity, and/or stability of an endogenous SWT polypeptide is introduced by introgressing a high protein SWT QTL.
  • the high protein SWT QTL is a SWT39 high protein QTL (SEQ ID NO: 171), such as, for example, the SWT39 QTL from PI678444.
  • the modification increasing expression or activity of ABI3 comprises introducing a genetic modification in the endogenous gene sequence.
  • the genetic modification is in the promoter region, for example, a promoter swap so that the endogenous gene is operably linked to a heterologous promoter.
  • the modification increasing expression or activity of ABI3 comprises introducing a nucleic acid construct comprising a polynucleotide encoding the ABI3 polypeptide operably linked to a heterologous regulatory element.
  • the polynucleotide encodes a polypeptide comprising an amino acid sequence that is at least, or at least about, 50%, 55%, 60%, 65%, 70%, 75%, 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% identical to SEQ ID NO: 178 or 179.
  • the ABI3 comprises a nucleic acid sequence that is at least, or at least about, 50%, 55%, 60%, 65%, 70%, 75%, 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% identical to SEQ ID NO: 212 or 213.
  • the modification increasing expression or activity of ODP1 comprises introducing a genetic modification in the endogenous gene sequence.
  • the genetic modification is in the promoter region, for example, a promoter swap so that the endogenous gene is operably linked to a heterologous promoter.
  • the modification increasing expression or activity of ODP1 comprises introducing a nucleic acid construct comprising a polynucleotide encoding the ODP1 polypeptide operably linked to a heterologous regulatory element.
  • the polynucleotide encodes a polypeptide comprising an amino acid sequence that is at least, or at least about, 50%, 55%, 60%, 65%, 70%, 75%, 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% identical to SEQ ID NO: 180 or 181.
  • the ODP1 comprises a nucleic acid sequence that is at least, or at least about, 50%, 55%, 60%, 65%, 70%, 75%, 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% identical to SEQ ID NO: 214 or 215.
  • Raffinose family oligosaccharides are alpha-galactosyl derivatives of sucrose, and include, for example, raffinose and stachyose.
  • RFOs are antinutritional factors that reduce metabolizable energy and cause poor digestibility, flatulence, and diarrhea in monogastric animals.
  • raffinose family oligosaccharides (RFO) content refers to the content of raffinose and stachyose.
  • the RFO content can be measured using methods known in the art such as those described in US Patent Publication No. 2019-0383733.
  • the modification decreasing RFO content comprises a decrease in the expression, activity and/or stability of a raffinose synthase.
  • the modification comprises a decrease in the expression, activity, and/or stability of raffinose synthase 2 (RS2), raffinose synthase 3 (RS3), and/or raffinose synthase 4 (RS4).
  • the plant (e.g., legume, soybean or pea) seed comprises at least a 35%, 40%, 45%, 50%, 55%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, or 99% decrease in expression of RS2, RS4, or RS2 and RS4, as compared to a control seed.
  • the modification decreasing expression, activity, and/or stability of an endogenous RS polypeptide comprises a knockout of an endogenous RS gene, the endogenous RS gene encoding a RS polypeptide comprising an amino acid sequence that is at least, or at least about, 70%, 75%, 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% identical to SEQ ID NO: 198.
  • the modification decreasing expression, activity, and/or stability of an endogenous RS polypeptide comprises a knockdown of an endogenous RS gene, the endogenous RS gene encoding an RS polypeptide comprising an amino acid sequence that is at least, or at least about, 70%, 75%, 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% identical to SEQ ID NO: 198.
  • the cell, seed, or plant comprises at least a 35%, 40%, 45%, 50%, 55%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, or 99% decrease in expression of an RS, as compared to a control seed.
  • “decrease in expression,” “decreased expression,” or the like refers to any detectable reduction in expression of a gene and/or the corresponding polypeptide.
  • “decrease in activity,” “decreased activity,” or the like refers to any detectable reduction in the activity (e.g., enzymatic activity) of the encoded polypeptide.
  • the method by which the expression or activity of a gene or polypeptide described herein is decreased is not particularly limited and can be done using methods known in the art such as RNAi, gene knockdown, gene knockout, or targeted amino acid modification.
  • gene knockout is used to refer to a gene in which there is no detectable expression of the mRNA or protein encoded by the gene.
  • gene knockdown is used to refer to a gene in which there is reduced expression of the mRNA or protein encoded by the gene.
  • decreased expression encompasses both gene knockout and gene knockdown.
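The knockout/knockdown definitions above, together with the "percent decrease in expression as compared to a control seed" language used for the RS embodiments, can be expressed as two small functions. A minimal sketch, assuming expression is a single non-negative measurement in arbitrary units and that `detection_limit` (a hypothetical parameter) marks the level below which expression counts as undetectable; real determinations would rest on replicate measurements and statistics, not one comparison.

```python
def percent_decrease(modified: float, control: float) -> float:
    """Percent decrease in expression relative to a control (positive = reduction)."""
    if control <= 0:
        raise ValueError("control expression must be positive")
    return 100.0 * (control - modified) / control


def classify_expression(modified: float, control: float, detection_limit: float = 0.0) -> str:
    """Apply the knockout/knockdown definitions to one measurement."""
    if modified <= detection_limit:
        return "knockout"      # no detectable mRNA or protein
    if modified < control:
        return "knockdown"     # reduced, but still detectable
    return "no decrease"


# Hypothetical measurements in arbitrary units (control = 100).
for value in (0.0, 30.0, 120.0):
    print(value, classify_expression(value, 100.0), f"{percent_decrease(value, 100.0):.0f}% decrease")
```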
  • a “targeted” genetic modification refers to the direct manipulation of an organism’s genes.
  • the targeted modification may be introduced using any technique known in the art, such as, for example, plant breeding, genome editing, or single locus conversion.
  • “increase in activity” refers to any detectable gain in activity (e.g., enzymatic activity) of the polypeptide.
  • the method by which the activity of a polypeptide described herein is increased is not particularly limited and can be done using methods known in the art such as increasing expression of the gene encoding the polypeptide (e.g., transgenic expression, promoter swap, gene modification) or a targeted modification of the gene encoding the polypeptide to, for example, remove a self-regulatory domain.
  • the disclosure also provides protein compositions (e.g., soy protein composition or pea protein composition) comprising any of the modified glycinin polypeptides described herein.
  • the protein composition comprises a methionine content of at least or at least about 1%, 2%, 3%, 4%, 5%, 6%, 7%, 8%, 9%, 10%, 12%, 14%, 16%, 18%, 20%, and less than about 30%, 28%, 26%, 24%, 22% or 20%.
  • the methionine content in the protein composition is greater than about 20, 25, 30, 35, 40, 45, 50, 55, 60, 65, 70, 75, 80, 85, 90, 95, 100, 125, 150, 175, or 200 mg/g protein and less than 500, 400, 300, 250, 200, 150, 100, 75, 50, or 40 mg/g protein.
  • the sum of the methionine and tryptophan in the protein composition is greater than about 20, 25, 30, 35, 40, 45, 50, 55, 60, 65, 70, 75, 80, 85, 90, 95, 100, 125, 150, 175, or 200 mg/g protein and less than 500, 400, 300, 250, 200, 150, 100, 75, 50, or 40 mg/g protein.
  • the sum of the methionine, lysine, threonine and tryptophan in the protein composition is greater than about 20, 25, 30, 35, 40, 45, 50, 55, 60, 65, 70, 75, 80, 85, 90, 95, 100, 125, 150, 175, or 200 mg/g protein and less than 500, 400, 300, 250, 200, 150, 100, 75, 50, or 40 mg/g protein.
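The composition thresholds above are expressed in mg per g protein, whereas seed amino acid and protein contents elsewhere in this section are given on a percent dry-weight basis. A minimal conversion sketch, assuming both values are measured on the same dry-weight basis; the helper name and input numbers are hypothetical, not data from the disclosure.

```python
def mg_per_g_protein(amino_acid_pct_dw: float, protein_pct_dw: float) -> float:
    """Convert an amino acid content (% of dry weight) to mg per g protein."""
    if protein_pct_dw <= 0:
        raise ValueError("protein content must be positive")
    # (g amino acid / 100 g seed) / (g protein / 100 g seed) = g amino acid per g protein;
    # multiplying by 1000 converts g to mg.
    return 1000.0 * amino_acid_pct_dw / protein_pct_dw


# Hypothetical seed: 0.6% methionine, 0.5% tryptophan, 40% protein (all dry weight).
met = mg_per_g_protein(0.6, 40.0)
trp = mg_per_g_protein(0.5, 40.0)
print(f"Met: {met:.1f} mg/g protein, Met + Trp: {met + trp:.1f} mg/g protein")
```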
  • protein composition refers to food ingredients for humans or animals which contain plant proteins, such as, for example, legume proteins (e.g., soy protein or pea protein).
  • the composition is an animal feed composition.
  • the composition is a human food composition.
  • the human food composition is a composition selected from the group consisting of soybean meal; soy flour; defatted soy flour; soymilk; spray-dried soymilk; soy protein concentrate; texturized soy protein concentrate; hydrolyzed soy protein; soy protein isolate; spray-dried tofu; soy meat analog; soy cheese analog; and soy coffee creamer.
  • the disclosure also provides methods for producing a plant producing seed having increased methionine content comprising introducing into a regenerable plant cell a polynucleotide encoding any of the modified glycinin polypeptides described herein; and generating the plant, wherein the plant comprises the polynucleotide encoding the modified glycinin polypeptide and produces a seed having an increased amount of methionine as compared to seed of a plant not comprising the modified glycinin polypeptide.
  • the method further comprises introducing at least one additional modification associated with increased glycinin, increased methionine, increased protein, or any combination thereof, such as for example, the modifications described herein.
  • the method for introducing the at least one additional modification may be any method known in the art or described herein.
  • the at least one additional modification is introduced by genome editing or transformation.
  • the at least one additional modification is introduced by crossing the plant produced by the method with a second plant comprising the at least one additional modification, harvesting the seed produced thereby, and generating a progeny plant, the progeny plant comprising the modified glycinin polypeptide and the at least one additional modification.
  • the seed of the plant produced by the method comprises at least or at least about a 2%, 3%, 4%, 5%, 6%, 7%, 8%, 9%, 10%, 12%, 14%, 16%, 18%, 20%, 22%, 24%, 26%, 28%, 30%, 35%, 40%, 45%, 50%, 55%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, 100%, 110%, 120%, 130%, 140%, 150%, 200%, 250%, 300%, or 350% and less than about a 500%, 450%, 400%, 350%, 300%, 250%, 200%, 150%, 140%, 130%, 120%, 110%, 100%, 90%, 80%, 70%, 60%, or 50% increase in total methionine on a dry weight of seed basis, as compared to a seed from a control plant (e.g., seed from a plant not comprising the modified glycinin polypeptide).
  • the seed of the plant produced by the method comprises at least or at least about 0.7%, 0.8%, 0.9%, 1%, 1.1%, 1.2%, 1.3%, 1.4%, 1.5%, 1.6%, 1.7%, 1.8%, 1.9%, 2%, 2.1%, 2.2%, 2.3%, 2.4%, 2.5%, 2.6%, 2.7%, 2.8%, 2.9%, or 3% and less than or less than about 4%, 3.9%, 3.8%, 3.7%, 3.6%, 3.5%, 3.4%, 3.3%, 3.2%, 3.1%, 3%, 2.9%, 2.8%, 2.7%, 2.6%, 2.5%, 2.4%, 2.3%, 2.2%, 2.1%, or 2% total methionine on a dry weight basis.
  • the seed of the plant produced by the method comprises at least or at least about a 1, 1.5, 2, 2.5, 3, 3.5, 4, 5, 10, or 15 and less than 30, 25, 20, 15, 10, 9, 8, 7, 6, or 5 percentage point increase in total protein on a dry weight of seed basis, as compared to a seed from a corresponding control plant (e.g., a plant not comprising the modified glycinin polypeptide).
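The methionine embodiments are stated as a percent (relative) increase over a control seed, while the protein embodiment is stated as a percentage point (absolute) increase; the two are easily confused. A minimal sketch contrasting them, using hypothetical helper names and hypothetical dry-weight values that are not data from the disclosure.

```python
def percent_increase(modified: float, control: float) -> float:
    """Relative increase over the control, in percent."""
    if control <= 0:
        raise ValueError("control value must be positive")
    return 100.0 * (modified - control) / control


def percentage_point_increase(modified_pct: float, control_pct: float) -> float:
    """Absolute difference between two percentages, in percentage points."""
    return modified_pct - control_pct


# Hypothetical seed values on a dry-weight basis.
met_control, met_modified = 0.50, 0.65     # % total methionine
prot_control, prot_modified = 40.0, 42.0   # % total protein

print(f"methionine: +{percent_increase(met_modified, met_control):.0f}% relative")
print(f"protein: +{percentage_point_increase(prot_modified, prot_control):.1f} percentage points "
      f"(+{percent_increase(prot_modified, prot_control):.1f}% relative)")
```

A 2 percentage point gain from 40% protein is only a 5% relative gain, so the two framings should not be compared directly.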
  • a corresponding control plant e.g., a plant not comprising the modified glycinin polypeptide
  • the seed of the plant produced by the method comprises a modified glycinin content that is greater than or within 0.5%, 1%, 1.5%, 2%, 2.5%, 3%, 3.5%, 4%, 4.5%, 5%, 10%, 15%, or 20% as compared to the glycinin content of a seed from a corresponding control plant (e.g., wild-type glycinin from a plant not comprising the modified glycinin polypeptide) on a dry weight of seed basis.
  • At least or at least about 1%, 2%, 3%, 4%, 5%, 6%, 7%, 8%, 9%, 10%, 12%, 14%, 16%, 18%, 20%, and less than about 30%, 28%, 26%, 24%, 22% or 20% of the total protein in the seed of the plant produced by the method comprises a modified glycinin polypeptide described herein and the seed comprises a total protein content that is greater than or within 0.5%, 1%, 1.5%, 2%, 2.5%, 3%, 3.5%, 4%, 4.5%, 5%, 10%, 15%, or 20% as compared to a seed from a corresponding control plant (e.g., seed from a plant not comprising the modified glycinin polypeptide) on a dry weight of seed basis.
  • the method generates plants having a yield that is greater than or within 0.5%, 1%, 1.5%, 2%, 2.5%, 3%, 3.5%, 4%, 4.5%, 5%, 10%, 15%, or 20% as compared to the corresponding control plant, for example, one which has a similar genetic background but lacks the introduced mutations.
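The seed comparisons above use two different kinds of difference: a relative (percent) increase for methionine and an absolute percentage-point increase for protein, both on a dry weight basis. The sketch below illustrates the arithmetic under stated assumptions: the moisture correction is the generic as-is to dry-weight conversion, the helper names are hypothetical, and the numbers are invented for illustration, not data from this disclosure.

```python
def to_dry_weight_basis(value_as_is_pct: float, moisture_pct: float) -> float:
    """Convert a content measured on an as-is basis to a dry weight basis."""
    if not 0 <= moisture_pct < 100:
        raise ValueError("moisture must be in [0, 100)")
    return value_as_is_pct / (1 - moisture_pct / 100)


def percent_increase(test: float, control: float) -> float:
    """Relative increase of test over control, in percent."""
    if control <= 0:
        raise ValueError("control value must be positive")
    return (test - control) / control * 100


def percentage_point_increase(test_pct: float, control_pct: float) -> float:
    """Absolute difference between two percentages, in percentage points."""
    return test_pct - control_pct


def within_window(value: float, at_least: float, less_than: float) -> bool:
    """True for an 'at least X and less than Y' range."""
    return at_least <= value < less_than


# Hypothetical dry-weight values (illustration only).
control_met, edited_met = 0.60, 0.75          # % methionine in seed
control_protein, edited_protein = 38.0, 40.5  # % protein in seed

met_gain = percent_increase(edited_met, control_met)
protein_gain = percentage_point_increase(edited_protein, control_protein)
print(f"methionine: +{met_gain:.1f}%  protein: +{protein_gain:.1f} percentage points")
print(within_window(met_gain, 2, 500))
```

With these toy values, methionine rises by only 0.15 points of seed dry weight, yet that is a 25% relative increase, while protein rises by 2.5 percentage points; the two kinds of range are therefore not interchangeable.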
  • Various methods can be used to introduce the polynucleotide sequences into a plant, plant part, plant cell, seed, and/or grain.
  • "Introducing" is intended to mean presenting to the plant, plant cell, seed, and/or grain the inventive polynucleotide or resulting polypeptide in such a manner that the sequence gains access to the interior of a cell of the plant.
  • the methods of the disclosure do not depend on a particular method for introducing a sequence into a plant, plant cell, seed, and/or grain, only that the polynucleotide or polypeptide gains access to the interior of at least one cell of the plant.
  • the method for introducing the modified glycinin polypeptide and/or the modification associated with increased glycinin, increased methionine, and/or increased protein comprises transforming a regenerable plant cell with a nucleic acid construct or expression cassette comprising a polynucleotide described herein.
  • the transformation technique of the methods is not particularly limited and includes both stable transformation methods and transient transformation methods.
  • Stable transformation is intended to mean that the polynucleotide introduced into a plant integrates into the genome of the plant of interest and is capable of being inherited by the progeny thereof.
  • Transient transformation is intended to mean that a polynucleotide is introduced into the plant of interest and does not integrate into the genome of the plant or organism, or a polypeptide is introduced into a plant or organism.
  • Transformation protocols as well as protocols for introducing polypeptides or polynucleotide sequences into plants may vary depending on the type of plant or plant cell, i.e., monocot or dicot, targeted for transformation. Suitable methods of introducing polypeptides and polynucleotides into plant cells include microinjection (Crossway et al. (1986) Biotechniques 4:320-334), electroporation (Riggs et al. (1986) Proc. Natl. Acad. Sci. USA 83:5602-5606), Agrobacterium-mediated transformation (U.S. Patent No. 5,563,055 and U.S. Patent No.
  • Methods are known in the art for the targeted insertion of a polynucleotide at a specific location in the plant genome.
  • the insertion of the polynucleotide at a desired genomic location is achieved using a site-specific recombination system. See, for example, WO99/25821, WO99/25854, WO99/25840, WO99/25855, and WO99/25853, all of which are herein incorporated by reference.
  • the polynucleotide disclosed herein can be contained in a transfer cassette flanked by two non-recombinogenic recombination sites.
  • the transfer cassette is introduced into a plant having stably incorporated into its genome a target site which is flanked by two non-recombinogenic recombination sites that correspond to the sites of the transfer cassette.
  • An appropriate recombinase is provided, and the transfer cassette is integrated at the target site.
  • the polynucleotide of interest is thereby integrated at a specific chromosomal position in the plant genome.
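The cassette exchange described above can be illustrated with a toy string model. In this sketch, `[A]` and `[B]` are hypothetical placeholders for the two mutually non-recombinogenic recombination sites (in practice, a pair of incompatible recombinase target sites). Because each site recombines only with its identical partner, a double crossover swaps the interval between the sites instead of excising it. The function names are illustrative and do not correspond to any real recombinase tool.

```python
def _interval(seq: str, site_a: str, site_b: str) -> tuple[int, int]:
    """Return (end of site_a, start of site_b) for the first site_a...site_b pair."""
    start = seq.find(site_a)
    if start < 0:
        raise ValueError(f"site {site_a!r} not found")
    inner_start = start + len(site_a)
    inner_end = seq.find(site_b, inner_start)
    if inner_end < 0:
        raise ValueError(f"site {site_b!r} not found after {site_a!r}")
    return inner_start, inner_end


def exchange_cassette(genome: str, cassette: str, site_a: str, site_b: str) -> str:
    """Swap the genomic interval between site_a and site_b for the cassette's interval.

    Models recombination site_a x site_a and site_b x site_b; both sites are
    retained at the target locus, flanking the newly integrated sequence.
    """
    g_start, g_end = _interval(genome, site_a, site_b)
    c_start, c_end = _interval(cassette, site_a, site_b)
    return genome[:g_start] + cassette[c_start:c_end] + genome[g_end:]


target_locus = "chr-left[A]selectable-marker[B]chr-right"
transfer_cassette = "[A]modified-glycinin-cassette[B]"
print(exchange_cassette(target_locus, transfer_cassette, "[A]", "[B]"))
```

The model deliberately ignores recombination efficiency and single-crossover intermediates; it only captures why two incompatible sites yield a clean replacement at a fixed chromosomal position.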
  • once the expression cassette containing the inventive polynucleotide has been stably incorporated in transgenic plants and confirmed to be operable, it can be introduced into other plants by sexual crossing. Any of a number of standard breeding techniques can be used, depending upon the species to be crossed.
  • Parts obtained from the regenerated plants described herein, such as flowers, seeds, leaves, branches, fruit, and the like are included, provided that these parts comprise cells comprising the inventive polynucleotide.
  • Progeny, variants, and mutants of the regenerated plants are also included, provided that they comprise the introduced nucleic acid sequences.
  • a homozygous transgenic plant can be obtained by sexually mating (selfing) a heterozygous transgenic plant that contains a single added heterologous nucleic acid, germinating some of the seed produced, and analyzing the resulting plants. Back-crossing to a parental plant and out-crossing with a non-transgenic plant are also contemplated.
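The selfing step relies on ordinary single-locus Mendelian segregation. The sketch below assumes one insert at one locus with normal transmission (no segregation distortion and no selection); the allele symbols and function names are hypothetical. It enumerates the selfed progeny and estimates how many seedlings must be analyzed to recover at least one homozygote with a chosen confidence.

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def self_progeny(genotype: str) -> dict[str, Fraction]:
    """Genotype frequencies after selfing a plant at one locus.

    genotype: two allele symbols, e.g. "T-" for a plant hemizygous for a single
    insert ("T" = insert present, "-" = no insert at that locus).
    """
    if len(genotype) != 2:
        raise ValueError("expected exactly two alleles")
    # Each gamete carries either allele with probability 1/2; sort so "-T" == "T-".
    zygotes = ("".join(sorted(egg + pollen, reverse=True))
               for egg, pollen in product(genotype, repeat=2))
    counts = Counter(zygotes)
    total = sum(counts.values())
    return {g: Fraction(n, total) for g, n in counts.items()}


def seedlings_for_homozygote(p_homozygous: float, confidence: float) -> int:
    """Smallest number of seedlings giving >= confidence of at least one homozygote."""
    if not 0 < p_homozygous <= 1 or not 0 < confidence < 1:
        raise ValueError("probabilities out of range")
    n, p_none = 0, 1.0
    while 1 - p_none < confidence:
        n += 1
        p_none *= 1 - p_homozygous  # chance that all n seedlings miss
    return n


freqs = self_progeny("T-")
print(freqs)
print(seedlings_for_homozygote(float(freqs["TT"]), 0.95))
```

For a single hemizygous insert the expected ratio is 1 TT : 2 T- : 1 --, so roughly 11 seedlings are needed for a 95% chance of including at least one homozygote.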
  • the method for introducing the modified glycinin polypeptide and/or the modification associated with increased glycinin, increased methionine, and/or increased protein, into the regenerable plant cell comprises using genome editing technologies. In certain embodiments, the method comprises editing the endogenous gene or a previously introduced gene.
  • the genome editing technology for use in the methods and compositions described herein is not particularly limited and may be any genome editing technique that allows for the modification or targeted introduction of the desired polynucleotide.
  • the genome editing technique uses an enzyme selected from the group consisting of a polynucleotide-guided endonuclease, CRISPR-Cas endonucleases, base editing deaminases, zinc finger nuclease, a transcription activator-like effector nuclease (TALEN), or an engineered site-specific meganuclease.
  • the genome modification may be facilitated through the induction of a double-strand break (DSB) or single-strand break at a defined position in the genome near the desired alteration.
  • DSBs can be induced using any DSB-inducing agent available, including, but not limited to, TALENs, meganucleases, zinc finger nucleases, Cas9-gRNA systems (based on bacterial CRISPR-Cas systems), guided Cpf1 endonuclease systems, and the like.
  • the introduction of a DSB can be combined with the introduction of a polynucleotide modification template.
  • the method comprises: (a) providing a guide RNA, at least one polynucleotide modification template, and at least one Cas endonuclease to the regenerable plant cell, wherein the at least one Cas endonuclease introduces a double-strand break at an endogenous gene to be modified (e.g., beta-conglycinins, Gy1, Gy2, Gy3, Gy4, Gy5, Gy6, Gy7, Gy) in the plant cell, and wherein the polynucleotide modification template generates a modified gene that encodes any of the modifications described herein; (b) obtaining a plant from the plant cell; and (c) generating a progeny plant.
  • Double-strand breaks induced by double-strand-break-inducing agents can result in the induction of DNA repair mechanisms, including the non-homologous end-joining pathway, and homologous recombination.
  • Endonucleases include a range of different enzymes, including restriction endonucleases (see, e.g., Roberts et al. (2003) Nucleic Acids Res 31:418-20), Roberts et al. (2003) Nucleic Acids Res 31:1805-12, and Belfort et al. (2002) in Mobile DNA II, pp.
  • NHEJ: nonhomologous end-joining pathway
  • HDR: homology-directed repair
  • site-specific base conversions can also be achieved to engineer one or more nucleotide changes that introduce one or more modifications described herein into the genome.
  • a site-specific base edit mediated by a C•G to T•A or an A•T to G•C base-editing deaminase enzyme
  • Gaudelli et al. “Programmable base editing of A•T to G•C in genomic DNA without DNA cleavage.” Nature (2017); Nishida et al. “Targeted nucleotide editing using hybrid prokaryotic and vertebrate adaptive immune systems.” Science 353 (6305) (2016); Komor et al. “Programmable editing of a target base in genomic DNA without double-stranded DNA cleavage.” Nature 533 (7603) (2016):420-4.
  • the endogenous gene may be modified by a CRISPR associated (Cas) endonuclease, a Zn-finger nuclease-mediated system, a meganuclease-mediated system, an oligonucleobase-mediated system, or any gene modification system known to one of ordinary skill in the art.
  • Cas: CRISPR associated
  • the endogenous gene is modified by a CRISPR associated (Cas) endonuclease.
  • Class 1 Cas endonucleases comprise multisubunit effector complexes (Types I, III, and IV), while Class 2 systems comprise single-protein effectors (Types II, V, and VI) (Makarova et al. 2015, Nature Reviews Microbiology Vol. 13: 1-15; Zetsche et al., 2015, Cell 163, 1-13; Shmakov et al., 2015, Molecular Cell 60, 1-13; Haft et al., 2005, Computational Biology, PLoS Comput Biol 1(6): e60; and Koonin et al. 2017, Curr Opinion Microbiology 37:67-78).
  • in Class 2 Type II systems, the Cas endonuclease acts in complex with a guide polynucleotide.
  • the Cas endonuclease forms a complex with a guide polynucleotide (e.g., guide polynucleotide/Cas endonuclease complex).
  • the term “guide polynucleotide” relates to a polynucleotide sequence that can form a complex with a Cas endonuclease, including the Cas endonucleases described herein, and enables the Cas endonuclease to recognize, optionally bind to, and optionally cleave a DNA target site.
  • the guide polynucleotide sequence can be an RNA sequence, a DNA sequence, or a combination thereof (an RNA-DNA combination sequence).
  • the guide polynucleotide may further comprise a chemically modified base, such as, but not limited to, Locked Nucleic Acid (LNA), 5-methyl dC, 2,6-Diaminopurine, 2’-Fluoro A, 2’-Fluoro U, 2'-O-Methyl RNA, a phosphorothioate bond, linkage to a cholesterol molecule, linkage to a polyethylene glycol molecule, linkage to a spacer 18 (hexaethylene glycol chain) molecule, or 5’ to 3’ covalent linkage resulting in circularization.
  • LNA: Locked Nucleic Acid
  • the Cas endonuclease forms a complex with a guide polynucleotide (e.g., gRNA) that directs the Cas endonuclease to the DNA target, enabling target recognition, binding, and cleavage by the Cas endonuclease.
  • the guide polynucleotide may comprise a Cas endonuclease recognition (CER) domain that interacts with the Cas endonuclease, and a Variable Targeting (VT) domain that hybridizes to a nucleotide sequence in a target DNA.
  • CER: Cas endonuclease recognition
  • VT: Variable Targeting
  • the guide polynucleotide (e.g., gRNA) comprises a CRISPR nucleotide (crNucleotide; e.g., crRNA) and a trans-activating CRISPR nucleotide (tracrNucleotide; e.g., tracrRNA) to guide the Cas endonuclease to its DNA target.
  • the guide polynucleotide (e.g., gRNA) comprises a spacer region complementary to one strand of the double-strand DNA target and a region that base pairs with the tracrNucleotide (e.g., tracrRNA), forming a nucleotide duplex (e.g., RNA duplex).
  • the gRNA is a “single guide RNA” (sgRNA) that comprises a synthetic fusion of crRNA and tracrRNA.
  • sgRNA: single guide RNA
  • the Cas endonuclease-guide polynucleotide complex recognizes a short nucleotide sequence adjacent to the target sequence (protospacer), called a “protospacer adjacent motif” (PAM).
  • PAM: protospacer adjacent motif
  • the terms “single guide RNA” and “sgRNA” are used interchangeably herein and relate to a synthetic fusion of two RNA molecules: a crRNA (CRISPR RNA) comprising a variable targeting domain (linked to a tracr mate sequence that hybridizes to a tracrRNA), fused to a tracrRNA (trans-activating CRISPR RNA).
  • crRNA: CRISPR RNA
  • tracrRNA: trans-activating CRISPR RNA
  • the single guide RNA can comprise a crRNA or crRNA fragment and a tracrRNA or tracrRNA fragment of the type II CRISPR/Cas system that can form a complex with a type II Cas endonuclease, wherein said guide RNA/Cas endonuclease complex can direct the Cas endonuclease to a DNA target site, enabling the Cas endonuclease to recognize, optionally bind to, and optionally nick or cleave (introduce a single or double-strand break) the DNA target site.
  • the nucleotide sequence linking the crNucleotide and the tracrNucleotide of a single guide polynucleotide can comprise an RNA sequence, a DNA sequence, or an RNA-DNA combination sequence.
  • the nucleotide sequence linking the crNucleotide and the tracrNucleotide of a single guide polynucleotide can be at least 3, 4, 5, 6, 7, 8, 9, 10, 11, 12,
  • the nucleotide sequence linking the crNucleotide and the tracrNucleotide of a single guide polynucleotide can comprise a tetraloop sequence, such as, but not limited to, a GAAA tetraloop sequence.
  • the terms “variable targeting domain” and “VT domain” are used interchangeably herein and include a nucleotide sequence that can hybridize (is complementary) to one strand (nucleotide sequence) of a double-strand DNA target site.
  • the percent complementation between the first nucleotide sequence domain (VT domain) and the target sequence can be at least 50%, 51%, 52%, 53%, 54%, 55%, 56%, 57%, 58%, 59%, 60%, 61%, 62%, 63%, 64%, 65%, 66%, 67%,
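The single guide architecture described above (crRNA fused through a short linker such as a GAAA tetraloop to a tracrRNA) can be sketched as simple string assembly. This is a minimal illustration, assuming hypothetical function names and toy repeat and tracrRNA sequences; the repeat and tracrRNA of the Cas system actually used would have to be substituted, and no secondary-structure check is attempted.

```python
RNA_BASES = set("ACGU")


def to_rna(seq: str) -> str:
    """Uppercase and write T as U so DNA-style input can be used."""
    return seq.upper().replace("T", "U")


def build_sgrna(spacer: str, cr_repeat: str, tracr: str, linker: str = "GAAA") -> str:
    """Fuse crRNA (spacer + repeat), a linker loop, and a tracrRNA into one sgRNA.

    spacer is the variable targeting (VT) domain; the repeat (tracr mate),
    linker, and tracrRNA make up the Cas endonuclease recognition (CER) domain.
    """
    parts = {"spacer": to_rna(spacer), "cr_repeat": to_rna(cr_repeat),
             "tracr": to_rna(tracr), "linker": to_rna(linker)}
    for name, seq in parts.items():
        if not seq or not set(seq) <= RNA_BASES:
            raise ValueError(f"{name} must be a non-empty nucleotide sequence")
    if not 12 <= len(parts["spacer"]) <= 30:
        raise ValueError("spacer (VT domain) expected to be 12 to 30 nt")
    return parts["spacer"] + parts["cr_repeat"] + parts["linker"] + parts["tracr"]


# Hypothetical toy repeat/anti-repeat pair; not a real Cas9 scaffold.
sgrna = build_sgrna("GATTACAGATTACAGATTAC", "GUUUUAGAGC", "GCUCUAAAAC")
print(sgrna)
```

The 12 to 30 nt spacer check mirrors the contiguous-stretch length given for the variable targeting domain; other lengths could be allowed by editing that bound.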
  • variable targeting domain can be at least 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23,
  • variable targeting domain comprises a contiguous stretch of 12 to 30 nucleotides.
  • the variable targeting domain can be composed of a DNA sequence, an RNA sequence, a modified DNA sequence, a modified RNA sequence, or any combination thereof.
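Percent complementarity between a VT domain and its target strand can be computed position by position once the antiparallel orientation is taken into account. The sketch below is a minimal illustration, assuming both sequences are written 5' to 3', that only Watson-Crick pairs count (G-U wobble pairs are ignored), and that the two sequences are already the same length; the function name is hypothetical.

```python
# RNA base in the VT domain -> DNA base it pairs with (Watson-Crick only).
PAIRS_WITH = {"A": "T", "U": "A", "G": "C", "C": "G"}


def percent_complementarity(vt_domain: str, target_strand: str) -> float:
    """Percent of VT-domain positions that base-pair with the target DNA strand.

    Both sequences are written 5'->3'. Pairing is antiparallel, so position i of
    the VT domain is compared with position i of the reversed target strand.
    A DNA-style VT domain (T instead of U) is accepted.
    """
    vt = vt_domain.upper().replace("T", "U")
    target = target_strand.upper()[::-1]
    if not vt or len(vt) != len(target):
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(PAIRS_WITH.get(r) == d for r, d in zip(vt, target))
    return 100 * matches / len(vt)


# Target strand is the reverse complement of the protospacer GATTACAGATTACAGATTAC.
print(percent_complementarity("GAUUACAGAUUACAGAUUAC", "GTAATCTGTAATCTGTAATC"))
print(percent_complementarity("AAUUACAGAUUACAGAUUAC", "GTAATCTGTAATCTGTAATC"))
```

A single mismatch in a 20-nt VT domain gives 95% complementarity, which falls within the ranges recited above.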
  • the CER domain of a guide polynucleotide includes a nucleotide sequence that interacts with a Cas endonuclease polypeptide.
  • a CER domain comprises a (trans-acting) tracrNucleotide mate sequence followed by a tracrNucleotide sequence.
  • the CER domain can be composed of a DNA sequence, an RNA sequence, a modified DNA sequence, a modified RNA sequence (see for example US20150059010A1, published 26 February 2015), or any combination thereof.
  • a “protospacer adjacent motif” (PAM) refers to a short nucleotide sequence adjacent to a target sequence (protospacer) that is recognized (targeted) by a guide polynucleotide/Cas endonuclease system described herein.
  • the Cas endonuclease may not successfully recognize a target DNA sequence if the target DNA sequence is not adjacent to, or near, a PAM sequence.
  • in some systems, the PAM precedes the target sequence (e.g., Cas12a).
  • in other systems, the PAM follows the target sequence (e.g., S. pyogenes Cas9).
  • the sequence and length of a PAM herein can differ depending on the Cas protein or Cas protein complex used.
  • the PAM sequence can be of any length but is typically 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19 or 20 nucleotides long.
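Because S. pyogenes Cas9 requires a 3' NGG PAM, candidate target sites in a sequence can be enumerated by scanning both strands for NGG and taking the 20 nt immediately 5' of it. The sketch below assumes a 20-nt spacer, considers only canonical NGG PAMs, and ignores off-target, chromatin, and guide-efficiency considerations. It also reports the predicted blunt cut 3 bp 5' of the PAM. The function names are hypothetical.

```python
import re

# Lookahead lets overlapping PAMs (e.g. "AGGG" contains AGG and GGG) both be found.
NGG_PAM = re.compile(r"(?=([ACGT]GG))")
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def _scan_strand(strand: str, spacer_len: int):
    """Yield (protospacer, pam, cut) on one strand, in that strand's coordinates."""
    for m in NGG_PAM.finditer(strand):
        pam_start = m.start()
        if pam_start >= spacer_len:
            # Blunt cut with 3 nt between the break and the PAM.
            yield strand[pam_start - spacer_len:pam_start], m.group(1), pam_start - 3


def find_spcas9_sites(seq: str, spacer_len: int = 20) -> list[tuple[str, str, str, int]]:
    """Return (strand, protospacer, PAM, cut) for every NGG-adjacent site.

    cut is a boundary in forward-strand coordinates: the break falls between
    seq[cut - 1] and seq[cut].
    """
    seq = seq.upper()
    sites = [("+", p, pam, cut) for p, pam, cut in _scan_strand(seq, spacer_len)]
    n = len(seq)
    for p, pam, cut in _scan_strand(reverse_complement(seq), spacer_len):
        sites.append(("-", p, pam, n - cut))  # map boundary back to forward strand
    return sites


print(find_spcas9_sites("ACGT" * 5 + "AGG" + "TTTT"))
print(find_spcas9_sites("CCA" + "ACGT" * 5))
```

The second call finds a site on the reverse strand: the forward-strand "CCA" is the complement of a "TGG" PAM, so the protospacer lies on the opposite strand and the break maps to forward position 6.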
  • as used herein, the terms “guide polynucleotide/Cas endonuclease complex”, “guide polynucleotide/Cas endonuclease system”, “guide polynucleotide/Cas complex”, “guide polynucleotide/Cas system”, “guided Cas system”, “polynucleotide-guided endonuclease”, and “PGEN” are used interchangeably and refer to at least one guide polynucleotide and at least one Cas endonuclease that are capable of forming a complex, wherein said guide polynucleotide/Cas endonuclease complex can direct the Cas endonuclease to a DNA target site, enabling the Cas endonuclease to recognize, bind to, and optionally nick or cleave (introduce
  • a guide polynucleotide/Cas endonuclease complex herein can comprise Cas protein(s) and suitable polynucleotide component(s) of any of the known CRISPR systems (Horvath and Barrangou, 2010, Science 327: 167-170; Makarova et al. 2015, Nature Reviews Microbiology Vol. 13: 1-15; Zetsche et al., 2015, Cell 163, 1-13; Shmakov et al., 2015, Molecular Cell 60, 1-13).
  • the guide polynucleotide/Cas endonuclease complex is provided as a ribonucleoprotein (RNP), wherein the Cas endonuclease component is provided as a protein and the guide polynucleotide component is provided as a ribonucleotide.
  • RNP: ribonucleoprotein
  • Cas endonucleases for use in the methods described herein include, but are not limited to, Cas9 and Cpf1.
  • Cas9 (formerly referred to as Cas5, Csn1, or Csx12) is a Class 2 Type II Cas endonuclease (Makarova et al. 2015, Nature Reviews Microbiology Vol. 13: 1-15).
  • a Cas9-gRNA complex recognizes a 3’ PAM sequence (NGG for S. pyogenes Cas9) at the target site, permitting the spacer of the guide RNA to invade the double-stranded DNA target and, if sufficient homology between the spacer and protospacer exists, to generate a double-strand break.
  • Cas9 endonucleases comprise RuvC and HNH domains that together produce double strand breaks, and separately can produce single strand breaks.
  • the double-strand break leaves a blunt end.
  • Cpf1 is a Class 2 Type V Cas endonuclease that comprises a RuvC nuclease domain but lacks an HNH domain (Yamano et al., 2016, Cell 165:949-962). Cpf1 endonucleases create “sticky” overhang ends.
  • applications of Cas9-gRNA systems at a genomic target site include, but are not limited to, insertions, deletions, substitutions, or modifications of one or more nucleotides at the target site; modifying or replacing nucleotide sequences of interest (such as regulatory elements); insertion of polynucleotides of interest; gene knock-out; gene knock-in; modification of splicing sites and/or introduction of alternate splicing sites; modification of nucleotide sequences encoding a protein of interest; amino acid and/or protein fusions; and gene silencing by expressing an inverted repeat of a gene of interest.
  • target site refers to a polynucleotide sequence such as, but not limited to, a nucleotide sequence on a chromosome, episome, a locus, or any other DNA molecule in the genome (including chromosomal, chloroplastic, mitochondrial DNA, plasmid DNA) of a cell, at which a guide polynucleotide/Cas endonuclease complex can recognize, bind to, and optionally nick or cleave.
  • the target site can be an endogenous site in the genome of a cell, or alternatively, the target site can be heterologous to the cell and thereby not be naturally occurring in the genome of the cell, or the target site can be found in a heterologous genomic location compared to where it occurs in nature.
  • the terms “endogenous target sequence” and “native target sequence” are used interchangeably herein to refer to a target sequence that is endogenous or native to the genome of a cell and is at the endogenous or native position of that target sequence in the genome of the cell.
  • the terms “artificial target site” and “artificial target sequence” are used interchangeably herein and refer to a target sequence that has been introduced into the genome of a cell.
  • Such an artificial target sequence can be identical in sequence to an endogenous or native target sequence in the genome of a cell but be located in a different position (i.e., a non-endogenous or non-native position) in the genome of a cell.
  • the terms “altered target site”, “altered target sequence”, “modified target site”, and “modified target sequence” are used interchangeably herein and refer to a target sequence as disclosed herein that comprises at least one alteration when compared to a non-altered target sequence.
  • Such “alterations” include, for example: (i) replacement of at least one nucleotide, (ii) a deletion of at least one nucleotide, (iii) an insertion of at least one nucleotide, or (iv) any combination of (i) - (iii).
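The four categories of alteration listed above (replacement, deletion, insertion, or a combination) can be recovered by comparing an altered target sequence with its reference. The sketch below uses the standard-library `difflib.SequenceMatcher`, whose opcodes happen to map onto these categories; it is a generic string diff rather than a biological aligner, so indel placement in repetitive sequence may differ from what an aligner would report. The wrapper function name is hypothetical.

```python
from difflib import SequenceMatcher


def classify_alterations(reference: str, altered: str) -> list[tuple[str, str, str]]:
    """List the edits that turn a reference target sequence into an altered one.

    Each entry is (kind, reference_bases, altered_bases), where kind is
    'replace', 'delete', or 'insert' as reported by difflib. More than one
    entry corresponds to a combination of alterations.
    """
    matcher = SequenceMatcher(a=reference, b=altered, autojunk=False)
    return [(tag, reference[i1:i2], altered[j1:j2])
            for tag, i1, i2, j1, j2 in matcher.get_opcodes()
            if tag != "equal"]


print(classify_alterations("AAAACCCCGGGG", "AAAACTCCGGGG"))   # substitution
print(classify_alterations("AAAACCCCGGGG", "AAAACCCGGGG"))    # deletion
print(classify_alterations("AAAACCCCGGGG", "AAAACCCCTGGGG"))  # insertion
```

`autojunk=False` disables difflib's popularity heuristic, which would otherwise treat frequent bases as junk in sequences of 200 or more characters.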
  • a “polynucleotide modification template” is also provided that comprises at least one nucleotide modification when compared to the nucleotide sequence to be edited.
  • a nucleotide modification can be at least one nucleotide substitution, addition, deletion, or chemical alteration.
  • the polynucleotide modification template can further comprise homologous nucleotide sequences flanking the at least one nucleotide modification, wherein the flanking homologous nucleotide sequences provide sufficient homology to the desired nucleotide sequence to be edited.
  • a polynucleotide of interest is inserted at a target site and provided as part of a “donor DNA” molecule.
  • donor DNA is a DNA construct that comprises a polynucleotide of interest to be inserted into the target site of a Cas endonuclease.
  • the donor DNA construct further comprises a first and a second region of homology that flank the polynucleotide of interest.
  • the first and second regions of homology of the donor DNA share homology to a first and a second genomic region, respectively, present in or flanking the target site of the cell or organism genome.
  • the donor DNA can be tethered to the guide polynucleotide.
  • Tethered donor DNAs can allow for co-localizing target and donor DNA, useful in genome editing, gene insertion, and targeted genome regulation, and can also be useful in targeting post-mitotic cells where function of endogenous HR machinery is expected to be highly diminished (Mali et al., 2013, Nature Methods Vol. 10: 957-963).
  • the amount of homology or sequence identity shared by a target and a donor polynucleotide can vary and includes total lengths and/or regions.
  • the process for editing a genomic sequence at a Cas9-gRNA double-strand-break site with a modification template generally comprises: providing a host cell with a Cas9-gRNA complex that recognizes a target sequence in the genome of the host cell and is able to induce a double-strand-break in the genomic sequence, and at least one polynucleotide modification template comprising at least one nucleotide alteration when compared to the nucleotide sequence to be edited.
  • the polynucleotide modification template can further comprise nucleotide sequences flanking the at least one nucleotide alteration, in which the flanking sequences are substantially homologous to the chromosomal region flanking the double-strand break.
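A polynucleotide modification template of the kind described above can be assembled by taking the reference sequence on either side of the intended change as homology arms. This is a minimal sketch with hypothetical names and a toy locus: the example swaps a GCT codon for ATG (the methionine codon) using 5-nt arms, whereas real templates typically use much longer arms. It does not check whether the edited template would remain a target for the nuclease, which is a common design consideration.

```python
def build_modification_template(reference: str, start: int, end: int,
                                new_bases: str, arm_len: int) -> str:
    """Return left homology arm + new_bases + right homology arm.

    reference[start:end] is the segment being altered (start == end for a pure
    insertion, new_bases == "" for a deletion). The arms are the arm_len bases
    that flank that segment in the unedited reference.
    """
    if arm_len < 1 or not 0 <= start <= end <= len(reference):
        raise ValueError("invalid edit coordinates or arm length")
    if start < arm_len or len(reference) - end < arm_len:
        raise ValueError("not enough flanking sequence for the requested arm length")
    left_arm = reference[start - arm_len:start]
    right_arm = reference[end:end + arm_len]
    return left_arm + new_bases + right_arm


# Hypothetical locus: replace the GCT codon at [10:13) with ATG, using toy 5-nt arms.
locus = "AAAAACCCCC" + "GCT" + "GGGGGTTTTT"
print(build_modification_template(locus, 10, 13, "ATG", arm_len=5))
```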
  • Genome editing using double-strand-break-inducing agents, such as Cas9-gRNA complexes, has been described, for example, in US20150082478 published on 19 March 2015, WO2015026886 published on 26 February 2015, WO2016007347 published 14 January 2016, and WO2016025131 published on 18 February 2016.
  • the gene encoding the Cas endonuclease may be optimized as described in WO2016186953 published 24 November 2016, and then delivered into cells as DNA expression cassettes by methods known in the art.
  • the Cas endonuclease is provided as a polypeptide.
  • the Cas endonuclease is provided as a polynucleotide encoding a polypeptide.
  • the guide RNA is provided as a DNA molecule encoding one or more RNA molecules.
  • the guide RNA is provided as RNA or chemically modified RNA.
  • the Cas endonuclease protein and guide RNA are provided as a ribonucleoprotein complex (RNP).
  • the endogenous gene is modified by a zinc-finger-mediated genome editing process.
  • the zinc-finger-mediated genome editing process for editing a chromosomal sequence includes for example: (a) introducing into a cell at least one nucleic acid encoding a zinc finger nuclease that recognizes a target sequence in the chromosomal sequence and is able to cleave a site in the chromosomal sequence, and, optionally, (i) at least one donor polynucleotide that includes a sequence for integration flanked by an upstream sequence and a downstream sequence that exhibit substantial sequence identity with either side of the cleavage site, or (ii) at least one exchange polynucleotide comprising a sequence that is substantially identical to a portion of the chromosomal sequence at the cleavage site and which further comprises at least one nucleotide change; and (b) culturing the cell to allow expression of the zinc finger nucle
  • a zinc finger nuclease includes a DNA binding domain (i.e., zinc finger) and a cleavage domain (i.e., nuclease).
  • the nucleic acid encoding a zinc finger nuclease may include DNA or RNA.
  • Zinc finger binding domains may be engineered to recognize and bind to any nucleic acid sequence of choice. See, for example, Beerli et al. (2002) Nat. Biotechnol. 20: 135-141; Pabo et al. (2001) Ann. Rev. Biochem. 70:313-340; Choo et al. (2000) Curr. Opin. Struct. Biol. 10:411-416; and Doyon et al. (2008) Nat.
  • An engineered zinc finger binding domain may have a novel binding specificity compared to a naturally occurring zinc finger protein.
  • the algorithm described in U.S. Pat. No. 6,453,242 may be used to design a zinc finger binding domain to target a preselected sequence.
  • Nondegenerate recognition code tables may also be used to design a zinc finger binding domain to target a specific sequence (Sera et al.
  • An exemplary zinc finger DNA binding domain recognizes and binds a sequence having at least about 80% sequence identity with the desired target sequence.
  • the sequence identity may be about 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or 100%.
  • a zinc finger nuclease also includes a cleavage domain. The cleavage domain portion of the zinc finger nucleases may be obtained from any endonuclease or exonuclease.
  • Non-limiting examples of endonucleases from which a cleavage domain may be derived include restriction endonucleases and homing endonucleases. See, for example, 2010-2011 Catalog, New England Biolabs, Beverly, Mass.; and Belfort et al. (1997) Nucleic Acids Res. 25:3379-3388. Additional enzymes that cleave DNA are known (e.g., S1 nuclease; mung bean nuclease; pancreatic DNase I; micrococcal nuclease; yeast HO endonuclease). One or more of these enzymes (or functional fragments thereof) may be used as a source of cleavage domains.
  • the endogenous gene is modified by using “custom” meganucleases produced to modify plant genomes (see, e.g., WO 2009/114321; Gao et al. (2010) Plant Journal 61:176-187).
  • the term “meganuclease” generally refers to a naturally occurring homing endonuclease that binds double-stranded DNA at a recognition sequence that is greater than 12 base pairs and encompasses the corresponding intron insertion site.
  • Naturally occurring meganucleases can be monomeric (e.g., I-SceI) or dimeric (e.g., I-CreI).
  • the term meganuclease, as used herein, can be used to refer to monomeric meganucleases, dimeric meganucleases, or to the monomers which associate to form a dimeric meganuclease.
  • Naturally occurring meganucleases, for example, from the LAGLIDADG family, have been used to effectively promote site-specific genome modification in plants, yeast, Drosophila, mammalian cells, and mice.
  • Engineered meganucleases such as, for example, LIG-34 meganucleases, which recognize and cut a 22-base-pair DNA sequence found in the genome of Zea mays (maize), are known (see, e.g., US 20110113509).
  • the endogenous gene is modified by using TAL endonucleases (TALEN).
  • Transcription activator-like (TAL) effectors from the plant pathogen Xanthomonas are important virulence factors that act as transcriptional activators in the plant cell nucleus, where they bind directly to DNA via a central domain of tandem repeats.
  • Transcription activator-like (TAL) effector-DNA modifying enzymes (TALEs or TALENs) are also used to engineer genetic changes. See, e.g., US20110145940; Boch et al. (2009) Science 326(5959):1509-12.
  • Fusions of TAL effectors to the FokI nuclease provide TALENs that bind and cleave DNA at specific locations. Target specificity is determined by developing customized amino acid repeats in the TAL effectors.
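The customized repeats mentioned above work because each TALE repeat's base preference is set largely by its two hypervariable residues (the repeat-variable diresidue, or RVD). The sketch below maps a DNA target to an RVD array using the commonly used code (NI for A, HD for C, NN for G, NG for T) and, by default, requires the 5' T that TALE target sites usually start with. This is a hypothetical design helper under those assumptions; it ignores alternative RVDs, context effects, and FokI dimer spacing.

```python
# Commonly used RVD code; NN can also bind A weakly.
RVD_FOR_BASE = {"A": "NI", "C": "HD", "G": "NN", "T": "NG"}


def design_tale_rvds(target: str, require_t0: bool = True) -> list[str]:
    """Return the RVD of each TALE repeat needed to bind target (5'->3').

    If require_t0 is True, target must start with the 5' T that is contacted by
    the TALE N-terminal region rather than by a repeat, so no RVD is emitted for it.
    """
    target = target.upper()
    if require_t0:
        if not target.startswith("T"):
            raise ValueError("TALE target sites are usually preceded by a 5' T")
        target = target[1:]
    try:
        return [RVD_FOR_BASE[base] for base in target]
    except KeyError as exc:
        raise ValueError(f"unsupported base {exc.args[0]!r}") from exc


print(design_tale_rvds("TACGT"))
```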
  • the endogenous gene is modified by using base editing, such as an oligonucleobase-mediated system.
  • site-specific base conversions can also be achieved to engineer one or more nucleotide changes that introduce one or more EMEs described herein into the genome.
  • a site-specific base edit mediated by a C•G to T•A or an A•T to G•C base-editing deaminase enzyme (Gaudelli et al. “Programmable base editing of A•T to G•C in genomic DNA without DNA cleavage.” Nature (2017); Nishida et al. “Targeted nucleotide editing using hybrid prokaryotic and vertebrate adaptive immune systems.” Science 353 (6305) (2016); Komor et al. “Programmable editing of a target base in genomic DNA without double-stranded DNA cleavage.” Nature 533 (7603) (2016):420-4).
  • Catalytically dead dCas9 fused to a cytidine deaminase or an adenine deaminase protein becomes a specific base editor that can alter DNA bases without inducing a DNA break.
  • cytosine base editors convert C to T (or G to A on the opposite strand), whereas adenine base editors convert adenine to inosine, which is read as guanine, resulting in an A to G change within an editing window specified by the gRNA.
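The editing window described above can be modeled by converting every target base that falls within a fixed range of protospacer positions. The sketch below is a simplification under stated assumptions: positions are counted from the PAM-distal end of a 20-nt protospacer, the default window of positions 4 to 8 is the approximate window reported for early cytosine base editors (actual windows vary by editor), every target base in the window is converted (bystander edits at full efficiency), and the function name is hypothetical.

```python
# Base conversion on the protospacer (non-target) strand for each editor class.
CONVERSIONS = {"CBE": ("C", "T"), "ABE": ("A", "G")}  # ABE: A -> inosine, read as G


def predict_base_edit(protospacer: str, editor: str = "CBE",
                      window: tuple[int, int] = (4, 8)) -> str:
    """Return the protospacer after converting every target base in the window.

    Positions are 1-based, counted from the PAM-distal end of the protospacer.
    """
    if editor not in CONVERSIONS:
        raise ValueError(f"unknown editor {editor!r}")
    source, result_base = CONVERSIONS[editor]
    first, last = window
    if not 1 <= first <= last:
        raise ValueError("invalid editing window")
    bases = list(protospacer.upper())
    for i in range(first - 1, min(last, len(bases))):
        if bases[i] == source:
            bases[i] = result_base
    return "".join(bases)


# Cs at positions 4 and 5 are edited; Cs at positions 3 and 9 lie outside the window.
print(predict_base_edit("GACCCAGTCATGCAGCTGAC"))
```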
  • the present disclosure further provides a method of generating plants (e.g., soybean plants or pea plants) producing seeds with an increased methionine content as compared to a control seed comprising crossing a first plant (e.g., soybean plant or pea plant) comprising a polynucleotide encoding a modified glycinin polypeptide described herein with a second plant (e.g., soybean plant or pea plant) comprising a polynucleotide encoding a modified glycinin polypeptide and/or at least one modification associated with increased glycinin, increased methionine, and/or increased protein, described herein, harvesting the seed produced thereby, and generating a progeny plant.
  • the progeny plant comprises the polynucleotide encoding the modified glycinin polypeptide of the first plant (e.g., soybean plant or pea plant) and the polynucleotide encoding the modified glycinin polypeptide and/or the at least one modification associated with increased glycinin, increased methionine, and/or increased protein present in the second plant (e.g., soybean plant or pea plant).
  • the modified glycinin polypeptide of the first plant and the second plant comprise the same amino acid sequence.
  • the modified glycinin polypeptide of the first plant and the second plant comprise different amino acid sequences.
  • the at least one modification may be introduced into the first plant and the second plant using any method described herein or known in the art.
  • one or more of the modifications of the first and/or second plant (e.g., soybean plant or pea plant) is a native modification such as, for example, a high protein QTL.
  • the seed of the progeny plant produced by the method comprises at least or at least about a 2%, 3%, 4%, 5%, 6%, 7%, 8%, 9%, 10%, 12%, 14%, 16%, 18%, 20%, 22%, 24%, 26%, 28%, 30%, 35%, 40%, 45%, 50%, 55%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, 100%, 110%, 120%, 130%, 140%, 150%, 200%, 250%, 300%, or 350% and less than about a 500%, 450%, 400%, 350%, 300%, 250%, 200%, 150%, 140%, 130%, 120%, 110%, 100%, 90%, 80%, 70%, 60%, or 50% increase in total methionine on a dry weight of seed basis, as compared to a seed from a control plant (e.g., seed from a plant not comprising the modified glycinin polypeptide).
  • the seed of the progeny plant produced by the method comprises at least or at least about 0.7%, 0.8%, 0.9%, 1%, 1.1%, 1.2%, 1.3%, 1.4%, 1.5%, 1.6%, 1.7%, 1.8%, 1.9%, 2%, 2.1%, 2.2%, 2.3%, 2.4%, 2.5%, 2.6%, 2.7%, 2.8%, 2.9%, or 3% and less than or less than about 4%, 3.9%, 3.8%, 3.7%, 3.6%, 3.5%, 3.4%, 3.3%, 3.2%, 3.1%, 3%, 2.9%, 2.8%, 2.7%, 2.6%, 2.5%, 2.4%, 2.3%, 2.2%, 2.1%, or 2% total methionine on a dry weight basis.
  • the seed of the progeny plant produced by the method comprises at least or at least about a 1, 1.5, 2, 2.5, 3, 3.5, 4, 5, 10, or 15 and less than 30, 25, 20, 15, 10, 9, 8, 7, 6, or 5 percentage point increase in total protein measured on a dry weight basis, as compared to a seed from a corresponding control plant (e.g., a plant not comprising the modified glycinin polypeptide) on a dry weight of seed basis.
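The ranges above mix two kinds of comparison: a relative (percent) increase for methionine and a percentage-point increase for total protein. The minimal Python sketch below, using hypothetical dry-weight values rather than data from the source, shows how the two differ for the same comparison against a control seed.

```python
def percent_increase(variant: float, control: float) -> float:
    """Relative increase of a variant value over a control value, in percent."""
    if control <= 0:
        raise ValueError("control value must be positive")
    return (variant - control) / control * 100.0


def percentage_point_increase(variant_pct: float, control_pct: float) -> float:
    """Absolute difference between two values already expressed in percent."""
    return variant_pct - control_pct


if __name__ == "__main__":
    # Hypothetical total methionine, % of seed dry weight.
    print(percent_increase(0.75, 0.60))           # approximately 25.0
    # Hypothetical total protein, % of seed dry weight.
    print(percentage_point_increase(42.5, 40.0))  # 2.5 percentage points
    print(percent_increase(42.5, 40.0))           # 6.25 (% relative increase)
```

A 2.5 percentage-point gain in protein from a 40% baseline is only a 6.25% relative gain, so the two kinds of range are not interchangeable.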
  • the seed of the progeny plant produced by the method comprises a modified glycinin content that is greater than or within 0.5%, 1%, 1.5%, 2%, 2.5%, 3%, 3.5%, 4%, 4.5%, 5%, 10%, 15%, or 20% as compared to the glycinin content of a seed from a corresponding control plant (e.g., wild-type glycinin from a plant not comprising the modified glycinin polypeptide) on a dry weight of seed basis.
  • At least or at least about 1%, 2%, 3%, 4%, 5%, 6%, 7%, 8%, 9%, 10%, 12%, 14%, 16%, 18%, 20%, and less than about 30%, 28%, 26%, 24%, 22% or 20% of the total protein in the seed of the progeny plant produced by the method comprises a modified glycinin polypeptide described herein and the seed comprises a total protein content that is greater than or within 0.5%, 1%, 1.5%, 2%, 2.5%, 3%, 3.5%, 4%, 4.5%, 5%, 10%, 15%, or 20% as compared to a seed from a corresponding control plant (e.g., seed from a plant not comprising the modified glycinin polypeptide) on a dry weight of seed basis.
  • the method generates progeny plants having a yield that is greater than or within 0.5%, 1%, 1.5%, 2%, 2.5%, 3%, 3.5%, 4%, 4.5%, 5%, 10%, 15%, or 20% as compared to the corresponding control plant, for example, one which has a similar genetic background but lacks the introduced mutations.
  • a plant is produced from the progeny seed.
  • the methionine content in the protein composition is greater than about 20, 25, 30, 35, 40, 45, 50, 55, 60, 65, 70, 75, 80, 85, 90, 95, 100, 125, 150, 175, or 200 mg/g protein and less than 500, 400, 300, 250, 200, 150, 100, 75, 50, or 40 mg/g protein.
  • the protein composition has an essential amino acid content of at least 35%, 40%, 45%, 50%, 55%, 60%, 65%, 70%, 75%, 80%, or 85% and less than 95%, 90%, 85%, 80%, 75%, 70%, 65%, 60%, 55%, 50%, or 45%.
  • the sum of the methionine and tryptophan in the protein composition is greater than about 20, 25, 30, 35, 40, 45, 50, 55, 60, 65, 70, 75, 80, 85, 90, 95, 100, 125, 150, 175, or 200 mg/g protein and less than 500, 400, 300, 250, 200, 150, 100, 75, 50, or 40 mg/g protein.
  • the sum of the methionine, lysine, threonine and tryptophan in the protein composition is greater than about 20, 25, 30, 35, 40, 45, 50, 55, 60, 65, 70, 75, 80, 85, 90, 95, 100, 125, 150, 175, or 200 mg/g protein and less than 500, 400, 300, 250, 200, 150, 100, 75, 50, or 40 mg/g protein.
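The composition metrics above (methionine; methionine plus tryptophan; and methionine plus lysine, threonine and tryptophan, each in mg/g protein; plus an essential amino acid percentage) can be computed from an amino acid profile. This Python sketch assumes a profile already expressed in mg per g protein, treats the ten amino acids listed elsewhere in this disclosure as essential, and expresses essential amino acid content as a percentage of the summed profile (an assumed definition); the profile values are hypothetical.

```python
ESSENTIAL = {"Arg", "His", "Ile", "Leu", "Lys",
             "Met", "Phe", "Thr", "Trp", "Val"}


def composition_metrics(profile: dict) -> dict:
    """Summarize an amino acid profile given in mg per g protein."""
    def mg(aa: str) -> float:
        return profile.get(aa, 0.0)

    total = sum(profile.values())
    if total <= 0:
        raise ValueError("profile must contain positive amounts")
    essential = sum(v for aa, v in profile.items() if aa in ESSENTIAL)
    return {
        "met_mg_per_g": mg("Met"),
        "met_trp_mg_per_g": mg("Met") + mg("Trp"),
        "met_lys_thr_trp_mg_per_g": mg("Met") + mg("Lys") + mg("Thr") + mg("Trp"),
        # Assumed definition: essential amino acids as % of the summed profile.
        "essential_pct": 100.0 * essential / total,
    }


if __name__ == "__main__":
    toy = {"Met": 20.0, "Trp": 10.0, "Lys": 60.0, "Thr": 40.0,
           "Leu": 70.0, "Glu": 200.0, "Gly": 100.0}
    print(composition_metrics(toy))
```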
  • the protein composition is an animal feed composition.
  • the protein composition is a human food composition.
  • the human food composition is a composition selected from the group consisting of legume meal; soy flour; defatted soy flour; soymilk; spray-dried soymilk; soy protein concentrate; texturized soy protein concentrate; hydrolyzed soy protein; soy protein isolate; spray-dried tofu; soy meat analog; soy cheese analog; and soy coffee creamer.
  • Also provided are methods for feeding animals comprising administering to an animal a feed comprising any of the protein compositions (e.g., soy protein composition or pea protein composition) described herein.
  • the animal is a chicken or a pig.
  • the feeding does not require a synthetic or manufactured amino acid supplement to maintain animal growth as compared with a control protein composition (e.g., soy protein composition or pea protein composition) from comparable plants (e.g., soybean or pea not comprising the modified glycinin polypeptide).
  • the animal gains weight at a similar rate as a control animal under the same feeding regimen except that the control animal receives a feed comprising a protein composition (e.g., soy protein composition or pea protein composition) produced from commodity non-modified soybeans and receives supplementary essential amino acids in an amount sufficient to optimize weight gain in the animal.
  • a method of producing a protein composition comprising crushing the seed or seeds of the plants described herein and extracting protein (e.g., soy protein or pea protein) from the crushed seed to form the protein composition (e.g., soy protein composition or pea protein composition).
  • the methionine content in the protein composition is greater than about 20, 25, 30, 35, 40, 45, 50, 55, 60, 65, 70, 75, 80, 85, 90, 95, 100, 125, 150, 175, or 200 mg/g protein and less than 500, 400, 300, 250, 200, 150, 100, 75, 50, or 40 mg/g protein.
  • the protein composition has an essential amino acid content of at least 35%, 40%, 45%, 50%, 55%, 60%, 65%, 70%, 75%, 80%, or 85% and less than 95%, 90%, 85%, 80%, 75%, 70%, 65%, 60%, 55%, 50%, or 45%.
  • the sum of the methionine and tryptophan in the protein composition is greater than about 20, 25, 30, 35, 40, 45, 50, 55, 60, 65, 70, 75, 80, 85, 90, 95, 100, 125, 150, 175, or 200 mg/g protein and less than 500, 400, 300, 250, 200, 150, 100, 75, 50, or 40 mg/g protein.
  • the sum of the methionine, lysine, threonine and tryptophan in the protein composition is greater than about 20, 25, 30, 35, 40, 45, 50, 55, 60, 65, 70, 75, 80, 85, 90, 95, 100, 125, 150, 175, or 200 mg/g protein and less than 500, 400, 300, 250, 200, 150, 100, 75, 50, or 40 mg/g protein.
  • the method comprises generating an in silico population of high-methionine seed storage protein (e.g., glycinin) variants by inputting the 3D structural coordinates and/or the primary amino acid sequence of a candidate seed storage polypeptide into an artificial intelligence model (AI model), the AI model trained to calculate the per-residue probability of an amino acid by using encoded geometrical information of the candidate seed storage protein 3D structure and/or sequential information, calculating a predicted solubility score, a predicted stability score, a predicted aggregation propensity score, or any combination thereof for members of the in silico population, and selecting from the in silico population one or more candidate high-methionine seed storage protein variants having (i) a predicted solubility score that is at least 50%, 55%, 60%, 65%, 70%, 75%, 80%, 81%, 8
  • a high-methionine variant refers to a polypeptide comprising an amino acid sequence having at least 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 20, 25, 30, 35, 40, 45, 50, 55, 60, 75 or 100 or more total methionine residues.
  • the high-methionine variant comprises 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 20, 25, 30, 35, 40, 45, 50, 55, 60, 75 or 100 or more and fewer than 150, 100, 90, 80, 70, 60, 55, 50, 45, 44, 43, 42,
  • the high-methionine variant comprises an additional 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 20, 25, 30, 35, 40, 45, 50, 55, or 60 or more and fewer than 100, 90, 80, 70, 60, 55, 50, 45,
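The residue-count definitions above (total methionine residues, and additional methionine residues relative to a control polypeptide) reduce to counting on one-letter sequences. A minimal Python sketch follows; the sequences are toy examples, not SEQ ID NOs from the source, the function names are hypothetical, and the default threshold of 5 is simply the lowest value in the listed ranges.

```python
def methionine_counts(variant: str, control: str) -> tuple:
    """Return (total Met in variant, additional Met relative to control)."""
    total = variant.upper().count("M")
    additional = total - control.upper().count("M")
    return total, additional


def meets_threshold(variant: str, control: str,
                    min_total: int = 5, min_additional: int = 0) -> bool:
    """Check a variant against hypothetical minimum Met counts."""
    total, additional = methionine_counts(variant, control)
    return total >= min_total and additional >= min_additional


if __name__ == "__main__":
    control = "MAEQLKFSGN"   # toy control polypeptide, 1 Met
    variant = "MAMQMKMSMN"   # toy variant, 5 Met
    print(methionine_counts(variant, control))  # (5, 4)
    print(meets_threshold(variant, control))    # True
```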
  • the seed storage protein for use in the method is not particularly limited and may be any seed storage protein in which increased methionine content is desired.
  • the seed storage protein is a globulin protein, such as, for example, a 7S globulin or an 11S globulin.
  • the seed storage protein is a glycinin, such as, for example, GY1, GY2, GY3, GY4, GY5, GY6, GY7, G8.
  • the candidate seed storage protein comprises an amino acid sequence that is at least 50%, 55%, 60%, 65%, 70%, 75%, 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% identical to any one of SEQ ID NOs: 4 and 18-86.
  • the candidate seed storage protein comprises an amino acid sequence that is at least 50%, 55%, 60%, 65%, 70%, 75%, 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% identical to amino acid positions 20-495 of any one of SEQ ID NOs: 18-86.
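Two details in the sequence-identity language above can be made concrete: "amino acid positions 20-495" is a 1-based, inclusive range (position 20 is the first residue after a 19-residue N-terminal segment), and percent identity is the fraction of matching positions. The Python sketch below is deliberately simplified: it assumes the two sequences are already aligned without gaps, whereas identity to a SEQ ID NO would normally be computed from a pairwise alignment (for example, BLAST or a Needleman-Wunsch aligner). Function names are hypothetical.

```python
def region(seq: str, start: int, end: int) -> str:
    """Return 1-based, inclusive positions start..end (e.g., 20-495)."""
    if start < 1 or end < start:
        raise ValueError("invalid range")
    return seq[start - 1:end]


def ungapped_identity(a: str, b: str) -> float:
    """Percent identity of two pre-aligned, equal-length sequences."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(x == y for x, y in zip(a.upper(), b.upper()))
    return 100.0 * matches / len(a)


if __name__ == "__main__":
    print(region("ABCDEFGHIJ", 3, 5))                     # CDE
    print(ungapped_identity("MKTAYIAKQR", "MKTAYIAKQM"))  # 90.0
```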
  • the 3D structural coordinates of a candidate seed storage polypeptide can be generated and/or selected using any method known in the art.
  • the 3D structural coordinates are structural coordinates that have been previously determined and disclosed in the art, such as those provided in the protein data bank.
  • the 3D structural coordinates of the candidate seed storage protein are experimentally determined using structural biology techniques including, but not limited to, protein crystallography.
  • the 3D structural coordinates in the methods for generating high-methionine variants described herein are predicted and/or modeled using in silico methods.
  • the in silico method uses AlphaFold2 version 2.3.1 in the monomer/multimer mode with relaxed model prediction.
  • the reference database used for model generation in the methods described herein may be any reference database known in the art including, but not limited to, UniRef90 (accessible on the internet at uniprot.org), BFD (accessible on the internet at bfd.mmseqs.com), MGnify (accessible on the internet at ebi.ac.uk/metagenomics), PDB70, UniRef30 (accessible on the internet at uniclust.mmseqs.com), PDB seqres (accessible on the internet at wwpdb.org), UniProt (accessible on the internet at uniprot.org), and PDB (accessible on the internet at wwpdb.org).
  • UniRef90 is used as the reference database.
  • the AI model is a generative model (also referred to as generative AI), a large language model, or a machine learning model.
  • Types of machine learning models include, without limitation, statistical models, such as probability models and regression models, and deep learning models, such as supervised, self-supervised, and unsupervised models, as well as reinforcement learning models, or combinations thereof.
  • the machine learning model is a classification model, a regression model, a clustering model, a dimensionality reduction model, a distribution model, for example, a multivariate or univariate Gaussian distribution model, or a deep learning model.
  • the deep learning model is part of an ensemble model.
  • the deep learning model is an ensemble model comprising two or more models.
  • the deep learning model is a supervised learning model.
  • the supervised learning model may be a classification or regression model.
  • the machine learning models include support vector machines (e.g., SVM-DA), neural networks (e.g., artificial neural networks, ANN), deep learning algorithms, and the like.
  • the machine learning model comprises a deep learning model, such as, for example ProteinMPNN.
  • the method for generating or producing high-methionine seed storage protein variants further comprises expressing one or more (e.g., 1, 2, 3, 4, 5, 10, 20, 30, 40, 50, 100, 500, 1000 or more) of the selected candidate high-methionine seed storage protein variants in a model organism, determining the solubility, the stability, or a combination thereof of the one or more candidate high-methionine seed storage protein variants in the model organism, and selecting high-methionine seed storage variants having solubility and/or stability in the model organism.
  • candidates are selected that have an experimentally determined solubility score, stability score, or both, in the model organism that is at least 50%, 55%, 60%, 65%, 70%, 75%, 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% of an experimentally determined solubility and/or stability score in the model organism for the candidate seed storage polypeptide.
  • candidates are selected that have an experimentally determined solubility score, stability score, or both, in the model organism that is at least 50%, 55%, 60%, 65%, 70%, 75%, 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% of the predicted solubility and/or stability score for the candidate seed storage polypeptide.
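The selection rules above keep a candidate whose solubility or stability score is at least a given percentage of a reference score (the experimentally determined or predicted score of the candidate seed storage polypeptide). A minimal Python sketch, assuming higher scores are better and using hypothetical variant names and scores:

```python
def select_relative(scores: dict, reference: float, fraction: float) -> list:
    """Return names whose score is at least `fraction` times `reference`."""
    if not 0 < fraction <= 1:
        raise ValueError("fraction should be in (0, 1], e.g., 0.9 for 90%")
    cutoff = fraction * reference
    return sorted(name for name, s in scores.items() if s >= cutoff)


if __name__ == "__main__":
    # Hypothetical soluble fractions; reference is the parent polypeptide.
    measured = {"var_1": 0.75, "var_2": 0.70, "var_3": 0.90}
    print(select_relative(measured, reference=0.80, fraction=0.90))
    # ['var_1', 'var_3']
```

The same function applies whether the reference is an experimental score in the model organism or a predicted score; only the `reference` value changes.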
  • any model organism in which the generated variant polypeptides can be expressed may be used in the methods described herein.
  • the model organism is a plant model organism, such as, for example, Arabidopsis, N. benthamiana, or a legume.
  • the model organism is a cell culture system, including, but not limited to, plant cell culture, insect cell culture (e.g., Sf9 and Sf21 cells), bacterial cell culture (e.g., E. coli), yeast cell culture, or mammalian cell culture.
  • the model organism is E. coli.
  • the seed of the plant produced by the method comprises at least or at least about a 2%, 3%, 4%, 5%, 6%, 7%, 8%, 9%, 10%, 12%, 14%, 16%, 18%, 20%, 22%, 24%, 26%, 28%, 30%, 35%, 40%, 45%, 50%, 55%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, 100%, 110%, 120%, 130%, 140%, 150%, 200%, 250%, 300%, or 350% and less than about a 500%, 450%, 400%, 350%, 300%, 250%, 200%, 150%, 140%, 130%, 120%, 110%, 100%, 90%, 80%, 70%, 60%, or 50% increase in total methion
  • the seed of the plant produced by the method comprises at least or at least about 0.7%, 0.8%, 0.9%, 1%, 1.1%, 1.2%, 1.3%, 1.4%, 1.5%, 1.6%, 1.7%, 1.8%, 1.9%, 2%, 2.1%, 2.2%, 2.3%, 2.4%, 2.5%, 2.6%, 2.7%, 2.8%, 2.9%, or 3% and less than or less than about 4%, 3.9%, 3.8%, 3.7%, 3.6%, 3.5%, 3.4%, 3.3%, 3.2%, 3.1%, 3%, 2.9%, 2.8%, 2.7%, 2.6%, 2.5%, 2.4%, 2.3%, 2.2%, 2.1%, or 2% total methionine on a dry weight basis.
  • the seed of the plant produced by the method comprises at least or at least about a 1, 1.5, 2, 2.5, 3, 3.5, 4, 5, 10, or 15 and less than 30, 25, 20, 15, 10, 9, 8, 7, 6, or 5 percentage point increase in total protein measured on a dry weight basis, as compared to a seed from a corresponding control plant (e.g., a plant not comprising the high-methionine seed storage protein variant) on a dry weight of seed basis.
  • the seed of the plant produced by the method comprises a high-methionine seed storage protein variant content that is greater than or within 0.5%, 1%, 1.5%, 2%, 2.5%, 3%, 3.5%, 4%, 4.5%, 5%, 10%, 15%, or 20% as compared to the glycinin content of a seed from a corresponding control plant (e.g., wild-type glycinin from a plant not comprising the high-methionine seed storage protein variant) on a dry weight of seed basis.
  • At least or at least about 1%, 2%, 3%, 4%, 5%, 6%, 7%, 8%, 9%, 10%, 12%, 14%, 16%, 18%, 20%, and less than about 30%, 28%, 26%, 24%, 22% or 20% of the total protein in the seed of the plant produced by the method comprises the high-methionine seed storage protein variant and the seed comprises a total protein content that is greater than or within 0.5%, 1%, 1.5%, 2%, 2.5%, 3%, 3.5%, 4%, 4.5%, 5%, 10%, 15%, or 20% as compared to a seed from a corresponding control plant (e.g., seed from a plant not comprising the high-methionine seed storage protein variant) on a dry weight of seed basis.
  • the method generates plants having a yield that is greater than or within 0.5%, 1%, 1.5%, 2%, 2.5%, 3%, 3.5%, 4%, 4.5%, 5%, 10%, 15%, or 20% as compared to the corresponding control plant, for example, one which has a similar genetic background but lacks the high-methionine seed storage protein variant.
  • high-essential amino acid (e.g., arginine, histidine, isoleucine, leucine, lysine, methionine, phenylalanine, threonine, tryptophan, valine)
  • the method comprises generating an in silico population of high-essential amino acid seed storage protein (e.g., glycinin) variants by inputting the 3D structural coordinates and/or the primary amino acid sequence of a candidate seed storage polypeptide into an artificial intelligence model (AI model), the AI model trained to calculate the per-residue probability of an amino acid by using encoded geometrical information of the candidate seed storage protein 3D structure and/or sequential information, calculating a predicted solubility score, a predicted stability score, a predicted aggregation propensity score, or any combination thereof for members of the in silico population, and selecting from the in silico population one or more candidate high-essential amino acid seed storage protein variants having (i) a predicted solubility score that is at least 50%, 55%, 60%, 65%, 70%, 75%, 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%,
  • a high-essential amino acid variant refers to a polypeptide comprising an amino acid sequence having at least 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 20, 25, 30, 35, 40, 45, 50, 55, 60, 75 or 100 or more total arginine, histidine, isoleucine, leucine, lysine, methionine, phenylalanine, threonine, tryptophan, or valine residues, or any combination thereof.
  • the high-essential amino acid variant comprises 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 20, 25, 30, 35, 40, 45, 50, 55, 60, 75 or 100 or more and fewer than 150, 100, 90, 80, 70, 60, 55, 50, 45, 44, 43, 42, 41, 40, 39, 38, 37, 36, 35, 34, 33, 32, 31, 30 total arginine, histidine, isoleucine, leucine, lysine, methionine, phenylalanine, threonine, tryptophan, or valine residues, or any combination thereof.
  • the high-essential amino acid variant comprises an additional 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 20, 25, 30, 35, 40, 45, 50, 55, or 60 or more and fewer than 100, 90, 80, 70, 60, 55, 50, 45, 44, 43, 42, 41, 40, 39, 38, 37, 36, 35, 34, 33, 32, 31, 30 arginine, histidine, isoleucine, leucine, lysine, methionine, phenylalanine, threonine, tryptophan, or valine residues, or any combination thereof as compared to a control polypeptide (e.g., candidate seed storage polypeptide).
  • the high-essential amino acid variant comprises an amino acid sequence having at least 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 20, 25, 30, 35, 40, 45, 50, 55, 60, 75 or 100 or more total methionine, lysine, tryptophan or threonine residues, or any combination thereof.
  • the high-essential amino acid variant comprises 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 20, 25, 30, 35, 40, 45, 50, 55, 60, 75 or 100 or more and fewer than 150, 100, 90, 80, 70, 60, 55, 50, 45, 44, 43, 42, 41, 40, 39, 38, 37, 36, 35, 34, 33, 32, 31, 30 total methionine, lysine, tryptophan or threonine residues, or any combination thereof.
  • the high-essential amino acid variant comprises an additional 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 20, 25, 30, 35, 40, 45, 50, 55, or 60 or more and fewer than 100, 90, 80, 70, 60, 55, 50, 45, 44, 43, 42, 41, 40, 39, 38, 37, 36, 35, 34, 33, 32, 31, 30 methionine, lysine, tryptophan or threonine residues, or any combination thereof as compared to a control polypeptide (e.g., candidate seed storage polypeptide).
  • the seed storage protein for use in the method is not particularly limited and may be any seed storage protein in which increased essential amino acid content is desired.
  • the seed storage protein is a globulin protein, such as, for example, a 7S globulin or an 11S globulin.
  • the seed storage protein is a glycinin, such as, for example, GY1, GY2, GY3, GY4, GY5, GY6, GY7, G8.
  • the candidate seed storage protein comprises an amino acid sequence that is at least 50%, 55%, 60%, 65%, 70%, 75%, 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% identical to any one of SEQ ID NOs: 4 and 18-86.
  • the candidate seed storage protein comprises an amino acid sequence that is at least 50%, 55%, 60%, 65%, 70%, 75%, 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% identical to amino acid positions 20-495 of any one of SEQ ID NOs: 18-86.
  • the 3D structural coordinates of a candidate seed storage polypeptide can be generated and/or selected using any method known in the art or described herein.
  • the method for generating or producing high-essential amino acid seed storage protein variants further comprises expressing one or more (e.g., 1, 2, 3, 4, 5, 10, 20, 30, 40, 50, 100, 500, 1000 or more) of the selected candidate high-essential amino acid seed storage protein variants in a model organism, determining the solubility, the stability, or a combination thereof of the one or more candidate high-essential amino acid seed storage protein variants in the model organism, and selecting high-essential amino acid seed storage variants having solubility and/or stability in the model organism.
  • candidates are selected that have an experimentally determined solubility score, stability score, or both, in the model organism that is at least 50%, 55%, 60%, 65%, 70%, 75%, 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% of an experimentally determined solubility and/or stability score in the model organism for the candidate seed storage polypeptide.
  • candidates are selected that have an experimentally determined solubility score, stability score, or both, in the model organism that is at least 50%, 55%, 60%, 65%, 70%, 75%, 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% of the predicted solubility and/or stability score for the candidate seed storage polypeptide.
  • the seed of the plant produced by the method comprises at least or at least about a 2%, 3%, 4%, 5%, 6%, 7%, 8%, 9%, 10%, 12%, 14%, 16%, 18%, 20%, 22%, 24%, 26%, 28%, 30%, 35%, 40%, 45%, 50%, 55%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, 100%, 110%, 120%, 130%, 140%, 150%, 200%, 250%, 300%, 350%, or 400% and less than about a 750%, 700%, 650%, 600%, 550%, 500%, 450%, 400%, 350%, 300%, 250%, 200%, 150%, 140%, 130%, 120%, 110%, 100%,
  • the seed of the plant produced by the method comprises at least or at least about a 2%, 3%, 4%, 5%, 6%, 7%, 8%, 9%, 10%, 12%, 14%, 16%, 18%, 20%, 22%, 24%, 26%, 28%, 30%, 35%, 40%, 45%, 50%, 55%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, 100%, 110%, 120%, 130%, 140%, 150%, 200%, 250%, 300%, 350%, or 400% and less than about a 750%, 700%, 650%, 600%, 550%, 500%, 450%, 400%, 350%, 300%, 250%, 200%, 150%, 140%, 130%, 120%, 110%, 100%, 90%, 80%, 70%, 60%, or 50% increase in methionine, lysine, tryptophan, threonine, or any combination thereof on a dry weight of seed basis, as compared to a
  • the seed of the plant produced by the method comprises at least or at least about a 1, 1.5, 2, 2.5, 3, 3.5, 4, 5, 10, or 15 and less than 30, 25, 20, 15, 10, 9, 8, 7, 6, or 5 percentage point increase in total protein measured on a dry weight basis, as compared to a seed from a corresponding control plant (e.g., a plant not comprising the high-essential amino acid seed storage protein variant) on a dry weight of seed basis.
  • the seed of the plant produced by the method comprises a high-essential amino acid seed storage protein variant content that is greater than or within 0.5%, 1%, 1.5%, 2%, 2.5%, 3%, 3.5%, 4%, 4.5%, 5%, 10%, 15%, or 20% as compared to the glycinin content of a seed from a corresponding control plant (e.g., wild-type glycinin from a plant not comprising the high-essential amino acid seed storage protein variant) on a dry weight of seed basis.
  • At least or at least about 1%, 2%, 3%, 4%, 5%, 6%, 7%, 8%, 9%, 10%, 12%, 14%, 16%, 18%, 20%, and less than about 30%, 28%, 26%, 24%, 22% or 20% of the total protein in the seed of the plant produced by the method comprises the high-essential amino acid seed storage protein variant and the seed comprises a total protein content that is greater than or within 0.5%, 1%, 1.5%, 2%, 2.5%, 3%, 3.5%, 4%, 4.5%, 5%, 10%, 15%, or 20% as compared to a seed from a corresponding control plant (e.g., seed from a plant not comprising high-essential amino acid seed storage protein variant) on a dry weight of seed basis.
  • the method generates plants having a yield that is greater than or within 0.5%, 1%, 1.5%, 2%, 2.5%, 3%, 3.5%, 4%, 4.5%, 5%, 10%, 15%, or 20% as compared to the corresponding control plant, for example, one which has a similar genetic background but lacks the high-essential amino acid seed storage protein variant.
  • GY1_ALT1 was the most modified variant of Set 1, with 43 methionine substitutions; GY1_ALT2 contained 35 methionine substitutions, GY1_ALT3 contained 25, and GY1_ALT4 was the least modified variant, with only 17 methionine substitutions.
  • GY1_ALT4 design was based on a specific glycinin structural feature.
  • Glycinin belongs to a large superfamily whose members share a conserved structural core, the double-stranded beta-helix (DSBH) barrel; the superfamily includes catalytic enzymes, such as metal-dependent oxalate decarboxylase, mannose-6-phosphate isomerase, and α-ketoglutarate dehydrogenase, as well as non-enzymatic seed storage proteins.
  • the active site is located inside the barrel and consists of residues that chelate the metal cation cofactor and residues that bind the substrate.
  • The seed protein glycinin lacks catalytic function, but many of the residues corresponding to the catalytic residues (pseudo-catalytic residues) inside the DSBH barrel remain hydrophilic/apolar in nature. In GY1_ALT4, all 17 of these pseudo-catalytic residues were chosen for methionine substitution.
  • GY1_ALT4 was then used as a starting scaffold to engineer an additional 27 high-met variants (Set 2 variants of Table 2).
  • GY1_ALT4 was chosen as the scaffold for Set 2 because it was more soluble and more stable than the wild-type protein, following expression as the proglycinin form in E. coli, as described in Example 2.
  • the GY1_ALT4_5 variant of Set 2 was then used as a starting scaffold to further engineer an additional 20 high-met variants (Set 3 variants of Table 2).
  • GY1_ALT4_5 was chosen as the scaffold for Set 3 because of its solubility, observed following expression as the proglycinin form in E. coli, as described in Example 2.
  • Some Set 3 variants contained methionine substitutions located outside the beta-barrels.
  • GY1_ALT4_38 had methionine substitutions at the barrel-barrel interface.
  • GY1_ALT4_39 had some methionine substitutions in the trimeric interface, sometimes referred to as the “donut hole” at the center of the trimer.
  • GY1_ALT4_40 and GY1_ALT4_47 had methionine substitutions in a surface loop on the face of the trimer opposite the face involved in hexamer formation.
  • GY1_ALT4_41 had numerous methionine substitutions in a flexible loop.
  • GY1_ALT4_44 had methionine substitutions at three surface hydrophobic amino acids, one of which was also substituted in GY1_ALT4_47.
  • ProteinMPNN is a message passing neural network (MPNN) with 3 encoder and 3 decoder layers (128 hidden dimensions) that predicts protein sequences in an order-agnostic autoregressive manner from the backbone features of an input protein 3D structure. This neural network was trained with diverse protein structures and sequences in PDB and CATH.
  • ProteinMPNN calculates the per-residue probability of each amino acid using encoded geometric information from the input protein 3D structure and sequential information from neighboring residues. Among sequences designed from these per-residue probabilities, those with a low ProteinMPNN score (the negative average log per-residue probability) are predicted to fold into a structure identical to the input structure and to have appropriate properties, such as stability and solubility in E. coli.
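The ProteinMPNN score described above is a negative average log-probability: for a sequence of length L with residue a_i at position i, score = -(1/L) * sum_i log p_i(a_i), so lower scores mean the sequence is more probable given the structure. The Python sketch below computes that quantity from a per-residue probability table; it assumes natural logarithms and uses a hand-written toy table in place of the probabilities ProteinMPNN would actually output. The function name is hypothetical.

```python
import math


def mpnn_style_score(probs: list, sequence: str) -> float:
    """Negative mean natural-log probability of `sequence` under `probs`.

    probs[i] maps one-letter amino acid codes to probabilities at position i.
    """
    if len(probs) != len(sequence) or not sequence:
        raise ValueError("need one probability table per residue")
    log_ps = [math.log(table[aa]) for table, aa in zip(probs, sequence)]
    return -sum(log_ps) / len(log_ps)


if __name__ == "__main__":
    toy = [{"M": 0.5, "A": 0.5}, {"M": 0.25, "A": 0.75}]
    print(mpnn_style_score(toy, "MA"))  # lower: both residues fairly likely
    print(mpnn_style_score(toy, "MM"))  # higher: Met is less likely at position 2
```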
  • This new sequence was used as the scaffold sequence for the next round of the greedy search, which identified the best single methionine substitution.
  • This greedy search-and-substitution process was repeated iteratively until the targeted number of methionine residues had been introduced.
  • the original ProteinMPNN parameters were used (v_48_020, the version with 48 edges and 0.20 Å training noise), and the same sequence changes were introduced into all monomers of the trimer to maintain symmetry.
  • residues with poor structural model quality (loops longer than 5 amino acids with pLDDT lower than 70), residues possibly involved in the trimer-trimer interactions that form the hexamer, and methionine substitutions that may alter trimer stability were excluded from the greedy search process.
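The first exclusion rule above (loops longer than 5 amino acids with pLDDT lower than 70) can be turned into a position mask. This Python sketch reads the rule as runs of at least six consecutive residues that are both annotated as loop and individually below pLDDT 70; other readings (for example, a mean pLDDT over the whole loop) are equally plausible, and the loop annotation and pLDDT values are assumed to come from the predicted structure. The function name is hypothetical.

```python
def low_confidence_loops(is_loop: list, plddt: list,
                         min_len: int = 6, cutoff: float = 70.0) -> set:
    """Return 0-based positions lying in long, low-pLDDT loop runs."""
    if len(is_loop) != len(plddt):
        raise ValueError("annotations must have equal length")
    excluded, run = set(), []
    for i, (loop, score) in enumerate(zip(is_loop, plddt)):
        if loop and score < cutoff:
            run.append(i)
            continue
        if len(run) >= min_len:  # a run just ended; keep it if long enough
            excluded.update(run)
        run = []
    if len(run) >= min_len:  # a run reaching the C-terminus
        excluded.update(run)
    return excluded


if __name__ == "__main__":
    loop = [False, True, True, True, True, True, True, False, True, True]
    conf = [90, 60, 65, 50, 55, 60, 62, 80, 40, 40]
    print(sorted(low_confidence_loops(loop, conf)))  # [1, 2, 3, 4, 5, 6]
```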
  • 100 trials for introducing 17 additional methionine substitutions and another 100 trials for introducing 27 additional methionine substitutions into GY1_ALT4 were performed and ranked by the ProteinMPNN score of the final sequence.
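The greedy search described above (try every allowed single methionine substitution, keep the one with the best score, and repeat until the target count is reached) can be sketched in Python as follows. The scoring function is a stand-in: here a toy per-position penalty, whereas the described method scores candidates with ProteinMPNN on the trimer with the same change applied to every chain. Names and penalties are hypothetical, and the sketch does not reproduce the 100-trial procedure.

```python
from typing import Callable, Iterable, List, Tuple


def with_met(seq: str, i: int) -> str:
    """Return `seq` with position i (0-based) changed to methionine."""
    return seq[:i] + "M" + seq[i + 1:]


def greedy_met_search(seq: str, n_subs: int, allowed: Iterable[int],
                      score: Callable[[str], float]) -> Tuple[str, List[int]]:
    """Add up to n_subs Met substitutions, one lowest-score step at a time."""
    current = seq
    chosen: List[int] = []
    # Positions excluded upstream (e.g., low-pLDDT loops) are not in `allowed`.
    candidates = {i for i in allowed if seq[i] != "M"}
    for _ in range(n_subs):
        if not candidates:
            break
        # Sorting makes ties resolve to the lowest position deterministically.
        best = min(sorted(candidates), key=lambda i: score(with_met(current, i)))
        current = with_met(current, best)  # new sequence is the next scaffold
        chosen.append(best)
        candidates.remove(best)
    return current, chosen


if __name__ == "__main__":
    penalty = [0.1, 0.5, 0.05, 0.9, 0.2]  # toy cost of Met at each position

    def toy_score(s: str) -> float:
        return sum(p for p, aa in zip(penalty, s) if aa == "M")

    print(greedy_met_search("AKLES", 2, allowed={1, 2, 3, 4}, score=toy_score))
    # ('AKMEM', [2, 4])
```

Because each step commits to the locally best substitution, the result can depend on earlier choices, which is one reason to compare many trials and rank their final sequences.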
  • Aggrescan3D (Kuriata et al., 2019, Nucleic Acids Research 47:W300-W307) and ESMFold-based in silico melting (Hermosilla et al., 2023, bioRxiv 2023.06.06.543955) were used as supplementary metrics to avoid methionine substitutions that drastically decrease stability and solubility.
  • Aggrescan3D calculates the aggregation propensity of each residue based on the physicochemical properties of the amino acid and structural information.
  • ESMFold-based in silico melting predicts protein stability by evaluating the distribution of favorable contacts. Variants with a maximal Aggrescan3D score higher than 2.5, or with an inflection point of the ESMFold-based in silico melting curve (Succso) lower than 0.4454 (the value observed for an experimentally verified negative variant), were removed.
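The removal criteria above, combined with ranking by ProteinMPNN score, can be expressed as a short filter. In this Python sketch a variant is kept only if its maximal Aggrescan3D score is at most 2.5 and its in silico melting inflection point is at least 0.4454, the thresholds stated above; survivors are sorted by ascending ProteinMPNN score. The record fields and all values are hypothetical.

```python
A3D_MAX = 2.5      # remove if maximal Aggrescan3D score is higher than this
MELT_MIN = 0.4454  # remove if melting-curve inflection point is lower than this


def keep(variant: dict) -> bool:
    """True if the variant passes both supplementary filters."""
    return variant["a3d_max"] <= A3D_MAX and variant["melt_ip"] >= MELT_MIN


def filter_and_rank(variants: list) -> list:
    """Apply both removal criteria, then rank survivors (lowest score first)."""
    survivors = [v for v in variants if keep(v)]
    return [v["name"] for v in sorted(survivors, key=lambda v: v["mpnn"])]


if __name__ == "__main__":
    pool = [
        {"name": "var_a", "mpnn": 1.10, "a3d_max": 1.8, "melt_ip": 0.50},
        {"name": "var_b", "mpnn": 1.05, "a3d_max": 2.7, "melt_ip": 0.55},
        {"name": "var_c", "mpnn": 1.20, "a3d_max": 2.0, "melt_ip": 0.42},
        {"name": "var_d", "mpnn": 1.00, "a3d_max": 2.5, "melt_ip": 0.4454},
    ]
    print(filter_and_rank(pool))  # ['var_d', 'var_a']
```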
  • Table 3 shows the top-ranked variants from the final ranked lists for 17 methionine substitutions (GY1 AI 1-8) and 27 methionine substitutions (GY1 AI 9-18).
  • This method can be applied to other proteins, as well as to the glycinin family, to alter amino acid composition while maintaining or improving stability and solubility.
  • the substitutions presented in Table 1 could be made at the corresponding positions of other glycinin family members such as GY2, GY3, GY4, GY5, GY7 (Figs. 5A-5B), or in other cupin superfamily members with similar protein folds, such as the conglycinin family.
  • Methods as described in Example 1 e.g., ProteinMPNN
  • Table 2 Description of amino acid substitutions (subs) in soybean GY1 high-met variants.
  • Table 3 Computational data of soybean GY1 high-met variants designed by the AI approach.
  • The coding sequences (CDS) of the GY1 variants of Table 2 were expressed in the proglycinin form in E. coli strain BL21-CodonPlus (DE3) RIPL (Agilent Technologies).
  • the public GY1 CDS sequence for soybean cultivar Williams was obtained from Soybase and was identical to the CDS of NCBI accession M36686.
  • the 19-amino acid signal peptide was excluded, a non-native methionine (F1M) served as the start codon, and a C-terminal fusion comprising a five-glycine linker and a six-histidine tag was included to facilitate purification by immobilized metal affinity chromatography.
  • E. coli lacks the soybean vacuolar processing enzyme that cleaves proglycinin to allow hexameric glycinin formation.
  • Cultures were grown with the appropriate antibiotics in 2xYT medium at 37°C until an absorbance at 600 nm of about 0.4 to 0.6 was reached.
  • the insoluble pellet was resuspended in 5% SDS with pipetting, heated at 95°C for 5 min, and pipetted again if needed for complete resuspension. Three µl each of soluble and insoluble protein were assessed by SDS-PAGE.
  • For Set 2 variants, soluble protein was observed for GY1_ALT4_29, GY1_ALT4_39, GY1_ALT4_40, GY1_ALT4_42, GY1_ALT4_43, GY1_ALT4_44, and GY1_ALT4_47.
  • the GY1_ALT4_42 and GY1_ALT4_43 methionine substitutions were subsets of the GY1_ALT4_29 methionine substitutions, and all these variants had some soluble protein.
  • the GY1_ALT4_29 methionine substitutions were a combination of the GY1_ALT4_5 and GY1_ALT4_16 methionine substitutions, and all these variants had some soluble protein.
  • the Set 1, 2, and 3 results of Figs. 2A-2L demonstrated that numerous methionine substitutions could be introduced into proglycinin while still retaining sufficient solubility to allow purification of the proteins.
  • GY1 AI 1, GY1 AI 4, GY1 AI 5, GY1 AI 7, and GY1 AI 8 had substantial amounts of soluble protein evident, and GY1 AI 3 and GY1 AI 6 also had small amounts of soluble protein visible.
  • GY1 AI 9, GY1 AI 11, GY1 AI 12, GY1 AI 14, GY1 AI 15, GY1 AI 16, GY1 AI 17, and GY1 AI 18 had soluble protein, and GY1 AI 10 also had detectable, though less abundant, soluble protein.
  • the lysate was centrifuged for 10 min at 10,000 g, and the supernatant was filtered through one layer of Miracloth and then slowly run through a 2 ml column of HisPur Cobalt Superflow Agarose (Thermo Scientific #25229) equilibrated with 50 mM Tris-HCl pH 8, 50 mM NaCl, 15 mM imidazole. The column was washed with 10 ml of the same equilibration buffer to remove all the non-bound protein.
  • the column was then washed with 10 ml of 50 mM Tris-HCl pH 8, 50 mM NaCl, 30 mM imidazole, and eluted with 8 ml of 50 mM Tris-HCl pH 8, 50 mM NaCl, 150 mM imidazole.
  • the purified proteins were concentrated by ultrafiltration. Buffer was changed to the storage buffer comprising 20 mM Tris-HCl pH 8, 300 mM NaCl, and 10% glycerol.
  • the purified proteins were quantitated by absorbance at 205 nm, using an extinction coefficient of 31 for a 1 mg/ml solution.
  • the proteins were aliquoted and quickly frozen in liquid nitrogen prior to long-term storage at -80°C.
  • the stability data for high-met proglycinin variants from Set 4 are presented in Fig. 3B.
  • All variants examined were substantially more stable than WT.
  • the stability data for high-met proglycinin variants from Set 5 are presented in Fig. 3C.
  • Two other Set 5 variants, GY1 AI 12 and GY1 AI 17, had slightly greater conformational change than WT at 2 M GuHCl, but appeared to be more stable than WT at higher denaturant concentrations.
  • the GY1_AI_11 variant was the least stable of the Set 5 variants.
  • the results of the Set 4 and Set 5 variants demonstrated that the ProteinMPNN method was an effective way to add further methionine substitutions to the GY1 ALT4 scaffold without destabilizing the protein.
  • Expression vectors were constructed to express the CDS of high-methionine glycinin variants as transgenes in soybeans, with or without increased methionine biosynthesis or decreased methionine catabolism (Table 4).
  • the full preproglycinin sequences were used, including signal peptides but omitting introns.
  • the seed-specific promoter and the terminator from the soybean Gy1 gene (SEQ ID NOs: 232 and 233) were used to control expression of the glycinin variants.
  • the GM-CGS1 (Glyma.09g235400) variant CDS (SEQ ID NO: 231) contained a 78-amino acid deletion from K66 to S143 inclusive, removing part of the regulatory region to reduce feedback regulation.
  • the promoter from a soybean ubiquitin gene, GmUBQ (SEQ ID NO: 234), and the terminator from the phaseolin gene of Phaseolus vulgaris (SEQ ID NO: 235) were used to control GM-CGS1 (78 aa del) expression.
  • Constructs for expression of high-methionine glycinin variants were also transformed into soybeans that were deficient in GM-MGL3.
  • Gene editing was used to create a frameshift mutation that knocked out the GM-MGL3 gene. Standard Agrobacterium transformation methods were used to transform soybeans with the expression vectors. As an alternative to transgenic constructs, gene editing could be used to replace the Gy1 gene with either the CDS or the gene (including introns) encoding high-methionine glycinin variants.
  • CGS activity is also envisioned, including increasing the expression of wild-type GM-CGS1 or GM-CGS2 (Glyma.18g261600), or increasing expression via insertion of small sequence motifs into the GM-CGS promoter, as an alternative to replacing the native promoter with GmUBQ or with other strong promoters.
  • Gene editing methods could be used to make the desired deletions in GM-CGS or to replace the native promoter with the GmUBQ promoter.
  • Table 4 Examples of constructs expressing high-met glycinin variants in soybeans, with or without increased met biosynthesis or decreased met catabolism
  • This example demonstrates the impact of expressing high-methionine glycinin variants on soybean seed methionine and cysteine content in greenhouse conditions.
  • T1 seed analysis: 40 T1 seeds each from soybean transgenic events expressing high-methionine variants were genotyped. For each event, comprising a single plant, the seeds identified as homozygous were pooled, ground, and analyzed for total seed methionine and cysteine content. Likewise, the null seeds for each event were pooled, ground, and analyzed. The results are presented in Table 5. Overexpression of GY1 ALT4 without an increased methionine source (Construct 1) resulted in an average increase of 7.67% in seed total methionine (free methionine plus proteinogenic methionine) content.
  • T1 seed results demonstrated increased soybean seed methionine content by overexpressing a high-met GY1 variant.
  • T1 and T2 seeds from transgenic plants homozygous for constructs 7-17 of Table 4 are tested for methionine content. Seeds from transgenic plants expressing one of constructs 7-17 are expected to have increased methionine content compared to controls.
  • Table 5 Effects of expressing high-methionine glycinin variants on T1 soybean seed methionine and cysteine content
  • Table 6 Effects of expressing high-methionine glycinin variants on T2 soybean seed methionine and cysteine content
  • This example demonstrates providing multiple copies of high-methionine glycinin variants to soybean.
  • the protein abundance of high-methionine glycinin variants could be further increased in soybeans by using gene editing methods to introduce multiple copies of the desired CDS or gene.
  • a high-methionine glycinin variant gene could be introduced at its native Gy1 locus as well as at the nearby Gy2 locus, thus replacing both native genes. This approach would simultaneously remove two genes encoding low-methionine proteins while introducing two copies of a gene encoding a high-methionine protein.
  • low-methionine beta-conglycinin genes could be knocked out and replaced with one or more copies of a high-methionine glycinin variant.
  • the methionine subs in GY1 variants with acceptable solubility and stability could be made at the corresponding positions in other glycinin family members, such as GY2, GY3, GY4, GY5, and GY7.
  • the GY1 variant replaces any or all of the glycinin family members, GY2, GY3, GY4, GY5, and GY7. Seeds expressing a high-methionine glycinin variant at one or more native glycinin loci have increased methionine on a seed dry weight basis compared to an unmodified or wild-type seed.
  • Glycinin and conglycinin are the two major storage proteins in soybean seeds (Table 7 and Table 8). In soybean seeds, β-conglycinin, the abundant 7S globulin storage protein, and glycinin constitute about 21% and 33% of total protein content, respectively (Utsumi et al., 1997). The genes encoding these storage proteins are used as gene editing targets for high-methionine glycinin variant over-expression.
  • the native GY1, GY2, GY3, GY4, GY5 and all the conglycinin alpha, alpha’ or beta subunit genes can be replaced with the high-methionine glycinin variants at those soybean storage protein native loci.
  • Table 7 Expression profiling of glycinin 1 and other putative glycinin family members in soybean
  • Table 8 Expression level of 7 β-conglycinin isoforms in soybean seeds 30 or 50 days after flowering.
  • Example 8-1 Replace the native GY1 with a high-met GY1 variant (GY1 ALT4) on the GY1 native locus.
  • GM-GY-CR1, SEQ ID NO: 1; and GM-GY-CR3, SEQ ID NO: 2) to target the Glycinin 1 (GY1) gene (glyma.03G163500, SEQ ID NO: 3 for the nucleotide sequences, SEQ ID NO: 4 for the peptide sequences) were designed.
  • GY1-CR1 was designed to target a site near the beginning of the exon 1 of the pro-glycinin 1 protein.
  • GM-GY1-CR3 was designed to target the beginning of the 3’ UTR of the glycinin 1 gene.
  • the binary vector contained CR1/CR3 gRNA combinations and their corresponding donor DNA templates (SEQ ID NO: 5).
  • the homology recombination (HR) fragments were used to flank the high-methionine GY1 variant sequences to facilitate the homology-mediated recombination process.
  • the CR1 or CR3 gRNA target sites were also used to flank the donor DNAs to enable them to be excised from the binary vectors for double strand break repair process.
  • the binary vectors were introduced into soybean plants by Agrobacterium-mediated soybean embryonic axis transformation.
  • T2 seed expressing the high-methionine glycinin variants Gy1_AI_4 or Gy1_ALT4_47 at the GY1 gene locus in a wild-type or CGS-edited background was analyzed for total methionine, total cysteine, and total protein content.
  • the increases in Gy1_AI_4 protein, along with a decrease in the amount of wild-type glycinins, were visualized by non-reducing SDS-PAGE and anti-glycinin immunoblot (Figs. 10A-10B).
  • the Gy1_ALT4_47 gene replacement, when paired with the CGS edit, significantly increased total methionine, with a 32% increase in one event and a 47% increase in a second event (Table 10).
  • Total cysteine levels were significantly increased in the Gy1_ALT4_47 edit + CGS events.
  • the CGS edit on its own showed increases in methionine and cysteine content as well.
  • protein and oil contents were decreased for one Gy1_ALT4_47 + CGS event but not the other.
  • the expression of Gy1_ALT4_47 protein was visualized by non-reducing SDS-PAGE (Fig. 11).
  • Example 8-2 Replace both GY1 and GY2 with a high-methionine GY1 variant (Gy1_ALT4_47 and Gy1_AI_4) on the GY1 and GY2 native loci.
  • GM-GY1-CR12 (SEQ ID NO: 6) was designed to target a site near the end of exon 4 of the pro-glycinin 2 protein (glyma03g32020, SEQ ID NO: 7 for nucleotide sequences, SEQ ID NO: 8 for peptide sequences).
  • the same binary vector design was used to combine the CR1/CR12 gRNAs and their corresponding donor DNA templates (SEQ ID NO: 9 for the Gy1_ALT4_47 variant; SEQ ID NO: 10 for the Gy1_AI_4 variant).
  • the binary vectors are introduced into soybean plants by Agrobacterium-mediated soybean embryonic axis transformation.
  • gene-edited variants in which the high-methionine GY1 variant replaces both the GY1 and GY2 genomic sequences will be created.
  • T0 plants are generated and molecularly analyzed to identify gene replacement with homology-dependent repair at both sites (2xHDR variants). Plants are grown in the greenhouse. T1 seeds are harvested, and T1 plants are grown to obtain homozygous T2 seeds with the high-methionine GY1 gene replacement variants.
  • the high-methionine GY1 protein is quantified, and the methionine, cysteine, and total amino acid composition is analyzed to demonstrate the impact of the high-methionine GY1 variant as a replacement of the native GY1 and GY2 protein in soybean seeds. Seeds have measurable increases in total methionine without a significant loss in total protein content.
  • Example 8-3 Replace the conglycinin subunit gene clusters with three copies of a high-methionine GY1 variant (GY1 ALT4) on native conglycinin loci.
  • a two-step process was used for this editing approach. First, two gRNAs were used to drop out the conglycinin gene cluster on chromosome 20 (Gm20); in a separate experiment, another two gRNAs were used to drop out the conglycinin gene cluster on chromosome 10 (Gm10). T2 homozygous plants with either the Gm20 or the Gm10 conglycinin gene cluster dropout were crossed, and homozygous double-dropout lines were identified.
  • a new gRNA was designed against the new dropout junction sequences to enable homology-dependent insertion of multiple copies of the high-methionine GY1 variant (GY1 ALT4) into the native conglycinin loci, resulting in the replacement of conglycinin proteins with the high-met GY1 variant.
  • GM-CONG-CR1 (SEQ ID NO: 11) and GM-CONG-CR2 (SEQ ID NO: 12) were used to drop out the conglycinin cluster on chromosome 20 (Gm20); GM-CONG-CR3 (SEQ ID NO: 13) and GM-CONG-CR4 (SEQ ID NO: 14) were used to drop out the conglycinin cluster on chromosome 10 (Gm10). T2 homozygous seed from the Gm10 conglycinin locus dropout experiment was generated. Seed protein analyses were conducted by SDS-PAGE with Coomassie Blue staining.
  • This double conglycinin dropout line was used for the insertion of multiple copies of the high-methionine GY1 variant (GY1 ALT4) into the conglycinin native locus by a template-based genome editing technology.
  • GM-CONG-CR12 SEQ ID NO: 15
  • GM-CONG-CR13 SEQ ID NO: 16
  • the donor DNA (SEQ ID NO: 17) in the binary vector contained three copies of the high-methionine GY1 variant (GY1 ALT4) under the control of native beta-conglycinin alpha’ promoter and terminator, flanked by homology recombination (HR) fragments to facilitate the homology-mediated recombination process.
  • the GM-CONG-CR12 gRNA target sites were also used to flank our donor DNA to enable them to be excised from the binary vectors for double strand break repair process.
  • the binary vectors are introduced into soybean plants by Agrobacterium-mediated soybean embryonic axis transformation.
  • T1 seeds are harvested, and T1 plants are grown to obtain homozygous T2 seeds carrying the high-methionine GY1 variant at the native conglycinin Gm10 or Gm20 locus.
  • T2 seeds were generated with the high-methionine glycinin variant GY1 ALT4 at the native conglycinin Gm10 locus and analyzed for total methionine, total cysteine, and total protein content.
  • This edit contains three copies (3x) of GY1 ALT4 replacing the three conglycinins on chromosome 10. Total methionine content increased significantly, reaching an average of 1.16% met (dry-weight basis), a 75% increase compared to WT (Table 11).
  • EME1 was inserted 19 bp upstream of the TATA box in the GM-CGS1 promoter (SEQ ID NO: 240) to make the GM-CGS1 (EME1) promoter (SEQ ID NO: 241).
  • EME2 was inserted 19 bp upstream of the TATA box in the GM-CGS1 promoter to make the GM-CGS1 (EME2) promoter (SEQ ID NO: 242).
  • The effect of the GM-CGS1, GM-CGS1 (EME1), GM-CGS1 (EME2), and GmUBQ promoters on GM-CGS1 (78 aa del) expression is presented in Table 12.
  • the non-soybean phaseolin terminator was used for these four constructs, thus facilitating the use of a qRT-PCR assay with primers in the phaseolin terminator region to assess only transgene expression, without interference from native expression.
  • Expression in leaves of T3 plants grown in the field in short-row plots is summarized in Table 12. Three leaf punches, one each from three different plants in the row, were averaged to obtain the row value. Expression in null leaves was below the detection limit, as expected with this assay. Transgene expression in leaves was much greater with the GM-CGS1 (EME1) and GmUBQ promoters than with the GM-CGS1 and GM-CGS1 (EME2) promoters, similar to greenhouse-grown plants.
  • the GmUBQ promoter was environmentally variable, with approximately 10-fold greater expression in leaves in the field than in the greenhouse. These results demonstrated that the EME1 insertion into the GM-CGS1 promoter, but not the EME2 insertion, was highly effective in increasing expression in leaves. For the two strongly expressing promoters, expression in developing seed at 35 days after pollination was also examined. Both the GM-CGS1 (EME1) and GmUBQ promoters were effective in driving transgene expression in developing seeds, with the GmUBQ promoter achieving the greatest expression levels (Table 12). In hypocotyls, the GmUBQ promoter provided the greatest level of expression. In roots, both the GM-CGS1 (EME1) and GmUBQ promoters had higher expression than the other promoters.
  • Table 12 Effect of promoter on GM-CGS (78 aa del) transgene expression in soybean leaves and seeds
  • Seed composition was determined from mature seed (Table 13A). Free methionine was significantly increased when using the GM-CGS1 (EME1) and GmUBQ promoters. Total methionine and total cysteine were also significantly increased in the seed when using the GmUBQ promoter, but not as greatly as free methionine (Table 13A). Protein was slightly increased and oil was slightly decreased (Table 13B). RNAseq analysis of developing seed showed increases in three cysteine-rich Bowman-Birk protease inhibitor (BBI) transcripts and two methionine-rich 2S albumin transcripts when overexpressing CGS with the GmUBQ promoter, suggesting that the increases in total cysteine and total methionine derive from these genes.
  • BBI Bowman-Birk protease inhibitors
  • BBI is a trypsin and chymotrypsin inhibitor
  • trypsin-agarose column chromatography was used to determine that the increased protein (separated at a similar size to BBI) bound trypsin.
  • Protease inhibition assays with both trypsin and chymotrypsin also confirmed the increases in BBI protein in the GmUBQ:CGS overexpressed seed. These in vitro assays showed significantly increased trypsin inhibition (decreased trypsin activity) and some increase in chymotrypsin inhibition in the presence of protein from GmUBQ:CGS seed.
  • Table 13A Effects of ectopic expression of GmCGS1 del on sulfur amino acid contents of mature soybean seed.
  • DW denotes dry weight
  • This example demonstrates the impact of expressing the high-methionine glycinin variant Gy1_ALT4 on soybean seed methionine and cysteine content.
  • Transgenic plants expressing the Gy1_ALT4 variant were grown in the field in short rows. After harvest, seeds were analyzed for total methionine, total cysteine, and total protein content. The results are presented in Table 14. All five events showed increases in total methionine compared to the null, with four events having a statistically significant increase in total methionine content when analyzed with two-tailed T-tests, assuming equal variances and using a cutoff for the p value of 0.05. Total cysteine content was decreased significantly in three events. Total protein content remained relatively unchanged, with three events increased (two significantly) and two events slightly decreased.
  • Transcripts of the endogenous Gy1 gene and the Gy1_ALT4 transgene were quantified by next-generation amplicon sequencing of RT-PCR products.
  • mRNA was extracted from immature seeds at 35 DAF and 45 DAF sampled from soybean plants grown in the field.
  • RT-PCR assays were performed to generate 173-bp amplicons using a pair of primers that target a region conserved among Gy1, Gy2, Gy3, and Gy1_ALT4.
  • Secondary PCR amplifications were performed to incorporate sample-specific indices and Illumina-specific sequences. Sequencing was carried out on a MiSeq (Illumina). Sequencing adapters and low-quality data were removed using the Cutadapt tool (version 1.12).
  • BWA-MEM was employed to align reads to the transgene or the endogenous genes.
  • SAMtools was used to count reads. Relative expression levels are presented as percentages.
  • Gy1 had the highest expression in 35 DAF seed, accounting for 56.3% of the transcripts in WT (Table 15).
  • the expression levels of Gy2 and Gy3 were not significantly different from those in WT.
  • the endogenous Gy1 expression was significantly lower than WT.
  • the transgenic Gy1_ALT4 transcripts accounted for 37.0% of the sequenced transcripts.
  • the two copies of the Gy1 promoter in transgenic plants, one endogenous and one transgenic, contributed 57.2% of the transcripts, similar to the contribution of the single endogenous copy in WT and null segregants.
  • T2 seed from the Gy1_ALT4 gene editing described in Example 8-1 in the MGL3 frameshift-edited background was generated and analyzed for total methionine, total cysteine, and total protein content.
  • Total methionine in the Gy1_ALT4 seed was significantly increased compared to WT seed when analyzed with two-tailed T-tests, assuming equal variances and using a cutoff for the p value of 0.05.
  • the edited Gy1_ALT4 gene replacement also showed band increases for both the Gy1_ALT4 acidic and basic chains disulfide-bonded together (arrow 2) and the Gy1_ALT4 acidic chain only (arrow 4; Figs. 8A-8D).
  • In the Gy1_ALT4 edit, a decrease compared to WT was observed for the wild-type glycinin family with the acidic and basic chains disulfide-bonded together (arrow 1) and for the wild-type glycinin acidic chain only (arrow 3).
  • Additional glycinin variants were transgenically expressed in soybean, including constructs 7 through 17 described in Table 4.
  • T2 seed from greenhouse-grown plants was analyzed for total methionine, total cysteine, protein, and oil. Significant increases in methionine were observed in events for the Gy1 variants Gy1_ALT4_40, Gy1_AI_4, Gy1_AI_5, Gy1_AI_7, Gy1_AI_9, Gy1_AI_14, Gy1_ALT4_47+CGS, Gy1_AI_4+CGS, and Gy1_AI_14+CGS (Table 17A).
  • Table 17A Effects of expressing Gy1 high-methionine variants on methionine and cysteine content of T2 soybean seed from plants grown in the greenhouse
  • Soybean lines containing three copies of the GY1_ALT4 glycinin variant in the double conglycinin dropout background, described in Example 8-3, are further modified to have increased expression or activity of a CGS1, CGS2, or both a CGS1 and CGS2 polypeptide (e.g., removing the CGS regulatory domain by genome editing) in order to generate plants having increased expression or activity of at least one CGS polypeptide, decreased expression of beta-conglycinin, and expressing a glycinin variant.
  • the modified CGS is introduced by genome editing, a transgene containing CGS or a variant, or by introgression.
  • Seeds produced from the plants have increased total methionine and total cysteine content as compared to seeds from wild-type plants or plants containing three copies of the GY1 ALT4 glycinin variant in the double conglycinin dropout background. Similar results (e.g., increased total methionine and total cysteine) are expected using any combination of the high methionine glycinin variants listed in Table 2 or 17A including GY1 ALT4, in the 3x configuration on chromosome 10 (described in Example 8-3) to replace the conglycinin genes.
  • conglycinins on chromosome 20, removed as described in Example 8-3 can also be replaced with one or more copies of a high methionine glycinin variant to further increase total methionine content.
  • Plants having increased expression or activity of at least one CGS polypeptide (e.g., removing the CGS regulatory domain by genome editing), decreased expression of beta-conglycinin, and expressing a high-methionine glycinin variant are further modified to have decreased expression, activity, and/or stability of an endogenous MFT polypeptide, and/or decreased expression, activity, and/or stability of an endogenous SWT polypeptide to generate a CGS, glycinin, conglycinin, MFT, and/or SWT modified plant.
  • Seeds from the CGS, glycinin, conglycinin, MFT, and/or SWT modified plants have increased total methionine content as compared to wild-type plants or plants having increased expression of a polynucleotide encoding at least one CGS, decreased expression of beta-conglycinin, and expressing a high-methionine glycinin variant.
  • the CGS, glycinin, conglycinin, MFT, and/or SWT modified plants are further modified to introduce, by genome editing, a transgene, or introgression, a modification decreasing RFO content, such as genome editing or mutation of a polynucleotide encoding a raffinose synthase to reduce or eliminate its expression. Seeds with decreased RFO content have increased protein and high methionine on a dry weight basis as compared to wild-type plants.
  • Soybean plants are generated having decreased expression of beta-conglycinin and containing a high-methionine glycinin variant described herein, such as GY1 ALT4 47 or GY1 AI 4, to generate high-methionine glycinin variant plants with decreased beta-conglycinin expression.
  • the modification decreasing beta-conglycinin is a gene-edited knockout of one or more beta-conglycinin isoforms, introduction of an inverted repeat, or a beta-conglycinin dominant hairpin RNAi.
  • the high-methionine glycinin variant is introduced into one or more of the native glycinin gene loci, as described in Example 7.
  • the plants are generated by genome editing, introducing transgenes, or by crossing plants containing one or more of the modifications. Seeds from the generated plants have increased total methionine as compared to wild-type plants.
  • the plants are further modified to have increased expression or activity of a CGS1, CGS2, or both a CGS1 and CGS2 polypeptide (e.g., removing CGS regulatory domain by genome editing) to generate plants having decreased expression of beta-conglycinin, expressing a high-methionine glycinin and increased expression or activity of at least one CGS polypeptide.
  • the modified CGS is introduced by genome editing, a transgene containing CGS or a variant, or by introgression. Seeds produced from the plants have increased total methionine and total cysteine content as compared to seeds from wild-type plants or plants containing a high-methionine variant and decreased expression of beta-conglycinin.
  • Plants having increased expression or activity of at least one CGS polypeptide (e.g., removing the CGS regulatory domain by genome editing), decreased expression of beta-conglycinin, and expressing a high-methionine glycinin variant are further modified to have decreased expression, activity, and/or stability of an endogenous MFT polypeptide, and/or decreased expression, activity, and/or stability of an endogenous SWT polypeptide to generate a CGS, glycinin, conglycinin, MFT, and/or SWT modified plant.
  • the modifications for each can independently be introduced by genome editing, transgenes, or introgression.
  • Seeds from the CGS, glycinin, conglycinin, MFT, and/or SWT modified plants have increased total methionine content (predicted to be at, or around, 3% on a dry weight basis) as compared to wild-type plants or plants having increased expression of a polynucleotide encoding at least one CGS, decreased expression of beta-conglycinin, and expressing a high-methionine glycinin variant.
  • the CGS, glycinin, conglycinin, MFT, and/or SWT modified plants are further modified to introduce, by genome editing, a transgene, or introgression, a modification decreasing RFO content, such as genome editing or mutation of a polynucleotide encoding a raffinose synthase to reduce or eliminate its expression. Seeds with decreased RFO content have increased protein and high methionine on a dry weight basis as compared to wild-type plants.
  • nucleic acids are written left to right in 5’ to 3’ orientation; amino acid sequences are written left to right in amino to carboxy orientation, respectively. Numeric ranges are inclusive of the numbers defining the range. Amino acids may be referred to herein by either their commonly known three letter symbols or by the one-letter symbols recommended by the IUPAC-IUB Biochemical Nomenclature Commission. Nucleotides, likewise, may be referred to by their commonly accepted single-letter codes.
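The greedy methionine-placement loop described above can be sketched as follows. The loop scores every single methionine substitution, keeps the best one, uses it as the next scaffold, and repeats until the target count is reached, skipping excluded residues. This is a minimal illustration under stated assumptions, not the implementation used in the examples. `score_fn` is a hypothetical stand-in for a ProteinMPNN sequence score and is assumed to be better when lower, as with a negative log-likelihood. Mirroring each change across the three monomers of the trimer is left to that scorer. Each call traces one deterministic trajectory, whereas the examples ran 100 trials per target count and ranked the final sequences.

```python
from typing import Callable, Iterable


def greedy_met_design(
    scaffold: str,
    n_subs: int,
    score_fn: Callable[[str], float],
    excluded: Iterable[int] = (),
) -> str:
    """Add n_subs methionines one at a time, keeping the best single substitution each round.

    scaffold: starting protein sequence (single chain, one-letter codes).
    score_fn: hypothetical scorer; lower is assumed better (like a negative log-likelihood).
    excluded: 0-based positions never mutated (e.g. low-pLDDT loops, hexamer interface).
    """
    blocked = set(excluded)
    current = scaffold
    for _ in range(n_subs):
        best_seq, best_score = None, float("inf")
        for i, residue in enumerate(current):
            if residue == "M" or i in blocked:
                continue  # already methionine, or removed from the search space
            candidate = current[:i] + "M" + current[i + 1:]
            s = score_fn(candidate)
            if s < best_score:  # strict '<' keeps the earliest position on ties
                best_seq, best_score = candidate, s
        if best_seq is None:
            raise ValueError("no eligible position left for a methionine substitution")
        current = best_seq  # the winner becomes the scaffold for the next round
    return current
```

For example, take a toy scorer that sums the indices of methionine positions, so earlier positions look better. With that scorer, `greedy_met_design("AAAAA", 2, toy, excluded=[0])` picks position 1 and then position 2.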
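The supplementary filter described above can be expressed as a small sketch. It removes variants whose maximal Aggrescan3D score exceeds 2.5 or whose ESMFold-based in silico melting inflection point falls below 0.4454, then ranks the remainder. Several details are illustrative assumptions and not part of the described method: the `Variant` record, its field names, and the convention that a lower ProteinMPNN score ranks higher. The metric values themselves would come from running Aggrescan3D and the in silico melting protocol separately.

```python
from dataclasses import dataclass
from typing import List

A3D_MAX_CUTOFF = 2.5             # variants with a max Aggrescan3D score above this are removed
MELT_INFLECTION_CUTOFF = 0.4454  # variants with a melting-curve inflection below this are removed


@dataclass
class Variant:
    name: str
    mpnn_score: float       # hypothetical field; lower assumed better
    a3d_max: float          # maximal per-residue Aggrescan3D score
    melt_inflection: float  # inflection point of the in silico melting curve


def passes_filters(v: Variant) -> bool:
    """Keep a variant only if it is within the aggregation cutoff and the stability cutoff."""
    return v.a3d_max <= A3D_MAX_CUTOFF and v.melt_inflection >= MELT_INFLECTION_CUTOFF


def rank_variants(variants: List[Variant]) -> List[Variant]:
    """Drop variants failing either filter, then sort the rest by score (best first)."""
    return sorted((v for v in variants if passes_filters(v)), key=lambda v: v.mpnn_score)
```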
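The protein quantitation step described above uses absorbance at 205 nm with an extinction coefficient of 31 for a 1 mg/ml solution. That reduces to a Beer-Lambert conversion, c = A205 / (31 × l). This minimal sketch assumes a 1 cm path length and an optional dilution factor for the measured sample; neither is stated in the text, so both are assumptions.

```python
def protein_mg_per_ml(
    a205: float,
    dilution_factor: float = 1.0,
    e205_1mg_per_ml: float = 31.0,
    path_length_cm: float = 1.0,
) -> float:
    """Convert A205 to protein concentration (mg/ml) via Beer-Lambert: c = A / (epsilon * l)."""
    if e205_1mg_per_ml <= 0 or path_length_cm <= 0:
        raise ValueError("extinction coefficient and path length must be positive")
    # Scale back up by the dilution applied before the reading (assumed 1 if undiluted).
    return a205 * dilution_factor / (e205_1mg_per_ml * path_length_cm)
```

For instance, a sample diluted 10-fold that reads A205 = 0.62 corresponds to 0.62 × 10 / 31 = 0.2 mg/ml in the undiluted stock.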
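The GM-CGS1 deletion described above removes K66 through S143 inclusive, which is 143 - 66 + 1 = 78 residues. The sketch below applies a 1-based, inclusive deletion while checking the expected boundary residues. The function name and the boundary check are illustrative. The actual CGS1 sequence (SEQ ID NO: 231 and its parent) is not reproduced here, so a synthetic sequence stands in for it in testing.

```python
from typing import Optional


def delete_inclusive(
    seq: str,
    start: int,
    end: int,
    expect_start: Optional[str] = None,
    expect_end: Optional[str] = None,
) -> str:
    """Remove residues start..end (1-based, inclusive), optionally verifying the boundary residues."""
    if not 1 <= start <= end <= len(seq):
        raise ValueError("deletion range out of bounds")
    if expect_start is not None and seq[start - 1] != expect_start:
        raise ValueError(f"expected {expect_start} at position {start}, found {seq[start - 1]}")
    if expect_end is not None and seq[end - 1] != expect_end:
        raise ValueError(f"expected {expect_end} at position {end}, found {seq[end - 1]}")
    # Keep residues before `start` and after `end`; 1-based end maps to slice index `end`.
    return seq[: start - 1] + seq[end:]
```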
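The significance testing described above uses two-tailed t-tests assuming equal variances, with p < 0.05. Together with the percent increases reported against WT or null controls, it can be reproduced with a short sketch. The sketch computes only the pooled-variance t statistic and degrees of freedom, using the standard library. The two-tailed p-value would then come from the t distribution. For example, `scipy.stats.ttest_ind(a, b, equal_var=True)` returns the same statistic together with a two-sided p-value. Any input values are illustrative, not data from the tables.

```python
import math
from statistics import mean, variance
from typing import Sequence, Tuple


def percent_increase(treated: float, control: float) -> float:
    """Percent change of a treated mean relative to a control mean."""
    if control == 0:
        raise ValueError("control must be non-zero")
    return 100.0 * (treated - control) / control


def pooled_t(a: Sequence[float], b: Sequence[float]) -> Tuple[float, int]:
    """Student's two-sample t statistic (equal variances assumed) and its degrees of freedom."""
    na, nb = len(a), len(b)
    if na < 2 or nb < 2:
        raise ValueError("each group needs at least two observations")
    df = na + nb - 2
    # statistics.variance is the sample (n - 1) variance, as the pooled estimator requires.
    pooled_var = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / df
    t = (mean(a) - mean(b)) / math.sqrt(pooled_var * (1 / na + 1 / nb))
    return t, df
```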
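The relative expression values described above are simple proportions of read counts. Examples are the share of Gy1, Gy2, Gy3, and Gy1_ALT4 reads among all amplicon reads, and the combined contribution of transcripts driven by the Gy1 promoter. This minimal sketch assumes read counts per target have already been tallied, for instance from BWA-MEM alignments counted with SAMtools. Any counts passed to it are hypothetical.

```python
from typing import Dict, Iterable, Mapping


def relative_expression(read_counts: Mapping[str, int]) -> Dict[str, float]:
    """Express each target's read count as a percentage of all counted reads."""
    total = sum(read_counts.values())
    if total == 0:
        raise ValueError("no reads counted")
    return {target: 100.0 * n / total for target, n in read_counts.items()}


def combined_share(percentages: Mapping[str, float], targets: Iterable[str]) -> float:
    """Sum the percentage shares of several targets, e.g. all transcripts driven by one promoter."""
    return sum(percentages[t] for t in targets)
```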


Abstract

Plants producing seeds comprising increased methionine content, and such seeds, are provided. Protein compositions having increased methionine can be obtained from the seeds. Additionally, methods for generating and using plants, seeds and protein compositions having increased protein content or increased amounts of essential amino acids are disclosed. Methods for generating high methionine variants of seed storage proteins for use in the compositions and methods are also provided.

Description

GLYCININ VARIANTS FOR IMPROVING THE NUTRITIONAL VALUE OF SOYBEANS
REFERENCE TO SEQUENCE LISTING SUBMITTED ELECTRONICALLY
[0001] The official copy of the sequence listing is submitted electronically via Patent Center as an XML formatted sequence listing with a file named 210933_SequenceListing created on March 19, 2024 and having a size of 554,789 bytes and is filed concurrently with the specification. The sequence listing comprised in this XML formatted document is part of the specification and is herein incorporated by reference in its entirety.
BACKGROUND
[0002] Livestock feed rations commonly consist of a mixture of soybean meal and maize. Such feed can have suboptimal amounts of the two sulfur-containing amino acids, methionine (met) and cysteine, as well as the amino acids lysine, tryptophan, and threonine. The most limiting of these amino acids for poultry feed are the sulfur amino acids, and consequently synthetic methionine made from petroleum is commonly added to poultry feed, significantly increasing the expense of the feed. Creating proteins that are enriched in these limiting amino acids, and that are capable of accumulating to abundant levels in soybean seeds, may improve the nutritional value of soybeans for feed. Achieving abundant accumulation of such proteins in soybeans may be challenging, however, because numerous and simultaneous amino acid substitutions in a protein can be destabilizing. Proteins that are incorrectly folded, or are less compactly folded, may be more susceptible to proteolysis, and thus may be less likely to accumulate abundantly in soybeans.
SUMMARY
[0003] Provided are plants and seeds comprising a modified polynucleotide encoding a modified glycinin polypeptide, the modified glycinin polypeptide comprising an amino acid sequence having at least 50%, 55%, 60%, 65%, 70%, 75%, 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% sequence identity to SEQ ID NO: 4 and comprising 10 or more modifications, the 10 or more modifications comprising a substitution of a methionine at a position corresponding to SEQ ID NO: 4 selected from the group consisting of position 11, 20, 21, 23, 24, 26, 28, 33, 34, 35, 40, 43, 48, 49, 51, 52, 53, 59, 61, 64, 66, 70, 71, 75, 80, 81, 84, 85, 95, 97, 98, 99, 100, 103, 111, 112, 113, 115, 117, 121, 129, 131, 135, 143, 145, 147, 149, 150, 151, 153, 154, 155, 156, 158, 160, 162, 164, 169, 171, 172, 173, 174, 176, 177, 180, 184, 192, 196, 209, 211, 215, 223, 225, 228, 229, 230, 235, 246, 247, 249, 253, 255, 257, 258, 269, 270, 277, 279, 302, 303, 308, 310, 313, 316, 321, 322, 323, 324, 326, 328, 335, 338, 339, 341, 343, 349, 351, 354, 356, 357, 359, 361, 363, 364, 365, 371, 372, 373, 376, 382, 383, 385, 391, 393, 396, 399, 401, 402, 403, 405, 406, 409, 410, 411, 412, 414, 416, 420, 421, 423, 425, 430, 434, 436, 442, 443, 454, 456, 461, 462, 463, 464, and 468.
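Paragraph [0003] combines two measurable quantities: percent identity to SEQ ID NO: 4, and methionine substitutions at listed positions numbered according to SEQ ID NO: 4. The Python sketch below is a minimal, hypothetical illustration of how those quantities could be computed once a variant has already been aligned to SEQ ID NO: 4 (two gapped strings of equal length, with "-" for gaps). It assumes identity is computed as identical aligned residues divided by the ungapped length of SEQ ID NO: 4. This section does not state which identity convention or alignment program applies, and the function names are illustrative only. The sketch computes the quantities; it does not interpret the claim language.

```python
# Hypothetical helpers. The identity convention used here is an assumption.

# Positions of SEQ ID NO: 4 recited in paragraph [0003].
_POSITIONS_TEXT = """
11, 20, 21, 23, 24, 26, 28, 33, 34, 35, 40, 43, 48, 49, 51, 52, 53, 59, 61, 64,
66, 70, 71, 75, 80, 81, 84, 85, 95, 97, 98, 99, 100, 103, 111, 112, 113, 115,
117, 121, 129, 131, 135, 143, 145, 147, 149, 150, 151, 153, 154, 155, 156, 158,
160, 162, 164, 169, 171, 172, 173, 174, 176, 177, 180, 184, 192, 196, 209, 211,
215, 223, 225, 228, 229, 230, 235, 246, 247, 249, 253, 255, 257, 258, 269, 270,
277, 279, 302, 303, 308, 310, 313, 316, 321, 322, 323, 324, 326, 328, 335, 338,
339, 341, 343, 349, 351, 354, 356, 357, 359, 361, 363, 364, 365, 371, 372, 373,
376, 382, 383, 385, 391, 393, 396, 399, 401, 402, 403, 405, 406, 409, 410, 411,
412, 414, 416, 420, 421, 423, 425, 430, 434, 436, 442, 443, 454, 456, 461, 462,
463, 464, 468
"""
CLAIMED_POSITIONS = frozenset(int(tok) for tok in _POSITIONS_TEXT.replace(",", " ").split())


def percent_identity(ref_aln: str, var_aln: str) -> float:
    """Identical aligned residues divided by the ungapped reference length, in percent."""
    if len(ref_aln) != len(var_aln):
        raise ValueError("aligned strings must have equal length")
    ref_len = sum(1 for r in ref_aln if r != "-")
    same = sum(1 for r, v in zip(ref_aln, var_aln) if r != "-" and r == v)
    return 100.0 * same / ref_len


def met_substitutions(ref_aln: str, var_aln: str) -> list[int]:
    """1-based reference positions where the variant has M but the reference does not."""
    positions, ref_pos = [], 0
    for r, v in zip(ref_aln, var_aln):
        if r == "-":
            continue  # insertion in the variant: no corresponding reference position
        ref_pos += 1
        if v == "M" and r != "M":
            positions.append(ref_pos)
    return positions
```

For a real variant, the substitutions relevant to [0003] would be the intersection of `met_substitutions(...)` with `CLAIMED_POSITIONS`. Producing the alignment itself (for example, with a global pairwise aligner) is outside the scope of this sketch.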
[0004] Also provided is a plant or seed comprising a polynucleotide encoding a modified glycinin polypeptide, the modified glycinin polypeptide comprising at least 30 methionine residues and an AlphaFold2 predicted structure having a TM-score of at least 0.50, 0.55, 0.60, 0.65, 0.70, 0.75, 0.80, 0.85, 0.90, 0.95, or 1 as compared to the AlphaFold2 predicted structure of SEQ ID NO: 4.
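The criterion in [0004] compares two AlphaFold2 predicted structures by TM-score. As a hedged sketch, the Python code below evaluates the standard TM-score expression, TM = (1/L) * sum_i 1 / (1 + (d_i/d0)^2) with d0 = 1.24 * (L - 15)^(1/3) - 1.8, for a single superposition that has already been computed. Here d_i are the distances between aligned residue pairs, and L is the reference length (assumed here to be that of the model of SEQ ID NO: 4). Dedicated programs such as TM-score and TM-align also search over superpositions to maximize this value; this sketch does not. Clamping d0 to 0.5 angstroms for short chains is an assumption. The methionine check mirrors the "at least 30 methionine residues" condition.

```python
def tm_d0(length: int) -> float:
    """Length-dependent distance scale d0 (angstroms) used by the TM-score."""
    if length <= 15:
        return 0.5  # formula is undefined or negative for very short chains (assumed clamp)
    return max(1.24 * (length - 15) ** (1.0 / 3.0) - 1.8, 0.5)


def tm_score_from_distances(distances: list[float], reference_length: int) -> float:
    """TM-score for ONE fixed superposition.

    distances: distance (angstroms) between each aligned residue pair after
    superposition; unaligned residues are absent and contribute nothing.
    reference_length: length L used both for normalization and for d0.
    """
    d0 = tm_d0(reference_length)
    return sum(1.0 / (1.0 + (d / d0) ** 2) for d in distances) / reference_length


def has_min_methionines(sequence: str, minimum: int = 30) -> bool:
    """True if the polypeptide sequence contains at least `minimum` methionines."""
    return sequence.upper().count("M") >= minimum
```

A pair in which every residue is superposed exactly scores 1.0. A pair in which every aligned residue lies exactly d0 from its partner scores 0.5.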
[0005] Provided is a method of producing a plant producing seed having increased methionine content, the method comprising introducing into a regenerable plant cell a polynucleotide encoding a modified glycinin polypeptide, the modified glycinin polypeptide comprising an amino acid sequence having at least 50%, 55%, 60%, 65%, 70%, 75%, 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% sequence identity to SEQ ID NO: 4 and comprising 10 or more modifications, the 10 or more modifications comprising a substitution of a methionine at a position corresponding to SEQ ID NO: 4 selected from the group consisting of position 11, 20, 21, 23, 24, 26, 28, 33, 34, 35, 40, 43, 48, 49, 51, 52, 53, 59, 61, 64, 66, 70, 71, 75, 80, 81, 84, 85, 95, 97, 98, 99, 100, 103, 111, 112, 113, 115, 117, 121, 129, 131, 135, 143, 145, 147, 149, 150, 151, 153, 154, 155, 156, 158, 160, 162, 164, 169, 171, 172, 173, 174, 176, 177, 180, 184, 192, 196, 209, 211, 215, 223, 225, 228, 229, 230, 235, 246, 247, 249, 253, 255, 257, 258, 269, 270, 277, 279, 302, 303, 308, 310, 313, 316, 321, 322, 323, 324, 326, 328, 335, 338, 339, 341, 343, 349, 351, 354, 356, 357, 359, 361, 363, 364, 365, 371, 372, 373, 376, 382, 383, 385, 391, 393, 396, 399, 401, 402, 403, 405, 406, 409, 410, 411, 412, 414, 416, 420, 421, 423, 425, 430, 434, 436, 442, 443, 454, 456, 461, 462, 463, 464, and 468; and generating the plant, wherein the plant comprises the polynucleotide encoding the modified glycinin polypeptide and produces a seed having an increased amount of methionine as compared to seed of a plant not comprising the modified glycinin polypeptide.
[0006] Further provided is a method for generating high-methionine seed storage protein variants, the method comprising generating an in silico population of high-methionine seed storage protein variants by inputting the 3D structural coordinates and/or amino acid sequence of a candidate seed storage polypeptide into an artificial intelligence model (AI model), the AI model trained to calculate the per-residue probability of an amino acid using encoded geometric information of the 3D structure of the candidate seed storage polypeptide and/or sequence information; calculating a predicted solubility score, a predicted stability score, a predicted aggregation propensity score, or any combination thereof for members of the in silico population; and selecting from the in silico population one or more candidate high-methionine seed storage polypeptide variants, the one or more selected candidate high-methionine seed storage polypeptide variants having (i) a predicted solubility score that is at least 80% of a predicted solubility score for the candidate seed storage protein, (ii) a predicted stability score that is at least 80% of a predicted stability score for the candidate seed storage polypeptide, (iii) a predicted aggregation propensity score less than 50% of a predicted aggregation propensity score for the candidate seed storage protein, or (iv) any combination thereof.
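The selection step in [0006] applies relative thresholds to predicted scores. This section does not specify the AI model or the solubility, stability, and aggregation predictors, so the Python sketch below simply takes the scores as inputs. It assumes all scores are positive, that higher solubility and stability scores are better, and that a lower aggregation propensity is better. The record type and function names are hypothetical. The `criteria` argument lets a caller require all three thresholds or only a subset, which is one way of reading "any combination thereof".

```python
from dataclasses import dataclass


# Hypothetical score record for one sequence (parent or in silico variant).
@dataclass(frozen=True)
class PredictedScores:
    name: str
    solubility: float
    stability: float
    aggregation: float


ALL_CRITERIA = ("solubility", "stability", "aggregation")


def passes(variant: PredictedScores, parent: PredictedScores, criteria=ALL_CRITERIA) -> bool:
    """Apply the thresholds of criteria (i)-(iii) relative to the parent protein."""
    checks = {
        "solubility": variant.solubility >= 0.80 * parent.solubility,    # (i)
        "stability": variant.stability >= 0.80 * parent.stability,       # (ii)
        "aggregation": variant.aggregation < 0.50 * parent.aggregation,  # (iii)
    }
    return all(checks[c] for c in criteria)


def select_variants(population, parent, criteria=ALL_CRITERIA):
    """Keep the in silico variants that satisfy every requested criterion."""
    return [v for v in population if passes(v, parent, criteria)]
```

Passing `criteria=("solubility", "stability")`, for example, requires only criteria (i) and (ii).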
[0007] Also provided is a polynucleotide encoding a modified glycinin polypeptide, the modified glycinin polypeptide comprising an amino acid sequence having at least 50%, 55%, 60%, 65%, 70%, 75%, 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% sequence identity to SEQ ID NO: 4 and comprising 10 or more modifications, the 10 or more modifications comprising a substitution of a methionine at a position corresponding to SEQ ID NO: 4 selected from the group consisting of position 11, 20, 21, 23, 24, 26, 28, 33, 34, 35, 40, 43, 48, 49, 51, 52, 53, 59, 61, 64, 66, 70, 71, 75, 80, 81, 84, 85, 95, 97, 98, 99, 100, 103, 111, 112, 113, 115, 117, 121, 129, 131, 135, 143, 145, 147, 149, 150, 151, 153, 154, 155, 156, 158, 160, 162, 164, 169, 171, 172, 173, 174, 176, 177, 180, 184, 192, 196, 209, 211, 215, 223, 225, 228, 229, 230, 235, 246, 247, 249, 253, 255, 257, 258, 269, 270, 277, 279, 302, 303, 308, 310, 313, 316, 321, 322, 323, 324, 326, 328, 335, 338, 339, 341, 343, 349, 351, 354, 356, 357, 359, 361, 363, 364, 365, 371, 372, 373, 376, 382, 383, 385, 391, 393, 396, 399, 401, 402, 403, 405, 406, 409, 410, 411, 412, 414, 416, 420, 421, 423, 425, 430, 434, 436, 442, 443, 454, 456, 461, 462, 463, 464, and 468.
[0008] Further provided is a modified glycinin polypeptide comprising an amino acid sequence having at least 50%, 55%, 60%, 65%, 70%, 75%, 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% sequence identity to SEQ ID NO: 4 and comprising 10 or more modifications, the 10 or more modifications comprising a substitution of a methionine at a position corresponding to SEQ ID NO: 4 selected from the group consisting of position 11, 20, 21, 23, 24, 26, 28, 33, 34, 35, 40, 43, 48, 49, 51, 52, 53, 59, 61, 64, 66, 70, 71, 75, 80, 81, 84, 85, 95, 97, 98, 99, 100, 103, 111, 112, 113, 115, 117, 121, 129, 131, 135, 143, 145, 147, 149, 150, 151, 153, 154, 155, 156, 158, 160, 162, 164, 169, 171, 172, 173, 174, 176, 177, 180, 184, 192, 196, 209, 211, 215, 223, 225, 228, 229, 230, 235, 246, 247, 249, 253, 255, 257, 258, 269, 270, 277, 279, 302, 303, 308, 310, 313, 316, 321, 322, 323, 324, 326, 328, 335, 338, 339, 341, 343, 349, 351, 354, 356, 357, 359, 361, 363, 364, 365, 371, 372, 373, 376, 382, 383, 385, 391, 393, 396, 399, 401, 402, 403, 405, 406, 409, 410, 411, 412, 414, 416, 420, 421, 423, 425, 430, 434, 436, 442, 443, 454, 456, 461, 462, 463, 464, and 468.
[0009] Further provided are protein compositions produced from the seeds described herein or from seeds of the plants described herein, and methods of feeding an animal comprising administering a feed comprising the protein composition to the animal in a feeding regimen.
BRIEF DESCRIPTION OF THE DRAWINGS AND THE SEQUENCE LISTING
[0010] The disclosure can be more fully understood from the following detailed description and the accompanying drawings and Sequence Listing, which form a part of this application. The sequence descriptions (Table 1) and sequence listing attached hereto comply with the rules governing nucleotide and amino acid sequence disclosures in patent applications as set forth in 37 C.F.R. §§1.831-1.835.
[0011] Figs. 1A-1E provide pictures of experimental data showing the expression of six-histidine-tagged wild-type proglycinin and Set 1 high-methionine proglycinin variants in E. coli. Fig. 1A provides the expression of wild-type proglycinin (SEQ ID NO: 237). Fig. 1B provides the expression of proglycinin variant GY1_ALT4 (SEQ ID NO: 21). Fig. 1C provides the expression of proglycinin variant GY1_ALT3 (SEQ ID NO: 20). Fig. 1D provides the expression of proglycinin variant GY1_ALT2 (SEQ ID NO: 19). Fig. 1E provides the expression of proglycinin variant GY1_ALT1 (SEQ ID NO: 18). Lane descriptions: kD, protein molecular mass markers, with sizes given in kD on the left side of the gel; un, uninduced protein; in, induced protein. The prominent bands near the 50 kD marker that are present in the induced lanes, but not in the uninduced lanes, are the proglycinin polypeptides. For each sequence, the SEQ ID NO provided is the full-length preproglycinin sequence, which includes the signal peptide at positions 1-19 and lacks the six-histidine tag.
[0012] Figs. 2A-2L provide pictures of experimental data showing the solubility of six-histidine-tagged high-methionine proglycinin variants expressed in E. coli. Fig. 2A provides data for wild-type proglycinin (SEQ ID NO: 237) along with the proglycinin variants G1_ALT4 (SEQ ID NO: 21), G1_ALT3 (SEQ ID NO: 20), G1_ALT2 (SEQ ID NO: 19), and G1_ALT1 (SEQ ID NO: 18). Fig. 2B provides data for the proglycinin variants ALT4_1 (SEQ ID NO: 22), ALT4_2 (SEQ ID NO: 23), ALT4_3 (SEQ ID NO: 24), and ALT4_4 (SEQ ID NO: 25). Fig. 2C provides data for the proglycinin variants ALT4_5 (SEQ ID NO: 26), ALT4_6 (SEQ ID NO: 27), ALT4_7 (SEQ ID NO: 28), ALT4_8 (SEQ ID NO: 29), ALT4_9 (SEQ ID NO: 30), ALT4_10 (SEQ ID NO: 31), ALT4_11 (SEQ ID NO: 32), ALT4_12 (SEQ ID NO: 33), and ALT4_13 (SEQ ID NO: 34). Fig. 2D provides data for the proglycinin variants ALT4_14 (SEQ ID NO: 35), ALT4_15 (SEQ ID NO: 36), ALT4_16 (SEQ ID NO: 37), ALT4_17 (SEQ ID NO: 38), ALT4_18 (SEQ ID NO: 39), ALT4_19 (SEQ ID NO: 40), ALT4_20 (SEQ ID NO: 41), ALT4_21 (SEQ ID NO: 42), and ALT4_22 (SEQ ID NO: 43). Fig. 2E provides data for the proglycinin variants ALT4_23 (SEQ ID NO: 44), ALT4_24 (SEQ ID NO: 45), ALT4_25 (SEQ ID NO: 46), ALT4_26 (SEQ ID NO: 47), and ALT4_27 (SEQ ID NO: 48). Fig. 2F provides data for the proglycinin variants ALT4_28 (SEQ ID NO: 49), ALT4_29 (SEQ ID NO: 50), ALT4_30 (SEQ ID NO: 51), and ALT4_31 (SEQ ID NO: 52). Fig. 2G provides data for the proglycinin variants ALT4_32 (SEQ ID NO: 53), ALT4_33 (SEQ ID NO: 54), ALT4_34 (SEQ ID NO: 55), ALT4_35 (SEQ ID NO: 56), ALT4_36 (SEQ ID NO: 57), ALT4_37 (SEQ ID NO: 58), ALT4_38 (SEQ ID NO: 59), ALT4_39 (SEQ ID NO: 60), and ALT4_40 (SEQ ID NO: 61). Fig. 2H provides data for the proglycinin variants ALT4_41 (SEQ ID NO: 62), ALT4_42 (SEQ ID NO: 63), ALT4_43 (SEQ ID NO: 64), ALT4_44 (SEQ ID NO: 65), ALT4_45 (SEQ ID NO: 66), ALT4_46 (SEQ ID NO: 67), and ALT4_47 (SEQ ID NO: 68). Fig. 2I provides data for the proglycinin variants G1_AI_1 (SEQ ID NO: 69), G1_AI_2 (SEQ ID NO: 70), and G1_AI_3 (SEQ ID NO: 71). Fig. 2J provides data for the proglycinin variants G1_AI_4 (SEQ ID NO: 72), G1_AI_5 (SEQ ID NO: 73), G1_AI_6 (SEQ ID NO: 74), G1_AI_7 (SEQ ID NO: 75), and G1_AI_8 (SEQ ID NO: 76). Fig. 2K provides data for the proglycinin variants G1_AI_9 (SEQ ID NO: 77), G1_AI_10 (SEQ ID NO: 78), G1_AI_11 (SEQ ID NO: 79), G1_AI_12 (SEQ ID NO: 80), and G1_AI_13 (SEQ ID NO: 81). Fig. 2L provides data for the proglycinin variants G1_AI_14 (SEQ ID NO: 82), G1_AI_15 (SEQ ID NO: 83), G1_AI_16 (SEQ ID NO: 84), G1_AI_17 (SEQ ID NO: 85), and G1_AI_18 (SEQ ID NO: 86). Lane descriptions: kD, protein molecular mass markers, with sizes given in kD on the left side of the gel; I, insoluble protein; S, soluble protein. Asterisks denote proteins that were purified and characterized. For each sequence, the SEQ ID NO provided is the full-length preproglycinin sequence, which includes the signal peptide at positions 1-19 and lacks the six-histidine tag. “G1_” is omitted from the front of the Set 2 variant names to facilitate labeling.
[0013] Figs. 3A-3C provide graphs of experimental data depicting the stability of six-histidine-tagged high-methionine proglycinin variants against unfolding by guanidine hydrochloride. Fig. 3A provides data for wild-type proglycinin WT (SEQ ID NO: 237), G1_ALT4 (SEQ ID NO: 21), G1_ALT4_5 (SEQ ID NO: 26), G1_ALT4_29 (SEQ ID NO: 50), G1_ALT4_39 (SEQ ID NO: 60), G1_ALT4_40 (SEQ ID NO: 61), and G1_ALT4_47 (SEQ ID NO: 68). Fig. 3B provides data for WT (SEQ ID NO: 4), G1_AI_4 (SEQ ID NO: 72), G1_AI_5 (SEQ ID NO: 73), G1_AI_7 (SEQ ID NO: 75), and G1_AI_8 (SEQ ID NO: 76). Fig. 3C provides data for wild-type proglycinin WT (SEQ ID NO: 237), G1_AI_9 (SEQ ID NO: 77), G1_AI_11 (SEQ ID NO: 79), G1_AI_12 (SEQ ID NO: 80), G1_AI_14 (SEQ ID NO: 82), and G1_AI_17 (SEQ ID NO: 85). Unfolding was monitored by the change in fluorescence intensity at 323 nm, with an excitation wavelength of 280 nm. For each sequence, the SEQ ID NO provided is the full-length preproglycinin sequence, which includes the signal peptide at positions 1-19 and lacks the six-histidine tag.
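This section does not describe how the guanidine hydrochloride unfolding curves were analyzed. One simple, commonly used summary of such a titration is the denaturant concentration at which half of the protein is unfolded. The Python sketch below assumes that the first and last points of a titration represent the fully native and fully unfolded states, rescales the fluorescence signal to a fraction unfolded, and interpolates linearly to the 0.5 crossing. A two-state fit with sloping baselines would be more rigorous. The function names are hypothetical.

```python
def fraction_unfolded(signal: list[float]) -> list[float]:
    """Rescale the signal so the first point is 0 (native) and the last is 1 (unfolded).

    Works whether the fluorescence rises or falls on unfolding.
    """
    native, unfolded = signal[0], signal[-1]
    if unfolded == native:
        raise ValueError("no signal change between the first and last points")
    return [(s - native) / (unfolded - native) for s in signal]


def midpoint(concentrations: list[float], signal: list[float]) -> float:
    """Denaturant concentration at which the fraction unfolded first reaches 0.5."""
    f = fraction_unfolded(signal)
    for i in range(1, len(f)):
        if f[i] >= 0.5 > f[i - 1]:
            c0, c1 = concentrations[i - 1], concentrations[i]
            # Linear interpolation between the two bracketing titration points.
            return c0 + (0.5 - f[i - 1]) / (f[i] - f[i - 1]) * (c1 - c0)
    # Not expected: the rescaled curve starts at 0 and ends at 1, so it must cross 0.5.
    raise RuntimeError("no 0.5 crossing found")
```

Under these assumptions, a variant whose midpoint lies at a lower denaturant concentration than the wild type is less resistant to unfolding.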
[0014] Figs. 4A-4P provide experimental data showing the stability of six-histidine-tagged high-methionine proglycinin variants against digestion by trypsin. Digests were performed at 25°C for the indicated times with a 1:500 (wt:wt) ratio of trypsin to proglycinin variant. The 0-minute control lanes contained the proglycinin variant without trypsin. Fig. 4A provides data for the wild-type proglycinin (SEQ ID NO: 237). Fig. 4B provides data for the high-methionine proglycinin variant GY1_ALT4 (SEQ ID NO: 21). Fig. 4C provides data for the high-methionine proglycinin variant GY1_ALT4_5 (SEQ ID NO: 26). Fig. 4D provides data for the high-methionine proglycinin variant GY1_ALT4_29 (SEQ ID NO: 50). Fig. 4E provides data for the high-methionine proglycinin variant GY1_ALT4_39 (SEQ ID NO: 60). Fig. 4F provides data for the high-methionine proglycinin variant GY1_ALT4_40 (SEQ ID NO: 61). Fig. 4G provides data for the high-methionine proglycinin variant GY1_ALT4_47 (SEQ ID NO: 68). Fig. 4H provides data for the high-methionine proglycinin variant GY1_AI_4 (SEQ ID NO: 72). Fig. 4I provides data for the high-methionine proglycinin variant GY1_AI_5 (SEQ ID NO: 73). Fig. 4J provides data for the high-methionine proglycinin variant GY1_AI_7 (SEQ ID NO: 75). Fig. 4K provides data for the high-methionine proglycinin variant GY1_AI_8 (SEQ ID NO: 76). Fig. 4L provides data for the high-methionine proglycinin variant GY1_AI_9 (SEQ ID NO: 77). Fig. 4M provides data for the high-methionine proglycinin variant GY1_AI_11 (SEQ ID NO: 79). Fig. 4N provides data for the high-methionine proglycinin variant GY1_AI_12 (SEQ ID NO: 80). Fig. 4O provides data for the high-methionine proglycinin variant GY1_AI_14 (SEQ ID NO: 82). Fig. 4P provides data for the high-methionine proglycinin variant GY1_AI_17 (SEQ ID NO: 85). For each sequence, the SEQ ID NO provided is the full-length preproglycinin sequence, which includes the signal peptide at positions 1-19 and lacks the six-histidine tag.
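The 1:500 (wt:wt) trypsin:proglycinin ratio fixes the amount of protease from the amount of substrate. As a small arithmetic sketch, 100 µg of proglycinin variant calls for 0.2 µg of trypsin. The stock concentration parameter and the function names below are hypothetical.

```python
def trypsin_mass(substrate_mass: float, ratio: float = 500.0) -> float:
    """Trypsin mass for a 1:ratio (wt:wt) trypsin:substrate digest, in the input's units."""
    return substrate_mass / ratio


def trypsin_volume_ul(substrate_ug: float, stock_ug_per_ul: float, ratio: float = 500.0) -> float:
    """Volume (uL) of a trypsin stock of the given (hypothetical) concentration delivering that mass."""
    return trypsin_mass(substrate_ug, ratio) / stock_ug_per_ul


print(trypsin_mass(100.0))  # 0.2 (ug of trypsin for 100 ug of proglycinin)
```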
[0015] Figs. 5A-5B provide a sequence alignment of the soybean glycinin family members G1 (SEQ ID NO: 237), G2 (SEQ ID NO: 8), G3 (SEQ ID NO: 157), G4 (SEQ ID NO: 158), G5 (SEQ ID NO: 159), and G7 (SEQ ID NO: 160). Full-length preproglycinin sequences, including signal peptides, are shown in the alignment.
[0016] Fig. 6 provides the predicted structure of the glycinin trimer. The wild-type proglycinin sequence of SEQ ID NO: 4 was used to generate each monomer. The structures were predicted using AlphaFold2.
[0017] Fig. 7A provides the predicted structure of the wild-type proglycinin monomer (SEQ ID NO: 4). Fig. 7B provides the predicted structure of the proglycinin variant GY1_ALT4 monomer without the signal peptide (positions 20-495 of SEQ ID NO: 21). Fig. 7C provides the predicted structure of the proglycinin variant GY1_AI_14 monomer without the signal peptide (positions 20-495 of SEQ ID NO: 82). The structures were predicted using AlphaFold2, and the methionine residues in each structure are represented as sticks.
[0018] Figs. 8A-8D provide an analysis, by non-reducing SDS-PAGE and anti-glycinin immunoblots, of GY1_ALT4 protein in seed from transgenic events and in gene-edited seed. Fig. 8A depicts a non-reducing SDS-PAGE of GY1_ALT4 (SEQ ID NO: 21) in transgenic seed. Fig. 8B depicts a non-reducing SDS-PAGE of GY1_ALT4 (SEQ ID NO: 21) in gene-edited seed. Fig. 8C depicts anti-glycinin immunoblots of GY1_ALT4 (SEQ ID NO: 21) in transgenic seed. Fig. 8D depicts anti-glycinin immunoblots of GY1_ALT4 (SEQ ID NO: 21) in gene-edited seed. The arrows denote: 1, wild-type glycinin family with acidic and basic chains disulfide-bonded together; 2, Gy1_ALT4 acidic and basic chains disulfide-bonded together; 3, wild-type glycinins, acidic chain only; 4, Gy1_ALT4 acidic chain only; 5, presumed proteolytic fragment of glycinin acidic chain. The letter “e” followed by a number is the event code listed in Table 17A.
[0019] Figs. 9A-9F provide an analysis of protein in transgenic T2 seed expressing the high-Met Gy1 variants GY1_ALT4_47 (SEQ ID NO: 68), GY1_AI_4 (SEQ ID NO: 72), GY1_AI_14 (SEQ ID NO: 82), GY1_ALT4_40 (SEQ ID NO: 61), GY1_ALT4_39 (SEQ ID NO: 60), GY1_AI_5 (SEQ ID NO: 73), GY1_AI_7 (SEQ ID NO: 75), and GY1_AI_9 (SEQ ID NO: 77). Fig. 9A depicts a stained gel of T2 seed expressing the high-Met Gy1 variants GY1_ALT4_47 (SEQ ID NO: 68), GY1_AI_4 (SEQ ID NO: 72), and GY1_AI_14 (SEQ ID NO: 82) in a modified CGS background. Fig. 9B depicts an anti-glycinin immunoblot of T2 seed expressing the high-Met Gy1 variants GY1_ALT4_47 (SEQ ID NO: 68), GY1_AI_4 (SEQ ID NO: 72), and GY1_AI_14 (SEQ ID NO: 82) in a modified CGS background. Fig. 9C depicts a stained gel of T2 seed expressing the high-Met Gy1 variants GY1_ALT4_47 (SEQ ID NO: 68), GY1_ALT4_40 (SEQ ID NO: 61), and GY1_ALT4_39 (SEQ ID NO: 60) in a wild-type background. Fig. 9D depicts an anti-glycinin immunoblot of T2 seed expressing the high-Met Gy1 variants GY1_ALT4_47 (SEQ ID NO: 68), GY1_ALT4_40 (SEQ ID NO: 61), and GY1_ALT4_39 (SEQ ID NO: 60) in a wild-type background. Fig. 9E depicts a stained gel of T2 seed expressing the high-Met Gy1 variants GY1_AI_4 (SEQ ID NO: 72), GY1_AI_5 (SEQ ID NO: 73), GY1_AI_7 (SEQ ID NO: 75), GY1_AI_14 (SEQ ID NO: 82), and GY1_AI_9 (SEQ ID NO: 77) in a wild-type background. Fig. 9F depicts an anti-glycinin immunoblot of T2 seed expressing the high-Met Gy1 variants GY1_AI_4 (SEQ ID NO: 72), GY1_AI_5 (SEQ ID NO: 73), GY1_AI_7 (SEQ ID NO: 75), GY1_AI_14 (SEQ ID NO: 82), and GY1_AI_9 (SEQ ID NO: 77) in a wild-type background. The letter “e” followed by a number is the event code listed in Table 17A.
[0020] Figs. 10A and 10B provide an analysis of GY1_AI_4 (SEQ ID NO: 72) gene replacement at the GY1 locus in a wild-type background (Gy1_AI_4) and in a CGS background (Gy1_AI_4 + CGS). Fig. 10A depicts a non-reducing SDS-PAGE. Fig. 10B depicts anti-glycinin immunoblots. The arrows denote: 1, wild-type glycinin family with acidic and basic chains disulfide-bonded together; 2, Gy1_AI_4 acidic and basic chains disulfide-bonded together; 3, wild-type glycinins, acidic chain only; 4, Gy1_AI_4 acidic chain only; 5, uncertain, but possibly a proteolytic fragment of glycinin acidic chain.
[0021] Fig. 11 provides a non-reducing SDS-PAGE analysis of GY1_ALT4_47 (SEQ ID NO: 68) gene replacement at the GY1 locus. The letter “e” followed by a number is the event code listed in Table 10.
Table 1: Sequence Listing Description
DETAILED DESCRIPTION
I. Compositions
A. Glycinin Polynucleotides and Polypeptides
[0022] Provided are glycinin polynucleotides encoding modified glycinin polypeptides having at least, or at least about, 50%, 55%, 60%, 65%, 70%, 75%, 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% identity with SEQ ID NO: 4, 8, 18-86, 157-160, and 237 and comprising a modification described herein. Also provided are glycinin polynucleotides having at least, or at least about, 50%, 55%, 60%, 65%, 70%, 75%, 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% identity with SEQ ID NO: 3, 7, 87-156, 161-164, and 236 and encoding a modification described herein.
[0023] Soybean glycinins are a family of abundant seed storage proteins. Most of the total glycinin protein is encoded by five genes named Gy1, Gy2, Gy3, Gy4, and Gy5 (Nielsen et al., 1989, Plant Cell 1:313-328). The corresponding proteins encoded by these genes were named GY1 (also referred to herein as G1), GY2 (also referred to herein as G2), GY3 (also referred to herein as G3), GY4 (also referred to herein as G4), and GY5 (also referred to herein as G5). Subsequent studies identified an additional, weakly expressed glycinin gene, Gy7, and two glycinin pseudogenes, Gy6 and Gy8 (Beilinson et al., 2002, Theor Appl Genet 104:1132-1140; Li and Zhang, 2011, Heredity 106:633-641). In soybeans, glycinin polypeptides are initially translated on cytosolic ribosomes as the precursor polypeptide preproglycinin (e.g., SEQ ID NO: 237 for wild-type glycinin 1). Upon entry into the endoplasmic reticulum, the signal peptide (e.g., positions 1-19 of SEQ ID NO: 237 for wild-type glycinin 1) is removed, yielding proglycinin polypeptides (e.g., SEQ ID NO: 4 for proglycinin 1) that form trimers. The proglycinin trimers then move to the protein storage vacuole, where a specific protease, the vacuolar processing enzyme, cleaves each proglycinin polypeptide into acidic and basic polypeptides. This cleavage facilitates the interaction of two proglycinin trimers to form glycinin hexamers. Proglycinin trimers and glycinin hexamers can exist either as homo-oligomers or as hetero-oligomers that include multiple glycinin family members. As used herein, “GY1” or “G1” is included in the names of preproglycinin 1, proglycinin 1, or glycinin 1 polypeptides or of the corresponding CDS that encode them, and “Gy1” or “G1” is included in the names of the corresponding glycinin 1 genomic DNA sequences, including introns.
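Because the signal peptide occupies positions 1-19 of the preproglycinin, positions numbered on a full-length preproglycinin sequence (such as SEQ ID NO: 237 or the variant sequences) and positions numbered on proglycinin 1 (SEQ ID NO: 4) differ by a fixed offset. This assumes that SEQ ID NO: 4 begins at position 20 of the preproglycinin with no other differences, which is consistent with the "positions 20-495" description of the predicted monomer structures. The hypothetical Python helpers below convert between the two numbering schemes. They apply only to glycinin 1 sequences with a 19-residue signal peptide. They do not model the vacuolar cleavage into acidic and basic chains, because the cleavage site is not given in this section.

```python
from typing import Optional

# Signal peptide occupies positions 1-19 of the preproglycinin (per the text).
SIGNAL_PEPTIDE_LENGTH = 19


def prepro_to_pro(position: int) -> Optional[int]:
    """Map a 1-based preproglycinin position to proglycinin (SEQ ID NO: 4) numbering.

    Returns None for residues inside the signal peptide, which is removed in the
    endoplasmic reticulum and is therefore absent from proglycinin.
    """
    if position < 1:
        raise ValueError("positions are 1-based")
    if position <= SIGNAL_PEPTIDE_LENGTH:
        return None
    return position - SIGNAL_PEPTIDE_LENGTH


def pro_to_prepro(position: int) -> int:
    """Map a 1-based proglycinin position back to preproglycinin numbering."""
    if position < 1:
        raise ValueError("positions are 1-based")
    return position + SIGNAL_PEPTIDE_LENGTH


def remove_signal_peptide(preproglycinin: str) -> str:
    """Drop the first 19 residues (assumes no other processing differences)."""
    return preproglycinin[SIGNAL_PEPTIDE_LENGTH:]


# Example: position 11 of SEQ ID NO: 4 is position 30 of the preproglycinin.
print(pro_to_prepro(11))  # 30
```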
[0024] One aspect of the disclosure provides a polynucleotide encoding a modified glycinin polypeptide, the modified glycinin polypeptide comprising an amino acid sequence having at least 50%, 55%, 60%, 65%, 70%, 75%, 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% sequence identity to SEQ ID NO: 4 and comprising 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 20, 25, 30, 35, 40, 45, 50, 55, or 60 or more and fewer than 100, 90, 80, 70, 60, 55, 50, 45, 44, 43, 42, 41, 40, 39, 38, 37, 36, 35, 34, 33, 32, 31, or 30 modifications selected from the group consisting of a methionine at the position corresponding to position 11 of SEQ ID NO: 4, a methionine at the position corresponding to position 20 of SEQ ID NO: 4, a methionine at the position corresponding to position 21 of SEQ ID NO: 4, a methionine at the position corresponding to position 23 of SEQ ID NO: 4, a methionine at the position corresponding to position 24 of SEQ ID NO: 4, a methionine at the position corresponding to position 26 of SEQ ID NO: 4, a methionine or cysteine at the position corresponding to position 28 of SEQ ID NO: 4, a methionine at the position corresponding to position 33 of SEQ ID NO: 4, a methionine at the position corresponding to position 34 of SEQ ID NO: 4, a methionine at the position corresponding to position 35 of SEQ ID NO: 4, a methionine at the position corresponding to position 40 of SEQ ID NO: 4, a methionine at the position corresponding to position 43 of SEQ ID NO: 4, a methionine at the position corresponding to position 48 of SEQ ID NO: 4, a methionine or cysteine at the position corresponding to position 49 of SEQ ID NO: 4, a methionine at the position corresponding to position 51 of SEQ ID NO: 4, a methionine at the position corresponding to position 52 of SEQ ID NO: 4, a methionine at the position corresponding to position 53 of SEQ ID NO: 4, a methionine at the position corresponding to position 59 of SEQ ID NO: 4,
a methionine at the position corresponding to position 61 of SEQ ID NO: 4, a methionine at the position corresponding to position 64 of SEQ ID NO: 4, a methionine at the position corresponding to position 66 of SEQ ID NO: 4, a methionine at the position corresponding to position 70 of SEQ ID NO: 4, a methionine at the position corresponding to position 71 of SEQ ID NO: 4, a methionine at the position corresponding to position 75 of SEQ ID NO: 4, a methionine at the position corresponding to position 80 of SEQ ID NO: 4, a methionine at the position corresponding to position 81 of SEQ ID NO: 4, a methionine at the position corresponding to position 84 of SEQ ID NO: 4, a methionine at the position corresponding to position 85 of SEQ ID NO: 4, a methionine at the position corresponding to position 95 of SEQ ID NO: 4, a methionine at the position corresponding to position 97 of SEQ ID NO: 4, a methionine at the position corresponding to position 98 of SEQ ID NO: 4, a methionine at the position corresponding to position 99 of SEQ ID NO: 4, a methionine at the position corresponding to position 100 of SEQ ID NO: 4, a methionine at the position corresponding to position 103 of SEQ ID NO: 4, a methionine at the position corresponding to position 111 of SEQ ID NO: 4, a methionine at the position corresponding to position 112 of SEQ ID NO: 4, a methionine at the position corresponding to position 113 of SEQ ID NO: 4, a methionine at the position corresponding to position 115 of SEQ ID NO: 4, a methionine at the position corresponding to position 117 of SEQ ID NO: 4, a methionine at the position corresponding to position 121 of SEQ ID NO: 4, a methionine at the position corresponding to position 129 of SEQ ID NO: 4, a methionine at the position corresponding to position 131 of SEQ ID NO: 4, a methionine at the position corresponding to position 135 of SEQ ID NO: 4, a methionine or cysteine at the position corresponding to position 143 of SEQ ID NO: 4, a methionine at the 
position corresponding to position 145 of SEQ ID NO: 4, a methionine at the position corresponding to position 147 of SEQ ID NO: 4, a methionine at the position corresponding to position 149 of SEQ ID NO: 4, a methionine at the position corresponding to position 150 of SEQ ID NO: 4, a methionine at the position corresponding to position 151 of SEQ ID NO: 4, a methionine at the position corresponding to position 153 of SEQ ID NO: 4, a methionine at the position corresponding to position 154 of SEQ ID NO: 4, a methionine at the position corresponding to position 155 of SEQ ID NO: 4, a methionine at the position corresponding to position 156 of SEQ ID NO: 4, a methionine at the position corresponding to position 158 of SEQ ID NO: 4, a methionine at the position corresponding to position 160 of SEQ ID NO: 4, a methionine at the position corresponding to position 162 of SEQ ID NO: 4, a methionine at the position corresponding to position 164 of SEQ ID NO: 4, a methionine at the position corresponding to position 169 of SEQ ID NO: 4, a methionine at the position corresponding to position 171 of SEQ ID NO: 4, a methionine at the position corresponding to position 172 of SEQ ID NO: 4, a methionine at the position corresponding to position 173 of SEQ ID NO: 4, a methionine at the position corresponding to position 174 of SEQ ID NO: 4, a methionine at the position corresponding to position 176 of SEQ ID NO: 4, a methionine at the position corresponding to position 177 of SEQ ID NO: 4, a methionine at the position corresponding to position 180 of SEQ ID NO: 4, a methionine at the position corresponding to position 184 of SEQ ID NO: 4, a methionine at the position corresponding to position 192 of SEQ ID NO: 4, a methionine at the position corresponding to position 196 of SEQ ID NO: 4, a methionine at the position corresponding to position 209 of SEQ ID NO: 4, a methionine at the position corresponding to position 211 of SEQ ID NO: 4, a methionine at the position corresponding 
to position 215 of SEQ ID NO: 4, a methionine at the position corresponding to position 223 of SEQ ID NO: 4, a methionine at the position corresponding to position 225 of SEQ ID NO: 4, a methionine at the position corresponding to position 228 of SEQ ID NO: 4, a methionine at the position corresponding to position 229 of SEQ ID NO: 4, a methionine at the position corresponding to position 230 of SEQ ID NO: 4, a methionine at the position corresponding to position 235 of SEQ ID NO: 4, a methionine at the position corresponding to position 246 of SEQ ID NO: 4, a methionine at the position corresponding to position 247 of SEQ ID NO: 4, a methionine at the position corresponding to position 249 of SEQ ID NO: 4, a methionine at the position corresponding to position 253 of SEQ ID NO: 4, a methionine at the position corresponding to position 255 of SEQ ID NO: 4, a methionine at the position corresponding to position 257 of SEQ ID NO: 4, a methionine at the position corresponding to position 258 of SEQ ID NO: 4, a methionine at the position corresponding to position 269 of SEQ ID NO: 4, a methionine at the position corresponding to position 270 of SEQ ID NO: 4, a methionine at the position corresponding to position 277 of SEQ ID NO: 4, a methionine at the position corresponding to position 279 of SEQ ID NO: 4, a methionine at the position corresponding to position 302 of SEQ ID NO: 4, a methionine at the position corresponding to position 303 of SEQ ID NO: 4, a methionine at the position corresponding to position 308 of SEQ ID NO: 4, a methionine at the position corresponding to position 310 of SEQ ID NO: 4, a methionine at the position corresponding to position 313 of SEQ ID NO: 4, a methionine at the position corresponding to position 316 of SEQ ID NO: 4, a methionine at the position corresponding to position 321 of SEQ ID NO: 4, a methionine at the position corresponding to position 322 of SEQ ID NO: 4, a methionine at the position corresponding to position 323 of SEQ 
ID NO: 4, a methionine at the position corresponding to position 324 of SEQ ID NO: 4, a methionine at the position corresponding to position 326 of SEQ ID NO: 4, a methionine at the position corresponding to position 328 of SEQ ID NO: 4, a methionine at the position corresponding to position 335 of SEQ ID NO: 4, a methionine at the position corresponding to position 338 of SEQ ID NO: 4, a methionine or cysteine at the position corresponding to position 339 of SEQ ID NO: 4, a methionine at the position corresponding to position 341 of SEQ ID NO: 4, a methionine or cysteine at the position corresponding to position 343 of SEQ ID NO: 4, a methionine at the position corresponding to position 349 of SEQ ID NO: 4, a methionine or valine at the position corresponding to position 351 of SEQ ID NO: 4, a methionine at the position corresponding to position 354 of SEQ ID NO: 4, a methionine at the position corresponding to position 356 of SEQ ID NO: 4, a methionine at the position corresponding to position 357 of SEQ ID NO: 4, a methionine at the position corresponding to position 359 of SEQ ID NO: 4, a methionine at the position corresponding to position 361 of SEQ ID NO: 4, a methionine at the position corresponding to position 363 of SEQ ID NO: 4, a methionine at the position corresponding to position 364 of SEQ ID NO: 4, a methionine or cysteine at the position corresponding to position 365 of SEQ ID NO: 4, a methionine at the position corresponding to position 371 of SEQ ID NO: 4, a methionine at the position corresponding to position 372 of SEQ ID NO: 4, a methionine at the position corresponding to position 373 of SEQ ID NO: 4, a methionine at the position corresponding to position 376 of SEQ ID NO: 4, a methionine at the position corresponding to position 382 of SEQ ID NO: 4, a methionine at the position corresponding to position 383 of SEQ ID NO: 4, a methionine at the position corresponding to position 385 of SEQ ID NO: 4, a methionine at the position corresponding 
to position 391 of SEQ ID NO: 4, a methionine at the position corresponding to position 393 of SEQ ID NO: 4, a methionine at the position corresponding to position 396 of SEQ ID NO: 4, a methionine at the position corresponding to position 399 of SEQ ID NO: 4, a methionine at the position corresponding to position 401 of SEQ ID NO: 4, a methionine at the position corresponding to position 402 of SEQ ID NO: 4, a methionine at the position corresponding to position 403 of SEQ ID NO: 4, a methionine or cysteine at the position corresponding to position 405 of SEQ ID NO: 4, a methionine at the position corresponding to position 406 of SEQ ID NO: 4, a methionine at the position corresponding to position 409 of SEQ ID NO: 4, a methionine at the position corresponding to position 410 of SEQ ID NO: 4, a methionine at the position corresponding to position 411 of SEQ ID NO: 4, a methionine at the position corresponding to position 412 of SEQ ID NO: 4, a methionine at the position corresponding to position 414 of SEQ ID NO: 4, a methionine at the position corresponding to position 416 of SEQ ID NO: 4, a methionine at the position corresponding to position 420 of SEQ ID NO: 4, a methionine or glycine at the position corresponding to position 421 of SEQ ID NO: 4, a methionine at the position corresponding to position 423 of SEQ ID NO: 4, a methionine at the position corresponding to position 425 of SEQ ID NO: 4, a methionine at the position corresponding to position 430 of SEQ ID NO: 4, a methionine at the position corresponding to position 434 of SEQ ID NO: 4, a methionine at the position corresponding to position 436 of SEQ ID NO: 4, a methionine at the position corresponding to position 442 of SEQ ID NO: 4, a methionine at the position corresponding to position 443 of SEQ ID NO: 4, a methionine at the position corresponding to position 454 of SEQ ID NO: 4, a methionine at the position corresponding to position 456 of SEQ ID NO: 4, a methionine at the position corresponding 
to position 461 of SEQ ID NO: 4, a methionine at the position corresponding to position 462 of SEQ ID NO: 4, a methionine at the position corresponding to position 463 of SEQ ID NO: 4, a methionine at the position corresponding to position 464 of SEQ ID NO: 4, and a methionine at the position corresponding to position 468 of SEQ ID NO: 4. In certain embodiments, the modified glycinin polypeptide comprises 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 20, 25, 30, 35, 40, 45, 50, 55, or 60 or more and fewer than 100, 90, 80, 70, 60, 55, 50, 45, 44, 43, 42, 41, 40, 39, 38, 37, 36, 35, 34, 33, 32, 31, or 30 introduced methionine residues as compared to the glycinin polypeptide of SEQ ID NO: 4. In certain embodiments, at least one (e.g., 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17) of the 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 20, 25, 30, 35, 40, 45, 50, 55, or 60 or more modifications comprises a substitution of a methionine at a position corresponding to SEQ ID NO: 4 selected from the group consisting of position 51, 53, 61, 66, 70, 71, 80, 145, 162, 164, 172, 209, 313, 335, 341, 343, 351, 354, 356, 361, 412, 414, 416, 436, 442, 462, and 468.
[0025] Also provided is a polynucleotide encoding a modified glycinin polypeptide, the modified glycinin polypeptide comprising an amino acid sequence having at least 50%, 55%, 60%, 65%, 70%, 75%, 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% sequence identity to SEQ ID NO: 4 and comprising at least one (e.g., 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17) modification selected from the group consisting of a methionine at the position corresponding to position 51 of SEQ ID NO: 4, a methionine at the position corresponding to position 61 of SEQ ID NO: 4, a methionine at the position corresponding to position 66 of SEQ ID NO: 4, a methionine at the position corresponding to position 70 of SEQ ID NO: 4, a methionine at the position corresponding to position 145 of SEQ ID NO: 4, a methionine at the position corresponding to position 162 of SEQ ID NO: 4, a methionine at the position corresponding to position 164 of SEQ ID NO: 4, a methionine at the position corresponding to position 172 of SEQ ID NO: 4, a methionine at the position corresponding to position 313 of SEQ ID NO: 4, a methionine at the position corresponding to position 341 of SEQ ID NO: 4, a methionine at the position corresponding to position 354 of SEQ ID NO: 4, a methionine at the position corresponding to position 356 of SEQ ID NO: 4, a methionine at the position corresponding to position 361 of SEQ ID NO: 4, a methionine at the position corresponding to position 412 of SEQ ID NO: 4, a methionine at the position corresponding to position 414 of SEQ ID NO: 4, a methionine at the position corresponding to position 416 of SEQ ID NO: 4, and a methionine at the position corresponding to position 462 of SEQ ID NO: 4. 
In certain embodiments, the modified glycinin polypeptide further comprises at least one (e.g., 1, 2, 3, 4, 5, 6, 7, 8, 9, 10) additional modification comprising a substitution of a methionine at a position corresponding to SEQ ID NO: 4 selected from the group consisting of position 53, 71, 80, 209, 335, 343, 351, 436, 442, and 468.
[0026] Further provided is a polynucleotide encoding a modified glycinin polypeptide, the modified glycinin polypeptide comprising an amino acid sequence having at least 50%, 55%, 60%, 65%, 70%, 75%, 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% sequence identity to SEQ ID NO: 4 and comprising at least one modification selected from the group consisting of a methionine at the position corresponding to position 53 of SEQ ID NO: 4, a methionine at the position corresponding to position 71 of SEQ ID NO: 4, a methionine at the position corresponding to position 80 of SEQ ID NO: 4, a methionine at the position corresponding to position 209 of SEQ ID NO: 4, a methionine at the position corresponding to position 335 of SEQ ID NO: 4, a methionine or cysteine at the position corresponding to position 343 of SEQ ID NO: 4, a methionine or valine at the position corresponding to position 351 of SEQ ID NO: 4, a methionine at the position corresponding to position 436 of SEQ ID NO: 4, a methionine at the position corresponding to position 442 of SEQ ID NO: 4, and a methionine at the position corresponding to position 468 of SEQ ID NO: 4.
[0027] As used herein "percent (%) sequence identity" with respect to a reference sequence (subject) is determined as the percentage of amino acid residues or nucleotides in a candidate sequence (query) that are identical with the respective amino acid residues or nucleotides in the reference sequence, after aligning the sequences and introducing gaps, if necessary, to achieve the maximum percent sequence identity, and not considering any amino acid conservative substitutions as part of the sequence identity. Alignment for purposes of determining percent sequence identity can be achieved in various ways that are within the skill in the art, for instance, using publicly available computer software such as BLAST, BLAST-2. Those skilled in the art can determine appropriate parameters for aligning sequences, including any algorithms needed to achieve maximal alignment over the full length of the sequences being compared. The percent identity between the two sequences is a function of the number of identical positions shared by the sequences (e.g., percent identity of query sequence = number of identical positions between query and subject sequences/total number of positions of query sequence * 100).
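The percent identity formula in [0027] can be illustrated with a minimal Python sketch. It assumes the pairwise alignment has already been produced (for example by BLAST) and is supplied as two equal-length gapped rows, with "-" marking a gap, and that the alignment spans the full query. The function name and the toy sequences are hypothetical. As in the formula above, conservative substitutions are not counted and the denominator is the number of residues in the query.

```python
def percent_identity(query_aln: str, subject_aln: str) -> float:
    """Percent identity of a query against a subject from one gapped alignment.

    identical aligned positions / total residues in the query * 100.
    Assumes both rows come from the same alignment and that it spans the
    whole query; '-' marks a gap and never counts as identical.
    """
    if len(query_aln) != len(subject_aln):
        raise ValueError("aligned rows must have equal length")
    identical = sum(1 for q, s in zip(query_aln, subject_aln) if q == s and q != "-")
    query_length = sum(1 for q in query_aln if q != "-")
    return 100.0 * identical / query_length


# Toy example (not real glycinin sequences): one gap in the query and one
# L->M substitution, so 7 of the 8 query residues are identical.
query = "MAK-LLSEF"
subject = "MAKQMLSEF"
print(percent_identity(query, subject))  # 87.5
```

Dividing by the query length (rather than the alignment length) follows the wording of [0027]; other tools normalize differently, so values are only comparable when the same convention is used.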
[0028] Unless otherwise stated, sequence identity/similarity values provided herein refer to the value obtained using the BLAST 2.0 suite of programs using default parameters (Altschul, et al., (1997) Nucleic Acids Res. 25:3389-402).
[0029] In certain embodiments, the modified glycinin polypeptides described herein comprise an amino acid sequence that is at least 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% identical to any one of SEQ ID NOs: 18-86. In certain embodiments, the modified glycinin polypeptides described herein comprise an amino acid sequence that is at least 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% identical to amino acid positions 20 to 495 of any one of SEQ ID NOs: 18-86. In certain embodiments, the modified glycinin polypeptides described herein comprise an amino acid sequence that is at least 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% identical to any one of SEQ ID NOs: 18-86 or positions 20 to 495 of any one of SEQ ID NOs: 18-86 and comprises a methionine at 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 20, 25, 30, 35, 40, 45, 50, 55, or 60 or more and fewer than 100, 90, 80, 70, 60, 55, 50, 45, 44, 43, 42, 41, 40, 39, 38, 37, 36, 35, 34, 33, 32, 31, or 30 positions corresponding to SEQ ID NO: 4 described herein. In certain embodiments, the modified glycinin polypeptides described herein comprise an amino acid sequence that is at least 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% identical to any one of SEQ ID NOs: 18-86 or positions 20 to 495 of any one of SEQ ID NOs: 18-86 and comprises a methionine at 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15 or more positions corresponding to amino acid positions 51, 61, 66, 70, 145, 162, 164, 172, 313, 341, 354, 356, 361, 412, 414, 416, and 462 of SEQ ID NO: 4.
In certain embodiments, the modified glycinin polypeptides described herein comprise an amino acid sequence that is at least 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% identical to any one of SEQ ID NOs: 18-86 or positions 20 to 495 of any one of SEQ ID NOs: 18-86 and comprises a methionine at the positions corresponding to amino acid positions 51, 61, 66, 70, 145, 162, 164, 172, 313, 341, 354, 356, 361, 412, 414, 416, and 462 of SEQ ID NO: 4.
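Phrases such as "a methionine at the position corresponding to position 51 of SEQ ID NO: 4" presuppose a residue-to-residue mapping taken from an alignment between the variant and the reference. The following Python sketch assumes that alignment is already available as two gapped rows. It maps each 1-based reference position to the aligned variant residue and reports which queried positions carry a methionine in the variant but not in the reference. The function names and the toy sequences are hypothetical; only the tuple of 17 positions is taken from [0029].

```python
from typing import Dict, Iterable, List, Optional

# The 17 positions of SEQ ID NO: 4 listed in [0029].
POSITIONS_0029 = (51, 61, 66, 70, 145, 162, 164, 172, 313,
                  341, 354, 356, 361, 412, 414, 416, 462)


def reference_to_variant(ref_aln: str, var_aln: str) -> Dict[int, Optional[str]]:
    """Map each 1-based reference position to the aligned variant residue.

    A reference residue aligned to a gap maps to None. Columns where the
    reference has a gap (insertions in the variant) have no reference
    position and are skipped.
    """
    mapping: Dict[int, Optional[str]] = {}
    ref_pos = 0
    for r, v in zip(ref_aln, var_aln):
        if r == "-":
            continue  # insertion relative to the reference
        ref_pos += 1
        mapping[ref_pos] = None if v == "-" else v
    return mapping


def introduced_methionines(ref_aln: str, var_aln: str,
                           positions: Iterable[int]) -> List[int]:
    """Queried reference positions where the variant has M and the reference does not."""
    ungapped_ref = ref_aln.replace("-", "")
    mapping = reference_to_variant(ref_aln, var_aln)
    return [pos for pos in sorted(positions)
            if mapping.get(pos) == "M" and ungapped_ref[pos - 1] != "M"]


# Toy alignment (not SEQ ID NO: 4): the variant has an extra G after
# reference position 3 and methionine substitutions at reference positions 5 and 8.
ref = "MKT-ALSEFGK"
var = "MKTGAMSEMGK"
print(introduced_methionines(ref, var, [2, 5, 8, 10]))  # [5, 8]
```

With a real alignment of a variant to SEQ ID NO: 4, `POSITIONS_0029` would be passed as `positions`. In Python, the 1-based inclusive range "positions 20 to 495" of a sequence string `seq` corresponds to the slice `seq[19:495]`.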
[0030] In certain embodiments, the modified glycinin polypeptides described herein comprise a modification described herein, comprise an amino acid sequence that is at least, or at least about, 10%, 20%, 25%, 30%, 35%, 40%, 45%, 50%, 55%, 60%, 65%, 70%, 75%, 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% identical to SEQ ID NO: 4, and have an AlphaFold2 predicted structure with a TM-score of at least 0.40, 0.45, 0.50, 0.55, 0.60, 0.65, 0.70, 0.75, 0.80, 0.85, 0.90, 0.95, or 1 as compared to the AlphaFold2 predicted structure of SEQ ID NO: 4 or of amino acid positions 20 to 495 of any one of SEQ ID NOs: 18-86. Representative predicted structures of various modified glycinin polypeptides for use in the compositions and methods described herein, generated using AlphaFold2, can be found in Figs. 6, 7A, 7B, and 7C.
[0031] AlphaFold is a computational method that can predict protein structures with atomic accuracy even in cases in which no similar structure is known. The AlphaFold network directly predicts the 3D coordinates of all heavy atoms for a given protein using the primary amino acid sequence and aligned sequences of homologues as inputs. The AlphaFold methods are scalable to very long proteins with accurate domains and domain packing, and the model provides precise, per-residue estimates of its reliability that enable confident use of its structure predictions (Jumper, J., Evans, R., Pritzel, A. et al., Nature 596, 583-589 (2021)). The structures provided herein were predicted using AlphaFold2 version 2.3.1 (accessible on the internet at github.com/google-deepmind/alphafold) in the monomer mode with relaxed model prediction, and “uniref90” release-2022_05 was used as the reference database (accessible on the internet at ftp.uniprot.org/pub/databases/uniprot/previous_releases/release-2022_05/).
[0032] As used herein, a “template modeling score,” “TM-score,” “structural similarity score,” or the like refers to a measure of structural similarity between two protein tertiary structures. TM-scores range from 0 (no structural identity) to 1 (perfect structural identity) and can be computed using multiple publicly available approaches, including TM-align (zhanggroup.org/TM-align/) or Foldseek (search.foldseek.com/search), accessible using the prefix “www” on the internet. Those skilled in the art can determine appropriate parameters for optimally aligning structures.
[0033] Unless otherwise stated, TM-scores provided herein refer to the value obtained using the TM-align program using default parameters to compare protein structures predicted using AlphaFold2 version 2.3.1 in the monomer mode with relaxed model prediction and the “uniref90” reference database.
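To make the TM-score in [0032] concrete, the Python sketch below evaluates the standard TM-score expression for one fixed superposition: the sum over aligned residue pairs of 1 / (1 + (d_i / d0)^2), divided by the length of the reference structure, where d0 = 1.24 (L - 15)^(1/3) - 1.8. It does not search for the optimal alignment and superposition the way TM-align does, so it is a sketch of the scoring function only. The 0.5 Å floor on d0 for very short chains and the function names are assumptions of this example.

```python
from typing import Sequence


def tm_d0(target_length: int) -> float:
    """Length-dependent distance scale d0 (angstroms) used by the TM-score.

    d0 = 1.24 * (L - 15)^(1/3) - 1.8. The 0.5 A floor for short chains is
    an assumption of this sketch.
    """
    d0 = 1.24 * max(target_length - 15, 0) ** (1.0 / 3.0) - 1.8
    return max(d0, 0.5)


def tm_score(distances: Sequence[float], target_length: int) -> float:
    """TM-score for a single, already computed superposition.

    distances: distances (angstroms) between aligned C-alpha pairs.
    target_length: length of the reference structure used for normalization.
    Unaligned residues contribute 0, so with at most target_length aligned
    pairs the score lies between 0 and 1.
    """
    d0 = tm_d0(target_length)
    total = sum(1.0 / (1.0 + (d / d0) ** 2) for d in distances)
    return total / target_length


# A perfect superposition of all 100 residues scores 1.0.
print(tm_score([0.0] * 100, 100))  # 1.0
```

Because the score is normalized by the reference length, comparing a variant against the predicted structure of SEQ ID NO: 4 and the reverse comparison can give different values; TM-align reports both normalizations.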
[0034] In certain embodiments, the modified glycinin polypeptides described herein comprise a signal peptide operably linked to the modified glycinin polypeptide. In certain embodiments, the signal peptide is operably linked at the N-terminus of the modified glycinin polypeptide. In certain embodiments, the signal peptide is operably linked at the C-terminus of the modified glycinin polypeptide. In certain embodiments, the modified glycinin polypeptide comprises 2 or more (e.g., 2, 3, 4, 5, 6, 7, 8, 9, or 10 or more) signal peptides. The 2 or more signal peptides may be operably linked to the N-terminus, the C-terminus, or a combination thereof. In certain embodiments, the signal peptide comprises an amino acid sequence that is at least 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% identical to amino acid positions 1-19 of SEQ ID NO: 237. In certain embodiments, the modified glycinin polypeptide comprises a linker sequence between the modified glycinin polypeptide and the signal peptide. The sequence and length of the linker are not particularly limited so long as the signal peptide can direct the modified glycinin polypeptide to the desired location.
[0035] Also provided are polynucleotides encoding modified glycinin polypeptides comprising 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 25, 30, 35, 40, 45, 50, 55, or 60 or more methionine residues and comprising an amino acid sequence that is at least, or at least about, 10%, 20%, 25%, 30%, 35%, 40%, 45%, 50%, 55%, 60%, 65%, 70%, 75%, 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% identical to SEQ ID NO: 4 and comprising an AlphaFold2 predicted structure having a TM-score of at least 0.40, 0.45, 0.50, 0.55, 0.60, 0.65, 0.70, 0.75, 0.80, 0.85, 0.90, 0.95, or 1 as compared to the AlphaFold2 predicted structure of SEQ ID NO: 4 or to the AlphaFold2 predicted structure of amino acid positions 20-495 of any one of SEQ ID NOs: 18-86. The position of the methionine residues is not particularly limited. In certain embodiments, at least 5 (e.g., at least 5, 6, 7, 8, 9, or 10) of the 10 or more methionine residues are introduced within a beta barrel motif of the AlphaFold2 predicted structure of SEQ ID NO: 4. In certain embodiments, at least five methionine residues are introduced at a position corresponding to position 11, 20, 21, 23, 24, 26, 28, 33, 34, 35, 40, 43, 48, 49, 51, 52, 53, 59, 61, 64, 66, 70, 71, 75, 80, 81, 84, 85, 95, 97, 98, 99, 100, 103, 111, 112, 113, 115, 117, 121, 129, 131, 135, 143, 145, 147, 149, 150, 151, 153, 154, 155, 156, 158, 160, 162, 164, 169, 171, 172, 173, 174, 176, 177, 180, 184, 192, 196, 209, 211, 215, 223, 225, 228, 229, 230, 235, 246, 247, 249, 253, 255, 257, 258, 269, 270, 277, 279, 302, 303, 308, 310, 313, 316, 321, 322, 323, 324, 326, 328, 335, 338, 339, 341, 343, 349, 351, 354, 356, 357, 359, 361, 363, 364, 365, 371, 372, 373, 376, 382, 383, 385, 391, 393, 396, 399, 401, 402, 403, 405, 406, 409, 410, 411, 412, 414, 416, 420, 421, 423, 425, 430, 434, 436, 442, 443, 454, 456, 461, 462, 463, 464, and 468 of SEQ ID NO: 4. In certain embodiments, at least five methionine residues are introduced at a position corresponding to position 24, 26, 28, 33, 35, 49, 51, 53, 55, 59, 61, 64, 66, 70, 72, 74, 81, 83, 115, 117, 121, 123, 125, 131, 133, 135, 141, 143, 145, 147, 162, 164, 172, 173, 176, 302, 313, 314, 322, 324, 326, 339, 341, 345, 351, 354, 356, 361, 363, 365, 370, 372, 374, 382, 383, 387, 393, 395, 401, 103, 405, 410, 412, 414, 416, 421, 423, 425, 462, 463, and 464 of SEQ ID NO: 4.
[0036] As used herein “encoding,” “encoded,” or the like, with respect to a specified nucleic acid, means comprising the information for translation into the specified protein. A nucleic acid encoding a protein may comprise non-translated sequences (e.g., introns) within translated regions of the nucleic acid, or may lack such intervening non-translated sequences (e.g., as in cDNA). The information by which a protein is encoded is specified by the use of codons.
Typically, the amino acid sequence is encoded by the nucleic acid using the “universal” genetic code. However, variants of the universal code, such as those present in some plant, animal and fungal mitochondria, the bacterium Mycoplasma capricolum (Yamao, et al., (1985) Proc. Natl. Acad. Sci. USA 82:2306-9) or the ciliate macronucleus, may be used when the nucleic acid is expressed using these organisms.
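To illustrate how a variant genetic code changes the protein read from the same DNA, the Python sketch below translates a sequence with the standard code (NCBI translation table 1) and with the Mycoplasma/Spiroplasma code (NCBI translation table 4), in which TGA encodes tryptophan instead of acting as a stop codon. The 64-letter table strings follow NCBI's TCAG codon ordering; the function and variable names are illustrative only.

```python
BASES = "TCAG"
# Amino acid for each of the 64 codons in NCBI TCAG order (first base varies slowest).
STANDARD = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Table 4 differs from the standard code only at TGA (index 14): Trp instead of stop.
MYCOPLASMA = STANDARD[:14] + "W" + STANDARD[15:]


def translate(dna: str, table: str = STANDARD) -> str:
    """Translate an in-frame DNA coding sequence; '*' marks a stop codon."""
    dna = dna.upper()
    if len(dna) % 3:
        raise ValueError("sequence length must be a multiple of 3")
    protein = []
    for i in range(0, len(dna), 3):
        b1, b2, b3 = (BASES.index(b) for b in dna[i:i + 3])
        protein.append(table[16 * b1 + 4 * b2 + b3])
    return "".join(protein)


print(translate("ATGTGATGG"))              # M*W (TGA is a stop)
print(translate("ATGTGATGG", MYCOPLASMA))  # MWW (TGA read as Trp)
```

In both tables ATG encodes methionine, which is why methionine substitutions of the kind described above are introduced as ATG codons regardless of which of these two codes applies.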
[0037] When the nucleic acid is prepared or altered synthetically, advantage can be taken of known codon preferences of the intended host where the nucleic acid is to be expressed. For example, although nucleic acid sequences of the present invention may be expressed in both monocotyledonous and dicotyledonous plant species, sequences can be modified to account for the specific codon preferences and GC content preferences of monocotyledonous plants or dicotyledonous plants, as these preferences have been shown to differ (Murray, et al., (1989) Nucleic Acids Res. 17:477-98, herein incorporated by reference).
[0038] As used herein, “polynucleotide” includes reference to a deoxyribopolynucleotide, ribopolynucleotide, or analogs thereof that have the essential nature of a natural ribonucleotide in that they hybridize, under stringent hybridization conditions, to substantially the same nucleotide sequence as naturally occurring nucleotides and/or allow translation into the same amino acid(s) as the naturally occurring nucleotide(s). A polynucleotide can be full-length or a subsequence of a structural or regulatory gene. Unless otherwise indicated, the term includes reference to the specified sequence as well as the complementary sequence thereof. Thus, DNAs or RNAs with backbones modified for stability or for other reasons are “polynucleotides” as that term is intended herein. Moreover, DNAs or RNAs comprising unusual bases, such as inosine, or modified bases, such as tritylated bases, to name just two examples, are polynucleotides as the term is used herein. It will be appreciated that a great variety of modifications have been made to DNA and RNA that serve many useful purposes known to those of skill in the art.
The term polynucleotide as it is employed herein embraces such chemically, enzymatically or metabolically modified forms of polynucleotides, as well as the chemical forms of DNA and RNA characteristic of viruses and cells, including inter alia, simple and complex cells.
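Paragraph [0037] notes that a synthetic sequence can be adjusted to a host's codon and GC-content preferences. The Python sketch below computes the two quantities usually inspected when doing so: overall GC content and per-codon counts. It does not perform codon optimization, and the example sequences are arbitrary synonymous encodings chosen for illustration.

```python
from collections import Counter


def gc_content(dna: str) -> float:
    """Fraction of G and C bases in a DNA sequence (0.0 to 1.0)."""
    dna = dna.upper()
    if not dna:
        raise ValueError("empty sequence")
    return (dna.count("G") + dna.count("C")) / len(dna)


def codon_counts(cds: str) -> Counter:
    """Count codons in an in-frame coding sequence (a trailing partial codon is ignored)."""
    cds = cds.upper()
    return Counter(cds[i:i + 3] for i in range(0, len(cds) - len(cds) % 3, 3))


# Two synonymous encodings of Met-Leu-Leu: the second uses CTG/CTC
# instead of TTA/TTG and therefore has a higher GC content.
low_gc = "ATGTTATTG"
high_gc = "ATGCTGCTC"
print(round(gc_content(low_gc), 3), round(gc_content(high_gc), 3))
print(codon_counts(high_gc))
```

A codon-adaptation step would compare such counts against a usage table for the intended host (monocot or dicot) and swap synonymous codons accordingly.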
[0039] The terms “polypeptide,” “peptide” and “protein” are used interchangeably herein to refer to a polymer of amino acid residues. The terms apply to amino acid polymers in which one or more amino acid residues are artificial chemical analogues of corresponding naturally occurring amino acids, as well as to naturally occurring amino acid polymers.
[0040] In certain embodiments, the polynucleotide encoding the modified glycinin polypeptide is operably linked to at least one (e.g., at least 1, 2, 3, 4, 5, 6, 7 or more) regulatory element. In certain embodiments, the regulatory element is a promoter. In certain embodiments, the regulatory element is a heterologous regulatory element (e.g., heterologous promoter). In certain embodiments, the heterologous regulatory element is heterologous to the polynucleotide sequence encoding the polypeptide. In certain embodiments in which the polynucleotide operably linked to the heterologous regulatory element is introduced into a cell, the regulatory element is heterologous to the cell.
[0041] As used herein “operably linked” is intended to mean a functional linkage between two or more elements. For example, an operable linkage between a polynucleotide of interest and a regulatory sequence (e.g., a promoter) is a functional link that allows for expression of the polynucleotide of interest. Operably linked elements may be contiguous or non-contiguous. When used to refer to the joining of two protein coding regions, operably linked means that the coding regions are in the same reading frame.
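The reading-frame requirement for operably linked coding regions in [0041] can be checked mechanically, for example for a signal-peptide coding region fused to a glycinin coding region as described in [0034]. The Python sketch below treats the upstream segment (signal peptide plus any linker) as in frame when it is a whole number of codons with no stop codon, so translation reads through into the downstream segment. The helper name and toy sequences are hypothetical, and only the standard-code stop codons are considered.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard genetic code


def is_in_frame_fusion(upstream_cds: str, downstream_cds: str) -> bool:
    """True if translation of upstream_cds reads through into downstream_cds in frame.

    Requires both segments to be whole numbers of codons and the upstream
    segment (e.g. signal peptide plus linker) to contain no stop codon.
    """
    up, down = upstream_cds.upper(), downstream_cds.upper()
    if len(up) % 3 or len(down) % 3:
        return False
    codons = (up[i:i + 3] for i in range(0, len(up), 3))
    return not any(c in STOP_CODONS for c in codons)


# A 19-residue signal peptide is encoded by 57 nucleotides; these toy
# sequences only illustrate the frame arithmetic.
signal = "ATG" + "GCT" * 18   # 57 nt, no stop codon
mature = "GGTAAATAA"          # Gly-Lys-stop
print(is_in_frame_fusion(signal, mature))        # True
print(is_in_frame_fusion(signal + "A", mature))  # False: 1-nt frameshift
```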
[0042] As used herein “regulatory element” generally refers to a transcriptional regulatory element involved in regulating the transcription of a nucleic acid molecule such as a gene or a target gene. The regulatory element is a nucleic acid and may include a promoter, an enhancer, an intron, expression modulating elements (EMEs), a 5’-untranslated region (5’-UTR, also known as a leader sequence), or a 3’-UTR, or a combination thereof. A regulatory element may act in "cis" or "trans", and generally it acts in "cis", i.e., it activates expression of genes located on the same nucleic acid molecule, e.g., a chromosome, where the regulatory element is located.
[0043] An “enhancer” element is any nucleic acid molecule that increases transcription of a nucleic acid molecule when functionally linked to a promoter, regardless of its relative position. Various enhancers are known in the art, including, for example, introns with gene expression enhancing properties in plants, the ubiquitin intron (i.e., the maize ubiquitin intron 1 (see, for example, NCBI sequence S94464)), the omega enhancer or the omega prime enhancer (Gallie, et al., (1989) Molecular Biology of RNA ed. Cech (Liss, New York) 237-256 and Gallie, et al., (1987) Gene 60:217-25), and the CaMV 35S enhancer (see, e.g., Benfey, et al., (1990) EMBO J. 9:1685-96). The enhancers of US Patent Number 7,803,992 may also be used. The above list of transcriptional enhancers is not meant to be limiting. Any appropriate transcriptional enhancer can be used in the embodiments described herein.
[0044] A “repressor” (also sometimes called herein a silencer) is defined as any nucleic acid molecule that inhibits transcription when functionally linked to a promoter, regardless of relative position. The term "cis-element" generally refers to a transcriptional regulatory element that affects or modulates expression of an operably linked transcribable polynucleotide, where the transcribable polynucleotide is present in the same DNA sequence. A cis-element may function to bind transcription factors, which are trans-acting polypeptides that regulate transcription.
[0045] An “intron” is an intervening sequence in a gene that is transcribed into RNA but is then excised in the process of generating the mature mRNA. The term is also used for the excised RNA sequences. An “exon” is a portion of the sequence of a gene that is transcribed and is found in the mature mRNA derived from the gene but is not necessarily a part of the sequence that encodes the final gene product. The 5' untranslated region (5’UTR) (also known as a translational leader sequence or leader RNA) is the region of an mRNA that is directly upstream from the initiation codon. This region is involved in the regulation of translation of a transcript by differing mechanisms in viruses, prokaryotes and eukaryotes. The “3' non-coding sequences” refer to DNA sequences located downstream of a coding sequence and include polyadenylation recognition sequences and other sequences encoding regulatory signals capable of affecting mRNA processing or gene expression. The polyadenylation signal is usually characterized by affecting the addition of polyadenylic acid tracts to the 3' end of the mRNA precursor.
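The intron, exon, and 5'UTR definitions in [0045] can be illustrated with simple string operations on a toy pre-mRNA written in DNA letters. In the Python sketch below, intron coordinates are supplied by hand (splice-site prediction is out of scope), the sequence is invented, and the 5'UTR is taken as everything upstream of the first ATG, which is a simplification since the translation start is not always the first ATG. The function names are hypothetical.

```python
from typing import List, Tuple


def splice(pre_mrna: str, introns: List[Tuple[int, int]]) -> str:
    """Remove introns given as 1-based inclusive (start, end) coordinates."""
    exons, prev_end = [], 0
    for start, end in sorted(introns):
        exons.append(pre_mrna[prev_end:start - 1])  # exon before this intron
        prev_end = end
    exons.append(pre_mrna[prev_end:])  # final exon
    return "".join(exons)


def five_prime_utr(mrna: str) -> str:
    """Sequence upstream of the first ATG (simplified start-codon choice)."""
    start = mrna.find("ATG")
    return mrna if start < 0 else mrna[:start]


# Toy pre-mRNA: leader CCAC, then ATGGCA, a GT...AG intron at positions 11-18, then AAATAG.
pre = "CCACATGGCA" + "GTAAGTAG" + "AAATAG"
mature = splice(pre, [(11, 18)])
print(mature)                  # CCACATGGCAAAATAG
print(five_prime_utr(mature))  # CCAC
```

After splicing, the coding region reads ATG GCA AAA TAG (Met-Ala-Lys-stop); the intron sequence appears only in the precursor, consistent with the definition of an intron as transcribed but excised.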
[0046] As used herein “promoter” refers to a region of DNA upstream from the start of transcription and involved in recognition and binding of RNA polymerase and other proteins to initiate transcription. A “plant promoter” is a promoter capable of initiating transcription in plant cells. In certain embodiments, the polynucleotides described herein are operably linked to a promoter that drives expression in a plant cell. Any promoter known in the art can be used in the methods of the present disclosure including, but not limited to, constitutive promoters, pathogen-inducible promoters, wound-inducible promoters, tissue-preferred promoters, and chemical-regulated promoters. The choice of promoter may depend on the desired timing and location of expression in the transformed plant as well as other factors, which are known to those of skill in the art. Constitutive promoters include, for example, the core promoter of the Rsyn7 promoter and other constitutive promoters disclosed in WO 99/43838 and U.S. Patent No. 6,072,050; the core CaMV 35S promoter; rice actin; ubiquitin; pEMU; MAS; ALS; and the like. Other constitutive promoters include, for example, those disclosed in U.S. Patent Nos. 5,608,149; 5,608,144; 5,604,121; 5,569,597; 5,466,785; 5,399,680; 5,268,463; 5,608,142; and 6,177,611, which are known in the art and can be contemplated for use in the present disclosure.
[0047] Generally, it can be beneficial to express the gene from an inducible promoter, particularly from a pathogen-inducible promoter. Such promoters include those from pathogenesis-related proteins (PR proteins), which are induced following infection by a pathogen, e.g., PR proteins, SAR proteins, beta-1,3-glucanase, chitinase, etc.
[0048] Of interest are promoters that are expressed locally at or near the site of pathogen infection. Additionally, as pathogens find entry into plants through wounds or insect damage, a wound-inducible promoter can be used in the constructions of the disclosure. Such wound-inducible promoters include the potato proteinase inhibitor (pin II) gene, wun1 and wun2, win1 and win2, systemin, WIP1, MPI gene, and the like.
[0049] Chemical-regulated promoters can be used to modulate the expression of a gene in a plant through the application of an exogenous chemical regulator. Depending upon the objective, the promoter can be a chemical-inducible promoter, where application of the chemical induces gene expression, or a chemical-repressible promoter, where application of the chemical represses gene expression. Chemical-inducible promoters are known in the art and include, but are not limited to, the maize In2-2 promoter, which is activated by benzenesulfonamide herbicide safeners, the maize GST promoter, which is activated by hydrophobic electrophilic compounds that are used as pre-emergent herbicides, and the tobacco PR-1a promoter, which is activated by salicylic acid. Other chemical-regulated promoters of interest include steroid-responsive promoters (e.g., the glucocorticoid-inducible promoter) and tetracycline-inducible and tetracycline-repressible promoters.
[0050] Tissue-preferred promoters can be utilized to target enhanced expression of the target genes or proteins within a particular plant tissue. Such tissue-preferred promoters include, but are not limited to, leaf-preferred promoters, root-preferred promoters, seed-preferred promoters, and stem-preferred promoters. Tissue-preferred promoters include those described in Yamamoto et al. (1997) Plant J. 12(2):255-265; Kawamata et al. (1997) Plant Cell Physiol. 38(7):792-803; Hansen et al. (1997) Mol. Gen. Genet. 254(3):337-343; Russell et al. (1997) Transgenic Res. 6(2):157-168; Rinehart et al. (1996) Plant Physiol. 112(3):1331-1341; Van Camp et al. (1996) Plant Physiol. 112(2):525-535; Canevascini et al. (1996) Plant Physiol. 112(2):513-524; Yamamoto et al. (1994) Plant Cell Physiol. 35(5):773-778; Lam (1994) Results Probl. Cell Differ. 20:181-196; Orozco et al. (1993) Plant Mol. Biol. 23(6):1129-1138; Matsuoka et al. (1993) Proc. Natl. Acad. Sci. USA 90(20):9586-9590; and Guevara-Garcia et al. (1993) Plant J. 4(3):495-505. Such promoters can be modified.
[0051] Leaf-specific promoters are known in the art. See, for example, Yamamoto et al. (1997) Plant J. 12(2):255-265; Kwon et al. (1994) Plant Physiol. 105:357-67; Yamamoto et al. (1994) Plant Cell Physiol. 35(5):773-778; Gotor et al. (1993) Plant J. 3:509-18; Orozco et al. (1993) Plant Mol. Biol. 23(6):1129-1138; and Matsuoka et al. (1993) Proc. Natl. Acad. Sci. USA 90(20):9586-9590.
[0052] "Seed-preferred" promoters include both "seed-specific" promoters (those promoters active during seed development, such as promoters of seed storage proteins) as well as "seed-germinating" promoters (those promoters active during seed germination). Such seed-preferred promoters include, but are not limited to, Cim1 (cytokinin-induced message), cZ19B1 (maize 19 kDa zein), milps (myo-inositol-1-phosphate synthase), and celA (cellulose synthase) (see WO 00/11177, herein incorporated by reference). Gamma-zein is a preferred endosperm-specific promoter. Glob-1 is a preferred embryo-specific promoter. For dicots, seed-specific promoters include, but are not limited to, bean β-phaseolin, napin, β-conglycinin, soybean lectin, cruciferin, and the like. For monocots, seed-specific promoters include, but are not limited to, maize 15 kDa zein, 22 kDa zein, 27 kDa zein, γ-zein, waxy, shrunken 1, shrunken 2, globulin 1, etc. See also WO 00/12733, where seed-preferred promoters from end1 and end2 genes are disclosed; herein incorporated by reference.
[0053] In certain embodiments, the polynucleotides of the present disclosure can involve the use of the intact, native glycinin genes, wherein the expression is driven by a cognate 5' upstream promoter sequence(s).
[0054] Also contemplated are synthetic promoters which include a combination of one or more heterologous regulatory elements.
[0055] In certain embodiments, the glycinin polynucleotides encoding the modified glycinin polypeptides described above are inserted into a recombinant DNA construct. In certain embodiments, the recombinant DNA construct further comprises at least one regulatory element. In certain embodiments, the at least one regulatory element of the recombinant DNA construct comprises a promoter, preferably a heterologous promoter. In certain embodiments, the recombinant DNA construct described herein is expressed in a plant or seed. In certain embodiments, the plant or seed is a soybean plant or soybean seed.
[0056] As used herein, a “recombinant DNA construct” comprises two or more operably linked DNA segments which are not found operably linked in nature. Non-limiting examples of recombinant DNA constructs include a polynucleotide of interest operably linked to heterologous sequences, also referred to as “regulatory elements,” which aid in the expression, autologous replication, and/or genomic insertion of the sequence of interest. Such regulatory elements include, for example, promoters, termination sequences, enhancers, etc., or any component of an expression cassette; a plasmid, cosmid, virus, autonomously replicating sequence, phage, or linear or circular single-stranded or double-stranded DNA or RNA nucleotide sequence; and/or sequences that encode heterologous polypeptides.
[0057] The modified glycinin polypeptides described herein can be provided in an expression cassette for expression in a plant of interest or an organism of interest. The cassette can include 5' and 3' regulatory sequences operably linked to a modified glycinin polynucleotide.
[0058] The promoter of the recombinant DNA constructs of the invention can be any type or class of promoter known in the art, such that any one of a number of promoters can be used to express the various modified glycinin sequences disclosed herein, including the native promoter of the polynucleotide sequence of interest. The promoters for use in the recombinant DNA constructs of the invention can be selected based on the desired outcome.
[0059] In certain embodiments, the polynucleotides encoding the modified glycinin polypeptides described herein are provided in expression cassettes (e.g., a plasmid, cosmid, virus, autonomously replicating sequence, phage, or linear or circular single-stranded or double-stranded DNA or RNA nucleotide sequence) for expression in a plant of interest or any organism of interest. The cassette can include 5' and 3' regulatory sequences operably linked to a polynucleotide encoding the modified glycinin polypeptide. The cassette may additionally contain at least one additional gene to be cotransformed into the organism. Alternatively, the additional gene(s) can be provided on multiple expression cassettes. Such an expression cassette is provided with a plurality of restriction sites and/or recombination sites for insertion of the polynucleotide encoding the modified glycinin polypeptide to be under the transcriptional regulation of the regulatory regions. The expression cassette may additionally contain selectable marker genes.
[0060] The expression cassette can include in the 5'-3' direction of transcription, a transcriptional and translational initiation region (e.g., a promoter), a polynucleotide encoding the modified glycinin polypeptide, and a transcriptional and translational termination region (e.g., termination region) functional in plants. The regulatory regions (e.g., promoters, transcriptional regulatory regions, and translational termination regions) and/or the polynucleotide encoding the modified glycinin polypeptide may be native/analogous to the host cell or to each other. Alternatively, the regulatory regions and/or the polynucleotide encoding the modified glycinin polypeptide may be heterologous to the host cell or to each other.
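The 5'-to-3' layout in [0060] (initiation region, coding sequence, termination region) can be represented as an ordered list of parts. In the Python sketch below, the part names and sequences are hypothetical placeholders, not real promoter, glycinin, or terminator sequences. The code joins the parts in transcriptional order and records 1-based inclusive coordinates for each, similar to a feature table. The strict three-part order check is a simplification; real cassettes may also carry leaders, introns, or marker genes.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Part:
    name: str
    role: str      # "promoter", "cds", or "terminator" in this sketch
    sequence: str


def assemble(parts: List[Part]) -> Tuple[str, List[Tuple[str, int, int]]]:
    """Join parts 5'->3' and return the cassette plus 1-based inclusive feature coordinates."""
    expected = ["promoter", "cds", "terminator"]
    if [p.role for p in parts] != expected:
        raise ValueError(f"expected roles in order {expected}")
    features, pos = [], 1
    for p in parts:
        features.append((p.name, pos, pos + len(p.sequence) - 1))
        pos += len(p.sequence)
    return "".join(p.sequence for p in parts), features


# Placeholder sequences only.
cassette, features = assemble([
    Part("seed_promoter", "promoter", "TTGACAAT"),
    Part("glycinin_variant", "cds", "ATGGCATAA"),
    Part("terminator", "terminator", "AATAAA"),
])
print(features)  # [('seed_promoter', 1, 8), ('glycinin_variant', 9, 17), ('terminator', 18, 23)]
```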
[0061] The termination region may be native with the transcriptional initiation region, with the plant host, or may be derived from another source (i.e., foreign or heterologous) than the promoter, the polynucleotide encoding the modified glycinin polypeptide, the plant host, or any combination thereof.
[0062] The expression cassette may additionally contain a 5' leader sequence. Such leader sequences can act to enhance translation. Translation leaders are known in the art and include viral translational leader sequences.
[0063] Generally, the expression cassette can comprise a selectable marker gene for the selection of transformed cells. Selectable marker genes are utilized for the selection of transformed cells or tissues. Marker genes include genes encoding antibiotic resistance, such as those encoding neomycin phosphotransferase II (NEO) and hygromycin phosphotransferase (HPT), as well as genes conferring resistance to herbicidal compounds, such as glyphosate, glufosinate ammonium, bromoxynil, imidazolinones, and 2,4-dichlorophenoxyacetate (2,4-D). The above list of selectable marker genes is not meant to be limiting. Any selectable marker gene can be used in the present disclosure.
[0064] In preparing the expression cassette, the various DNA fragments may be manipulated to provide the DNA sequences in the proper orientation and, as appropriate, in the proper reading frame. Toward this end, adapters or linkers may be employed to join the DNA fragments, or other manipulations may be involved to provide for convenient restriction sites, removal of superfluous DNA, removal of restriction sites, or the like. For this purpose, in vitro mutagenesis, primer repair, restriction, annealing, resubstitutions, e.g., transitions and transversions, may be involved.
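After fragments are joined with adapters or linkers, a quick sanity check is that the coding region is still in the proper reading frame. The sketch below is illustrative only: it assumes the standard nuclear genetic code (ATG start; TAA, TAG, TGA stops) and a plain DNA string, and the function name is hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard genetic code stop codons


def check_reading_frame(cds: str) -> list:
    """Return a list of reading-frame problems in a coding sequence (empty list = none found)."""
    seq = cds.upper()
    problems = []
    if len(seq) % 3 != 0:
        problems.append("length is not a multiple of 3")
    if not seq.startswith("ATG"):
        problems.append("does not start with ATG")
    # Split into complete codons only; any trailing partial codon is ignored here.
    codons = [seq[i:i + 3] for i in range(0, len(seq) - len(seq) % 3, 3)]
    if not codons or codons[-1] not in STOP_CODONS:
        problems.append("does not end with a stop codon")
    if any(c in STOP_CODONS for c in codons[:-1]):
        problems.append("contains an internal stop codon")
    return problems


print(check_reading_frame("ATGGCTTAA"))     # []
print(check_reading_frame("ATGTAAGCTTAA"))  # ['contains an internal stop codon']
```

A junction that shifts the frame typically shows up as a length that is not a multiple of 3, an internal stop codon, or both.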
[0065] In certain embodiments, the nucleic acid construct or expression cassette, described herein, is expressed in a plant or seed. In certain embodiments, the plant or seed is a soybean plant or soybean seed. The nucleic acid constructs or expression cassettes disclosed herein may be used for transformation of any plant species.
B. Plants, Plant Parts, Plant Cells, or Seeds Comprising a Modified Glycinin Protein
[0066] The present disclosure further provides plants, plant parts, plant cells and seeds expressing any of the modified glycinin polypeptides described herein.
[0067] As used herein, the term “plant” includes plant protoplasts, plant cell tissue cultures from which plants can be regenerated, plant calli, plant clumps, and plant cells that are intact in plants or parts of plants such as embryos, pollen, ovules, seeds, leaves, flowers, branches, fruit, kernels, ears, cobs, husks, stalks, roots, root tips, anthers, and the like. Grain is intended to mean the mature seed produced by commercial growers for purposes other than growing or reproducing the species. Progeny, variants, and mutants of the regenerated plants are also included within the scope of the disclosure, provided that these parts comprise the polynucleotide encoding the modified glycinin polypeptide.
[0068] The plant species of the compositions and methods of the present disclosure can be any plant species for which improvement of the nutritional value (e.g., increased methionine content or increased essential amino acid content) is desired, including, but not limited to, monocots and dicots.
[0069] Examples of plant species of interest include, but are not limited to, maize (Zea mays), Brassica sp. (e.g., B. napus, B. rapa, B. juncea), alfalfa (Medicago sativa), rice (Oryza sativa), rye (Secale cereale), sorghum (Sorghum bicolor, Sorghum vulgare), millet (e.g., pearl millet (Pennisetum glaucum), proso millet (Panicum miliaceum), foxtail millet (Setaria italica), finger millet (Eleusine coracana)), sunflower (Helianthus annuus), safflower (Carthamus tinctorius), wheat (Triticum aestivum), soybean (Glycine max), tobacco (Nicotiana tabacum), peanuts (Arachis hypogaea), coconut (Cocos nucifera), olive (Olea europaea), cashew (Anacardium occidentale), macadamia (Macadamia integrifolia), almond (Prunus amygdalus), green beans (Phaseolus vulgaris), lima beans (Phaseolus limensis), peas (Lathyrus spp.), alfalfa (Medicago sativa), clover or trefoil (Trifolium spp.), pea (Pisum sativum), pigeon pea (Cajanus cajan), cowpea (Vigna unguiculata), Lathyrus spp., bean (Fabaceae or Leguminosae), lentil (Lens culinaris), lupin (Lupinus spp.), mesquite (Prosopis spp.), carob (Ceratonia siliqua), peanut (Arachis hypogaea), or tamarind (Tamarindus indica).
[0070] In certain embodiments, plants of the compositions and methods described herein are plants used to produce protein compositions for animal feed or human food including, but not limited to, legume crop species (including, but not limited to, alfalfa, clover or trefoil, pea, including pigeon pea, cowpea and Lathyrus spp., bean (Fabaceae or Leguminosae), lentil, lupin, mesquite, carob, soybean, peanut, or tamarind), safflower, sunflower, Brassica, maize, palm, and coconut.
[0071] In certain embodiments, the plants of the compositions and methods described herein are a legume crop species, including, but not limited to, alfalfa (Medicago sativa), clover or trefoil (Trifolium spp.), pea (Pisum sativum), pigeon pea (Cajanus cajan), cowpea (Vigna unguiculata), Lathyrus spp., bean (Fabaceae or Leguminosae), lentil (Lens culinaris), lupin (Lupinus spp.), mesquite (Prosopis spp.), carob (Ceratonia siliqua), soybean (Glycine max), peanut (Arachis hypogaea), or tamarind (Tamarindus indica).
[0072] In certain embodiments, the plants of the compositions and methods described herein are elite plant lines (e.g., elite soybean line or elite pea line). As used herein, “elite line” refers to any line that has resulted from breeding and selection for superior agronomic performance that allows a producer to harvest a product of commercial significance. Numerous elite lines are available and known to those of skill in the art of plant breeding (e.g., soybean breeding).
[0073] In certain embodiments, the seed or seeds of the plants of the compositions and methods described herein comprise at least or at least about a 2%, 3%, 4%, 5%, 6%, 7%, 8%, 9%, 10%, 12%, 14%, 16%, 18%, 20%, 22%, 24%, 26%, 28%, 30%, 35%, 40%, 45%, 50%, 55%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, 100%, 110%, 120%, 130%, 140%, 150%, 200%, 250%, 300%, or 350% and less than about a 500%, 450%, 400%, 350%, 300%, 250%, 200%, 150%, 140%, 130%, 120%, 110%, 100%, 90%, 80%, 70%, 60%, or 50% increase in total methionine on a dry weight of seed basis, as compared to a control seed (e.g., seed not comprising the modified glycinin polypeptide).
[0074] In certain embodiments, the seed or seeds of the plants of the compositions and methods described herein comprises at least or at least about 0.7%, 0.8%, 0.9%, 1%, 1.1%, 1.2%, 1.3%, 1.4%, 1.5%, 1.6%, 1.7%, 1.8%, 1.9%, 2%, 2.1%, 2.2%, 2.3%, 2.4%, 2.5%, 2.6%, 2.7%, 2.8%, 2.9%, or 3% and less than or less than about 4%, 3.9%, 3.8%, 3.7%, 3.6%, 3.5%, 3.4%, 3.3%, 3.2%, 3.1%, 3%, 2.9%, 2.8%, 2.7%, 2.6%, 2.5%, 2.4%, 2.3%, 2.2%, 2.1%, or 2% total methionine on a dry weight of seed basis.
[0075] As used herein, total methionine refers to the overall quantity of methionine both as free methionine and proteogenic methionine (methionine as part of a protein or polypeptide), such that total methionine equals free methionine plus proteogenic methionine. Similarly, a total amino acid such as cysteine, threonine, lysine or tryptophan refers to the overall quantity of each amino acid, both as free amino acid and proteogenic amino acid (amino acid as part of a protein or polypeptide). Total methionine or other amino acid in the seed can be expressed on a dry weight basis of the seed.
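The definition above (total methionine equals free methionine plus proteogenic methionine, expressed on a seed dry weight basis) reduces to simple arithmetic. This is a minimal sketch assuming both fractions have already been measured as masses (mg) in a sample of known dry weight (g); the function name and the numbers in the example are hypothetical.

```python
def total_amino_acid_pct_dw(free_mg: float, protein_bound_mg: float, dry_weight_g: float) -> float:
    """Total amino acid (free + protein-bound) as a percent of seed dry weight."""
    if dry_weight_g <= 0:
        raise ValueError("dry weight must be positive")
    total_mg = free_mg + protein_bound_mg   # total = free + proteogenic
    dry_weight_mg = dry_weight_g * 1000.0   # convert g -> mg so units cancel
    return total_mg / dry_weight_mg * 100.0


# Hypothetical sample: 0.5 mg free Met + 5.5 mg protein-bound Met in 1.0 g dry seed.
print(round(total_amino_acid_pct_dw(0.5, 5.5, 1.0), 3))  # 0.6
```

The same function applies to the other total amino acids named above (cysteine, threonine, lysine, tryptophan).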
[0076] In certain embodiments, the seed or seeds of the plants of the compositions and methods described herein comprise at least or at least about a 1, 1.5, 2, 2.5, 3, 3.5, 4, 5, 10, or 15 and less than 30, 25, 20, 15, 10, 9, 8, 7, 6, or 5 percentage point increase in total protein, as compared to a corresponding control seed (e.g., seed of a plant not comprising the modified glycinin polypeptide), measured on a dry weight of seed basis.
[0077] As used herein, "percentage point" (pp) difference, change, increase or decrease refers to the arithmetic difference of two percentages, e.g. [transgenic or genetically modified value (%) - control value (%)] = percentage points. For example, a modified seed may contain 20% by weight of a component and the corresponding unmodified control seed may contain 15% by weight of that component. The difference in the component between the control and transgenic seed would be expressed as 5 percentage points.
[0078] As used herein, "percent increase" refers to a change or difference expressed as a fraction of the control value, e.g., {[modified/transgenic/test value (%) - control value (%)]/control value (%)} x 100% = percent change, or {[value obtained in a first location (%) - value obtained in a second location (%)]/value obtained in the second location (%)} x 100% = percent change.
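The two measures defined in paragraphs [0077] and [0078] give different numbers for the same data. The sketch below applies both definitions to the worked example of paragraph [0077] (modified seed at 20% by weight, control at 15%); the function names are illustrative.

```python
def percentage_point_change(test_pct: float, control_pct: float) -> float:
    """Arithmetic difference of two percentages, in percentage points (pp)."""
    return test_pct - control_pct


def percent_change(test_value: float, control_value: float) -> float:
    """Change expressed relative to the control value, as a percent."""
    if control_value == 0:
        raise ValueError("control value must be non-zero")
    return (test_value - control_value) / control_value * 100.0


pp = percentage_point_change(20.0, 15.0)  # 5.0 percentage points
pct = percent_change(20.0, 15.0)          # about 33.3 percent increase
print(pp, round(pct, 1))                  # 5.0 33.3
```

A 5 percentage point gain over a 15% baseline is therefore a 33.3% increase, which is why the two kinds of ranges recited in this disclosure are not interchangeable.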
[0079] In certain embodiments, the seed or seeds of the plants of the compositions and methods described herein comprise a modified glycinin content that is greater than or within 0.5%, 1%, 1.5%, 2%, 2.5%, 3%, 3.5%, 4%, 4.5%, 5%, 10%, 15%, or 20% of the glycinin content of a corresponding control seed (e.g., wild-type glycinin from seed of a plant not comprising the modified glycinin polypeptide) on a dry weight of seed basis.
[0080] In certain embodiments, at least or at least about 1%, 2%, 3%, 4%, 5%, 6%, 7%, 8%, 9%, 10%, 12%, 14%, 16%, 18%, or 20%, and less than about 30%, 28%, 26%, 24%, 22% or 20%, of the total protein in the seed or seeds of the plants of the compositions and methods described herein comprises a modified glycinin polypeptide described herein, and the seed comprises a total protein content that is greater than or within 0.5%, 1%, 1.5%, 2%, 2.5%, 3%, 3.5%, 4%, 4.5%, 5%, 10%, 15%, or 20% of that of a corresponding control seed (e.g., seed of a plant not comprising the modified glycinin polypeptide) on a dry weight of seed basis.
[0081] In certain embodiments, the plants, or the plants generated from the plant parts, plant cells, and seeds of the compositions and methods described herein, have a yield that is greater than or within 0.5%, 1%, 1.5%, 2%, 2.5%, 3%, 3.5%, 4%, 4.5%, 5%, 10%, 15%, or 20% of that of the corresponding control plant, for example, one which has a similar genetic background but lacks the introduced mutations.
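Several paragraphs above ([0079] to [0081]) compare a trait with a control using the phrase "greater than or within X%". One possible reading, assumed here and not stated by the disclosure, is that any value above the control qualifies, and a value below it qualifies only if the shortfall is no more than X% of the control value. The function name and the yield numbers are hypothetical.

```python
def greater_than_or_within(test: float, control: float, tolerance_pct: float) -> bool:
    """True if test exceeds control, or differs from it by at most tolerance_pct of control.

    Assumed interpretation of "greater than or within X%"; the relative
    difference is measured against the control value.
    """
    if control <= 0:
        raise ValueError("control must be positive")
    return test > control or abs(test - control) / control * 100.0 <= tolerance_pct


# Hypothetical yields: 58 vs. 60 bu/acre is a shortfall of about 3.3% of control.
print(greater_than_or_within(58.0, 60.0, 5.0))  # True
print(greater_than_or_within(58.0, 60.0, 2.0))  # False
```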
[0082] As used herein, “yield” refers to the amount of agricultural production harvested per unit of land and may include reference to bushels per acre or kilograms per hectare of a crop at harvest, as adjusted for grain moisture. Grain moisture is measured in the grain at harvest. The adjusted test weight of grain is determined to be the weight in pounds per bushel or kilogram, adjusted for grain moisture level at harvest.
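Adjusting yield for grain moisture usually holds the dry matter constant and rescales the harvested weight to a standard moisture content. The sketch below follows that common approach. The 13% standard moisture and the 60 lb soybean bushel are assumptions (customary U.S. soybean market values), not parameters stated in this disclosure, and the function names are hypothetical.

```python
LB_PER_KG = 2.20462
ACRES_PER_HA = 2.47105


def moisture_adjusted_yield(harvest_weight_kg: float, harvest_moisture_pct: float,
                            area_ha: float, standard_moisture_pct: float = 13.0) -> float:
    """Yield in kg/ha at a standard moisture; dry matter is held constant."""
    if area_ha <= 0:
        raise ValueError("area must be positive")
    if not (0 <= harvest_moisture_pct < 100 and 0 <= standard_moisture_pct < 100):
        raise ValueError("moisture percentages must be in [0, 100)")
    # Scale harvested weight by (dry fraction at harvest) / (dry fraction at standard moisture).
    adjusted_kg = harvest_weight_kg * (100.0 - harvest_moisture_pct) / (100.0 - standard_moisture_pct)
    return adjusted_kg / area_ha


def kg_per_ha_to_bu_per_acre(kg_per_ha: float, lb_per_bushel: float = 60.0) -> float:
    """Convert kg/ha to bushels per acre for a given bushel weight (60 lb assumed for soybean)."""
    return kg_per_ha * LB_PER_KG / lb_per_bushel / ACRES_PER_HA


# Hypothetical plot: 3000 kg harvested from 1 ha at 10% moisture.
y = moisture_adjusted_yield(3000.0, 10.0, 1.0)
print(round(y), round(kg_per_ha_to_bu_per_acre(y), 1))
```

Grain harvested drier than the standard moisture is credited with extra weight, and wetter grain is discounted, so plots harvested at different moistures can be compared fairly.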
[0083] In certain embodiments, a polynucleotide encoding a modified glycinin polypeptide described herein is introduced at an endogenous glycinin gene (e.g., Gy1, Gy2, Gy3, Gy4, Gy5, Gy6, Gy7, or Gy8) locus. In certain embodiments, the modified glycinin polypeptide is introduced at an endogenous glycinin gene locus by modifying the endogenous glycinin gene (e.g., Gy1, Gy2, Gy3, Gy4, Gy5, Gy6, Gy7, or Gy8) sequence to encode a modified glycinin protein described herein. In certain embodiments, the modified glycinin polypeptide is introduced at an endogenous glycinin gene (e.g., Gy1, Gy2, Gy3, Gy4, Gy5, Gy6, Gy7, or Gy8) locus by replacing the endogenous glycinin gene with a polynucleotide encoding a modified glycinin polypeptide described herein. In certain embodiments, the modified glycinin polypeptide is introduced at an endogenous glycinin gene (e.g., Gy1, Gy2, Gy3, Gy4, Gy5, Gy6, Gy7, or Gy8) locus by replacing a polynucleotide encoding an endogenous glycinin protein with a polynucleotide encoding a modified glycinin polypeptide described herein, such that the polynucleotide encoding the modified glycinin polypeptide is operably linked to the endogenous glycinin gene promoter. Gy1, Gy2, and Gy4 are the most highly expressed glycinin genes in seed at each of 25, 28, 35, and 42 days after flowering. Accordingly, in certain embodiments, the polynucleotide encoding a modified glycinin polypeptide described herein is introduced at a Gy1, Gy2, or Gy4 gene locus. In certain embodiments, a polynucleotide encoding a modified glycinin polypeptide is introduced at two or more endogenous glycinin gene loci (e.g., two or more of Gy1, Gy2, Gy3, Gy4, Gy5, Gy6, Gy7, and Gy8). In certain embodiments, the modified glycinin polypeptides encoded at the two or more endogenous glycinin gene loci comprise the same amino acid sequence.
In certain embodiments, the modified glycinin polypeptides encoded at the two or more endogenous glycinin gene loci comprise different amino acid sequences. In embodiments in which three or more endogenous glycinin gene loci are modified, the modified glycinin polypeptides may comprise the same sequence, different sequences, or a combination thereof. In certain embodiments, the two or more endogenous glycinin gene loci comprise one or more of the Gy1, Gy2, or Gy4 gene loci. As would be understood by a person of ordinary skill in the art, multiple copies of a polynucleotide encoding a modified glycinin polypeptide can be introduced at a glycinin locus, such that, for example, 2, 3, 4, 5, 10, 20, 30 or more copies are introduced at a locus or at multiple glycinin loci. In certain embodiments, the polynucleotide encoding the modified glycinin polypeptide is introduced into a glycinin gene locus using a genome modification enzyme selected from the group consisting of a polynucleotide-guided endonuclease, CRISPR-Cas endonucleases, base editing deaminases, a zinc finger nuclease, a transcription activator-like effector nuclease (TALEN), and engineered site-specific meganucleases.
[0084] In certain embodiments, a polynucleotide encoding a modified glycinin polypeptide described herein is introduced at a non-native locus (e.g., a locus other than Gy1, Gy2, Gy3, Gy4, Gy5, Gy6, Gy7, or Gy8) in the plant. The non-native locus is not particularly limited, such that any locus that leads to expression of the modified glycinin polypeptide is acceptable. In certain embodiments, the polynucleotide encoding a modified glycinin polypeptide is introduced into one or more beta-conglycinin loci described herein, such that in certain embodiments at least one (e.g., 1, 2, 3, 4, 5, 6, or 7) endogenous beta-conglycinin locus is replaced with a polynucleotide encoding a modified glycinin polypeptide. In certain embodiments, at least one (e.g., 1, 2, or 3) beta-conglycinin locus on chromosome 10, such as, for example, Glyma.10g246300, Glyma.10g246500, or Glyma.10g246400, is replaced with a polynucleotide encoding a modified glycinin polypeptide. In certain embodiments, each of Glyma.10g246300, Glyma.10g246500, and Glyma.10g246400 is replaced with a polynucleotide encoding a modified glycinin polypeptide. In certain embodiments, at least one (e.g., 1, 2, 3, or 4) beta-conglycinin locus on chromosome 20, such as, for example, Glyma.20g148200, Glyma.20g148300, Glyma.20g148400, or Glyma.20g146200, is replaced with a polynucleotide encoding a modified glycinin polypeptide. In certain embodiments, at least one locus on chromosome 20 and at least one locus on chromosome 10 are replaced with a polynucleotide encoding a modified glycinin polypeptide. In certain embodiments, multiple copies of a polynucleotide encoding a modified glycinin polypeptide can be introduced at the at least one (e.g., 1, 2, 3, 4, 5, 6, or 7) endogenous beta-conglycinin locus. The multiple copies can encode the same modified glycinin, different modified glycinins, or a combination thereof.
As would be understood by a person of ordinary skill in the art, multiple copies of a polynucleotide encoding a modified glycinin polypeptide can be introduced at a non-native locus, such that, for example, 2, 3, 4, 5, 10, 20, 30 or more copies are introduced at a locus or at multiple glycinin loci. Additionally, plants may comprise a polynucleotide encoding a modified glycinin polypeptide at multiple non-native loci, at a non-native locus and a glycinin locus, at multiple non-native loci and a glycinin locus, or at multiple non-native loci and multiple glycinin loci. In certain embodiments, the polynucleotide encoding the modified glycinin polypeptide is introduced into a non-native locus using a genome modification enzyme selected from the group consisting of a polynucleotide-guided endonuclease, CRISPR-Cas endonucleases, base editing deaminases, a zinc finger nuclease, a transcription activator-like effector nuclease (TALEN), and engineered site-specific meganucleases.
[0085] In certain embodiments, the polynucleotides encoding the modified glycinin polypeptides are introduced into the plant, plant cell, plant part, or seed using a nucleic acid construct or an expression cassette described herein. In certain embodiments, the polynucleotides are stably expressed in the plant, plant cell, plant part, or seed using a stable transformation technique.
[0086] In certain embodiments, the plant cells or plant parts are grown into plants. The method for generating the plants from the plant cells or plant parts is not particularly limited. These plants can then be grown and pollinated with either the same strain or different strains, and the resulting progeny having constitutive expression of the desired phenotypic characteristic identified. Two or more generations can be grown to ensure that expression of the desired phenotypic characteristic is stably maintained and inherited, and seeds can then be harvested to ensure that expression of the desired phenotypic characteristic has been achieved. In some aspects of the present disclosure, the transformed seed, genome-modified seed, or transgenic seed has the polynucleotide encoding the modified glycinin, the nucleotide construct, or the expression cassette stably incorporated into its genome.
[0087] In certain embodiments, the plant, plant part, plant cell or seed described herein, further comprises at least one additional modification, the at least one additional modification associated with increased protein content, increased seed glycinin content, increased methionine, or any combination thereof.
[0088] In certain embodiments, the at least one additional modification is selected from the group consisting of a modification decreasing the expression of beta-conglycinin, a modification increasing the expression or activity of at least one cystathionine-gamma-synthase (CGS) polypeptide (e.g., GM-CGS1 and/or GM-CGS2), a modification decreasing the expression and/or activity of methionine gamma-lyase (MGL), a modification decreasing the expression and/or activity of lysine-ketoglutarate reductase/saccharopine dehydrogenase (LKR/SDH), a modification increasing the activity of dihydrodipicolinate synthase (DHPS), a modification decreasing expression, activity, and/or stability of an endogenous MFT polypeptide, a modification decreasing expression, activity, and/or stability of an endogenous CCT polypeptide, a modification decreasing expression, activity, and/or stability of an endogenous Big Seed 1 (BS1) polypeptide, a modification decreasing expression, activity, and/or stability of an endogenous Big Seed 2 (BS2) polypeptide, a modification decreasing expression, activity, and/or stability of an endogenous Sugars Will Eventually be Exported Transporter (SWT) polypeptide, a modification increasing expression or activity of an ABI3 polypeptide, a modification increasing expression or activity of an ODP1 polypeptide, a modification decreasing expression, activity, and/or stability of a Kix8-1 polypeptide, a modification decreasing raffinose family oligosaccharide (RFO) content, or any combination thereof.
[0089] In certain embodiments, the plant, plant part, plant cell or seed comprises at least one of a modification decreasing the expression of beta-conglycinin, a modification increasing the expression or activity of at least one CGS polypeptide, a modification decreasing expression, activity, and/or stability of an endogenous SWT polypeptide, a modification decreasing expression, activity, and/or stability of an endogenous MFT polypeptide, or any combination thereof.
[0090] In certain embodiments, the plant, plant part, plant cell or seed comprises a modification decreasing the expression of beta-conglycinin, a modification increasing the expression or activity of CGS1, CGS2, or both, a modification decreasing expression, activity, and/or stability of an endogenous SWT polypeptide, a modification decreasing expression, activity, and/or stability of an endogenous MFT polypeptide, and a polynucleotide encoding a modified glycinin polypeptide described herein, such as, for example, a polynucleotide encoding a modified glycinin polypeptide having at least, or at least about, 50%, 55%, 60%, 65%, 70%, 75%, 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% identity with any one of SEQ ID NOs: 18-86. In certain embodiments, the plant, plant part, plant cell or seed further comprises a modification decreasing the expression and/or activity of methionine gamma-lyase (MGL) and/or a modification decreasing RFO content.
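The percent-identity thresholds recited throughout this section depend on how identity is computed, and this section does not specify an alignment program or gap treatment. The sketch below assumes the two sequences are already aligned (gaps shown as `-`) and divides identical positions by the number of columns in which at least one sequence has a residue. Other tools use other denominators, so results may differ. The toy sequences are hypothetical and are not any SEQ ID NO of this disclosure.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity of two pre-aligned sequences ('-' marks a gap).

    Assumed convention: identical residues / columns where at least one
    sequence has a residue (columns that are gaps in both are dropped).
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    columns = [(x, y) for x, y in zip(aligned_a, aligned_b) if not (x == "-" and y == "-")]
    if not columns:
        raise ValueError("alignment has no residues")
    matches = sum(1 for x, y in columns if x == y and x != "-")
    return 100.0 * matches / len(columns)


print(percent_identity("MKTAYIAK", "MKSAYIAK"))        # 87.5 (7 of 8 positions identical)
print(round(percent_identity("MKT-AY", "MKTQAY"), 1))  # 83.3 (gap column counted)
```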
[0091] In soybean seeds, β-conglycinin (also referred to herein as conglycinin), the abundant 7S globulin storage protein, and glycinin account for about 21% and 33% of total protein content, respectively (Utsumi et al., Food Science and Technology, 257-292 (1997)). Total soybean protein content did not change after the α and α′ subunits of β-conglycinin were silenced by RNAi (Kinney et al., The Plant Cell, 13, 623-629 (2001)). The resulting engineered seeds accumulated more glycinin, which accounted for more than 50% of total seed protein and compensated for the missing β-conglycinin.
[0092] β-conglycinin consists of three isoforms, α, α′, and β. Among them, only α and α′ contain Met and Trp residues in the mature protein. Glycinin has five isoforms, all of which have higher Met and Trp content than those of β-conglycinin (Utsumi et al., Food Science and Technology, 257-292 (1997)).
[0093] In certain embodiments, the modification decreasing the expression of beta-conglycinin comprises a knockdown or knockout of one or more (e.g., 2 or more, 3 or more, or 4 or more) isoforms of a beta-conglycinin gene. In certain embodiments, the one or more isoforms of the beta-conglycinin gene encode a beta-conglycinin isoform comprising an amino acid sequence that is at least 90% identical to any one of SEQ ID NOs: 185-191. In certain embodiments, the modification decreasing the expression of beta-conglycinin comprises a knockdown or knockout of the beta-conglycinin genes encoding the beta-conglycinin isoforms comprising SEQ ID NOs: 185-191. In certain embodiments, the modification decreasing the expression of beta-conglycinin comprises the introduction of an inverted repeat sequence in the conglycinin gene cluster on chromosome 10, chromosome 20, or both, such as, for example, an inverted repeat sequence described in WO2025049884. In certain embodiments, the knockdown or knockout uses RNAi such as, for example, a dominant hairpin RNAi. In certain embodiments, the ratio of glycinin to conglycinin in the soybean plant or soybean seed is at least 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30, 32, 34, 36, 38, 40, 42, 44, 46, 48, 50, 52, 54, 56, 58, 60, 65, 70, 75, 80, 85, 90, 95, 100, 500 or 1000 to 1.
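As a point of reference for the ratio thresholds above, the approximate values in paragraph [0091] (glycinin about 33% and β-conglycinin about 21% of total protein) give a ratio of roughly 1.6 to 1, below the lowest recited threshold of 2 to 1. The sketch below assumes both proteins are expressed as percentages of total seed protein. The knockdown values in the example are hypothetical, and the function name is illustrative.

```python
def glycinin_to_conglycinin_ratio(glycinin_pct: float, conglycinin_pct: float) -> float:
    """Ratio of glycinin to beta-conglycinin, each given as % of total seed protein."""
    if conglycinin_pct <= 0:
        return float("inf")  # conglycinin undetectable: ratio is unbounded
    return glycinin_pct / conglycinin_pct


print(round(glycinin_to_conglycinin_ratio(33.0, 21.0), 2))  # 1.57, approximate wild-type values
print(glycinin_to_conglycinin_ratio(50.0, 2.0))             # 25.0, hypothetical knockdown
```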
[0094] Cystathionine-gamma-synthase (CGS) catalyzes the formation of cystathionine, which is subsequently converted to homocysteine and finally to methionine (Kreft et al., Plant Physiology, 131, 1843-1854 (2003)). Methionine, the end product of this pathway, functions not only as a component of storage proteins but also as a metabolite in plant cells. In Arabidopsis, CGS expression is regulated at the level of mRNA stability through feedback from Met or its metabolites. Exon 1 of CGS acts as a cis-regulatory element to down-regulate the stability of its own mRNA in response to excess accumulation of Met (Chiba et al., Science, 286, 1371-1374 (1999)). There are two CGS genes in soybean, Glyma.09g235400 (GM-CGS1) and Glyma.18g261600 (GM-CGS2).
[0095] In certain embodiments, the modification increasing the expression or activity of a CGS polypeptide comprises a targeted genetic modification that removes a self-regulatory domain of a CGS gene. In certain embodiments, the CGS self-regulatory domain encodes a polypeptide comprising an amino acid sequence that is at least 80% identical to SEQ ID NO: 192. In certain embodiments, the modified CGS gene encodes a CGS protein comprising an amino acid sequence that is at least 70% identical to any one of SEQ ID NOs: 195-197.
[0096] Lysine is one of the essential amino acids present in limiting amounts in crop seeds. The lysine biosynthetic pathway is feedback-inhibited by lysine at a rate-limiting step catalyzed by dihydrodipicolinate synthase (DHPS). Seed-specific expression of a feedback-insensitive bacterial DHPS enzyme in various plants resulted in significant overproduction of seed lysine (Falco et al., Bio/Technology, 13, 577-582 (1995); Mazur et al., Science, 285, 372-375 (1999)). The enhanced lysine production may be associated with increased activity of lysine catabolic enzymes, such as the bifunctional enzyme lysine-ketoglutarate reductase/saccharopine dehydrogenase (LKR/SDH), and with enhanced levels of lysine catabolic products (Falco et al., Bio/Technology, 13, 577-582 (1995); Mazur et al., Science, 285, 372-375 (1999)). The modification decreasing the expression and/or activity of LKR/SDH may be any modification known in the art such as, for example, a modification described in WO2021/216482. In certain embodiments, the modification increasing the activity of DHPS comprises a targeted genetic modification of the DHPS gene to remove a feedback inhibition domain of a DHPS gene.
[0097] In certain embodiments, the modification decreasing the expression, activity, and/or stability of an endogenous MGL polypeptide comprises a knockdown or knockout of an endogenous MGL gene, the endogenous MGL gene encoding an MGL polypeptide comprising an amino acid sequence that is at least 70%, 75%, 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% identical to any one of SEQ ID NOs: 182-184. In certain embodiments, the endogenous MGL gene comprises a nucleic acid sequence that is at least, or at least about, 50%, 55%, 60%, 65%, 70%, 75%, 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% identical to any one of SEQ ID NOs: 216-218.
[0098] Mother of FT and TFL1 (referred to herein as MFT) polypeptides, named for FLOWERING LOCUS T (FT) and TERMINAL FLOWER 1 (TFL1), are members of the phosphatidylethanolamine-binding protein (PEBP) family.
[0099] In certain embodiments of the compositions and methods described herein, the modification decreasing expression, activity, and/or stability of an endogenous MFT polypeptide comprises a modification introducing a polymorphism in an endogenous MFT gene to encode a modified MFT polypeptide, the modified MFT polypeptide comprising an amino acid sequence that is at least 70%, 75%, 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% identical to SEQ ID NO: 167 or 168 and comprising a modification at a position other than the amino acid corresponding to L106 in SEQ ID NO: 167. In certain embodiments, the modified MFT polypeptide comprises a non-leucine at the amino acid residue corresponding to position L140 of SEQ ID NO: 167. In certain embodiments, the modified MFT polypeptide comprises a non-threonine at the amino acid residue corresponding to position T82 of SEQ ID NO: 167. In certain embodiments, the modified MFT polypeptide comprises both a non-leucine at the amino acid residue corresponding to position L140 of SEQ ID NO: 167 and a non-threonine at the amino acid residue corresponding to position T82 of SEQ ID NO: 167. In certain embodiments, the modified MFT polypeptide comprises an amino acid sequence that is at least, or at least about, 70%, 75%, 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% identical to SEQ ID NO: 168.
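Phrases such as "the amino acid residue corresponding to position L140 of SEQ ID NO: 167" refer to whichever residue of a variant aligns with that reference position. The sketch below assumes a pairwise alignment has already been computed (gaps shown as `-`) and walks the ungapped reference numbering to find the aligned residue. The toy sequences are hypothetical and do not reproduce SEQ ID NO: 167.

```python
from typing import Optional


def corresponding_residue(ref_aligned: str, query_aligned: str, ref_position: int) -> Optional[str]:
    """Return the query residue aligned to a 1-based position of the ungapped reference.

    Returns None if the query has a gap at that alignment column.
    """
    if len(ref_aligned) != len(query_aligned):
        raise ValueError("aligned sequences must have equal length")
    count = 0
    for r, q in zip(ref_aligned, query_aligned):
        if r != "-":  # only reference residues advance the reference numbering
            count += 1
            if count == ref_position:
                return None if q == "-" else q
    raise IndexError("reference position beyond end of reference sequence")


# Toy example: reference residue 3 is L; the variant carries V there (a non-leucine).
ref, query = "MA-LKT", "MAQVKT"
print(corresponding_residue(ref, query, 3))  # V
```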
[0100] In certain embodiments of the compositions and methods described herein, the modification decreasing expression, activity, and/or stability of an endogenous MFT polypeptide comprises a knockout of an endogenous MFT gene, the endogenous MFT gene encoding an MFT polypeptide comprising an amino acid sequence that is at least, or at least about, 70%, 75%, 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% identical to SEQ ID NO: 167 or 168.
[0101] In certain embodiments of the compositions and methods described herein, the modification decreasing expression, activity, and/or stability of an endogenous MFT polypeptide comprises a knockdown of an endogenous MFT gene, the endogenous MFT gene encoding an MFT polypeptide comprising an amino acid sequence that is at least, or at least about, 70%, 75%, 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% identical to SEQ ID NO: 167. In certain embodiments, the plant cell (e.g., legume plant cell, soybean cell or pea cell), seed (e.g., legume seed, soybean seed or pea seed) or the plant (e.g., legume plant, soybean plant or pea plant) comprises at least a 35%, 40%, 45%, 50%, 55%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, or 99% decrease in expression of MFT, as compared to a control seed.
[0102] In certain embodiments of the compositions and methods described herein, the modification decreasing expression, activity, and/or stability of an endogenous CCT polypeptide comprises a knockout of an endogenous CCT gene, the endogenous CCT gene encoding a CCT protein comprising an amino acid sequence that is at least, or at least about, 70%, 75%, 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% identical to SEQ ID NO: 169 or 170. In certain embodiments of the compositions and methods described herein, the modification decreasing expression, activity, and/or stability of an endogenous CCT polypeptide comprises a knockdown of an endogenous CCT gene, the endogenous CCT gene encoding a CCT protein comprising an amino acid sequence that is at least, or at least about, 70%, 75%, 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% identical to SEQ ID NO: 169 or 170. In certain embodiments, the plant cell (e.g., legume plant cell, soybean cell or pea cell), seed (e.g., legume seed, soybean seed or pea seed) or the plant (e.g., legume plant, soybean plant or pea plant) comprises at least a 35%, 40%, 45%, 50%, 55%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, or 99% decrease in expression of CCT, as compared to a control seed. In certain embodiments of the compositions and methods described herein, the modification decreasing expression, activity, and/or stability of an endogenous CCT polypeptide is introduced by introgressing a high protein CCT QTL, such as, for example, the high protein CCT QTL from PI678444 (e.g., SEQ ID NO: 204 (DNA) and 170 (protein)).
[0103] Two soybean BS genes were identified by using Medicago BS1 as a BLAST query against a soybean genome sequence. Glyma.10g244400 was named GmBS1 and Glyma.20g150000 was named GmBS2. GmBS1 and GmBS2 show 70.9% and 71.4% identity to Medicago BS1, respectively.
[0104] In certain embodiments of the compositions and methods described herein, the modification decreasing expression, activity, and/or stability of an endogenous BS1 polypeptide comprises a knockout of an endogenous BS1 gene, the BS1 gene encoding a BS1 protein comprising an amino acid sequence that is at least 70%, 75%, 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% identical to SEQ ID NO: 165. In certain embodiments, the knockout of the endogenous BS1 gene is generated by introducing a frame-shift mutation in the endogenous BS1 gene. In certain embodiments, the knockout of the endogenous BS1 gene is generated by introducing a modification removing the endogenous BS1 gene. In certain embodiments of the compositions and methods described herein, the modification decreasing expression, activity, and/or stability of an endogenous BS1 polypeptide comprises a knockdown of an endogenous BS1 gene, the BS1 gene encoding a BS1 protein comprising an amino acid sequence that is at least 70%, 75%, 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% identical to SEQ ID NO: 165. In certain embodiments, the knockdown of the endogenous BS1 gene is generated by introducing an in-frame deletion in an endogenous BS1 gene. In certain embodiments, the knockdown of the endogenous BS1 gene is generated by introducing a modification in the endogenous promoter and/or replacing the endogenous promoter with a promoter that results in decreased gene expression as compared to the endogenous promoter. In certain embodiments, the knockdown of the endogenous BS1 gene is generated by seed specific silencing of the gene such as, for example, editing an endogenous seed specific miRNA to target the endogenous BS1 gene (WO2021150469).
In certain embodiments of the compositions and methods described herein, the modification decreasing expression, activity, and/or stability of an endogenous BS1 polypeptide comprises a modification introducing a polymorphism in an endogenous BS1 gene to encode a modified BS1 polypeptide, the modified BS1 polypeptide comprising an amino acid sequence that is at least, or at least about, 70%, 75%, 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% identical to SEQ ID NO: 165 and comprises at least one amino acid substitution as compared to SEQ ID NO: 165. In certain embodiments, the plant cell (e.g., legume plant cell, soybean cell or pea cell), seed (e.g., legume seed, soybean seed or pea seed) or the plant (e.g., legume plant, soybean plant or pea plant) comprises at least a 35%, 40%, 45%, 50%, 55%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, or 99% decrease in expression of BS1, as compared to a control seed.
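The distinction drawn above between a frame-shift mutation (used for a knockout) and an in-frame deletion (used for a knockdown) depends only on whether the net length change in the coding sequence is a multiple of three. The helper below is a hypothetical sketch of that rule; it deliberately ignores whether the new junction happens to create a premature stop codon, which would need to be checked separately on the actual sequence.

```python
def classify_indel(deleted_bp: int = 0, inserted_bp: int = 0) -> str:
    """Classify a coding-region indel by its effect on the reading frame.

    Only the net length change is considered; whether the new junction
    creates a premature stop codon is NOT checked.
    """
    if deleted_bp < 0 or inserted_bp < 0:
        raise ValueError("lengths must be non-negative")
    net = inserted_bp - deleted_bp
    if net == 0:
        return "no net length change"
    # Python's % returns a non-negative result, so -6 % 3 == 0.
    return "in-frame" if net % 3 == 0 else "frame-shift"


# Hypothetical edits within a coding sequence:
print(classify_indel(deleted_bp=1))                  # frame-shift
print(classify_indel(deleted_bp=6))                  # in-frame
print(classify_indel(deleted_bp=2, inserted_bp=5))   # in-frame (net +3)
```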
[0105] In certain embodiments of the compositions and methods described herein, the modification decreasing expression, activity, and/or stability of an endogenous BS2 polypeptide comprises a knockout of an endogenous BS2 gene, the BS2 gene encoding a BS2 protein comprising an amino acid sequence that is at least, or at least about, 70%, 75%, 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% identical to SEQ ID NO: 166. In certain embodiments, the knockout of the endogenous BS2 gene is generated by introducing a frame-shift mutation in the endogenous BS2 gene. In certain embodiments, the knockout of the endogenous BS2 gene is generated by introducing a mutation removing the endogenous BS2 gene. In certain embodiments of the compositions and methods described herein, the modification decreasing expression, activity, and/or stability of an endogenous BS2 polypeptide comprises a knockdown of an endogenous BS2 gene, the BS2 gene encoding a BS2 protein comprising an amino acid sequence that is at least, or at least about, 70%, 75%, 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% identical to SEQ ID NO: 166. In certain embodiments, the plant cell (e.g., legume plant cell, soybean cell or pea cell), seed (e.g., legume seed, soybean seed or pea seed) or the plant (e.g., legume plant, soybean plant or pea plant) comprises at least a 35%, 40%, 45%, 50%, 55%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, or 99% decrease in expression of BS2, as compared to a control seed. In certain embodiments, the knockdown of the endogenous BS2 gene is generated by introducing an in-frame deletion in an endogenous BS2 gene. In certain embodiments, the knockdown of the endogenous BS2 gene is generated by introducing a modification in the endogenous promoter and/or replacing the endogenous promoter with a promoter that results in decreased gene expression as compared to the endogenous promoter.
In certain embodiments, the knockdown of the endogenous BS2 gene is generated by seed specific silencing of the gene such as, for example, editing an endogenous seed specific miRNA to target the endogenous BS2 gene (WO2021150469). In certain embodiments of the compositions and methods described herein, the modification decreasing expression, activity, and/or stability of an endogenous BS2 polypeptide comprises a modification introducing a polymorphism in an endogenous BS2 gene to encode a modified BS2 polypeptide, the modified BS2 polypeptide comprising an amino acid sequence that is at least, or at least about, 70%, 75%, 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% identical to SEQ ID NO: 166 and comprises at least one amino acid substitution as compared to SEQ ID NO: 166.
[0106] The Sugars Will Eventually be Exported Transporters (SWTs) are a group of efflux transporters that play critical roles in sugar efflux, phloem loading, and sugar import into developing seeds. Carbohydrates are translocated from source tissues to seeds through the seed coat. SWTs expressed in the seed coat facilitate sucrose efflux from the seed coat to the developing seed cotyledons, affecting seed development. In certain embodiments of the compositions and methods described herein, the SWT gene is selected from the group consisting of SWT4, SWT5, SWT16, SWT17, SWT24, and SWT39.
[0107] In certain embodiments of the compositions and methods described herein, the modification decreasing expression, activity, and/or stability of an endogenous SWT polypeptide comprises a knockout of an endogenous SWT gene, the endogenous SWT gene encoding a SWT4 polypeptide, SWT5 polypeptide, SWT16 polypeptide, SWT17 polypeptide, SWT24 polypeptide, SWT39 polypeptide, or any combination thereof. In certain embodiments, the endogenous SWT gene encodes a polypeptide comprising an amino acid sequence that is at least, or at least about, 50%, 55%, 60%, 65%, 70%, 75%, 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% identical to any one of SEQ ID NOs: 172-177. In certain embodiments of the compositions and methods described herein, the modification decreasing expression, activity, and/or stability of an endogenous SWT polypeptide comprises a knockdown of an endogenous SWT gene, the endogenous SWT gene encoding a SWT polypeptide comprising an amino acid sequence that is at least, or at least about, 50%, 55%, 60%, 65%, 70%, 75%, 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% identical to any one of SEQ ID NOs: 172-177. In certain embodiments, the plant cell (e.g., legume plant cell, soybean cell or pea cell), seed (e.g., legume seed, soybean seed or pea seed) or the plant (e.g., legume plant, soybean plant or pea plant) comprises at least a 35%, 40%, 45%, 50%, 55%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, or 99% decrease in expression of SWT4, SWT5, SWT16, SWT17, SWT24, SWT39, or any combination thereof, as compared to a control seed.
[0108] In certain embodiments of the compositions and methods described herein, the modification decreasing expression, activity, and/or stability of an endogenous SWT polypeptide is introduced by introgressing a high protein SWT QTL.
In certain embodiments, the high protein SWT QTL is a SWT39 high protein QTL (SEQ ID NO: 171), such as, for example, the SWT39 QTL from PI678444.
[0109] In certain embodiments, the modification increasing expression or activity of ABI3 comprises introducing a genetic modification in the endogenous gene sequence. In certain embodiments, the genetic modification is in the promoter region, for example, a promoter swap so that the endogenous gene is operably linked to a heterologous promoter. In certain embodiments, the modification increasing expression or activity of ABI3 comprises introducing a nucleic acid construct comprising a polynucleotide encoding the ABI3 polypeptide operably linked to a heterologous regulatory element. In certain embodiments, the polynucleotide encodes a polypeptide comprising an amino acid sequence that is at least, or at least about, 50%, 55%, 60%, 65%, 70%, 75%, 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% identical to SEQ ID NO: 178 or 179. In certain embodiments, the ABI3 comprises a nucleic acid sequence that is at least, or at least about, 50%, 55%, 60%, 65%, 70%, 75%, 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% identical to SEQ ID NO: 212 or 213.
[0110] In certain embodiments, the modification increasing expression or activity of ODP1 comprises introducing a genetic modification in the endogenous gene sequence. In certain embodiments, the genetic modification is in the promoter region, for example, a promoter swap so that the endogenous gene is operably linked to a heterologous promoter. In certain embodiments, the modification increasing expression or activity of ODP1 comprises introducing a nucleic acid construct comprising a polynucleotide encoding the ODP1 polypeptide operably linked to a heterologous regulatory element.
In certain embodiments, the polynucleotide encodes a polypeptide comprising an amino acid sequence that is at least, or at least about, 50%, 55%, 60%, 65%, 70%, 75%, 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% identical to SEQ ID NO: 180 or 181. In certain embodiments, the ODP1 comprises a nucleic acid sequence that is at least, or at least about, 50%, 55%, 60%, 65%, 70%, 75%, 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% identical to SEQ ID NO: 214 or 215.
[0111] Raffinose family oligosaccharides (RFOs) are alpha-galactosyl derivatives of sucrose and include, for example, raffinose and stachyose. RFOs are antinutritional factors that reduce metabolizable energy, cause poor digestibility, and increase flatulence and diarrhea in monogastric animals. As used herein, raffinose family oligosaccharide (RFO) content refers to the content of raffinose and stachyose. The RFO content can be measured using methods known in the art, such as those described in US Patent Publication No. 2019-0383733.
[0112] In certain embodiments, the modification decreasing RFO content comprises a decrease in the expression, activity and/or stability of a raffinose synthase. In certain embodiments, the modification comprises a decrease in the expression, activity and/or stability of raffinose synthase 2 (RS2), raffinose synthase 3 (RS3) and/or raffinose synthase 4 (RS4). In certain embodiments, the plant (e.g., legume, soybean or pea) seed comprises at least a 35%, 40%, 45%, 50%, 55%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, or 99% decrease in expression of RS2, RS4, or RS2 and RS4, as compared to a control seed. In certain embodiments of the compositions and methods described herein, the modification decreasing expression, activity, and/or stability of an endogenous RS polypeptide comprises a knockout of an endogenous RS gene, the endogenous RS gene encoding an RS polypeptide comprising an amino acid sequence that is at least, or at least about, 70%, 75%, 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% identical to SEQ ID NO: 198. In certain embodiments of the compositions and methods described herein, the modification decreasing expression, activity, and/or stability of an endogenous RS polypeptide comprises a knockdown of an endogenous RS gene, the endogenous RS gene encoding an RS polypeptide comprising an amino acid sequence that is at least, or at least about, 70%, 75%, 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% identical to SEQ ID NO: 198. In certain embodiments, the cell, seed, or plant comprises at least a 35%, 40%, 45%, 50%, 55%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, or 99% decrease in expression of an RS, as compared to a control seed.
[0113] As used herein, “decrease in expression,” “decreased expression,” or the like refers to any detectable reduction in expression of a gene and/or the corresponding polypeptide. Similarly, “decrease in activity,” “decreased activity,” or the like refers to any detectable reduction in the activity (e.g., enzymatic activity) of the encoded polypeptide. The method by which the expression or activity of a gene or polypeptide described herein is decreased is not particularly limited and can be done using methods known in the art, such as RNAi, gene knockdown, gene knockout, or targeted amino acid modification.
[0114] As used herein, a “gene knockout” refers to a gene for which there is no detectable expression of the encoded mRNA or protein, whereas a “gene knockdown” refers to a gene for which there is reduced expression of the encoded mRNA or protein. As used herein, “decreased expression” encompasses both gene knockout and gene knockdown.
[0115] As used herein, a “targeted” genetic modification refers to the direct manipulation of an organism’s genes. The targeted modification may be introduced using any technique known in the art, such as, for example, plant breeding, genome editing, or single locus conversion.
[0116] As used herein, “increase in activity,” “increased activity,” and the like refer to any detectable gain in activity (e.g., enzymatic activity) of the polypeptide. The method by which the activity of a polypeptide described herein is increased is not particularly limited and can be done using methods known in the art, such as increasing expression of the gene encoding the polypeptide (e.g., transgenic expression, promoter swap, gene modification) or a targeted modification of the gene encoding the polypeptide to, for example, remove a self-regulatory domain.
C. Protein Composition
[0117] The disclosure also provides protein compositions (e.g., soy protein composition or pea protein composition) comprising any of the modified glycinin polypeptides described herein. In certain embodiments, the protein composition comprises a methionine content of at least or at least about 1%, 2%, 3%, 4%, 5%, 6%, 7%, 8%, 9%, 10%, 12%, 14%, 16%, 18%, 20%, and less than about 30%, 28%, 26%, 24%, 22% or 20%.
[0118] In certain embodiments, the methionine content in the protein composition (e.g., soy protein composition or pea protein composition) is greater than about 20, 25, 30, 35, 40, 45, 50, 55, 60, 65, 70, 75, 80, 85, 90, 95, 100, 125, 150, 175, or 200 mg/g protein and less than 500, 400, 300, 250, 200, 150, 100, 75, 50, or 40 mg/g protein. In certain embodiments, the sum of the methionine and tryptophan in the protein composition (e.g., soy protein composition or pea protein composition) is greater than about 20, 25, 30, 35, 40, 45, 50, 55, 60, 65, 70, 75, 80, 85, 90, 95, 100, 125, 150, 175, or 200 mg/g protein and less than 500, 400, 300, 250, 200, 150, 100, 75, 50, or 40 mg/g protein. In certain embodiments, the sum of the methionine, lysine, threonine and tryptophan in the protein composition (e.g., soy protein composition or pea protein composition) is greater than about 20, 25, 30, 35, 40, 45, 50, 55, 60, 65, 70, 75, 80, 85, 90, 95, 100, 125, 150, 175, or 200 mg/g protein and less than 500, 400, 300, 250, 200, 150, 100, 75, 50, or 40 mg/g protein.
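The mg/g protein values above relate an amino acid to total protein rather than to seed weight. When both the amino acid and the protein are reported as a percentage of the same dry sample, the conversion is (amino acid % / protein %) x 1000. The sketch below assumes that input format; the amino acid profile is hypothetical and not data from this disclosure.

```python
from typing import Iterable, Mapping


def mg_per_g_protein(aa_pct_dw: float, protein_pct_dw: float) -> float:
    """Convert an amino acid level (% of dry sample) to mg per g protein.

    Both inputs must be on the same dry-weight basis:
    mg AA / g protein = AA % * 1000 / protein %.
    """
    if protein_pct_dw <= 0:
        raise ValueError("protein content must be positive")
    return aa_pct_dw * 1000.0 / protein_pct_dw


def summed_mg_per_g_protein(profile: Mapping[str, float],
                            protein_pct_dw: float,
                            names: Iterable[str]) -> float:
    """Sum of several amino acids, each expressed in mg per g protein."""
    return sum(mg_per_g_protein(profile[n], protein_pct_dw) for n in names)


# Hypothetical dry-weight analysis of a protein composition (% of sample).
profile = {"Met": 0.6, "Trp": 0.5, "Lys": 2.5, "Thr": 1.6}
protein = 40.0

print(mg_per_g_protein(profile["Met"], protein))                  # about 15
print(summed_mg_per_g_protein(profile, protein, ["Met", "Trp"]))  # about 27.5
print(summed_mg_per_g_protein(profile, protein,
                              ["Met", "Lys", "Thr", "Trp"]))      # about 130
```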
[0119] As used herein, “protein composition” refers to food ingredients for humans or animals which contain plant proteins, such as, for example, legume proteins (e.g., soy protein or pea protein). In certain embodiments, the composition is an animal feed composition. In certain embodiments, the composition is a human food composition. In certain embodiments, the human food composition is a composition selected from the group consisting of soybean meal; soy flour; defatted soy flour; soymilk; spray-dried soymilk; soy protein concentrate; texturized soy protein concentrate; hydrolyzed soy protein; soy protein isolate; spray-dried tofu; soy meat analog; soy cheese analog; and soy coffee creamer.
II. Methods
A. Method for Generating Plants Producing Seeds with an Increased Methionine Content
[0120] The disclosure also provides methods for producing a plant producing seed having increased methionine content comprising introducing into a regenerable plant cell a polynucleotide encoding any of the modified glycinin polypeptides described herein; and generating the plant, wherein the plant comprises the polynucleotide encoding the modified glycinin polypeptide and produces a seed having an increased amount of methionine as compared to seed of a plant not comprising the modified glycinin polypeptide. In certain embodiments, the method further comprises introducing at least one additional modification associated with increased glycinin, increased methionine, increased protein, or any combination thereof, such as, for example, the modifications described herein. The method for introducing the at least one additional modification may be any method known in the art or described herein. In certain embodiments, the at least one additional modification is introduced by genome editing or transformation. In certain embodiments, the at least one additional modification is introduced by crossing the plant produced by the method with a second plant comprising the at least one additional modification, harvesting the seed produced thereby, and generating a progeny plant, the progeny plant comprising the modified glycinin polypeptide and the at least one additional modification.
[0121] In certain embodiments, the seed of the plant produced by the method comprises at least or at least about a 2%, 3%, 4%, 5%, 6%, 7%, 8%, 9%, 10%, 12%, 14%, 16%, 18%, 20%, 22%, 24%, 26%, 28%, 30%, 35%, 40%, 45%, 50%, 55%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, 100%, 110%, 120%, 130%, 140%, 150%, 200%, 250%, 300%, or 350% and less than about a 500%, 450%, 400%, 350%, 300%, 250%, 200%, 150%, 140%, 130%, 120%, 110%, 100%, 90%, 80%, 70%, 60%, or 50% increase in total methionine on a dry weight of seed basis, as compared to a seed from a control plant (e.g., seed from a plant not comprising the modified glycinin polypeptide).
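Comparisons "on a dry weight of seed basis" remove the effect of differing moisture between seed lots before the relative increase is computed. The sketch below assumes analyses are reported on an as-is basis together with a moisture percentage; the numbers are illustrative, not results from this disclosure.

```python
def to_dry_weight(as_is_pct: float, moisture_pct: float) -> float:
    """Convert a constituent from an as-is (wet) basis to a dry-weight basis."""
    if not 0.0 <= moisture_pct < 100.0:
        raise ValueError("moisture must be in [0, 100)")
    return as_is_pct * 100.0 / (100.0 - moisture_pct)


def percent_increase(control: float, modified: float) -> float:
    """Relative increase of `modified` over `control`, in percent."""
    if control <= 0:
        raise ValueError("control must be positive")
    return (modified - control) * 100.0 / control


# Hypothetical total methionine, as-is basis, with different seed moistures.
control_dw = to_dry_weight(0.55, moisture_pct=10.0)   # about 0.611 % DW
modified_dw = to_dry_weight(0.66, moisture_pct=12.0)  # about 0.750 % DW
print(round(percent_increase(control_dw, modified_dw), 1))  # 22.7
```

Comparing the as-is values directly would give a 20% increase instead, which shows why the basis of comparison needs to be stated.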
[0122] In certain embodiments, the seed of the plant produced by the method comprises at least or at least about 0.7%, 0.8%, 0.9%, 1%, 1.1%, 1.2%, 1.3%, 1.4%, 1.5%, 1.6%, 1.7%, 1.8%, 1.9%, 2%, 2.1%, 2.2%, 2.3%, 2.4%, 2.5%, 2.6%, 2.7%, 2.8%, 2.9%, or 3% and less than or less than about 4%, 3.9%, 3.8%, 3.7%, 3.6%, 3.5%, 3.4%, 3.3%, 3.2%, 3.1%, 3%, 2.9%, 2.8%, 2.7%, 2.6%, 2.5%, 2.4%, 2.3%, 2.2%, 2.1%, or 2% total methionine on a dry weight basis.
[0123] In certain embodiments, the seed of the plant produced by the method comprises at least or at least about a 1, 1.5, 2, 2.5, 3, 3.5, 4, 5, 10, or 15 and less than 30, 25, 20, 15, 10, 9, 8, 7, 6, or 5 percentage point increase in total protein, as compared to a seed from a corresponding control plant (e.g., a plant not comprising the modified glycinin polypeptide), measured on a dry weight of seed basis.
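Paragraph [0123] expresses the protein gain in percentage points, which is an absolute difference between two percentages, whereas paragraph [0121] uses a relative percent increase. A minimal sketch of the difference, using hypothetical protein contents:

```python
def percentage_point_change(control_pct: float, modified_pct: float) -> float:
    """Absolute difference between two percentages (percentage points)."""
    return modified_pct - control_pct


def relative_percent_change(control_pct: float, modified_pct: float) -> float:
    """Relative change of modified vs. control, in percent."""
    if control_pct <= 0:
        raise ValueError("control must be positive")
    return (modified_pct - control_pct) * 100.0 / control_pct


# Hypothetical: control seed 40% protein (dry weight), modified seed 42%.
print(percentage_point_change(40.0, 42.0))   # 2.0 (percentage points)
print(relative_percent_change(40.0, 42.0))   # 5.0 (percent)
```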
[0124] In certain embodiments, the seed of the plant produced by the method comprises a modified glycinin content that is greater than or within 0.5%, 1%, 1.5%, 2%, 2.5%, 3%, 3.5%, 4%, 4.5%, 5%, 10%, 15%, or 20% as compared to the glycinin content of a seed from a corresponding control plant (e.g., wild-type glycinin from a plant not comprising the modified glycinin polypeptide) on a dry weight of seed basis.
[0125] In certain embodiments, at least or at least about 1%, 2%, 3%, 4%, 5%, 6%, 7%, 8%, 9%, 10%, 12%, 14%, 16%, 18%, 20%, and less than about 30%, 28%, 26%, 24%, 22% or 20% of the total protein in the seed of the plant produced by the method comprises a modified glycinin polypeptide described herein and the seed comprises a total protein content that is greater than or within 0.5%, 1%, 1.5%, 2%, 2.5%, 3%, 3.5%, 4%, 4.5%, 5%, 10%, 15%, or 20% as compared to a seed from a corresponding control plant (e.g., seed from a plant not comprising the modified glycinin polypeptide) on a dry weight of seed basis.
[0126] In certain embodiments, the method generates plants having a yield that is greater than or within 0.5%, 1%, 1.5%, 2%, 2.5%, 3%, 3.5%, 4%, 4.5%, 5%, 10%, 15%, or 20% as compared to the corresponding control plant, for example, one which has a similar genetic background but lacks the introduced mutations.
[0127] Various methods can be used to introduce the polynucleotide sequences into a plant, plant part, plant cell, seed, and/or grain. "Introducing" is intended to mean presenting to the plant, plant cell, seed, and/or grain the inventive polynucleotide or resulting polypeptide in such a manner that the sequence gains access to the interior of a cell of the plant. The methods of the disclosure do not depend on a particular method for introducing a sequence into a plant, plant cell, seed, and/or grain, only that the polynucleotide or polypeptide gains access to the interior of at least one cell of the plant.
[0128] In certain embodiments, the method for introducing the modified glycinin polypeptide and/or the modification associated with increased glycinin, increased methionine, and/or increased protein, comprises transforming a regenerable plant cell with a nucleic acid construct or expression cassette comprising a polynucleotide described herein. The transformation technique of the methods is not particularly limited and includes both stable transformation methods and transient transformation methods.
[0129] "Stable transformation" is intended to mean that the polynucleotide introduced into a plant integrates into the genome of the plant of interest and is capable of being inherited by the progeny thereof. "Transient transformation" is intended to mean that a polynucleotide is introduced into the plant of interest and does not integrate into the genome of the plant or organism, or a polypeptide is introduced into a plant or organism.
[0130] Transformation protocols, as well as protocols for introducing polypeptides or polynucleotide sequences into plants, may vary depending on the type of plant or plant cell (i.e., monocot or dicot) targeted for transformation. Suitable methods of introducing polypeptides and polynucleotides into plant cells include microinjection (Crossway et al. (1986) Biotechniques 4:320-334), electroporation (Riggs et al. (1986) Proc. Natl. Acad. Sci. USA 83:5602-5606), Agrobacterium-mediated transformation (U.S. Patent No. 5,563,055 and U.S. Patent No. 5,981,840), Ochrobactrum-mediated transformation (U.S. Patent Application Publication 2018/0216123 and WO20/092494), direct gene transfer (Paszkowski et al. (1984) EMBO J. 3:2717-2722), ballistic particle acceleration (see, for example, U.S. Patent Nos. 4,945,050; 5,879,918; 5,886,244; and 5,932,782; Tomes et al. (1995) in Plant Cell, Tissue, and Organ Culture: Fundamental Methods, ed. Gamborg and Phillips (Springer-Verlag, Berlin); McCabe et al. (1988) Biotechnology 6:923-926), and Lec1 transformation (WO 00/28058). See also D'Halluin et al. (1992) Plant Cell 4:1495-1505 (electroporation); Li et al. (1993) Plant Cell Reports 12:250-255 and Christou and Ford (1995) Annals of Botany 75:407-413 (rice); and Ishida et al. (1996) Nature Biotechnology 14:745-750 (maize via Agrobacterium tumefaciens), all of which are herein incorporated by reference.
[0131] Methods are known in the art for the targeted insertion of a polynucleotide at a specific location in the plant genome. In one embodiment, the insertion of the polynucleotide at a desired genomic location is achieved using a site-specific recombination system. See, for example, WO99/25821, WO99/25854, WO99/25840, WO99/25855, and WO99/25853, all of which are herein incorporated by reference. Briefly, the polynucleotide disclosed herein can be contained in a transfer cassette flanked by two non-recombinogenic recombination sites. The transfer cassette is introduced into a plant having stably incorporated into its genome a target site which is flanked by two non-recombinogenic recombination sites that correspond to the sites of the transfer cassette. An appropriate recombinase is provided, and the transfer cassette is integrated at the target site. The polynucleotide of interest is thereby integrated at a specific chromosomal position in the plant genome.
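The end result of the site-specific recombination scheme above, where the segment between two non-recombinogenic sites at the target locus is exchanged for the segment between the matching sites of the transfer cassette, can be modeled as a string operation. This is a sketch of the outcome only: it does not model recombinase biochemistry, and the site labels "[A]" and "[B]" are hypothetical placeholders rather than real recombination site sequences.

```python
def exchange_cassette(target: str, cassette: str,
                      site_a: str, site_b: str) -> str:
    """Model the product of recombinase-mediated cassette exchange.

    The target locus and the transfer cassette are each flanked by
    site_a ... site_b (two sites that do not recombine with each other).
    The target segment between the sites is replaced by the cassette's.
    """
    def split(seq: str):
        i = seq.find(site_a)
        if i == -1:
            raise ValueError("site_a not found")
        left_end = i + len(site_a)
        j = seq.find(site_b, left_end)
        if j == -1:
            raise ValueError("site_b not found after site_a")
        return seq[:left_end], seq[left_end:j], seq[j:]

    t_left, _old_payload, t_right = split(target)
    _, new_payload, _ = split(cassette)
    return t_left + new_payload + t_right


# Hypothetical locus carrying a placeholder marker between the two sites.
locus = "chr--[A]marker[B]--chr"
print(exchange_cassette(locus, "[A]GOI[B]", "[A]", "[B]"))
# chr--[A]GOI[B]--chr
```

Because the two sites cannot recombine with each other, the exchanged payload keeps a defined orientation between them, which the fixed left/right ordering in the sketch reflects.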
[0132] One of skill will recognize that after the expression cassette containing the inventive polynucleotide is stably incorporated in transgenic plants and confirmed to be operable, it can be introduced into other plants by sexual crossing. Any of a number of standard breeding techniques can be used, depending upon the species to be crossed.
[0133] Parts obtained from the regenerated plants described herein, such as flowers, seeds, leaves, branches, fruit, and the like, are included, provided that these parts comprise cells comprising the inventive polynucleotide. Progeny, variants, and mutants of the regenerated plants are also included, provided that they comprise the introduced nucleic acid sequences.
[0134] In one embodiment, a homozygous transgenic plant can be obtained by sexually mating (selfing) a heterozygous transgenic plant that contains a single added heterologous nucleic acid, germinating some of the seed produced, and analyzing the resulting plants. Backcrossing to a parental plant and out-crossing with a non-transgenic plant are also contemplated.
[0135] In certain embodiments, the method for introducing the modified glycinin polypeptide and/or the modification associated with increased glycinin, increased methionine, and/or increased protein, into the regenerable plant cell comprises using genome editing technologies. In certain embodiments, the method comprises editing the endogenous gene or a previously introduced gene.
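For the selfing scheme in paragraph [0134], ordinary Mendelian segregation of a single insertion gives progeny in a 1:2:1 ratio (homozygous : heterozygous/hemizygous : null). This indicates how many seeds need to be germinated to recover at least one homozygous plant with a given confidence. The sketch below assumes normal segregation and equal transmission of the insertion; the function names are illustrative.

```python
import math


def selfed_single_insertion_fractions() -> dict:
    """Expected genotype fractions after selfing a single-insertion plant."""
    return {"homozygous": 0.25, "hemizygous": 0.5, "null": 0.25}


def plants_needed(confidence: float = 0.95,
                  p_homozygous: float = 0.25) -> int:
    """Smallest n with P(at least one homozygous among n progeny) >= confidence.

    P(no homozygous among n) = (1 - p)^n, so n >= log(1 - c) / log(1 - p).
    """
    if not 0.0 < confidence < 1.0 or not 0.0 < p_homozygous < 1.0:
        raise ValueError("probabilities must be in (0, 1)")
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - p_homozygous))


print(plants_needed())       # 11 plants for 95% confidence
print(plants_needed(0.99))   # 17 plants for 99% confidence
```

Homozygous candidates identified this way are usually confirmed by zygosity assays or by progeny testing, since a homozygous plant yields no null segregants on selfing.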
[0136] The genome editing technology for use in the methods and compositions described herein is not particularly limited and may be any genome editing technique that allows for the modification or targeted introduction of the desired polynucleotide.
[0137] In certain embodiments the genome editing technique uses an enzyme selected from the group consisting of a polynucleotide-guided endonuclease, CRISPR-Cas endonucleases, base editing deaminases, zinc finger nuclease, a transcription activator-like effector nuclease (TALEN), or an engineered site-specific meganuclease.
[0138] In certain embodiments, the genome modification may be facilitated through the induction of a double-strand break (DSB) or a single-strand break at a defined position in the genome near the desired alteration. DSBs can be induced using any available DSB-inducing agent, including, but not limited to, TALENs, meganucleases, zinc finger nucleases, Cas9-gRNA systems (based on bacterial CRISPR-Cas systems), guided Cpf1 endonuclease systems, and the like. In some embodiments, the introduction of a DSB can be combined with the introduction of a polynucleotide modification template.
[0139] In certain embodiments, the method comprises: (a) providing a guide RNA, at least one polynucleotide modification template, and at least one Cas endonuclease to the regenerable plant cell, wherein the at least one Cas endonuclease introduces a double-strand break at an endogenous gene to be modified (e.g., beta-conglycinins, Gy1, Gy2, Gy3, Gy4, Gy5, Gy6, Gy7, Gy ) in the plant cell, and wherein the polynucleotide modification template generates a modified gene that encodes any of the modifications described herein; (b) obtaining a plant from the plant cell; and (c) generating a progeny plant.
[0140] Double-strand breaks induced by double-strand-break-inducing agents, such as endonucleases that cleave the phosphodiester bond within a polynucleotide chain, can induce DNA repair mechanisms, including the non-homologous end-joining pathway and homologous recombination. Endonucleases include a range of different enzymes, including restriction endonucleases (see e.g. Roberts et al., (2003) Nucleic Acids Res 1:418-20; Roberts et al., (2003) Nucleic Acids Res 31:1805-12; and Belfort et al., (2002) in Mobile DNA II, pp. 761-783, Eds. Craigie et al., (ASM Press, Washington, DC)), meganucleases (see e.g., WO 2009/114321; Gao et al. (2010) Plant Journal 1:176-187), TAL effector nucleases or TALENs (see e.g., US20110145940; Christian, M., T. Cermak, et al. 2010. Targeting DNA double-strand breaks with TAL effector nucleases. Genetics 186(2):757-61; and Boch et al., (2009), Science 326(5959):1509-12), zinc finger nucleases (see e.g. Kim, Y. G., J. Cha, et al. (1996). “Hybrid restriction enzymes: zinc finger fusions to FokI cleavage”), and CRISPR-Cas endonucleases (see e.g. WO2007/025097, published March 1, 2007).
[0141] Once a double-strand break is induced in the genome, cellular DNA repair mechanisms are activated to repair the break. There are two main DNA repair pathways. One is the nonhomologous end-joining (NHEJ) pathway (Bleuyard et al., (2006) DNA Repair 5:1-12) and the other is homology-directed repair (HDR). The structural integrity of chromosomes is typically preserved by NHEJ, but deletions, insertions, or other rearrangements (such as chromosomal translocations) are possible (Siebert and Puchta, 2002, Plant Cell 14:1121-31; Pacher et al., 2007, Genetics 175:21-9). The HDR pathway is another cellular mechanism to repair double-stranded DNA breaks and includes homologous recombination (HR) and single-strand annealing (SSA) (Lieber, 2010, Annu. Rev. Biochem. 79:181-211).
[0142] In addition to the double-strand-break-inducing agents, site-specific base conversions can also be used to engineer one or more nucleotide changes that create one or more of the modifications described herein in the genome. These include, for example, site-specific base edits mediated by C•G-to-T•A or A•T-to-G•C base editing deaminase enzymes (Gaudelli et al. “Programmable base editing of A•T to G•C in genomic DNA without DNA cleavage.” Nature (2017); Nishida et al. “Targeted nucleotide editing using hybrid prokaryotic and vertebrate adaptive immune systems.” Science 353 (6305) (2016); Komor et al. “Programmable editing of a target base in genomic DNA without double-stranded DNA cleavage.” Nature 533 (7603) (2016):420-4).
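A C•G-to-T•A base editor converts cytosines only within a narrow "editing window" of the protospacer. Komor et al. (2016) reported a window of roughly positions 4 to 8 for BE3, counting the PAM-distal end of the protospacer as position 1. The sketch below applies that window deterministically. It is a simplification that ignores per-position efficiency, treats every C in the window (including bystander Cs) as edited, and uses a made-up protospacer; the function name is hypothetical. An A•T-to-G•C editor could be modeled the same way by converting A to G.

```python
def cbe_edit(protospacer: str, window: tuple = (4, 8)) -> str:
    """Simulate a cytosine base editor on the protospacer strand.

    Every C whose 1-based position (counted from the PAM-distal 5' end)
    falls inside `window` is converted to T; all other bases are kept.
    Editing efficiency and bystander probabilities are not modeled.
    """
    start, end = window
    edited = []
    for pos, base in enumerate(protospacer.upper(), start=1):
        if base == "C" and start <= pos <= end:
            edited.append("T")
        else:
            edited.append(base)
    return "".join(edited)


# Hypothetical 20-nt protospacer: Cs at positions 4 and 5 are edited,
# while Cs at positions 3, 9, 12, 13 and 18 lie outside the window.
print(cbe_edit("GACCCTGACGTCCAGTACGA"))  # GACTTTGACGTCCAGTACGA
```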
[0143] In the methods described herein, the endogenous gene may be modified by a CRISPR associated (Cas) endonuclease, a Zn-finger nuclease-mediated system, a meganuclease-mediated system, an oligonucleobase-mediated system, or any gene modification system known to one of ordinary skill in the art.
[0144] In certain embodiments the endogenous gene is modified by a CRISPR associated (Cas) endonuclease.
[0145] Class 1 Cas systems comprise multisubunit effector complexes (Types I, III, and IV), while Class 2 systems comprise single-protein effectors (Types II, V, and VI) (Makarova et al. 2015, Nature Reviews Microbiology Vol. 13: 1-15; Zetsche et al., 2015, Cell 163, 1-13; Shmakov et al., 2015, Molecular Cell 60, 1-13; Haft et al., 2005, Computational Biology, PLoS Comput Biol 1(6): e60; and Koonin et al. 2017, Curr Opinion Microbiology 37:67-78). In Class 2 Type II systems, the Cas endonuclease acts in complex with a guide polynucleotide.
[0146] Accordingly, in certain embodiments of the methods described herein the Cas endonuclease forms a complex with a guide polynucleotide (e.g., guide polynucleotide/Cas endonuclease complex).
[0147] As used herein, the term “guide polynucleotide” relates to a polynucleotide sequence that can form a complex with a Cas endonuclease, including the Cas endonucleases described herein, and enables the Cas endonuclease to recognize, optionally bind to, and optionally cleave a DNA target site. The guide polynucleotide sequence can be an RNA sequence, a DNA sequence, or a combination thereof (an RNA-DNA combination sequence). The guide polynucleotide may further comprise a chemically modified base, such as, but not limited to, Locked Nucleic Acid (LNA), 5-methyl dC, 2,6-Diaminopurine, 2’-Fluoro A, 2’-Fluoro U, 2’-O-Methyl RNA, a phosphorothioate bond, linkage to a cholesterol molecule, linkage to a polyethylene glycol molecule, linkage to a spacer 18 (hexaethylene glycol chain) molecule, or a 5’ to 3’ covalent linkage resulting in circularization.
[0148] In certain embodiments, the Cas endonuclease forms a complex with a guide polynucleotide (e.g., gRNA) that directs the Cas endonuclease to the DNA target, enabling target recognition, binding, and cleavage by the Cas endonuclease. The guide polynucleotide (e.g., gRNA) may comprise a Cas endonuclease recognition (CER) domain that interacts with the Cas endonuclease, and a Variable Targeting (VT) domain that hybridizes to a nucleotide sequence in a target DNA. In certain embodiments, the guide polynucleotide (e.g., gRNA) comprises a CRISPR nucleotide (crNucleotide; e.g., crRNA) and a trans-activating CRISPR nucleotide (tracrNucleotide; e.g., tracrRNA) that together guide the Cas endonuclease to its DNA target. The crNucleotide (e.g., crRNA) comprises a spacer region complementary to one strand of the double-stranded DNA target and a region that base pairs with the tracrNucleotide (e.g., tracrRNA), forming a nucleotide duplex (e.g., an RNA duplex).
[0149] In certain embodiments, the gRNA is a “single guide RNA” (sgRNA) that comprises a synthetic fusion of crRNA and tracrRNA. In many systems, the Cas endonuclease-guide polynucleotide complex recognizes a short nucleotide sequence adjacent to the target sequence (protospacer), called a “protospacer adjacent motif” (PAM).
[0150] The terms “single guide RNA” and “sgRNA” are used interchangeably herein and relate to a synthetic fusion of two RNA molecules, a crRNA (CRISPR RNA) comprising a variable targeting domain (linked to a tracr mate sequence that hybridizes to a tracrRNA), fused to a tracrRNA (trans-activating CRISPR RNA). The single guide RNA can comprise a crRNA or crRNA fragment and a tracrRNA or tracrRNA fragment of the type II CRISPR/Cas system that can form a complex with a type II Cas endonuclease, wherein said guide RNA/Cas endonuclease complex can direct the Cas endonuclease to a DNA target site, enabling the Cas endonuclease to recognize, optionally bind to, and optionally nick or cleave (introduce a single or double-strand break) the DNA target site.
[0151] The nucleotide sequence linking the crNucleotide and the tracrNucleotide of a single guide polynucleotide can comprise an RNA sequence, a DNA sequence, or an RNA-DNA combination sequence. In one embodiment, the nucleotide sequence linking the crNucleotide and the tracrNucleotide of a single guide polynucleotide can be at least 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99 or 100 nucleotides in length. In one embodiment, the nucleotide sequence linking the crNucleotide and the tracrNucleotide of a single guide polynucleotide can comprise a tetraloop sequence, such as, but not limited to, a GAAA tetraloop sequence.
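To make the single-guide layout in [0150]-[0151] concrete, the following Python sketch joins a crRNA (spacer plus tracr mate) to a tracrRNA through a linker such as a GAAA tetraloop and enforces the 3-100 nucleotide linker range recited above. The function names `to_rna` and `assemble_sgrna` are hypothetical, and every sequence used is a placeholder chosen only to show the order of the parts, not a functional guide component.

```python
VALID_RNA = set("ACGU")


def to_rna(seq: str) -> str:
    """Upper-case a sequence and convert DNA T to RNA U."""
    return seq.upper().replace("T", "U")


def assemble_sgrna(spacer: str, tracr_mate: str, linker: str, tracr: str) -> str:
    """Fuse crRNA (spacer + tracr mate) to tracrRNA through a linker.

    Raises ValueError if the linker is outside 3-100 nt or if any part
    contains a character other than A, C, G, U after conversion.
    """
    if not 3 <= len(linker) <= 100:
        raise ValueError("linker must be 3 to 100 nucleotides long")
    parts = [to_rna(p) for p in (spacer, tracr_mate, linker, tracr)]
    for part in parts:
        if set(part) - VALID_RNA:
            raise ValueError(f"non-ACGU character in {part!r}")
    # 5'-[spacer / VT domain][tracr mate][linker][tracrRNA]-3'
    return "".join(parts)


# Hypothetical placeholder sequences, chosen only to show the layout.
sg = assemble_sgrna("ACGTACGTACGTACGTACGT", "AAAACCCC", "GAAA", "GGGGUUUU")
print(sg, len(sg))
```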
[0152] The terms “variable targeting domain” and “VT domain” are used interchangeably herein and include a nucleotide sequence that can hybridize (is complementary) to one strand (nucleotide sequence) of a double-stranded DNA target site. The percent complementarity between the first nucleotide sequence domain (VT domain) and the target sequence can be at least 50%, 51%, 52%, 53%, 54%, 55%, 56%, 57%, 58%, 59%, 60%, 61%, 62%, 63%, 64%, 65%, 66%, 67%, 68%, 69%, 70%, 71%, 72%, 73%, 74%, 75%, 76%, 77%, 78%, 79%, 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% or 100%. The variable targeting domain can be at least 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29 or 30 nucleotides in length. In some embodiments, the variable targeting domain comprises a contiguous stretch of 12 to 30 nucleotides. The variable targeting domain can be composed of a DNA sequence, an RNA sequence, a modified DNA sequence, a modified RNA sequence, or any combination thereof.
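The percent complementarity recited in [0152] can be illustrated with a short Python sketch. It assumes Watson-Crick pairing only (wobble pairs count as mismatches), treats U and T as equivalent, and expects the target strand written 3' to 5' so that each position faces the corresponding VT-domain base; `percent_complementarity` is a hypothetical helper name.

```python
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def percent_complementarity(vt_domain: str, target_3to5: str) -> float:
    """Percent of positions at which the VT domain Watson-Crick pairs with the target.

    vt_domain is written 5'->3'; target_3to5 is the target strand written 3'->5',
    so index i of each string faces the other. U is treated as T, and any
    non-Watson-Crick pair (including wobble pairs) counts as a mismatch.
    """
    vt = vt_domain.upper().replace("U", "T")
    tgt = target_3to5.upper().replace("U", "T")
    if len(vt) != len(tgt) or not vt:
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(COMPLEMENT.get(a) == b for a, b in zip(vt, tgt))
    return 100.0 * matches / len(vt)


vt = "ACGUACGUACGUACGUACGU"  # hypothetical 20-nt VT domain
print(percent_complementarity(vt, "TGCATGCATGCATGCATGCA"))  # every position pairs
print(percent_complementarity(vt, "TGCATGCATGCATGCATGCT"))  # last position mismatched
```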
[0153] The terms “Cas endonuclease recognition domain” and “CER domain” (of a guide polynucleotide) are used interchangeably herein and include a nucleotide sequence that interacts with a Cas endonuclease polypeptide. A CER domain comprises a (trans-acting) tracrNucleotide mate sequence followed by a tracrNucleotide sequence. The CER domain can be composed of a DNA sequence, an RNA sequence, a modified DNA sequence, a modified RNA sequence (see for example US20150059010A1, published 26 February 2015), or any combination thereof.
[0154] A “protospacer adjacent motif” (PAM) as used herein refers to a short nucleotide sequence adjacent to a target sequence (protospacer) that is recognized (targeted) by a guide polynucleotide/Cas endonuclease system described herein. In certain embodiments, the Cas endonuclease may not successfully recognize a target DNA sequence if the target DNA sequence is not adjacent to, or near, a PAM sequence. In certain embodiments, the PAM precedes the target sequence (e.g., Cas12a). In certain embodiments, the PAM follows the target sequence (e.g., S. pyogenes Cas9). The sequence and length of a PAM herein can differ depending on the Cas protein or Cas protein complex used. The PAM sequence can be of any length but is typically 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19 or 20 nucleotides long.
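The following Python sketch shows how a 3' PAM constrains target choice for S. pyogenes Cas9: it scans one strand for NGG motifs, takes the 20 nucleotides immediately 5' of each PAM as the protospacer, and reports where the blunt double-strand break would fall, 3 bp upstream of the PAM. The 20-nt spacer length and the single-strand scan are simplifying assumptions; a practical design tool would also scan the reverse complement and score off-target matches. `find_spcas9_sites` is a hypothetical name.

```python
import re


def find_spcas9_sites(seq: str, spacer_len: int = 20) -> list[tuple[str, str, int]]:
    """Find NGG PAMs on the given strand and return (protospacer, pam, cut_index).

    The protospacer is the spacer_len nucleotides immediately 5' of the PAM.
    cut_index marks the blunt double-strand break, which falls immediately
    before seq[cut_index], i.e. 3 nt upstream of the PAM.
    """
    seq = seq.upper()
    sites = []
    # Zero-width lookahead so overlapping PAMs (e.g. in "GGG") are all found.
    for match in re.finditer(r"(?=([ACGT]GG))", seq):
        pam_start = match.start()
        if pam_start < spacer_len:
            continue  # not enough upstream sequence for a full protospacer
        protospacer = seq[pam_start - spacer_len:pam_start]
        sites.append((protospacer, match.group(1), pam_start - 3))
    return sites


# Hypothetical sequence with a single TGG PAM after 20 nt of upstream sequence.
print(find_spcas9_sites("A" * 20 + "TGG" + "A" * 5))
```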
[0155] As used herein, the terms “guide polynucleotide/Cas endonuclease complex”, “guide polynucleotide/Cas endonuclease system”, “guide polynucleotide/Cas complex”, “guide polynucleotide/Cas system”, “guided Cas system”, “polynucleotide-guided endonuclease”, and “PGEN” are used interchangeably herein and refer to at least one guide polynucleotide and at least one Cas endonuclease that are capable of forming a complex, wherein said guide polynucleotide/Cas endonuclease complex can direct the Cas endonuclease to a DNA target site, enabling the Cas endonuclease to recognize, bind to, and optionally nick or cleave (introduce a single or double-strand break) the DNA target site. A guide polynucleotide/Cas endonuclease complex herein can comprise Cas protein(s) and suitable polynucleotide component(s) of any of the known CRISPR systems (Horvath and Barrangou, 2010, Science 327: 167-170; Makarova et al. 2015, Nature Reviews Microbiology Vol. 13: 1-15; Zetsche et al., 2015, Cell 163, 1-13; Shmakov et al., 2015, Molecular Cell 60, 1-13). In certain embodiments, the guide polynucleotide/Cas endonuclease complex is provided as a ribonucleoprotein (RNP), wherein the Cas endonuclease component is provided as a protein and the guide polynucleotide component is provided as RNA.
[0156] Examples of Cas endonucleases for use in the methods described herein include, but are not limited to, Cas9 and Cpf1. Cas9 (formerly referred to as Cas5, Csn1, or Csx12) is a Class 2 Type II Cas endonuclease (Makarova et al. 2015, Nature Reviews Microbiology Vol. 13: 1-15). A Cas9-gRNA complex recognizes a 3’ PAM sequence (NGG for S. pyogenes Cas9) at the target site, permitting the spacer of the guide RNA to invade the double-stranded DNA target and, if sufficient homology exists between the spacer and the protospacer, to generate a double-strand break. Cas9 endonucleases comprise RuvC and HNH domains that together produce double-strand breaks and that separately can produce single-strand breaks. For the S. pyogenes Cas9 endonuclease, the double-strand break leaves a blunt end. Cpf1 is a Class 2 Type V Cas endonuclease that comprises a RuvC nuclease domain but lacks an HNH domain (Yamano et al., 2016, Cell 165:949-962). Cpf1 endonucleases create “sticky” overhang ends.
[0157] Some uses for Cas9-gRNA systems at a genomic target site include, but are not limited to, insertions, deletions, substitutions, or modifications of one or more nucleotides at the target site; modifying or replacing nucleotide sequences of interest (such as regulatory elements); insertion of polynucleotides of interest; gene knock-out; gene knock-in; modification of splicing sites and/or introduction of alternate splicing sites; modification of nucleotide sequences encoding a protein of interest; amino acid and/or protein fusions; and gene silencing by expressing an inverted repeat of a gene of interest.
[0158] The terms “target site”, “target sequence”, “target site sequence”, “target DNA”, “target locus”, “genomic target site”, “genomic target sequence”, “genomic target locus”, and “protospacer” are used interchangeably herein and refer to a polynucleotide sequence such as, but not limited to, a nucleotide sequence on a chromosome, episome, a locus, or any other DNA molecule in the genome (including chromosomal, chloroplastic, mitochondrial DNA, plasmid DNA) of a cell, at which a guide polynucleotide/Cas endonuclease complex can recognize, bind to, and optionally nick or cleave. The target site can be an endogenous site in the genome of a cell; alternatively, the target site can be heterologous to the cell and thereby not be naturally occurring in the genome of the cell, or the target site can be found in a heterologous genomic location compared to where it occurs in nature. As used herein, the terms “endogenous target sequence” and “native target sequence” are used interchangeably to refer to a target sequence that is endogenous or native to the genome of a cell and is at the endogenous or native position of that target sequence in the genome of the cell. The terms “artificial target site” and “artificial target sequence” are used interchangeably herein and refer to a target sequence that has been introduced into the genome of a cell. Such an artificial target sequence can be identical in sequence to an endogenous or native target sequence in the genome of a cell but be located in a different position (i.e., a non-endogenous or non-native position) in the genome of a cell. The terms “altered target site”, “altered target sequence”, “modified target site”, and “modified target sequence” are used interchangeably herein and refer to a target sequence as disclosed herein that comprises at least one alteration when compared to a non-altered target sequence.
Such “alterations” include, for example: (i) replacement of at least one nucleotide, (ii) a deletion of at least one nucleotide, (iii) an insertion of at least one nucleotide, or (iv) any combination of (i) - (iii).
[0159] A “polynucleotide modification template” is also provided that comprises at least one nucleotide modification when compared to the nucleotide sequence to be edited. A nucleotide modification can be at least one nucleotide substitution, addition, deletion, or chemical alteration. Optionally, the polynucleotide modification template can further comprise homologous nucleotide sequences flanking the at least one nucleotide modification, wherein the flanking homologous nucleotide sequences provide sufficient homology to the desired nucleotide sequence to be edited.
[0160] In certain embodiments of the methods disclosed herein, a polynucleotide of interest is inserted at a target site and provided as part of a “donor DNA” molecule. As used herein, “donor DNA” is a DNA construct that comprises a polynucleotide of interest to be inserted into the target site of a Cas endonuclease. The donor DNA construct further comprises a first and a second region of homology that flank the polynucleotide of interest. The first and second regions of homology of the donor DNA share homology to a first and a second genomic region, respectively, present in or flanking the target site of the cell or organism genome. The donor DNA can be tethered to the guide polynucleotide. Tethered donor DNAs can allow for co-localizing target and donor DNA, which is useful in genome editing, gene insertion, and targeted genome regulation, and can also be useful in targeting post-mitotic cells in which the function of the endogenous HR machinery is expected to be highly diminished (Mali et al., 2013, Nature Methods Vol. 10: 957-963). The amount of homology or sequence identity shared by a target and a donor polynucleotide can vary in total length and/or in the regions that share homology.
[0161] The process for editing a genomic sequence at a Cas9-gRNA double-strand-break site with a modification template generally comprises: providing a host cell with a Cas9-gRNA complex that recognizes a target sequence in the genome of the host cell and is able to induce a double-strand-break in the genomic sequence, and at least one polynucleotide modification template comprising at least one nucleotide alteration when compared to the nucleotide sequence to be edited. The polynucleotide modification template can further comprise nucleotide sequences flanking the at least one nucleotide alteration, in which the flanking sequences are substantially homologous to the chromosomal region flanking the double-strand break. Genome editing using double-strand-break-inducing agents, such as Cas9-gRNA complexes, has been described, for example in US20150082478 published on 19 March 2015, WO2015026886 published on 26 February 2015, WO2016007347 published 14 January 2016, and WO2016025131 published on 18 February 2016.
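A polynucleotide modification template of the kind described in [0159] and [0161] can be sketched as the desired edit flanked by sequences copied from the reference on either side of it. In this hypothetical Python helper, coordinates are 0-based and half-open, and the default `arm_len=50` is an arbitrary placeholder rather than a recommended homology-arm length.

```python
def make_modification_template(reference: str, edit_start: int, edit_end: int,
                               replacement: str, arm_len: int = 50) -> str:
    """Build left arm + replacement + right arm from a reference sequence.

    reference[edit_start:edit_end] (0-based, half-open) is replaced by
    `replacement`; each arm copies arm_len nt of reference flanking the edit.
    """
    if not 0 <= edit_start <= edit_end <= len(reference):
        raise ValueError("edit coordinates out of range")
    if edit_start < arm_len or len(reference) - edit_end < arm_len:
        raise ValueError("not enough flanking sequence for the requested arms")
    left_arm = reference[edit_start - arm_len:edit_start]
    right_arm = reference[edit_end:edit_end + arm_len]
    return left_arm + replacement + right_arm


# Hypothetical reference: replace the CCC codon with ATG (Met) using 5-nt arms.
ref = "A" * 10 + "CCC" + "G" * 10
print(make_modification_template(ref, 10, 13, "ATG", arm_len=5))
```

Replacing a codon with ATG in this way would encode a methionine substitution only if the edited codon lies in frame, which the helper does not check.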
[0162] To facilitate optimal expression and nuclear localization in eukaryotic cells, the gene encoding the Cas endonuclease may be optimized as described in WO2016186953 published 24 November 2016 and then delivered into cells as DNA expression cassettes by methods known in the art. In certain embodiments, the Cas endonuclease is provided as a polypeptide. In certain embodiments, the Cas endonuclease is provided as a polynucleotide encoding a polypeptide. In certain embodiments, the guide RNA is provided as a DNA molecule encoding one or more RNA molecules. In certain embodiments, the guide RNA is provided as RNA or chemically modified RNA. In certain embodiments, the Cas endonuclease protein and guide RNA are provided as a ribonucleoprotein complex (RNP).
[0163] In certain embodiments of the inventive methods described herein the endogenous gene is modified by a zinc-finger-mediated genome editing process. The zinc-finger-mediated genome editing process for editing a chromosomal sequence includes for example: (a) introducing into a cell at least one nucleic acid encoding a zinc finger nuclease that recognizes a target sequence in the chromosomal sequence and is able to cleave a site in the chromosomal sequence, and, optionally, (i) at least one donor polynucleotide that includes a sequence for integration flanked by an upstream sequence and a downstream sequence that exhibit substantial sequence identity with either side of the cleavage site, or (ii) at least one exchange polynucleotide comprising a sequence that is substantially identical to a portion of the chromosomal sequence at the cleavage site and which further comprises at least one nucleotide change; and (b) culturing the cell to allow expression of the zinc finger nuclease such that the zinc finger nuclease introduces a double-stranded break into the chromosomal sequence, and wherein the double-stranded break is repaired by (i) a non-homologous end-joining repair process such that an inactivating mutation is introduced into the chromosomal sequence, or (ii) a homology-directed repair process such that the sequence in the donor polynucleotide is integrated into the chromosomal sequence or the sequence in the exchange polynucleotide is exchanged with the portion of the chromosomal sequence.
[0164] A zinc finger nuclease includes a DNA binding domain (i.e., zinc finger) and a cleavage domain (i.e., nuclease). The nucleic acid encoding a zinc finger nuclease may include DNA or RNA. Zinc finger binding domains may be engineered to recognize and bind to any nucleic acid sequence of choice. See, for example, Beerli et al. (2002) Nat. Biotechnol. 20: 135-141; Pabo et al. (2001) Ann. Rev. Biochem. 70:313-340; Choo et al. (2000) Curr. Opin. Struct. Biol. 10:411-416; Doyon et al. (2008) Nat. Biotechnol. 26:702-708; Santiago et al. (2008) Proc. Natl. Acad. Sci. USA 105:5809-5814; Urnov et al. (2010) Nat Rev Genet. 11(9):636-46; and Shukla et al. (2009) Nature 459 (7245):437-41. An engineered zinc finger binding domain may have a novel binding specificity compared to a naturally occurring zinc finger protein. As an example, the algorithm described in U.S. Pat. No. 6,453,242 may be used to design a zinc finger binding domain to target a preselected sequence. Nondegenerate recognition code tables may also be used to design a zinc finger binding domain to target a specific sequence (Sera et al. (2002) Biochemistry 41:7074-7081). Tools for identifying potential target sites in DNA sequences and designing zinc finger binding domains may also be used (Mandell et al. (2006) Nuc. Acid Res. 34:W516-W523; Sander et al. (2007) Nuc. Acid Res. 35:W599-W605).
[0165] An exemplary zinc finger DNA binding domain recognizes and binds a sequence having at least about 80% sequence identity with the desired target sequence. In other embodiments, the sequence identity may be about 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or 100%.
[0166] A zinc finger nuclease also includes a cleavage domain. The cleavage domain portion of the zinc finger nucleases may be obtained from any endonuclease or exonuclease. Non-limiting examples of endonucleases from which a cleavage domain may be derived include restriction endonucleases and homing endonucleases. See, for example, 2010-2011 Catalog, New England Biolabs, Beverly, Mass.; and Belfort et al. (1997) Nucleic Acids Res. 25:3379-3388. Additional enzymes that cleave DNA are known (e.g., S1 nuclease; mung bean nuclease; pancreatic DNase I; micrococcal nuclease; yeast HO endonuclease). One or more of these enzymes (or functional fragments thereof) may be used as a source of cleavage domains.
[0167] In certain embodiments of the methods described herein, the endogenous gene is modified by using “custom” meganucleases produced to modify plant genomes (see e.g., WO 2009/114321; Gao et al. (2010) Plant Journal 61:176-187). The term “meganuclease” generally refers to a naturally occurring homing endonuclease that binds double-stranded DNA at a recognition sequence that is greater than 12 base pairs and encompasses the corresponding intron insertion site. Naturally occurring meganucleases can be monomeric (e.g., I-SceI) or dimeric (e.g., I-CreI). The term meganuclease, as used herein, can refer to monomeric meganucleases, dimeric meganucleases, or the monomers that associate to form a dimeric meganuclease.
[0168] Naturally occurring meganucleases, for example, from the LAGLIDADG family, have been used to effectively promote site-specific genome modification in plants, yeast, Drosophila, mammalian cells and mice. Engineered meganucleases such as, for example, LIG-34 meganucleases, which recognize and cut a 22-base-pair DNA sequence found in the genome of Zea mays (maize), are known (see e.g., US 20110113509).
[0169] In certain embodiments of the methods described herein, the endogenous gene is modified by using TAL effector nucleases (TALENs). TAL (transcription activator-like) effectors from plant pathogenic Xanthomonas are important virulence factors that act as transcriptional activators in the plant cell nucleus, where they bind DNA directly via a central domain of tandem repeats. Transcription activator-like (TAL) effector-DNA modifying enzymes (TALEs or TALENs) are also used to engineer genetic changes. See e.g., US20110145940; Boch et al., (2009), Science 326(5959): 1509-12. Fusions of TAL effectors to the FokI nuclease provide TALENs that bind and cleave DNA at specific locations. Target specificity is determined by developing customized amino acid repeats in the TAL effectors.
[0170] In certain embodiments of the methods described herein, the endogenous gene is modified by using base editing, such as an oligonucleobase-mediated system. In addition to the double-strand break inducing agents, site-specific base conversions can also be achieved to engineer one or more nucleotide changes that create one or more EMEs described herein in the genome. These include, for example, site-specific base edits mediated by C•G to T•A or A•T to G•C base editing deaminase enzymes (Gaudelli et al., “Programmable base editing of A•T to G•C in genomic DNA without DNA cleavage.” Nature (2017); Nishida et al., “Targeted nucleotide editing using hybrid prokaryotic and vertebrate adaptive immune systems.” Science 353 (6305) (2016); Komor et al., “Programmable editing of a target base in genomic DNA without double-stranded DNA cleavage.” Nature 533 (7603) (2016):420-4). Catalytically dead Cas9 (dCas9) fused to a cytidine deaminase or an adenine deaminase protein becomes a specific base editor that can alter DNA bases without inducing a DNA break. Cytidine base editors convert C to T (or G to A on the opposite strand), and adenine base editors convert adenine to inosine, resulting in an A to G change, within an editing window specified by the gRNA.
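The base-editing outcome described in [0170] can be approximated in Python as a deterministic conversion of every target base inside an editing window of the protospacer. The default window of positions 4-8, counted from the PAM-distal end, reflects what is commonly reported for early SpCas9 cytidine base editors and is an assumption here; real editors act probabilistically, and window boundaries vary by editor. `simulate_base_edit` is a hypothetical name, and inosine is written as G because it is read as G during replication.

```python
def simulate_base_edit(protospacer: str, editor: str = "CBE",
                       window: tuple[int, int] = (4, 8)) -> str:
    """Convert every target base inside the editing window (deterministic sketch).

    Window positions are 1-based, inclusive, counted from the PAM-distal (5') end
    of the protospacer. CBE: C -> T. ABE: A -> I, written as G.
    """
    conversions = {"CBE": ("C", "T"), "ABE": ("A", "G")}
    if editor not in conversions:
        raise ValueError("editor must be 'CBE' or 'ABE'")
    start, end = window
    if start < 1 or end < start:
        raise ValueError("window must satisfy 1 <= start <= end")
    source, converted = conversions[editor]
    bases = list(protospacer.upper())
    for i in range(start - 1, min(end, len(bases))):
        if bases[i] == source:
            bases[i] = converted
    return "".join(bases)


print(simulate_base_edit("C" * 20))
# The (4, 7) window below is only an illustrative choice for an adenine editor.
print(simulate_base_edit("A" * 20, editor="ABE", window=(4, 7)))
```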
[0171] The present disclosure further provides a method of generating plants (e.g., soybean plants or pea plants) producing seeds with an increased methionine content as compared to a control seed comprising crossing a first plant (e.g., soybean plant or pea plant) comprising a polynucleotide encoding a modified glycinin polypeptide described herein with a second plant (e.g., soybean plant or pea plant) comprising a polynucleotide encoding a modified glycinin polypeptide and/or at least one modification associated with increased glycinin, increased methionine, and/or increased protein, described herein, harvesting the seed produced thereby, and generating a progeny plant. In certain embodiments, the progeny plant comprises the polynucleotide encoding the modified glycinin polypeptide of the first plant (e.g., soybean plant or pea plant) and the polynucleotide encoding the modified glycinin polypeptide and/or the at least one modification associated with increased glycinin, increased methionine, and/or increased protein present in the second plant (e.g., soybean plant or pea plant). In certain embodiments, the modified glycinin polypeptide of the first plant and the second plant comprise the same amino acid sequence. In certain embodiments, the modified glycinin polypeptide of the first plant and the second plant comprise different amino acid sequences.
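When a modified glycinin allele and a second modification are combined by crossing, as in [0171], the progeny must be screened because the modifications segregate. The following Python sketch enumerates gametes and computes progeny-class probabilities, assuming two unlinked loci, Mendelian segregation, and no selection; the allele symbols `M` (modified glycinin) and `P` (a second modification, such as a high-protein allele) and the function names are hypothetical labels.

```python
from fractions import Fraction
from itertools import product


def gametes(genotype):
    """Map each gamete (tuple of alleles) to its probability.

    genotype is a list of allele pairs, one pair per locus; loci are assumed
    to assort independently (unlinked) with Mendelian 1:1 segregation.
    """
    probs = {}
    weight = Fraction(1, 2 ** len(genotype))
    for combo in product(*genotype):
        probs[combo] = probs.get(combo, Fraction(0)) + weight
    return probs


def progeny_probability(parent1, parent2, predicate):
    """Probability that a progeny genotype (list of allele pairs) satisfies predicate."""
    total = Fraction(0)
    for g1, p1 in gametes(parent1).items():
        for g2, p2 in gametes(parent2).items():
            if predicate(list(zip(g1, g2))):
                total += p1 * p2
    return total


# Hypothetical F1: heterozygous for modified glycinin (M) and a second modification (P).
f1 = [("M", "m"), ("P", "p")]
both_homozygous = progeny_probability(
    f1, f1, lambda g: g[0] == ("M", "M") and g[1] == ("P", "P"))
carries_both = progeny_probability(
    f1, f1, lambda g: "M" in g[0] and "P" in g[1])
print(both_homozygous, carries_both)
```

Under these assumptions, selfing an F1 that is heterozygous at both loci yields 1/16 of progeny homozygous for both modifications and 9/16 carrying at least one copy of each; linked loci would give different ratios.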
[0172] The at least one modification may be introduced into the first plant and the second plant using any method described herein or known in the art. In certain embodiments, one or more of the modifications of the first and/or second plant (e.g., soybean plant or pea plant) is introduced by genome editing. In certain embodiments, one or more of the modifications of the first and/or second plant (e.g., soybean plant or pea plant) is a native modification, such as for example, a high protein QTL.
[0173] In certain embodiments, the seed of the progeny plant produced by the method comprises at least or at least about a 2%, 3%, 4%, 5%, 6%, 7%, 8%, 9%, 10%, 12%, 14%, 16%, 18%, 20%, 22%, 24%, 26%, 28%, 30%, 35%, 40%, 45%, 50%, 55%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, 100%, 110%, 120%, 130%, 140%, 150%, 200%, 250%, 300%, or 350% and less than about a 500%, 450%, 400%, 350%, 300%, 250%, 200%, 150%, 140%, 130%, 120%, 110%, 100%, 90%, 80%, 70%, 60%, or 50% increase in total methionine on a dry weight of seed basis, as compared to a seed from a control plant (e.g., seed from a plant not comprising the modified glycinin polypeptide).
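The relative increase in [0173] is computed against the control seed on a dry-weight basis. This Python sketch converts an as-is measurement to a dry basis from its moisture content and then computes the percent increase; the helper names and the numbers used are hypothetical illustrations, not data from this disclosure.

```python
def to_dry_basis(as_is_pct: float, moisture_pct: float) -> float:
    """Convert a component measured on an as-is basis to a dry-weight basis."""
    if not 0 <= moisture_pct < 100:
        raise ValueError("moisture must be in [0, 100)")
    return as_is_pct * 100.0 / (100.0 - moisture_pct)


def percent_increase(test: float, control: float) -> float:
    """Relative increase of test over control, in percent."""
    if control <= 0:
        raise ValueError("control must be positive")
    return 100.0 * (test - control) / control


# Hypothetical values: control seed at 0.54% Met as-is with 10% moisture.
control_dry = to_dry_basis(0.54, 10.0)  # about 0.60% Met on a dry basis
test_dry = 0.75                         # hypothetical modified seed, dry basis
print(round(percent_increase(test_dry, control_dry), 2))
```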
[0174] In certain embodiments, the seed of the progeny plant produced by the method comprises at least or at least about 0.7%, 0.8%, 0.9%, 1%, 1.1%, 1.2%, 1.3%, 1.4%, 1.5%, 1.6%, 1.7%, 1.8%, 1.9%, 2%, 2.1%, 2.2%, 2.3%, 2.4%, 2.5%, 2.6%, 2.7%, 2.8%, 2.9%, or 3% and less than or less than about 4%, 3.9%, 3.8%, 3.7%, 3.6%, 3.5%, 3.4%, 3.3%, 3.2%, 3.1%, 3%, 2.9%, 2.8%, 2.7%, 2.6%, 2.5%, 2.4%, 2.3%, 2.2%, 2.1%, or 2% total methionine on a dry weight basis.
[0175] In certain embodiments, the seed of the progeny plant produced by the method comprises at least or at least about a 1, 1.5, 2, 2.5, 3, 3.5, 4, 5, 10, or 15 and less than 30, 25, 20, 15, 10, 9, 8, 7, 6, or 5 percentage point increase in total protein, measured on a dry weight of seed basis, as compared to a seed from a corresponding control plant (e.g., a plant not comprising the modified glycinin polypeptide).
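Paragraph [0175] expresses the protein gain in percentage points, which differs from the relative (percent) increase used in [0173]. The short Python sketch below makes the distinction explicit; the function names and the 40% and 43% protein values are hypothetical.

```python
def percentage_point_change(test_pct: float, control_pct: float) -> float:
    """Absolute difference between two percentages, in percentage points."""
    return test_pct - control_pct


def relative_change_pct(test_pct: float, control_pct: float) -> float:
    """Relative change of test over control, in percent of the control value."""
    if control_pct <= 0:
        raise ValueError("control must be positive")
    return 100.0 * (test_pct - control_pct) / control_pct


# Hypothetical protein contents on a dry-weight basis.
control_protein, test_protein = 40.0, 43.0
print(percentage_point_change(test_protein, control_protein))  # in percentage points
print(relative_change_pct(test_protein, control_protein))      # in percent
```

With these hypothetical values, the seed gains 3 percentage points of protein, which corresponds to a 7.5% relative increase.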
[0176] In certain embodiments, the seed of the progeny plant produced by the method comprises a modified glycinin content that is greater than or within 0.5%, 1%, 1.5%, 2%, 2.5%, 3%, 3.5%, 4%, 4.5%, 5%, 10%, 15%, or 20% as compared to the glycinin content of a seed from a corresponding control plant (e.g., wild-type glycinin from a plant not comprising the modified glycinin polypeptide) on a dry weight of seed basis.
[0177] In certain embodiments, at least or at least about 1%, 2%, 3%, 4%, 5%, 6%, 7%, 8%, 9%, 10%, 12%, 14%, 16%, 18%, 20%, and less than about 30%, 28%, 26%, 24%, 22% or 20% of the total protein in the seed of the progeny plant produced by the method comprises a modified glycinin polypeptide described herein and the seed comprises a total protein content that is greater than or within 0.5%, 1%, 1.5%, 2%, 2.5%, 3%, 3.5%, 4%, 4.5%, 5%, 10%, 15%, or 20% as compared to a seed from a corresponding control plant (e.g., seed from a plant not comprising the modified glycinin polypeptide) on a dry weight of seed basis.
[0178] In certain embodiments, the method generates progeny plants having a yield that is greater than or within 0.5%, 1%, 1.5%, 2%, 2.5%, 3%, 3.5%, 4%, 4.5%, 5%, 10%, 15%, or 20% as compared to the corresponding control plant, for example, one which has a similar genetic background but lacks the introduced mutations.
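Paragraphs [0176]-[0178] repeatedly compare a trait with its control using the phrase “greater than or within X%”. One reading, assumed here, is that the test value either exceeds the control or falls below it by no more than X% of the control value. The hypothetical Python helper below encodes that reading.

```python
def greater_than_or_within(test: float, control: float, tolerance_pct: float) -> bool:
    """True if test exceeds control or is below it by at most tolerance_pct of control.

    This encodes one reading of "greater than or within X%"; it is an assumption.
    """
    if control <= 0:
        raise ValueError("control must be positive")
    if tolerance_pct < 0:
        raise ValueError("tolerance must be non-negative")
    return test >= control * (1.0 - tolerance_pct / 100.0)


# Hypothetical yields (arbitrary units) against a control of 100.
print(greater_than_or_within(96.0, 100.0, 5.0))   # 4% below control, inside a 5% window
print(greater_than_or_within(94.0, 100.0, 5.0))   # 6% below control, outside a 5% window
print(greater_than_or_within(120.0, 100.0, 0.5))  # above control
```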
[0179] Further provided are methods of plant breeding comprising crossing any of the plants (e.g., soybean plant or pea plant) described herein with a second plant to produce a progeny seed comprising a polynucleotide encoding a modified glycinin polypeptide described herein and optionally further comprising at least one modification associated with increased glycinin, increased methionine, and/or increased protein, described herein. In certain embodiments, a plant is produced from the progeny seed.
B. Method for Producing and Using a Protein Composition
[0180] Also provided herein are methods comprising producing a protein composition from the plants comprising a polynucleotide encoding a modified glycinin polypeptide and/or plants comprising a polynucleotide encoding a modified glycinin polypeptide generated from the methods described herein.
[0181] In certain embodiments, the methionine content in the protein composition (e.g., soy protein composition or pea protein composition) is greater than about 20, 25, 30, 35, 40, 45, 50, 55, 60, 65, 70, 75, 80, 85, 90, 95, 100, 125, 150, 175, or 200 mg/g protein and less than 500, 400, 300, 250, 200, 150, 100, 75, 50, or 40 mg/g protein.
[0182] In certain embodiments, the protein composition (e.g., soy protein composition or pea protein composition) has an essential amino acid content of at least 35%, 40%, 45%, 50%, 55%, 60%, 65%, 70%, 75%, 80%, or 85% and less than 95%, 90%, 85%, 80%, 75%, 70%, 65%, 60%, 55%, 50%, or 45%. In certain embodiments, the sum of the methionine and tryptophan in the protein composition (e.g., soy protein composition or pea protein composition) is greater than about 20, 25, 30, 35, 40, 45, 50, 55, 60, 65, 70, 75, 80, 85, 90, 95, 100, 125, 150, 175, or 200 mg/g protein and less than 500, 400, 300, 250, 200, 150, 100, 75, 50, or 40 mg/g protein. In certain embodiments, the sum of the methionine, lysine, threonine and tryptophan in the protein composition (e.g., soy protein composition or pea protein composition) is greater than about 20, 25, 30, 35, 40, 45, 50, 55, 60, 65, 70, 75, 80, 85, 90, 95, 100, 125, 150, 175, or 200 mg/g protein and less than 500, 400, 300, 250, 200, 150, 100, 75, 50, or 40 mg/g protein.
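The composition metrics in [0181]-[0182] can be computed from an amino acid profile. This Python sketch makes three assumptions: the profile is reported in grams per 100 g of protein (so mg/g protein is that value times 10), the essential set is the nine amino acids commonly classed as essential for humans, and essential amino acid content is expressed as a percentage of the summed profile. The function names and the profile are hypothetical placeholders, not measured soy or pea data.

```python
ESSENTIAL = {"His", "Ile", "Leu", "Lys", "Met", "Phe", "Thr", "Trp", "Val"}


def mg_per_g_protein(g_per_100g_protein: float) -> float:
    """Convert g per 100 g protein to mg per g protein (1 g/100 g = 10 mg/g)."""
    return g_per_100g_protein * 10.0


def composition_metrics(profile: dict[str, float]) -> dict[str, float]:
    """Summarize an amino acid profile given in g per 100 g protein."""
    total = sum(profile.values())
    if total <= 0:
        raise ValueError("profile must have a positive total")
    essential = sum(v for aa, v in profile.items() if aa in ESSENTIAL)

    def summed(*names: str) -> float:
        # Missing amino acids are treated as zero.
        return mg_per_g_protein(sum(profile.get(n, 0.0) for n in names))

    return {
        "met_mg_per_g": summed("Met"),
        "met_trp_mg_per_g": summed("Met", "Trp"),
        "met_lys_thr_trp_mg_per_g": summed("Met", "Lys", "Thr", "Trp"),
        "essential_pct": 100.0 * essential / total,
    }


# Hypothetical, deliberately simplified profile (not measured data).
profile = {"Met": 2.0, "Trp": 1.0, "Lys": 6.0, "Thr": 4.0, "Gly": 87.0}
print(composition_metrics(profile))
```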
[0183] In certain embodiments, the protein composition is an animal feed composition. In certain embodiments, the protein composition is a human food composition. In certain embodiments, the human food composition is a composition selected from the group consisting of legume meal; soy flour; defatted soy flour; soymilk; spray-dried soymilk; soy protein concentrate; texturized soy protein concentrate; hydrolyzed soy protein; soy protein isolate; spray-dried tofu; soy meat analog; soy cheese analog; and soy coffee creamer.
[0184] Also provided are methods for feeding animals comprising administering to an animal a feed comprising any of the protein compositions (e.g., soy protein composition or pea protein composition) described herein. In certain embodiments, the animal is a chicken or a pig. In certain embodiments, the feeding does not require a synthetic or manufactured amino acid supplement to maintain animal growth as compared with a control protein composition (e.g., soy protein composition or pea protein composition) from comparable plants (e.g., soybean or pea not comprising the modified glycinin polypeptide). In certain embodiments, the animal gains weight at a similar rate as a control animal under the same feeding regimen except that the control animal receives a feed comprising a protein composition (e.g., soy protein composition or pea protein composition) produced from commodity non-modified soybeans and receives supplementary essential amino acids in an amount sufficient to optimize weight gain in the animal.
[0185] Further provided is a method of producing a protein composition (e.g., soy protein composition or pea protein composition), the method comprising crushing the seed or seeds of the plants described herein and extracting protein (e.g., soy protein or pea protein) from the crushed seed to form the protein composition (e.g., soy protein composition or pea protein composition). In certain embodiments, the methionine content in the protein composition (e.g., soy protein composition or pea protein composition) is greater than about 20, 25, 30, 35, 40, 45, 50, 55, 60, 65, 70, 75, 80, 85, 90, 95, 100, 125, 150, 175, or 200 mg/g protein and less than 500, 400, 300, 250, 200, 150, 100, 75, 50, or 40 mg/g protein. In certain embodiments, the protein composition (e.g., soy protein composition or pea protein composition) has an essential amino acid content of at least 35%, 40%, 45%, 50%, 55%, 60%, 65%, 70%, 75%, 80%, or 85% and less than 95%, 90%, 85%, 80%, 75%, 70%, 65%, 60%, 55%, 50%, or 45%. In certain embodiments, the sum of the methionine and tryptophan in the protein composition (e.g., soy protein composition or pea protein composition) is greater than about 20, 25, 30, 35, 40, 45, 50, 55, 60, 65, 70, 75, 80, 85, 90, 95, 100, 125, 150, 175, or 200 mg/g protein and less than 500, 400, 300, 250, 200, 150, 100, 75, 50, or 40 mg/g protein. In certain embodiments, the sum of the methionine, lysine, threonine and tryptophan in the protein composition (e.g., soy protein composition or pea protein composition) is greater than about 20, 25, 30, 35, 40, 45, 50, 55, 60, 65, 70, 75, 80, 85, 90, 95, 100, 125, 150, 175, or 200 mg/g protein and less than 500, 400, 300, 250, 200, 150, 100, 75, 50, or 40 mg/g protein.
C. Methods for Generating High-Methionine Variants
[0186] Also provided herein are methods for generating or producing high-methionine seed storage protein variants, and the high-methionine seed storage proteins so made. The method comprises generating an in silico population of high-methionine seed storage protein (e.g., glycinin) variants by inputting the 3D structural coordinates and/or the primary amino acid sequence of a candidate seed storage polypeptide into an artificial intelligence model (AI model), the AI model being trained to calculate the per-residue probability of an amino acid by using encoded geometrical information of the candidate seed storage protein 3D structure and/or sequential information; calculating a predicted solubility score, a predicted stability score, a predicted aggregation propensity score, or any combination thereof for members of the in silico population; and selecting from the in silico population one or more candidate high-methionine seed storage protein variants having (i) a predicted solubility score that is at least 50%, 55%, 60%, 65%, 70%, 75%, 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% of a predicted solubility score for the candidate seed storage polypeptide, (ii) a predicted stability score that is at least 50%, 55%, 60%, 65%, 70%, 75%, 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% of a predicted stability score for the candidate seed storage polypeptide, or (iii) a predicted aggregation propensity score that is less than 70%, 65%, 60%, 55%, 50%, 45%, 40%, 35%, 30%, 20%, or 10% of a predicted aggregation propensity score for the candidate seed storage polypeptide.
[0187] As used herein, a high-methionine variant refers to a polypeptide comprising an amino acid sequence having at least 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 20, 25, 30, 35, 40, 45, 50, 55, 60, 75 or 100 or more total methionine residues. In certain embodiments, the high-methionine variant comprises 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 20, 25, 30, 35, 40, 45, 50, 55, 60, 75 or 100 or more and fewer than 150, 100, 90, 80, 70, 60, 55, 50, 45, 44, 43, 42, 41, 40, 39, 38, 37, 36, 35, 34, 33, 32, 31, or 30 total methionine residues. In certain embodiments, the high-methionine variant comprises an additional 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 20, 25, 30, 35, 40, 45, 50, 55, or 60 or more and fewer than 100, 90, 80, 70, 60, 55, 50, 45, 44, 43, 42, 41, 40, 39, 38, 37, 36, 35, 34, 33, 32, 31, or 30 methionine residues as compared to a control polypeptide (e.g., candidate seed storage polypeptide).
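Each of the three selection criteria in [0186] is expressed relative to the score of the unmodified candidate polypeptide, and the criteria are joined by "or". The sketch below shows that comparison. It assumes that every score is positive, that higher solubility and stability scores are better, and that a lower aggregation propensity score is better. The `Scores` type, the thresholds, and the example numbers are hypothetical. Real predictors whose scores can be negative (for example, Aggrescan3D) may need rescaling before ratios like these are meaningful.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Scores:
    solubility: float   # higher is better (assumed)
    stability: float    # higher is better (assumed)
    aggregation: float  # lower is better (assumed)


def passes_selection(variant: Scores, parent: Scores,
                     min_solubility_frac: float = 0.90,
                     min_stability_frac: float = 0.90,
                     max_aggregation_frac: float = 0.70,
                     require_all: bool = False) -> bool:
    """Compare a variant's predicted scores with those of the parent polypeptide.

    Mirrors criteria (i)-(iii): solubility and stability at least a fraction of the
    parent's, and aggregation propensity below a fraction of the parent's. By default
    any one criterion suffices ("or"); set require_all=True to demand all three.
    """
    checks = (
        variant.solubility >= min_solubility_frac * parent.solubility,
        variant.stability >= min_stability_frac * parent.stability,
        variant.aggregation < max_aggregation_frac * parent.aggregation,
    )
    return all(checks) if require_all else any(checks)


parent = Scores(solubility=1.0, stability=1.0, aggregation=1.0)
population = {
    "var_a": Scores(0.95, 0.92, 0.60),  # passes all three
    "var_b": Scores(0.95, 0.50, 0.90),  # passes solubility only
    "var_c": Scores(0.40, 0.50, 0.90),  # fails all three
}
selected = [name for name, s in population.items() if passes_selection(s, parent)]
strict = [name for name, s in population.items()
          if passes_selection(s, parent, require_all=True)]
print(selected)  # ['var_a', 'var_b']
print(strict)    # ['var_a']
```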
[0188] The seed storage protein for use in the method is not particularly limited and may be any seed storage protein in which increased methionine content is desired. In certain embodiments, the seed storage protein is a globulin protein, such as, for example, a 7S globulin or an 11S globulin. In certain embodiments, the seed storage protein is a glycinin, such as, for example, GY1, GY2, GY3, GY4, GY5, GY6, GY7, or GY8. In certain embodiments, the candidate seed storage protein comprises an amino acid sequence that is at least 50%, 55%, 60%, 65%, 70%, 75%, 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% identical to any one of SEQ ID NOs: 4 and 18-86. In certain embodiments, the candidate seed storage protein comprises an amino acid sequence that is at least 50%, 55%, 60%, 65%, 70%, 75%, 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% identical to amino acid positions 20-495 of any one of SEQ ID NOs: 18-86.
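Percent identity thresholds like these depend on how the two sequences are aligned and on which columns are counted. The sketch below assumes the sequences have already been aligned by an external tool, with gaps written as `-`. It counts identity over alignment columns that are not gapped in both sequences. Other conventions, such as dividing by the length of the shorter sequence, give different values. The helper that extracts a 1-based, inclusive residue range (such as positions 20-495) is likewise illustrative. The sequences shown are toy strings, not the SEQ ID NOs.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity over the columns of a pairwise alignment (assumed convention)."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    # Ignore columns where both sequences have a gap; they carry no information.
    columns = [(a, b) for a, b in zip(aligned_a, aligned_b) if not (a == gap and b == gap)]
    if not columns:
        raise ValueError("alignment has no informative columns")
    matches = sum(1 for a, b in columns if a == b and a != gap)
    return 100.0 * matches / len(columns)


def residue_range(sequence: str, start: int, end: int) -> str:
    """Return residues start..end using 1-based, inclusive numbering."""
    if start < 1 or end > len(sequence) or start > end:
        raise ValueError("range outside sequence")
    return sequence[start - 1:end]


print(percent_identity("MKTAL", "MKSAL"))  # 80.0
print(percent_identity("MK-TL", "MKATL"))  # 80.0 (the gap column counts as a mismatch)
print(residue_range("ABCDEFG", 2, 4))      # BCD
```

For any sequence at least 495 residues long, `residue_range(seq, 20, 495)` returns 476 residues.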
[0189] The 3D structural coordinates of a candidate seed storage polypeptide (e.g., glycinin) can be generated and/or selected using any method known in the art. In certain embodiments, the 3D structural coordinates are structural coordinates that have been previously determined and disclosed in the art, such as those provided in the Protein Data Bank. In certain embodiments, the 3D structural coordinates of the candidate seed storage protein are experimentally determined using structural biology techniques including, but not limited to, protein crystallography. In certain embodiments, the 3D structural coordinates in the methods for generating high-methionine variants described herein are predicted and/or modeled using in silico methods. In certain embodiments, the in silico method uses AlphaFold2 version 2.3.1 in monomer or multimer mode with relaxed model prediction. The reference database used for model generation in the methods described herein may be any reference database known in the art including, but not limited to, UniRef90 (accessible on the internet at uniprot.org), BFD (accessible on the internet at bfd.mmseqs.com), MGnify (accessible on the internet at ebi.ac.uk/metagenomics), PDB70, UniRef30 (accessible on the internet at uniclust.mmseqs.com), PDB seqres (accessible on the internet at wwpdb.org), UniProt (accessible on the internet at uniprot.org), and PDB (accessible on the internet at wwpdb.org). In certain embodiments, UniRefPO is used as the reference database.
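When the coordinates are predicted with AlphaFold2 in multimer mode, the input is a single FASTA file with one record per chain. A glycinin homotrimer is therefore written as three copies of the same sequence. The sketch below builds such a file. The record-naming scheme, line width, and placeholder sequence are assumptions for illustration. Running AlphaFold2 itself (databases and command-line flags) is outside the scope of this sketch.

```python
def homomer_fasta(name: str, sequence: str, copies: int = 3, width: int = 60) -> str:
    """Return FASTA text with `copies` identical chains, one record per chain."""
    seq = "".join(sequence.split()).upper()  # drop whitespace/newlines
    if not seq.isalpha():
        raise ValueError("sequence must contain only residue letters")
    if not 1 <= copies <= 26:
        raise ValueError("copies must be between 1 and 26 for A-Z chain labels")
    lines: list[str] = []
    for i in range(copies):
        lines.append(f">{name}_chain{chr(ord('A') + i)}")
        # Wrap the sequence at `width` characters per line.
        lines.extend(seq[j:j + width] for j in range(0, len(seq), width))
    return "\n".join(lines) + "\n"


# Placeholder sequence, not a glycinin.
print(homomer_fasta("example_trimer", "MKTAYIAKQR", copies=3), end="")
```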
[0190] Any suitable AI model may be used in the methods described herein. In certain embodiments, the AI model is a generative model (also referred to as generative AI), a large language model, or a machine learning model. Types of machine learning models include, without limitation, statistical models, such as probability models and regression models, and models involving deep learning, such as supervised, self-supervised, unsupervised, and reinforcement learning models, or combinations thereof. In certain embodiments, the machine learning model is a classification model, a regression model, a clustering model, a dimensionality reduction model, a distribution model (for example, a multivariate or univariate Gaussian distribution model), or a deep learning model. In certain embodiments, the deep learning model is part of an ensemble model. In certain embodiments, the deep learning model is an ensemble model comprising two or more models. In some embodiments, the deep learning model is a supervised learning model. The supervised learning model may be a classification or regression model. The machine learning models include support vector machines and neural networks, such as SVM-DA (support vector machines) or ANN (artificial neural networks), deep learning algorithms, and the like. In certain embodiments, the machine learning model comprises a deep learning model, such as, for example, ProteinMPNN.
[0191] In certain embodiments, the method for generating or producing high-methionine seed storage protein variants further comprises expressing one or more (e.g., 1, 2, 3, 4, 5, 10, 20, 30, 40, 50, 100, 500, 1000 or more) of the selected candidate high-methionine seed storage protein variants in a model organism, determining the solubility, the stability, or a combination thereof of the one or more candidate high-methionine seed storage protein variants in the model organism, and selecting high-methionine seed storage variants having solubility and/or stability in the model organism. In certain embodiments, candidates are selected that have an experimentally determined solubility score, stability score, or both, in the model organism that is at least 50%, 55%, 60%, 65%, 70%, 75%, 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% of an experimentally determined solubility and/or stability score in the model organism for the candidate seed storage polypeptide. In certain embodiments, candidates are selected that have an experimentally determined solubility score, stability score, or both, in the model organism that is at least 50%, 55%, 60%, 65%, 70%, 75%, 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% of the predicted solubility and/or stability score for the candidate seed storage polypeptide.
[0192] Any suitable model organism in which the generated variant polypeptides can be expressed may be used in the methods described herein. In certain embodiments, the model organism is a plant model organism, such as, for example, Arabidopsis, N. benthamiana, or a legume. In certain embodiments, the model organism is a cell culture system, including, but not limited to, plant cell culture, insect cell culture (e.g., Sf9 and Sf21 cells), bacterial cell culture (e.g., E. coli), yeast cell culture, or mammalian cell culture. In certain embodiments, the model organism is E. coli.
[0193] Also provided is a method for increasing seed methionine content in a plant, comprising expressing in a plant one or more of the high-methionine seed storage protein variants selected in methods described herein. In certain embodiments, the seed of the plant produced by the method comprises at least or at least about a 2%, 3%, 4%, 5%, 6%, 7%, 8%, 9%, 10%, 12%, 14%, 16%, 18%, 20%, 22%, 24%, 26%, 28%, 30%, 35%, 40%, 45%, 50%, 55%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, 100%, 110%, 120%, 130%, 140%, 150%, 200%, 250%, 300%, or 350% and less than about a 500%, 450%, 400%, 350%, 300%, 250%, 200%, 150%, 140%, 130%, 120%, 110%, 100%, 90%, 80%, 70%, 60%, or 50% increase in total methionine on a dry weight of seed basis, as compared to a seed from a control plant (e.g., seed from a plant not comprising the high-methionine seed storage protein variant). In certain embodiments, the seed of the plant produced by the method comprises at least or at least about 0.7%, 0.8%, 0.9%, 1%, 1.1%, 1.2%, 1.3%, 1.4%, 1.5%, 1.6%, 1.7%, 1.8%, 1.9%, 2%, 2.1%, 2.2%, 2.3%, 2.4%, 2.5%, 2.6%, 2.7%, 2.8%, 2.9%, or 3% and less than or less than about 4%, 3.9%, 3.8%, 3.7%, 3.6%, 3.5%, 3.4%, 3.3%, 3.2%, 3.1%, 3%, 2.9%, 2.8%, 2.7%, 2.6%, 2.5%, 2.4%, 2.3%, 2.2%, 2.1%, or 2% total methionine on a dry weight basis. In certain embodiments, the seed of the plant produced by the method comprises at least or at least about a 1, 1.5, 2, 2.5, 3, 3.5, 4, 5, 10, or 15 and less than 30, 25, 20, 15, 10, 9, 8, 7, 6, or 5 percentage point increase in total protein measured on a dry weight basis, as compared to a seed from a corresponding control plant (e.g., a plant not comprising the high-methionine seed storage protein variant) on a dry weight of seed basis.
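This paragraph uses two kinds of increase: a relative percent increase in methionine and a percentage-point increase in total protein. The short worked sketch below shows the difference, using hypothetical dry-weight values rather than data from this disclosure.

```python
def percent_increase(variant: float, control: float) -> float:
    """Relative increase of variant over control, in percent."""
    if control <= 0:
        raise ValueError("control value must be positive")
    return (variant - control) / control * 100.0


def percentage_point_increase(variant_pct: float, control_pct: float) -> float:
    """Absolute difference between two percentages, in percentage points."""
    return variant_pct - control_pct


# Hypothetical seed values on a dry-weight basis (percent of seed mass).
met_control, met_variant = 0.60, 0.75
protein_control, protein_variant = 40.0, 42.0

print(round(percent_increase(met_variant, met_control), 1))         # 25.0
print(percentage_point_increase(protein_variant, protein_control))  # 2.0
print(round(percent_increase(protein_variant, protein_control), 1)) # 5.0
```

In this example, a 2-point gain on a 40% protein base is only a 5% relative increase. The two kinds of ranges in the paragraph are therefore not interchangeable.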
[0194] In certain embodiments, the seed of the plant produced by the method comprises a high-methionine seed storage protein variant content that is greater than or within 0.5%, 1%, 1.5%, 2%, 2.5%, 3%, 3.5%, 4%, 4.5%, 5%, 10%, 15%, or 20% as compared to the glycinin content of a seed from a corresponding control plant (e.g., wild-type glycinin from a plant not comprising the high-methionine seed storage protein variant) on a dry weight of seed basis.
[0195] In certain embodiments, at least or at least about 1%, 2%, 3%, 4%, 5%, 6%, 7%, 8%, 9%, 10%, 12%, 14%, 16%, 18%, or 20%, and less than about 30%, 28%, 26%, 24%, 22%, or 20%, of the total protein in the seed of the plant produced by the method comprises the high-methionine seed storage protein variant, and the seed comprises a total protein content that is greater than or within 0.5%, 1%, 1.5%, 2%, 2.5%, 3%, 3.5%, 4%, 4.5%, 5%, 10%, 15%, or 20% as compared to a seed from a corresponding control plant (e.g., seed from a plant not comprising the high-methionine seed storage protein variant) on a dry weight of seed basis.
[0196] In certain embodiments, the method generates plants having a yield that is greater than or within 0.5%, 1%, 1.5%, 2%, 2.5%, 3%, 3.5%, 4%, 4.5%, 5%, 10%, 15%, or 20% as compared to the corresponding control plant, for example, one which has a similar genetic background but lacks the high-methionine seed storage protein variant.
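Paragraphs [0194]-[0196] repeatedly say that a value is "greater than or within X% as compared to" a control. This sketch assumes one reading of that phrase: the value either exceeds the control or falls short of it by no more than X% of the control value. The function name and example numbers are hypothetical.

```python
def greater_than_or_within(value: float, control: float, tolerance_pct: float) -> bool:
    """True if value > control, or value is at most tolerance_pct% below control."""
    if control <= 0:
        raise ValueError("control value must be positive")
    if tolerance_pct < 0:
        raise ValueError("tolerance must be non-negative")
    # Any value at or above this bound is either above the control or within tolerance.
    lower_bound = control * (1.0 - tolerance_pct / 100.0)
    return value >= lower_bound


# Hypothetical yields relative to a control of 100 units.
print(greater_than_or_within(97.0, 100.0, 5.0))   # True  (3% below)
print(greater_than_or_within(94.0, 100.0, 5.0))   # False (6% below)
print(greater_than_or_within(110.0, 100.0, 0.5))  # True  (above control)
```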
Methods for Generating High-Essential Amino Acid Variants
[0197] Also provided herein are methods for generating or producing high-essential amino acid (e.g., arginine, histidine, isoleucine, leucine, lysine, methionine, phenylalanine, threonine, tryptophan, valine) seed storage protein variants, and the high-essential amino acid seed storage protein variants so made. The method comprises generating an in silico population of high-essential amino acid seed storage protein (e.g., glycinin) variants by inputting the 3D structural coordinates and/or the primary amino acid sequence of a candidate seed storage polypeptide into an artificial intelligence model (AI model), the AI model being trained to calculate the per-residue probability of an amino acid by using encoded geometrical information of the candidate seed storage protein 3D structure and/or sequential information; calculating a predicted solubility score, a predicted stability score, a predicted aggregation propensity score, or any combination thereof for members of the in silico population; and selecting from the in silico population one or more candidate high-essential amino acid seed storage protein variants having (i) a predicted solubility score that is at least 50%, 55%, 60%, 65%, 70%, 75%, 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% of a predicted solubility score for the candidate seed storage polypeptide, (ii) a predicted stability score that is at least 50%, 55%, 60%, 65%, 70%, 75%, 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% of a predicted stability score for the candidate seed storage polypeptide, or (iii) a predicted aggregation propensity score that is less than 70%, 65%, 60%, 55%, 50%, 45%, 40%, 35%, 30%, 20%, or 10% of a predicted aggregation propensity score for the candidate seed storage polypeptide.
[0198] As used herein, a high-essential amino acid variant refers to a polypeptide comprising an amino acid sequence having at least 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 20, 25, 30, 35, 40, 45, 50, 55, 60, 75 or 100 or more total arginine, histidine, isoleucine, leucine, lysine, methionine, phenylalanine, threonine, tryptophan, or valine residues, or any combination thereof. In certain embodiments, the high-essential amino acid variant comprises 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 20, 25, 30, 35, 40, 45, 50, 55, 60, 75 or 100 or more and fewer than 150, 100, 90, 80, 70, 60, 55, 50, 45, 44, 43, 42, 41, 40, 39, 38, 37, 36, 35, 34, 33, 32, 31, or 30 total arginine, histidine, isoleucine, leucine, lysine, methionine, phenylalanine, threonine, tryptophan, or valine residues, or any combination thereof. In certain embodiments, the high-essential amino acid variant comprises an additional 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 20, 25, 30, 35, 40, 45, 50, 55, or 60 or more and fewer than 100, 90, 80, 70, 60, 55, 50, 45, 44, 43, 42, 41, 40, 39, 38, 37, 36, 35, 34, 33, 32, 31, or 30 arginine, histidine, isoleucine, leucine, lysine, methionine, phenylalanine, threonine, tryptophan, or valine residues, or any combination thereof, as compared to a control polypeptide (e.g., candidate seed storage polypeptide). In certain embodiments, the high-essential amino acid variant comprises an amino acid sequence having at least 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 20, 25, 30, 35, 40, 45, 50, 55, 60, 75 or 100 or more total methionine, lysine, tryptophan or threonine residues, or any combination thereof.
In certain embodiments, the high-essential amino acid variant comprises 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 20, 25, 30, 35, 40, 45, 50, 55, 60, 75 or 100 or more and fewer than 150, 100, 90, 80, 70, 60, 55, 50, 45, 44, 43, 42, 41, 40, 39, 38, 37, 36, 35, 34, 33, 32, 31, or 30 total methionine, lysine, tryptophan or threonine residues, or any combination thereof. In certain embodiments, the high-essential amino acid variant comprises an additional 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 20, 25, 30, 35, 40, 45, 50, 55, or 60 or more and fewer than 100, 90, 80, 70, 60, 55, 50, 45, 44, 43, 42, 41, 40, 39, 38, 37, 36, 35, 34, 33, 32, 31, or 30 methionine, lysine, tryptophan or threonine residues, or any combination thereof, as compared to a control polypeptide (e.g., candidate seed storage polypeptide).
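The variant definitions in [0198], like the methionine-only definition in [0187], come down to counting residues from a chosen set and comparing that count with the control polypeptide. The sketch below uses one-letter codes (M, K, W, T for methionine, lysine, tryptophan, and threonine). The sequences are toy strings, not the disclosed SEQ ID NOs.

```python
ESSENTIAL_SET = "RHILKMFTWV"  # the ten residues listed in [0197]
MKWT_SET = "MKWT"             # methionine, lysine, tryptophan, threonine


def count_residues(sequence: str, residues: str) -> int:
    """Count positions in `sequence` whose residue is in `residues` (one-letter codes)."""
    targets = set(residues.upper())
    return sum(1 for aa in sequence.upper() if aa in targets)


def additional_residues(variant: str, control: str, residues: str) -> int:
    """Number of extra target residues in the variant relative to the control."""
    return count_residues(variant, residues) - count_residues(control, residues)


control = "AEKDLSTQ"  # toy control sequence
variant = "AMKMLSTM"  # toy variant carrying E2M, D4M and Q8M substitutions
print(count_residues(variant, "M"))                # 3
print(additional_residues(variant, control, "M"))  # 3
print(count_residues(variant, MKWT_SET))           # 5
```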
[0199] The seed storage protein for use in the method is not particularly limited and may be any seed storage protein in which increased essential amino acid content is desired. In certain embodiments, the seed storage protein is a globulin protein, such as, for example, a 7S globulin or an 11S globulin. In certain embodiments, the seed storage protein is a glycinin, such as, for example, GY1, GY2, GY3, GY4, GY5, GY6, GY7, or GY8. In certain embodiments, the candidate seed storage protein comprises an amino acid sequence that is at least 50%, 55%, 60%, 65%, 70%, 75%, 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% identical to any one of SEQ ID NOs: 4 and 18-86. In certain embodiments, the candidate seed storage protein comprises an amino acid sequence that is at least 50%, 55%, 60%, 65%, 70%, 75%, 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% identical to amino acid positions 20-495 of any one of SEQ ID NOs: 18-86.
[0200] The 3D structural coordinates of a candidate seed storage polypeptide (e.g., glycinin) can be generated and/or selected using any method known in the art or described herein.
Additionally, any suitable AI model known in the art or described herein may be used in the method.
[0201] In certain embodiments, the method for generating or producing high-essential amino acid seed storage protein variants further comprises expressing one or more (e.g., 1, 2, 3, 4, 5, 10, 20, 30, 40, 50, 100, 500, 1000 or more) of the selected candidate high-essential amino acid seed storage protein variants in a model organism, determining the solubility, the stability, or a combination thereof of the one or more candidate high-essential amino acid seed storage protein variants in the model organism, and selecting high-essential amino acid seed storage variants having solubility and/or stability in the model organism. In certain embodiments, candidates are selected that have an experimentally determined solubility score, stability score, or both, in the model organism that is at least 50%, 55%, 60%, 65%, 70%, 75%, 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% of an experimentally determined solubility and/or stability score in the model organism for the candidate seed storage polypeptide. In certain embodiments, candidates are selected that have an experimentally determined solubility score, stability score, or both, in the model organism that is at least 50%, 55%, 60%, 65%, 70%, 75%, 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% of the predicted solubility and/or stability score for the candidate seed storage polypeptide.
[0202] Any suitable model organism known in the art or described herein in which the generated variant polypeptides can be expressed may be used in the method.
[0203] Also provided is a method for increasing seed essential amino acid content in a plant, comprising expressing in a plant one or more of the high-essential amino acid seed storage protein variants selected in methods described herein. In certain embodiments, the seed of the plant produced by the method comprises at least or at least about a 2%, 3%, 4%, 5%, 6%, 7%, 8%, 9%, 10%, 12%, 14%, 16%, 18%, 20%, 22%, 24%, 26%, 28%, 30%, 35%, 40%, 45%, 50%, 55%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, 100%, 110%, 120%, 130%, 140%, 150%, 200%, 250%, 300%, 350%, or 400% and less than about a 750%, 700%, 650%, 600%, 550%, 500%, 450%, 400%, 350%, 300%, 250%, 200%, 150%, 140%, 130%, 120%, 110%, 100%, 90%, 80%, 70%, 60%, or 50% increase in essential amino acid content on a dry weight of seed basis, as compared to a seed from a control plant (e.g., seed from a plant not comprising the high-essential amino acid seed storage protein variant). In certain embodiments, the seed of the plant produced by the method comprises at least or at least about a 2%, 3%, 4%, 5%, 6%, 7%, 8%, 9%, 10%, 12%, 14%, 16%, 18%, 20%, 22%, 24%, 26%, 28%, 30%, 35%, 40%, 45%, 50%, 55%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, 100%, 110%, 120%, 130%, 140%, 150%, 200%, 250%, 300%, 350%, or 400% and less than about a 750%, 700%, 650%, 600%, 550%, 500%, 450%, 400%, 350%, 300%, 250%, 200%, 150%, 140%, 130%, 120%, 110%, 100%, 90%, 80%, 70%, 60%, or 50% increase in methionine, lysine, tryptophan, threonine, or any combination thereof on a dry weight of seed basis, as compared to a seed from a control plant (e.g., seed from a plant not comprising the high-essential amino acid seed storage protein variant).
In certain embodiments, the seed of the plant produced by the method comprises at least or at least about a 1, 1.5, 2, 2.5, 3, 3.5, 4, 5, 10, or 15 and less than 30, 25, 20, 15, 10, 9, 8, 7, 6, or 5 percentage point increase in total protein measured on a dry weight basis, as compared to a seed from a corresponding control plant (e.g., a plant not comprising the high-essential amino acid seed storage protein variant) on a dry weight of seed basis.
[0204] In certain embodiments, the seed of the plant produced by the method comprises a high-essential amino acid seed storage protein variant content that is greater than or within 0.5%, 1%, 1.5%, 2%, 2.5%, 3%, 3.5%, 4%, 4.5%, 5%, 10%, 15%, or 20% as compared to the glycinin content of a seed from a corresponding control plant (e.g., wild-type glycinin from a plant not comprising the high-essential amino acid seed storage protein variant) on a dry weight of seed basis.
[0205] In certain embodiments, at least or at least about 1%, 2%, 3%, 4%, 5%, 6%, 7%, 8%, 9%, 10%, 12%, 14%, 16%, 18%, 20%, and less than about 30%, 28%, 26%, 24%, 22% or 20% of the total protein in the seed of the plant produced by the method comprises the high-essential amino acid seed storage protein variant and the seed comprises a total protein content that is greater than or within 0.5%, 1%, 1.5%, 2%, 2.5%, 3%, 3.5%, 4%, 4.5%, 5%, 10%, 15%, or 20% as compared to a seed from a corresponding control plant (e.g., seed from a plant not comprising high-essential amino acid seed storage protein variant) on a dry weight of seed basis.
[0206] In certain embodiments, the method generates plants having a yield that is greater than or within 0.5%, 1%, 1.5%, 2%, 2.5%, 3%, 3.5%, 4%, 4.5%, 5%, 10%, 15%, or 20% as compared to the corresponding control plant, for example, one which has a similar genetic background but lacks the high-essential amino acid seed storage protein variant.
[0207] The following are examples of specific embodiments of some aspects of the invention. The examples are offered for illustrative purposes only and are not intended to limit the scope of the invention in any way.
EXAMPLE 1
[0208] This example demonstrates the design of high-methionine glycinin variants.
[0209] The crystal structure of the proglycinin 1 homotrimer revealed that each monomer of the trimer contains two beta-barrels (Adachi et al., 2001, J Mol Biol 305:291-305). Using this three-dimensional structure, a set of four soybean GY1 variants was designed that contain 43, 35, 25, or 17 met substitutions (subs), located mostly inside the beta-barrels (Set 1 variants of Table 2), to avoid altering the trimeric/hexameric packing integrity and general surface properties of the protein. Although most of these substitutions replaced charged or otherwise hydrophilic amino acids with the more hydrophobic methionine, multiple methionine residues, whose linear (non-branched) hydrophobic side chains are clustered together inside the beta-barrels, were predicted to attract each other and, owing to their flexibility, to pack well with themselves and/or with other hydrophobic residues in the core. These substitutions therefore may not be destabilizing. GY1_ALT1 was the most modified variant of Set 1, with 43 methionine substitutions. GY1_ALT2 contained 35 methionine substitutions, GY1_ALT3 contained 25 methionine substitutions, and GY1_ALT4 was the least modified variant, with only 17 methionine substitutions.
[0210] The GY1_ALT4 design was based on a specific glycinin structural feature. Glycinin belongs to a large superfamily whose conserved structural core is a double-stranded beta helix (DSBH) barrel. This superfamily includes catalytic enzymes, such as metal-dependent oxalate decarboxylase, mannose-6-phosphate isomerase, and α-ketoglutarate dehydrogenase, as well as non-enzymatic seed storage proteins. In the enzymatic members, the active site is located inside the barrel and consists of residues that chelate the cofactor metal cation and bind the substrate. Glycinin has no catalytic function, but many of the corresponding pseudo-catalytic residues inside the DSBH barrel remain hydrophilic/apolar in nature. In GY1_ALT4, all 17 of these pseudo-catalytic residues were chosen for methionine substitution.
[0211] GY1_ALT4 was then used as a starting scaffold to engineer an additional 27 high-met variants (Set 2 variants of Table 2). GY1_ALT4 was chosen as the scaffold for Set 2 because, following expression as the proglycinin form in E. coli as described in Example 2, it was more soluble and more stable than the wild-type protein.
[0212] The GY1_ALT4_5 variant of Set 2 was then used as a starting scaffold to engineer an additional 20 high-met variants (Set 3 variants of Table 2). GY1_ALT4_5 was chosen as the scaffold for Set 3 because of the solubility observed following its expression as the proglycinin form in E. coli, as described in Example 2. Some of the Set 3 variants contained methionine substitutions located outside the beta-barrels. For example, GY1_ALT4_38 had methionine substitutions at the barrel-barrel interface. GY1_ALT4_39 had some methionine substitutions in the trimeric interface, sometimes referred to as the “donut hole” at the center of the trimer. GY1_ALT4_40 and GY1_ALT4_47 had methionine substitutions in a surface loop on the face of the trimer opposite the face involved in hexamer formation. GY1_ALT4_41 had numerous methionine substitutions in a flexible loop. GY1_ALT4_44 had methionine substitutions at three surface hydrophobic amino acids, one of which was also included in GY1_ALT4_47.
[0213] As an alternative to rational design, an artificial intelligence (AI) approach using the ProteinMPNN program (Dauparas et al., 2022, Science 378:49-56) with a greedy search algorithm was used to engineer further high-met variants (Set 4 and Set 5 variants of Table 2). GY1_ALT4 was used as the scaffold for the Set 4 and Set 5 variants. ProteinMPNN is a message passing neural network (MPNN) with 3 encoder and 3 decoder layers (128 hidden dimensions) that predicts protein sequences in an order-agnostic autoregressive manner from the backbone features of an input protein 3D structure. The network was trained on diverse protein structures and sequences from the PDB and CATH. During inference, ProteinMPNN calculates the per-residue probability of each amino acid using encoded geometrical information of the input protein 3D structure and sequential information of neighboring residues. Among sequences designed from these per-residue probabilities, those with a low ProteinMPNN score (the negative average log per-residue probability of the amino acids) are predicted to fold into a structure identical to the input structure and to have appropriate properties, such as stability and solubility in E. coli.
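As described above, the ProteinMPNN score is the negative mean of the log per-residue probabilities of the residues actually present in the sequence. Lower scores therefore indicate sequences the model considers more compatible with the input backbone. The sketch below reproduces only that arithmetic and assumes the model has already produced the per-position probability distributions. It does not call ProteinMPNN, whose own implementation works on masked log-softmax outputs. The use of the natural logarithm and the toy probabilities are assumptions for illustration.

```python
import math


def mpnn_style_score(sequence: str, probs: list[dict[str, float]]) -> float:
    """Negative average log probability of each residue in `sequence`.

    probs[i] maps one-letter amino acid codes to the model's probability at position i.
    """
    if len(sequence) != len(probs):
        raise ValueError("one probability distribution per position is required")
    log_probs = [math.log(probs[i][aa]) for i, aa in enumerate(sequence)]
    return -sum(log_probs) / len(log_probs)


# Toy two-position distributions.
probs = [{"M": 0.5, "K": 0.5}, {"M": 0.2, "K": 0.8}]
print(round(mpnn_style_score("MK", probs), 3))  # 0.458
print(round(mpnn_style_score("MM", probs), 3))  # 1.151
```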
[0214] To find optimal combinations of methionine (or other amino acid, such as, for example, cysteine, lysine, tryptophan, or threonine) substitutions using ProteinMPNN with a greedy search algorithm, a 3D structure model of the GY1_ALT4 proglycinin trimer was generated with AlphaFold2-multimer and used for encoding geometrical information by ProteinMPNN. The ProteinMPNN scores of all possible single methionine substitutions of the GY1_ALT4 sequence (the initial scaffold sequence) were then calculated, and the substitution that lowered the ProteinMPNN score the most was introduced. This new sequence was used as the scaffold sequence for the next round of the greedy search, which again identified the best single methionine substitution. This greedy search (search and substitution process) was performed iteratively until the targeted number of methionine residues had been introduced. For encoding geometrical information and calculating the ProteinMPNN score, the original ProteinMPNN parameters were used (v_48_020, the version with 48 edges and 0.20 Å training noise), and the same sequence changes were introduced into all monomers of the trimer to maintain symmetry. Based on structural and knowledge-based investigation, the following were omitted from the greedy search: residues with poor structural model quality (loops longer than 5 amino acids with pLDDT lower than 70), residues possibly involved in the trimer-trimer interactions that form the hexamer, and methionine substitutions that might alter trimer stability. One hundred trials introducing 17 additional methionine substitutions and another 100 trials introducing 27 additional methionine substitutions into GY1_ALT4 were performed, and the results were ranked by the ProteinMPNN score of the final sequence.
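The greedy search loop described above can be sketched as follows. This is a minimal illustration, not the authors' code. `score_fn` stands in for a call that returns the ProteinMPNN score of a sequence; for the trimer, the text specifies that the same change is applied to every chain before scoring. `allowed` stands in for the positions that remain after excluding poorly modeled loops and interface residues. Positions are 0-based. The toy scoring function in the example is hypothetical, and the sketch performs a single deterministic run rather than the 100 trials described.

```python
import math
from typing import Callable, Iterable


def greedy_substitution(scaffold: str, n_subs: int, score_fn: Callable[[str], float],
                        allowed: Iterable[int], new_aa: str = "M"):
    """Introduce n_subs substitutions one at a time, each round keeping the single
    substitution that gives the lowest score. Returns (sequence, history)."""
    seq = list(scaffold)
    positions = list(allowed)
    history: list[tuple[int, float]] = []
    for _ in range(n_subs):
        best_pos, best_score = None, math.inf
        for pos in positions:
            if seq[pos] == new_aa:
                continue  # already substituted, or natively the target residue
            trial = seq.copy()
            trial[pos] = new_aa
            s = score_fn("".join(trial))
            if s < best_score:
                best_pos, best_score = pos, s
        if best_pos is None:
            break  # no substitutable positions remain
        seq[best_pos] = new_aa  # accept the best single substitution; it seeds the next round
        history.append((best_pos, best_score))
    return "".join(seq), history


def toy_score(seq: str) -> float:
    """Hypothetical stand-in for a ProteinMPNN score (lower is better)."""
    weights = {0: 0.5, 2: -2.0, 4: -1.0}
    return sum(w for pos, w in weights.items() if seq[pos] == "M")


result, history = greedy_substitution("AAAAA", 2, toy_score, allowed=range(5))
print(result)   # AAMAM
print(history)  # [(2, -2.0), (4, -3.0)]
```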
[0215] After ranking, Aggrescan3D (Kuriata et al., 2019, Nucleic Acids Research 47:W300-W307) and ESMFold-based in silico melting (Hermosilla et al., 2023, bioRxiv 2023.06.06.543955) were used as supplementary metrics to avoid methionine substitutions that drastically decrease stability and solubility. Briefly, Aggrescan3D calculates the aggregation propensity of each residue from the physicochemical properties of the amino acid and structural information. ESMFold-based in silico melting predicts protein stability by evaluating the distribution of favorable contacts. Variants were removed if their maximal Aggrescan3D score was higher than 2.5, or if the inflection point of their ESMFold-based in silico melting curve (Succso) was lower than 0.4454 (the value observed for an experimentally verified negative variant).
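The two post-ranking filters can be expressed as simple cutoffs applied to the ranked list. The sketch assumes that each variant record already carries its maximal Aggrescan3D score and the inflection point of its ESMFold-based melting curve; computing those values is not shown. The record fields and example values are hypothetical. The cutoffs 2.5 and 0.4454 are the ones stated in the text.

```python
A3D_MAX_CUTOFF = 2.5             # remove if maximal Aggrescan3D score > 2.5
MELT_INFLECTION_CUTOFF = 0.4454  # remove if melting-curve inflection point < 0.4454


def keep_variant(max_a3d: float, melt_inflection: float) -> bool:
    """True if the variant passes both filters."""
    return max_a3d <= A3D_MAX_CUTOFF and melt_inflection >= MELT_INFLECTION_CUTOFF


def filter_ranked(variants: list[dict]) -> list[dict]:
    """Drop variants that fail either filter while preserving the input rank order."""
    return [v for v in variants if keep_variant(v["max_a3d"], v["melt_inflection"])]


ranked = [  # hypothetical values, already sorted by ProteinMPNN score
    {"name": "cand_1", "max_a3d": 1.8, "melt_inflection": 0.52},
    {"name": "cand_2", "max_a3d": 2.9, "melt_inflection": 0.55},    # fails Aggrescan3D
    {"name": "cand_3", "max_a3d": 2.1, "melt_inflection": 0.41},    # fails melting
    {"name": "cand_4", "max_a3d": 2.5, "melt_inflection": 0.4454},  # exactly at both cutoffs
]
print([v["name"] for v in filter_ranked(ranked)])  # ['cand_1', 'cand_4']
```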
[0216] Table 3 lists the top-ranked variants from the final ranked lists for 17 methionine substitutions (GY1 AI 1-8) and 27 methionine substitutions (GY1 AI 9-18).
[0217] Structural models were generated for the glycinin variants using AlphaFold2 (version 2.3.1) and compared with the structural model for wild-type GY1 (Fig. 7A). The TM-scores measuring the structural similarity of the glycinin variants to wild-type GY1 are provided in Table 2.
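TM-score normalizes the distances between aligned residues by a length-dependent scale d0, giving a value between 0 and 1, where 1 indicates identical structures. The sketch below applies the standard formula, TM = (1/L) Σ 1/(1 + (d_i/d0)^2) with d0 = 1.24 (L − 15)^(1/3) − 1.8, to a single, already-chosen superposition. The full TM-score also maximizes over superpositions, which is omitted here. The 0.5 Å floor on d0 for short chains is an assumption intended to mirror common implementations.

```python
def d0(length: int) -> float:
    """Length-dependent distance scale (in Å) used by TM-score."""
    if length <= 15:
        return 0.5  # assumed floor for very short chains
    return max(0.5, 1.24 * (length - 15) ** (1.0 / 3.0) - 1.8)


def tm_score_fixed(distances: list[float], target_length: int) -> float:
    """TM-score for one superposition.

    distances: Å distances between aligned residue pairs after superposition.
    target_length: length of the reference structure used for normalization.
    """
    if target_length <= 0:
        raise ValueError("target length must be positive")
    scale = d0(target_length)
    return sum(1.0 / (1.0 + (d / scale) ** 2) for d in distances) / target_length


print(tm_score_fixed([0.0] * 100, 100))                 # 1.0
print(round(tm_score_fixed([d0(100)] * 100, 100), 3))   # 0.5
```

Residues without an aligned partner contribute nothing to the sum. A partial alignment therefore lowers the score even when the aligned residues superimpose perfectly.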
[0218] This method can be applied to other proteins, in addition to the glycinin family, to alter amino acid composition while maintaining or improving stability and solubility. The substitutions presented in Table 1 could be made at the corresponding positions of other glycinin family members, such as GY2, GY3, GY4, GY5, or GY7 (Figs. 5A-5B), or in other cupin superfamily members with similar protein folds, such as the conglycinin family. Methods as described in Example 1 (e.g., ProteinMPNN) could also be used to increase the percentage of other nutritionally essential amino acids, such as tryptophan, lysine, and threonine.
Table 2: Description of amino acid substitutions (subs) in soybean GY1 high-met variants.
*Numbering of the methionine substitutions is according to the proglycinin 1 sequence (SEQ ID NO: 4).
*Adding 19 to these position numbers gives the corresponding positions in the full-length preproglycinin sequences, which are provided in parentheses for each variant.
Table 3: Computational data of soybean GY1 high-met variants designed by the AI approach.
EXAMPLE 2
[0219] This example demonstrates the solubility of high-methionine proglycinin 1 variants expressed in E. coli.
[0220] To determine whether the multiple methionine substitutions affected structure or stability of the modified glycinin variants, the solubility of the high-methionine variants of Table 2 was determined following expression in E. coli.
[0221] The coding sequences (CDS) of the GY1 variants of Table 2 were expressed in the proglycinin form in E. coli strain BL21-CodonPlus (DE3)-RIPL (Agilent Technologies). The public GY1 CDS sequence for soybean cultivar Williams was obtained from SoyBase and was identical to the CDS of NCBI accession M36686. The 19-amino acid signal peptide was excluded, a non-native methionine (F1M) served as the start codon, and a C-terminal fusion comprising a five-glycine linker and a six-histidine tag was included to facilitate purification by immobilized metal affinity chromatography. These proteins were not expected to form glycinin hexamers, because E. coli lacks the soybean vacuolar processing enzyme that cleaves proglycinin to allow hexameric glycinin formation. Cultures were grown with the appropriate antibiotics in 2xYT medium at 37°C until an absorbance at 600 nm of about 0.4 to 0.6 was reached.
Expression was then induced with 0.1 mM IPTG, and the cultures were grown overnight at 18°C prior to harvest by centrifugation.
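The E. coli expression construct described above can be expressed as a simple protein-sequence transformation: remove the 19-residue signal peptide, make residue 1 of proglycinin a methionine (F1M), and append a five-glycine linker and a hexahistidine tag. A sketch follows, assuming a preproglycinin protein sequence as input. The function name and toy input are hypothetical, and codon-level details such as start and stop codons are deliberately ignored.

```python
SIGNAL_PEPTIDE_LEN = 19
GLY_LINKER = "G" * 5
HIS_TAG = "H" * 6

def ecoli_proglycinin_construct(preproglycinin: str) -> str:
    """Protein encoded by the E. coli proglycinin construct (sequence-level sketch)."""
    pro = preproglycinin[SIGNAL_PEPTIDE_LEN:]
    if not pro:
        raise ValueError("input is not longer than the signal peptide")
    # F1M: residue 1 of proglycinin (a Phe in GY1) is replaced by the start Met.
    return "M" + pro[1:] + GLY_LINKER + HIS_TAG

toy = "A" * 19 + "FQRSK"  # hypothetical: stand-in signal peptide + short mature part
print(ecoli_proglycinin_construct(toy))  # MQRSKGGGGGHHHHHH
```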
[0222] The uninduced and induced proteins from equal absorbances of cultures expressing Set 1 variants are presented in Figs. 1A-1E. Induced protein was apparent for all the Set 1 variants as well as for the wild-type (WT) protein. As the methionine content increased, the mobility in SDS-PAGE increased, such that the WT protein ran above the 50 kD marker, while the GY1_ALT1 protein, with the greatest methionine content, ran with the 50 kD marker. These results demonstrated that large amounts of His-tagged high-met proglycinin variants could be expressed with this E. coli system.
[0223] To determine soluble and insoluble protein in a high-throughput manner, 150 µl volumes of induced cultures were harvested by centrifugation at 5,000 g for 10 min, and the pellets were frozen at -80°C. The thawed pellets were lysed overnight with inverted mixing at room temperature in 150 µl of 50 mM Tris-HCl pH 8, 50 mM NaCl, 0.1 mg/ml lysozyme, with 1:100 HALT protease inhibitor cocktail (100x) (Thermo Scientific) and 0.01 µl Benzonase nuclease (Novagen). Centrifugation was done for 10 min at 5,000 g, and the supernatant was considered to be the soluble protein fraction. The insoluble pellet was resuspended in 5% SDS by pipetting, heated at 95°C for 5 min, and pipetted again if needed for complete resuspension. Three µl each of soluble and insoluble protein were assessed by SDS-PAGE.
[0224] The solubility of high-methionine proglycinin variants is presented in Figs. 2A-2L. For Set 1 variants, some soluble protein was evident with WT, and a greater amount of soluble protein was evident with the GY1_ALT4 variant. The GY1_ALT1, GY1_ALT2, and GY1_ALT3 variants were insoluble. For Set 2 variants, GY1_ALT4_5 was the only variant with significant amounts of soluble protein, although small amounts of soluble protein were also observed with GY1_ALT4_9 and GY1_ALT4_16. For Set 3 variants, soluble protein was observed for GY1_ALT4_29, GY1_ALT4_39, GY1_ALT4_40, GY1_ALT4_42, GY1_ALT4_43, GY1_ALT4_44, and GY1_ALT4_47. The GY1_ALT4_42 and GY1_ALT4_43 methionine substitutions were subsets of the GY1_ALT4_29 methionine substitutions, and all these variants had some soluble protein. Likewise, the GY1_ALT4_29 methionine substitutions were a combination of the GY1_ALT4_5 and GY1_ALT4_16 methionine substitutions, and all these variants had some soluble protein. The Set 1, 2, and 3 results of Figs. 2A-2L demonstrated that numerous methionine substitutions could be introduced into proglycinin while retaining sufficient solubility to allow purification of the proteins.
[0225] For Set 4 variants, GY1_AI_1, GY1_AI_4, GY1_AI_5, GY1_AI_7, and GY1_AI_8 had substantial amounts of soluble protein, and GY1_AI_3 and GY1_AI_6 also had small amounts of visible soluble protein. For Set 5 variants, GY1_AI_9, GY1_AI_11, GY1_AI_12, GY1_AI_14, GY1_AI_15, GY1_AI_16, GY1_AI_17, and GY1_AI_18 had soluble protein, and GY1_AI_10 also had detectable, though less abundant, soluble protein. These results showed that the AI approach to adding further methionine substitutions to the GY1_ALT4 scaffold was effective in providing soluble protein from E. coli.
EXAMPLE 3
[0226] This example demonstrates the purification of high-met proglycinin variants expressed in E. coli, and their stability against unfolding by a chemical denaturant.
[0227] Several of the high-met proglycinin proteins marked with asterisks in Figs. 1A-1E and 2A-2L were purified by immobilized metal affinity chromatography. Induced 800-ml cultures were harvested and frozen. Thawed pellets were lysed in 8 ml of 50 mM Tris-HCl pH 8, 50 mM NaCl, 0.1 mg/ml lysozyme, 15 mM imidazole, 80 µl of 100x HALT protease inhibitor cocktail, and 0.5 µl Benzonase nuclease. Sonication, in addition to lysozyme, was used to achieve more complete lysis. The lysate was centrifuged for 10 min at 10,000 g, and the supernatant was filtered through one layer of Miracloth and then slowly run through a 2 ml column of HisPur Cobalt Superflow Agarose (Thermo Scientific #25229) equilibrated with 50 mM Tris-HCl pH 8, 50 mM NaCl, 15 mM imidazole. The column was washed with 10 ml of the same equilibration buffer to remove unbound protein. The column was then washed with 10 ml of 50 mM Tris-HCl pH 8, 50 mM NaCl, 30 mM imidazole, and eluted with 8 ml of 50 mM Tris-HCl pH 8, 50 mM NaCl, 150 mM imidazole. The purified proteins were concentrated by ultrafiltration, and the buffer was exchanged into a storage buffer comprising 20 mM Tris-HCl pH 8, 300 mM NaCl, and 10% glycerol. The purified proteins were quantitated by absorbance at 205 nm, using an extinction coefficient of 31 for a 1 mg/ml solution. The proteins were aliquoted and quickly frozen in liquid nitrogen prior to long-term storage at -80°C.
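With an absorbance of 31 at 205 nm for a 1 mg/ml solution, concentration is A205 divided by 31, assuming a 1 cm path. The sketch below also converts to a molar concentration and computes the stock volume needed for the 1 µM, 200 µl denaturation assays in the next paragraph. The 50 kDa molecular mass and the absorbance reading are hypothetical placeholders, not values from the text.

```python
EXT_COEFF_A205 = 31.0  # A205 of a 1 mg/ml solution; 1 cm path assumed

def mg_per_ml(a205: float, dilution: float = 1.0, path_cm: float = 1.0) -> float:
    """Protein concentration from the A205 of a (possibly diluted) sample."""
    return a205 * dilution / (EXT_COEFF_A205 * path_cm)

def micromolar(conc_mg_per_ml: float, mass_da: float) -> float:
    """mg/ml equals g/l; dividing by g/mol gives mol/l, and 1e6 converts to µM."""
    return conc_mg_per_ml / mass_da * 1e6

def stock_volume_ul(stock_um: float, final_um: float = 1.0, final_ul: float = 200.0) -> float:
    """C1*V1 = C2*V2 for a 1 µM protein concentration in a 200 µl assay."""
    return final_um * final_ul / stock_um

conc = mg_per_ml(0.62, dilution=50)       # hypothetical reading of a 1:50 dilution
stock = micromolar(conc, mass_da=50_000)  # hypothetical molecular mass
print(round(conc, 3), round(stock, 1), round(stock_volume_ul(stock), 1))
```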
[0228] To determine whether the methionine substitutions affected the stability of the high-methionine proglycinin variants, the chemical denaturant guanidine hydrochloride (GuHCl) was used to unfold the purified proteins. Unfolding was monitored by the decrease in intrinsic fluorescence intensity at 323 nm following excitation at 280 nm. Assays were done at 25°C in 200 µl volumes in 96-well UV-transparent plates with 1 µM protein in 20 mM Tris-HCl pH 8, 300 mM NaCl. Three replicates were performed, each being the mean of three technical replicates. [0229] The stability data for high-met proglycinin variants from Sets 1, 2, and 3 are presented in Fig. 3A. More denaturant was required to completely unfold all the variants (about 5 M) than WT (about 3.5 M). At lower denaturant concentrations, however, changes in fluorescence intensity were slightly more pronounced for some variants than for WT. Considering only the 2 M GuHCl data, the ranking from more stable to less stable appeared to be GY1_ALT4 > WT > GY1_ALT4_47 = GY1_ALT4_40 > GY1_ALT4_29 = GY1_ALT4_5 > GY1_ALT4_39. The GY1_ALT4 variant appeared to be more stable than WT across denaturant concentrations. These results showed that multiple methionine residues could be introduced simultaneously into proglycinin without greatly destabilizing the protein.
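The GuHCl curves above are compared at a single concentration (2 M). A common way to summarize such data is the denaturant concentration at which half the protein is unfolded. The sketch below converts fluorescence to fraction unfolded under a two-state assumption, taking fixed native and unfolded baselines (a simplification; careful analyses fit sloping baselines), and then linearly interpolates the 0.5 crossing. The data are hypothetical, and the text does not report this midpoint metric.

```python
def fraction_unfolded(signal, f_native, f_unfolded):
    """Two-state conversion: 0 at the native baseline, 1 at the unfolded baseline."""
    span = f_native - f_unfolded
    return [(f_native - s) / span for s in signal]

def midpoint(denaturant_m, frac):
    """Denaturant concentration where frac first crosses 0.5 (linear interpolation)."""
    points = list(zip(denaturant_m, frac))
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if y0 != y1 and (y0 - 0.5) * (y1 - 0.5) <= 0:
            return x0 + (0.5 - y0) * (x1 - x0) / (y1 - y0)
    return None

guhcl = [0, 1, 2, 3, 4, 5]             # M GuHCl (hypothetical series)
intensity = [100, 98, 90, 60, 42, 40]  # fluorescence at 323 nm, arbitrary units
frac = fraction_unfolded(intensity, f_native=100, f_unfolded=40)
print(f"midpoint ~ {midpoint(guhcl, frac):.2f} M")
```

Because intrinsic fluorescence decreases on unfolding here, the native baseline is the higher value; the same functions work for an increasing signal if the baselines are swapped.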
[0230] The stability data for high-met proglycinin variants from Set 4 are presented in Fig. 3B. All Set 4 variants examined were substantially more stable than WT. The stability data for high-met proglycinin variants from Set 5 are presented in Fig. 3C. Two of the five Set 5 variants examined, GY1_AI_9 and GY1_AI_14, had greater stability than WT. Two other Set 5 variants, GY1_AI_12 and GY1_AI_17, had slightly greater conformational change than WT at 2 M GuHCl but appeared to be more stable than WT at higher denaturant concentrations. The GY1_AI_11 variant was the least stable of the Set 5 variants. The Set 4 and Set 5 results demonstrated that the ProteinMPNN method was an effective way to add further methionine substitutions to the GY1_ALT4 scaffold without destabilizing the protein.
EXAMPLE 4
[0231] This example demonstrates the stability of high-met proglycinin variants against digestion by trypsin.
[0232] The structural integrity of engineered proteins is commonly assessed by determining their stability against digestion by proteases (Tsuboyama et al., 2023, Nature 620: 434-444). A correctly and compactly folded protein is expected to be more resistant to digestion than a misfolded or less compactly folded variant that has more exposed cleavage sites accessible to the protease. The purified high-methionine proglycinin variants were digested with trypsin, which cleaves after exposed lysine and arginine residues. Digests were done at 25°C for the indicated times with a 1:500 (w/w) ratio of trypsin to proglycinin variant. At the indicated times, 13 µl aliquots containing 3 µg substrate protein and 0.006 µg trypsin were transferred from the digest to a stop solution. The digests were analyzed by SDS-PAGE. The 0-minute control lanes in Figs. 4A-4P contained only the proglycinin variant, without trypsin. [0233] The trypsin digests of the high-methionine proglycinin variants are presented in Figs. 4A-4P. Digestion patterns and stabilities similar to WT were observed for the GY1_ALT4 variant and for all of the AI-designed variants tested, including GY1_AI_4, GY1_AI_5, GY1_AI_7, GY1_AI_8, GY1_AI_9, GY1_AI_11, GY1_AI_12, GY1_AI_14, and GY1_AI_17: some intact protein was still visible at 15 min, intact protein was barely visible or absent at 30 min, and the banding patterns resembled WT. For the other, rationally designed variants GY1_ALT4_5, GY1_ALT4_29, GY1_ALT4_39, GY1_ALT4_40, and GY1_ALT4_47, slightly less intact protein appeared to be visible at 15 min and/or the banding pattern was more complex than WT, suggesting slightly lower stability, but no variant appeared to be greatly destabilized. Although the proglycinin variants of Figs. 4A-4P did not contain identical numbers of arginine plus lysine residues (51 for WT, 45 to 47 for the rest), these results still demonstrated that the numerous methionine substitutions in the proglycinin variants were not greatly destabilizing.
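The trypsin digest setup follows directly from the 1:500 (w/w) ratio: 3 µg of substrate per aliquot corresponds to 3/500 = 0.006 µg of trypsin. The sketch below reproduces that arithmetic and counts Lys plus Arg residues, the quantity compared between WT (51) and the variants (45 to 47). It treats every K and R as a potential site, ignoring reduced cleavage before proline and any effect of folding on accessibility. The function names and the peptide are hypothetical.

```python
def trypsin_ug(substrate_ug: float, ratio: float = 500.0) -> float:
    """Trypsin mass for a 1:ratio (w/w) trypsin:substrate digest."""
    return substrate_ug / ratio

def count_lys_arg(seq: str) -> int:
    """Lys + Arg count; every K/R is treated as a potential trypsin site (simplification)."""
    return sum(1 for aa in seq.upper() if aa in "KR")

print(trypsin_ug(3.0))             # 0.006 µg of trypsin per 13 µl aliquot
print(count_lys_arg("MKRPAKRGE"))  # 4 (hypothetical peptide)
```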
EXAMPLE 5
[0234] This example demonstrates the expression of high-methionine glycinin variants in soybean.
[0235] Expression vectors were constructed to express the CDS of high-methionine glycinin variants as transgenes in soybeans, with or without increased methionine biosynthesis or decreased methionine catabolism (Table 4). The full preproglycinin sequences were used, including signal peptides but omitting introns. The seed-specific promoter and the terminator from the soybean Gy1 gene (SEQ ID NOs: 232 and 233) were used to control expression of the glycinin variants. To increase methionine biosynthesis, some constructs also expressed a GM-CGS1 (Glyma.09g235400) variant CDS (SEQ ID NO: 231) containing a 78-amino acid deletion from K66 to S143 inclusive, which removes part of the regulatory region to reduce feedback regulation. The promoter from a soybean ubiquitin gene, GmUBQ (SEQ ID NO: 234), and the terminator from the phaseolin gene of Phaseolus vulgaris (SEQ ID NO: 235) were used to control GM-CGS1 (78 aa del) expression. Constructs for expression of high-methionine glycinin variants were also transformed into soybeans deficient in GM-MGL3 (Glyma.10G172700) activity, to reduce methionine catabolism. Gene editing was used to create a frameshift mutation that knocked out the GM-MGL3 gene. Standard Agrobacterium transformation methods were used to transform soybeans with the expression vectors. [0236] As an alternative to transgenic constructs, gene editing could be used to replace the Gy1 gene with either the CDS or the gene (including introns) encoding high-methionine glycinin 1 variants. Alternative ways of increasing CGS activity are also envisioned, including increasing the expression of wild-type GM-CGS1 or GM-CGS2 (Glyma.18g261600), or increasing expression by inserting small sequence motifs into the GM-CGS promoter, as an alternative to replacing the native promoter with GmUBQ or other strong promoters. Gene editing methods could be used to make the desired deletions in GM-CGS or to replace the native promoter with the GmUBQ promoter.
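The GM-CGS1 (78 aa del) variant removes residues K66 through S143 inclusive, and 143 - 66 + 1 = 78 confirms the stated deletion length. A sketch of applying such an inclusive, 1-based deletion with a residue-identity check at both ends follows. The function name and the toy sequence are hypothetical; the toy is not the GM-CGS1 sequence (SEQ ID NO: 231).

```python
from typing import Optional

def delete_inclusive(seq: str, start: int, end: int,
                     first: Optional[str] = None, last: Optional[str] = None) -> str:
    """Remove residues start..end (1-based, inclusive), optionally checking the end residues."""
    if not 1 <= start <= end <= len(seq):
        raise ValueError("deletion lies outside the sequence")
    if first is not None and seq[start - 1] != first:
        raise ValueError(f"expected {first} at {start}, found {seq[start - 1]}")
    if last is not None and seq[end - 1] != last:
        raise ValueError(f"expected {last} at {end}, found {seq[end - 1]}")
    return seq[:start - 1] + seq[end:]

print(143 - 66 + 1)  # 78 residues, matching "78 aa del"

toy_list = ["A"] * 200                           # hypothetical 200-residue protein
toy_list[66 - 1], toy_list[143 - 1] = "K", "S"   # place K66 and S143
toy = "".join(toy_list)
trimmed = delete_inclusive(toy, 66, 143, first="K", last="S")
print(len(toy) - len(trimmed))  # 78
```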
Table 4: Examples of constructs expressing high-met glycinin variants in soybeans, with or without increased met biosynthesis or decreased met catabolism
EXAMPLE 6
[0237] This example demonstrates the impact of expressing high-methionine glycinin variants on soybean seed methionine and cysteine content in greenhouse conditions.
[0238] For T1 seed analysis, 40 T1 seeds from each soybean transgenic event expressing high-methionine variants were genotyped. For each event, comprising a single plant, the seeds identified as homozygous were pooled, ground, and analyzed for total seed methionine and cysteine content. Likewise, the null seeds for each event were pooled, ground, and analyzed. The results are presented in Table 5. Overexpression of GY1_ALT4 without an increased methionine source (Construct 1) resulted in an average increase of 7.67% in seed total methionine (free methionine plus proteinogenic methionine) content. At the construct level, comparing the four homozygous values with the four null values across events, this increase in methionine content was statistically significant when analyzed with two-tailed t-tests assuming equal variances and a p-value cutoff of 0.05. Overexpression of GY1_ALT4 combined with decreased methionine catabolism (Construct 3) also resulted in a statistically significant increase in methionine content of 7.1% when averaged across events. In contrast to the methionine increases observed with Constructs 1 and 3, which expressed a high-methionine variant that was soluble in E. coli, neither Construct 5 nor Construct 6, which expressed a high-methionine variant that was insoluble in E. coli, resulted in statistically significant methionine increases in T1 homozygous seed. Statistically significant cysteine decreases were observed for Construct 5, and there were trends toward cysteine decreases for Constructs 1, 3, and 6 (Table 5). Collectively, these T1 seed results demonstrated that soybean seed methionine content could be increased by overexpressing a high-met GY1 variant. These results also provided a rationale for using the GY1_ALT4 protein as a scaffold for further increasing methionine content, as described in Table 2.
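The construct-level comparison above is a two-tailed, equal-variance (Student's) t-test on four homozygous versus four null event values, together with a percent change in mean methionine. A dependency-free sketch follows. The values are hypothetical, not Table 5 data, and the critical value 2.447 applies only to df = 6. In practice, `scipy.stats.ttest_ind(a, b, equal_var=True)` returns the same statistic together with an exact p-value.

```python
import math
from statistics import mean, variance

# Two-tailed critical value of Student's t for alpha = 0.05 and df = 6 (n1 = n2 = 4).
T_CRIT_ALPHA05_DF6 = 2.447

def pooled_t_test(treated, control):
    """Student's two-sample t statistic assuming equal variances; returns (t, df)."""
    n1, n2 = len(treated), len(control)
    df = n1 + n2 - 2
    # Pooled variance weights each sample variance by its degrees of freedom.
    sp2 = ((n1 - 1) * variance(treated) + (n2 - 1) * variance(control)) / df
    se = math.sqrt(sp2 * (1 / n1 + 1 / n2))
    return (mean(treated) - mean(control)) / se, df

def percent_change(treated, control):
    """Percent change of the treated mean relative to the control mean."""
    return 100.0 * (mean(treated) - mean(control)) / mean(control)

homozygous = [0.62, 0.60, 0.63, 0.61]  # hypothetical % Met (dry weight), one value per event
null = [0.57, 0.56, 0.58, 0.57]

t, df = pooled_t_test(homozygous, null)
print(f"t = {t:.2f}, df = {df}, significant: {abs(t) > T_CRIT_ALPHA05_DF6}")
print(f"change = {percent_change(homozygous, null):.1f}%")
```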
[0239] For T2 seed analysis, five events of Construct 1, one event of Construct 2, and one event of Construct 4 were analyzed. Seeds from three homozygous plants and three null plants per event were sampled, and the results are presented in Table 6. Three Construct 1 events had statistically significant methionine increases compared with the corresponding null values, and the other two events also trended higher for methionine content. These results corroborated the methionine increases observed with T1 seed for Construct 1 (Table 5). The single Construct 2 event and the single Construct 4 event also had statistically significant increases in methionine content (Table 6). Two Construct 1 events and the single Construct 4 event had decreased cysteine content in T2 seeds, and there were trends toward decreased cysteine content in the other Construct 1 events (Table 6). By contrast, the single Construct 2 event had a non-significant increase in cysteine content (Table 6).
[0240] T1 and T2 seeds from transgenic plants homozygous for Constructs 7-17 of Table 4 are tested for methionine content. Seeds from transgenic plants expressing any of Constructs 7-17 are expected to have increased methionine content compared with controls.
Table 5: Effects of expressing high-methionine glycinin variants on T1 soybean seed methionine and cysteine content
Table 6: Effects of expressing high-methionine glycinin variants on T2 soybean seed methionine and cysteine content
EXAMPLE 7
[0241] This example demonstrates providing multiple copies of high-methionine glycinin variants to soybean.
[0242] The protein abundance of high-methionine glycinin variants could be further increased in soybeans by using gene editing methods to introduce multiple copies of the desired CDS or gene. For example, a high-methionine glycinin variant gene could be introduced at its native Gy1 locus as well as at the nearby Gy2 locus, thus replacing both native genes. This approach would simultaneously remove two genes encoding low-methionine proteins while introducing two copies of a gene encoding a high-methionine protein. Alternatively, low-methionine beta-conglycinin genes could be knocked out and replaced with one or more copies of a high-methionine glycinin variant. In yet another approach to increase the copy number of high-methionine glycinin variants, the methionine substitutions in GY1 variants with acceptable solubility and stability could be made at the corresponding positions in other glycinin family members, such as GY2, GY3, GY4, GY5, and GY7. Alternatively, the GY1 variant replaces any or all of the glycinin family members GY2, GY3, GY4, GY5, and GY7. Seeds expressing a high-methionine glycinin variant at one or more native glycinin loci have increased methionine on a seed dry weight basis compared with an unmodified or wild-type seed.
EXAMPLE 8
[0243] This example demonstrates gene editing to produce high-methionine glycinin variants. [0244] Glycinin and β-conglycinin are the two major storage proteins in soybean seeds (Table 7 and Table 8). In soybean seeds, β-conglycinin, the abundant 7S globulin storage protein, and glycinin account for about 21% and 33% of total protein content, respectively (Utsumi et al., 1997). The genes encoding these storage proteins are used as gene editing targets for high-methionine glycinin variant overexpression. With a template-based genome editing technology, the native GY1, GY2, GY3, GY4, and GY5 genes and all the conglycinin alpha, alpha’ or beta subunit genes can be replaced with high-methionine glycinin variants at these native soybean storage protein loci.
Table 7: Expression profiling of glycinin 1 and other putative glycinin family members in soybean
Table 8: Expression levels of seven β-conglycinin isoforms in soybean seeds 30 or 50 days after flowering.
[0245] Example 8-1: Replace the native GY1 with a high-met GY1 variant (GY1_ALT4) at the GY1 native locus.
[0246] Using the CRISPR/Cas9 system, specific gRNAs (GM-GY1-CR1, SEQ ID NO: 1; and GM-GY1-CR3, SEQ ID NO: 2) were designed to target the Glycinin 1 (GY1) gene (glyma.03G163500; SEQ ID NO: 3 for the nucleotide sequence, SEQ ID NO: 4 for the peptide sequence). GM-GY1-CR1 was designed to target a site near the beginning of exon 1 of the proglycinin 1 gene. GM-GY1-CR3 was designed to target the beginning of the 3’ UTR of the glycinin 1 gene. The binary vector contained the CR1/CR3 gRNA combination and the corresponding donor DNA template (SEQ ID NO: 5). Homologous recombination (HR) fragments were used to flank the high-methionine GY1 variant sequences to facilitate homology-mediated recombination. The CR1 or CR3 gRNA target sites were also used to flank the donor DNAs so that they could be excised from the binary vectors for the double-strand break repair process. The binary vectors were introduced into soybean plants by Agrobacterium-mediated soybean embryonic axis transformation. Through site-specific integration of the donor DNA by homology-mediated double-strand break repair, GY1 genome editing variants carrying the high-methionine GY1 variant were created by replacing the entire GY1 coding sequence, including introns and exons. Molecular analysis of 844 T0 plants identified seven T0 plants with gene replacement by homology-dependent repair at both sites (2xHDR variants). Plants are grown in the greenhouse. T1 seeds are harvested, and T1 planting is conducted to obtain homozygous T2 seeds with the high-methionine GY1 gene replacement variants. [0247] T2 seeds expressing the high-methionine glycinin variants Gy1_AI_4 or Gy1_ALT4_47 at the GY1 gene locus, in a wild-type or CGS-edited background, were analyzed for total methionine, total cysteine, and total protein content. The Gy1_AI_4 gene replacement, with or without the CGS edit, resulted in significant increases in methionine (Table 9).
There was a 38% increase in methionine with the Gy1_AI_4 gene replacement on its own and a 51% increase in methionine when it was paired with the CGS edit. Total protein content was significantly increased with the Gy1_AI_4 edit + CGS edit, and oil was decreased. There were no significant changes in cysteine content. The increase in Gy1_AI_4 protein, together with a decrease in the amount of wild-type glycinins, was visualized by non-reducing SDS-PAGE and anti-glycinin immunoblot (Figs. 10A-10B).
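Recovering seven 2xHDR plants among the 844 T0 plants analyzed in Example 8-1 corresponds to a frequency of about 0.83%. The sketch below computes that frequency and, as one conventional way to express its uncertainty, a Wilson score interval for a binomial proportion. The interval is an illustration added here; the text does not report it.

```python
import math

def wilson_interval(successes: int, trials: int, z: float = 1.96):
    """Wilson score interval for a binomial proportion (about 95% for z = 1.96)."""
    p = successes / trials
    denom = 1 + z * z / trials
    center = (p + z * z / (2 * trials)) / denom
    half = z * math.sqrt(p * (1 - p) / trials + z * z / (4 * trials * trials)) / denom
    return center - half, center + half

hits, screened = 7, 844  # 2xHDR T0 plants among T0 plants analyzed
rate = hits / screened
low, high = wilson_interval(hits, screened)
print(f"2xHDR frequency: {100 * rate:.2f}% (95% CI {100 * low:.2f}-{100 * high:.2f}%)")
```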
Table 9: Effects of replacing Gy1 with Gy1_AI_4 on soybean seed sulfur amino acid and protein contents from T2 seed of plants grown in the greenhouse. n = five biological replicates for CGS edit + Gy1_AI_4 edit; n = three biological replicates for the other genotypes.
p-values were determined by two-tailed t-tests assuming equal variances.
[0248] The Gy1_ALT4_47 gene replacement, when paired with the CGS edit, significantly increased total methionine, by 32% in one event and by 47% in a second event (Table 10). Total cysteine levels were significantly increased in the Gy1_ALT4_47 edit + CGS events. The CGS edit on its own also increased methionine and cysteine content. Protein and oil contents were decreased in one Gy1_ALT4_47 + CGS event but not in the other. Expression of Gy1_ALT4_47 protein was visualized by non-reducing SDS-PAGE (Fig. 11).
Table 10: Effects of replacing Gy1 with Gy1_ALT4_47 on soybean seed sulfur amino acid and protein contents from T2 seed of plants grown in the greenhouse. n = seven biological replicates for CGS edit + Gy1_ALT4_47 edit; n = three biological replicates for the other genotypes.
p-values were determined by two-tailed t-tests assuming equal variances.
[0249] Example 8-2: Replace both GY1 and GY2 with a high-methionine GY1 variant (Gy1_ALT4_47 and Gy1_AI_4) at the GY1 and GY2 native loci.
[0250] To replace the GY1-GY2 gene cluster on soybean chromosome 3, GM-GY1-CR12 (SEQ ID NO: 6) was designed to target a site near the end of exon 4 of the proglycinin 2 gene (glyma03g32020; SEQ ID NO: 7 for the nucleotide sequence, SEQ ID NO: 8 for the peptide sequence). The same binary vector design was used to combine the CR1/CR12 gRNAs and their corresponding donor DNA templates (SEQ ID NO: 9 for the Gy1_ALT4_47 variant; SEQ ID NO: 10 for the Gy1_AI_4 variant). The binary vectors are introduced into soybean plants by Agrobacterium-mediated soybean embryonic axis transformation. Through site-specific integration of the donor DNA by homology-mediated double-strand break repair, genome editing variants will be created in which both the GY1 and GY2 genomic sequences are replaced by the high-methionine GY1 variant. T0 plants are generated and molecularly analyzed to identify gene replacement by homology-dependent repair at both sites (2xHDR variants). Plants are grown in the greenhouse. T1 seeds are harvested, and T1 planting is conducted to obtain homozygous T2 seeds with the high-methionine GY1 gene replacement variants. The high-methionine GY1 protein is quantified, and the methionine, cysteine, and total amino acid composition is analyzed to demonstrate the impact of the high-methionine GY1 variant as a replacement for the native GY1 and GY2 proteins in soybean seeds. Seeds have measurable increases in total methionine without a significant loss in total protein content.
[0251] Example 8-3: Replace the conglycinin subunit gene clusters with three copies of a high-methionine GY1 variant (GY1_ALT4) at the native conglycinin loci.
[0252] A two-step process was used for this editing approach. First, two gRNAs were used to drop out the conglycinin gene cluster on chromosome 20 (Gm20); in a separate experiment, another two gRNAs were used to drop out the conglycinin gene cluster on chromosome 10 (Gm10). T2 homozygous plants carrying either the Gm20 or the Gm10 conglycinin gene cluster dropout were crossed, and homozygous double dropout lines were identified. In the second step, a new gRNA was designed against the new dropout junction sequence to enable homology-dependent insertion of multiple copies of the high-methionine GY1 variant (GY1_ALT4) into the native conglycinin loci, resulting in replacement of the conglycinin proteins with the high-met GY1 variant.
[0253] As shown in Table 8, there are seven β-conglycinin candidates, comprising three α, two α′, and two β isoforms. Except for Glyma.10g246400 (α) and Glyma.20G146200 (β), all isoforms show relatively high expression levels at 30 or 50 days after flowering (DAF) in soybean seeds. [0254] Four gRNAs were used to delete six of the seven β-conglycinin isoforms. GM-CONG-CR1 (SEQ ID NO: 11) and GM-CONG-CR2 (SEQ ID NO: 12) were used to drop out the conglycinin cluster on chromosome 20 (Gm20); GM-CONG-CR3 (SEQ ID NO: 13) and GM-CONG-CR4 (SEQ ID NO: 14) were used to drop out the conglycinin cluster on chromosome 10 (Gm10). T2 homozygous seed from the conglycinin Gm10 locus dropout experiment was generated, and seed protein was analyzed by SDS-PAGE with Coomassie Blue staining. No alpha’ subunits of conglycinin could be detected in these T2 homozygous seeds from the Gm10 locus dropout variants, demonstrating complete removal of the conglycinin alpha’ subunit proteins from soybean seeds, in agreement with the complete removal of their genes from the soybean genome. For the second editing experiment, T2 seeds from the Gm20 locus dropout were analyzed by protein gel analysis. These results indicated that the conglycinin alpha subunit proteins had been completely removed from seeds of the homozygous dropout plants. Genetic crosses between the Gm10 conglycinin dropout variant and the Gm20 conglycinin dropout variant were conducted, and plants with homozygous dropouts at both the Gm10 and Gm20 loci were identified. Seed protein analysis by SDS-PAGE with Coomassie Blue staining showed complete conglycinin knockout in soybean seeds.
[0255] This double conglycinin dropout line was used for insertion of multiple copies of the high-methionine GY1 variant (GY1_ALT4) into the native conglycinin locus by a template-based genome editing technology. GM-CONG-CR12 (SEQ ID NO: 15) was designed to cleave the new junction in the Gm10 dropout line, and GM-CONG-CR13 (SEQ ID NO: 16) was designed for the Gm20 dropout line. The donor DNA (SEQ ID NO: 17) in the binary vector contained three copies of the high-methionine GY1 variant (GY1_ALT4) under the control of the native beta-conglycinin alpha’ promoter and terminator, flanked by homologous recombination (HR) fragments to facilitate homology-mediated recombination. GM-CONG-CR12 gRNA target sites were also used to flank the donor DNA so that it could be excised from the binary vector for the double-strand break repair process. The binary vectors are introduced into soybean plants by Agrobacterium-mediated soybean embryonic axis transformation. Through site-specific integration of the donor DNA by homology-mediated double-strand break repair, genome editing variants with three copies of GY1_ALT4 under the control of the native conglycinin promoter and terminator are created at the native conglycinin Gm10 locus. A similar approach can be used to insert one or multiple copies of high-methionine glycinin 1 variants at the Gm20 dropout locus using the GM-CONG-CR13 site. T0 plants are generated and molecularly analyzed to identify gene replacement by homology-dependent repair at both sites (2xHDR). Plants are grown in the greenhouse. T1 seeds are harvested, and T1 planting is conducted to obtain homozygous T2 seeds with the high-methionine GY1 variant at the native conglycinin Gm10 or Gm20 locus. [0256] T2 seeds with the high-methionine glycinin variant GY1_ALT4 at the native conglycinin Gm10 locus were generated and analyzed for total methionine, total cysteine, and total protein content.
This edit contains three copies (3x) of GY1_ALT4 replacing the three conglycinins on chromosome 10. There was a significant increase in total methionine content, to an average of 1.16% methionine (dry weight basis), a 75% increase compared with WT (Table 11). The highest line had 1.20% methionine (dry weight basis). Because, unlike the CGS-edited lines of Example 8-1, this triple gene replacement was not in a CGS-edited background, there was a significant decrease in total cysteine. Stacking this GY1_ALT4 3x edit with the CGS-edited line, either by crossing or by re-transformation, should restore cysteine levels and further increase methionine. There was no change in total protein; however, oil was decreased in the GY1_ALT4 3x gene replacement.
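The methionine figures for the GY1_ALT4 3x edit can be related by percent-change arithmetic. The sketch back-calculates the wild-type methionine level implied by a 1.16% mean at a 75% increase (about 0.66% on a dry weight basis) and expresses the 1.20% top line against it. This baseline is an approximation derived here from rounded numbers, not a value reported from Table 11. The function names are hypothetical.

```python
def percent_increase(value: float, baseline: float) -> float:
    """Percent by which value exceeds baseline."""
    return 100.0 * (value - baseline) / baseline

def implied_baseline(value: float, pct_increase: float) -> float:
    """Baseline consistent with value being pct_increase percent above it."""
    return value / (1.0 + pct_increase / 100.0)

edited_mean = 1.16  # % Met (dry weight), GY1_ALT4 3x average
wt_est = implied_baseline(edited_mean, 75.0)  # approximate; the reported 75% is rounded
print(f"implied WT ~ {wt_est:.2f}% Met; top line 1.20% is ~{percent_increase(1.20, wt_est):.0f}% above it")
```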
Table 11: Effects of replacing conglycinins with GY1_ALT4 3x on soybean seed sulfur amino acid and protein contents from T2 seed of plants grown in the greenhouse. n = five biological replicates for GY1_ALT4 3x in the conglycinin-rebalanced background edit; n = two biological replicates for the other genotypes. p-values were determined by two-tailed t-tests assuming equal variances.
EXAMPLE 9
[0257] This example demonstrates the insertion of short sequence motifs to increase CGS1 expression.
[0258] Driving GM-CGS1 (78 aa del) transgene expression with the strong constitutive GmUBQ promoter was described in Examples 5 and 6. As an alternative to a strong constitutive promoter, the efficacy of inserting either of two soybean expression modulating elements (EMEs) to increase expression of GM-CGS1 (78 aa del; SEQ ID NO: 231) was assessed. EME1 (SEQ ID NO: 238) comprises two copies of a 20 bp sequence separated by a 10 bp linker. EME2 (SEQ ID NO: 239) comprises two copies of a different 20 bp sequence separated by the same 10 bp linker. EME1 was inserted 19 bp upstream of the TATA box in the GM-CGS1 promoter (SEQ ID NO: 240) to make the GM-CGS1 (EME1) promoter (SEQ ID NO: 241). Likewise, EME2 was inserted 19 bp upstream of the TATA box in the GM-CGS1 promoter to make the GM-CGS1 (EME2) promoter (SEQ ID NO: 242). The effect of the GM-CGS1, GM-CGS1 (EME1), GM-CGS1 (EME2), and GmUBQ promoters on GM-CGS1 (78 aa del) expression is presented in Table 12. The non-soybean phaseolin terminator was used for these four constructs, which allowed a qRT-PCR assay with primers in the phaseolin terminator region to measure only transgene expression, without interference from native expression. Expression in leaves of T3 plants grown in the field in short-row plots is summarized in Table 12. Three leaf punches, one from each of three different plants in the row, were averaged to obtain the row value. Expression in null leaves was below the detection limit, as expected with this assay. Transgene expression in leaves was much greater with the GM-CGS1 (EME1) and GmUBQ promoters than with the GM-CGS1 and GM-CGS1 (EME2) promoters, similar to greenhouse-grown plants. Expression from the GmUBQ promoter was environmentally variable, with approximately 10-fold greater expression in leaves in the field than in the greenhouse.
These results demonstrated that the EME1 insertion into the GM-CGS1 promoter, but not the EME2 insertion, was highly effective in increasing expression in leaves. For the two strongly expressing promoters, expression in developing seed at 35 days after pollination was also examined. Both the GM-CGS1 (EME1) and GmUBQ promoters were effective in driving transgene expression in developing seeds, with the GmUBQ promoter achieving the greatest expression levels (Table 12). In hypocotyls, the GmUBQ promoter provided the greatest level of expression. In roots, both the GM-CGS1 (EME1) and GmUBQ promoters had higher expression than the other promoters.
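The EME designs described above have a simple layout: two 20 bp units around a 10 bp linker (50 bp total), placed 19 bp upstream of the TATA box. A sequence-level sketch follows. Several points are assumptions: "19 bp upstream" is read as 19 bp between the 3' end of the element and the first base of the TATA box; the TATA box is located by a plain first-match search for "TATA"; and all sequences and function names are hypothetical stand-ins for SEQ ID NOs: 238-242.

```python
UNIT_LEN, LINKER_LEN = 20, 10

def build_eme(unit: str, linker: str) -> str:
    """Two copies of a 20 bp unit separated by a 10 bp linker (50 bp total)."""
    if len(unit) != UNIT_LEN or len(linker) != LINKER_LEN:
        raise ValueError("expected a 20 bp unit and a 10 bp linker")
    return unit + linker + unit

def insert_upstream_of_tata(promoter: str, element: str, gap: int = 19, motif: str = "TATA") -> str:
    """Insert element so that `gap` bp separate its 3' end from the first TATA motif."""
    tata_start = promoter.upper().find(motif)
    if tata_start < 0:
        raise ValueError("TATA motif not found")
    site = tata_start - gap
    if site < 0:
        raise ValueError("not enough upstream sequence for the requested gap")
    return promoter[:site] + element + promoter[site:]

eme = build_eme("ACGT" * 5, "G" * 10)      # hypothetical element
promoter = "C" * 40 + "TATAAA" + "G" * 10  # hypothetical promoter fragment
modified = insert_upstream_of_tata(promoter, eme)
print(len(eme), modified.find("TATAAA"))   # 50 90
```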
Table 12: Effect of promoter on GM-CGS (78 aa del) transgene expression in soybean leaves and seeds
[0259] Seed composition was determined from mature seed (Table 13A). Free methionine was significantly increased when using the GM-CGS1 (EME1) and GmUBQ promoters. Total methionine and total cysteine were also significantly increased in the seed when using the GmUBQ promoter, but not as greatly as free methionine (Table 13A). Protein was slightly increased and oil was slightly decreased (Table 13B). RNAseq analysis of developing seed showed increases in transcripts of three cysteine-rich Bowman-Birk protease inhibitors (BBI) and two methionine-rich 2S albumins when CGS was overexpressed with the GmUBQ promoter, suggesting that the increases in total cysteine and total methionine come from these genes. Because BBI is a trypsin and chymotrypsin inhibitor, trypsin-agarose column chromatography was used to show that the increased protein (which migrated at a size similar to BBI) bound trypsin. Protease inhibition assays with both trypsin and chymotrypsin also confirmed the increase in BBI protein in the GmUBQ:CGS overexpressing seed: these in vitro assays showed significantly increased trypsin inhibition (decreased trypsin activity) and some increase in chymotrypsin inhibition in the presence of protein from GmUBQ:CGS seed. Protein mass spectrometry of bands excised from an SDS-PAGE gel loaded with soluble protein from GmUBQ:CGS lines confirmed increases in BBI and 2S albumin peptides, corroborating the RNAseq results. Additionally, CGS was overexpressed without incurring a yield penalty in the soybean seeds. Overexpressing CGS (which uses cysteine as a substrate) increased free methionine more than total methionine and total cysteine, suggesting a lack of native methionine-rich soybean proteins that can incorporate the additional free methionine into storage proteins.
These results indicate that combining CGS expression with a high-methionine glycinin variant, and shifting the storage-protein profile toward the high-methionine variant(s), would further increase total methionine content.
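Total methionine in the seed is the sum of free methionine and methionine bound in protein, so the reasoning above can be made concrete with a small calculation. The sketch below is illustrative only: the function name and all numbers are hypothetical and are not data from Table 13A. It shows how a large rise in free methionine can leave protein-bound methionine unchanged when no additional methionine-rich storage protein is available to absorb it.

```python
import math


def protein_bound_met(total_met: float, free_met: float) -> float:
    """Return protein-bound methionine as total minus free methionine.

    Both inputs must use the same units (e.g., mg per g dry weight).
    """
    if free_met > total_met:
        raise ValueError("free methionine cannot exceed total methionine")
    return total_met - free_met


# Hypothetical means in mg/g dry weight (illustrative, not from this application).
control = {"total": 5.0, "free": 0.1}
cgs_line = {"total": 5.5, "free": 0.6}

bound_control = protein_bound_met(control["total"], control["free"])
bound_cgs = protein_bound_met(cgs_line["total"], cgs_line["free"])

# Total rises by 0.5 mg/g, but the whole gain is free methionine.
print(f"control bound: {bound_control:.2f} mg/g, CGS bound: {bound_cgs:.2f} mg/g")
```

In this toy case the protein-bound pool is the same in both lines, which is the situation a high-methionine glycinin variant is intended to change by providing a sink for the extra free methionine.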
Table 13A: Effects of ectopic expression of GmCGS1 del on sulfur amino acid contents of mature soybean seed.
DW denotes dry weight. n = nine biological replicates for null and three biological replicates for each transgenic event, except n = two for event e30 of the GmCGS2 (EME1) promoter construct. Standard errors are given in parentheses.
Statistically significant differences from null values were determined by two-tailed t-tests. Asterisks denote p values < 0.01.
Table 13B: Effects of ectopic expression of GmCGS1 del on protein and oil contents of mature soybean seed.
DW denotes dry weight. n = nine biological replicates for null and three biological replicates for each transgenic event, except n = two for event e30 of the GmCGS2 (EME1) promoter construct. Standard errors are given in parentheses.
Statistically significant differences from null values were determined by two-tailed t-tests. Asterisks denote p values < 0.01.
EXAMPLE 10
[0260] This example demonstrates the impact of expressing the high-methionine glycinin variant Gy1_ALT4 on soybean seed methionine and cysteine content.
[0261] Transgenic plants expressing the Gy1_ALT4 variant (Construct 1) were grown in short field rows. After harvest, seeds were analyzed for total methionine, total cysteine, and total protein content (Table 14). All five events showed increased total methionine compared to the null, and four of these increases were statistically significant by two-tailed t-tests assuming equal variances, with a p-value cutoff of 0.05. Total cysteine content was significantly decreased in three events. Total protein content remained relatively unchanged: three events increased (two significantly) and two decreased slightly.
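The significance calls in this example use a two-tailed Student's t-test assuming equal variances (pooled variance), comparing each transgenic event with the null. As a hedged sketch of that calculation, the standard-library code below computes the pooled t statistic, its degrees of freedom, and a two-tailed p-value by numerically integrating the t-distribution density with Simpson's rule. The function names and sample values are hypothetical, not data from Table 14; in practice `scipy.stats.ttest_ind(a, b, equal_var=True)` performs the same test.

```python
import math
from statistics import mean, variance


def t_pdf(x: float, df: int) -> float:
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
    c = math.exp(log_c) / math.sqrt(df * math.pi)
    return c * (1 + x * x / df) ** (-(df + 1) / 2)


def two_tailed_p(t: float, df: int, steps: int = 2000) -> float:
    """P(|T| >= |t|), using Simpson's rule on the density from 0 to |t|."""
    upper = abs(t)
    h = upper / steps  # steps must be even for Simpson's rule
    s = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        s += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central_half = s * h / 3  # area under the density between 0 and |t|
    return max(0.0, 1.0 - 2.0 * central_half)


def pooled_t_test(a: list[float], b: list[float]) -> tuple[float, int, float]:
    """Two-sample, two-tailed t-test assuming equal variances.

    Returns (t statistic, degrees of freedom, p-value).
    """
    n1, n2 = len(a), len(b)
    df = n1 + n2 - 2
    # Pooled variance: sample variances weighted by their degrees of freedom.
    sp2 = ((n1 - 1) * variance(a) + (n2 - 1) * variance(b)) / df
    t = (mean(a) - mean(b)) / math.sqrt(sp2 * (1 / n1 + 1 / n2))
    return t, df, two_tailed_p(t, df)


# Hypothetical total-methionine values (not data from Table 14):
null = [5.1, 5.0, 5.2, 4.9, 5.0]  # n = 5 null replicates
event = [5.5, 5.6, 5.4]           # n = 3 replicates of one event
t, df, p = pooled_t_test(event, null)
print(f"t = {t:.3f}, df = {df}, p = {p:.4f}, significant at 0.05: {p < 0.05}")
```

With five null and three event replicates the test has 6 degrees of freedom, so small replicate numbers like those in Table 14 require a fairly large mean difference to reach p < 0.05.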
Table 14: Effects of Gy1_ALT4 transgene expression on soybean seed sulfur amino acid and protein contents from T2 seed of plants grown in the field
n = five biological replicates for null and three biological replicates for transgenic events, except n = two for e12. p-values were determined by two-tailed t-tests assuming equal variances.
[0262] Transcripts of the endogenous Gy1 gene and the Gy1_ALT4 transgene were quantified by next-generation amplicon sequencing of RT-PCR products. Briefly, mRNA was extracted from immature seed sampled at 35 and 45 DAF from field-grown soybean plants. RT-PCR assays generated 173-bp amplicons using a primer pair targeting a region conserved among Gy1, Gy2, Gy3, and Gy1_ALT4. Secondary PCR amplification incorporated sample-specific indices and Illumina-specific sequences. Sequencing was carried out on a MiSeq (Illumina). Sequencing adapters and low-quality data were removed using Cutadapt (version 1.12). Reads were aligned to the transgene or the endogenous genes with BWA-MEM, and SAMtools was used to count reads. Relative expression levels are presented as percentages of sequenced transcripts.
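The last step, converting read counts into relative expression, amounts to normalizing each gene's count by the total number of reads assigned to the four glycinin targets. A minimal sketch, assuming per-gene read counts have already been obtained (for example with SAMtools from the BWA-MEM alignments); the function name and the counts are hypothetical and are not data from Table 15.

```python
import math


def relative_expression(read_counts: dict[str, int]) -> dict[str, float]:
    """Express each gene's read count as a percentage of all assigned reads."""
    total = sum(read_counts.values())
    if total == 0:
        raise ValueError("no reads were assigned to any glycinin gene")
    return {gene: 100.0 * n / total for gene, n in read_counts.items()}


# Hypothetical read counts for one seed sample (not data from Table 15).
counts = {"Gy1": 300, "Gy1_ALT4": 200, "Gy2": 300, "Gy3": 200}
pct = relative_expression(counts)

# Transcripts driven by the Gy1 promoter: endogenous Gy1 plus the transgene.
gy1_promoter_share = pct["Gy1"] + pct["Gy1_ALT4"]
print(pct, gy1_promoter_share)
```

Because the shared primer pair amplifies all four targets in one reaction, the percentages are relative within each sample and always sum to 100%.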
[0263] Among the three endogenous glycinin genes, Gy1 had the highest expression in 35 DAF seed, accounting for 56.3% of the transcripts in WT (Table 15). In the Gy1_ALT4 transgenic plants, Gy2 and Gy3 expression did not differ significantly from WT, whereas endogenous Gy1 expression was significantly lower than in WT. Gy1_ALT4 transgene transcripts accounted for 37.0% of the sequenced transcripts. Together, the two copies of the Gy1 promoter in transgenic plants (one endogenous and one transgenic) contributed 57.2% of the transcripts, similar to the share from the single endogenous copy in WT and null segregants. In 45 DAF seed, Gy1 transcripts reached 68.7% in WT, while the transgenic plants had 71.8%, of which 53.1% derived from endogenous Gy1 and 18.7% from the Gy1_ALT4 transgene. Although Gy2 and Gy3 expression was lower at 45 DAF than at 35 DAF, there was again no significant difference between WT and transgenic plants (Table 15). These results indicate that Gy1 transcription in WT may already be near its maximum in developing soybean seed, so adding another copy of the Gy1 promoter does not increase the total quantity of Gy1-promoter transcripts. This could result from competition between the endogenous and transgenic Gy1 promoters for transcription factors. Replacing the endogenous Gy1 gene with the engineered Gy1_ALT4 is therefore expected to give a higher protein-bound methionine content than in conventional transgenic events.
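The transcript shares reported in this paragraph can be checked by simple addition. The endogenous Gy1 share at 35 DAF (20.2%) is not stated directly; it is inferred here as 57.2% minus the 37.0% Gy1_ALT4 share.

```latex
\begin{aligned}
\text{35 DAF:}\quad & \underbrace{20.2\%}_{\text{endogenous Gy1}} + \underbrace{37.0\%}_{\text{Gy1\_ALT4}} = 57.2\% \qquad (\text{WT Gy1: } 56.3\%)\\
\text{45 DAF:}\quad & \underbrace{53.1\%}_{\text{endogenous Gy1}} + \underbrace{18.7\%}_{\text{Gy1\_ALT4}} = 71.8\% \qquad (\text{WT Gy1: } 68.7\%)
\end{aligned}
```

At both stages the combined Gy1-promoter share in the transgenic plants stays close to the WT Gy1 share, which is the observation behind the saturation interpretation above.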
Table 15: Transcript expression of the Gy1_ALT4 gene in immature soybean seed
Data are means of transcript percentages derived from sequenced amplicons
DAF, days after flowering. WT, wild type. Null-seg, null segregant. n = 9-10. Values are means ± SD.
[0264] T2 seed from the Gy1_ALT4 gene-edited lines described in Example 8-1, in the MGL3 frameshift-edited background, were analyzed for total methionine, total cysteine, and total protein content. Total methionine in Gy1_ALT4 seed was significantly increased compared to WT seed (two-tailed t-tests assuming equal variances, p < 0.05). Replacing the native GY1 gene with the engineered Gy1_ALT4 gave a 28% increase in total methionine (Table 16), greater than the increases observed with transgenic overexpression of Gy1_ALT4 (Tables 5, 6, and 14), which reached at most 11% (Table 14). Total cysteine content was significantly decreased and protein significantly increased in the Gy1_ALT4 edit + MGL3 edit compared to WT (Table 16).
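The percentages compared in this paragraph are relative changes against each experiment's own control: the 28% for the gene replacement is relative to WT, whereas the transgenic comparisons in Table 14 are relative to the null. A minimal sketch of that calculation follows; the function name is hypothetical and the means are normalized placeholders (control = 1.0), not measured values from Tables 14 or 16.

```python
import math


def percent_increase(treated_mean: float, control_mean: float) -> float:
    """Relative change of a treated mean versus its own control, in percent."""
    if control_mean <= 0:
        raise ValueError("control mean must be positive")
    return 100.0 * (treated_mean - control_mean) / control_mean


# Normalized placeholder means (control = 1.0); not measured values.
gene_replacement = percent_increase(1.28, 1.0)   # edited line vs WT
best_transgenic = percent_increase(1.11, 1.0)    # best transgenic event vs null
print(f"{gene_replacement:.1f}% vs {best_transgenic:.1f}%")
```

Since each percentage is normalized to a different control, the comparison is most informative when the WT and null controls have similar absolute methionine levels.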
Table 16: Effects of replacing GY1 with Gy1_ALT4 on soybean seed sulfur amino acid and protein contents from T2 seed of plants grown in the greenhouse
n = three biological replicates. p-values were determined by two-tailed t-tests assuming equal variances.
[0265] To visualize the increase in protein-bound methionine, non-reducing SDS-PAGE and immunoblots were performed on seed from the Gy1_ALT4 transgenic and gene-edited lines (Figs. 8A-8D). For non-reducing SDS-PAGE, 30 µg of total soluble protein extracted from the seeds was loaded into each lane. For transgenic expression of Gy1_ALT4, band intensity increased for the disulfide-bonded Gy1_ALT4 acidic and basic chains (arrow 2) and for the Gy1_ALT4 acidic chain alone (arrow 4; Figs. 8A-8D). These increases were confirmed by immunoblot with an anti-glycinin group 1 acidic chain peptide antibody, with 1.5 µg protein loaded. The Gy1_ALT4 gene replacement edit also showed increased bands for the disulfide-bonded Gy1_ALT4 acidic and basic chains (arrow 2) and the Gy1_ALT4 acidic chain alone (arrow 4; Figs. 8A-8D). In the gene replacement edit, the bands for the disulfide-bonded acidic and basic chains of the wild-type glycinin family (arrow 1) and for the wild-type glycinin acidic chain alone (arrow 3) decreased compared to WT.
EXAMPLE 11
[0266] This example demonstrates the impact of transgenic expression of additional glycinin variants on the methionine and cysteine content of T2 soybean seed from greenhouse-grown plants.
[0267] Additional glycinin variants were transgenically expressed in soybean, including constructs 7 through 17 described in Table 4. T2 seed from greenhouse-grown plants was analyzed for total methionine, total cysteine, protein, and oil. Significant increases in methionine were observed in events for the Gy1 variants Gy1_ALT4_40, Gy1_AI_4, Gy1_AI_5, Gy1_AI_7, Gy1_AI_9, Gy1_AI_14, Gy1_ALT4_47+CGS, Gy1_AI_4+CGS, and Gy1_AI_14+CGS (Table 17A). Significant increases in cysteine were observed in events containing both the CGS edit and a high-methionine Gy1 variant, namely Gy1_ALT4_47+CGS, Gy1_AI_4+CGS, and Gy1_AI_14+CGS (Table 17A). Cysteine was significantly decreased in events for Gy1_ALT4_39 and Gy1_AI_4 (Table 17A). Protein and oil remained relatively unchanged: only a few variants had one of three events significantly different from the null, and Gy1_AI_9 was the exception, with two of three events significantly increased in protein content (Table 17B).
Table 17A: Effects of expressing Gy1 high-methionine variants on methionine and cysteine content of T2 soybean seed from plants grown in the greenhouse
p-values were determined by two-tailed t-tests assuming equal variances.
Table 17B: Effects of expressing Gy1 high-methionine variants on protein and oil content of T2 soybean seed from plants grown in the greenhouse
n = 11 biological replicates for null and three biological replicates for transgenic events. p-values were determined by two-tailed t-tests assuming equal variances.
[0268] To visualize the increases in protein-bound methionine, non-reducing SDS-PAGE and immunoblots were performed on seed from the transgenic Gy1 high-methionine variants. Gy1 variant proteins were observed for Gy1_AI_4+CGS, Gy1_AI_14+CGS, Gy1_ALT4_40, Gy1_ALT4_39, Gy1_AI_4, Gy1_AI_5, Gy1_AI_7, Gy1_AI_14, and Gy1_AI_9 (Figs. 9A-9F).
EXAMPLE 12
[0269] This example demonstrates increasing total methionine content in soybean.
[0270] Soybean lines containing three copies of the GY1_ALT4 glycinin variant in the double conglycinin dropout background, described in Example 8-3, are further modified to have increased expression or activity of a CGS1 polypeptide, a CGS2 polypeptide, or both (e.g., by removing the CGS regulatory domain by genome editing), generating plants that have increased expression or activity of at least one CGS polypeptide, decreased expression of beta-conglycinin, and express a glycinin variant. The modified CGS is introduced by genome editing, by a transgene containing CGS or a variant, or by introgression. Seeds produced from these plants have increased total methionine and total cysteine content compared to seeds from wild-type plants or from plants containing three copies of the GY1_ALT4 glycinin variant in the double conglycinin dropout background. Similar results (e.g., increased total methionine and total cysteine) are expected using any combination of the high-methionine glycinin variants listed in Table 2 or 17A, including GY1_ALT4, in the 3x configuration on chromosome 10 (described in Example 8-3) to replace the conglycinin genes. Additionally, the conglycinins on chromosome 20, removed as described in Example 8-3, can also be replaced with one or more copies of a high-methionine glycinin variant to further increase total methionine content.
[0271] Plants having increased expression or activity of at least one CGS polypeptide (e.g., by removing the CGS regulatory domain by genome editing), decreased expression of beta-conglycinin, and expressing a high-methionine glycinin variant are further modified to have decreased expression, activity, and/or stability of an endogenous MFT polypeptide and/or of an endogenous SWT polypeptide, generating a CGS, glycinin, conglycinin, MFT, and/or SWT modified plant. Each modification can independently be introduced by genome editing, transgenes, or introgression. Seeds from the CGS, glycinin, conglycinin, MFT, and/or SWT modified plants have increased total methionine content compared to wild-type plants or to plants having increased expression of a polynucleotide encoding at least one CGS, decreased expression of beta-conglycinin, and expressing a high-methionine glycinin variant. The CGS, glycinin, conglycinin, MFT, and/or SWT modified plants are further modified, by genome editing, a transgene, or introgression, to decrease RFO content, for example by editing or mutating a polynucleotide encoding a raffinose synthase to reduce or eliminate its expression. Seeds with decreased RFO content have increased protein and methionine content on a dry weight basis compared to wild-type plants.
EXAMPLE 13
[0272] This example demonstrates increasing total methionine content in soybean.
[0273] Soybean plants are generated that have decreased expression of beta-conglycinin and contain a high-methionine glycinin variant described herein, such as GY1_ALT4_47 or GY1_AI_4, to produce high-methionine glycinin variant plants with decreased beta-conglycinin expression. Beta-conglycinin expression is decreased by a gene-edited knockout of one or more beta-conglycinin isoforms, by introducing an inverted repeat, or by a dominant beta-conglycinin hairpin RNAi. The high-methionine glycinin variant is introduced into one or more of the native glycinin gene loci, as described in Example 7. The plants are generated by genome editing, by introducing transgenes, or by crossing plants containing one or more of the modifications. Seeds from the generated plants have increased total methionine compared to wild-type plants.
[0274] The plants are further modified to have increased expression or activity of a CGS1 polypeptide, a CGS2 polypeptide, or both (e.g., by removing the CGS regulatory domain by genome editing), generating plants that have decreased expression of beta-conglycinin, express a high-methionine glycinin variant, and have increased expression or activity of at least one CGS polypeptide. The modified CGS is introduced by genome editing, by a transgene containing CGS or a variant, or by introgression. Seeds produced from the plants have increased total methionine and total cysteine content compared to seeds from wild-type plants or from plants containing a high-methionine variant and decreased expression of beta-conglycinin.
[0275] Plants having increased expression or activity of at least one CGS polypeptide (e.g., by removing the CGS regulatory domain by genome editing), decreased expression of beta-conglycinin, and expressing a high-methionine glycinin variant are further modified to have decreased expression, activity, and/or stability of an endogenous MFT polypeptide and/or of an endogenous SWT polypeptide, generating a CGS, glycinin, conglycinin, MFT, and/or SWT modified plant. Each modification can independently be introduced by genome editing, transgenes, or introgression. Seeds from the CGS, glycinin, conglycinin, MFT, and/or SWT modified plants have increased total methionine content (predicted to be at or around 3% on a dry weight basis) compared to wild-type plants or to plants having increased expression of a polynucleotide encoding at least one CGS, decreased expression of beta-conglycinin, and expressing a high-methionine glycinin variant. The CGS, glycinin, conglycinin, MFT, and/or SWT modified plants are further modified, by genome editing, a transgene, or introgression, to decrease RFO content, for example by editing or mutating a polynucleotide encoding a raffinose synthase to reduce or eliminate its expression. Seeds with decreased RFO content have increased protein and methionine content on a dry weight basis compared to wild-type plants.
[0276] All publications and patent applications in this specification are indicative of the level of ordinary skill in the art to which this invention pertains. All publications and patent applications are herein incorporated by reference to the same extent as if each individual publication or patent application was specifically and individually indicated by reference.
[0277] Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. Unless mentioned otherwise, the techniques employed or contemplated herein are standard methodologies well known to one of ordinary skill in the art. The materials, methods and examples are illustrative only and not limiting.
[0278] Many modifications and other embodiments of the inventions set forth herein will come to mind to one skilled in the art to which these inventions pertain having the benefit of the teachings presented in the foregoing descriptions and the associated drawings. Therefore, it is to be understood that the inventions are not to be limited to the specific embodiments disclosed and that modifications and other embodiments are intended to be included within the scope of the appended claims. Although specific terms are employed herein, they are used in a generic and descriptive sense only and not for purposes of limitation.
[0279] Units, prefixes and symbols may be denoted in their SI accepted form. Unless otherwise indicated, nucleic acids are written left to right in 5’ to 3’ orientation; amino acid sequences are written left to right in amino to carboxy orientation, respectively. Numeric ranges are inclusive of the numbers defining the range. Amino acids may be referred to herein by either their commonly known three letter symbols or by the one-letter symbols recommended by the IUPAC-IUB Biochemical Nomenclature Commission. Nucleotides, likewise, may be referred to by their commonly accepted single-letter codes.

Claims

We claim:
1. A plant or seed comprising a modified polynucleotide encoding a modified glycinin polypeptide, the modified glycinin polypeptide comprising an amino acid sequence having at least 90% sequence identity to SEQ ID NO: 4 and comprising 10 or more modifications, the 10 or more modifications comprising a substitution of a methionine at a position corresponding to SEQ ID NO: 4 selected from the group consisting of position 11, 20, 21, 23, 24, 26, 28, 33, 34, 35, 40, 43, 48, 49, 51, 52, 53, 59, 61, 64, 66, 70, 71, 75, 80, 81, 84, 85, 95, 97, 98, 99, 100, 103, 111, 112, 113, 115, 117, 121, 129, 131, 135, 143, 145, 147, 149, 150, 151, 153, 154, 155, 156, 158, 160, 162, 164, 169, 171, 172, 173, 174, 176, 177, 180, 184, 192, 196, 209, 211, 215, 223, 225, 228, 229, 230, 235, 246, 247, 249, 253, 255, 257, 258, 269, 270, 277, 279, 302, 303, 308, 310, 313, 316, 321, 322, 323, 324, 326, 328, 335, 338, 339, 341, 343, 349, 351, 354, 356, 357, 359, 361, 363, 364, 365, 371, 372, 373, 376, 382, 383, 385, 391, 393, 396, 399, 401, 402, 403, 405, 406, 409, 410, 411, 412, 414, 416, 420, 421, 423, 425, 430, 434, 436, 442, 443, 454, 456, 461, 462, 463, 464, and 468.
2. The plant or seed of claim 1, wherein at least one of the 10 or more modifications comprises a substitution of a methionine at a position corresponding to SEQ ID NO: 4 selected from the group consisting of position 51, 53, 61, 66, 70, 71, 80, 145, 162, 164, 172, 209, 313, 335, 341, 343, 351, 354, 356, 361, 412, 414, 416, 436, 442, 462, and 468.
3. A plant or seed comprising a modified polynucleotide encoding a modified glycinin polypeptide, the modified glycinin polypeptide comprising an amino acid sequence having at least 90% sequence identity to SEQ ID NO: 4 and comprising at least one modification, the at least one modification comprising a substitution of a methionine at a position corresponding to SEQ ID NO: 4 selected from the group consisting of position 51, 61, 66, 70, 145, 162, 164, 172, 313, 341, 354, 356, 361, 412, 414, 416, and 462.
4. The plant or seed of claim 3, further comprising at least one additional modification, the at least one additional modification comprising a substitution of a methionine at a position corresponding to SEQ ID NO: 4 selected from the group consisting of position 53, 71, 80, 209, 335, 343, 351, 436, 442, and 468.
5. A plant or seed comprising a modified polynucleotide encoding a modified glycinin polypeptide, the modified glycinin polypeptide comprising an amino acid sequence having at least 90% sequence identity to SEQ ID NO: 4 and comprising at least one modification, the at least one modification comprising a substitution of a methionine at a position corresponding to SEQ ID NO: 4 selected from the group consisting of position 53, 71, 80, 209, 335, 343, 351, 436, 442, and 468.
6. The plant or seed of any one of claims 1-5, wherein the modified glycinin polypeptide comprises an amino acid sequence that is at least 95% identical to amino acid positions 20-495 of any one of SEQ ID NOs: 18-86 and comprises a methionine at 5 or more positions corresponding to amino acid position 51, 61, 66, 70, 145, 162, 164, 172, 313, 341, 354, 356, 361, 412, 414, 416, and 462 of SEQ ID NO: 4.
7. The plant or seed of any one of claims 1-6, wherein the modified glycinin polypeptide comprises an AlphaFold2 predicted structure having a TM-score of at least 0.80 as compared to the AlphaFold2 predicted structure of SEQ ID NO: 4 or amino acid positions 20-495 of any one of SEQ ID NOs: 18-86.
8. A plant or seed comprising a polynucleotide encoding a modified glycinin polypeptide, the modified glycinin polypeptide comprising at least 10 methionine residues and an AlphaFold2 predicted structure having a TM-score of at least 0.80 as compared to the AlphaFold2 predicted structure of SEQ ID NO: 4.
9. The plant or seed of claim 8, wherein the modified glycinin polypeptide comprises an amino acid sequence that is at least 50% identical to SEQ ID NO: 4.
10. The plant or seed of any one of claims 1-9, wherein the modified glycinin protein is inserted into an endogenous glycinin gene locus.
11. The plant or seed of any one of claims 1-10, wherein the plant or seed further comprises a modification decreasing the expression of beta-conglycinin, a modification decreasing the expression and/or activity of methionine gamma-lyase (MGL), a modification increasing the expression of a polynucleotide encoding at least one cystathionine-gamma-synthase (CGS) (e.g., CGS-1 and/or CGS-2), a modification decreasing the expression and/or activity of lysine-ketoglutarate reductase/saccharopine dehydrogenase (LKR/SDH), a modification increasing the activity of dihydrodipicolinate synthase (DHPS), a modification decreasing expression, activity, and/or stability of an endogenous MFT polypeptide, a modification decreasing expression, activity, and/or stability of an endogenous CCT polypeptide, a modification decreasing expression, activity, and/or stability of an endogenous Big Seed 1 (BS1) polypeptide, a modification decreasing expression, activity, and/or stability of an endogenous Big Seed 2 (BS2) polypeptide, a modification decreasing expression, activity, and/or stability of an endogenous Sugars Will Eventually be Exported Transporter (SWT) polypeptide, a modification increasing expression or activity of an ABI3 polypeptide, a modification increasing expression or activity of an ODP1 polypeptide, a modification decreasing expression, activity, and/or stability of a Kix8-1 polypeptide, a modification decreasing raffinose family oligosaccharides (RFO) content, or any combination thereof.
12. The plant or seed of claim 11, wherein the modification decreasing the expression and/or activity of MGL comprises a knockout of the MGL gene, wherein the MGL gene encodes an MGL protein comprising an amino acid sequence that is at least 90% identical to any of SEQ ID NOs: 182-184.
13. The plant or seed of claim 11 or 12, wherein the modification increasing the activity of the CGS gene comprises a modification that removes a self-regulatory domain of a CGS gene, wherein the self-regulatory domain of the CGS gene encodes a polypeptide comprising an amino acid sequence that is at least 90% identical to SEQ ID NO: 192.
14. The plant or seed of claim 13, wherein the modified CGS gene encodes a CGS protein comprising an amino acid sequence that is at least 90% identical to any one of SEQ ID NOs: 195-197 or 230.
15. The plant or seed of any one of claims 1-14, wherein the seed or a seed of the plant comprises at least 1% total methionine as measured on a dry weight of seed basis.
16. The plant or seed of any one of claims 1-15, wherein the amount of modified glycinin polypeptide in the seed or a seed of the plant is at least 90% of the amount of a comparable non-modified glycinin polypeptide in a corresponding control seed.
17. The plant or seed of any one of claims 1-16, wherein the seed or a seed of the plant comprises at least a 5% increase in the amount of total methionine on a dry weight basis as compared to a control seed not comprising the modified glycinin polypeptide.
18. The plant or seed of any one of claims 1-17, wherein the yield of the plant measured by seed weight at 13% moisture is at least 90% of the yield of a control plant not comprising the modified glycinin polypeptide.
19. The plant or seed of any one of claims 1-18, wherein the plant or seed is a soybean plant or soybean seed.
20. A plant produced by the seed of any one of claims 1-19, wherein the plant comprises the polynucleotide encoding the modified glycinin polypeptide.
21. A seed produced by the plant of any one of claims 1-19, wherein the seed comprises the polynucleotide encoding the modified glycinin polypeptide.
22. A method of plant breeding, the method comprising crossing the plant of any one of claims 1-21 with a second soybean plant to produce progeny seed.
23. A method of producing a plant producing seed having increased methionine content, the method comprising: a. introducing into a regenerable plant cell a polynucleotide encoding a modified glycinin polypeptide, the modified glycinin polypeptide comprising an amino acid sequence having at least 90% sequence identity to SEQ ID NO: 4 and comprising 10 or more modifications, the 10 or more modifications comprising a substitution of a methionine at a position corresponding to SEQ ID NO: 4 selected from the group consisting of position 11, 20, 21, 23, 24, 26, 28, 33, 34, 35, 40, 43, 48, 49, 51, 52, 53, 59, 61, 64, 66, 70, 71, 75, 80, 81, 84, 85, 95, 97, 98, 99, 100, 103, 111, 112, 113, 115, 117, 121, 129, 131, 135, 143, 145, 147, 149, 150, 151, 153, 154, 155, 156, 158, 160, 162, 164, 169, 171, 172, 173, 174, 176, 177, 180, 184, 192, 196, 209, 211, 215, 223, 225, 228, 229, 230, 235, 246, 247, 249, 253, 255, 257, 258, 269, 270, 277, 279, 302, 303, 308, 310, 313, 316, 321, 322, 323, 324, 326, 328, 335, 338, 339, 341, 343, 349, 351, 354, 356, 357, 359, 361, 363, 364, 365, 371, 372, 373, 376, 382, 383, 385, 391, 393, 396, 399, 401, 402, 403, 405, 406, 409, 410, 411, 412, 414, 416, 420, 421, 423, 425, 430, 434, 436, 442, 443, 454, 456, 461, 462, 463, 464, and 468; and b. generating the plant, wherein the plant comprises the polynucleotide encoding the modified glycinin polypeptide and produces a seed having an increased amount of methionine as compared to seed of a plant not comprising the modified glycinin polypeptide.
24. The method of claim 23, wherein at least one of the 10 or more modifications comprises a substitution of a methionine at a position corresponding to SEQ ID NO: 4 selected from the group consisting of position 51, 53, 61, 66, 70, 71, 80, 145, 162, 164, 172, 209, 313, 335, 341, 343, 351, 354, 356, 361, 412, 414, 416, 436, 442, 462, and 468.
25. The method of claim 23 or 24, wherein the polynucleotide is introduced into the regenerable plant cell using a nucleic acid construct comprising the polynucleotide operably linked to a regulatory element.
26. The method of claim 23 or 24, wherein the polynucleotide is introduced into the regenerable plant cell by a targeted genetic modification.
27. The method of claim 26, wherein the targeted genetic modification is introduced using an enzyme selected from the group consisting of a polynucleotide-guided endonuclease, CRISPR-Cas endonucleases, base editing deaminases, a zinc finger nuclease, a transcription activator-like effector nuclease (TALEN) and engineered site-specific endonucleases.
28. The method of any one of claims 23-27, wherein the modified glycinin polypeptide comprises an amino acid sequence that is at least 95% identical to amino acid positions 20-495 of any one of SEQ ID NOs: 18-86 and comprises a methionine at 5 or more positions corresponding to amino acid position 51, 61, 66, 70, 145, 162, 164, 172, 313, 341, 354, 356, 361, 412, 414, 416, and 462 of SEQ ID NO: 4.
29. The method of any one of claims 23-28, wherein the method further comprises introducing into the regenerable plant cell or the generated plant a modification decreasing the expression of beta-conglycinin, a modification decreasing the expression and/or activity of methionine gamma-lyase (MGL), a modification increasing the expression of a polynucleotide encoding at least one cystathionine-gamma-synthase (CGS) (e.g., CGS-1 and/or CGS-2), a modification decreasing the expression and/or activity of lysine-ketoglutarate reductase/saccharopine dehydrogenase (LKR/SDH), a modification increasing the activity of dihydrodipicolinate synthase (DHPS), a modification decreasing expression, activity, and/or stability of an endogenous MFT polypeptide, a modification decreasing expression, activity, and/or stability of an endogenous CCT polypeptide, a modification decreasing expression, activity, and/or stability of an endogenous Big Seed 1 (BS1) polypeptide, a modification decreasing expression, activity, and/or stability of an endogenous Big Seed 2 (BS2) polypeptide, a modification decreasing expression, activity, and/or stability of an endogenous Sugars Will Eventually be Exported Transporter (SWT) polypeptide, a modification increasing expression or activity of an ABI3 polypeptide, a modification increasing expression or activity of an ODP1 polypeptide, a modification decreasing expression, activity, and/or stability of a Kix8-1 polypeptide, a modification decreasing raffinose family oligosaccharides (RFO) content, or any combination thereof.
30. The method of claim 29, wherein the modification decreasing the expression and/or activity of MGL comprises a knockout of the MGL gene, wherein the MGL gene encodes an MGL protein comprising an amino acid sequence that is at least 90% identical to any of SEQ ID NOs: 182-184.
31. The method of claim 29 or 30, wherein the modification increasing the activity of the CGS gene comprises a modification that removes a self-regulatory domain of a CGS gene, wherein the self-regulatory domain of the CGS gene encodes a polypeptide comprising an amino acid sequence that is at least 90% identical to SEQ ID NO: 192.
32. The method of any one of claims 23-31, wherein expression of the modified glycinin polypeptide in the generated plant or seed thereof is at least 90% of expression of a nonmodified glycinin polypeptide in a corresponding control plant.
33. The method of any one of claims 23-32, wherein the total protein in the seed of the generated plant is at least 90% of the total protein of a seed from a control plant not comprising the modified glycinin polypeptide.
34. The method of any one of claims 23-33, wherein the yield of the generated plant is at least 90% of the yield of a control plant not comprising the modified glycinin polypeptide.
35. The method of any one of claims 23-34, wherein a seed of the generated plant comprises at least 2% total methionine measured on a dry weight of seed basis.
36. The method of any one of claims 23-35, wherein the plant or seed is a soybean plant or soybean seed.
37. A method for generating high-methionine seed storage protein variants, the method comprising: a. generating an in silico population of high-methionine seed storage protein variants by inputting the 3D structural coordinates and/or amino acid sequence of a candidate seed storage polypeptide into an artificial intelligence model (AI model), the AI model trained to calculate the per-residue probability of an amino acid by using encoded geometrical information of the candidate seed storage polypeptide 3D structure and/or sequential information; b. calculating a predicted solubility score, a predicted stability score, a predicted aggregation propensity score, or any combination thereof for members of the in silico population; and c. selecting from the in silico population one or more candidate high-methionine seed storage polypeptide variants, the one or more selected candidate high-methionine seed storage polypeptide variants having i. a predicted solubility score that is at least 80% of a predicted solubility score for the candidate seed storage protein, ii. a predicted stability score that is at least 80% of a predicted stability score for the candidate seed storage polypeptide, iii. a predicted aggregation propensity score less than 50% of an aggregation propensity score for the candidate seed storage protein, or iv. any combination thereof.
38. The method of claim 37, wherein the candidate seed storage polypeptide is a glycinin polypeptide.
39. The method of claim 38, wherein the glycinin polypeptide comprises an amino acid sequence that is at least 95% identical to any one of SEQ ID NOs: 4 and 18-86.
40. The method of any one of claims 37-39, wherein the 3D structural coordinates of the candidate seed storage polypeptide is a predicted 3D structure.
41. The method of claim 40, wherein the predicted 3D structure is generated using AlphaFold2.
42. The method of any one of claims 37-41, wherein the method further comprises d. expressing one or more of the selected high-methionine seed storage polypeptide variants in a model organism; e. determining the solubility, the stability, or a combination thereof of the one or more of the selected high-methionine seed storage polypeptide variants; and f. selecting high-methionine seed storage polypeptide variants having a solubility score, a stability score, or both, in the model organism that is at least 80% of a solubility score or stability score for the candidate seed storage polypeptide.
43. The method of claim 42, wherein the method further comprises expressing a high-methionine variant selected in (f) in a soybean plant.
44. The method of claim 42 or 43, wherein the model organism is E. coli.
45. A method for increasing seed methionine content in a plant, the method comprising expressing in a plant one or more of the high-methionine seed storage protein variants selected in any one of claims 42-45.
46. A polynucleotide encoding a modified glycinin polypeptide, the modified glycinin polypeptide comprising an amino acid sequence having at least 90% sequence identity to SEQ ID NO: 4 and comprising 10 or more modifications, the 10 or more modifications comprising a substitution of a methionine at a position corresponding to SEQ ID NO: 4 selected from the group consisting of position 11, 20, 21, 23, 24, 26, 28, 33, 34, 35, 40, 43, 48, 49, 51, 52, 53, 59, 61, 64, 66, 70, 71, 75, 80, 81, 84, 85, 95, 97, 98, 99, 100, 103, 111, 112, 113, 115, 117, 121, 129, 131, 135, 143, 145, 147, 149, 150, 151, 153, 154, 155, 156, 158, 160, 162, 164, 169, 171, 172, 173, 174, 176, 177, 180, 184, 192, 196, 209, 211, 215, 223, 225, 228, 229, 230, 235, 246, 247, 249, 253, 255, 257, 258, 269, 270, 277, 279, 302, 303, 308, 310, 313, 316, 321, 322, 323, 324, 326, 328, 335, 338, 339, 341, 343, 349, 351, 354, 356, 357, 359, 361, 363, 364, 365, 371, 372, 373, 376, 382, 383, 385, 391, 393, 396, 399, 401, 402, 403, 405, 406, 409, 410, 411, 412, 414, 416, 420, 421, 423, 425, 430, 434, 436, 442, 443, 454, 456, 461, 462, 463, 464, and 468.
47. The polynucleotide of claim 46, wherein at least one of the 10 or more modifications comprises a substitution of a methionine at a position corresponding to SEQ ID NO: 4 selected from the group consisting of position 51, 53, 61, 66, 70, 71, 80, 145, 162, 164, 172, 209, 313, 335, 341, 343, 351, 354, 356, 361, 412, 414, 416, 436, 442, 462, and 468.
48. A modified glycinin polypeptide comprising an amino acid sequence having at least 90% sequence identity to SEQ ID NO: 4 and comprising 10 or more modifications, the 10 or more modifications comprising a substitution of a methionine at a position corresponding to SEQ ID NO: 4 selected from the group consisting of position 11, 20, 21, 23, 24, 26, 28, 33, 34, 35, 40, 43, 48, 49, 51, 52, 53, 59, 61, 64, 66, 70, 71, 75, 80, 81, 84, 85, 95, 97, 98, 99, 100, 103, 111, 112, 113, 115, 117, 121, 129, 131, 135, 143, 145, 147, 149, 150, 151, 153, 154, 155, 156, 158, 160, 162, 164, 169, 171, 172, 173, 174, 176, 177, 180, 184, 192, 196, 209, 211, 215, 223, 225, 228, 229, 230, 235, 246, 247, 249, 253, 255, 257, 258, 269, 270, 277, 279, 302, 303, 308, 310, 313, 316, 321, 322, 323, 324, 326, 328, 335, 338, 339, 341, 343, 349, 351, 354, 356, 357, 359, 361, 363, 364, 365, 371, 372, 373, 376, 382, 383, 385, 391, 393, 396, 399, 401, 402, 403, 405, 406, 409, 410, 411, 412, 414, 416, 420, 421, 423, 425, 430, 434, 436, 442, 443, 454, 456, 461, 462, 463, 464, and 468.
49. The modified glycinin polypeptide of claim 48, wherein at least one of the 10 or more modifications comprises a substitution of a methionine at a position corresponding to SEQ ID NO: 4 selected from the group consisting of position 51, 53, 61, 66, 70, 71, 80, 145, 162, 164, 172, 209, 313, 335, 341, 343, 351, 354, 356, 361, 412, 414, 416, 436, 442, 462, and 468.
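Claims 46-49 combine two structural limits: a minimum sequence identity to SEQ ID NO: 4, and methionine substitutions at listed positions. The Python sketch below checks both limits for a pair of sequences. Several parts of it are assumptions:

- SEQ ID NO: 4 is not reproduced in this excerpt, so the example uses a toy 480-residue stand-in.
- Identity is computed ungapped over equal-length, pre-aligned sequences. A real comparison would use a proper alignment.
- The sketch reads the claims as requiring ten or more methionine substitutions at listed positions. That is one reading, not a construction of the claim.

For brevity the demo passes the 27 positions of claims 47 and 49. All of them also appear in the claim 46/48 list. Checking claim 48 itself would mean passing that full list instead.

```python
# Positions (relative to SEQ ID NO: 4) recited in claims 47 and 49.
CLAIM_49_POSITIONS = frozenset({
    51, 53, 61, 66, 70, 71, 80, 145, 162, 164, 172, 209, 313, 335,
    341, 343, 351, 354, 356, 361, 412, 414, 416, 436, 442, 462, 468,
})


def substitutions(reference: str, variant: str) -> list[tuple[int, str, str]]:
    """List (1-based position, reference residue, variant residue) for each difference.

    Simplification: sequences are assumed pre-aligned and of equal length (no indels).
    """
    if len(reference) != len(variant):
        raise ValueError("sequences must be pre-aligned and of equal length")
    return [(i, r, v) for i, (r, v) in enumerate(zip(reference, variant), start=1) if r != v]


def percent_identity(reference: str, variant: str) -> float:
    """Ungapped percent identity over the full reference length."""
    return 100.0 * (len(reference) - len(substitutions(reference, variant))) / len(reference)


def meets_met_pattern(reference: str, variant: str, allowed_positions,
                      min_met: int = 10, min_identity: float = 90.0) -> bool:
    """Count methionine substitutions at allowed positions and check identity."""
    met_positions = [pos for pos, _, new in substitutions(reference, variant)
                     if new == "M" and pos in allowed_positions]
    return len(met_positions) >= min_met and percent_identity(reference, variant) >= min_identity


# Toy 480-residue stand-in for SEQ ID NO: 4 (hypothetical, not the real sequence).
reference = "A" * 480
chosen = [51, 53, 61, 66, 70, 71, 80, 145, 162, 164]
residues = list(reference)
for pos in chosen:
    residues[pos - 1] = "M"
variant = "".join(residues)

print(round(percent_identity(reference, variant), 2))             # 97.92
print(meets_met_pattern(reference, variant, CLAIM_49_POSITIONS))  # True
```

Ten substitutions in 480 residues leave about 97.9% identity. That is well above the 90% floor, so under this reading the substitution count is the binding limit.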
50. A protein composition produced from the seeds of any one of claims 1-36 and 45.
51. A protein composition produced from the seeds of the plants of any one of claims 1-36 and 45.
52. The protein composition of claim 51 or 52, wherein the composition is an animal feed.
53. The protein composition of claim 51 or 52, wherein the composition is human food.
54. An animal feed comprising the protein composition of claim 50 or 51.
55. A human food comprising the protein composition of claim 50 or 51.
56. A method of feeding an animal, the method comprising administering a feed comprising the protein composition of claims 50-52 or the animal feed of claim 54 to the animal in a feeding regimen.
57. The method of claim 56, wherein the feeding regimen does not include providing compositions comprising supplementary methionine.
58. The method of claim 56 or 57, wherein the animal is a chicken or a pig.
59. A method for generating high-essential amino acid seed storage protein variants, the method comprising: a. generating an in silico population of high-essential amino acid seed storage protein variants by inputting the 3D structural coordinates and/or amino acid sequence of a candidate seed storage polypeptide into an artificial intelligence model (AI model), the AI model trained to calculate the per-residue probability of an amino acid by using encoded geometrical information of the candidate seed storage polypeptide 3D structure and/or sequential information; b. calculating a predicted solubility score, a predicted stability score, a predicted aggregation propensity score, or any combination thereof for members of the in silico population; and c. selecting from the in silico population one or more candidate high-essential amino acid seed storage polypeptide variants, the one or more selected candidate high-essential amino acid seed storage polypeptide variants having i. a predicted solubility score that is at least 80% of a predicted solubility score for the candidate seed storage protein, ii. a predicted stability score that is at least 80% of a predicted stability score for the candidate seed storage polypeptide, iii. a predicted aggregation propensity score less than 50% of an aggregation propensity score for the candidate seed storage protein, or iv. any combination thereof.
60. The method of claim 59, wherein the high-essential amino acid seed storage protein variant comprises a substitution introducing at least one essential amino acid selected from the group consisting of arginine, histidine, isoleucine, leucine, lysine, methionine, phenylalanine, threonine, tryptophan, and valine, or any combination thereof.
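Claim 60 asks whether a variant carries at least one substitution that introduces one of ten listed amino acids. The Python sketch below lists such substitutions for a pair of pre-aligned, equal-length sequences. The function name, the no-indel simplification, and the example sequences are assumptions. The one-letter codes are the standard IUPAC codes for the ten amino acids named in the claim.

```python
# Standard one-letter codes for the amino acids listed in claim 60:
# arginine R, histidine H, isoleucine I, leucine L, lysine K,
# methionine M, phenylalanine F, threonine T, tryptophan W, valine V.
CLAIM_60_AMINO_ACIDS = frozenset("RHILKMFTWV")


def listed_aa_gains(reference: str, variant: str) -> list[tuple[int, str, str]]:
    """Return (1-based position, old residue, new residue) for each substitution
    whose new residue is one of the claim 60 amino acids.

    Simplification: sequences are assumed pre-aligned and of equal length (no indels).
    """
    if len(reference) != len(variant):
        raise ValueError("sequences must be pre-aligned and of equal length")
    return [
        (i, old, new)
        for i, (old, new) in enumerate(zip(reference, variant), start=1)
        if old != new and new in CLAIM_60_AMINO_ACIDS
    ]


# Hypothetical five-residue fragments: S->K and Q->M introduce listed residues.
print(listed_aa_gains("GSAQE", "GKAME"))  # [(2, 'S', 'K'), (4, 'Q', 'M')]
```

A variant qualifies under this reading when the returned list is non-empty. A substitution to an unlisted residue, such as A to D, is ignored.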
61. A high-essential amino acid seed storage protein variant produced by the method of claim 59.
PCT/US2025/024242 2024-04-11 2025-04-11 Glycinin variants for improving the nutritional value of soybeans Pending WO2025217495A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US202463632608P 2024-04-11 2024-04-11
US63/632,608 2024-04-11

Publications (1)

Publication Number Publication Date
WO2025217495A1 true WO2025217495A1 (en) 2025-10-16

Family

ID=97350754

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2025/024242 Pending WO2025217495A1 (en) 2024-04-11 2025-04-11 Glycinin variants for improving the nutritional value of soybeans

Country Status (1)

Country Link
WO (1) WO2025217495A1 (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20050079494A1 (en) * 2003-04-09 2005-04-14 Monsanto Technology Llc Enhanced proteins and methods for their use
US6936696B2 (en) * 2001-09-17 2005-08-30 Monsanto Company Enhanced proteins and methods for their use
US20070157335A1 (en) * 2003-03-07 2007-07-05 Pioneer Hi-Bred International, Inc. Altering Protein Functional Properties Through Terminal Fusions
US20230232763A1 (en) * 2020-04-23 2023-07-27 Pioneer Hi-Bred International, Inc. Soybean with altered seed protein
US20230380373A1 (en) * 2020-10-28 2023-11-30 Pioneer Hi-Bred International, Inc. Leghemoglobin in soybean

Legal Events

Date Code Title Description
121 EP: the EPO has been informed by WIPO that EP was designated in this application

Ref document number: 25787301

Country of ref document: EP

Kind code of ref document: A1